Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.

Project description

Process Improvement using Data

A pragmatic Python toolkit for industrial process data - multivariate analysis, designed experiments, and process monitoring, in one place.

What is this?

process-improve is the companion package to the online textbook Process Improvement using Data, and powers the statistical engine behind factori.al. It bundles the methods practitioners actually reach for on real plant and lab data: PCA and PLS with missing-data handling, designed experiments with a multi-stage strategy recommender, control charts, and batch-data tooling. The API is scikit-learn-compatible where that makes sense and pandas-native throughout.

Highlights

🧪 Designed Experiments

  • Full-factorial, fractional-factorial, and response-surface designs (built on pyDOE3)
  • A DOE strategy recommender that plans a complete multi-stage program - screening → optimization → confirmation - from ~50 deterministic rules, with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other domains
  • ANOVA, main-effects plots, linear-model fitting, and response optimization
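
As a rough illustration of what a two-level full-factorial design and its main effects look like, here is a minimal standard-library sketch. It does not use process-improve or pyDOE3; the helper names full_factorial and main_effects and the response values are made up for this example.

```python
from itertools import product


def full_factorial(n_factors):
    """Return all 2**n_factors runs in coded units (-1 = low, +1 = high)."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def main_effects(design, response):
    """Main effect per factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, response) if run[j] == 1]
        low = [y for run, y in zip(design, response) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


design = full_factorial(2)  # 4 runs for factors A and B
# Made-up, noise-free response generated from y = 10 + 3*A - 1*B
y = [10 + 3 * a - 1 * b for a, b in design]
effects = main_effects(design, y)
print(design)
print(effects)
```

Each main effect is twice the corresponding coefficient of the coded linear model, because moving a factor from -1 to +1 is a change of two coded units.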

📊 Latent Variable Methods

  • PCA with SVD and NIPALS algorithms, plus missing-data handling via Trimmed Score Regression
  • PLS regression with a fully sklearn-compatible API
  • TPLS - PLS for T-shaped data structures
  • Diagnostics: Hotelling's T², SPE, score contributions, and ESD-based outlier detection
  • Component selection via PRESS / Wold's criterion
  • Interactive Plotly score, loading, SPE, and T² plots, bound directly to fitted models
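
To make the diagnostics above concrete, here is a minimal standard-library sketch of a one-component PCA fitted with NIPALS, followed by SPE and Hotelling's T² per observation. It is not the package's implementation: the function names and the small data set are invented for illustration, the data are only mean-centred (not scaled to unit variance), and there is no missing-data handling.

```python
def centre_columns(rows):
    """Subtract each column's mean from that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def nipals_first_component(X, tol=1e-12, max_iter=500):
    """Scores t and unit-length loadings p of the first component of centred X."""
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]  # starting guess; assumes column 0 is not all zeros
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Regress every column of X onto t, then normalise the loadings
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        # Regress every row of X onto p to update the scores
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t, t_new))
        t = t_new
        if change < tol:
            break
    return t, p


raw = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
X = centre_columns(raw)
t, p = nipals_first_component(X)

# SPE: squared distance from each row to its projection on the model
spe = [sum((x - ti * pj) ** 2 for x, pj in zip(row, p)) for row, ti in zip(X, t)]
# Hotelling's T2 with one component: squared score over the score variance
score_var = sum(ti * ti for ti in t) / (len(t) - 1)
t2 = [ti * ti / score_var for ti in t]
print(spe)
print(t2)
```

SPE measures how far an observation sits off the model plane, while T² measures how far it sits from the centre within the plane, which is why the two are usually read together.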

📈 Process Monitoring

  • Shewhart, CUSUM, and Holt-Winters control charts (regular and robust variants)
  • Process-capability index Cpk
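
For readers new to these terms, the sketch below computes Shewhart-style control limits and Cpk using only the standard library. It is a simplification and not the package's code: sigma is estimated with the plain sample standard deviation (individuals charts often use a moving-range estimate instead), and the function names, measurements, and specification limits are made up.

```python
from statistics import mean, stdev


def shewhart_limits(values, n_sigma=3.0):
    """Return (lower limit, centre line, upper limit) at centre +/- n_sigma * s."""
    centre = mean(values)
    s = stdev(values)
    return centre - n_sigma * s, centre, centre + n_sigma * s


def cpk(values, lsl, usl):
    """Process capability: distance to the nearer spec limit, in units of 3 * s."""
    m, s = mean(values), stdev(values)
    return min(usl - m, m - lsl) / (3 * s)


# Made-up measurements and specification limits
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
lcl, centre, ucl = shewhart_limits(data)
cpk_value = cpk(data, lsl=9.4, usl=10.6)
print(lcl, centre, ucl)
print(cpk_value)
```

The control limits come from the process data itself, while Cpk compares the process spread with externally imposed specification limits, so the two answer different questions.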

🔄 Batch Data Analysis

  • DTW-based batch alignment, reference-batch selection, resampling
  • 15+ batch feature extractors (mean, slope, area, elbow, rupture, crossings, robust variants, …)
  • Format conversions between wide, melted, and dict-of-frames batch layouts
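
The idea behind DTW-based alignment can be sketched in a few lines of standard-library Python. This is the textbook dynamic-programming recurrence for a single trajectory, not the package's alignment routine; dtw_distance and the two example trajectories are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


reference = [0, 1, 2, 3]
slow_batch = [0, 1, 1, 2, 3]  # same trajectory, but it lingers one step at 1
print(dtw_distance(reference, slow_batch))
print(dtw_distance([0, 0], [1, 1]))
```

A batch that follows the same trajectory at a different pace gets a distance of zero, which is the property that makes DTW useful for aligning batches of unequal duration.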

📐 Univariate & Robust Regression

  • t-tests (paired, independent, plus DataFrame-aware helpers)
  • ESD outliers, Sn estimator, MAD, normality tests, variance decomposition
  • Robust regression: repeated-median slope and friends, for outlier-resistant fits
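
To show why these estimators resist outliers, here is a standard-library sketch of the repeated-median slope and the (unscaled) MAD. The function names and data are illustrative assumptions and are not taken from process-improve.

```python
from statistics import median


def repeated_median_slope(x, y):
    """Median over points i of the median pairwise slope from i to every other point."""
    per_point = []
    for i in range(len(x)):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(len(x))
            if j != i and x[j] != x[i]
        ]
        per_point.append(median(slopes))
    return median(per_point)


def mad(values):
    """Median absolute deviation from the median (no consistency scaling)."""
    m = median(values)
    return median([abs(v - m) for v in values])


x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 3, 5, 7, 9, 11, 100]  # y = 2x + 1, except for a gross outlier at the end
print(repeated_median_slope(x, y))
print(mad(y))
```

Despite the outlier, the repeated-median slope recovers the underlying slope of 2, whereas an ordinary least-squares fit would be pulled strongly toward the last point.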

🎨 Visualization

  • Plotly-backed plots that attach to fitted PCA / PLS models
  • A backend-agnostic ChartSpec layer with Plotly and ECharts adapters
  • DOE-specific plots: main-effects, design visualization

Installation

pip install process-improve

Requires Python 3.10 or newer.

Quick start

PCA - Principal Component Analysis

import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)        # cumulative R² per component
pca.score_plot()                  # interactive Plotly plot

PLS - Projection to Latent Structures

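import pandas as pd
from process_improve.multivariate.methods import PLS, MCUVScaler

# X: predictor DataFrame; Y: response DataFrame with the same row index
# (both file names are placeholders)
X = pd.read_csv("your_data.csv", index_col=0)
Y = pd.read_csv("your_responses.csv", index_col=0)

X_s = MCUVScaler().fit_transform(X)
Y_s = MCUVScaler().fit_transform(Y)

pls = PLS(n_components=3).fit(X_s, Y_s)
result = pls.predict(X_s)
print(result.y_hat, result.spe, result.hotellings_t2)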

DOE - multi-stage experimental strategy

from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])

Longer, fully-worked versions of each example live in the Quickstart guide and the process_improve/notebooks_examples/ folder.

New to designed experiments? The Applied DoE tutorial is an eight-module worked-solution series that mirrors the 12-week DoE short course. It shows the same workflow end to end in Python with process-improve, from a first 2x2 design through factorial scaling, fractional factorials, blocking, and 1-D / 2-D response-surface optimization.

API design

PCA and PLS follow scikit-learn conventions: fit() returns self, fitted attributes end with a trailing underscore (scores_, loadings_, spe_, hotellings_t2_, r2_cumulative_, …), and predict() returns an sklearn.utils.Bunch with named fields (y_hat, spe, hotellings_t2, …). Inputs are accepted as pandas.DataFrame, and index/column labels are preserved through fit and transform.
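
The conventions described above can be illustrated with a toy estimator. This is a sketch only: ToyMeanModel is hypothetical and not part of process-improve, and types.SimpleNamespace stands in for sklearn.utils.Bunch so that the example needs nothing beyond the standard library.

```python
from types import SimpleNamespace


class ToyMeanModel:
    """Toy estimator that predicts the training mean for every observation."""

    def fit(self, y):
        # Learned state gets a trailing underscore, as in scikit-learn
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows Model().fit(...) chaining

    def predict(self, y):
        residuals = [v - self.mean_ for v in y]
        # Named fields instead of a bare tuple (stand-in for sklearn.utils.Bunch)
        return SimpleNamespace(
            y_hat=[self.mean_] * len(y),
            spe=[r * r for r in residuals],
        )


model = ToyMeanModel().fit([1.0, 2.0, 3.0])
result = model.predict([4.0, 2.0])
print(model.mean_)
print(result.y_hat, result.spe)
```

Returning named fields keeps call sites readable (result.spe rather than result[1]) and lets new diagnostics be added later without breaking existing code.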

Documentation & learning resources

Contributing

Bug reports and feature requests are welcome on the issue tracker.

License

MIT - see LICENSE for details.

Source Distribution

process_improve-1.16.5.tar.gz (3.5 MB view details)

Uploaded Source

Built Distribution

process_improve-1.16.5-py3-none-any.whl (3.6 MB view details)

Uploaded Python 3

File details

Details for the file process_improve-1.16.5.tar.gz.

File metadata

  • Download URL: process_improve-1.16.5.tar.gz
  • Upload date:
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for process_improve-1.16.5.tar.gz
Algorithm Hash digest
SHA256 79a73e568c1c09309f79dc6a18f1174ad9f54f210aa7745ea032a1f4144f1067
MD5 edf77d389e29c758d76cd48caab83b76
BLAKE2b-256 1c4c6b1fd02e094eb7d4794209235b0ce408e4eb8e47f84d666b733f067cbbd1

Provenance

The following attestation bundles were made for process_improve-1.16.5.tar.gz:

Publisher: publish.yml on kgdunn/process-improve

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file process_improve-1.16.5-py3-none-any.whl.

File metadata

File hashes

Hashes for process_improve-1.16.5-py3-none-any.whl
Algorithm Hash digest
SHA256 0c67cd34f12a92ab35dea7e4f8177210b811586d8673e5f61a71ddb03a25c008
MD5 c173a27b6e1d467320349231f071e57a
BLAKE2b-256 d123b8f385bed5e8827c4baa8a4977026d002ca6d6953d9f2f0fecce266c4362

Provenance

The following attestation bundles were made for process_improve-1.16.5-py3-none-any.whl:

Publisher: publish.yml on kgdunn/process-improve
