Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.
Process Improvement using Data
A pragmatic Python toolkit for industrial process data - multivariate analysis, designed experiments, and process monitoring, in one place.
What is this?
process-improve is the companion package to the online textbook
Process Improvement using Data, and powers the
statistical engine behind factori.al. It bundles the
methods practitioners actually reach for on real plant and lab data -
PCA / PLS with proper missing-data handling, designed experiments with a
multi-stage strategy recommender, control charts, and batch-data tooling -
behind an API that is sklearn-compatible where it makes sense and
pandas-native throughout.
Highlights
🧪 Designed Experiments
- Full-factorial, fractional-factorial, and response-surface designs (built on pyDOE3)
- A DOE strategy recommender that plans a complete multi-stage program (screening → optimization → confirmation) from ~50 deterministic rules, with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other domains
- ANOVA, main-effects plots, linear-model fitting, and response optimization
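To make the factorial and main-effects ideas above concrete, here is a minimal standard-library sketch. It does not use this package or pyDOE3; the helper names `full_factorial` and `main_effect` are hypothetical, and the response values are made up from a known linear model purely for illustration.

```python
from itertools import product


def full_factorial(factor_names):
    """Return all 2^k runs in coded units (-1 = low level, +1 = high level)."""
    return [
        dict(zip(factor_names, levels))
        for levels in product((-1, 1), repeat=len(factor_names))
    ]


def main_effect(design, response, factor):
    """Average response at the high level minus the average at the low level."""
    high = [y for run, y in zip(design, response) if run[factor] == 1]
    low = [y for run, y in zip(design, response) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


design = full_factorial(["T", "pH"])

# Illustrative (invented) responses generated from y = 10 + 3*T - 1*pH
response = [10 + 3 * run["T"] - 1 * run["pH"] for run in design]

effects = {name: main_effect(design, response, name) for name in ("T", "pH")}
print(effects)
```

In coded units a main effect is twice the corresponding linear-model coefficient, because moving from -1 to +1 spans two coded units.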
📊 Latent Variable Methods
- PCA with SVD and NIPALS algorithms, plus missing-data handling via Trimmed Score Regression
- PLS regression with a fully sklearn-compatible API
- TPLS - PLS for T-shaped data structures
- Diagnostics: Hotelling's T², SPE, score contributions, and ESD-based outlier detection
- Component selection via PRESS / Wold's criterion
- Interactive Plotly score, loading, SPE, and T² plots, bound directly to fitted models
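As a rough illustration of what the NIPALS algorithm and the SPE diagnostic listed above compute, the sketch below extracts a single principal component from complete, mean-centered data using only the standard library. It is not the package's implementation: the function name `nipals_first_component` is hypothetical, the data are invented, and missing-data handling (Trimmed Score Regression) is deliberately left out.

```python
import math


def nipals_first_component(X, tol=1e-12, max_iter=500):
    """Extract one PCA component from column-centered X (a list of rows)."""
    n_rows, n_cols = len(X), len(X[0])
    # Start the score vector from the column with the largest sum of squares
    col_ss = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    start = col_ss.index(max(col_ss))
    t = [row[start] for row in X]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Regress every column of X onto t to get loadings, then normalise
        p = [sum(X[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Regress every row of X onto p to get the updated scores
        t_new = [sum(X[i][j] * p[j] for j in range(n_cols)) for i in range(n_rows)]
        converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
        t = t_new
        if converged:
            break
    return t, p


# Invented, nearly collinear data: second column is roughly 2x the first
raw = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
means = [sum(col) / len(col) for col in zip(*raw)]
X = [[v - m for v, m in zip(row, means)] for row in raw]

scores, loadings = nipals_first_component(X)

# SPE: squared residual per row after removing the one-component model
spe = [
    sum((x - t * p) ** 2 for x, p in zip(row, loadings))
    for row, t in zip(X, scores)
]
total_ss = sum(v * v for row in X for v in row)
r2 = 1 - sum(spe) / total_ss
print(loadings, r2)
```

Further components would be found by repeating the same iteration on the residual matrix (deflation).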
📈 Process Monitoring
- Shewhart, CUSUM, and Holt-Winters control charts (regular and robust variants)
- Process-capability index Cpk
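For readers new to these terms, a minimal sketch of Shewhart limits and Cpk follows. It uses only the standard library and textbook formulas; the function names, the data, and the specification limits are all invented, and the sample standard deviation is used as the spread estimate (an assumption: production charts often estimate sigma from moving ranges or robust statistics instead).

```python
import statistics


def shewhart_limits(values, n_sigma=3.0):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    center = statistics.fmean(values)
    sigma = statistics.stdev(values)  # simplifying assumption, see lead-in
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def cpk(values, lsl, usl):
    """Distance from the mean to the nearest spec limit, in units of 3 sigma."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Invented in-control measurements and specification limits
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
lcl, center, ucl = shewhart_limits(measurements)
capability = cpk(measurements, lsl=9.5, usl=10.5)
print(lcl, center, ucl, capability)
```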
🔄 Batch Data Analysis
- DTW-based batch alignment, reference-batch selection, resampling
- 15+ batch feature extractors (mean, slope, area, elbow, rupture, crossings, robust variants, …)
- Format conversions between wide, melted, and dict-of-frames batch layouts
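The idea behind DTW-based alignment can be sketched with the classic dynamic-programming recurrence. This is a textbook version written with the standard library only, not this package's alignment routine; `dtw_distance` and the two trajectories are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Invented batch trajectories: `slow_batch` is a time-stretched copy of `reference`
reference = [0, 1, 2, 3, 2, 1, 0]
slow_batch = [0, 0, 1, 1, 2, 3, 3, 2, 1, 0]
offset_batch = [v + 1 for v in reference]

print(dtw_distance(reference, slow_batch))
print(dtw_distance(reference, offset_batch))
```

Because warping absorbs differences in timing but not in magnitude, the stretched batch aligns at zero cost while the offset batch does not.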
📐 Univariate & Robust Regression
- t-tests (paired, independent, plus DataFrame-aware helpers)
- ESD outliers, Sn estimator, MAD, normality tests, variance decomposition
- Robust regression: repeated-median slope and friends, for outlier-resistant fits
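To show why the robust estimators above matter, here is a small standard-library sketch of a repeated-median slope and a scaled MAD, compared against an ordinary least-squares slope. The function names and data are hypothetical and the package's own routines may differ in detail; the 1.4826 factor is the usual constant that makes the MAD consistent with the standard deviation for normal data.

```python
from statistics import fmean, median


def repeated_median_slope(x, y):
    """Median over points of the median pairwise slope through each point."""
    per_point = []
    for i in range(len(x)):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(len(x))
            if j != i and x[j] != x[i]
        ]
        per_point.append(median(slopes))
    return median(per_point)


def ols_slope(x, y):
    """Ordinary least-squares slope, shown only for contrast."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def mad(values, scale=1.4826):
    """Median absolute deviation from the median, scaled for normal data."""
    center = median(values)
    return scale * median([abs(v - center) for v in values])


# Invented data: y = 2x + 1, with the last point replaced by a gross outlier
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 3, 5, 7, 9, 11, 50]

robust = repeated_median_slope(x, y)
classical = ols_slope(x, y)
spread = mad([1, 2, 3, 4, 100])
print(robust, classical, spread)
```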
🎨 Visualization
- Plotly-backed plots that attach to fitted PCA / PLS models
- A backend-agnostic ChartSpec layer with Plotly and ECharts adapters
- DOE-specific plots: main-effects, design visualization
Installation
pip install process-improve
Requires Python 3.10 or newer.
Quick start
PCA - Principal Component Analysis
import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler
X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)
pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_) # cumulative R² per component
pca.score_plot() # interactive Plotly plot
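The scaling step in the example above can be pictured with a small standard-library sketch. It assumes, from the name, that `MCUVScaler` mean-centers each column and scales it to unit variance; the helper `mcuv_scale`, the use of the sample standard deviation, and the data are illustrative assumptions rather than the package's actual code.

```python
import statistics


def mcuv_scale(columns):
    """Mean-center and unit-variance scale each column.

    `columns` maps a column name to a list of floats.
    Returns (scaled columns, centers, scales).
    """
    scaled, centers, scales = {}, {}, {}
    for name, values in columns.items():
        centers[name] = statistics.fmean(values)
        scales[name] = statistics.stdev(values)  # assumption: sample std (ddof=1)
        scaled[name] = [(v - centers[name]) / scales[name] for v in values]
    return scaled, centers, scales


# Invented process data with very different units and magnitudes
data = {"temperature": [20.0, 22.0, 24.0, 26.0], "pressure": [1.0, 1.5, 1.0, 1.5]}
scaled, centers, scales = mcuv_scale(data)
print(centers, scales)
```

Scaling first keeps a variable with large numeric values from dominating the components simply because of its units.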
PLS - Projection to Latent Structures
from process_improve.multivariate.methods import PLS, MCUVScaler
# X and Y are pandas DataFrames with matching rows (X as loaded above; Y holds the responses)
X_s = MCUVScaler().fit_transform(X)
Y_s = MCUVScaler().fit_transform(Y)
pls = PLS(n_components=3).fit(X_s, Y_s)
result = pls.predict(X_s)
print(result.y_hat, result.spe, result.hotellings_t2)
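For orientation on the `hotellings_t2` field printed above, here is a sketch of the textbook Hotelling's T² calculation from a score matrix, using only the standard library. The function name and score values are invented, and it assumes mean-centered scores with variances estimated from the same rows; the package may compute its value differently.

```python
import statistics


def hotellings_t2(scores):
    """Sum over components of (score squared / score variance), one value per row.

    `scores` is a list of rows; each row holds one score per component.
    """
    variances = [statistics.variance(column) for column in zip(*scores)]
    return [
        sum(t * t / s2 for t, s2 in zip(row, variances))
        for row in scores
    ]


# Invented, mean-centered scores for 5 observations and 2 components
scores = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
t2 = hotellings_t2(scores)
print(t2)
```

Rows far from the model center in the score space get large T² values, while SPE captures distance off the model plane, so the two diagnostics are complementary.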
DOE - multi-stage experimental strategy
from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy
factors = [
Factor(name="Temperature", low=25, high=40, units="degC"),
Factor(name="pH", low=5.0, high=7.5),
Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
factors=factors,
responses=[Response(name="Yield", goal="maximize", units="g/L")],
budget=40,
domain="fermentation",
)
for s in strategy["stages"]:
print(s["stage_number"], s["design_type"], s["estimated_runs"])
Longer, fully-worked versions of each example live in the
Quickstart guide
and the notebooks_examples/ folder.
API design
PCA and PLS follow scikit-learn conventions: fit() returns self, fitted
attributes end with a trailing underscore (scores_, loadings_, spe_,
hotellings_t2_, r2_cumulative_, …), and predict() returns an
sklearn.utils.Bunch with named fields (y_hat, spe, hotellings_t2, …).
Inputs are accepted as pandas.DataFrame, and index/column labels are
preserved through fit and transform.
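The conventions described above can be illustrated with a toy estimator. This is a hypothetical class, not part of the package, and it uses `types.SimpleNamespace` as a dependency-free stand-in for `sklearn.utils.Bunch`.

```python
import statistics
from types import SimpleNamespace


class ToyMeanModel:
    """Hypothetical estimator showing the conventions, not a real model."""

    def fit(self, values):
        # Fitted attributes carry a trailing underscore
        self.center_ = statistics.fmean(values)
        self.scale_ = statistics.stdev(values)
        return self  # returning self allows Model().fit(...) chaining

    def predict(self, values):
        # Return several named results in one container
        # (SimpleNamespace stands in for sklearn.utils.Bunch here)
        return SimpleNamespace(
            y_hat=[self.center_] * len(values),
            z_score=[(v - self.center_) / self.scale_ for v in values],
        )


model = ToyMeanModel()
fitted = model.fit([1.0, 2.0, 3.0])
result = fitted.predict([2.0, 4.0])
print(result.y_hat, result.z_score)
```

Returning a container with named fields means callers can pick out `result.y_hat` without remembering the position of each output.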
Documentation & learning resources
- API reference & user guide: https://kgdunn.github.io/process-improve/
- Companion textbook: Process Improvement using Data
- Hosted experiment-design tool: factori.al
- Local docs build: cd docs && make html
Contributing
Bug reports and feature requests are welcome on the issue tracker.
License
MIT - see LICENSE for details.