Process Improvement using Data
A Python package for multivariate data analysis, designed experiments, and process monitoring. Companion to the online textbook Process Improvement using Data. This package also powers the statistical engine behind factori.al.
Installation
```shell
pip install process-improve
```
Quick Start
PCA — Principal Component Analysis
```python
import pandas as pd

from process_improve.multivariate.methods import PCA, MCUVScaler

# Load and scale your data
X = pd.read_csv("your_data.csv", index_col=0)
scaler = MCUVScaler().fit(X)
X_scaled = scaler.transform(X)

# Fit a PCA model
pca = PCA(n_components=3).fit(X_scaled)

# Inspect results
print(pca.scores_)         # Score matrix (N x A)
print(pca.loadings_)       # Loading matrix (K x A)
print(pca.r2_cumulative_)  # Cumulative R² per component

# Detect outliers
outliers = pca.detect_outliers(conf_level=0.95)

# Contribution analysis for a single observation
contrib = pca.score_contributions(pca.scores_.iloc[0].values)

# Select the number of components via cross-validation
result = PCA.select_n_components(X_scaled, max_components=10)
print(result.n_components)

# Built-in plots
pca.score_plot()
pca.spe_plot()
pca.t2_plot()
pca.loading_plot()
```
PLS — Projection to Latent Structures
```python
from process_improve.multivariate.methods import PLS, MCUVScaler

# X, Y: training data as pandas DataFrames; scale X and Y separately
scaler_x = MCUVScaler().fit(X)
scaler_y = MCUVScaler().fit(Y)

# Fit a PLS model
pls = PLS(n_components=3).fit(scaler_x.transform(X), scaler_y.transform(Y))

# Inspect results
print(pls.scores_)             # X scores (N x A)
print(pls.beta_coefficients_)  # Regression coefficients (K x M)
print(pls.r2_cumulative_)      # Cumulative R² for Y

# Predict new observations (X_new: new data with the same K columns as X)
result = pls.predict(scaler_x.transform(X_new))
print(result.y_hat)            # Predicted Y values
print(result.spe)              # SPE for new data
print(result.hotellings_t2)    # Hotelling's T² for new data

# Detect outliers and analyze contributions
outliers = pls.detect_outliers(conf_level=0.95)
contrib = pls.score_contributions(pls.scores_.iloc[0].values)
```
DOE — Experimental Strategy Recommendation
Plan a complete multi-stage experimental program before running any experiments:
```python
from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

# Define factors for a fermentation optimization
factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
    Factor(name="Yeast extract", low=1, high=10, units="g/L"),
    Factor(name="Agitation", low=100, high=400, units="rpm"),
    Factor(name="Aeration", low=0.5, high=2.0, units="vvm"),
    Factor(name="Inoculum", low=2, high=10, units="%v/v"),
]
responses = [Response(name="Yield", goal="maximize", units="g/L")]

# Get a complete experimental plan
strategy = recommend_strategy(
    factors=factors,
    responses=responses,
    budget=40,
    domain="fermentation",
)

# Inspect the multi-stage strategy
for stage in strategy["stages"]:
    print(f"Stage {stage['stage_number']}: {stage['stage_name']}")
    print(f"  Design: {stage['design_type']}, Runs: {stage['estimated_runs']}")
    print(f"  Purpose: {stage['purpose']}")

# Review reasoning, risks, and alternatives
print(strategy["budget_allocation"])
print(strategy["reasoning"])
```
The engine applies ~50 deterministic rules (from Montgomery, NIST, Stat-Ease) to recommend screening, optimization, and confirmation stages — with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other application domains.
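To make the idea of a deterministic, budget-aware rule concrete, here is a toy sketch of how one such allocation rule might look. The function name, thresholds, and split fractions below are invented for illustration; they are not the package's actual internals.

```python
def allocate_budget(n_factors: int, budget: int) -> dict:
    """Toy budget-allocation rule (illustrative only): split a run budget
    across screening, optimization, and confirmation stages."""
    if n_factors >= 6:
        # Many factors: spend a larger share of the budget on screening,
        # but never fewer runs than factors + 1
        screening = max(n_factors + 1, round(0.4 * budget))
    else:
        screening = round(0.25 * budget)
    confirmation = max(3, round(0.1 * budget))  # always reserve confirmation runs
    optimization = budget - screening - confirmation
    return {
        "screening": screening,
        "optimization": optimization,
        "confirmation": confirmation,
    }

plan = allocate_budget(n_factors=7, budget=40)
```

A real engine layers many such rules and resolves conflicts deterministically, which is what makes the resulting strategy reproducible and auditable.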
Features
- PCA with SVD, NIPALS, and missing data (TSR) algorithms
- PLS regression with sklearn-compatible API
- TPLS (Total PLS) for multi-block data
- Missing data handling via TSR and NIPALS algorithms
- Outlier detection combining Hotelling's T² and SPE with robust ESD test
- Score contributions for variable-level diagnostics
- Cross-validation for component selection (PRESS with Wold's criterion)
- Interactive plots (Plotly) for scores, loadings, SPE, and T²
- Designed experiments — full factorial, fractional factorial, response surface
- DOE strategy recommender — multi-stage experimental planning (screening, optimization, confirmation) with budget-aware allocation and 8 application domains
- Process monitoring — Shewhart, CUSUM, EWMA control charts
- Batch data analysis — alignment, feature extraction, multivariate batch monitoring
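For the monitoring charts, the EWMA statistic follows the standard recursion z_t = λ·x_t + (1−λ)·z_{t−1}, with time-varying control limits. The sketch below is a minimal NumPy illustration of that textbook formula, not the package's API:

```python
import numpy as np

def ewma_chart(x, lam=0.2, L=3.0):
    """EWMA control chart: z_t = lam*x_t + (1-lam)*z_{t-1}, with limits
    mu +/- L*sigma*sqrt(lam/(2-lam) * (1 - (1-lam)^(2t)))."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    z = np.empty_like(x)
    prev = mu  # start the recursion at the target (the phase-I mean)
    for t, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev
        z[t] = prev
    t = np.arange(1, x.size + 1)
    half_width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    return z, mu - half_width, mu + half_width

rng = np.random.default_rng(0)
x = rng.normal(size=100)
z, lcl, ucl = ewma_chart(x)
out_of_control = np.flatnonzero((z < lcl) | (z > ucl))
```

Small λ gives a long memory (sensitive to small sustained shifts); λ = 1 reduces the chart to a Shewhart chart of individuals.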
API Design
Both PCA and PLS follow sklearn conventions:
- Fitted attributes end with `_` (e.g., `scores_`, `loadings_`, `spe_`)
- `fit()` returns `self`
- `predict()` returns a `Bunch` object with named fields
- `score()` is compatible with `sklearn.model_selection.cross_val_score`
- Works with `pandas.DataFrame` inputs (preserves index and column names)
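The same conventions, shown in a bare-bones skeleton. This is a generic illustration of the sklearn-style pattern (trailing-underscore fitted attributes, `fit()` returning `self`, a Bunch-like result object), not the package's source:

```python
from types import SimpleNamespace

import numpy as np

class TinyEstimator:
    """Minimal sklearn-style estimator illustrating the conventions."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)  # fitted attribute -> trailing "_"
        return self                  # enables TinyEstimator().fit(X)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        resid = X - self.mean_
        # Bunch-like result: fields accessed by name, e.g. result.spe
        return SimpleNamespace(
            y_hat=np.broadcast_to(self.mean_, X.shape).copy(),
            spe=(resid ** 2).sum(axis=1),
        )

model = TinyEstimator().fit([[1.0, 2.0], [3.0, 4.0]])
result = model.predict([[2.0, 3.0]])
```

Because `fit()` returns `self`, construction, fitting, and inspection chain naturally, and any fitted state is discoverable by its trailing underscore.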
Documentation
Full documentation is available at https://kgdunn.github.io/process-improve/.
To build the documentation locally:
```shell
cd docs
make html
```
License
MIT License. See LICENSE for details.