Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.

Process Improvement using Data

A pragmatic Python toolkit for industrial process data - multivariate analysis, designed experiments, and process monitoring, in one place.

What is this?

process-improve is the companion package to the online textbook Process Improvement using Data, and powers the statistical engine behind factori.al. It bundles the methods practitioners actually reach for on real plant and lab data - PCA / PLS with proper missing-data handling, designed experiments with a multi-stage strategy recommender, control charts, and batch-data tooling - behind an API that is sklearn-compatible where it makes sense and pandas-native throughout.

Highlights

🧪 Designed Experiments

  • Full-factorial, fractional-factorial, and response-surface designs (built on pyDOE3)
  • A DOE strategy recommender that plans a complete multi-stage program - screening → optimization → confirmation - from ~50 deterministic rules, with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other domains
  • ANOVA, main-effects plots, linear-model fitting, and response optimization
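
As a rough illustration of what a two-level full-factorial design and a main effect are, here is a minimal standard-library sketch. It does not use process-improve or pyDOE3; the helper names `full_factorial` and `main_effect` and the response values are hypothetical, chosen only for this example.

```python
from itertools import product


def full_factorial(factor_names):
    """Return every run of a two-level full factorial in coded units (-1, +1)."""
    levels = (-1, +1)
    return [
        dict(zip(factor_names, combo))
        for combo in product(levels, repeat=len(factor_names))
    ]


def main_effect(design, response, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, response) if run[factor] == +1]
    low = [y for run, y in zip(design, response) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Two factors give 2**2 = 4 runs.
design = full_factorial(["T", "pH"])

# Made-up responses, one per run, in the order the runs were generated.
y = [60.0, 72.0, 54.0, 68.0]

effect_T = main_effect(design, y, "T")
effect_pH = main_effect(design, y, "pH")
print(design)
print(effect_T, effect_pH)
```

A fractional factorial runs only a chosen subset of these rows, trading some interaction information for fewer experiments.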

📊 Latent Variable Methods

  • PCA with SVD and NIPALS algorithms, plus missing-data handling via Trimmed Score Regression
  • PLS regression with a fully sklearn-compatible API
  • TPLS - PLS for T-shaped data structures
  • Diagnostics: Hotelling's T², SPE, score contributions, VIP, and ESD-based outlier detection
  • Cross-validation: PRESS / Wold's component selection, PLS RMSECV with validated explained variance, and PLS beta-coefficient error bars
  • Interactive Plotly score, loading, SPE, and T² plots, bound directly to fitted models
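
To make the two main diagnostics concrete, the sketch below computes Hotelling's T² and SPE from a plain NumPy SVD on random data. It is an assumption-laden illustration, not the package's implementation: the variable names are invented here, SPE is taken as the row sum of squared residuals (some texts report its square root), and no confidence limits are computed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # synthetic data: 50 observations, 5 variables

# Mean-centre and scale each column to unit variance.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

A = 2  # number of components retained
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:A].T  # loadings, K x A
T = Xs @ P    # scores, N x A

# Hotelling's T²: squared scores, each scaled by that component's score variance.
score_var = T.var(axis=0, ddof=1)
hotellings_t2 = ((T ** 2) / score_var).sum(axis=1)

# SPE: squared residual left after reconstructing each row from A components.
E = Xs - T @ P.T
spe = (E ** 2).sum(axis=1)

print(hotellings_t2[:3], spe[:3])
```

T² measures how far an observation sits from the centre within the model plane, while SPE measures how far it sits off that plane.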

📈 Process Monitoring

  • Shewhart, CUSUM, and Holt-Winters control charts (regular and robust variants)
  • Process-capability index Cpk
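
For readers new to these terms, here is a minimal standard-library sketch of Shewhart-style control limits and Cpk. It assumes the simplest textbook definitions (limits at the mean plus or minus three sample standard deviations, and Cpk as the distance to the nearer specification limit divided by three standard deviations). The function names and data are hypothetical, and the package's own charts may estimate the spread differently, for example with robust estimators.

```python
from statistics import mean, stdev


def shewhart_limits(values, n_sigma=3.0):
    """Return (lower limit, centre line, upper limit) from in-control data."""
    centre = mean(values)
    spread = stdev(values)
    return centre - n_sigma * spread, centre, centre + n_sigma * spread


def cpk(values, lower_spec, upper_spec):
    """Distance from the mean to the nearer spec limit, in units of 3 sigma."""
    centre, spread = mean(values), stdev(values)
    return min(upper_spec - centre, centre - lower_spec) / (3.0 * spread)


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]  # made-up measurements
lcl, target, ucl = shewhart_limits(data)
capability = cpk(data, lower_spec=9.0, upper_spec=11.0)
print(lcl, target, ucl, capability)
```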

🔄 Batch Data Analysis

  • DTW-based batch alignment, reference-batch selection, resampling
  • 15+ batch feature extractors (mean, slope, area, elbow, rupture, crossings, robust variants, …)
  • Format conversions between wide, melted, and dict-of-frames batch layouts
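
The idea behind DTW-based alignment can be sketched in a few lines of plain Python. This is the classic textbook dynamic-programming recurrence on one-dimensional sequences, not the package's alignment routine; `dtw_distance` and the two trajectories are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The second batch follows the same trajectory but lingers at some points.
reference = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
slow_batch = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0]
print(dtw_distance(reference, slow_batch))
```

Because warping lets one sample in the reference match several samples in the slower batch, two batches of different duration can be compared without first truncating or padding them.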

📐 Univariate & Robust Regression

  • t-tests (paired, independent, plus DataFrame-aware helpers)
  • ESD outliers, Sn estimator, MAD, normality tests, variance decomposition
  • Robust regression: repeated-median slope and friends, for outlier-resistant fits
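
As a sketch of why these robust estimators matter, the standard-library example below computes an unscaled MAD and a repeated-median slope (Siegel's estimator) on data with one gross outlier. The function names and data are hypothetical and do not reflect the package's signatures.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def repeated_median_slope(x, y):
    """Median over i of the median over j != i of the pairwise slopes."""
    inner = []
    for i in range(len(x)):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(len(x))
            if j != i and x[j] != x[i]
        ]
        inner.append(median(slopes))
    return median(inner)


x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]  # last point is a gross outlier

slope = repeated_median_slope(x, y)
print(slope, mad(y))
```

The first five points lie exactly on a line of slope 2, and the repeated-median estimate recovers that slope despite the outlier, whereas an ordinary least-squares fit would be pulled strongly upward.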

🎨 Visualization

  • Plotly-backed plots that attach to fitted PCA / PLS models
  • A backend-agnostic ChartSpec layer with Plotly and ECharts adapters
  • DOE-specific plots: main-effects, design visualization

Installation

pip install process-improve

Requires Python 3.10 or newer.

Quick start

PCA - Principal Component Analysis

import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)        # cumulative R² per component
pca.score_plot()                  # interactive Plotly plot

PLS - Projection to Latent Structures

from process_improve.multivariate.methods import PLS, MCUVScaler

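# X and Y (training data) and X_new (new observations) are assumed to be
# pandas DataFrames that you have already loaded.
# Scale X and Y separately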
scaler_x = MCUVScaler().fit(X)
scaler_y = MCUVScaler().fit(Y)
X_s, Y_s = scaler_x.transform(X), scaler_y.transform(Y)

# Fit a PLS model
pls = PLS(n_components=3).fit(X_s, Y_s)

# Inspect results
print(pls.scores_)  # X scores (N x A)
print(pls.beta_coefficients_)  # Regression coefficients (K x M)
print(pls.r2_cumulative_)  # Cumulative R² for Y

# Predict new observations
result = pls.predict(scaler_x.transform(X_new))
print(result.y_hat)  # Predicted Y values
print(result.spe)  # SPE for new data
print(result.hotellings_t2)  # Hotelling's T² for new data

# Detect outliers and analyze contributions
outliers = pls.detect_outliers(conf_level=0.95)
contrib = pls.score_contributions(pls.scores_.iloc[0].values)

# Variable importance
print(pls.vip())  # VIP scores per X variable

# Cross-validated component selection
cv_select = PLS.select_n_components(X_s, Y_s, max_components=6)
print(cv_select.n_components)  # Recommended number of components
print(cv_select.rmsecv)        # RMSECV per component count

# Cross-validation with beta coefficient error bars
cv = pls.cross_validate(X_s, Y_s, cv="loo")
print(cv.beta_mean)       # Mean beta across LOO resamples
print(cv.beta_ci_lower)   # Lower 95% CI for each beta
print(cv.beta_ci_upper)   # Upper 95% CI for each beta
print(cv.significant)     # Which betas are significantly != 0
print(cv.q_squared)       # Cross-validated R² (Q²)
print(cv.rmse_cv)         # Cross-validated RMSE per Y variable
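
For orientation, the cross-validated quantities printed above are conventionally defined as follows for a single Y variable with N observations, where \hat{y}_{(i)} is the prediction for observation i from a model fitted without it. These are the standard textbook definitions; the exact normalisation used by the package is an assumption here.

```latex
\mathrm{PRESS} = \sum_{i=1}^{N} \left( y_i - \hat{y}_{(i)} \right)^2
\qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}
\qquad
\mathrm{RMSECV} = \sqrt{\frac{\mathrm{PRESS}}{N}}
```

Q² has the same form as R² but uses held-out predictions, so it typically stops rising, or falls, once extra components begin to fit noise.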

DOE - multi-stage experimental strategy

from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])

Longer, fully-worked versions of each example live in the Quickstart guide and the process_improve/notebooks_examples/ folder.

New to designed experiments? The Applied DoE tutorial is an eight-module worked-solution series that mirrors the 12-week DoE short course and shows the same workflow in Python with process-improve end to end - from a 2x2 first design through factorial scaling, fractional factorials, blocking, and 1-D / 2-D response-surface optimization.

API design

PCA and PLS follow scikit-learn conventions: fit() returns self, fitted attributes end with a trailing underscore (scores_, loadings_, spe_, hotellings_t2_, r2_cumulative_, …), and predict() returns an sklearn.utils.Bunch with named fields (y_hat, spe, hotellings_t2, …). Inputs are accepted as pandas.DataFrame, and index/column labels are preserved through fit and transform.
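
The conventions described above can be sketched with a toy estimator. `TinyPCA` is hypothetical and is not the package's PCA class. It works on NumPy arrays, so it does not preserve pandas labels, and `types.SimpleNamespace` stands in for `sklearn.utils.Bunch` to keep the example dependency-light.

```python
from types import SimpleNamespace

import numpy as np


class TinyPCA:
    """Hypothetical toy estimator illustrating scikit-learn-style conventions."""

    def __init__(self, n_components=2):
        # Constructor only stores hyperparameters; no computation happens here.
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Fitted attributes carry a trailing underscore.
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.loadings_ = Vt[: self.n_components].T
        self.scores_ = (X - self.mean_) @ self.loadings_
        return self  # returning self allows TinyPCA(...).fit(X) chaining

    def predict(self, X_new):
        centred = np.asarray(X_new, dtype=float) - self.mean_
        scores = centred @ self.loadings_
        residual = centred - scores @ self.loadings_.T
        # Named fields on the result, in the spirit of sklearn.utils.Bunch.
        return SimpleNamespace(scores=scores, spe=(residual ** 2).sum(axis=1))


X = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 6.0, 8.9], [4.0, 8.0, 12.0]])
model = TinyPCA(n_components=1).fit(X)
result = model.predict(X)
print(model.scores_.ravel(), result.spe)
```

Returning `self` from `fit()` is what makes the one-line `PCA(n_components=3).fit(X_scaled)` pattern in the quick start possible.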

Contributing

Bug reports and feature requests are welcome on the issue tracker.

License

MIT - see LICENSE for details.
