Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.

Process Improvement using Data

A pragmatic Python toolkit for industrial process data - multivariate analysis, designed experiments, and process monitoring, in one place.

What is this?

process-improve is the companion package to the online textbook Process Improvement using Data, and powers the statistical engine behind factori.al. It bundles the methods practitioners actually reach for on real plant and lab data - PCA / PLS with proper missing-data handling, designed experiments with a multi-stage strategy recommender, control charts, and batch-data tooling - behind an API that is sklearn-compatible where it makes sense and pandas-native throughout.

Highlights

🧪 Designed Experiments

  • Full-factorial, fractional-factorial, and response-surface designs (built on pyDOE3)
  • A DOE strategy recommender that plans a complete multi-stage program - screening → optimization → confirmation - from ~50 deterministic rules, with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other domains
  • ANOVA, main-effects plots, linear-model fitting, and response optimization
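
As a rough illustration of what a two-level full-factorial design and a main-effects calculation involve, here is a minimal standard-library sketch. It does not use process-improve or pyDOE3; the helper names `full_factorial` and `main_effect`, the factor labels, and the response values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def full_factorial(factor_names):
    """Return all 2^k runs in coded units (-1 = low level, +1 = high level)."""
    return [
        dict(zip(factor_names, levels))
        for levels in product((-1, 1), repeat=len(factor_names))
    ]


def main_effect(design, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, responses) if run[factor] == 1]
    low = [y for run, y in zip(design, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


design = full_factorial(["T", "pH", "G"])

# Made-up, noise-free responses from y = 50 + 4*T - 2*pH (G has no effect).
responses = [50 + 4 * run["T"] - 2 * run["pH"] for run in design]

effects = {name: main_effect(design, responses, name) for name in ("T", "pH", "G")}
print(len(design), effects)
```

In coded units a main effect is twice the corresponding linear-model coefficient, which is why the sketch recovers 8 and -4 for coefficients of 4 and -2.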

📊 Latent Variable Methods

  • PCA with SVD and NIPALS algorithms, plus missing-data handling via Trimmed Score Regression
  • PLS regression with a fully sklearn-compatible API
  • TPLS - PLS for T-shaped data structures
  • Diagnostics: Hotelling's T², SPE, score contributions, and ESD-based outlier detection
  • Component selection via PRESS / Wold's criterion
  • Interactive Plotly score, loading, SPE, and T² plots, bound directly to fitted models
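
To make the two main diagnostics concrete, the sketch below computes Hotelling's T² and SPE for a plain SVD-based PCA using NumPy only. It is an illustration under stated assumptions, not the package's implementation: the data are random, SPE is taken here as the row sum of squared residuals (some packages report its square root), and T² is computed from scores scaled by their sample standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                       # illustrative random data
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # centre and scale columns

A = 2                                              # number of components kept
U, S, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:A].T                                       # loadings, shape (4, A)
T = X @ P                                          # scores, shape (20, A)
E = X - T @ P.T                                    # residuals after A components

# SPE: squared distance of each row from the model plane.
spe = (E ** 2).sum(axis=1)

# Hotelling's T2: squared distance from the origin within the model plane,
# with each score column scaled to unit variance.
t2 = ((T / T.std(axis=0, ddof=1)) ** 2).sum(axis=1)

print(spe.round(3))
print(t2.round(3))
```

A row with a large SPE is poorly described by the model, while a row with a large T² is well described but unusually far from the centre of the data.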

📈 Process Monitoring

  • Shewhart, CUSUM, and Holt-Winters control charts (regular and robust variants)
  • Process-capability index Cpk
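
The arithmetic behind a basic Shewhart chart and the Cpk index can be sketched with the standard library alone. The function names, data, and specification limits below are hypothetical. The sketch uses the plain sample standard deviation, whereas production charts often estimate sigma from subgroup ranges or moving ranges, and it makes no claim about how process-improve computes these quantities.

```python
from statistics import mean, stdev


def shewhart_limits(values, n_sigma=3.0):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    centre = mean(values)
    sigma = stdev(values)          # simple estimate; see the note above
    return centre - n_sigma * sigma, centre, centre + n_sigma * sigma


def cpk(values, lsl, usl):
    """Capability index: distance to the nearest spec limit in units of 3 sigma."""
    mu, sigma = mean(values), stdev(values)
    return min(usl - mu, mu - lsl) / (3 * sigma)


values = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]   # made-up measurements
lcl, centre, ucl = shewhart_limits(values)
capability = cpk(values, lsl=9.4, usl=10.6)   # made-up specification limits
print(lcl, centre, ucl, capability)
```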

🔄 Batch Data Analysis

  • DTW-based batch alignment, reference-batch selection, resampling
  • 15+ batch feature extractors (mean, slope, area, elbow, rupture, crossings, robust variants, …)
  • Format conversions between wide, melted, and dict-of-frames batch layouts
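
For readers unfamiliar with dynamic time warping (DTW), the following is a minimal textbook sketch of the DTW distance between two univariate trajectories, written with the standard library only. The function name `dtw_distance` is hypothetical, and the package's batch alignment may differ (for example in step constraints or multivariate weighting).

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


reference = [0, 1, 2, 3]
stretched = [0, 1, 1, 2, 3]   # same shape, one sample held longer
print(dtw_distance(reference, stretched))
```

Because warping lets one sample of the reference match two samples of the stretched batch, the two trajectories align at zero cost even though their lengths differ.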

📐 Univariate & Robust Regression

  • t-tests (paired, independent, plus DataFrame-aware helpers)
  • ESD outliers, Sn estimator, MAD, normality tests, variance decomposition
  • Robust regression: repeated-median slope and friends, for outlier-resistant fits
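
Two of the robust estimators named above are short enough to sketch with the standard library. The helpers `mad` and `repeated_median_slope` are hypothetical illustrations of the median absolute deviation and of Siegel's repeated-median slope; they are not the package's functions, and the data are made up.

```python
from statistics import median


def mad(values, scale=1.4826):
    """Median absolute deviation; scale=1.4826 makes it comparable to a
    standard deviation for normally distributed data."""
    centre = median(values)
    return scale * median([abs(v - centre) for v in values])


def repeated_median_slope(x, y):
    """For each point take the median slope to all other points,
    then take the median of those per-point medians.
    Assumes every point has at least one partner with a different x."""
    per_point = []
    for i in range(len(x)):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(len(x))
            if j != i and x[j] != x[i]
        ]
        per_point.append(median(slopes))
    return median(per_point)


x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 7, 9, 100]          # y = 2x + 1 with one gross outlier
print(repeated_median_slope(x, y))
print(mad([1, 2, 3, 4, 100], scale=1.0))
```

The single outlier shifts an ordinary least-squares slope substantially, while the repeated-median slope stays at the value defined by the five well-behaved points.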

🎨 Visualization

  • Plotly-backed plots that attach to fitted PCA / PLS models
  • A backend-agnostic ChartSpec layer with Plotly and ECharts adapters
  • DOE-specific plots: main-effects, design visualization

Installation

pip install process-improve

Requires Python 3.10 or newer.

Quick start

PCA - Principal Component Analysis

import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)        # cumulative R² per component
pca.score_plot()                  # interactive Plotly plot
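
For intuition about the two steps above, here is a NumPy-only sketch of the underlying arithmetic. It assumes that MCUVScaler performs mean-centring and unit-variance scaling, as its name suggests, and that cumulative R² is the cumulative fraction of total variance explained. The ddof=1 choice and the data are assumptions, and this is not the package's code.

```python
import numpy as np

# Made-up data: 4 observations, 3 variables on very different scales.
X = np.array([
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 3.0],
    [3.0, 30.0, 8.0],
    [4.0, 41.0, 1.0],
])

# Mean-centre each column and scale it to unit sample variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The squared singular values give the variance captured by each component.
singular_values = np.linalg.svd(X_scaled, compute_uv=False)
explained = singular_values ** 2
r2_cumulative = np.cumsum(explained) / explained.sum()

print(r2_cumulative.round(3))
```

Scaling first stops the variable with the largest numeric range from dominating the leading components.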

PLS - Projection to Latent Structures

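import pandas as pd
from process_improve.multivariate.methods import PLS, MCUVScaler

# X: predictor block, Y: response block, with matching row indices.
# The file names are placeholders.
X = pd.read_csv("your_data.csv", index_col=0)
Y = pd.read_csv("your_responses.csv", index_col=0)

X_s = MCUVScaler().fit_transform(X)
Y_s = MCUVScaler().fit_transform(Y)

pls = PLS(n_components=3).fit(X_s, Y_s)
result = pls.predict(X_s)         # Bunch with named fields
print(result.y_hat, result.spe, result.hotellings_t2)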

DOE - multi-stage experimental strategy

from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])

Longer, fully-worked versions of each example live in the Quickstart guide and the notebooks_examples/ folder.

API design

PCA and PLS follow scikit-learn conventions: fit() returns self, fitted attributes end with a trailing underscore (scores_, loadings_, spe_, hotellings_t2_, r2_cumulative_, …), and predict() returns an sklearn.utils.Bunch with named fields (y_hat, spe, hotellings_t2, …). Inputs are accepted as pandas.DataFrame, and index/column labels are preserved through fit and transform.
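
The conventions listed above can be illustrated with a toy estimator. The class `MeanModel` below is hypothetical and deliberately trivial; it is not part of process-improve, and `types.SimpleNamespace` stands in for `sklearn.utils.Bunch` so that the sketch needs only the standard library.

```python
from types import SimpleNamespace


class MeanModel:
    """Toy estimator that always predicts the training mean of y."""

    def __init__(self, shrink=1.0):
        # Constructor arguments are stored as-is, without validation or fitting.
        self.shrink = shrink

    def fit(self, X, y):
        # Learned state ends with a trailing underscore.
        self.y_mean_ = self.shrink * sum(y) / len(y)
        return self                      # fit() returns self, enabling chaining

    def predict(self, X):
        # Return a container with named fields rather than a bare array.
        return SimpleNamespace(
            y_hat=[self.y_mean_ for _ in X],
            n_samples=len(X),
        )


model = MeanModel().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
result = model.predict([[10], [20]])
print(model.y_mean_, result.y_hat, result.n_samples)
```

Returning named fields lets callers pick out `result.y_hat` or a diagnostic by name instead of relying on tuple positions.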

Documentation & learning resources

Contributing

Bug reports and feature requests are welcome on the issue tracker.

License

MIT - see LICENSE for details.

Source Distribution

process_improve-1.13.3.tar.gz (3.5 MB view details)

Uploaded Source

Built Distribution

process_improve-1.13.3-py3-none-any.whl (3.6 MB view details)

Uploaded Python 3

