Skip to main content

Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.

Project description

Process Improvement using Data

A pragmatic Python toolkit for industrial process data - multivariate analysis, designed experiments, and process monitoring, in one place.

PyPI version Python versions License CI codecov Docs

What is this?

process-improve is the companion package to the online textbook Process Improvement using Data, and powers the statistical engine behind factori.al. It bundles the methods practitioners actually reach for on real plant and lab data - PCA / PLS with proper missing-data handling, designed experiments with a multi-stage strategy recommender, control charts, and batch-data tooling - behind an API that is sklearn-compatible where it makes sense and pandas-native throughout.

Highlights

🧪 Designed Experiments

  • Full-factorial, fractional-factorial, and response-surface designs (built on pyDOE3)
  • A DOE strategy recommender that plans a complete multi-stage program - screening → optimization → confirmation - from ~50 deterministic rules, with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other domains
  • ANOVA, main-effects plots, linear-model fitting, and response optimization

📊 Latent Variable Methods

  • PCA with SVD and NIPALS algorithms, plus missing-data via Trimmed Score Regression
  • PLS regression with a fully sklearn-compatible API
  • TPLS - PLS for T-shaped data structures
  • Diagnostics: Hotelling's T², SPE, score contributions, and ESD-based outlier detection
  • Component selection via PRESS / Wold's criterion
  • Interactive Plotly score, loading, SPE, and T² plots, bound directly to fitted models

📈 Process Monitoring

  • Shewhart, CUSUM, and Holt-Winters control charts (regular and robust variants)
  • Process-capability index Cpk

🔄 Batch Data Analysis

  • DTW-based batch alignment, reference-batch selection, resampling
  • 15+ batch feature extractors (mean, slope, area, elbow, rupture, crossings, robust variants, …)
  • Format conversions between wide, melted, and dict-of-frames batch layouts

📐 Univariate & Robust Regression

  • t-tests (paired, independent, plus DataFrame-aware helpers)
  • ESD outliers, Sn estimator, MAD, normality tests, variance decomposition
  • Robust regression: repeated-median slope and friends, for outlier-resistant fits

🎨 Visualization

  • Plotly-backed plots that attach to fitted PCA / PLS models
  • A backend-agnostic ChartSpec layer with Plotly and ECharts adapters
  • DOE-specific plots: main-effects, design visualization

Installation

pip install process-improve

Requires Python 3.10 or newer.

Quick start

PCA - Principal Component Analysis

import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)        # cumulative R² per component
pca.score_plot()                  # interactive Plotly plot

PLS - Projection to Latent Structures

from process_improve.multivariate.methods import PLS, MCUVScaler

X_s = MCUVScaler().fit_transform(X)
Y_s = MCUVScaler().fit_transform(Y)

pls = PLS(n_components=3).fit(X_s, Y_s)
result = pls.predict(X_s)
print(result.y_hat, result.spe, result.hotellings_t2)

DOE - multi-stage experimental strategy

from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])

Longer, fully-worked versions of each example live in the Quickstart guide and the process_improve/notebooks_examples/ folder.

New to designed experiments? The Applied DoE tutorial is an eight-module worked-solution series that mirrors the 12-week DoE short course and shows the same workflow in Python with process-improve end to end - from a 2x2 first design through factorial scaling, fractional factorials, blocking, and 1-D / 2-D response-surface optimization.

API design

PCA and PLS follow scikit-learn conventions: fit() returns self, fitted attributes end with a trailing underscore (scores_, loadings_, spe_, hotellings_t2_, r2_cumulative_, …), and predict() returns an sklearn.utils.Bunch with named fields (y_hat, spe, hotellings_t2, …). Inputs are accepted as pandas.DataFrame, and index/column labels are preserved through fit and transform.

Documentation & learning resources

Contributing

Bug reports and feature requests are welcome on the issue tracker.

License

MIT - see LICENSE for details.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

process_improve-1.16.4.tar.gz (3.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

process_improve-1.16.4-py3-none-any.whl (3.6 MB view details)

Uploaded Python 3

File details

Details for the file process_improve-1.16.4.tar.gz.

File metadata

  • Download URL: process_improve-1.16.4.tar.gz
  • Upload date:
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for process_improve-1.16.4.tar.gz
Algorithm Hash digest
SHA256 01db2732ca56c7dce9c078e4be04109d9f3bd55bdb8c3805664b0e9d65e6b7e8
MD5 70ad96a0e8a752e26e8553af901d9e5a
BLAKE2b-256 01c376db1910be5fb83c3028452eee20cff372261b3f3622d4c7555ae8796e24

See more details on using hashes here.

Provenance

The following attestation bundles were made for process_improve-1.16.4.tar.gz:

Publisher: publish.yml on kgdunn/process-improve

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file process_improve-1.16.4-py3-none-any.whl.

File metadata

File hashes

Hashes for process_improve-1.16.4-py3-none-any.whl
Algorithm Hash digest
SHA256 eb1e6b634b32736f608ccd3978ba405acb70b8146952db6d77e546eee6b45d8e
MD5 54ad1cd01cabb68e99c0b423f68df26b
BLAKE2b-256 574864949746a1547d1c4c073f198b7be7b5e0ed3065893b829563d84c26453f

See more details on using hashes here.

Provenance

The following attestation bundles were made for process_improve-1.16.4-py3-none-any.whl:

Publisher: publish.yml on kgdunn/process-improve

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page