Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.

Process Improvement using Data

A pragmatic Python toolkit for industrial process data - multivariate analysis, designed experiments, and process monitoring, in one place.

What is this?

process-improve is the companion package to the online textbook Process Improvement using Data, and powers the statistical engine behind factori.al. It bundles the methods practitioners actually reach for on real plant and lab data - PCA / PLS with proper missing-data handling, designed experiments with a multi-stage strategy recommender, control charts, and batch-data tooling - behind an API that is sklearn-compatible where it makes sense and pandas-native throughout.

Highlights

🧪 Designed Experiments

  • Full-factorial, fractional-factorial, and response-surface designs (built on pyDOE3)
  • A DOE strategy recommender that plans a complete multi-stage program - screening → optimization → confirmation - from ~50 deterministic rules, with budget-aware allocation and domain-specific advice for fermentation, cell culture, pharma, and 5 other domains
  • ANOVA, main-effects plots, linear-model fitting, and response optimization
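
To make the terminology concrete, here is a minimal standard-library sketch of a two-level full-factorial design in coded units and the main effects computed from it. It does not call process-improve or pyDOE3; the helper names `full_factorial` and `main_effects` and the response values are made up for illustration.

```python
from itertools import product


def full_factorial(n_factors):
    """Return all 2**n_factors runs in coded units (-1 = low, +1 = high)."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def main_effects(design, y):
    """Main effect per factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [yi for run, yi in zip(design, y) if run[j] == 1]
        low = [yi for run, yi in zip(design, y) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


design = full_factorial(2)        # four runs for two factors
y = [10.0, 14.0, 20.0, 24.0]      # hypothetical responses, one per run
effects = main_effects(design, y)
print(design)
print(effects)
```

With these invented numbers the first factor has the larger effect, which is the kind of ranking a screening stage is meant to reveal.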

📊 Latent Variable Methods

  • PCA with SVD and NIPALS algorithms, plus missing-data handling via Trimmed Score Regression
  • PLS regression with a fully sklearn-compatible API
  • TPLS - PLS for T-shaped data structures
  • Diagnostics: Hotelling's T², SPE, score contributions, and ESD-based outlier detection
  • Component selection via PRESS / Wold's criterion
  • Interactive Plotly score, loading, SPE, and T² plots, bound directly to fitted models
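
For readers new to these diagnostics, the following NumPy-only sketch shows one common way to obtain scores, loadings, SPE, and Hotelling's T² from an SVD-based PCA on complete data. It is not the package's implementation: the data are synthetic, SPE is taken here as the row sum of squared residuals (some software reports its square root), and no missing-data handling is attempted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                        # synthetic data, 50 rows x 5 columns

# Mean-center and scale each column to unit variance
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

A = 2                                               # number of components to keep
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:A].T                                        # loadings, shape (5, A)
T = Xs @ P                                          # scores, shape (50, A)

residuals = Xs - T @ P.T
spe = (residuals ** 2).sum(axis=1)                  # squared prediction error per row
t2 = ((T / T.std(axis=0, ddof=1)) ** 2).sum(axis=1) # Hotelling's T² per row
r2_cumulative = 1 - (residuals ** 2).sum() / (Xs ** 2).sum()

print(T.shape, spe.shape, t2.shape, round(float(r2_cumulative), 3))
```

Rows with a large T² are unusual within the model plane, while rows with a large SPE are poorly described by the model at all.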

📈 Process Monitoring

  • Shewhart, CUSUM, and Holt-Winters control charts (regular and robust variants)
  • Process-capability index Cpk
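
As a rough illustration of the two ideas above, this standard-library sketch computes three-sigma Shewhart limits from a reference data set and a Cpk value against specification limits. The function names, the data, and the specification limits are hypothetical, and the sample standard deviation is used as the sigma estimate for simplicity; the package's regular and robust variants may estimate it differently.

```python
from statistics import mean, stdev


def shewhart_limits(phase1, n_sigma=3.0):
    """Return (LCL, center line, UCL) estimated from in-control reference data."""
    center = mean(phase1)
    sigma = stdev(phase1)
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def cpk(data, lsl, usl):
    """Cpk: distance from the mean to the nearest spec limit, in units of 3 sigma."""
    mu, sigma = mean(data), stdev(data)
    return min(usl - mu, mu - lsl) / (3 * sigma)


phase1 = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0]   # made-up reference data
lcl, center, ucl = shewhart_limits(phase1)

new_points = [10.2, 9.7, 13.5]
out_of_control = [x for x in new_points if not lcl <= x <= ucl]
capability = cpk(phase1, lsl=7.0, usl=13.0)

print(round(lcl, 3), center, round(ucl, 3))
print(out_of_control, round(capability, 3))
```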

🔄 Batch Data Analysis

  • DTW-based batch alignment, reference-batch selection, resampling
  • 15+ batch feature extractors (mean, slope, area, elbow, rupture, crossings, robust variants, …)
  • Format conversions between wide, melted, and dict-of-frames batch layouts
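
Dynamic time warping (DTW) is the idea behind the alignment step: two trajectories are matched point by point while allowing one to be stretched in time. The sketch below is the textbook single-variable DTW distance in pure Python, with a hypothetical function name; it is not the package's alignment routine, which may well add constraints, weighting, or multivariate support.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences using absolute differences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


reference = [0, 1, 2, 3]
stretched = [0, 1, 1, 2, 3]      # same shape, but one time step is repeated
shifted = [1, 2, 3, 4]           # same shape, offset in value

print(dtw_distance(reference, stretched))
print(dtw_distance(reference, shifted))
```

A batch that merely ran slower than the reference scores a low cost, whereas a batch that followed a different trajectory does not.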

📐 Univariate & Robust Regression

  • t-tests (paired, independent, plus DataFrame-aware helpers)
  • ESD outliers, Sn estimator, MAD, normality tests, variance decomposition
  • Robust regression: repeated-median slope and friends, for outlier-resistant fits
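
To show why robust estimators matter on plant data, here is a small pure-Python sketch of the median absolute deviation (MAD) and a repeated-median slope, following the usual textbook definitions. The function names and data are illustrative assumptions and are not taken from the package.

```python
from statistics import median


def mad(values, scale=1.4826):
    """Median absolute deviation; scale=1.4826 makes it comparable to a
    standard deviation for normally distributed data."""
    med = median(values)
    return scale * median([abs(v - med) for v in values])


def repeated_median_slope(x, y):
    """For each point take the median slope to all other points,
    then take the median of those per-point medians."""
    per_point = []
    for i in range(len(x)):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(len(x))
            if j != i and x[j] != x[i]
        ]
        per_point.append(median(slopes))
    return median(per_point)


x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 2, 4, 6, 8, 10, 100]     # true slope is 2; the last point is an outlier

slope = repeated_median_slope(x, y)
spread = mad([1, 2, 3, 4, 100], scale=1.0)
print(slope, spread)
```

An ordinary least-squares fit to the same data would be pulled strongly toward the outlier, while the repeated-median slope is unaffected by it.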

🎨 Visualization

  • Plotly-backed plots that attach to fitted PCA / PLS models
  • A backend-agnostic ChartSpec layer with Plotly and ECharts adapters
  • DOE-specific plots: main-effects, design visualization

Installation

pip install process-improve

Requires Python 3.10 or newer.

Quick start

PCA - Principal Component Analysis

import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)        # cumulative R² per component
pca.score_plot()                  # interactive Plotly plot

PLS - Projection to Latent Structures

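import pandas as pd
from process_improve.multivariate.methods import PLS, MCUVScaler

# X holds the predictors (as loaded in the PCA example); Y holds the responses.
# "your_responses.csv" is a placeholder file name.
Y = pd.read_csv("your_responses.csv", index_col=0)

X_s = MCUVScaler().fit_transform(X)
Y_s = MCUVScaler().fit_transform(Y)

pls = PLS(n_components=3).fit(X_s, Y_s)
result = pls.predict(X_s)         # Bunch with named fields
print(result.y_hat, result.spe, result.hotellings_t2)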

DOE - multi-stage experimental strategy

from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])

Longer, fully-worked versions of each example live in the Quickstart guide and the notebooks_examples/ folder.

API design

PCA and PLS follow scikit-learn conventions: fit() returns self, fitted attributes end with a trailing underscore (scores_, loadings_, spe_, hotellings_t2_, r2_cumulative_, …), and predict() returns an sklearn.utils.Bunch with named fields (y_hat, spe, hotellings_t2, …). Inputs are accepted as pandas.DataFrame, and index/column labels are preserved through fit and transform.
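
The conventions named above can be seen in miniature in the toy estimator below. It is a hypothetical class written only to illustrate the pattern (fit() returning self, a trailing-underscore fitted attribute, and a Bunch from predict()); it assumes scikit-learn and NumPy are installed and is not how PCA or PLS are implemented in the package.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.utils import Bunch


class ToyMeanModel(BaseEstimator):
    """Hypothetical estimator: predicts the column means seen during fit."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)      # fitted attribute: trailing underscore
        return self                      # returning self allows call chaining

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        y_hat = np.tile(self.mean_, (X.shape[0], 1))
        spe = ((X - y_hat) ** 2).sum(axis=1)
        return Bunch(y_hat=y_hat, spe=spe)   # named fields, dict-like as well


model = ToyMeanModel().fit([[1.0, 2.0], [3.0, 4.0]])
result = model.predict([[2.0, 3.0]])
print(model.mean_, result.y_hat, result.spe)
```

Because a Bunch supports both attribute and key access, `result.spe` and `result["spe"]` refer to the same array.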

Documentation & learning resources

Contributing

Bug reports and feature requests are welcome on the issue tracker.

License

MIT - see LICENSE for details.
