Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.

Process Improvement using Data

A Python package for multivariate data analysis, designed experiments, and process monitoring. Companion to the online textbook Process Improvement using Data. This package also powers the statistical engine behind factori.al.

Installation

pip install process-improve

Quick Start

PCA — Principal Component Analysis

import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

# Load and scale your data
X = pd.read_csv("your_data.csv", index_col=0)
scaler = MCUVScaler().fit(X)
X_scaled = scaler.transform(X)

# Fit a PCA model
pca = PCA(n_components=3).fit(X_scaled)

# Inspect results
print(pca.scores_)  # Score matrix (N x A)
print(pca.loadings_)  # Loading matrix (K x A)
print(pca.r2_cumulative_)  # Cumulative R² per component

# Detect outliers
outliers = pca.detect_outliers(conf_level=0.95)

# Contribution analysis
contrib = pca.score_contributions(pca.scores_.iloc[0].values)

# Select number of components via cross-validation
result = PCA.select_n_components(X_scaled, max_components=10)
print(result.n_components)

# Built-in plots
pca.score_plot()
pca.spe_plot()
pca.t2_plot()
pca.loading_plot()
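
To make the scores, loadings, and cumulative R² shown above concrete, here is a minimal pure-Python sketch of PCA by the NIPALS algorithm on mean-centered, unit-variance data. It is an illustration only and not the package's implementation: the helper names (mcuv, nipals_pca) and the small data set are made up, and it assumes that MCUVScaler performs ordinary mean-centering and unit-variance scaling.

```python
import math
import statistics


def mcuv(columns):
    """Mean-center each column and scale it to unit variance (sample std)."""
    scaled = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.stdev(col)
        scaled.append([(v - mean) / std for v in col])
    return scaled


def nipals_pca(columns, n_components, tol=1e-12, max_iter=500):
    """Extract components one at a time from K columns of length N."""
    K, N = len(columns), len(columns[0])
    E = [col[:] for col in columns]  # residuals, stored column-wise
    total_ss = sum(v * v for col in E for v in col)
    scores, loadings, r2_cum = [], [], []
    for _component in range(n_components):
        # Start from the residual column with the largest sum of squares
        t = max(E, key=lambda c: sum(v * v for v in c))[:]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Regress every column on t, then normalize the loading vector
            p = [sum(E[k][i] * t[i] for i in range(N)) / tt for k in range(K)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Regress every row on p to update the score vector
            t_new = [sum(E[k][i] * p[k] for k in range(K)) for i in range(N)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol:
                break
        # Deflate: remove the variation explained by this component
        for k in range(K):
            for i in range(N):
                E[k][i] -= t[i] * p[k]
        scores.append(t)
        loadings.append(p)
        resid_ss = sum(v * v for col in E for v in col)
        r2_cum.append(1.0 - resid_ss / total_ss)
    return scores, loadings, r2_cum


# Made-up data: 3 variables (columns) measured on 6 observations
data = [
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    [1.1, 1.9, 3.2, 3.9, 5.1, 5.8],
    [9.0, 7.5, 8.2, 6.9, 7.7, 8.4],
]
scores, loadings, r2_cum = nipals_pca(mcuv(data), n_components=2)
print(r2_cum)  # cumulative R² after each component
```

Each entry of scores is one column of the N x A score matrix, and each entry of loadings is one column of the K x A loading matrix. The signs of a component are arbitrary, so a different implementation may return the same component multiplied by -1.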

PLS — Projection to Latent Structures

from process_improve.multivariate.methods import PLS, MCUVScaler

# X (predictors) and Y (responses) are pandas DataFrames with matching rows
# Scale X and Y separately
scaler_x = MCUVScaler().fit(X)
scaler_y = MCUVScaler().fit(Y)

# Fit a PLS model
pls = PLS(n_components=3).fit(scaler_x.transform(X), scaler_y.transform(Y))

# Inspect results
print(pls.scores_)  # X scores (N x A)
print(pls.beta_coefficients_)  # Regression coefficients (K x M)
print(pls.r2_cumulative_)  # Cumulative R² for Y

# Predict new observations (X_new must have the same columns as X)
result = pls.predict(scaler_x.transform(X_new))
print(result.y_hat)  # Predicted Y values
print(result.spe)  # SPE for new data
print(result.hotellings_t2)  # Hotelling's T² for new data

# Detect outliers and analyze contributions
outliers = pls.detect_outliers(conf_level=0.95)
contrib = pls.score_contributions(pls.scores_.iloc[0].values)
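
The two diagnostics returned alongside the predictions, SPE and Hotelling's T², can be sketched from their textbook definitions. The following is a hedged pure-Python illustration, not the package's code: the function names and numbers are made up, and the package may scale these statistics differently (for example, by reporting the square root of SPE).

```python
def hotellings_t2(score_rows, score_variances):
    """T² per observation: sum over components of t_a**2 / s_a**2."""
    return [
        sum(t * t / s2 for t, s2 in zip(row, score_variances))
        for row in score_rows
    ]


def spe(x_rows, x_hat_rows):
    """Squared prediction error per observation: sum of squared residuals."""
    return [
        sum((x - x_hat) ** 2 for x, x_hat in zip(row, hat))
        for row, hat in zip(x_rows, x_hat_rows)
    ]


# Made-up numbers: 2 observations, 2 components, 3 variables
score_rows = [[1.0, -0.5], [3.0, 2.0]]
score_variances = [4.0, 1.0]  # variance of each score column in the training data
x_rows = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
x_hat_rows = [[0.5, 2.0, 2.0], [0.0, 1.0, -1.0]]  # model reconstruction of x_rows

t2_values = hotellings_t2(score_rows, score_variances)
spe_values = spe(x_rows, x_hat_rows)
print(t2_values)   # [0.5, 6.25]
print(spe_values)  # [1.25, 0.0]
```

T² measures how far an observation lies from the model center within the latent-variable plane, while SPE measures how far it lies off that plane. An observation can be unusual on either statistic independently.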

DOE — Experimental Strategy Recommendation

Plan a complete multi-stage experimental program before running any experiments:

from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

# Define factors for a fermentation optimization
factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
    Factor(name="Yeast extract", low=1, high=10, units="g/L"),
    Factor(name="Agitation", low=100, high=400, units="rpm"),
    Factor(name="Aeration", low=0.5, high=2.0, units="vvm"),
    Factor(name="Inoculum", low=2, high=10, units="%v/v"),
]
responses = [Response(name="Yield", goal="maximize", units="g/L")]

# Get a complete experimental plan
strategy = recommend_strategy(
    factors=factors,
    responses=responses,
    budget=40,
    domain="fermentation",
)

# Inspect the multi-stage strategy
for stage in strategy["stages"]:
    print(f"Stage {stage['stage_number']}: {stage['stage_name']}")
    print(f"  Design: {stage['design_type']}, Runs: {stage['estimated_runs']}")
    print(f"  Purpose: {stage['purpose']}")

# Review the budget allocation and the reasoning behind the plan
print(strategy["budget_allocation"])
print(strategy["reasoning"])

The engine applies roughly 50 deterministic rules (drawn from Montgomery, NIST, and Stat-Ease) to recommend screening, optimization, and confirmation stages. Allocation is budget-aware, and domain-specific advice is available for fermentation, cell culture, pharma, and five other application domains.
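
To give a feel for what a screening stage for seven factors can look like, here is a hedged sketch that builds a standard 8-run 2^(7-4) resolution III fractional factorial by hand and maps it onto the factor ranges from the example above. It is not the output of recommend_strategy, whose actual choice of design may differ; the function names are made up for illustration.

```python
from itertools import product

# Factor ranges copied from the fermentation example: name -> (low, high)
ranges = {
    "Temperature": (25, 40),
    "pH": (5.0, 7.5),
    "Glucose": (10, 50),
    "Yeast extract": (1, 10),
    "Agitation": (100, 400),
    "Aeration": (0.5, 2.0),
    "Inoculum": (2, 10),
}


def screening_design():
    """8-run 2^(7-4) design in coded units, generators D=AB, E=AC, F=BC, G=ABC."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


def to_real_units(coded_run):
    """Map coded levels (-1 or +1) onto each factor's low/high range."""
    settings = {}
    for (name, (low, high)), level in zip(ranges.items(), coded_run):
        center = (low + high) / 2
        half_range = (high - low) / 2
        settings[name] = center + level * half_range
    return settings


design = screening_design()
for run in design:
    print(to_real_units(run))
```

Eight runs leave most of a 40-run budget for follow-up stages. The trade-off is that, in a resolution III design, main effects are aliased with two-factor interactions, which is usually acceptable for screening but not for optimization.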

Features

  • PCA with SVD, NIPALS, and missing data (TSR) algorithms
  • PLS regression with sklearn-compatible API
  • TPLS (Total PLS) for multi-block data
  • Missing data handling via TSR and NIPALS algorithms
  • Outlier detection combining Hotelling's T² and SPE with robust ESD test
  • Score contributions for variable-level diagnostics
  • Cross-validation for component selection (PRESS with Wold's criterion)
  • Interactive plots (Plotly) for scores, loadings, SPE, and T²
  • Designed experiments — full factorial, fractional factorial, response surface
  • DOE strategy recommender — multi-stage experimental planning (screening, optimization, confirmation) with budget-aware allocation and 8 application domains
  • Process monitoring — Shewhart, CUSUM, EWMA control charts
  • Batch data analysis — alignment, feature extraction, multivariate batch monitoring
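
As an illustration of the process monitoring item in the list above, the following is a minimal pure-Python sketch of an EWMA control chart using the textbook recursion and time-varying limits. It does not use or mirror the package's API; the function name, parameters, and data are made up, and the target and sigma are assumed to be known from an in-control reference period.

```python
import math


def ewma_chart(values, target, sigma, lam=0.2, width=3.0):
    """Return the EWMA statistic and control limits for each observation."""
    z = target  # the EWMA starts at the target value
    points = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        # Limits widen with t and approach a steady-state value
        variance_factor = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        half_width = width * sigma * math.sqrt(variance_factor)
        points.append({
            "ewma": z,
            "lcl": target - half_width,
            "ucl": target + half_width,
            "out_of_control": abs(z - target) > half_width,
        })
    return points


# Made-up measurements with an upward shift starting at the fifth value
values = [10.1, 9.8, 10.2, 10.0, 11.5, 11.8, 12.1, 11.9]
chart = ewma_chart(values, target=10.0, sigma=0.5)
for t, point in enumerate(chart, start=1):
    print(t, round(point["ewma"], 3), round(point["ucl"], 3), point["out_of_control"])
```

A smaller lam gives more smoothing and more sensitivity to small sustained shifts, at the cost of a slower reaction to large sudden ones. In this sketch the shift begins at the fifth value and the chart first signals at the sixth.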

API Design

Both PCA and PLS follow sklearn conventions:

  • Fitted attributes end with _ (e.g., scores_, loadings_, spe_)
  • fit() returns self
  • predict() returns a Bunch object with named fields
  • score() is compatible with sklearn.model_selection.cross_val_score
  • Works with pandas.DataFrame inputs (preserves index and column names)
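
The conventions listed above can be illustrated with a toy estimator. This is a hedged sketch and not part of process-improve: MeanModel is a made-up class, and types.SimpleNamespace stands in for the Bunch object so that the example needs only the standard library.

```python
from types import SimpleNamespace


class MeanModel:
    """Toy estimator that always predicts the training mean of y."""

    def fit(self, X, y):
        # Fitted attributes end with an underscore
        self.mean_ = sum(y) / len(y)
        self.n_samples_ = len(y)
        return self  # returning self allows MeanModel().fit(X, y) chaining

    def predict(self, X):
        y_hat = [self.mean_ for _ in X]
        # Named fields, accessed as result.y_hat
        return SimpleNamespace(y_hat=y_hat, n_predicted=len(y_hat))

    def score(self, X, y):
        """R² of the predictions against y (higher is better)."""
        y_hat = self.predict(X).y_hat
        y_mean = sum(y) / len(y)
        ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
        ss_tot = sum((a - y_mean) ** 2 for a in y)
        return 1.0 - ss_res / ss_tot


X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]
model = MeanModel().fit(X_train, y_train)
result = model.predict([[4.0], [5.0]])
print(result.y_hat)                    # [4.0, 4.0]
print(model.score(X_train, y_train))   # 0.0
```

Returning an object with named fields from predict() lets a model hand back diagnostics such as SPE and T² together with the predictions, instead of a bare array.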

Documentation

Full documentation is available at https://kgdunn.github.io/process-improve/.

To build the documentation locally:

cd docs
make html

License

MIT License. See LICENSE for details.

