
Incremental machine learning in Python — learn one observation at a time

incre-ml

Incremental machine learning in Python. Every algorithm processes one observation at a time — no batches, no retraining, no accumulated history.

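from incre_ml.forecasting import HoltWinters

model = HoltWinters(season_length=24)

# stream: any iterable yielding (x, y) pairs, where x is a feature dict
for x, y in stream:
    prediction = model.predict_one(x)  # predict before the label is seen
    model.learn_one(x, y)              # then update on the observation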

Why incre-ml?

Most traditional ML libraries train on batches. When data arrives continuously (sensor telemetry, financial ticks, patient vitals, API logs), you need models that update incrementally and predict immediately. incre-ml covers this workflow end to end: forecasting, anomaly detection, classification, clustering, drift detection, uncertainty quantification, federated learning, and physics-informed constraints.

Core API contract — every model implements:

Method                  Purpose
model.learn_one(x, y)   Update on one observation
model.predict_one(x)    Predict from one observation
model.explain_one(x)    Feature contributions for one observation
model.clear()           Reset internal state

All models use x: dict[str, Any] for features — sparse, heterogeneous, schema-agnostic by design.
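To make the contract concrete, here is a minimal sketch of a toy model that follows it. RunningMeanRegressor is a hypothetical class written for illustration, not part of incre-ml, and it omits explain_one for brevity.

```python
from __future__ import annotations

from typing import Any


class RunningMeanRegressor:
    """Toy model (hypothetical) following the learn_one / predict_one / clear contract."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x: dict[str, Any], y: float) -> "RunningMeanRegressor":
        # Incremental mean update: no history is stored.
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self

    def predict_one(self, x: dict[str, Any]) -> float:
        # Ignores the features and returns the running mean of the targets.
        return self.mean

    def clear(self) -> None:
        # Reset internal state.
        self.n = 0
        self.mean = 0.0


model = RunningMeanRegressor()
# Feature dicts may differ in keys and value types from one observation to the next.
stream = [({"temp": 20.0}, 1.0), ({"temp": 21.0, "door": "open"}, 3.0)]

for x, y in stream:
    prediction = model.predict_one(x)  # predict first
    model.learn_one(x, y)              # then learn
```

Because features are plain dicts, the second observation can carry an extra key without any schema change.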

Installation

pip install incre-ml

With optional connectors:

pip install "incre-ml[kafka]"       # Confluent Kafka
pip install "incre-ml[mqtt]"        # MQTT / IoT
pip install "incre-ml[connectors]"  # All connectors
pip install "incre-ml[dashboard]"   # Streamlit demo app

Capabilities

Forecasting

8 forecasters: Naive, Holt-Winters, AR/SNARIMAX, Kalman Filter, RLS, Croston/TSB, Bootstrapped ensembles, and model selection.

from incre_ml.forecasting import BootstrappedRegressor, HoltWinters

model = BootstrappedRegressor(HoltWinters(season_length=24), n_models=5)

pred, uncertainty = model.predict_with_uncertainty(x)
model.learn_one(x, y)
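One common way to bootstrap a streaming model is online bagging, where each ensemble member sees every observation k times with k drawn from Poisson(1). Whether BootstrappedRegressor works this way internally is an assumption; the sketch below only illustrates the idea with a hypothetical MeanModel base learner and reports the spread of member predictions as the uncertainty.

```python
import math
import random
import statistics


class MeanModel:
    """Toy base learner (hypothetical): running mean of the targets."""

    def __init__(self):
        self.n, self.mean = 0, 0.0

    def learn_one(self, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict_one(self):
        return self.mean


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class OnlineBagging:
    """Each member learns every observation k ~ Poisson(1) times."""

    def __init__(self, n_models=5, seed=0):
        self.models = [MeanModel() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn_one(self, y):
        for member in self.models:
            for _ in range(poisson1(self.rng)):
                member.learn_one(y)

    def predict_with_uncertainty(self):
        preds = [member.predict_one() for member in self.models]
        # Mean of member predictions, and their spread as an uncertainty proxy.
        return statistics.fmean(preds), statistics.pstdev(preds)


ensemble = OnlineBagging(n_models=5, seed=42)
for y in [10.0, 12.0, 11.0, 13.0, 9.0] * 20:
    ensemble.learn_one(y)

pred, spread = ensemble.predict_with_uncertainty()
```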

Anomaly Detection

Statistical (Z-score), geometric (Half-Space Trees), and predictive detectors — composable via weighted ensemble voting. Includes CUSUM and EWMA industrial detectors.

from incre_ml.anomaly import AnomalyEnsemble, ZScoreDetector, PredictiveAnomalyDetector
from incre_ml.forecasting import HoltWinters

ensemble = AnomalyEnsemble({
    "stat": ZScoreDetector(feature_name="temperature"),
    "pred": PredictiveAnomalyDetector(
        model=HoltWinters(season_length=96),
        feature_name="temperature",
    ),
})

score = ensemble.score_one({"temperature": 95.2})  # 0.0 (normal) to 1.0 (anomalous)
ensemble.learn_one({"temperature": 95.2})
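Weighted ensemble voting can be sketched as a weighted average of per-detector scores, each already scaled to the 0.0 to 1.0 range. The classes below (EwmaDetector, LastValueDetector, WeightedEnsemble) are hypothetical stand-ins operating on a single float, not the incre-ml implementations, and the weights argument is an assumption.

```python
class EwmaDetector:
    """Statistical stand-in: scores deviation from an exponentially weighted mean."""

    def __init__(self, alpha=0.1, k=3.0):
        self.alpha, self.k = alpha, k
        self.mean = None  # created lazily on the first observation
        self.dev = 0.0    # exponentially weighted mean absolute deviation

    def score_one(self, value):
        if self.mean is None or self.dev == 0.0:
            return 0.0
        return min(1.0, abs(value - self.mean) / (self.k * self.dev))

    def learn_one(self, value):
        if self.mean is None:
            self.mean = value
            return
        self.dev += self.alpha * (abs(value - self.mean) - self.dev)
        self.mean += self.alpha * (value - self.mean)


class LastValueDetector:
    """Predictive stand-in: forecast is the previous value, score is the scaled residual."""

    def __init__(self, scale=10.0):
        self.scale = scale
        self.last = None

    def score_one(self, value):
        if self.last is None:
            return 0.0
        return min(1.0, abs(value - self.last) / self.scale)

    def learn_one(self, value):
        self.last = value


class WeightedEnsemble:
    """Combines detector scores with a weighted average."""

    def __init__(self, detectors, weights):
        self.detectors, self.weights = detectors, weights

    def score_one(self, value):
        total = sum(self.weights.values())
        weighted = sum(
            self.weights[name] * det.score_one(value)
            for name, det in self.detectors.items()
        )
        return weighted / total

    def learn_one(self, value):
        for det in self.detectors.values():
            det.learn_one(value)


ensemble = WeightedEnsemble(
    {"stat": EwmaDetector(), "pred": LastValueDetector()},
    weights={"stat": 1.0, "pred": 1.0},
)
for value in [20.0, 20.5, 19.5, 20.2, 19.8] * 4:
    ensemble.learn_one(value)

normal_score = ensemble.score_one(20.1)  # close to recent history
spike_score = ensemble.score_one(95.2)   # far outside recent history
```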

Streaming Classification

Hoeffding Tree, Logistic Regression, Naive Bayes, SGD, Adaptive Random Forest, and Windowed KNN — all incremental.

from incre_ml.classification import HoeffdingTreeClassifier

clf = HoeffdingTreeClassifier(grace_period=10)

proba = clf.predict_proba_one(x)    # class probabilities
explanation = clf.explain_one(x)     # feature contributions
clf.learn_one(x, y)
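In the standard Hoeffding tree algorithm, a leaf only re-evaluates candidate splits every grace_period observations, and it splits when the gap between the two best candidates exceeds the Hoeffding bound. The sketch below shows that bound; the function names and the default delta are illustrative, and it is an assumption that incre-ml follows the textbook rule exactly.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that the true mean lies within epsilon of the observed
    mean with probability 1 - delta, after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 value_range: float = 1.0, delta: float = 1e-7) -> bool:
    """Split when the best candidate beats the runner-up by more than the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# The bound shrinks as a leaf sees more data, so clearer splits are found sooner.
early = hoeffding_bound(1.0, 1e-7, n=10)
late = hoeffding_bound(1.0, 1e-7, n=1000)
```

A small grace_period means split candidates are checked often, which costs more compute per observation but lets the tree grow earlier.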

Pipelines

Chain transformers and predictors into unified streaming workflows.

from incre_ml.compose import Pipeline
from incre_ml.preprocessing import StandardScaler, SelectKBest

pipe = Pipeline([StandardScaler(), SelectKBest(k=5), model])  # model: any predictor created earlier
pipe.learn_one(x, y)
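A minimal sketch of how such a pipeline can work: each transformer updates on the raw input and passes its output to the next step, and the final step receives the transformed features. MinMaxScaler, EchoModel, and this Pipeline are hypothetical stand-ins, and the learn-then-transform ordering is an assumption rather than confirmed incre-ml behavior.

```python
class MinMaxScaler:
    """Toy transformer: scales each feature to [0, 1] using the range seen so far."""

    def __init__(self):
        self.lo, self.hi = {}, {}

    def learn_one(self, x):
        for key, value in x.items():
            self.lo[key] = min(self.lo.get(key, value), value)
            self.hi[key] = max(self.hi.get(key, value), value)
        return self

    def transform_one(self, x):
        out = {}
        for key, value in x.items():
            lo, hi = self.lo.get(key, value), self.hi.get(key, value)
            out[key] = 0.0 if hi == lo else (value - lo) / (hi - lo)
        return out


class EchoModel:
    """Stand-in final step that records what it receives."""

    def __init__(self):
        self.seen = []

    def learn_one(self, x, y):
        self.seen.append((x, y))
        return self

    def predict_one(self, x):
        return x


class Pipeline:
    def __init__(self, steps):
        self.transformers = list(steps[:-1])
        self.model = steps[-1]

    def _transform(self, x, learn):
        for step in self.transformers:
            if learn:
                step.learn_one(x)  # update the transformer before using it
            x = step.transform_one(x)
        return x

    def learn_one(self, x, y):
        self.model.learn_one(self._transform(x, learn=True), y)
        return self

    def predict_one(self, x):
        # Prediction must not change transformer state.
        return self.model.predict_one(self._transform(x, learn=False))


pipe = Pipeline([MinMaxScaler(), EchoModel()])
pipe.learn_one({"t": 10.0}, 1).learn_one({"t": 20.0}, 2)
scaled = pipe.predict_one({"t": 15.0})
```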

Drift Detection

ADWIN (exponential histogram, O(log n) memory), statistical detectors, and DriftAdaptiveWrapper for automatic model adaptation via reset, decay, or replacement strategies.
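ADWIN itself is too long to show here, so the sketch below swaps in the simpler Page-Hinkley test to illustrate the detect-then-adapt loop with a reset strategy. The PageHinkley class, its parameters, and the synthetic error stream are all illustrative assumptions and do not reflect the incre-ml API.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.reset()

    def reset(self):
        self.n, self.mean = 0, 0.0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, value) -> bool:
        self.n += 1
        self.mean += (value - self.mean) / self.n
        # Cumulative deviation from the running mean, minus a tolerance delta.
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


detector = PageHinkley(delta=0.005, threshold=5.0)
errors = [0.0] * 50 + [1.0] * 50  # model error jumps halfway through the stream
drift_points = []

for i, err in enumerate(errors):
    if detector.update(err):
        drift_points.append(i)
        # "Reset" strategy: a wrapper would also call model.clear() here.
        detector.reset()
```

The detector uses O(1) memory, whereas ADWIN keeps a compressed window and can also estimate where the change happened.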

Federated Learning

FederatedEnsemble trains local models per site/region and aggregates via averaging or median — without centralizing raw data.

from incre_ml.federated import FederatedEnsemble
from incre_ml.linear import LinearRegression

fed = FederatedEnsemble(LinearRegression(), ["site_a", "site_b", "site_c"])

fed.learn_one("site_a", x, y)
global_pred = fed.predict_global(x)
fed.sync()  # aggregate local models
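The averaging step can be sketched as follows, assuming each site holds a linear model whose parameters are a weight dict and a bias. OnlineLinear and average_models are hypothetical helpers written for this example; only model parameters are exchanged, never the raw observations.

```python
class OnlineLinear:
    """Toy linear regressor trained by stochastic gradient descent."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.w = {}
        self.b = 0.0

    def predict_one(self, x):
        return self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        err = y - self.predict_one(x)
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) + self.lr * err * v
        self.b += self.lr * err
        return self


def average_models(models):
    """Average weights and bias across a non-empty list of local models."""
    keys = set().union(*(m.w for m in models))
    avg_w = {k: sum(m.w.get(k, 0.0) for m in models) / len(models) for k in keys}
    avg_b = sum(m.b for m in models) / len(models)
    return avg_w, avg_b


sites = {name: OnlineLinear() for name in ["site_a", "site_b", "site_c"]}
sites["site_a"].learn_one({"load": 1.0}, 2.0)  # raw data stays at its site
sites["site_b"].learn_one({"load": 2.0}, 4.0)

# Sync: aggregate parameters, then push the global model back to every site.
global_w, global_b = average_models(list(sites.values()))
for local in sites.values():
    local.w, local.b = dict(global_w), global_b
```

Replacing the mean with a per-parameter median gives an aggregate that is less sensitive to a single misbehaving site.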

Physics-Informed Constraints

Wrap any regressor with domain constraints to prevent physically implausible predictions.

from incre_ml.physics.thermal import NewtonCoolingConstraint
from incre_ml.base.physics import PhysicsInformedWrapper

guard = NewtonCoolingConstraint(k=0.05, ambient_temp=15.0, max_deviation=3.0)
safe_model = PhysicsInformedWrapper(model, guard)  # model: any regressor created earlier
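One plausible reading of such a constraint, sketched under assumptions: take one Euler step of Newton's law of cooling, dT/dt = -k (T - T_ambient), from the last observed temperature, then clamp the model's prediction to within max_deviation of that value. The function names and the clamping rule are hypothetical; the source does not say how NewtonCoolingConstraint is implemented.

```python
def newton_cooling_step(prev_temp, k, ambient_temp, dt=1.0):
    """One Euler step of dT/dt = -k * (T - T_ambient)."""
    return prev_temp + k * (ambient_temp - prev_temp) * dt


def constrain(prediction, prev_temp, k=0.05, ambient_temp=15.0, max_deviation=3.0):
    """Clamp a prediction to within max_deviation of the physics-based estimate."""
    expected = newton_cooling_step(prev_temp, k, ambient_temp)
    lower, upper = expected - max_deviation, expected + max_deviation
    return min(max(prediction, lower), upper)


# Last observed temperature is 35.0, so the physics estimate for the next step is 34.0.
implausible = constrain(prediction=50.0, prev_temp=35.0)  # clamped to the upper limit
plausible = constrain(prediction=33.5, prev_temp=35.0)    # passes through unchanged
```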

Also Included

  • Clustering — OnlineKMeans, DBSTREAM (density-based stream clustering)
  • Uncertainty — Conformal prediction, adaptive conformal intervals (ACI), bootstrapped wrappers
  • Preprocessing — Welford's scalers, online feature selection, temporal features, encoders
  • Evaluation — Prequential (test-then-train) scoring protocol
  • Active Learning — Uncertainty sampling
  • Explainability — Per-prediction feature contributions
  • Metrics — Online regression and classification metrics
  • Simulation — Synthetic data generators for manufacturing, clinical, demand, finance, traffic, building, and plant workforce scenarios
  • Serving — Production serving utilities
  • I/O — CSV, Kafka, and MQTT connectors
  • Model Selection — Bandit-based AutoML for streaming

Interactive Dashboard

Explore all capabilities through 8 real-world scenarios with live streaming data:

pip install "incre-ml[dashboard]"
streamlit run app.py

Scenarios: Manufacturing quality, supply chain demand, sales anomaly monitoring, clinical triage, connected vehicle safety, API security monitoring, smart building energy, and plant workforce orchestration.

Plant Workforce Orchestration

The workforce scenario demonstrates closed-loop control for a multi-line manufacturing plant. When a production line degrades or goes down, the system detects the failure via online anomaly detection (CUSUM + Z-Score), selects a labor reallocation strategy using an epsilon-greedy bandit that learns from throughput-vs-cost rewards, and redistributes workers across remaining lines — all incrementally, one 15-minute tick at a time. Rush orders, quality crises, and sudden equipment failures trigger real-time priority shifts with cost impact tracking.

from incre_ml.simulation.generators import PlantWorkforceGenerator

gen = PlantWorkforceGenerator(total_steps=1200, total_workers=48)

for obs in gen:
    # obs contains per-line health, throughput, quality, crew, status
    # plus plant-level shift, rush orders, and worker pool
    print(obs["A_health"], obs["B_status"], obs["rush_order"])
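The strategy selection step described above can be sketched with a small epsilon-greedy bandit. The strategy names and the fixed reward values are made up for illustration; in the scenario the reward would come from throughput versus cost at each tick.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit with incremental mean reward per arm."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)                 # explore
        return max(self.arms, key=lambda a: self.values[a])   # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: no reward history is stored.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical reallocation strategies with fixed rewards.
rewards = {"hold": 0.2, "rebalance": 0.8, "overtime": 0.5}
bandit = EpsilonGreedy(rewards, epsilon=0.1, seed=0)

for _ in range(500):
    arm = bandit.select()
    bandit.update(arm, rewards[arm])

best = max(bandit.values, key=bandit.values.get)
```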

Design Principles

  • Welford's algorithm everywhere — all statistics use O(1) memory incremental computation
  • Lazy state initialization — internal state created on first learn_one(), not __init__
  • Composition over inheritance — shallow hierarchies (1-2 levels), compose via Pipeline and ensembles
  • Strict typing (mypy --strict) on all library code
  • Return self from learn_one() — enables method chaining
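Three of these principles fit in one short sketch: Welford's update for mean and variance, state created on the first learn_one() call, and learn_one() returning self. RunningStats is a hypothetical class for illustration, not an incre-ml type.

```python
class RunningStats:
    """Mean and sample variance in O(1) memory via Welford's algorithm."""

    def __init__(self):
        self._state = None  # lazy: created on the first learn_one()

    def learn_one(self, value):
        if self._state is None:
            self._state = {"n": 0, "mean": 0.0, "m2": 0.0}
        s = self._state
        s["n"] += 1
        delta = value - s["mean"]
        s["mean"] += delta / s["n"]
        s["m2"] += delta * (value - s["mean"])  # uses the updated mean
        return self  # enables method chaining

    @property
    def mean(self):
        return self._state["mean"] if self._state else 0.0

    @property
    def variance(self):
        s = self._state
        return s["m2"] / (s["n"] - 1) if s and s["n"] > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.learn_one(value)

chained = RunningStats().learn_one(1.0).learn_one(3.0)
```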

Development

git clone https://github.com/pespila/incre-ml.git
cd incre-ml
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
pre-commit install
ruff check . && ruff format .   # lint + format
mypy src                        # strict type checking
pytest                          # tests with coverage

License

MIT — see LICENSE for details.
