
Automatic concept drift detection for streaming datasets

Project description

pattern-drift


pattern-drift is a Python library for data scientists and ML engineers working with time-sensitive models. It continuously monitors incoming data distributions, detects statistical drift, and recommends retraining windows, reducing the need for manual model monitoring.


Installation

pip install pattern-drift

Optional extras:

pip install "pattern-drift[viz]"      # adds matplotlib for drift timeline visualisation
pip install "pattern-drift[alerts]"   # adds requests for Slack and webhook callbacks
pip install "pattern-drift[all]"      # everything

Quick Start

from pattern_drift import DriftMonitor

monitor = DriftMonitor(method="ADWIN", sensitivity=0.002)

for record in stream:                    # dict, pandas Series, or single-row DataFrame
    result = monitor.update(record)
    if result.drift_detected:
        print(f"Drift! type={result.drift_type}")
        print(f"Features: {result.drifted_features}")
        print(f"Score: {result.drift_score:.4f}")
        if result.retraining_window:
            rw = result.retraining_window
            print(f"Retrain on records {rw.start} to {rw.end} (confidence {rw.confidence:.2%})")

Detection Algorithms

Algorithm | Mechanism | Best for
--- | --- | ---
ADWIN (default) | Variable-length window; tests every split of the window for a significant difference in means | Gradual drift; adapts its window size dynamically
PageHinkley | Cumulative sum of deviations from the running mean | Sudden drift; fast, with constant memory per feature
KSWIN | Kolmogorov-Smirnov test comparing a recent window with a reference window | Changes in distribution shape, not just mean shifts
DDM | Monitors the prediction error rate against its historical minimum | Monitoring classifier performance after deployment
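To make the ADWIN row concrete, here is a minimal sketch of ADWIN's window-split test in its naive form (often called ADWIN0), assuming every value lies in [0, 1]. It is not the library's implementation: the real algorithm compresses the window into buckets instead of rescanning it, the names `adwin_cut` and `SimpleAdwin` are hypothetical, and treating `sensitivity` as ADWIN's confidence parameter δ is an assumption (0.002 is a commonly used default for δ).

```python
import math
from collections import deque


def adwin_cut(window, delta=0.002):
    """Return the first split index i where mean(window[:i]) and mean(window[i:])
    differ by more than the ADWIN bound eps_cut, or None if no split does.
    Values are assumed to lie in [0, 1]."""
    n = len(window)
    if n < 2:
        return None
    delta_prime = delta / n                       # spread the confidence over n splits
    total = sum(window)
    left_sum = 0.0
    for i in range(1, n):                         # W0 = window[:i], W1 = window[i:]
        left_sum += window[i - 1]
        n0, n1 = i, n - i
        diff = abs(left_sum / n0 - (total - left_sum) / n1)
        m = 1.0 / (1.0 / n0 + 1.0 / n1)           # m = 1 / (1/n0 + 1/n1), as in the bound
        eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
        if diff > eps_cut:
            return i
    return None


class SimpleAdwin:
    """Naive ADWIN0: keep every value, shrink the window while a cut exists."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = deque()

    def update(self, x):
        """Add x and return True if the window was shrunk (drift detected)."""
        self.window.append(x)
        drift = False
        while adwin_cut(list(self.window), self.delta) is not None:
            self.window.popleft()                 # drop the oldest value
            drift = True
        return drift
```

Because the window only shrinks when the two halves disagree, a slow drift keeps a long window, while a clear change discards the stale prefix. This is the adaptive-window behaviour the table refers to.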

Switch algorithms by changing the method parameter; no other code changes are required:

monitor = DriftMonitor(method="PageHinkley")
monitor = DriftMonitor(method="KSWIN")
monitor = DriftMonitor(method="DDM")
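The PageHinkley entry rests on a simple idea: keep a running mean, accumulate how far each value sits above it (minus a small tolerance), and raise an alarm when that cumulative sum climbs too far above its lowest point. A minimal one-sided sketch for an upward mean shift follows; the parameter names `delta`, `threshold` and `min_instances` are hypothetical, and how the library maps `sensitivity` onto them is not documented here.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an increase in the mean (O(1) memory)."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta                  # magnitude of change that is tolerated
        self.threshold = threshold          # lambda: alarm level for m_t - M_t
        self.min_instances = min_instances  # warm-up before alarms are allowed
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                      # m_t: cumulative deviation from the mean
        self.cum_min = 0.0                  # M_t: smallest m_t seen so far

    def update(self, x):
        """Feed one value; return True when drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n          # incremental running mean
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_instances and self.cum - self.cum_min > self.threshold:
            self.reset()                               # start fresh after an alarm
            return True
        return False
```

Only four numbers are stored per detector, which is why this method suits fast, memory-constrained monitoring of abrupt changes.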

API Reference

DriftMonitor

DriftMonitor(
    method="ADWIN",        # Detection algorithm
    sensitivity=0.002,     # Drift threshold — lower = more sensitive
    min_window=30,         # Minimum history before drift can be reported
    max_window=10_000,     # Maximum records retained in memory
    features=None,         # List of columns to monitor (None = auto-detect all numeric)
    callbacks=None,        # List of callables fired on drift
)
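One way to picture `features=None` ("auto-detect all numeric") together with the `min_window` and `max_window` bounds is shown below. This is a hypothetical sketch, not the library's code: it assumes records are plain dicts, treats `bool` values as non-numeric, and keeps one bounded `deque` per feature so memory never exceeds `max_window` values per column.

```python
from collections import deque
from numbers import Real


def numeric_features(record):
    """Auto-detect numeric columns in a dict record (bools are excluded)."""
    return [k for k, v in record.items()
            if isinstance(v, Real) and not isinstance(v, bool)]


class FeatureHistory:
    """Hypothetical per-feature buffers bounded by min_window / max_window."""

    def __init__(self, features=None, min_window=30, max_window=10_000):
        self.features = features            # None = detect from the first record
        self.min_window = min_window
        self.max_window = max_window
        self.history = {}

    def add(self, record):
        if self.features is None:
            self.features = numeric_features(record)
        for name in self.features:          # assumes every record has these keys
            buf = self.history.setdefault(name, deque(maxlen=self.max_window))
            buf.append(float(record[name]))  # the oldest value drops off when full

    def ready(self, name):
        """Drift may only be reported once min_window values are buffered."""
        return len(self.history.get(name, ())) >= self.min_window
```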

Methods

Method | Description
--- | ---
monitor.update(data) | Feed a single row (dict or Series) or a micro-batch (DataFrame). Returns a DriftResult.
monitor.reset() | Reset all internal detector state and history.
monitor.plot_drift_timeline() | Render an interactive drift-score timeline chart.
monitor.export_report(path) | Export the full drift history to JSON or CSV.
monitor.set_reference(data) | Manually set the reference distribution used for comparison.
DriftMonitor.from_config(path) | Class method: instantiate from a YAML config file.
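Comparing recent data against a reference distribution (the idea behind KSWIN, and what `set_reference` supplies) typically relies on the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. The stdlib-only sketch below uses the standard large-sample critical value c(α)·sqrt((n + m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2). The function names and the default `alpha` are hypothetical; the library's exact test may differ.

```python
import bisect
import math


def ks_statistic(reference, recent):
    """Two-sample KS statistic D = max |F_ref(x) - F_recent(x)|."""
    a, b = sorted(reference), sorted(recent)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:                       # the supremum is attained at a sample point
        fa = bisect.bisect_right(a, x) / n
        fb = bisect.bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_drift(reference, recent, alpha=0.005):
    """Reject 'same distribution' using the asymptotic critical value."""
    n, m = len(reference), len(recent)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical
```

Because D compares whole CDFs, it reacts to changes in spread or shape that leave the mean untouched, which a mean-based detector such as ADWIN can miss.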

DriftResult Fields

Field | Type | Description
--- | --- | ---
drift_detected | bool | True if drift was found in any monitored feature
drift_type | str or None | One of "sudden", "gradual", "incremental", "recurring", or None
drifted_features | list[str] | Names of all features where drift was detected
drift_score | float | Maximum drift score across all monitored features; non-negative and may exceed 1.0
retraining_window | RetrainingWindowResult or None | Suggested retraining window with start, end, n_samples, and confidence fields
timestamp | datetime | UTC time at which the drift event was recorded
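The table implies a simple aggregation: `drift_detected` is true if any feature drifted, and `drift_score` is the maximum across features. A hypothetical sketch of that step follows, using a stand-in dataclass (`DriftResultSketch`, not the library's class, and omitting `retraining_window`) and a per-feature score threshold that is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DriftResultSketch:
    drift_detected: bool
    drift_type: Optional[str]
    drifted_features: List[str]
    drift_score: float
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def aggregate(scores, threshold=1.0, drift_type=None):
    """Combine per-feature scores into one result (threshold is hypothetical)."""
    drifted = sorted(name for name, s in scores.items() if s >= threshold)
    return DriftResultSketch(
        drift_detected=bool(drifted),
        drift_type=drift_type if drifted else None,
        drifted_features=drifted,
        drift_score=max(scores.values(), default=0.0),  # max across all features
    )
```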

Alerts & Callbacks

from pattern_drift import DriftMonitor
from pattern_drift.dispatcher import AlertDispatcher

monitor = DriftMonitor(
    callbacks=[
        AlertDispatcher.slack_callback("https://hooks.slack.com/..."),
        AlertDispatcher.webhook_callback("https://my-service/drift"),
        AlertDispatcher.log_callback(level="warning"),
        lambda result: print(result),          # custom inline callback
    ]
)
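Each entry in `callbacks` is a callable that receives the drift result. The sketch below shows one hypothetical way a dispatcher could fire them so that a single failing callback (an unreachable webhook, say) does not prevent the others from running. The class name and the log-and-continue policy are assumptions, not the library's documented behaviour.

```python
import logging

logger = logging.getLogger("drift_sketch")


class CallbackDispatcher:
    """Hypothetical dispatcher: call every callback, isolating failures."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])

    def dispatch(self, result):
        """Call each callback with result; log failures and return their count."""
        errors = 0
        for cb in self.callbacks:
            try:
                cb(result)
            except Exception:
                errors += 1
                logger.exception("drift callback %r failed", cb)
        return errors
```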

YAML Configuration

# drift_config.yaml
method: ADWIN
sensitivity: 0.002
min_window: 30
max_window: 10000
features:
  - age
  - income
  - session_duration

Load it from Python:

from pattern_drift import DriftMonitor
monitor = DriftMonitor.from_config("drift_config.yaml")
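Once parsed (for example with PyYAML's `yaml.safe_load`), the YAML file is an ordinary dict. A hypothetical sketch of merging it over the constructor defaults listed under API Reference and rejecting unknown keys, unsupported methods, or inconsistent window sizes follows. Whether `from_config` validates in this way is an assumption; `callbacks` is left out because callables cannot be expressed in YAML.

```python
DEFAULTS = {
    "method": "ADWIN",
    "sensitivity": 0.002,
    "min_window": 30,
    "max_window": 10_000,
    "features": None,
}
METHODS = {"ADWIN", "PageHinkley", "KSWIN", "DDM"}


def build_config(parsed):
    """Merge a parsed YAML mapping over DEFAULTS and validate it."""
    unknown = set(parsed) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = {**DEFAULTS, **parsed}              # values from the file win
    if cfg["method"] not in METHODS:
        raise ValueError(f"unsupported method: {cfg['method']!r}")
    if not 0 < cfg["min_window"] <= cfg["max_window"]:
        raise ValueError("need 0 < min_window <= max_window")
    return cfg
```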

scikit-learn Pipeline Integration

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from pattern_drift.sklearn_wrapper import DriftDetector

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("drift",  DriftDetector(method="ADWIN", sensitivity=0.002)),
])

pipe.fit(X_train)

for batch in stream:
    X_out = pipe.transform(batch)   # scaled by StandardScaler; the drift step returns its input unchanged
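A pass-through step fits into a pipeline because scikit-learn only asks pipeline steps for `fit` and `transform`. The dependency-free sketch below illustrates the pattern under stated assumptions: `fit` stores a reference mean per column, and `transform` returns its input untouched while recording in a hypothetical `last_drift_` attribute whether any column mean moved by more than `tolerance`. The mean-shift rule stands in for a real detector. In actual scikit-learn code you would normally also subclass `BaseEstimator` and `TransformerMixin`.

```python
from statistics import fmean


class MeanShiftPassThrough:
    """Pipeline-style step: fit() learns reference means, transform() is identity."""

    def __init__(self, tolerance=0.5):
        self.tolerance = tolerance

    def fit(self, X, y=None):
        columns = list(zip(*X))                       # X: sequence of numeric rows
        self.reference_means_ = [fmean(c) for c in columns]
        return self

    def transform(self, X):
        columns = list(zip(*X))
        self.last_drift_ = any(
            abs(fmean(c) - ref) > self.tolerance
            for c, ref in zip(columns, self.reference_means_)
        )
        return X                                      # data passes through unchanged
```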

Visualisation

monitor.plot_drift_timeline()           # interactive chart (requires matplotlib)
monitor.export_report("report.json")    # or "report.csv"
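`export_report` chooses JSON or CSV from the file name. A hypothetical stdlib-only sketch of that dispatch follows, assuming each drift event is a flat dict. Nested values such as a retraining window would need flattening before being written to CSV, and `export_events` is an illustrative name, not part of the library.

```python
import csv
import json
from pathlib import Path


def export_events(events, path):
    """Write a list of flat event dicts to .json or .csv based on the suffix."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        # default=str turns datetimes and other non-JSON values into strings
        path.write_text(json.dumps(events, indent=2, default=str))
    elif suffix == ".csv":
        fieldnames = sorted({key for event in events for key in event})
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(events)              # None is written as an empty cell
    else:
        raise ValueError(f"unsupported report format: {suffix!r}")
```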

Architecture

Each incoming record flows through five sequential stages:

  1. Feature Extractor — splits each row into per-column numeric signals
  2. Detector Pool — maintains one statistical detector per feature; computes drift score on every update
  3. Drift Classifier — labels drift as sudden / gradual / incremental / recurring based on signal shape
  4. Retraining Window Engine — scans history to find the last stable data window; returns a confidence-scored recommendation
  5. Alert Dispatcher — fires registered callbacks (Slack, webhook, log, email, or custom)
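Stage 4 scans the history for the last stable window. One simple way to realise that is to walk backwards from the newest record while the drift score stays below a threshold. It is offered here only as a hypothetical sketch: the threshold, the minimum window length, and the confidence formula are assumptions, not the library's method. `WindowSketch` mirrors the `start`, `end`, `n_samples`, and `confidence` fields listed for `RetrainingWindowResult`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class WindowSketch:
    start: int
    end: int              # inclusive index of the newest record
    n_samples: int
    confidence: float


def last_stable_window(scores: Sequence[float], threshold: float = 0.5,
                       min_samples: int = 30) -> Optional[WindowSketch]:
    """Longest trailing run of records whose drift score is below threshold."""
    end = len(scores) - 1
    start = len(scores)
    while start > 0 and scores[start - 1] < threshold:
        start -= 1
    n = end - start + 1
    if n < min_samples:
        return None                       # not enough stable data to retrain on
    # Hypothetical confidence: how far below the threshold the window sits on average
    mean_score = sum(scores[start:]) / n
    confidence = max(0.0, min(1.0, 1.0 - mean_score / threshold))
    return WindowSketch(start, end, n, confidence)
```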

License

MIT



Download files


Source Distribution

pattern_drift-0.1.1.tar.gz (36.5 kB)

Uploaded Source

Built Distribution


pattern_drift-0.1.1-py3-none-any.whl (21.7 kB)

Uploaded Python 3

File details

Details for the file pattern_drift-0.1.1.tar.gz.

File metadata

  • Download URL: pattern_drift-0.1.1.tar.gz
  • Upload date:
  • Size: 36.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for pattern_drift-0.1.1.tar.gz
Algorithm Hash digest
SHA256 890383b71e4acac1811aaae6dcfcc81cbe1aaafa7f40b029fadf3bb58acd3adf
MD5 62c67302f7842f026b6f2052fd413dcd
BLAKE2b-256 c0d30cc78e5ab4be7d041beb1b040625b666088565e10f78d2545d38cbef72ac


File details

Details for the file pattern_drift-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pattern_drift-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 21.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for pattern_drift-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 252e094f6251321cdc824b976e4155772899693e7fe406a9b0df5cbe4a000b58
MD5 fdaea81a3f6917e5843afb785b5ab645
BLAKE2b-256 f44325e41ffae414149dc8ba7ce61419d83ee2cb8a564cb0775a3b16157af067

