Automatic concept drift detection for streaming datasets
pattern-drift
pattern-drift is a Python library for data scientists and ML engineers working with time-sensitive models. It continuously monitors incoming data distributions, detects statistical drift, and recommends retraining windows, so models can stay accurate without manual monitoring.
Installation
pip install pattern-drift
Optional extras:
pip install "pattern-drift[viz]"     # adds matplotlib for drift timeline visualisation
pip install "pattern-drift[alerts]"  # adds requests for Slack and webhook callbacks
pip install "pattern-drift[all]"     # everything (quotes keep shells such as zsh from expanding the brackets)
Quick Start
from pattern_drift import DriftMonitor

monitor = DriftMonitor(method="ADWIN", sensitivity=0.002)

# `stream` is a placeholder for any iterable of records
for record in stream:  # dict, pandas Series, or single-row DataFrame
    result = monitor.update(record)
    if result.drift_detected:
        print(f"Drift! type={result.drift_type}")
        print(f"Features: {result.drifted_features}")
        print(f"Score: {result.drift_score:.4f}")
    if result.retraining_window:
        rw = result.retraining_window
        print(f"Retrain on records {rw.start}–{rw.end} (confidence {rw.confidence:.2%})")
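The Quick Start loop assumes a `stream` iterable already exists. For experimentation, a synthetic one could be sketched as follows. The function name `synthetic_stream`, its parameters, and the numeric values are illustrative assumptions and not part of pattern-drift; the feature names are borrowed from the YAML example further down.

```python
import random
from typing import Dict, Iterator


def synthetic_stream(n_records: int = 2000, shift_at: int = 1000,
                     seed: int = 0) -> Iterator[Dict[str, float]]:
    """Yield dict records whose 'income' mean jumps at index `shift_at`."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    for i in range(n_records):
        income_mean = 50_000.0 if i < shift_at else 65_000.0
        yield {
            "age": rng.gauss(40.0, 10.0),               # stationary feature
            "income": rng.gauss(income_mean, 5_000.0),  # abrupt mean shift
        }


# Hypothetical stand-in for a real data source.
stream = synthetic_stream()
```

Because only `income` changes, a per-feature detector would be expected to flag that column and leave `age` alone.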
Detection Algorithms
| Algorithm | Mechanism | Best For |
|---|---|---|
| ADWIN (default) | Variable-length window split testing on mean differences | Gradual drift; adapts window size dynamically |
| PageHinkley | Cumulative sum of deviations from the running mean | Sudden drift; extremely fast and memory-efficient |
| KSWIN | Kolmogorov-Smirnov test comparing recent vs. reference window | Distribution shape changes beyond just mean shifts |
| DDM | Monitors prediction error rate vs. historical minimum | Classifier performance monitoring post-deployment |
Switch algorithms with a single parameter — no other code changes required:
monitor = DriftMonitor(method="PageHinkley")
monitor = DriftMonitor(method="KSWIN")
monitor = DriftMonitor(method="DDM")
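To make the "cumulative sum of deviations from the running mean" mechanism concrete, here is a minimal sketch of the textbook one-sided Page-Hinkley test. It is not pattern-drift's implementation: the class name, the `delta`, `threshold`, and `min_samples` parameters, and their defaults are assumptions, and how the library's `sensitivity` maps onto them is not documented here.

```python
class PageHinkleySketch:
    """Textbook one-sided Page-Hinkley test for upward mean shifts."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0,
                 min_samples: int = 30) -> None:
        self.delta = delta              # tolerated drift magnitude per step
        self.threshold = threshold      # alarm level (often written lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0         # running mean of all values seen
        self.cum_sum = 0.0      # cumulative deviation from the running mean
        self.min_cum_sum = 0.0  # smallest cumulative deviation so far

    def update(self, x: float) -> bool:
        """Consume one value; return True when an upward shift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum_sum += x - self.mean - self.delta
        self.min_cum_sum = min(self.min_cum_sum, self.cum_sum)
        if self.n < self.min_samples:
            return False
        return (self.cum_sum - self.min_cum_sum) > self.threshold
```

The detector stores only four numbers regardless of stream length, which is why this family of tests is considered memory-efficient. Detecting downward shifts would need a mirrored statistic that tracks the maximum instead of the minimum.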
API Reference
DriftMonitor
DriftMonitor(
    method="ADWIN",      # Detection algorithm
    sensitivity=0.002,   # Drift threshold (lower = more sensitive)
    min_window=30,       # Minimum history before drift can be reported
    max_window=10_000,   # Maximum records retained in memory
    features=None,       # List of columns to monitor (None = auto-detect all numeric)
    callbacks=None,      # List of callables fired on drift
)
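One plausible reading of `min_window` and `max_window` is a bounded history buffer that refuses to report until enough records have arrived. The sketch below illustrates that reading only; `BoundedHistory` is a hypothetical class and the library's internal bookkeeping may differ.

```python
from collections import deque


class BoundedHistory:
    """Keeps at most `max_window` values; 'ready' once `min_window` are held."""

    def __init__(self, min_window: int = 30, max_window: int = 10_000) -> None:
        if not 0 < min_window <= max_window:
            raise ValueError("require 0 < min_window <= max_window")
        self.min_window = min_window
        # A deque with maxlen silently discards the oldest value when full.
        self._values = deque(maxlen=max_window)

    def append(self, value: float) -> None:
        self._values.append(value)

    @property
    def ready(self) -> bool:
        """True once enough history exists for a drift decision."""
        return len(self._values) >= self.min_window

    def __len__(self) -> int:
        return len(self._values)
```

Under this reading, memory use is capped by `max_window`, and the first `min_window - 1` updates can never report drift.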
Methods
| Method | Description |
|---|---|
| monitor.update(data) | Feed a single row (dict/Series) or micro-batch (DataFrame). Returns DriftResult. |
| monitor.reset() | Reset all internal detector state and history. |
| monitor.plot_drift_timeline() | Render an interactive drift score timeline chart. |
| monitor.export_report(path) | Export full drift history to JSON or CSV. |
| monitor.set_reference(data) | Manually set the reference distribution for comparison. |
| DriftMonitor.from_config(path) | Class method: instantiate from a YAML config file. |
DriftResult Fields
| Field | Type | Description |
|---|---|---|
| drift_detected | bool | True if drift was found in any monitored feature |
| drift_type | str \| None | One of sudden, gradual, incremental, recurring |
| drifted_features | list[str] | Names of all features where drift was detected |
| drift_score | float | Maximum drift score across all features (0.0–1.0+) |
| retraining_window | RetrainingWindowResult \| None | Suggested retraining window with start, end, n_samples, confidence |
| timestamp | datetime | UTC datetime when the drift event was recorded |
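When unit-testing code that consumes drift results, it can help to have stand-in objects with the same field names as the table above. The dataclasses below are hypothetical stubs, not the library's own classes; the defaults and the `int` type for `start` and `end` are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RetrainingWindowStub:
    start: int         # assumed to be a record index
    end: int           # assumed to be a record index
    n_samples: int
    confidence: float


@dataclass
class DriftResultStub:
    drift_detected: bool = False
    drift_type: Optional[str] = None
    drifted_features: List[str] = field(default_factory=list)
    drift_score: float = 0.0
    retraining_window: Optional[RetrainingWindowStub] = None
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


def summarize(result) -> str:
    """One-line summary that relies only on the documented field names."""
    if not result.drift_detected:
        return "no drift"
    features = ", ".join(result.drifted_features)
    text = (f"{result.drift_type} drift in {features} "
            f"(score {result.drift_score:.4f})")
    window = result.retraining_window
    if window is not None:
        text += f"; retrain on records {window.start}-{window.end}"
    return text
```

Because `summarize` only reads attributes, it should work equally with these stubs and with any real result object that exposes the same fields.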
Alerts & Callbacks
from pattern_drift import DriftMonitor
from pattern_drift.dispatcher import AlertDispatcher

monitor = DriftMonitor(
    callbacks=[
        AlertDispatcher.slack_callback("https://hooks.slack.com/..."),
        AlertDispatcher.webhook_callback("https://my-service/drift"),
        AlertDispatcher.log_callback(level="warning"),
        lambda result: print(result),  # custom inline callback
    ]
)
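The inline lambda above suggests that a callback is any callable taking the drift result as its single argument. Under that assumption, a custom callback that appends each event to a JSON Lines file might look like this; `make_jsonl_callback` is a hypothetical helper, not part of pattern-drift.

```python
import json
from pathlib import Path
from typing import Callable


def make_jsonl_callback(path: str) -> Callable[[object], None]:
    """Return a callback that appends one JSON line per drift event."""
    target = Path(path)

    def on_drift(result) -> None:
        # Only fields listed in the DriftResult table are read here.
        event = {
            "drift_type": result.drift_type,
            "drifted_features": list(result.drifted_features),
            "drift_score": float(result.drift_score),
            "timestamp": result.timestamp.isoformat(),
        }
        with target.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")

    return on_drift
```

It would then be registered like any other entry, for example `DriftMonitor(callbacks=[make_jsonl_callback("drift_events.jsonl")])`.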
YAML Configuration
# drift_config.yaml
method: ADWIN
sensitivity: 0.002
min_window: 30
max_window: 10000
features:
  - age
  - income
  - session_duration

monitor = DriftMonitor.from_config("drift_config.yaml")
scikit-learn Pipeline Integration
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from pattern_drift.sklearn_wrapper import DriftDetector

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("drift", DriftDetector(method="ADWIN", sensitivity=0.002)),
])
pipe.fit(X_train)

for batch in stream:
    X_out = pipe.transform(batch)  # the drift step passes data through unchanged
Visualisation
monitor.plot_drift_timeline() # interactive chart (requires matplotlib)
monitor.export_report("report.json") # or "report.csv"
Architecture
Each incoming record flows through five sequential stages:
- Feature Extractor: splits each row into per-column numeric signals
- Detector Pool: maintains one statistical detector per feature; computes a drift score on every update
- Drift Classifier: labels drift as sudden/gradual/incremental/recurring based on signal shape
- Retraining Window Engine: scans history to find the last stable data window; returns a confidence-scored recommendation
- Alert Dispatcher: fires registered callbacks (Slack, webhook, log, email, or custom)
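The five stages above can be illustrated with a deliberately simplified, self-contained sketch. Everything in it is an assumption made for illustration: the class and function names, the toy mean-shift detector, the "sudden vs. gradual" rule, and the retraining-window rule are stand-ins and do not reflect how pattern-drift implements these stages.

```python
from collections import deque
from statistics import fmean, pstdev
from typing import Callable, Deque, Dict, List, Optional


def extract_features(record: dict) -> Dict[str, float]:
    """Stage 1: split a row into per-column numeric signals."""
    return {
        name: float(value)
        for name, value in record.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


class MeanShiftDetector:
    """Stage 2 (toy detector): distance of the recent mean from a fixed
    reference mean, in units of the reference standard deviation."""

    def __init__(self, reference_size: int = 50, recent_size: int = 10) -> None:
        self.reference_size = reference_size
        self.reference: List[float] = []
        self.recent: Deque[float] = deque(maxlen=recent_size)

    def update(self, x: float) -> float:
        if len(self.reference) < self.reference_size:
            self.reference.append(x)  # still collecting the reference window
            return 0.0
        self.recent.append(x)
        spread = pstdev(self.reference) or 1.0  # avoid division by zero
        return abs(fmean(self.recent) - fmean(self.reference)) / spread


class ToyPipeline:
    def __init__(self, threshold: float = 3.0, sudden_within: int = 10,
                 callbacks: Optional[List[Callable[[dict], None]]] = None) -> None:
        self.threshold = threshold
        self.sudden_within = sudden_within
        self.callbacks = callbacks or []
        self.detectors: Dict[str, MeanShiftDetector] = {}
        self.warning_steps: Dict[str, int] = {}
        self.index = 0         # position of the next record
        self.stable_start = 0  # first record of the current stable stretch

    def update(self, record: dict) -> Optional[dict]:
        drift_types: Dict[str, str] = {}
        for name, value in extract_features(record).items():
            if name not in self.detectors:
                self.detectors[name] = MeanShiftDetector()
            score = self.detectors[name].update(value)
            # Count consecutive updates spent above half the threshold.
            above_warning = score > self.threshold / 2
            steps = (self.warning_steps.get(name, 0) + 1) if above_warning else 0
            self.warning_steps[name] = steps
            if score > self.threshold:
                # Stage 3 (toy rule): a fast climb is "sudden", a slow one "gradual".
                fast = steps <= self.sudden_within
                drift_types[name] = "sudden" if fast else "gradual"
        position = self.index
        self.index += 1
        if not drift_types:
            return None
        event = {
            "drift_types": drift_types,
            # Stage 4 (toy rule): all records since the last drift event,
            # up to the record just before this one.
            "retraining_window": (self.stable_start, position - 1),
        }
        # Start over so the next reference window is built from post-drift data.
        self.detectors.clear()
        self.warning_steps.clear()
        self.stable_start = position
        for callback in self.callbacks:  # Stage 5: dispatch alerts
            callback(event)
        return event
```

Keeping one detector per feature means a shift in a single column is attributed to that column, at the cost of missing drift that only shows up in the joint distribution of several columns.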
License
MIT
Download files
File details
Details for the file pattern_drift-0.1.0.tar.gz.
File metadata
- Download URL: pattern_drift-0.1.0.tar.gz
- Upload date:
- Size: 36.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 344a74b5de2898845239907af790824385dd9102982c14d23bc4946035077c7b |
| MD5 | 2c21e1053a18ec1f6fbd4fd921007452 |
| BLAKE2b-256 | 230004ef41a196a33181f6ffc9aba423712b4068433dad733f3b78df3162c3ef |
File details
Details for the file pattern_drift-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pattern_drift-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36644dfd36097e3cd263c38791e39e16f9c7e122f0d7717f8817c849eca6c446 |
| MD5 | fc7fceeb2daf89eb2a56d1d03913d52b |
| BLAKE2b-256 | 91d85359ade8e6e809f329de0d84179c7b5342eb764bdd4936bf46b74d07218e |