Data drift detection toolkit for ML pipelines — PSI, KS, KL divergence, chi-square, and more.
DriftGuardAI
Data drift detection toolkit for ML pipelines. Monitor feature-level distribution shifts using PSI, KS test, KL divergence, chi-square test, and more.
DriftGuardAI helps ML engineers and data scientists detect when production data drifts away from training data — a leading indicator of model degradation.
Installation
pip install datadriftguard
Optional extras:
# Include the FastAPI server (quotes keep shells such as zsh from expanding the brackets)
pip install "datadriftguard[api]"
# Include the Streamlit dashboard
pip install "datadriftguard[dashboard]"
# Install everything for development
pip install "datadriftguard[dev,api,dashboard]"
Quick Start
import pandas as pd
from driftguardai import DriftDetector, ThresholdSettings
# Load your baseline (training) and incoming (production) data
baseline = pd.read_csv("baseline.csv")
incoming = pd.read_csv("incoming.csv")
# Create a detector with default or custom thresholds
detector = DriftDetector(
    baseline_dataset=baseline,
    incoming_dataset=incoming,
    thresholds=ThresholdSettings(psi=0.20, ks_significance_level=0.05),
)
# Generate a drift report
report = detector.generate_report(dataset_name="production_model_v2")
# Inspect results
print(f"Total features: {report.total_features}")
print(f"Drifted features: {len(report.drifted_features)}")
for feature in report.drifted_features:
    print(f" ⚠ {feature.feature_name} ({feature.feature_type}): drift detected")
    if feature.metrics.psi:
        print(f" PSI: {feature.metrics.psi.value:.4f} (threshold: {feature.metrics.psi.threshold})")
    if feature.metrics.ks:
        print(f" KS: {feature.metrics.ks.value:.4f} (p={feature.metrics.ks.p_value:.4f})")
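To try the Quick Start without real data, the two CSV files it reads can be generated synthetically. The following is a minimal sketch using only the standard library. The function name `write_demo_csvs`, the column names `age` and `plan`, and the size of the shift are all hypothetical choices for illustration, not part of DriftGuardAI.

```python
import csv
import random
from pathlib import Path


def write_demo_csvs(directory, rows=500, seed=0):
    """Write baseline.csv and incoming.csv with deliberately shifted columns."""
    rng = random.Random(seed)
    directory = Path(directory)
    paths = {}
    # (mean age, probability of the "pro" plan) for each file; the incoming
    # values are shifted on purpose so a drift check has something to find.
    specs = {"baseline": (40.0, 0.2), "incoming": (48.0, 0.5)}
    for name, (mean_age, pro_share) in specs.items():
        path = directory / f"{name}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["age", "plan"])
            for _ in range(rows):
                age = round(rng.gauss(mean_age, 8.0), 1)
                plan = "pro" if rng.random() < pro_share else "free"
                writer.writerow([age, plan])
        paths[name] = path
    return paths
```

Calling `write_demo_csvs(".")` places both files in the working directory, where the `pd.read_csv` calls above expect them.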
Drift Metrics
| Metric | Type | Description |
|---|---|---|
| PSI (Population Stability Index) | Numerical | Measures distribution shift between baseline and incoming data |
| KS Test (Kolmogorov-Smirnov) | Numerical | Non-parametric test for distribution equality |
| KL Divergence (Kullback-Leibler) | Numerical | Information-theoretic measure of distribution difference |
| Chi-Square Test | Categorical | Tests whether category frequencies differ between baseline and incoming data |
| Distribution Difference | Categorical | Total variation distance between category frequencies |
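For intuition about two of the metrics in the table, here is a standard-library sketch of PSI over quantile bins and of total variation distance between category frequencies. It illustrates the textbook formulas only. The helper names, the quantile-edge scheme, and the `eps` floor for empty bins are assumptions, and DriftGuardAI's own implementation may differ.

```python
import math
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, bins=10):
    """Interior bin edges taken at evenly spaced quantiles of the baseline."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // bins] for i in range(1, bins)]


def psi(baseline, incoming, bins=10, eps=1e-6):
    """Population Stability Index over quantile bins of the baseline."""
    edges = quantile_edges(baseline, bins)

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps (an assumed floor) keeps empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(baseline), shares(incoming)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def total_variation(baseline, incoming):
    """Half the L1 distance between two category frequency tables."""
    p, q = Counter(baseline), Counter(incoming)
    return 0.5 * sum(
        abs(p[k] / len(baseline) - q[k] / len(incoming)) for k in set(p) | set(q)
    )
```

Identical inputs give a value of zero for both functions, and larger values indicate a larger shift. The resulting numbers are what thresholds such as `psi=0.20` are compared against.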
Alerting
DriftGuardAI includes an alert system that can dispatch drift notifications via logging, webhooks, or Slack:
from driftguardai import AlertManager, DriftDetector
from driftguardai.core.config import AlertSettings
# baseline and incoming are the DataFrames loaded in Quick Start
detector = DriftDetector(baseline_dataset=baseline, incoming_dataset=incoming)
report = detector.generate_report(dataset_name="production")
alert_manager = AlertManager(
    settings=AlertSettings(
        enabled=True,
        log_alerts=True,
        slack_webhook_url="https://hooks.slack.com/services/...",
    )
)
dispatch_report = alert_manager.dispatch(report)
print(f"Dispatched {dispatch_report.total_alerts} alerts")
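For readers curious about what a Slack notification involves at the HTTP level, the sketch below builds a POST request with a JSON `text` field, which is the basic payload Slack incoming webhooks accept. It is not DriftGuardAI's internal code. The function name `build_slack_request` and the message wording are hypothetical, and the request is constructed but never sent.

```python
import json
import urllib.request


def build_slack_request(webhook_url, dataset_name, drifted_features):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    lines = [f"Drift detected in {dataset_name}:"]
    lines += [f"- {name}" for name in drifted_features]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would deliver the message. In normal use `AlertManager.dispatch` handles this step.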
Retraining Triggers
Automatically evaluate whether model retraining should be triggered based on drift severity:
from driftguardai import RetrainingManager
from driftguardai.core.config import RetrainingSettings
manager = RetrainingManager(
    settings=RetrainingSettings(
        enabled=True,
        trigger_severity="critical",
        min_alert_count=2,
    )
)
# report is the drift report produced by detector.generate_report(...)
result = manager.evaluate(report)
if result.triggered:
    print(f"Retraining triggered: {result.reason}")
    print(f"Affected features: {result.affected_features}")
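One plausible reading of `trigger_severity` and `min_alert_count` is that retraining fires once enough alerts reach the configured severity. The sketch below encodes that reading. The rule itself, the `Alert` class, the `should_retrain` function, and the `info` severity level are assumptions made for illustration; the README does not spell out the exact logic.

```python
from dataclasses import dataclass

# Assumed ordering of severities, lowest to highest ("info" is hypothetical).
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Alert:
    feature_name: str
    severity: str


def should_retrain(alerts, trigger_severity="critical", min_alert_count=1):
    """Return (triggered, affected_features) under the assumed rule."""
    floor = SEVERITY_ORDER[trigger_severity]
    # Keep only alerts at or above the configured severity.
    affected = [a.feature_name for a in alerts if SEVERITY_ORDER[a.severity] >= floor]
    return len(affected) >= min_alert_count, affected
```

Under this rule, a single critical alert would not trigger retraining when `min_alert_count=2`, which matches the intent of requiring more than one signal before acting.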
Configuration
DriftGuardAI can be configured programmatically or via a config.yaml file:
thresholds:
  psi: 0.20
  ks_significance_level: 0.05
  kl_divergence: 0.10
  categorical_distance: 0.10
  categorical_chi_square_significance_level: 0.05
  histogram_bins: 10
  histogram_strategy: quantile
alerts:
  enabled: true
  log_alerts: true
  minimum_severity: warning
  slack_webhook_url: https://hooks.slack.com/services/...
retraining:
  enabled: true
  trigger_severity: critical
  min_alert_count: 1
Place config.yaml in your working directory, or set the DRIFT_GUARD_CONFIG_PATH environment variable:
export DRIFT_GUARD_CONFIG_PATH=/path/to/your/config.yaml
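The lookup described above can be sketched as a small helper that checks the environment variable first and then falls back to `config.yaml` in the working directory. The precedence order and the function name `resolve_config_path` are assumptions; the library may resolve the path differently.

```python
import os
from pathlib import Path


def resolve_config_path(environ=None, cwd=None):
    """Return the config file to use, or None when no file is found."""
    environ = os.environ if environ is None else environ
    # Assumed precedence: the environment variable wins over the local file.
    override = environ.get("DRIFT_GUARD_CONFIG_PATH")
    if override:
        return Path(override)
    candidate = Path(cwd or Path.cwd()) / "config.yaml"
    return candidate if candidate.exists() else None
```

A helper like this is handy in deployment scripts for logging which configuration file a job is about to pick up.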
Optional: API Server
Run a FastAPI server for drift detection over HTTP:
pip install "datadriftguard[api]"
uvicorn driftguardai.api.app:app --reload
Endpoints:
- GET /api/v1/health: Health check
- POST /api/v1/drift/analyze: Analyze drift from file paths
- POST /api/v1/drift/analyze/files: Analyze drift from uploaded CSVs
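Once the server is running, the health endpoint can be probed from Python with the standard library. This sketch assumes the server listens on uvicorn's default address, `http://127.0.0.1:8000`. The helper names are hypothetical, and because the response body is not documented here, the code falls back to raw text when the body is not JSON.

```python
import json
import urllib.request

# Assumption: uvicorn is running locally on its default port, 8000.
BASE_URL = "http://127.0.0.1:8000"


def endpoint(path, base_url=BASE_URL):
    """Join the server base URL and an endpoint path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def check_health(base_url=BASE_URL, timeout=5.0):
    """GET /api/v1/health and return (status code, JSON body or raw text)."""
    url = endpoint("/api/v1/health", base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read().decode("utf-8")
        try:
            return resp.status, json.loads(raw)
        except json.JSONDecodeError:
            return resp.status, raw
```

With the server up, `check_health()` returns the HTTP status and the body. The request schemas for the two analyze endpoints are not described in this README, so they are left out of the sketch.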
Optional: Streamlit Dashboard
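Visualize drift metrics with an interactive dashboard. The script path below is relative to the repository root, so run the command from a clone of the source:
pip install "datadriftguard[dashboard]"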
streamlit run src/driftguardai/dashboard/app.py
Architecture
driftguardai/
├── core/ # Domain models, config, exceptions, interfaces, use cases
├── drift/ # Drift detection implementations and statistical metrics
├── data/ # Data ingestion and repository adapters
├── utils/ # Logging and dataset validation utilities
├── api/ # Optional FastAPI HTTP layer
└── dashboard/ # Optional Streamlit visualization
Development
git clone https://github.com/suryanandanbabbar/DriftGuardAI.git
cd DriftGuardAI
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,api,dashboard]"
pytest
License
MIT — see LICENSE for details.