DriftGuardAI

Data drift detection toolkit for ML pipelines. Monitor feature-level distribution shifts using PSI, KS test, KL divergence, chi-square test, and more.

DriftGuardAI helps ML engineers and data scientists detect when production data drifts away from training data — a leading indicator of model degradation.

Installation

pip install datadriftguard

The distribution is published on PyPI as datadriftguard; the package you import is driftguardai.

Optional extras:

# Include the FastAPI server (quotes keep shells such as zsh from expanding the brackets)
pip install "datadriftguard[api]"

# Include the Streamlit dashboard
pip install "datadriftguard[dashboard]"

# Install everything for development
pip install "datadriftguard[dev,api,dashboard]"

Quick Start

import pandas as pd
from driftguardai import DriftDetector, ThresholdSettings

# Load your baseline (training) and incoming (production) data
baseline = pd.read_csv("baseline.csv")
incoming = pd.read_csv("incoming.csv")

# Create a detector with default or custom thresholds
detector = DriftDetector(
    baseline_dataset=baseline,
    incoming_dataset=incoming,
    thresholds=ThresholdSettings(psi=0.20, ks_significance_level=0.05),
)

# Generate a drift report
report = detector.generate_report(dataset_name="production_model_v2")

# Inspect results
print(f"Total features: {report.total_features}")
print(f"Drifted features: {len(report.drifted_features)}")

for feature in report.drifted_features:
    print(f"  ⚠ {feature.feature_name} ({feature.feature_type}): drift detected")
    if feature.metrics.psi is not None:
        print(f"    PSI: {feature.metrics.psi.value:.4f} (threshold: {feature.metrics.psi.threshold})")
    if feature.metrics.ks is not None:
        print(f"    KS:  {feature.metrics.ks.value:.4f} (p={feature.metrics.ks.p_value:.4f})")
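
To try the Quick Start without real data, the two CSV files can be generated synthetically. The following is a minimal sketch using only the standard library. The column names (age, plan), the row count, and the size of the shift are illustrative assumptions and are not part of DriftGuardAI.

```python
import csv
import random


def write_sample(path, n_rows, age_mean, premium_share, seed):
    """Write a small synthetic dataset with one numerical and one categorical column."""
    rng = random.Random(seed)  # seeded so the files are reproducible
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["age", "plan"])  # hypothetical feature names
        for _ in range(n_rows):
            age = round(rng.gauss(age_mean, 8.0), 1)
            plan = "premium" if rng.random() < premium_share else "basic"
            writer.writerow([age, plan])


# Baseline: mean age 35, about 20% premium. Incoming: both features shifted.
write_sample("baseline.csv", n_rows=1000, age_mean=35.0, premium_share=0.2, seed=1)
write_sample("incoming.csv", n_rows=1000, age_mean=45.0, premium_share=0.5, seed=2)
```

With these files in the working directory, the Quick Start code above should have one numerical and one categorical feature to evaluate.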

Drift Metrics

Metric                            Type         Description
PSI (Population Stability Index)  Numerical    Measures distribution shift between baseline and incoming data
KS Test (Kolmogorov-Smirnov)      Numerical    Non-parametric test for distribution equality
KL Divergence (Kullback-Leibler)  Numerical    Information-theoretic measure of distribution difference
Chi-Square Test                   Categorical  Tests whether category frequencies differ between baseline and incoming data
Distribution Difference           Categorical  Total variation distance between category frequencies
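
For intuition about what two of these metrics measure, here is a minimal standard-library sketch of PSI over quantile bins and of total variation distance over category frequencies. It uses the textbook formulas and is not DriftGuardAI's implementation. The function names, the binning details, and the eps floor for empty bins are assumptions made for illustration, and the library's own binning and smoothing may differ.

```python
import math
from collections import Counter


def quantile_edges(values, bins):
    """Interior bin edges taken at evenly spaced quantiles of the baseline."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // bins] for i in range(1, bins)]


def bin_shares(values, edges, eps=1e-6):
    """Share of values per bin, floored at eps so the logarithm stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(baseline, incoming, bins=10):
    """PSI = sum over bins of (q - p) * ln(q / p), with bins fixed by the baseline."""
    edges = quantile_edges(baseline, bins)
    p = bin_shares(baseline, edges)
    q = bin_shares(incoming, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def total_variation(baseline, incoming):
    """Half the sum of absolute differences between category frequencies."""
    p, q = Counter(baseline), Counter(incoming)
    categories = set(p) | set(q)
    return 0.5 * sum(
        abs(p[c] / len(baseline) - q[c] / len(incoming)) for c in categories
    )


numeric_baseline = [float(x) for x in range(100)]
numeric_incoming = [x + 50.0 for x in numeric_baseline]  # shifted copy
print(f"PSI: {psi(numeric_baseline, numeric_incoming):.3f}")
print(f"TVD: {total_variation(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b']):.3f}")
```

Both values are zero when the two samples have identical distributions and grow as they diverge, which is why each can be compared against a fixed threshold.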

Alerting

DriftGuardAI includes an alert system that can dispatch drift notifications via logging, webhooks, or Slack:

from driftguardai import AlertManager, DriftDetector
from driftguardai.core.config import AlertSettings

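# baseline and incoming are the DataFrames loaded in the Quick Start
detector = DriftDetector(baseline_dataset=baseline, incoming_dataset=incoming)
report = detector.generate_report(dataset_name="production")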

alert_manager = AlertManager(
    settings=AlertSettings(
        enabled=True,
        log_alerts=True,
        slack_webhook_url="https://hooks.slack.com/services/...",
    )
)
dispatch_report = alert_manager.dispatch(report)
print(f"Dispatched {dispatch_report.total_alerts} alerts")
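
To show what a Slack notification involves at the HTTP level, the sketch below builds a request for a Slack incoming webhook, which accepts a JSON body with a text field. It is not DriftGuardAI's dispatch code. The function name, the message wording, and the feature names are hypothetical, and the webhook URL is the same placeholder as above.

```python
import json
import urllib.request


def build_slack_request(webhook_url, dataset_name, drifted_features):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    lines = [f"Drift detected in {dataset_name}: {len(drifted_features)} feature(s)"]
    lines += [f"- {name}" for name in drifted_features]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_slack_request(
    "https://hooks.slack.com/services/...",  # placeholder, not a working URL
    dataset_name="production",
    drifted_features=["age", "plan"],  # hypothetical feature names
)
# urllib.request.urlopen(request, timeout=10) would send it; omitted here on purpose.
print(request.data.decode("utf-8"))
```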

Retraining Triggers

Automatically evaluate whether model retraining should be triggered based on drift severity:

from driftguardai import RetrainingManager
from driftguardai.core.config import RetrainingSettings

manager = RetrainingManager(
    settings=RetrainingSettings(
        enabled=True,
        trigger_severity="critical",
        min_alert_count=2,
    )
)
# report is the drift report returned by detector.generate_report(...)
result = manager.evaluate(report)
if result.triggered:
    print(f"Retraining triggered: {result.reason}")
    print(f"Affected features: {result.affected_features}")
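
One plausible reading of trigger_severity and min_alert_count is that retraining fires once enough alerts reach the configured severity. The sketch below illustrates that reading only; it is not the library's actual rule. The should_retrain function and the severity ordering, including the info level, are assumptions.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}  # assumed ordering


def should_retrain(alert_severities, trigger_severity="critical", min_alert_count=2):
    """Return True when enough alerts reach the trigger severity."""
    floor = SEVERITY_RANK[trigger_severity]
    qualifying = [s for s in alert_severities if SEVERITY_RANK[s] >= floor]
    return len(qualifying) >= min_alert_count


# Two critical alerts meet a threshold of two; one critical alert does not.
print(should_retrain(["critical", "warning", "critical"]))
print(should_retrain(["critical", "warning"]))
```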

Configuration

DriftGuardAI can be configured programmatically or via a config.yaml file:

thresholds:
  psi: 0.20
  ks_significance_level: 0.05
  kl_divergence: 0.10
  categorical_distance: 0.10
  categorical_chi_square_significance_level: 0.05
  histogram_bins: 10
  histogram_strategy: quantile

alerts:
  enabled: true
  log_alerts: true
  minimum_severity: warning
  slack_webhook_url: https://hooks.slack.com/services/...

retraining:
  enabled: true
  trigger_severity: critical
  min_alert_count: 1

Place config.yaml in your working directory, or set the DRIFT_GUARD_CONFIG_PATH environment variable:

export DRIFT_GUARD_CONFIG_PATH=/path/to/your/config.yaml
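
The lookup described above can be sketched as follows. This is an illustration, not the library's loader: the resolve_config_path helper is hypothetical, and giving the environment variable precedence over the working directory is an assumption.

```python
import os
from pathlib import Path


def resolve_config_path(env=None, cwd=None):
    """Return the env-var path if set, otherwise config.yaml in the working directory."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = env.get("DRIFT_GUARD_CONFIG_PATH")
    return Path(override) if override else cwd / "config.yaml"


print(resolve_config_path())
```

Passing env and cwd explicitly keeps the helper easy to test without touching the real environment.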

Optional: API Server

Run a FastAPI server for drift detection over HTTP:

pip install "datadriftguard[api]"
uvicorn driftguardai.api.app:app --reload

Endpoints:

  • GET /api/v1/health — Health check
  • POST /api/v1/drift/analyze — Analyze drift from file paths
  • POST /api/v1/drift/analyze/files — Analyze drift from uploaded CSVs
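
A quick way to confirm the server is running is to call the health endpoint. The sketch below uses only the standard library and assumes the server listens on uvicorn's default address, 127.0.0.1:8000. The helper names are hypothetical, and because the response format is not documented here, the body is returned as raw text.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def endpoint(path, base_url=BASE_URL):
    """Join the base URL and an API path without doubling the slash."""
    return base_url.rstrip("/") + path


def check_health(base_url=BASE_URL, timeout=5.0):
    """Call the health endpoint and return (status_code, body_text)."""
    url = endpoint("/api/v1/health", base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


print(endpoint("/api/v1/health"))
# check_health() needs the server from the uvicorn command above to be running.
```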

Optional: Streamlit Dashboard

Visualize drift metrics with an interactive dashboard:

# The script path below is relative to a repository checkout (see Development)
pip install "datadriftguard[dashboard]"
streamlit run src/driftguardai/dashboard/app.py

Architecture

driftguardai/
├── core/          # Domain models, config, exceptions, interfaces, use cases
├── drift/         # Drift detection implementations and statistical metrics
├── data/          # Data ingestion and repository adapters
├── utils/         # Logging and dataset validation utilities
├── api/           # Optional FastAPI HTTP layer
└── dashboard/     # Optional Streamlit visualization

Development

git clone https://github.com/suryanandanbabbar/DriftGuardAI.git
cd DriftGuardAI
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,api,dashboard]"
pytest

License

MIT — see LICENSE for details.
