drift-watchdog 🐕

Lightweight ML model drift detection — CLI, Prometheus metrics, and alerts. No platform required.

License: Apache 2.0 | Python 3.9+


The problem

Your model was accurate last month. Now it's quietly wrong — and you don't know why.

Input distributions shift, upstream data pipelines change schema, and feature encodings drift. Most teams either have no drift detection at all or rely on heavyweight ML platforms that take weeks to set up. drift-watchdog fills that gap: a single CLI tool or Python sidecar that monitors your model's input and output distributions and fires alerts when they drift away from a reference baseline.


Features

  • Statistical drift detection — PSI, KS-test, Jensen-Shannon divergence, and Wasserstein distance out of the box
  • Concept drift detection — monitor model outputs and label distributions for performance degradation
  • HTML report export — generate beautiful, shareable HTML reports for drift analysis
  • Prometheus exporter — exposes /metrics endpoint, plug straight into your existing Grafana stack
  • CLI first — run ad-hoc drift checks in CI/CD or cron without writing any code
  • Alert integrations — Slack, PagerDuty, and webhook support
  • Framework agnostic — works with scikit-learn, XGBoost, PyTorch, TensorFlow, or any model that takes tabular input
  • Reference baseline management — store, version, and compare against baselines in local files, S3, or GCS
  • Lightweight — no database, no server, no orchestrator required

Quickstart

pip install drift-watchdog

1. Capture a reference baseline

drift-watchdog baseline create \
  --data reference_data.csv \
  --output baselines/v1.json \
  --name "production-v1"
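
The baseline file format is not documented here. As a rough mental model only, a baseline can be thought of as per-feature bin edges plus the share of reference rows that fall in each bin. The sketch below builds such a summary from CSV text using only the standard library. The helpers `summarize_feature` and `build_baseline` and the JSON layout are hypothetical and are not drift-watchdog's actual format.

```python
import csv
import io
import json


def summarize_feature(values, n_bins=4):
    """Return equal-width bin edges and the fraction of values in each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant columns
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return {"edges": edges, "proportions": [c / total for c in counts]}


def build_baseline(csv_text, name, exclude=()):
    """Summarize every non-excluded (numeric) column of a CSV into a dict."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    features = {}
    for column in rows[0]:
        if column in exclude:
            continue
        values = [float(row[column]) for row in rows]
        features[column] = summarize_feature(values)
    return {"name": name, "features": features}


# Tiny made-up reference set, standing in for reference_data.csv.
reference_csv = "id,age,income\n1,20,100\n2,30,200\n3,40,300\n4,50,400\n"
baseline = build_baseline(reference_csv, "production-v1", exclude=("id",))
print(json.dumps(baseline, indent=2))
```

Storing proportions rather than raw rows keeps the baseline small and avoids shipping reference data alongside the model.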

2. Run a drift check

drift-watchdog check \
  --baseline baselines/v1.json \
  --current current_batch.csv \
  --threshold 0.2

Example output:

✓ feature: age           PSI=0.04  [OK]
✓ feature: income        PSI=0.09  [OK]
⚠ feature: loan_amount   PSI=0.31  [DRIFT DETECTED]
✗ feature: credit_score  PSI=0.58  [SEVERE DRIFT]

Overall drift score: 0.43 — ALERT
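
When a check runs in CI or cron, it can help to turn the per-feature lines into structured records. The sketch below parses output shaped like the sample above. The regular expression and the `parse_check_output` helper are illustrative assumptions, and the real output format may differ between versions.

```python
import re

# Matches lines such as "⚠ feature: loan_amount   PSI=0.31  [DRIFT DETECTED]".
LINE_RE = re.compile(
    r"feature:\s+(?P<name>\S+)\s+PSI=(?P<psi>[0-9.]+)\s+\[(?P<status>[^\]]+)\]"
)


def parse_check_output(text):
    """Return a list of (feature, psi, status) tuples found in the CLI output."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            records.append(
                (match["name"], float(match["psi"]), match["status"])
            )
    return records


sample = """\
✓ feature: age           PSI=0.04  [OK]
✓ feature: income        PSI=0.09  [OK]
⚠ feature: loan_amount   PSI=0.31  [DRIFT DETECTED]
✗ feature: credit_score  PSI=0.58  [SEVERE DRIFT]
"""

records = parse_check_output(sample)
# Keep every feature whose status is anything other than OK.
drifting = [name for name, _psi, status in records if status != "OK"]
print(drifting)
```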

3. Run as a Prometheus exporter

drift-watchdog serve \
  --baseline baselines/v1.json \
  --data-source s3://my-bucket/inference-logs/ \
  --port 9090 \
  --interval 300

Metrics are now available at http://localhost:9090/metrics.

4. Generate HTML report

drift-watchdog check \
  --baseline baselines/v1.json \
  --current current_batch.csv \
  --threshold 0.2 \
  --report drift_report.html

This writes a shareable HTML report with the detailed drift analysis to drift_report.html.

5. Concept drift detection

Monitor model outputs and label distributions for performance degradation:

drift-watchdog concept-check \
  --baseline-predictions baseline_preds.csv \
  --baseline-labels baseline_labels.csv \
  --current-predictions current_preds.csv \
  --current-labels current_labels.csv \
  --threshold 0.2 \
  --report concept_drift_report.html
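
One common way to watch for this kind of shift is to compare the class frequencies of predictions (or labels) between the baseline and current windows. The sketch below uses Jensen-Shannon divergence for that comparison. It illustrates the idea only and does not describe how `concept-check` computes its score; the helper names and sample data are hypothetical.

```python
import math
from collections import Counter


def class_distribution(values, classes):
    """Relative frequency of each class, in a fixed class order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), bounded in [0, 1]."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # m is the midpoint distribution; m_i > 0 whenever p_i or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up prediction windows for illustration.
baseline_preds = ["approve"] * 70 + ["reject"] * 30
current_preds = ["approve"] * 40 + ["reject"] * 60
classes = sorted(set(baseline_preds) | set(current_preds))

p = class_distribution(baseline_preds, classes)
q = class_distribution(current_preds, classes)
score = js_divergence(p, q)
print(f"JS divergence: {score:.3f}")
```

Jensen-Shannon divergence is symmetric and stays finite when a class is missing from one window, which makes it a convenient choice for categorical outputs.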

Python API

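import pandas as pd  # assumes check() accepts a pandas DataFrame

from drift_watchdog import DriftDetector, BaselineStore

store = BaselineStore("baselines/v1.json")
detector = DriftDetector(baseline=store.load())

# Load the batch to compare against the baseline
current_df = pd.read_csv("current_batch.csv")
result = detector.check(current_df)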

for feature, report in result.features.items():
    print(f"{feature}: PSI={report.psi:.3f}, drift={report.is_drift}")

if result.overall_drift:
    result.alert()  # fires configured alert channels

Configuration

Create a watchdog.yaml in your project root:

baseline:
  path: baselines/v1.json
  storage: s3                      # local | s3 | gcs
  bucket: my-model-baselines

detection:
  methods: [psi, ks_test]
  thresholds:
    psi: 0.2                       # < 0.1 stable, 0.1–0.2 monitor, > 0.2 alert
    ks_pvalue: 0.05
  features:
    exclude: [id, timestamp]       # columns to skip

alerts:
  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    channel: "#ml-alerts"
  pagerduty:
    routing_key: ${PD_ROUTING_KEY}
    severity: warning
  webhook:
    url: https://your-endpoint.com/drift-event

exporter:
  port: 9090
  interval_seconds: 300
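
The `${SLACK_WEBHOOK_URL}` and `${PD_ROUTING_KEY}` placeholders imply that secrets come from environment variables rather than being committed to the file. How drift-watchdog resolves them is not specified here. The sketch below shows one plausible approach with the standard library's `string.Template`, failing fast on unset variables. `expand_env` is a hypothetical helper and the URL is a placeholder.

```python
import os
from string import Template


def expand_env(text, env=None):
    """Replace ${VAR} placeholders with values from the environment.

    Raises KeyError if a referenced variable is not set, so a missing
    secret is caught at startup rather than at alert time.
    """
    env = os.environ if env is None else env
    return Template(text).substitute(env)


config_text = 'webhook_url: ${SLACK_WEBHOOK_URL}\nchannel: "#ml-alerts"\n'
# A stand-in environment so the example does not depend on real secrets.
fake_env = {"SLACK_WEBHOOK_URL": "https://hooks.example.invalid/T000/B000"}
expanded = expand_env(config_text, fake_env)
print(expanded)
```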

Prometheus metrics

| Metric | Type | Description |
| --- | --- | --- |
| drift_watchdog_psi | Gauge | PSI score per feature |
| drift_watchdog_ks_statistic | Gauge | KS-test statistic per feature |
| drift_watchdog_feature_drift | Gauge | 1 if drift detected, 0 if not |
| drift_watchdog_overall_drift | Gauge | 1 if any feature is drifting |
| drift_watchdog_check_duration_seconds | Histogram | Time taken per drift check |
| drift_watchdog_last_check_timestamp | Gauge | Unix timestamp of last check |

Metrics carry model and baseline_version labels; per-feature metrics also carry a feature label.
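
For a quick check without Grafana, the /metrics output can be read directly. The sketch below parses a few lines in the Prometheus text exposition format and lists features whose PSI exceeds a threshold. The sample payload and its label values are made up for illustration, and the parser handles only the simple `name{labels} value` case (no escaped quotes, no timestamps).

```python
import re

SAMPLE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])


# Hypothetical scrape output; real label values depend on your setup.
payload = '''
# TYPE drift_watchdog_psi gauge
drift_watchdog_psi{feature="age",model="credit",baseline_version="v1"} 0.04
drift_watchdog_psi{feature="loan_amount",model="credit",baseline_version="v1"} 0.31
drift_watchdog_feature_drift{feature="loan_amount",model="credit",baseline_version="v1"} 1
'''

over_threshold = [
    labels["feature"]
    for name, labels, value in parse_samples(payload)
    if name == "drift_watchdog_psi" and value > 0.2
]
print(over_threshold)
```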


Kubernetes deployment

Run drift-watchdog as a sidecar container in your model-serving pod:

# drift-watchdog-sidecar.yaml
containers:
  - name: drift-watchdog
    image: ghcr.io/your-org/drift-watchdog:latest
    args:
      - serve
      - --baseline
      - /baselines/v1.json
      - --data-source
      - $(INFERENCE_LOG_PATH)
      - --port
      - "9090"
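    env:
      # $(INFERENCE_LOG_PATH) in args is only expanded if the variable is defined here
      - name: INFERENCE_LOG_PATH
        value: s3://my-bucket/inference-logs/   # example value from the Quickstart
      - name: SLACK_WEBHOOK_URL
        valueFrom:
          secretKeyRef:
            name: drift-watchdog-secrets
            key: slack-webhook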
    ports:
      - containerPort: 9090
        name: metrics

Add these pod annotations. If your Prometheus scrape configuration honors the prometheus.io/* annotation convention, it will pick up the sidecar automatically:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"

Grafana dashboard

Import the pre-built dashboard from dashboards/drift-watchdog.json.

It includes panels for:

  • Per-feature PSI over time
  • Drift event timeline
  • Feature distribution histograms (current vs baseline)
  • Alert history

Detection methods

| Method | Best for | Threshold guidance |
| --- | --- | --- |
| PSI (Population Stability Index) | Categorical and continuous features | < 0.1 stable, 0.1–0.2 monitor, > 0.2 alert |
| KS test | Continuous distributions | p-value < 0.05 signals drift |
| Jensen-Shannon divergence | Probability distributions | > 0.1 worth alerting |
| Wasserstein distance | Ordinal/numeric features | Domain-dependent |
| Chi-squared test | Categorical features | p-value < 0.05 |
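
For reference, PSI compares the share of observations per bin between the baseline (b_i) and the current batch (c_i): PSI = sum over bins of (c_i - b_i) * ln(c_i / b_i). The sketch below implements that textbook formula in plain Python, with a small epsilon to avoid division by zero in empty bins. The binning and smoothing choices are assumptions and may differ from drift-watchdog's implementation.

```python
import math


def bin_proportions(values, edges):
    """Share of values in each [edges[i], edges[i+1]) bin; outliers are clamped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        # Advance while v is at or beyond the next interior edge.
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(baseline_props, current_props, eps=1e-6):
    """Population Stability Index over matching bins."""
    total = 0.0
    for b, c in zip(baseline_props, current_props):
        b, c = max(b, eps), max(c, eps)  # guard against log(0) and division by zero
        total += (c - b) * math.log(c / b)
    return total


# Made-up data: the second sample is concentrated in the upper bins.
edges = [0, 25, 50, 75, 100]
baseline_values = [10, 20, 30, 40, 60, 70, 80, 90]
shifted_values = [60, 65, 70, 80, 85, 90, 95, 99]

b = bin_proportions(baseline_values, edges)
same = psi(b, b)
shifted = psi(b, bin_proportions(shifted_values, edges))
print(f"identical: {same:.3f}, shifted: {shifted:.3f}")
```

Every term in the sum is non-negative, so PSI is zero only when the two binned distributions match and grows as mass moves between bins. Empty bins dominate the score, which is why the epsilon (or a coarser binning) matters on small batches.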

Roadmap

  • v1.0 — CLI, PSI + KS detection, local/S3/GCS baselines, Slack/PagerDuty/webhook alerts, Prometheus exporter, Grafana dashboard, Kubernetes sidecar example, watchdog.yaml config
  • v1.1 — Concept drift detection (output/label distribution monitoring), HTML report export
  • v1.2 — GitHub Actions integration, CI drift gate
  • v1.3 — Multi-model support

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

git clone https://github.com/your-username/drift-watchdog
cd drift-watchdog
pip install -e ".[dev]"
pytest tests/

See CONTRIBUTING.md for guidelines.


License

Apache 2.0 — see LICENSE.
