Minimalist, zero-config data drift detection for Python

Driftveil 🌌

Minimalist, zero-config data drift detection for Python. Protect your ML models and data pipelines from silent distribution shifts in 3 lines of code.

⚡ 30-Second Quickstart

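import pandas as pd
from driftveil import DriftVeil

# Load the reference data and a new production batch.
# The file names are placeholders; substitute your own sources.
reference_df = pd.read_csv("reference.csv")
new_df = pd.read_csv("new_batch.csv")

# 1. Define your statistical contract on the reference data
pact = DriftVeil(reference_df)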

pact.column("age").is_normal(tolerance=0.1)
pact.column("income").is_lognormal()
pact.column("revenue").mean_stable(tolerance=0.15)
pact.column("category").category_freq_stable(chi2_pvalue=0.05)
pact.pair("clicks", "conversions").correlation_above(0.5)
pact.dataset.row_count_stable(tolerance=0.2)

# 2. Enforce the contract on your production data batch
report = pact.enforce(new_df, raise_on_fail=False)

# 3. Print the summary report
print(report.summary())
# ✓ age            normal distribution   (KS p=0.43)
# ✓ income         lognormal             (KS p=0.61)
# ✗ revenue        mean drift detected   (ref=4200 → 6800, +62%)
# ✓ category       PSI=0.08              (below 0.20 threshold)
# ✓ clicks/conv.   correlation=0.71      (above 0.50)
# ✓ row count      9821 rows             (within ±20%)

📊 Comparison: Why Driftveil?

Most data quality tools check only the schema (types, nulls, value ranges). Driftveil checks statistical behavior without the heavy configuration overhead.

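| Feature                | Driftveil | Great Expectations | Evidently    | Deepchecks   |
|------------------------|-----------|--------------------|--------------|--------------|
| Schema / null checks   | Yes       | Yes                | Partial      | Yes          |
| Distribution contracts | Yes       | No                 | Reports only | Reports only |
| Fluent Python API      | Yes       | YAML/JSON config   | Class-heavy  | Class-heavy  |
| Lightweight (≤3 deps)  | Yes       | Very heavy         | Heavy        | Heavy        |
| Correlation contracts  | Yes       | No                 | No           | Partial      |
| Saveable pacts (JSON)  | Yes       | Yes                | No           | No           |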

🛠️ Key Design Principles

  1. Pure functions in tests/
     Every statistical test is a standalone function (ref, new) -> TestResult. No classes, no side effects. This makes custom tests easy to add.

  2. Lazy fluent builder
     Contracts accumulate lazily: nothing is computed until .enforce() or .save() is called. Defining a pact is fast even with 50+ columns.

  3. Statistics-only serialization
     Saved pacts store reference statistics (mean, std, bin edges, quantiles, value counts), not the full reference DataFrame. A saved pact is a tiny JSON file you can commit to git and load in CI without needing the original dataset.

  4. Pandas-first, Polars via adapter
     Any supported DataFrame type (such as Polars) is automatically converted to a Pandas-compatible interface before tests run.
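
The first two principles can be illustrated with a minimal sketch. Everything in it is hypothetical: this `TestResult`, `mean_within_tolerance`, and `LazyPact` are illustrative stand-ins rather than Driftveil's actual internals, and the relative-change check is a simple placeholder for the library's statistical tests.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Sequence


@dataclass(frozen=True)
class TestResult:
    """Outcome of one check (hypothetical shape, not Driftveil's real class)."""
    name: str
    passed: bool
    detail: str


def mean_within_tolerance(ref: Sequence[float], new: Sequence[float],
                          tolerance: float = 0.15) -> TestResult:
    """Pure function: compares means by relative change, with no side effects."""
    ref_mean, new_mean = fmean(ref), fmean(new)
    change = (new_mean - ref_mean) / abs(ref_mean)  # assumes ref_mean != 0
    return TestResult(
        name="mean_stable",
        passed=abs(change) <= tolerance,
        detail=f"ref={ref_mean:.0f} -> {new_mean:.0f}, {change:+.0%}",
    )


class LazyPact:
    """Collects checks without running them until enforce() is called."""

    def __init__(self, ref: dict[str, Sequence[float]]):
        self._ref = ref
        self._checks: list[tuple[str, Callable[..., TestResult], dict]] = []

    def add(self, column: str, test: Callable[..., TestResult], **kwargs):
        self._checks.append((column, test, kwargs))  # stored, not computed
        return self  # returning self allows fluent chaining

    def enforce(self, new: dict[str, Sequence[float]]) -> list[TestResult]:
        # All computation happens here, one pure function call per check.
        return [test(self._ref[col], new[col], **kw)
                for col, test, kw in self._checks]


pact = LazyPact({"revenue": [4000, 4200, 4400]})
pact.add("revenue", mean_within_tolerance, tolerance=0.15)
results = pact.enforce({"revenue": [6600, 6800, 7000]})
print(results[0].passed, results[0].detail)
```

Because each test only takes `(ref, new)` plus keyword options, a custom check in this sketch is just another function passed to `add`.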


📖 Full API Capabilities

Column Metrics

# --- Distribution shape ---
pact.column("x").is_normal(tolerance=0.10)
pact.column("x").is_lognormal(tolerance=0.10)
pact.column("x").is_exponential(tolerance=0.15)
pact.column("x").fits_distribution("pareto")       # Fits any scipy.stats distribution

# --- Central tendency and spread ---
pact.column("x").mean_stable(tolerance=0.15)       # Welch t-test
pact.column("x").variance_stable(tolerance=0.20)   # Levene's test
pact.column("x").median_stable(tolerance=0.15)     # Mann-Whitney U test
pact.column("x").quantile_stable(                  # Checks quantiles within tolerance
    [0.25, 0.5, 0.75, 0.95], tolerance=0.10
)

# --- Drift detection ---
pact.column("x").psi_below(0.20)                   # Population Stability Index
pact.column("x").ks_pvalue_above(0.05)             # KS two-sample test
pact.column("x").js_divergence_below(0.10)         # Jensen-Shannon divergence

# --- Structural & Categorical ---
pact.column("x").null_rate_stable(tolerance=0.05)
pact.column("x").outlier_rate_stable(method="iqr", tolerance=0.10)
pact.column("x").stays_in_range(min=0, max=100)
pact.column("category").no_new_categories()
pact.column("category").category_freq_stable(chi2_pvalue=0.05)
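
To make the `psi_below(0.20)` threshold concrete, here is a minimal sketch of the Population Stability Index, PSI = Σ (new_i − ref_i) · ln(new_i / ref_i), summed over bins. The equal-width binning, the `eps` floor for empty bins, and the function name `psi` are assumptions made for illustration; Driftveil's own binning strategy may differ.

```python
import math
from typing import Sequence


def psi(ref: Sequence[float], new: Sequence[float], bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins of the reference range."""
    lo, hi = min(ref), max(ref)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range values land in the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_new = proportions(ref), proportions(new)
    return sum((n - r) * math.log(n / r) for r, n in zip(p_ref, p_new))


ref = [float(v) for v in range(100)]
shifted = [v + 50 for v in ref]  # same shape, shifted upward

print("identical data drifted:", psi(ref, ref) > 0.20)
print("shifted data drifted:  ", psi(ref, shifted) > 0.20)
```

Every term in the sum is non-negative, so the index is zero only when the binned proportions match exactly and grows as mass moves between bins.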

Pair Metrics

pact.pair("price", "qty").correlation_above(0.3)
pact.pair("price", "qty").correlation_below(-0.1)
pact.pair("rev", "cost").ratio_stable("rev/cost", tolerance=0.15)
pact.pair("A", "B").mutual_information_stable(tolerance=0.20)
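
As a rough illustration of what a correlation contract checks, the sketch below computes a Pearson coefficient in plain Python and compares it to a threshold. Whether Driftveil uses Pearson or another coefficient is not stated here, so treat the choice, and the helper names `pearson` and `correlation_above`, as assumptions.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither column is constant


def correlation_above(x: Sequence[float], y: Sequence[float],
                      threshold: float) -> bool:
    """True when the pair's correlation exceeds the threshold."""
    return pearson(x, y) > threshold


clicks = [10, 20, 30, 40, 50]
conversions = [1, 3, 2, 5, 6]
print(correlation_above(clicks, conversions, 0.5))
```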

Dataset Metrics

pact.dataset.row_count_stable(tolerance=0.20)
pact.dataset.no_new_columns()
pact.dataset.no_dropped_columns()
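
The dataset-level checks reduce to simple arithmetic and set comparisons. The sketch below shows one plausible reading of them: a relative tolerance on row count and a set difference on column names. The function names mirror the API above, but the bodies are assumptions, not Driftveil's implementation.

```python
from typing import Iterable


def row_count_stable(ref_rows: int, new_rows: int,
                     tolerance: float = 0.20) -> bool:
    """True when the new row count is within ±tolerance of the reference."""
    return abs(new_rows - ref_rows) <= tolerance * ref_rows


def column_changes(ref_cols: Iterable[str],
                   new_cols: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (added, dropped) column names relative to the reference."""
    ref_set, new_set = set(ref_cols), set(new_cols)
    return sorted(new_set - ref_set), sorted(ref_set - new_set)


# Hypothetical reference of 10,000 rows against a batch of 9,821 rows.
print(row_count_stable(10_000, 9_821))
added, dropped = column_changes(["age", "income"], ["age", "revenue"])
print("added:", added, "dropped:", dropped)
```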

Reporting & CI/CD Integration

Python API

report = pact.enforce(new_df, raise_on_fail=False)

# Check status programmatically
if not report.passed:
    print(f"Failed checks count: {len(report.failed)}")

# Fail the pipeline when any check did not pass
report.assert_passed()  # raises ContractViolationError

# Save & Load pacts for CI
pact.save("production.pact.json")
pact2 = DriftVeil.load("production.pact.json", reference_df)

# Export interactive report dashboards (automatically embeds plots inline)
report.to_html("drift_report.html")

# Plot drift metrics
fig = report.plot_drifts()
fig.savefig("drift_summary.png")

# Log directly to active MLflow run
report.to_mlflow()
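
To show why a saved pact can stay small, here is a minimal sketch of statistics-only serialization using the standard library. The JSON layout and the `summarize` helper are hypothetical and do not reflect the actual `.pact.json` schema; the point is only that summary statistics, not raw rows, are written out.

```python
import json
import statistics
from typing import Sequence


def summarize(values: Sequence[float]) -> dict:
    """Reduce a column to summary statistics (needs at least two values)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "quantiles": {"0.25": q1, "0.5": q2, "0.75": q3},
    }


reference = {
    "age": [23, 35, 41, 52, 60],
    "income": [31_000, 42_000, 58_000, 61_000, 120_000],
}

# Only the summaries are serialized; the raw values never reach the file.
pact_json = json.dumps({col: summarize(v) for col, v in reference.items()},
                       indent=2)
restored = json.loads(pact_json)
print(sorted(restored["age"]))
```

The serialized size depends on the number of columns and statistics, not on the number of rows in the reference data.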

Command Line Interface (CLI)

Run checks directly on files (CSV, Parquet, or JSON) using saved pact files in bash or scheduler pipelines:

# Exit code is 0 on pass, or 1 on contract failure (when --raise-on-fail is set)
driftveil check new_data.csv --pact production.pact.json --raise-on-fail
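
In a scheduler that runs Python rather than bash, the same exit-code convention can be used as a gate. The sketch below relies only on the behavior described above (exit code 0 on pass, 1 on failure with --raise-on-fail); the helper names `passes` and `gate` are hypothetical, and the demonstration uses stand-in commands so it runs without Driftveil installed.

```python
import subprocess
import sys
from typing import Sequence


def passes(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# The CLI invocation from this section, expressed as an argument list.
DRIFT_CHECK = ["driftveil", "check", "new_data.csv",
               "--pact", "production.pact.json", "--raise-on-fail"]


def gate(command: Sequence[str] = DRIFT_CHECK) -> None:
    """Stop the pipeline with a non-zero exit when the check fails."""
    if not passes(command):
        sys.exit("Drift contract failed; stopping the pipeline.")


# Stand-in commands that mimic a passing and a failing check.
ok = passes([sys.executable, "-c", "raise SystemExit(0)"])
failed = passes([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)
```

Calling `gate()` as the first step of a job keeps downstream tasks from running on a batch that violated the contract.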

🧪 Installation

Install Driftveil from PyPI:

pip install driftveil

Or install it with plotting and Polars support:

pip install "driftveil[all]"

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

