Minimalist, zero-config data drift detection for Python
Driftveil 🌌
Minimalist, zero-config data drift detection for Python. Protect your ML models and data pipelines from silent distribution shifts in three steps: define a contract, enforce it on new data, and read the report.
⚡ 30-Second Quickstart
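import pandas as pd
from driftveil import DriftVeil
# Hypothetical file paths; load your own reference and production data here
reference_df = pd.read_csv("reference.csv")
new_df = pd.read_csv("production_batch.csv")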
# 1. Define your statistical contract on the reference data
pact = DriftVeil(reference_df)
pact.column("age").is_normal(tolerance=0.1)
pact.column("income").is_lognormal()
pact.column("revenue").mean_stable(tolerance=0.15)
pact.column("category").category_freq_stable(chi2_pvalue=0.05)
pact.pair("clicks", "conversions").correlation_above(0.5)
pact.dataset.row_count_stable(tolerance=0.2)
# 2. Enforce the contract on your production data batch
report = pact.enforce(new_df, raise_on_fail=False)
# 3. Print the summary report
print(report.summary())
# ✓ age normal distribution (KS p=0.43)
# ✓ income lognormal (KS p=0.61)
# ✗ revenue mean drift detected (ref=4200 → 6800, +62%)
# ✓ category PSI=0.08 (below 0.20 threshold)
# ✓ clicks/conv. correlation=0.71 (above 0.50)
# ✓ row count 9821 rows (within ±20%)
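The failing `revenue` line in the sample report shows the mean moving from 4200 to 6800. As a rough illustration of where the `+62%` figure comes from, the sketch below assumes that `tolerance` is read as a maximum relative change in the mean. The README does not spell this out (the API section notes that `mean_stable` uses a Welch t-test), and `relative_mean_change` is a hypothetical helper, not part of Driftveil.

```python
def relative_mean_change(ref_mean: float, new_mean: float) -> float:
    """Signed relative change of the new mean with respect to the reference mean."""
    if ref_mean == 0:
        raise ValueError("relative change is undefined for a zero reference mean")
    return (new_mean - ref_mean) / abs(ref_mean)


# Values taken from the sample report.
change = relative_mean_change(4200, 6800)
tolerance = 0.15  # assumed meaning: maximum allowed |relative change|

print(f"{change:+.0%}")          # +62%
print(abs(change) <= tolerance)  # False, so the check is reported as failed
```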
📊 Comparison: Why Driftveil?
Most data quality tools check only the schema (types, nulls, value ranges). Driftveil checks statistical behavior without the heavy configuration overhead.
| Feature | Driftveil | Great Expectations | Evidently | Deepchecks |
|---|---|---|---|---|
| Schema / null checks | Yes | Yes | Partial | Yes |
| Distribution contracts | Yes | No | Reports only | Reports only |
| Fluent Python API | Yes | YAML/JSON config | Class-heavy | Class-heavy |
| Lightweight (≤3 deps) | Yes | Very heavy | Heavy | Heavy |
| Correlation contracts | Yes | No | No | Partial |
| Saveable pacts (JSON) | Yes | Yes | No | No |
🛠️ Key Design Principles
- Pure functions in tests/: Every statistical test is a standalone function(ref, new) -> TestResult. No classes, no side effects. This makes adding custom tests extremely easy.
- Lazy fluent builder: Contracts accumulate lazily; nothing is computed until .enforce() or .save() is called. Defining a pact is fast even with 50+ columns.
- Statistics-only serialization: Saved pacts store reference statistics (mean, std, bin edges, quantiles, value counts), not the full reference DataFrame. A saved pact is a tiny JSON file you can commit to git and load in CI without needing the original dataset.
- Pandas-first, Polars via adapter: Automatically converts any supported DataFrame type (like Polars) to a Pandas-compatible interface before running tests.
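To make the first three principles concrete, here is a minimal toy sketch of how they could fit together. It is not Driftveil's implementation: `CheckResult`, `mean_within`, and `MiniPact` are hypothetical names, plain Python lists stand in for DataFrame columns, and this toy computes the reference summaries up front while deferring the checks themselves.

```python
import json
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical stand-in for the TestResult mentioned in the README."""
    name: str
    passed: bool
    detail: str


def mean_within(ref_stats: dict, new_values: list, tolerance: float) -> CheckResult:
    """Pure check: needs only stored reference statistics and the new values.

    Assumes a non-zero reference mean.
    """
    new_mean = statistics.fmean(new_values)
    change = abs(new_mean - ref_stats["mean"]) / abs(ref_stats["mean"])
    return CheckResult("mean_within", change <= tolerance, f"relative change={change:.2f}")


class MiniPact:
    """Toy fluent builder: checks are queued and only run inside enforce()."""

    def __init__(self, reference: dict):
        # Keep summary statistics per column, never the raw rows.
        self.stats = {
            col: {"mean": statistics.fmean(vals), "std": statistics.pstdev(vals)}
            for col, vals in reference.items()
        }
        self.checks = []  # queued (column, check name, kwargs) entries

    def mean_stable(self, column: str, tolerance: float) -> "MiniPact":
        self.checks.append((column, "mean_within", {"tolerance": tolerance}))
        return self  # returning self is what makes the chain fluent

    def enforce(self, new: dict) -> list:
        registry = {"mean_within": mean_within}
        return [registry[name](self.stats[col], new[col], **kw)
                for col, name, kw in self.checks]

    def to_json(self) -> str:
        # Only statistics and check definitions are serialized.
        return json.dumps({"stats": self.stats, "checks": self.checks})


pact = MiniPact({"revenue": [4000, 4200, 4400]}).mean_stable("revenue", tolerance=0.15)
results = pact.enforce({"revenue": [6600, 6800, 7000]})
print(results[0].passed)  # False
print(pact.to_json())
```

Because the serialized form holds only the summary statistics and the queued check definitions, a check such as `mean_within` can run later without the reference rows being available.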
📖 Full API Capabilities
Column Metrics
# --- Distribution shape ---
pact.column("x").is_normal(tolerance=0.10)
pact.column("x").is_lognormal(tolerance=0.10)
pact.column("x").is_exponential(tolerance=0.15)
pact.column("x").fits_distribution("pareto") # Fits any scipy.stats distribution
# --- Central tendency and spread ---
pact.column("x").mean_stable(tolerance=0.15) # Welch t-test
pact.column("x").variance_stable(tolerance=0.20) # Levene's test
pact.column("x").median_stable(tolerance=0.15) # Mann-Whitney U test
pact.column("x").quantile_stable( # Checks quantiles within tolerance
    [0.25, 0.5, 0.75, 0.95], tolerance=0.10
)
# --- Drift detection ---
pact.column("x").psi_below(0.20) # Population Stability Index
pact.column("x").ks_pvalue_above(0.05) # KS two-sample test
pact.column("x").js_divergence_below(0.10) # Jensen-Shannon divergence
# --- Structural & Categorical ---
pact.column("x").null_rate_stable(tolerance=0.05)
pact.column("x").outlier_rate_stable(method="iqr", tolerance=0.10)
pact.column("x").stays_in_range(min=0, max=100)
pact.column("category").no_new_categories()
pact.column("category").category_freq_stable(chi2_pvalue=0.05)
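For readers unfamiliar with the Population Stability Index behind `psi_below`, the following is a minimal pure-Python sketch of the standard formula, PSI = Σ (p_new − p_ref) · ln(p_new / p_ref) over bins. The binning strategy here (equal-width bins from the reference range, with a small `eps` for empty bins) is an assumption; the README does not say how Driftveil bins the data.

```python
import math
from bisect import bisect_right


def psi(reference, new, n_bins=10, eps=1e-6):
    """Population Stability Index with equal-width bins derived from the reference sample."""
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / n_bins
    inner_edges = [lo + i * width for i in range(1, n_bins)]  # n_bins - 1 cut points

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range land in the first or last bin.
            counts[bisect_right(inner_edges, v)] += 1
        # Clamp to eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_new = proportions(reference), proportions(new)
    return sum((n - r) * math.log(n / r) for r, n in zip(p_ref, p_new))


reference = [i / 100 for i in range(1000)]  # evenly spread over [0, 10)
shifted = [x + 3 for x in reference]        # same shape, moved right by 3

print(psi(reference, reference))        # 0.0
print(psi(reference, shifted) > 0.20)   # True
```

The 0.20 cut-off is the threshold used in the README's own examples; identical samples give a PSI of exactly zero, and the index grows as mass moves between bins.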
Pair Metrics
pact.pair("price", "qty").correlation_above(0.3)
pact.pair("price", "qty").correlation_below(-0.1)
pact.pair("rev", "cost").ratio_stable("rev/cost", tolerance=0.15)
pact.pair("A", "B").mutual_information_stable(tolerance=0.20)
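As an illustration of what a check like `correlation_above` evaluates, here is a small sketch using the Pearson coefficient. Whether Driftveil uses Pearson, Spearman, or something else is not stated in the README, so treat the choice as an assumption; the standalone functions below are hypothetical and only reuse the method name for readability.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_above(xs, ys, threshold):
    """True when the correlation in the batch exceeds the threshold."""
    return pearson(xs, ys) > threshold


clicks = [10, 20, 30, 40, 50]
conversions = [1, 3, 2, 5, 4]

print(round(pearson(clicks, conversions), 2))       # 0.8
print(correlation_above(clicks, conversions, 0.5))  # True
```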
Dataset Metrics
pact.dataset.row_count_stable(tolerance=0.20)
pact.dataset.no_new_columns()
pact.dataset.no_dropped_columns()
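The dataset-level checks reduce to simple comparisons. The sketch below shows one plausible reading, assuming `tolerance` is a relative bound on the row count; the helper functions are hypothetical, and the reference row count of 10000 is made up for the example.

```python
def row_count_stable(ref_rows: int, new_rows: int, tolerance: float) -> bool:
    """True when the new row count is within ±tolerance of the reference count."""
    return abs(new_rows - ref_rows) <= tolerance * ref_rows


def column_changes(ref_columns, new_columns):
    """Return (added, dropped) column names, each sorted for stable output."""
    ref, new = set(ref_columns), set(new_columns)
    return sorted(new - ref), sorted(ref - new)


print(row_count_stable(10000, 9821, tolerance=0.20))  # True
added, dropped = column_changes(["age", "income"], ["age", "income", "zip"])
print(added, dropped)  # ['zip'] []
```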
Reporting & CI/CD Integration
report = pact.enforce(new_df, raise_on_fail=False)
# Check status programmatically
if not report.passed:
    print(f"Failed checks count: {len(report.failed)}")
# Enforce and raise error in pipelines
report.assert_passed() # raises ContractViolationError
# Save & Load pacts for CI
pact.save("production.pact.json")
pact2 = DriftVeil.load("production.pact.json", reference_df)
# Export interactive report dashboards
report.to_html("drift_report.html")
# Plot drift metrics
fig = report.plot_drifts()
fig.savefig("drift_summary.png")
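In a CI job, what usually matters is that a failed contract turns into a non-zero exit code. The sketch below shows that pattern with stand-in classes so it runs without Driftveil installed: `MiniReport` and `ci_gate` are hypothetical, and `ContractViolationError` is redefined locally only to mirror the error name used in the README.

```python
import sys
from dataclasses import dataclass, field


class ContractViolationError(Exception):
    """Local stand-in for the error type named in the README."""


@dataclass
class MiniReport:
    failed: list = field(default_factory=list)  # names of failed checks

    @property
    def passed(self) -> bool:
        return not self.failed

    def assert_passed(self) -> None:
        if self.failed:
            raise ContractViolationError(
                f"{len(self.failed)} check(s) failed: {', '.join(self.failed)}"
            )


def ci_gate(report: MiniReport) -> int:
    """Map a report onto a process exit code: 0 on success, 1 on drift."""
    try:
        report.assert_passed()
    except ContractViolationError as exc:
        print(f"drift detected: {exc}", file=sys.stderr)
        return 1
    return 0


print(ci_gate(MiniReport()))                                # 0
print(ci_gate(MiniReport(failed=["revenue.mean_stable"])))  # 1
```

A pipeline script would typically end with `sys.exit(ci_gate(report))` so that the job fails when any check does.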
🧪 Installation
Install Driftveil from PyPI:
pip install driftveil
Or install it with plotting and Polars support:
pip install "driftveil[all]"
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file driftveil-0.1.0.tar.gz.
File metadata
- Download URL: driftveil-0.1.0.tar.gz
- Upload date:
- Size: 24.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9483ef1b40e11664ed51cae6ae047bc4418b4f9de753ee4fad581f60101ca486 |
| MD5 | a1820bfe60582b0449bc91264a3975cf |
| BLAKE2b-256 | 98c0ba1538725318c25a2a027f2fcd6dd533b25a07523349b71e5da938c4f3c2 |
File details
Details for the file driftveil-0.1.0-py3-none-any.whl.
File metadata
- Download URL: driftveil-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7716737c393fe90e2167af14eba2a5fb229a008bb26a85ebd4f15c19c23afcaa |
| MD5 | 19826fc3424c7b25c472256dada9f92b |
| BLAKE2b-256 | b73f752cbda58741478dfe3e361fbaf7b0745a1f53abc4cdf8e9523a1ffef25e |