drift-detect
drift-detect is a Python package that helps detect distributional drift between two datasets.
It provides functionality for drift detection using univariate statistical tests for both numerical and categorical features. The package also tracks if the distribution of NULL values has changed. Multiple hypothesis testing is handled via Bonferroni and False Discovery Rate (FDR) corrections.
✅ Identify Which Features Have Drifted
drift-detect also identifies which specific features (numerical or categorical) differ significantly between the two datasets, allowing for more targeted diagnostics and root-cause analysis of data drift.
📌 Key Features
🔍 Non-Parametric Univariate Statistical Tests:
- Detects drift in numerical features using the Kolmogorov–Smirnov Test (KS Test)
- Detects drift in categorical features using the Chi-squared Test of Independence
- Detects changes in missingness using Fisher’s Exact Test
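The three tests listed above can be sketched directly with SciPy. This is a minimal illustration of the underlying statistics, not drift-detect's internal code: the synthetic data, the hypothetical null counts, and the variable names are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Numerical feature: two-sample Kolmogorov-Smirnov test.
ref_num = rng.normal(0.0, 1.0, 1000)
new_num = rng.normal(0.5, 1.0, 1000)  # mean shifted on purpose
ks_stat, ks_p = stats.ks_2samp(ref_num, new_num)

# Categorical feature: chi-squared test on a (dataset x category) count table.
categories = ["A", "B", "C"]
ref_cat = rng.choice(categories, size=1000, p=[0.5, 0.3, 0.2])
new_cat = rng.choice(categories, size=1000, p=[0.2, 0.3, 0.5])
counts = np.array([
    [(ref_cat == c).sum() for c in categories],
    [(new_cat == c).sum() for c in categories],
])
chi2_stat, chi2_p, dof, expected = stats.chi2_contingency(counts)

# Missingness: Fisher's exact test on a 2x2 (dataset x null/non-null) table.
ref_nulls, new_nulls = 20, 60  # hypothetical null counts out of 1000 rows each
null_table = [[ref_nulls, 1000 - ref_nulls], [new_nulls, 1000 - new_nulls]]
odds_ratio, fisher_p = stats.fisher_exact(null_table)

print(f"KS p={ks_p:.3g}, chi2 p={chi2_p:.3g}, Fisher p={fisher_p:.3g}")
```

Each test yields one p-value per feature, which is why a multiple-testing correction is needed once several features are checked at the same time.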
✅ Multiple Hypothesis Correction:
- Bonferroni Correction: Controls the family-wise error rate by adjusting significance thresholds.
- False Discovery Rate (FDR): Uses the Benjamini–Hochberg procedure to control the expected proportion of false positives among the features flagged as drifted.
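To make the difference between the two corrections concrete, here is a small pure-Python sketch of both adjustments. The function names and the example p-values are illustrative assumptions, and the package's own implementation may differ.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.001, 0.012, 0.03, 0.2]  # illustrative p-values, one per feature
alpha = 0.05

bonf_flags = [p < alpha for p in bonferroni(p_values)]
bh_flags = [p < alpha for p in benjamini_hochberg(p_values)]
print("Bonferroni:", bonf_flags)
print("Benjamini-Hochberg:", bh_flags)
```

With these inputs Bonferroni flags the first two features and Benjamini-Hochberg flags the first three, which reflects the usual trade-off: Bonferroni is more conservative, while FDR control tolerates a small share of false alarms in exchange for more detections.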
🧠 Feature-Level Drift Insights:
- Returns a summary table of test statistics, p-values, and corrected p-values
- Clearly indicates which features show significant distributional drift
- Helps in interpreting what has changed, not just that something has
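As a rough picture of what a feature-level summary could look like, the sketch below builds one with pandas. The column names, feature names, and p-values are hypothetical and chosen only for illustration; they are not the package's actual output format.

```python
import pandas as pd

# Illustrative per-feature results; these numbers are made up, not package output.
summary = pd.DataFrame({
    "feature": ["age", "income", "country"],
    "test": ["KS", "KS", "Chi-squared"],
    "p_value": [0.40, 0.0004, 0.02],
})

alpha = 0.05
n_tests = len(summary)

# Bonferroni-adjusted p-values, capped at 1.
summary["p_adjusted"] = (summary["p_value"] * n_tests).clip(upper=1.0)
summary["drifted"] = summary["p_adjusted"] < alpha

drifted_features = summary.loc[summary["drifted"], "feature"].tolist()
print(summary.sort_values("p_adjusted"))
print("Drifted features:", drifted_features)
```

Here only `income` remains significant after correction, even though `country` has a raw p-value below 0.05, so the table shows what changed and how strongly.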
🚀 Installation
pip install drift-detect
Usage/Examples
import numpy as np
import pandas as pd
from detectdrift import DetectDrift

# Create sample datasets (both drawn from the same distributions)
sample_size = 1000
categories = ['A', 'B', 'C']
probabilities = [0.5, 0.3, 0.2]

data1 = pd.DataFrame({
    'numerical_feature': np.random.normal(0, 1, sample_size),
    'categorical_feature': np.random.choice(categories, size=sample_size, p=probabilities)
})
data2 = pd.DataFrame({
    'numerical_feature': np.random.normal(0, 1, sample_size),
    'categorical_feature': np.random.choice(categories, size=sample_size, p=probabilities)
})

# List columns to be tested
numerical_cols = ['numerical_feature']
categorical_cols = ['categorical_feature']

# Initialize DetectDrift with the data and feature columns
drift_detector = DetectDrift(data1, data2, numerical_cols, categorical_cols)

# Perform drift detection
drift_detected = drift_detector.detect_drift()

# Output the result
if drift_detected:
    print("Distribution Drift Detected!")
else:
    print("No Drift Detected.")