drift-detect

drift-detect is a Python package that helps detect distributional drift between two datasets.

It detects drift with univariate statistical tests for both numerical and categorical features. The package also checks whether the distribution of NULL (missing) values has changed. Multiple hypothesis testing is handled via Bonferroni and False Discovery Rate (FDR) corrections.

✅ Identify Which Features Have Drifted

drift-detect also identifies which specific features (numerical or categorical) differ significantly between the two datasets, allowing more targeted diagnostics and root-cause analysis of data drift.

📌 Key Features

🔍 Non-Parametric Univariate Statistical Tests:

  • Detects drift in numerical features using the Kolmogorov–Smirnov Test (KS Test)
  • Detects drift in categorical features using the Chi-squared Test of Independence
  • Detects changes in missingness using Fisher’s Exact Test
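
To make the three tests above concrete, here is a minimal sketch that runs each of them directly with SciPy on synthetic data. It is not drift-detect's internal implementation; the variable names, sample sizes, and missing-value counts are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Numerical feature: two-sample Kolmogorov-Smirnov test.
ref_num = rng.normal(0.0, 1.0, 1000)
new_num = rng.normal(0.5, 1.0, 1000)  # mean shifted on purpose
ks_stat, ks_p = stats.ks_2samp(ref_num, new_num)

# Categorical feature: chi-squared test on a dataset-by-category count table.
categories = ["A", "B", "C"]
ref_cat = rng.choice(categories, size=1000, p=[0.5, 0.3, 0.2])
new_cat = rng.choice(categories, size=1000, p=[0.2, 0.3, 0.5])
counts = np.array([
    [(ref_cat == c).sum() for c in categories],
    [(new_cat == c).sum() for c in categories],
])
chi2_stat, chi2_p, dof, _expected = stats.chi2_contingency(counts)

# Missingness: Fisher's exact test on a 2x2 table of (missing, present) counts.
ref_missing, new_missing = 20, 80  # illustrative counts out of 1000 rows each
null_table = [
    [ref_missing, 1000 - ref_missing],
    [new_missing, 1000 - new_missing],
]
odds_ratio, fisher_p = stats.fisher_exact(null_table)

print(f"KS p={ks_p:.3g}, chi2 p={chi2_p:.3g}, Fisher p={fisher_p:.3g}")
```

Each test yields one p-value per feature, which is why a correction for multiple comparisons is needed once several features are tested together.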

✅ Multiple Hypothesis Correction:

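  • Bonferroni Correction: Controls the family-wise error rate by dividing the significance threshold by the number of tests (equivalently, multiplying each p-value by that number).
  • False Discovery Rate (FDR): Uses the Benjamini–Hochberg procedure to control the expected proportion of false positives among the features flagged as drifted.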
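
The two corrections can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard procedures, not code taken from drift-detect; the function names `bonferroni` and `benjamini_hochberg` and the example p-values are hypothetical.

```python
import numpy as np


def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: p * m, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


p_values = [0.001, 0.012, 0.03, 0.2]  # one p-value per tested feature
alpha = 0.05
print("Bonferroni flags:", bonferroni(p_values) < alpha)
print("BH (FDR) flags:  ", benjamini_hochberg(p_values) < alpha)
```

With these four p-values and a threshold of 0.05, Bonferroni flags the first two features while Benjamini–Hochberg flags the first three, which shows how FDR control is less conservative than family-wise control.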

🧠 Feature-Level Drift Insights:

  • Returns a summary table of test statistics, p-values, and corrected p-values
  • Clearly indicates which features show significant distributional drift
  • Helps in interpreting what has changed, not just that something has
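
As a rough picture of what a feature-level summary could look like, the sketch below builds one by hand with pandas and SciPy. The column names (`statistic`, `p_value`, `p_bonferroni`, `drifted`) and the synthetic features are assumptions; the table drift-detect actually returns may be laid out differently.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
reference = pd.DataFrame({
    "age": rng.normal(40, 10, 500),
    "income": rng.normal(50_000, 8_000, 500),
})
current = pd.DataFrame({
    "age": rng.normal(40, 10, 500),            # drawn from the same distribution
    "income": rng.normal(56_000, 8_000, 500),  # mean shifted on purpose
})

alpha = 0.05
rows = []
for col in reference.columns:
    # One KS test per numerical feature.
    stat, p = stats.ks_2samp(reference[col], current[col])
    rows.append({"feature": col, "test": "KS", "statistic": stat, "p_value": p})

summary = pd.DataFrame(rows)
# Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1.
summary["p_bonferroni"] = (summary["p_value"] * len(summary)).clip(upper=1.0)
summary["drifted"] = summary["p_bonferroni"] < alpha
print(summary)
```

Reading the `drifted` column row by row shows which feature changed, rather than a single yes/no answer for the whole dataset.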

🚀 Installation

pip install drift-detect

Usage/Examples

import numpy as np
import pandas as pd
from detectdrift import DetectDrift

# Create two sample datasets drawn from the same distributions
sample_size = 1000
categories = ['A', 'B', 'C']
probabilities = [0.5, 0.3, 0.2]
data1 = pd.DataFrame({
    'numerical_feature': np.random.normal(0, 1, sample_size),
    'categorical_feature': np.random.choice(categories, size=sample_size, p=probabilities)
})
data2 = pd.DataFrame({
    'numerical_feature': np.random.normal(0, 1, sample_size),
    'categorical_feature': np.random.choice(categories, size=sample_size, p=probabilities)
})

# List columns to be tested
numerical_cols = ['numerical_feature']
categorical_cols = ['categorical_feature']

# Initialize DetectDrift with the data and feature columns
drift_detector = DetectDrift(data1, data2, numerical_cols, categorical_cols)

# Perform drift detection
drift_detected = drift_detector.detect_drift()

# Output the result
if drift_detected:
    print("Distribution Drift Detected!")
else:
    print("No Drift Detected.")
