Compare differences between 2 datasets to identify data drift
Data Drift Detector
This package contains developmental tools to detect and compare statistical differences between 2 structurally similar pandas dataframes. The intended purpose is to detect data drift, where the statistical properties of an input variable change over time.
We provide a class, DataDriftDetector, which takes in 2 pandas dataframes and provides a few useful methods to compare and analyze the differences between the 2 datasets.
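To make the idea of drift in a numeric variable concrete, here is a minimal standard-library sketch of a two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distributions of two samples. It is only an illustration of the concept: the function name `ks_statistic` and the sample values are hypothetical, and nothing here is taken from how DataDriftDetector computes drift internally.

```python
from bisect import bisect_right


def ks_statistic(prior, post):
    """Largest absolute gap between the empirical CDFs of two numeric samples."""
    a, b = sorted(prior), sorted(post)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The gap between two step functions can only peak at an observed value,
    # so it is enough to evaluate both CDFs at every data point.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical values of one numeric column at two points in time.
prior = [1.0, 2.0, 3.0, 4.0, 5.0]
similar = [1.5, 2.5, 3.5, 4.5, 5.5]
shifted = [11.0, 12.0, 13.0, 14.0, 15.0]

small_gap = ks_statistic(prior, similar)  # distributions overlap heavily
large_gap = ks_statistic(prior, shifted)  # distributions do not overlap
```

A statistic near 0 suggests the two samples follow similar distributions, while a value near 1 suggests the variable has shifted substantially.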
Installation
Install the package with pip:
pip install data-drift-detector
Example Usage
To compare 2 datasets:
from data_drift_detector import DataDriftDetector
# initialize detector (df_1 and df_2 are 2 structurally similar pandas dataframes)
detector = DataDriftDetector(df_prior=df_1, df_post=df_2)
# methods to compare and analyze differences
detector.calculate_drift()
detector.plot_numeric_to_numeric()
detector.plot_categorical_to_numeric()
detector.plot_categorical()
detector.compare_ml_efficacy(target_column="some_target_column")
You may also view the example notebook at examples/example_usage.ipynb to explore how the package may be used.
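As a rough illustration of what comparing a categorical column across two datasets can measure, the sketch below computes the total variation distance between two sets of category frequencies. This is an assumption chosen for illustration, not the package's method: `category_shift` and the sample data are hypothetical names and values.

```python
from collections import Counter


def category_shift(prior, post):
    """Total variation distance between two categorical samples.

    Returns 0.0 for identical category frequencies and 1.0 when the
    two samples share no categories.
    """
    if not prior or not post:
        raise ValueError("both samples must be non-empty")
    p, q = Counter(prior), Counter(post)
    categories = set(p) | set(q)
    # Counter returns 0 for categories missing from one of the samples.
    return 0.5 * sum(
        abs(p[c] / len(prior) - q[c] / len(post)) for c in categories
    )


# Hypothetical values of one categorical column at two points in time.
prior = ["a", "a", "b", "b"]
post = ["a", "b", "b", "b"]

shift = category_shift(prior, post)
```

In this example a quarter of the probability mass has moved from category "a" to category "b".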
File details
Details for the file data-drift-detector-0.0.18.tar.gz.
File metadata
- Download URL: data-drift-detector-0.0.18.tar.gz
- Upload date:
- Size: 10.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88df967f0e5ab300288619655d86b724bff33c435b93c6086f59db4ade0caf5b
MD5 | 521af7e6558589b4bf383bce80265307
BLAKE2b-256 | 57ec6a9f6c234351deb1d238ff479cf8f030c79c588906cf6a4707e689933b7a
File details
Details for the file data_drift_detector-0.0.18-py3-none-any.whl.
File metadata
- Download URL: data_drift_detector-0.0.18-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | f93b1713db9ae61584157a8504224bef084161f123c0f5a5260cbdad5f253448
MD5 | 1be2ed8a6bf43d3cb6e4bec0277f1dd1
BLAKE2b-256 | 0737ebbb859a578cc8080c3103ab08670d28002288ef74b9d292fb16d9361cd7