Compare differences between 2 datasets to identify data drift
Data Drift Detector
This package contains developmental tools for detecting and comparing statistical differences between two structurally similar pandas DataFrames. Its intended purpose is to detect data drift, where the statistical properties of an input variable change over time.
We provide a class, DataDriftDetector, which takes two pandas DataFrames and provides several methods to compare and analyze the differences between them.
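To make the notion of data drift concrete, here is a minimal sketch of one common way to quantify a change in a numeric variable's distribution: the two-sample Kolmogorov-Smirnov statistic. This is an illustration only and uses just the standard library; the function name ks_statistic and the sample values are assumptions made for this example, and nothing here is claimed to be how DataDriftDetector computes drift internally.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values,
    # so evaluating the gap at those points is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical values of one input variable at two points in time.
prior = [1.0, 2.0, 3.0, 4.0, 5.0]
similar = [1.5, 2.5, 3.5, 4.5, 5.5]
shifted = [11.0, 12.0, 13.0, 14.0, 15.0]

print("prior vs similar:", ks_statistic(prior, similar))
print("prior vs shifted:", ks_statistic(prior, shifted))
```

A value near 0 suggests the two samples follow a similar distribution, while a value near 1 suggests the variable has shifted substantially between the two datasets.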
Installation
Install the package with pip:
pip install data-drift-detector
Example Usage
To compare 2 datasets:
from data_drift_detector import DataDriftDetector
# initialize the detector; df_1 and df_2 are two structurally similar pandas DataFrames
detector = DataDriftDetector(df_prior=df_1, df_post=df_2)
# methods to compare and analyze differences
detector.calculate_drift()
detector.plot_numeric_to_numeric()
detector.plot_categorical_to_numeric()
detector.plot_categorical()
detector.compare_ml_efficacy(target_column="some_target_column")
You may also view the example notebook at examples/example_usage.ipynb to explore how the package may be used.
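The usage example refers to df_1 and df_2 without defining them. As a rough sketch, two structurally similar DataFrames could be prepared as follows. The helper make_sample, the column names, and the synthetic values are all hypothetical and exist only for illustration; in practice the two frames would be your own data from two time periods.

```python
import numpy as np
import pandas as pd


def make_sample(n_rows: int, income_mean: float, seed: int) -> pd.DataFrame:
    """Build a synthetic dataset; the column names are purely illustrative."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "income": rng.normal(loc=income_mean, scale=10_000, size=n_rows),
            "segment": rng.choice(["a", "b", "c"], size=n_rows),
            "churned": rng.integers(0, 2, size=n_rows),
        }
    )


# df_1 plays the role of the earlier ("prior") data, df_2 the later ("post") data.
# The later sample is drawn with a higher mean income to simulate drift.
df_1 = make_sample(n_rows=500, income_mean=50_000, seed=0)
df_2 = make_sample(n_rows=500, income_mean=58_000, seed=1)

# Both frames should share the same columns before being compared.
if list(df_1.columns) != list(df_2.columns):
    raise ValueError("the two datasets must have the same columns")

print(df_1["income"].mean(), df_2["income"].mean())
```

Frames prepared this way can then be passed to the detector as in the example above, with DataDriftDetector(df_prior=df_1, df_post=df_2), and a column such as "churned" could serve as the target_column for compare_ml_efficacy.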