Compare differences between 2 datasets to identify data drift

Data Drift Detector

This package contains tools, still under development, to detect and compare statistical differences between two structurally similar pandas dataframes. Its intended purpose is to detect data drift, where the statistical properties of an input variable change over time.

The DataDriftDetector class takes two pandas dataframes and provides methods to compare and analyze the differences between them.
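
To make the idea of data drift concrete, here is a minimal sketch that measures how far apart two numeric samples are with a two-sample Kolmogorov-Smirnov statistic, using only the Python standard library. It is an illustration of the concept, not a description of how DataDriftDetector computes drift; the function name `ks_statistic` and the synthetic samples are assumptions made for this example.

```python
import random
from bisect import bisect_right


def ks_statistic(prior, post):
    """Return the two-sample Kolmogorov-Smirnov statistic: the largest
    absolute gap between the empirical CDFs of the two samples."""
    prior_sorted = sorted(prior)
    post_sorted = sorted(post)
    n, m = len(prior_sorted), len(post_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # The gap only changes at observed values, so checking those is enough.
    for x in prior_sorted + post_sorted:
        cdf_prior = bisect_right(prior_sorted, x) / n
        cdf_post = bisect_right(post_sorted, x) / m
        largest_gap = max(largest_gap, abs(cdf_prior - cdf_post))
    return largest_gap


# Hypothetical data: one "prior" sample, one drawn from the same
# distribution, and one whose mean has shifted by one standard deviation.
rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]

no_drift = ks_statistic(prior, same)
drift = ks_statistic(prior, shifted)
print(f"same distribution: {no_drift:.3f}")
print(f"shifted mean:      {drift:.3f}")
```

A statistic near 0 means the two samples look alike, while a value closer to 1 means their distributions have moved apart, which is the kind of change a drift check is meant to surface.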

Installation

Install the package with pip:

pip install data-drift-detector

Example Usage

To compare two datasets:

from data_drift_detector import DataDriftDetector

# df_1 and df_2 are two structurally similar pandas dataframes
# (for example, an earlier and a later snapshot of the same data)

# initialize the detector
detector = DataDriftDetector(df_prior=df_1, df_post=df_2)

# methods to compare and analyze differences
detector.calculate_drift()
detector.plot_numeric_to_numeric()
detector.plot_categorical_to_numeric()
detector.plot_categorical()
detector.compare_ml_efficacy(target_column="some_target_column")

An example notebook, examples/example_usage.ipynb, shows how the package may be used.
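
The usage snippet assumes that df_1 and df_2 already exist. As a rough sketch, two structurally similar dataframes for experimentation could be built as follows. The helper `make_frame`, the column names, and the distribution parameters are all made up for this example and are not part of the package.

```python
import random

import pandas as pd

rng = random.Random(42)


def make_frame(n_rows, age_mean, premium_share):
    """Build a synthetic frame; the column names are hypothetical."""
    return pd.DataFrame(
        {
            # numeric column whose mean differs between the two frames
            "age": [rng.gauss(age_mean, 10.0) for _ in range(n_rows)],
            # categorical column whose category mix differs between the two frames
            "plan": [
                "premium" if rng.random() < premium_share else "basic"
                for _ in range(n_rows)
            ],
            # binary column that could serve as a target column
            "churned": [rng.randint(0, 1) for _ in range(n_rows)],
        }
    )


# Same columns in both frames, but different underlying distributions.
df_1 = make_frame(500, age_mean=35.0, premium_share=0.3)
df_2 = make_frame(500, age_mean=40.0, premium_share=0.5)
```

Frames built this way share the same columns, so they could be passed as df_prior and df_post when initializing the detector.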
