Data drift detector for data

📊🚦 Driftmon: Data Drift Detection & Monitoring Tool

The idea for Driftmon came from reading Fundamentals of Data Engineering, which emphasizes the importance of monitoring data drift in production systems. Driftmon aims to provide a practical, extensible solution for real-world data drift detection, alerting, and monitoring across multiple data platforms.


Driftmon is a robust tool for monitoring, detecting, and alerting on data drift in production datasets and database/data warehouse tables. It helps ensure data quality and model reliability by automatically profiling data, detecting unexpected changes, and notifying stakeholders via email and Slack. Driftmon also provides a dashboard for visualizing drift trends and data changes over time.


🚀 Features

  • Baseline Profiling: Profiles and stores baseline statistics for each column in your tables.
  • Automated Monitoring: Periodically monitors new data and compares it to historical baselines.
  • Drift Detection: Detects drift by comparing hashes and statistical summaries of new data against previously recorded baselines.
  • Multi-Database Support: Works with BigQuery, Snowflake, MySQL, and PostgreSQL across multiple schemas and datasets.
  • Alerting: Sends real-time alerts via Email and Slack when drift is detected.
  • Dashboard: Interactive dashboard (Streamlit) to visualize data distributions, drift events, and trends.
  • Configurable: Easily configure data sources, alerting methods, and monitoring targets via CLI.
  • CLI Interface: Simple command-line interface for setup, monitoring, drift detection, and dashboard launch.
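
To make the drift detection idea concrete, here is a minimal sketch of comparing a content hash and a few summary statistics against a baseline. It is not Driftmon's internal implementation: the function names `profile_column` and `detect_column_drift`, the chosen statistics, and the 10% tolerance are all assumptions made for illustration.

```python
import hashlib
import statistics


def profile_column(values):
    """Summarize one column: row count, null count, mean, stdev, and a content hash."""
    present = [v for v in values if v is not None]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # The hash is order-sensitive: reordering rows changes the digest.
    digest = hashlib.sha256(repr(values).encode("utf-8")).hexdigest()
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "mean": statistics.fmean(numeric) if numeric else None,
        "stdev": statistics.pstdev(numeric) if numeric else None,
        "hash": digest,
    }


def detect_column_drift(baseline, current, rel_tol=0.10):
    """Return a list of reasons the column drifted (empty list if none)."""
    if baseline["hash"] == current["hash"]:
        return []  # identical content, so the statistics cannot differ
    reasons = []
    if baseline["nulls"] != current["nulls"]:
        reasons.append(
            f"null count changed: {baseline['nulls']} -> {current['nulls']}"
        )
    b_mean, c_mean = baseline["mean"], current["mean"]
    if b_mean is not None and c_mean is not None:
        scale = abs(b_mean) or 1.0  # avoid dividing by zero
        if abs(c_mean - b_mean) / scale > rel_tol:
            reasons.append(f"mean shifted: {b_mean:.2f} -> {c_mean:.2f}")
    return reasons


baseline = profile_column([10, 12, 11, 13, None])
current = profile_column([20, 22, 21, 23, None])
print(detect_column_drift(baseline, current))
```

The hash check acts as a cheap shortcut: the statistical comparison only runs when the content has actually changed.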

📦 Installation

pip install driftmon

OR

git clone https://github.com/Human-Gechi/data_drift_detector.git
cd data_drift_detector
pip install -e .

To launch the dashboard without entering the interactive CLI, run driftmon-dashboard and the Streamlit dashboard starts.

🛠️ CLI Commands

Command        Description
configure      Set up data source connection and alerting configuration
monitoring     Profile baseline statistics and monitor for changes
detect-drift   Detect drift and send alerts via email/Slack
dashboard      Launch the Streamlit dashboard for visualization
help           Show CLI help
exit/quit      Exit the CLI

⚡️ Quick Start for CLI

  1. Configure Your Connection & Alerts. Set up your database/data warehouse connection and alerting preferences:
driftmon configure

You will be prompted for:

  • Connection type (bigquery, snowflake, mysql, postgres)
  • Database credentials and details
  • Tables/schemas/datasets to monitor
  • Alerting method (email, slack, or both)
  • Email/Slack credentials

  2. Baseline Profiling & Monitoring. Profile your data and store baseline statistics:
driftmon monitoring

This command computes and saves baseline statistics and hashes for your monitored tables.

  3. Detect Drift & Send Alerts. Detect data drift by comparing new data to the baseline. Alerts are sent via your configured channels:
driftmon detect-drift

If drift is detected, notifications are sent to your email and/or Slack channel.

  4. Launch the Dashboard. Visualize drift events, data distributions, and trends:
driftmon dashboard

This launches a Streamlit dashboard in your browser.
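
Because profiling and drift detection are separate commands, the baseline has to be persisted between runs. How Driftmon stores it is not described here; the sketch below assumes a simple JSON file keyed by table name, and the helpers `save_baseline` and `load_baseline`, the file name, and the profile fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_baseline(path, table, profile):
    """Merge one table's profile into a JSON baseline file, creating it if needed."""
    path = Path(path)
    store = json.loads(path.read_text()) if path.exists() else {}
    store[table] = profile
    path.write_text(json.dumps(store, indent=2, sort_keys=True))


def load_baseline(path, table):
    """Return the stored profile for a table, or None if it was never profiled."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get(table)


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    baseline_file = Path(tmp) / "baseline.json"
    save_baseline(baseline_file, "orders", {"row_count": 1000, "amount_mean": 42.5})
    stored = load_baseline(baseline_file, "orders")
    missing = load_baseline(baseline_file, "customers")

print(stored)
print(missing)
```

Returning None for an unknown table lets a detection step distinguish "no baseline yet" from "baseline exists and nothing changed".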

🔔 Alerting

  • Email Alerts: Configure SMTP server, sender, and recipient. Driftmon sends detailed drift reports to your inbox.
  • Slack Alerts: Set up a Slack bot token and channel. Driftmon posts drift notifications directly to your Slack workspace.
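
As an illustration of what an email alert involves, the following sketch builds a plain-text drift report with the standard library and sends it over SMTP. It is independent of Driftmon's own `Email` class: the `{table: [reasons]}` report shape and the helper names `build_drift_email` and `send_message` are assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_drift_email(sender, receiver, drift_report):
    """Turn a {table: [reasons]} report into a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = f"Data drift detected in {len(drift_report)} table(s)"
    lines = []
    for table, reasons in sorted(drift_report.items()):
        lines.append(f"{table}:")
        lines.extend(f"  - {reason}" for reason in reasons)
    msg.set_content("\n".join(lines))
    return msg


def send_message(msg, host, port, password):
    """Send over SMTP with implicit TLS; call this only with real credentials."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(msg["From"], password)
        server.send_message(msg)


report = {"orders": ["mean shifted: 11.50 -> 21.50"]}
message = build_drift_email("sender@example.com", "receiver@example.com", report)
print(message["Subject"])
print(message.get_content())
```

Keeping message construction separate from sending makes the report format testable without a mail server.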

🗄️ Supported Data Sources

  • Google BigQuery (multiple datasets)
  • Snowflake (multiple schemas)
  • MySQL
  • PostgreSQL

You can monitor multiple tables across different schemas/datasets.

Example arguments for initializing connectors

# PostgreSQL Connector
from driftmon.connector.postgres_connector import PostgresConn

pg_conn = PostgresConn(
    host="your_host",
    port=5432,
    user="your_username",
    password="your_password",
    database="your_database"
)

# MySQL Connector
from driftmon.connector.mysql_connector import MySQLConn

mysql_conn = MySQLConn(
    host="your_host",
    port=3306,
    user="your_username",
    password="your_password",
    database="your_database"
)

# Snowflake Connector
from driftmon.connector.snowflake_connector import SnowflakeConn

sf_conn = SnowflakeConn(
    user="your_username",
    password="your_password",
    account="your_account",
    warehouse="your_warehouse",
    database="your_database",
    schema="your_schema"
)
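
Rather than hard-coding credentials as in the placeholders above, one option is to read them from environment variables. This is only a sketch: the `DRIFTMON_PG_*` variable names and the helper `postgres_kwargs_from_env` are hypothetical, while the keyword names match the PostgresConn arguments shown above.

```python
import os


def postgres_kwargs_from_env(env=os.environ):
    """Build PostgresConn keyword arguments from environment variables."""
    return {
        "host": env.get("DRIFTMON_PG_HOST", "localhost"),
        "port": int(env.get("DRIFTMON_PG_PORT", "5432")),
        # Required values raise KeyError if unset, which fails fast.
        "user": env["DRIFTMON_PG_USER"],
        "password": env["DRIFTMON_PG_PASSWORD"],
        "database": env["DRIFTMON_PG_DATABASE"],
    }


# Demonstration with an in-memory mapping instead of the real environment.
example_env = {
    "DRIFTMON_PG_HOST": "db.internal",
    "DRIFTMON_PG_USER": "analyst",
    "DRIFTMON_PG_PASSWORD": "not-a-real-password",
    "DRIFTMON_PG_DATABASE": "warehouse",
}
kwargs = postgres_kwargs_from_env(example_env)
print({k: v for k, v in kwargs.items() if k != "password"})

# With the package installed, the connector could then be created as:
# from driftmon.connector.postgres_connector import PostgresConn
# pg_conn = PostgresConn(**kwargs)
```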

🧪 Code Samples: Using Driftmon with Context Managers

This example demonstrates best practices using context managers and modular functions for connecting, profiling, drift detection, and sending alerts.

from driftmon.connector.bigquery_connector import BigQueryConn
from driftmon.detect.monitoring import save_profile
from driftmon.detect.drift_detector import detect_drift
from driftmon.alerts.email_alert import Email

def export_data(conn, dataset, tables):
    result = conn.get_group_data(datasets=dataset, table_names=tables)
    for key, df in result:
        df.to_csv(f"{key}.csv", index=False)

def profile_and_detect(conn, dataset, tables):
    save_profile(conn_type="bigquery", connector=conn, datasets=dataset, table_names=tables)
    return detect_drift(table_names=tables)

def send_drift_email(drift_report, sender, password, receiver):
    email = Email(
        sender=sender,
        password=password,
        receiver=receiver,
        drift_report=drift_report
    )
    email.send_email()

tables = "test_table2"
dataset = "1306_data"

with BigQueryConn(
    project="meta-spirit-494622-f5",
    credentials_path="meta-spirit-494622-f5-82b375b04e9e.json"
) as conn:
    export_data(conn, dataset, tables)
    drift_report = profile_and_detect(conn, dataset, tables)
    send_drift_email(
        drift_report,
        sender="sender@gmail.com",
        password="your-password",
        receiver="receiver@gmail.com"
    )

🧪 Example: Using Driftmon Without Context Managers (Using .connect() Method)

This example shows how to use Driftmon with the BigQuery connector by calling the .connect() method explicitly instead of using a context manager.

from driftmon.connector.bigquery_connector import BigQueryConn
from driftmon.detect.monitoring import save_profile
from driftmon.detect.drift_detector import detect_drift
from driftmon.alerts.email_alert import Email

tables = "test_table2"
dataset = "1306_data"
conn = BigQueryConn(
    project="meta-spirit-494622-f5",
    credentials_path="meta-spirit-494622-f5-82b375b04e9e.json"
)
conn.connect()
try:
    result = conn.get_group_data(datasets=dataset, table_names=tables)
    for key, df in result:
        print(key)
        print(df)
except Exception as e:
    print("Error:", e)

save_profile(conn_type="bigquery", connector=conn, datasets=dataset, table_names=tables)
drift_report = detect_drift(table_names=tables)
email = Email(
    sender="sender@gmail.com",
    password="your-password",
    receiver="receiver@gmail.com",
    drift_report=drift_report
)
email.send_email()

🤝 Contributing

Contributions are welcome and appreciated!

To contribute to Driftmon:

  1. Fork the repository on GitHub and clone your fork locally.
  2. Create a new branch for your feature or bugfix:
    git checkout -b feature/your-feature-name
    
  3. Make your changes and add tests if applicable.
  4. Commit your changes with clear messages.
  5. Push your branch to your fork:
    git push origin feature/your-feature-name
    
  6. Open a Pull Request on GitHub describing your changes.

Guidelines to follow when contributing to Driftmon:

  1. Please ensure your code follows the existing style and passes linting as configured in the pyproject.toml file.
  2. Add or update documentation as needed.
  3. Write tests for new features or bug fixes.
  4. Be respectful and constructive in code reviews and discussions.
  5. If you find a bug or have a feature request, please open an issue.

Thank you for helping improve Driftmon!


👤 Author

Ogechukwu Okoli

GitHub: Human-Gechi

Email: okoliogechi74@gmail.com

Thank you for using Driftmon! If you have suggestions, questions, or want to contribute, feel free to reach out or open an issue. Stay ahead of data drift and keep your data pipelines reliable! 🚦📊
