ml3-drift

Easy-to-embed drift detection

ml3-drift is an open source AI library that provides seamless integration of drift detection algorithms into existing Machine Learning and AI frameworks. The purpose is to simplify the implementation process and enable developers to easily incorporate drift detection into their pipelines.

✅ Supported Frameworks

These are the frameworks we currently support. We plan to add many more in the future. Let us know if you are interested in a specific framework!

| Framework | How | Example |
| --- | --- | --- |
| scikit-learn | Provides a scikit-learn compatible drift detector that integrates easily into existing scikit-learn pipelines. | Mixed data monitoring |
| transformers (by huggingface) | A minimal wrapper for the Pipeline object that looks like a Pipeline, behaves like a Pipeline, but also monitors the output of the wrapped Pipeline. Works with any feature extraction pipeline, for both images and text. | Text data monitoring |

🛠️ Usage

ml3-drift components are designed to be easily integrated into your existing code. You should be able to use them with minimal changes to your code.

Here is a simple example with scikit-learn:

import logging

import numpy as np
from ml3_drift.sklearn.univariate.ks import KSDriftDetector
from sklearn.tree import DecisionTreeRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from ml3_drift.callbacks.base import logger_callback
from functools import partial

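logger = logging.getLogger(__name__)

# Illustrative synthetic data (not part of the library): the test set is
# shifted relative to the training set so that a drift can be detected.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = X_train.sum(axis=1)
X_test = rng.normal(loc=3.0, size=(100, 3))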

# Define your pipeline as usual, but also add a drift detector.
# The detector accepts a list of functions to be called when a drift is detected.
# The first argument of the function is a dataclass containing some information
# about the drift (check it out in ml3_drift/callbacks/models.py).
drift_detector = KSDriftDetector(
    callbacks=[
        partial(
            logger_callback,
            logger=logger,
            level=logging.CRITICAL,
        )
    ]
)

pipeline = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        ("monitoring", drift_detector),
        (
            "model",
            DecisionTreeRegressor(),
        ),
    ]
)

# When fitting the pipeline, the drift detector will
# save the training data as reference data.
# No effect on the model training.
pipeline = pipeline.fit(X_train, y_train)

# When making predictions, the drift detector will
# check if the incoming data is similar to the reference data
# and execute the callback you specified if a drift is detected.
predictions = pipeline.predict(X_test)

The example callback we provided will simply log a message when a drift is detected. For instance:

Drift detected on feature at index 0 by drift detector KSDriftDetector.
 p-value = 2.2027963703339932e-07
 Threshold = 0.005
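
Because the detector accepts any list of functions, you are not limited to logging. As a minimal sketch, the callback below stores whatever the detector passes as its first argument in a list you control, using the same functools.partial pattern as the example above. The names collect_drift and store, and the dictionary used as a stand-in payload, are hypothetical; the real payload is the dataclass defined in ml3_drift/callbacks/models.py.

```python
from functools import partial


def collect_drift(drift_info, store):
    """Append the drift information received from the detector to `store`."""
    store.append(drift_info)


# List that accumulates every drift event seen so far.
detected = []

# Bind the extra argument by keyword, leaving the first positional
# argument free for the drift information supplied by the detector.
callback = partial(collect_drift, store=detected)

# Stand-in for the detector invoking the callback. The dictionary is a
# hypothetical placeholder, not the library's actual dataclass.
callback({"feature_index": 0})

print(f"{len(detected)} drift event(s) collected")
```

A callback built this way could then be passed as KSDriftDetector(callbacks=[callback]), in place of or alongside the logger callback.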

You can find other examples in the examples folder. For more information, please refer to the documentation.

📦 Installation

ml3-drift is available on PyPI and supports Python versions 3.10 through 3.13, inclusive.

Integrations with the different frameworks are managed through extra dependencies. The plain ml3-drift package comes without any dependencies, which means that you need to specify the framework you want to use when installing the package. Alternatively, if you are just experimenting, you can install the package with all the available extras.

You can use pip:

pip install "ml3-drift[all]" # install all the dependencies
pip install "ml3-drift[sklearn]" # install only sklearn dependency
pip install "ml3-drift[huggingface]" # install only huggingface dependency

or uv:

uv add ml3-drift --all-extras # install all the dependencies
uv add ml3-drift --extra sklearn # install only sklearn dependency
uv add ml3-drift --extra huggingface # install only huggingface dependency
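
If you are unsure which extras ended up in your environment, a quick check like the one below can help. This is only a sketch: it assumes that the sklearn extra provides the sklearn module and that the huggingface extra provides the transformers module, and the names EXTRA_TO_MODULE and available_extras are made up for illustration.

```python
from importlib.util import find_spec

# Assumed mapping from each ml3-drift extra to the top-level module it provides.
EXTRA_TO_MODULE = {
    "sklearn": "sklearn",
    "huggingface": "transformers",
}


def available_extras(mapping=EXTRA_TO_MODULE):
    """Return {extra: bool}, True when the extra's module can be imported."""
    return {
        extra: find_spec(module) is not None
        for extra, module in mapping.items()
    }


if __name__ == "__main__":
    for extra, installed in available_extras().items():
        print(f"{extra}: {'installed' if installed else 'missing'}")
```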

❓ What is drift detection? Why do we need it?

Machine Learning algorithms rely on the assumption that the data used during training comes from the same distribution as the data seen in production.

However, this assumption rarely holds true in the real world, where conditions are dynamic and constantly evolving. These distributional changes, if not addressed properly, can lead to a decline in model performance. This, in turn, can result in inaccurate predictions or estimations, potentially harming the business.

Drift Detection, often referred to as Monitoring, is the process of continuously tracking the performance of a model and the distribution of the data it is operating on. The objective is to quickly detect any changes in data distribution or behavior, so that corrective actions can be taken in a timely manner.
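
To make the idea concrete, here is a from-scratch sketch of a two-sample Kolmogorov-Smirnov check on a single feature, using only the standard library. It is not the implementation used by ml3-drift: the function names are hypothetical, the p-value uses the asymptotic Kolmogorov distribution, and the default threshold of 0.005 is simply borrowed from the log message shown in the usage section. In practice you would rely on a tested implementation such as scipy.stats.ks_2samp.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    return max(
        abs(bisect_right(ref, v) / len(ref) - bisect_right(cur, v) / len(cur))
        for v in ref + cur
    )


def ks_p_value(statistic, n_ref, n_cur, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = statistic * math.sqrt(n_ref * n_cur / (n_ref + n_cur))
    if lam < 0.2:
        # The series below converges slowly here, and the true value
        # differs from 1 by a negligible amount.
        return 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2 * series))


def detect_drift(reference, current, threshold=0.005):
    """Return (drifted, p_value); drift is flagged when p_value < threshold."""
    statistic = ks_statistic(reference, current)
    p_value = ks_p_value(statistic, len(reference), len(current))
    return p_value < threshold, p_value


# Reference data seen at training time and shifted data seen in production.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
production = [rng.gauss(3.0, 1.0) for _ in range(200)]

drifted, p_value = detect_drift(reference, production)
print(f"drift={drifted}, p-value={p_value:.3g}")
```

A small p-value means the production sample is unlikely to come from the same distribution as the reference sample, which is the signal a drift detector acts on.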

😅 Yet another drift detection library?

Not really. While there are many great open source drift detection libraries out there (nannyml, river, and evidently, to name a few), we observed a lack of standardization in their APIs and misalignments with common ML interfaces. Our goal is to offer known drift detection algorithms behind a single unified API, tailored for relevant ML and AI frameworks such as scikit-learn and huggingface. Hopefully, this won't be the 15th competing standard 😉.

🚀 Contributing

We welcome contributions to ml3-drift! Since we are at a very early stage, we look forward to feedback, ideas and bug reports. Feel free to open an issue if you have any questions or suggestions.

Local Development

These are the steps you need to follow to set up your local development environment.

We use uv for package management and just for running commands. Once you have both installed, you can clone the repository and run the following command to set up your development environment:

just dev-sync

The previous command will install all optional dependencies. If you want to install only one of them, run:

just dev-sync-extra extra-to-install
# for instance, just dev-sync-extra sklearn

Make sure you install the pre-commit hooks by running:

just install-hooks

To format your code, lint it and run tests, you can use the following command:

just validate

Note that tests are run according to the installed libraries. If you don't have scikit-learn installed, all tests related to it will be skipped.

📜 License

This project is licensed under the terms of the Apache License Version 2.0. For more details, please refer to the LICENSE file. All contributions to this project will be distributed under the same license.

👥 Authors

This project was originally developed at ML cube and has been open-sourced to benefit the ML community, whose contributions we warmly welcome.

While ml3-drift provides easy-to-use, integrated drift detection algorithms, companies requiring enterprise-grade monitoring, advanced analytics and insights capabilities might be interested in trying out our product, the ML cube Platform.

The ML cube Platform (website, docs) is a comprehensive end-to-end ModelOps framework that helps you trust your AI models and GenAI applications by providing several functionalities, such as data and model monitoring, drift root cause analysis, performance-safe model retraining and LLM security. It can be used both during the development phase of your models and in production, to ensure that your models are performing as expected and to quickly detect and understand any issues that may arise.

If you'd like to learn more about our product or wonder how we can help you with your AI projects, visit our website or contact us at info@mlcube.com.
