mldebug
A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.
Why mldebug
Machine learning systems can fail silently when the data they receive changes.
Common production issues include:
- feature distribution drift
- increasing missing values
- unseen categorical values
- training vs production mismatch
mldebug provides a unified way to detect these issues before they become model failures.
What it does
mldebug compares:
- a reference dataset (e.g. training data)
- a current dataset (e.g. production data)
It runs a suite of checks and returns a structured report of detected issues.
Installation
pip install mldebug
Quick Start
from mldebug import run_checks
import numpy as np
reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}
current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}
schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}
report = run_checks(reference=reference, current=current, schema=schema)
Inspect detected issues
Human-readable output
for issue in report.issues:
    print(issue)
[WARNING] psi_drift - country: PSI drift detected (18.0152)
Summary
print(report.summary())
{
    "total": 1,
    "by_severity": {
        "info": 0,
        "warning": 1,
        "critical": 0
    },
    "status": "issues_detected"
}
Structured output
print(report.to_dict())
{
    "issues": [
        {
            "name": "psi_drift",
            "metric": "psi",
            "severity": "warning",
            "message": "country: PSI drift detected (18.0152)",
            "feature": "country",
            "value": 18.01521528247136,
            "threshold": 0.2
        }
    ]
}
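Because the structured output is a plain dictionary, it can be post-processed without any mldebug-specific code. The sketch below is illustrative only: it copies the dictionary shown above into a literal, and the helper `issues_at_or_above` and the `SEVERITY_ORDER` ranking are hypothetical names, not part of the mldebug API. The ordering of severities (info, warning, critical) is an assumption based on the keys in the summary output.

```python
# Dictionary copied from the structured output shown above.
report_dict = {
    "issues": [
        {
            "name": "psi_drift",
            "metric": "psi",
            "severity": "warning",
            "message": "country: PSI drift detected (18.0152)",
            "feature": "country",
            "value": 18.01521528247136,
            "threshold": 0.2,
        }
    ]
}

# Assumed ranking of severities; not taken from the mldebug source.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


def issues_at_or_above(report, minimum="warning"):
    """Return the issues whose severity is at least `minimum` (hypothetical helper)."""
    floor = SEVERITY_ORDER[minimum]
    return [
        issue
        for issue in report["issues"]
        if SEVERITY_ORDER[issue["severity"]] >= floor
    ]


flagged = issues_at_or_above(report_dict, "warning")
for issue in flagged:
    print(
        f'{issue["feature"]}: {issue["metric"]}={issue["value"]:.4f} '
        f'(threshold {issue["threshold"]})'
    )
```

A filter like this could be used to fail a pipeline step only when issues at a chosen severity are present.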
Logs
for line in report.to_logs():
    print(line)
[WARNING] psi_drift - country: PSI drift detected (18.0152)
Supported Checks
mldebug runs a combination of:
Numeric features
- Kolmogorov–Smirnov test (KS test)
- missing value rate changes
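To make the numeric checks concrete, here is a minimal standard-library sketch of the two-sample KS statistic and a missing value rate. It is not mldebug's implementation: the function names `ks_statistic` and `missing_rate` are hypothetical, and treating `None` and NaN as missing is an assumption. The sketch computes only the statistic; a p-value (for example via `scipy.stats.ks_2samp`) would be needed to turn it into a test.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (hypothetical helper)."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(ref) | set(cur)):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def missing_rate(values):
    """Fraction of entries that are None or NaN (assumed definition of missing)."""
    missing = sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


# The "age" columns from the Quick Start do not overlap at all.
print(ks_statistic([20, 21, 22], [30, 35, 40]))  # 1.0

# Change in missing rate between a reference and a current column.
ref_rate = missing_rate([1.0, 2.0, 3.0, 4.0])
cur_rate = missing_rate([1.0, None, float("nan"), 4.0])
print(cur_rate - ref_rate)  # 0.5
```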
Categorical features
- Population Stability Index (PSI)
- category distribution changes
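For intuition about the categorical checks, PSI sums `(current share - reference share) * ln(current share / reference share)` over all categories. The sketch below is one common formulation, not necessarily the one mldebug uses: the function name `psi` and the `eps` floor applied to empty categories are assumptions. With an assumed floor of `1e-8`, the Quick Start `country` columns give a value close to the 18.0152 shown in the report above, though the package may compute it differently.

```python
import math
from collections import Counter


def psi(reference, current, eps=1e-8):
    """Population Stability Index over categorical values (hypothetical helper).

    `eps` is an assumed floor that keeps the logarithm finite when a
    category is absent from one of the two datasets.
    """
    ref_counts = Counter(reference)
    cur_counts = Counter(current)
    total = 0.0
    for category in set(ref_counts) | set(cur_counts):
        p_ref = max(ref_counts[category] / len(reference), eps)
        p_cur = max(cur_counts[category] / len(current), eps)
        total += (p_cur - p_ref) * math.log(p_cur / p_ref)
    return total


reference_country = ["ES", "ES", "FR"]
current_country = ["ES", "DE", "DE"]

score = psi(reference_country, current_country)
print(round(score, 4))  # approximately 18.0152

# Identical distributions give a PSI of zero.
print(psi(reference_country, reference_country))  # 0.0
```

The large score here comes almost entirely from the unseen category `DE` and the vanished category `FR`, since the floor makes their log ratios very large.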
Documentation
See documentation pages.
Status
Active development (v0.x). APIs may evolve before v1.0.0.
See CHANGELOG.md for version history and updates.
Development Setup
Requirements
- Ubuntu 24.04.4 (recommended) or WSL
- git
- direnv
Environment Setup
git clone https://github.com/anpenta/mldebug
cd mldebug
direnv allow
Development Workflow
Tasks are managed via poe (available in the project environment via direnv).
Run tests
poe test
Run linting
poe lint
Check linting
poe lint-check
Run full CI parity checks
poe test-all
poe lint-check-all
CI/CD
CI runs multi-Python version testing and linting. All pull requests must pass the checks before merging.
See CI workflow for details.
Contributing
We welcome contributions.
- Clone the repository
- Create a feature branch
- Make your changes
- Ensure all CI checks pass
- Open a pull request
Dependency Management
Dependencies are managed using uv and defined in pyproject.toml.
License
See LICENSE.