mldebug
A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.
Why mldebug
Machine learning systems often fail silently when their input data changes.
Common production issues include:
- feature distribution drift
- increasing missing values
- unseen categorical values
- training vs production mismatch
mldebug provides a unified way to detect these issues before they become model failures.
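To make one of these issues concrete, the sketch below finds categorical values that appear in production but not in training, using plain NumPy. It only illustrates the failure mode and is not mldebug's API; the helper name `unseen_categories` is hypothetical.

```python
import numpy as np


def unseen_categories(reference, current):
    """Return sorted values present in `current` but absent from `reference`.

    Hypothetical helper for illustration; not part of mldebug.
    """
    return np.setdiff1d(current, reference).tolist()


reference_country = np.array(["ES", "ES", "FR"])
current_country = np.array(["ES", "DE", "DE"])

print(unseen_categories(reference_country, current_country))  # ['DE']
```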
What it does
mldebug compares:
- a reference dataset (e.g. training data)
- a current dataset (e.g. production data)
It runs a suite of checks and returns a structured report of detected issues.
Installation
pip install mldebug
Quick Start
from mldebug import run_checks
import numpy as np
reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}
current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}
schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}
report = run_checks(reference=reference, current=current, schema=schema)
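Writing the schema by hand gets tedious for wide datasets. As a rough sketch, the mapping could be derived from NumPy dtypes. `infer_schema` is a hypothetical helper, not something mldebug is known to provide, and it assumes the only type labels are the two used in the Quick Start schema ("numeric" and "categorical").

```python
import numpy as np


def infer_schema(dataset):
    """Map each feature name to "numeric" or "categorical" based on its dtype.

    Hypothetical helper; integer-coded categories would be labeled
    "numeric", so review the result before relying on it.
    """
    schema = {}
    for name, values in dataset.items():
        is_numeric = np.issubdtype(np.asarray(values).dtype, np.number)
        schema[name] = "numeric" if is_numeric else "categorical"
    return schema


example = {
    "age": np.array([20, 21, 22]),
    "country": np.array(["ES", "ES", "FR"]),
}
print(infer_schema(example))  # {'age': 'numeric', 'country': 'categorical'}
```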
Inspect results
for issue in report.issues:
    print(issue)
[WARNING] psi_drift - country: PSI drift detected (18.0152)
Summary
print(report.summary())
{
    "total": 1,
    "by_severity": {
        "info": 0,
        "warning": 1,
        "critical": 0
    },
    "status": "issues_detected"
}
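One way to act on the summary is to gate a pipeline step on it. The sketch below assumes `report.summary()` returns a plain dict with the shape printed above; `should_block` and its `max_warnings` parameter are hypothetical and encode just one possible policy.

```python
def should_block(summary, max_warnings=0):
    """Decide whether a pipeline should stop, given a summary dict.

    Hypothetical policy: any critical issue blocks; warnings block once
    they exceed `max_warnings`.
    """
    counts = summary.get("by_severity", {})
    if counts.get("critical", 0) > 0:
        return True
    return counts.get("warning", 0) > max_warnings


# Same shape as the summary shown above
summary = {
    "total": 1,
    "by_severity": {"info": 0, "warning": 1, "critical": 0},
    "status": "issues_detected",
}

print(should_block(summary))                  # True
print(should_block(summary, max_warnings=1))  # False
```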
Structured output
print(report.to_dict())
{
    "issues": [
        {
            "name": "psi_drift",
            "metric": "psi",
            "severity": "warning",
            "message": "country: PSI drift detected (18.0152)",
            "feature": "country",
            "value": 18.01521528247136,
            "threshold": 0.2
        }
    ]
}
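For readers unfamiliar with the metric, the Population Stability Index compares the share of each category in the two datasets: PSI is the sum over categories of (cur - ref) * ln(cur / ref). The sketch below is an assumption about how such a check could be computed, not mldebug's actual implementation. In particular, the `eps` floor applied to categories missing from one side is a guess. With `eps=1e-8` and the Quick Start `country` data, it yields a value close to the 18.0152 reported above, far beyond the 0.2 threshold.

```python
import numpy as np


def categorical_psi(reference, current, eps=1e-8):
    """Population Stability Index between two categorical samples.

    Hypothetical helper: proportions of zero are floored at `eps` so the
    logarithm stays finite. The floor value is an assumption.
    """
    reference = np.asarray(reference)
    current = np.asarray(current)
    categories = np.union1d(reference, current)
    # Share of each category in each sample
    ref_p = np.array([np.mean(reference == c) for c in categories])
    cur_p = np.array([np.mean(current == c) for c in categories])
    ref_p = np.clip(ref_p, eps, None)
    cur_p = np.clip(cur_p, eps, None)
    return float(np.sum((cur_p - ref_p) * np.log(cur_p / ref_p)))


psi = categorical_psi(["ES", "ES", "FR"], ["ES", "DE", "DE"])
print(round(psi, 4))
```

Identical samples give a PSI of 0, and the value grows as category shares diverge; categories that vanish or appear from nothing dominate the sum, which is why the result here is so large.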
Available checks
from mldebug import list_checks
checks = list_checks()
print(checks)
{
    "numeric": [
        "run_numeric_missing_value_check",
        "run_numeric_ks_test_check"
    ],
    "categorical": [
        "run_categorical_psi_drift_check"
    ]
}
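The check names suggest a missing-value comparison and a two-sample Kolmogorov-Smirnov test for numeric features. The following NumPy-only sketch shows the quantities such checks are typically built on. Both helpers are hypothetical, and whatever thresholds or p-value logic mldebug applies are not reproduced here.

```python
import numpy as np


def missing_rate(values):
    """Fraction of NaN entries in a numeric array (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.isnan(values)))


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the empirical CDFs.

    Hypothetical helper; assumes both inputs are free of NaN values.
    """
    reference = np.sort(np.asarray(reference, dtype=float))
    current = np.sort(np.asarray(current, dtype=float))
    # Evaluate both empirical CDFs at every observed value
    grid = np.concatenate([reference, current])
    ref_cdf = np.searchsorted(reference, grid, side="right") / reference.size
    cur_cdf = np.searchsorted(current, grid, side="right") / current.size
    return float(np.max(np.abs(ref_cdf - cur_cdf)))


print(missing_rate([1.0, np.nan, 3.0, np.nan]))    # 0.5
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))    # 0.5
```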
Documentation
See documentation pages.
Status
Active development (v0.x). APIs may evolve before v1.0.0.
See CHANGELOG.md for version history.
Development Setup
Requirements
Environment Setup
git clone https://github.com/anpenta/mldebug
cd mldebug
uv sync
Development Workflow
All tasks are managed via poe (the poethepoet task runner).
Run tests
uv run poe test
Run linting
uv run poe lint
Check linting
uv run poe lint-check
CI
This project uses CI to ensure:
- code quality (linting and type checking)
- correctness across supported Python versions
- test coverage thresholds
- reproducible builds
- automated publishing on release tags
Local development runs against the active Python environment only.
See CI workflow for details.
Contributing
We welcome contributions.
- Clone the repository
- Create a feature branch
- Make your changes
- Ensure all CI checks pass
- Open a pull request
Dependency Management
Dependencies are managed using uv and defined in pyproject.toml.
For local development:
uv sync
This installs dependencies and updates the environment as needed.
For CI and reproducible environments:
uv sync --frozen
This ensures the environment exactly matches the lock file without modifying it.
Citation
If you use mldebug in your work, please cite this software.
Preferred citation format is available in CITATION.cff or via GitHub's “Cite this repository” button.
License
See LICENSE.