A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.

mldebug

Why mldebug

Machine learning systems fail silently when data changes.

Common production issues include:

  • feature distribution drift
  • increasing missing values
  • unseen categorical values
  • training vs production mismatch

mldebug provides a unified way to detect these issues before they become model failures.
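Two of the issues listed above are simple enough to check by hand, which can help when reading a report. The sketch below is an illustration using only the standard library, not mldebug's implementation; `missing_rate` and `unseen_categories` are hypothetical helper names, and the income values with a NaN are made-up sample data.

```python
import math


def missing_rate(values):
    """Fraction of entries that are None or NaN."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return sum(is_missing(v) for v in values) / len(values)


def unseen_categories(reference, current):
    """Categories present in current but never observed in reference."""
    return set(current) - set(reference)


# Illustrative data: the current income column has one missing value.
ref_income = [1000.0, 1200.0, 1100.0]
cur_income = [900.0, float("nan"), 850.0]

ref_rate = missing_rate(ref_income)
cur_rate = missing_rate(cur_income)
new_countries = unseen_categories(["ES", "ES", "FR"], ["ES", "DE", "DE"])

print(ref_rate, cur_rate, new_countries)
```

A rise in the missing rate or a non-empty set of unseen categories is the kind of change the checks are meant to surface.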

What it does

mldebug compares:

  • a reference dataset (e.g. training data)
  • a current dataset (e.g. production data)

It runs a suite of checks and returns a structured report of detected issues.

Installation

pip install mldebug

Quick Start

from mldebug import run_checks
import numpy as np

reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}

current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}

schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}

report = run_checks(reference=reference, current=current, schema=schema)

Inspect results

for issue in report.issues:
    print(issue)

Output:

[WARNING] psi_drift - country: PSI drift detected (18.0152)
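For orientation, the Population Stability Index sums (current share minus reference share) times ln(current share / reference share) over all categories. The sketch below is a standard-library illustration, not mldebug's code: the function name `categorical_psi` and the floor of 1e-8 applied to zero shares are assumptions. With that floor, the Quick Start country data gives roughly the value reported above.

```python
import math
from collections import Counter


def categorical_psi(reference, current, floor=1e-8):
    """Population Stability Index over the union of observed categories."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    psi = 0.0
    for category in set(ref_counts) | set(cur_counts):
        # Share of each category, floored so log() and division stay defined
        # when a category is absent from one of the datasets (assumed floor).
        ref_share = max(ref_counts[category] / len(reference), floor)
        cur_share = max(cur_counts[category] / len(current), floor)
        psi += (cur_share - ref_share) * math.log(cur_share / ref_share)
    return psi


psi = categorical_psi(["ES", "ES", "FR"], ["ES", "DE", "DE"])
print(round(psi, 4))
```

Almost all of the total comes from the categories that appear in only one dataset (FR and DE), so the magnitude depends heavily on the floor chosen. A value this large mainly signals new or vanished categories rather than a gradual shift.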

Summary

print(report.summary())

Output:

{
  "total": 1,
  "by_severity": {
    "info": 0,
    "warning": 1,
    "critical": 0
  },
  "status": "issues_detected"
}

Structured output

print(report.to_dict())

Output:

{
  "issues": [
    {
      "name": "psi_drift",
      "metric": "psi",
      "severity": "warning",
      "message": "country: PSI drift detected (18.0152)",
      "feature": "country",
      "value": 18.01521528247136,
      "threshold": 0.2
    }
  ]
}
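Because the structured output is a plain dictionary, it can be filtered with ordinary Python. The sketch below works on a literal copy of the dictionary shown above rather than calling mldebug; `features_at_or_above` is a hypothetical helper, and the severity ordering (info, warning, critical) is taken from the summary output.

```python
# A dictionary shaped like the to_dict() output shown above.
report_dict = {
    "issues": [
        {
            "name": "psi_drift",
            "metric": "psi",
            "severity": "warning",
            "message": "country: PSI drift detected (18.0152)",
            "feature": "country",
            "value": 18.01521528247136,
            "threshold": 0.2,
        }
    ]
}


def features_at_or_above(report_dict, min_severity="warning"):
    """Return sorted feature names with an issue at or above min_severity."""
    order = {"info": 0, "warning": 1, "critical": 2}
    return sorted(
        {
            issue["feature"]
            for issue in report_dict["issues"]
            if order[issue["severity"]] >= order[min_severity]
        }
    )


flagged = features_at_or_above(report_dict)
if flagged:
    print("Features needing review:", ", ".join(flagged))
```

A filter like this could gate a pipeline step, for example by stopping a deployment only when `features_at_or_above(report_dict, "critical")` is non-empty.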

Available checks

from mldebug import list_checks

checks = list_checks()
print(checks)

Output:

{
  "numeric": [
    "run_numeric_missing_value_check",
    "run_numeric_ks_test_check"
  ],
  "categorical": [
    "run_categorical_psi_drift_check"
  ]
}
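The name `run_numeric_ks_test_check` suggests a two-sample Kolmogorov-Smirnov test. As an illustration only, the sketch below computes the KS statistic and an exact permutation p-value for the Quick Start age data with the standard library. The helper names and the permutation approach are assumptions, not mldebug's implementation, and the approach is only practical for very small samples.

```python
from itertools import combinations


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of a and b."""
    points = sorted(set(a) | set(b))
    return max(
        abs(sum(x <= p for x in a) / len(a) - sum(x <= p for x in b) / len(b))
        for p in points
    )


def exact_ks_pvalue(a, b):
    """Share of relabelings whose KS statistic is at least the observed one."""
    pooled = list(a) + list(b)
    observed = ks_statistic(a, b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point noise in the comparison.
        hits += ks_statistic(first, second) >= observed - 1e-12
    return hits / total


ref_age, cur_age = [20, 21, 22], [30, 35, 40]
statistic = ks_statistic(ref_age, cur_age)
p_value = exact_ks_pvalue(ref_age, cur_age)
print(statistic, p_value)
```

With three observations per side, the smallest attainable two-sided p-value is 0.1, so even fully separated samples like these would not pass a conventional 0.05 cutoff. That may be why the Quick Start report lists only the categorical issue, though the threshold mldebug actually uses is not stated here.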

Documentation

See documentation pages.

Status

Active development (v0.x). APIs may evolve before v1.0.0.

See CHANGELOG.md for version history.

Development Setup

Requirements

Environment Setup

git clone https://github.com/anpenta/mldebug
cd mldebug
uv sync

Development Workflow

All development tasks are defined as poe (Poe the Poet) tasks and run through uv.

Run tests

uv run poe test

Run linting

uv run poe lint

Check linting

uv run poe lint-check

CI

This project uses CI to ensure:

  • code quality (linting and type checking)
  • correctness across supported Python versions
  • test coverage thresholds
  • reproducible builds
  • automated publishing on release tags

Local development runs against the active Python environment only.

See CI workflow for details.

Contributing

We welcome contributions.

  1. Clone the repository
  2. Create a feature branch
  3. Make your changes
  4. Ensure all CI checks pass
  5. Open a pull request

Dependency Management

Dependencies are managed using uv and defined in pyproject.toml.

For local development:

uv sync

This installs dependencies and updates the environment, and the lock file if it is out of date.

For CI and reproducible environments:

uv sync --frozen

This ensures the environment exactly matches the lock file without modifying it.

Citation

If you use mldebug in your work, please cite this software.

Preferred citation format is available in CITATION.cff or via GitHub's “Cite this repository” button.

License

See LICENSE.
