mldebug

A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.

Why mldebug

Machine learning systems can fail silently when the data they receive changes.

Common production issues include:

  • feature distribution drift
  • increasing missing values
  • unseen categorical values
  • training vs production mismatch

mldebug provides a unified way to detect these issues before they become model failures.
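As a rough illustration of two of the issues listed above, the following sketch uses plain NumPy to measure the missing-value rate of a feature and to find category values that appear only in the newer data. The helper names `missing_rate` and `unseen_categories` and the sample arrays are made up for this example; they are not part of mldebug and do not describe how its checks are implemented.

```python
import numpy as np


def missing_rate(values):
    """Fraction of NaN entries in a float array (hypothetical helper)."""
    return float(np.mean(np.isnan(values)))


def unseen_categories(reference, current):
    """Category values present in `current` but absent from `reference`."""
    return sorted(np.setdiff1d(current, reference).tolist())


# Illustrative data: half of the newer income values are missing.
ref_income = np.array([1000.0, 1200.0, 1100.0, 900.0])
cur_income = np.array([900.0, np.nan, 850.0, np.nan])

print(missing_rate(ref_income))  # 0.0
print(missing_rate(cur_income))  # 0.5

# "DE" never occurs in the reference data.
ref_country = np.array(["ES", "ES", "FR"])
cur_country = np.array(["ES", "DE", "DE"])
print(unseen_categories(ref_country, cur_country))  # ['DE']
```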

What it does

mldebug compares:

  • a reference dataset (e.g. training data)
  • a current dataset (e.g. production data)

It runs a suite of checks and returns a structured report of detected issues.

Installation

pip install mldebug

Quick Start

from mldebug import run_checks
import numpy as np

reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}

current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}

schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}

report = run_checks(reference=reference, current=current, schema=schema)
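For datasets with many features, writing the schema by hand can become tedious. One possible approach, sketched here with NumPy only, is to derive it from each array's dtype. The function `infer_schema` is a hypothetical helper for this example and is not provided by mldebug; the only assumption taken from the Quick Start is that schema values are the strings "numeric" and "categorical".

```python
import numpy as np


def infer_schema(dataset):
    """Map each feature name to "numeric" or "categorical" based on its dtype."""
    schema = {}
    for name, values in dataset.items():
        dtype = np.asarray(values).dtype
        # Integer, float and complex dtypes count as numeric; strings,
        # booleans and objects fall through to categorical in this sketch.
        is_numeric = np.issubdtype(dtype, np.number)
        schema[name] = "numeric" if is_numeric else "categorical"
    return schema


reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}

print(infer_schema(reference))
# {'age': 'numeric', 'income': 'numeric', 'country': 'categorical'}
```

An inferred schema is worth reviewing before use, since integer-coded categories (for example, postal codes) would be classified as numeric by this rule.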

Inspect results

for issue in report.issues:
    print(issue)

Output:

[WARNING] psi_drift - country: PSI drift detected (18.0152)

Summary

print(report.summary())

Output:

{
  "total": 1,
  "by_severity": {
    "info": 0,
    "warning": 1,
    "critical": 0
  },
  "status": "issues_detected"
}
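A summary of this shape can be used to gate a pipeline step. The sketch below assumes the summary is available as a plain dictionary with the keys shown above; if `report.summary()` returns a JSON string in your installed version, parse it with `json.loads` first. The function `should_block` and its `max_warnings` parameter are hypothetical and not part of mldebug.

```python
def should_block(summary, max_warnings=0):
    """Decide whether a pipeline step should stop, given a summary dict."""
    by_severity = summary.get("by_severity", {})
    # Any critical issue blocks unconditionally.
    if by_severity.get("critical", 0) > 0:
        return True
    # Warnings block only once they exceed the tolerated count.
    return by_severity.get("warning", 0) > max_warnings


# Literal copy of the summary shown above, so the sketch runs without mldebug.
summary = {
    "total": 1,
    "by_severity": {"info": 0, "warning": 1, "critical": 0},
    "status": "issues_detected",
}

print(should_block(summary))                  # True
print(should_block(summary, max_warnings=5))  # False
```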

Structured output

print(report.to_dict())

Output:

{
  "issues": [
    {
      "name": "psi_drift",
      "metric": "psi",
      "severity": "warning",
      "message": "country: PSI drift detected (18.0152)",
      "feature": "country",
      "value": 18.01521528247136,
      "threshold": 0.2
    }
  ]
}
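To give a feel for where a PSI value like the one above can come from, here is a minimal sketch of the population stability index for a categorical feature, PSI = Σ (cᵢ − rᵢ) · ln(cᵢ / rᵢ), where rᵢ and cᵢ are the category proportions in the reference and current data. The function name `categorical_psi` and the choice to clip zero proportions to `1e-8` are assumptions of this sketch, not a description of mldebug's internals, which may differ.

```python
import numpy as np


def categorical_psi(reference, current, eps=1e-8):
    """Population stability index between two categorical samples."""
    # Use every category seen in either sample.
    categories = np.union1d(reference, current)
    ref_p = np.array([np.mean(reference == c) for c in categories])
    cur_p = np.array([np.mean(current == c) for c in categories])
    # Clip zero proportions so the logarithm stays finite (sketch assumption).
    ref_p = np.clip(ref_p, eps, None)
    cur_p = np.clip(cur_p, eps, None)
    return float(np.sum((cur_p - ref_p) * np.log(cur_p / ref_p)))


# The "country" feature from the Quick Start.
reference = np.array(["ES", "ES", "FR"])
current = np.array(["ES", "DE", "DE"])

psi = categorical_psi(reference, current)
print(round(psi, 4))  # 18.0152
```

The value is large because "DE" is absent from the reference sample and "FR" is absent from the current one, so the clipped proportions dominate the logarithm. With only three rows per dataset, the number mainly reflects that categories appeared and disappeared, well past the 0.2 threshold shown in the report.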

Available checks

mldebug provides runtime introspection of all available checks.

You can view the checks available in your installed version:

from mldebug import list_checks

checks = list_checks()
print(checks)

Output:

{
  "numeric": [
    "run_numeric_missing_value_check",
    "run_numeric_ks_test_check"
  ],
  "categorical": [
    "run_categorical_psi_drift_check"
  ]
}
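The dictionary returned by `list_checks()` is keyed by feature type, so it can be combined with a schema to see which checks are relevant to each feature. The sketch below works on a literal copy of the output above so it runs without mldebug installed. The helper `checks_for_schema` is hypothetical, and whether `run_checks` selects checks in exactly this way is an assumption.

```python
def checks_for_schema(schema, checks):
    """Map each feature to the check names registered for its type."""
    return {
        feature: list(checks.get(kind, []))
        for feature, kind in schema.items()
    }


# Literal copy of the list_checks() output shown above.
checks = {
    "numeric": [
        "run_numeric_missing_value_check",
        "run_numeric_ks_test_check",
    ],
    "categorical": [
        "run_categorical_psi_drift_check",
    ],
}

schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}

for feature, names in checks_for_schema(schema, checks).items():
    print(feature, names)
```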

Documentation

See documentation pages.

Status

Active development (v0.x). APIs may evolve before v1.0.0.

See CHANGELOG.md for version history and updates.

Development Setup

Requirements

Environment Setup

git clone https://github.com/anpenta/mldebug
cd mldebug
direnv allow

Development Workflow

Tasks are managed via poe (available in the project environment via direnv).

Run tests

poe test

Run linting

poe lint

Check linting

poe lint-check

Run full test matrix (all Python versions)

poe test-all

Run full CI checks

poe lint-check && poe test-all

CI/CD

CI runs linting and the test suite across multiple Python versions. All pull requests must pass these checks before merging.

See CI workflow for details.

Contributing

We welcome contributions.

  1. Clone the repository
  2. Create a feature branch
  3. Make your changes
  4. Ensure all CI checks pass
  5. Open a pull request

Dependency Management

Dependencies are managed using uv and defined in pyproject.toml.

Citation

If you use mldebug in your work, please cite this software.

The preferred citation format is available in CITATION.cff or via GitHub's “Cite this repository” button.

License

See LICENSE.
