mldebug

A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.

Why mldebug

Machine learning systems often fail silently when their input data changes.

Common production issues include:

  • feature distribution drift
  • increasing missing values
  • unseen categorical values
  • training vs production mismatch

mldebug provides a unified way to detect these issues before they become model failures.
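To make two of the issues above concrete, here is a minimal NumPy-only sketch of a missing-value rate comparison and an unseen-category lookup. The helper names `missing_rate` and `unseen_categories` are hypothetical and are not part of mldebug's API; the sketch only illustrates the kind of comparison involved.

```python
import numpy as np


def missing_rate(values):
    """Fraction of NaN entries in a numeric array (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.isnan(values)))


def unseen_categories(reference, current):
    """Sorted categories present in `current` but absent from `reference`."""
    return np.setdiff1d(current, reference).tolist()


# Illustrative data: half of the current income values are missing.
ref_income = np.array([1000.0, 1200.0, 1100.0, 1050.0])
cur_income = np.array([900.0, np.nan, 850.0, np.nan])

ref_rate = missing_rate(ref_income)
cur_rate = missing_rate(cur_income)

# "DE" never appears in the reference countries.
new_values = unseen_categories(
    np.array(["ES", "ES", "FR"]),
    np.array(["ES", "DE", "DE"]),
)

print(ref_rate, cur_rate, new_values)
```

A rise in the missing rate or a non-empty list of unseen categories is the kind of signal a comparison between reference and current data is meant to surface.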

What it does

mldebug compares:

  • a reference dataset (e.g. training data)
  • a current dataset (e.g. production data)

It runs a suite of checks and returns a structured report of detected issues.

Installation

pip install mldebug

Quick Start

from mldebug import run_checks
import numpy as np

reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}

current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}

schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}

report = run_checks(reference=reference, current=current, schema=schema)

Inspect results

for issue in report.issues:
    print(issue)

Output:

[WARNING] psi_drift - country: PSI drift detected (18.0152)

Summary

print(report.summary())

Output:

{
  "total": 1,
  "by_severity": {
    "info": 0,
    "warning": 1,
    "critical": 0
  },
  "status": "issues_detected"
}
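The summary can drive a pass/fail decision, for example in a scheduled job. The sketch below assumes the summary is available as a plain dict with the keys shown above. The function `exit_code_from_summary` and its `fail_on` parameter are hypothetical and not part of mldebug.

```python
def exit_code_from_summary(summary, fail_on=("critical",)):
    """Return 1 if any issue has a severity listed in `fail_on`, else 0.

    Assumes `summary` has the shape printed above; missing keys count as zero.
    """
    by_severity = summary.get("by_severity", {})
    blocking = sum(by_severity.get(level, 0) for level in fail_on)
    return 1 if blocking > 0 else 0


# Literal copy of the summary shown above.
summary = {
    "total": 1,
    "by_severity": {"info": 0, "warning": 1, "critical": 0},
    "status": "issues_detected",
}

lenient = exit_code_from_summary(summary)  # only critical issues block
strict = exit_code_from_summary(summary, fail_on=("warning", "critical"))

print(lenient, strict)
```

Passing the result to `sys.exit` would let a pipeline step fail only at the severities you choose.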

Structured output

print(report.to_dict())

Output:

{
  "issues": [
    {
      "name": "psi_drift",
      "metric": "psi",
      "severity": "warning",
      "message": "country: PSI drift detected (18.0152)",
      "feature": "country",
      "value": 18.01521528247136,
      "threshold": 0.2
    }
  ]
}
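For readers unfamiliar with the `psi` metric, the population stability index over categories is commonly defined as the sum of `(q_i - p_i) * ln(q_i / p_i)`, where `p_i` and `q_i` are the reference and current proportions of category `i`. The sketch below is one conventional way to compute it and is not taken from mldebug's source. The function name `categorical_psi` and the `eps` clipping value are assumptions. With `eps=1e-8`, the Quick Start `country` data gives roughly 18.0152, which agrees with the value reported above, though the library's actual implementation may differ.

```python
import numpy as np


def categorical_psi(reference, current, eps=1e-8):
    """Population stability index between two categorical samples.

    Proportions are clipped at `eps` (an assumed smoothing choice) so that
    categories missing on one side do not produce log(0) or division by zero.
    """
    categories = np.union1d(reference, current)
    ref_p = np.array([np.mean(reference == c) for c in categories])
    cur_p = np.array([np.mean(current == c) for c in categories])
    ref_p = np.clip(ref_p, eps, None)
    cur_p = np.clip(cur_p, eps, None)
    return float(np.sum((cur_p - ref_p) * np.log(cur_p / ref_p)))


# Country columns from the Quick Start example.
reference = np.array(["ES", "ES", "FR"])
current = np.array(["ES", "DE", "DE"])

psi = categorical_psi(reference, current)  # about 18.0152 with eps=1e-8
print(round(psi, 4), psi > 0.2)
```

Most of the value comes from the two categories that appear on only one side (`FR` and `DE`), so the result is highly sensitive to the choice of `eps`.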

Available checks

mldebug exposes its available checks at runtime.

To list the checks in your installed version:

from mldebug import list_checks

checks = list_checks()
print(checks)

Output:

{
  "numeric": [
    "run_numeric_missing_value_check",
    "run_numeric_ks_test_check"
  ],
  "categorical": [
    "run_categorical_psi_drift_check"
  ]
}
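The name `run_numeric_ks_test_check` suggests a two-sample Kolmogorov-Smirnov test. As background, the KS statistic is the largest gap between the two empirical cumulative distribution functions. The NumPy sketch below computes only that statistic; `ks_statistic` is a hypothetical helper, and how mldebug implements or thresholds its check is not shown here.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over observed x."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Right-continuous empirical CDFs evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Age columns from the Quick Start example: the samples do not overlap.
ref_age = np.array([20, 21, 22])
cur_age = np.array([30, 35, 40])

stat = ks_statistic(ref_age, cur_age)
print(stat)
```

A statistic of 1.0 means the samples are completely separated. With only three values per side, the exact two-sided p-value for that outcome is 0.1, so a test at the 5% level would not flag it; larger samples are needed for the test to have power.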

Documentation

See the documentation pages.

Status

Active development (v0.x). APIs may evolve before v1.0.0.

See CHANGELOG.md for version history and updates.

Development Setup

Requirements

Environment Setup

git clone https://github.com/anpenta/mldebug
cd mldebug
direnv allow

Development Workflow

Tasks are run with poe, which direnv makes available in the project environment.

Run tests

poe test

Run linting

poe lint

Check linting

poe lint-check

Run full CI parity checks

poe test-all
poe lint-check-all

CI/CD

CI runs tests and linting across multiple Python versions. All pull requests must pass these checks before merging.

See CI workflow for details.

Contributing

We welcome contributions.

  1. Clone the repository
  2. Create a feature branch
  3. Make your changes
  4. Ensure all CI checks pass
  5. Open a pull request

Dependency Management

Dependencies are managed using uv and defined in pyproject.toml.

Citation

If you use mldebug in your work, please cite this software.

Preferred citation format is available in CITATION.cff or via GitHub's “Cite this repository” button.

License

See LICENSE.
