A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.

mldebug

Why mldebug

Machine learning systems can fail silently when their input data changes.

Common production issues include:

  • feature distribution drift
  • increasing missing values
  • unseen categorical values
  • training vs production mismatch

mldebug provides a unified way to detect these issues before they become model failures.
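To make two of the issues above concrete, here is a minimal pure-Python sketch of a rising missing-value rate and of unseen categorical values. It is illustrative only and does not show how mldebug implements its checks; the helper names `missing_rate` and `unseen_categories` and the income values are hypothetical.

```python
import math


def missing_rate(values):
    """Fraction of entries that are None or NaN (hypothetical helper)."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def unseen_categories(reference, current):
    """Categories present in current but absent from reference."""
    return sorted(set(current) - set(reference))


# Made-up data for illustration.
ref_income = [1000.0, 1200.0, 1100.0, 950.0]
cur_income = [900.0, None, float("nan"), 850.0]

missing_increase = missing_rate(cur_income) - missing_rate(ref_income)
new_countries = unseen_categories(["ES", "ES", "FR"], ["ES", "DE", "DE"])

print(missing_increase)  # 0.5
print(new_countries)     # ['DE']
```

Neither condition raises an error on its own, which is why such changes tend to go unnoticed without an explicit comparison against a reference dataset.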

What it does

mldebug compares:

  • a reference dataset (e.g. training data)
  • a current dataset (e.g. production data)

It runs a suite of checks and returns a structured report of detected issues.

Installation

pip install mldebug

Quick Start

from mldebug import run_checks
import numpy as np

reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}

current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}

schema = {
    "age": "numeric",
    "income": "numeric",
    "country": "categorical",
}

report = run_checks(reference=reference, current=current, schema=schema)
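For datasets with many columns, writing the schema by hand can get tedious. The sketch below derives the same `"numeric"` / `"categorical"` labels used above from the column values. `infer_schema` is a hypothetical helper, not part of mldebug, and it uses plain Python lists to stay dependency-free.

```python
import numbers


def infer_schema(dataset):
    """Label each column "numeric" or "categorical" from its values.

    Hypothetical helper: a column is numeric only if every value is a
    number (booleans are treated as categorical). An empty column is
    labeled numeric because all() of an empty sequence is True.
    """
    schema = {}
    for name, values in dataset.items():
        is_numeric = all(
            isinstance(v, numbers.Number) and not isinstance(v, bool)
            for v in values
        )
        schema[name] = "numeric" if is_numeric else "categorical"
    return schema


example = {
    "age": [20, 21, 22],
    "income": [1000, 1200, 1100],
    "country": ["ES", "ES", "FR"],
}
inferred = infer_schema(example)
print(inferred)
```

An inferred schema is only a starting point: integer-coded categories such as postal codes would be labeled numeric and need a manual override.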

Inspect detected issues

Human-readable output

for issue in report.issues:
    print(issue)

Output:

[WARNING] psi_drift - country: PSI drift detected (18.0152)

Summary

print(report.summary())

Output:

{
  "total": 1,
  "by_severity": {
    "info": 0,
    "warning": 1,
    "critical": 0
  },
  "status": "issues_detected"
}
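A summary of this shape lends itself to gating a pipeline step. The following is a minimal sketch that assumes the summary is available as a plain dict with the keys shown above (if it arrives as a JSON string, parse it first). `should_block` and `SEVERITY_ORDER` are hypothetical names, not part of mldebug.

```python
# Severity levels in ascending order, taken from the summary keys above.
SEVERITY_ORDER = ["info", "warning", "critical"]


def should_block(summary, fail_on="critical"):
    """Return True if any issue at or above `fail_on` severity was counted."""
    threshold = SEVERITY_ORDER.index(fail_on)
    counts = summary["by_severity"]
    return any(
        counts.get(level, 0) > 0 for level in SEVERITY_ORDER[threshold:]
    )


# Same shape as the summary shown above, written as a dict literal.
summary = {
    "total": 1,
    "by_severity": {"info": 0, "warning": 1, "critical": 0},
    "status": "issues_detected",
}

block_on_critical = should_block(summary)
block_on_warning = should_block(summary, fail_on="warning")
print(block_on_critical, block_on_warning)  # False True
```

Choosing `fail_on` per environment lets a training pipeline stop on warnings while a production monitor only pages on critical issues.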

Structured output

print(report.to_dict())

Output:

{
  "issues": [
    {
      "name": "psi_drift",
      "metric": "psi",
      "severity": "warning",
      "message": "country: PSI drift detected (18.0152)",
      "feature": "country",
      "value": 18.01521528247136,
      "threshold": 0.2
    }
  ]
}
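The `psi` metric refers to the Population Stability Index, and 0.2 is a commonly used rule-of-thumb cutoff for a significant shift. As a rough guide to where such a value comes from, the sketch below computes PSI for a categorical feature using the common definition, the sum over categories of (current - reference) * ln(current / reference). The function name `categorical_psi` and the `eps` floor are assumptions for illustration; mldebug's own binning and smoothing may differ.

```python
import math
from collections import Counter


def categorical_psi(reference, current, eps=1e-8):
    """PSI = sum over categories of (cur - ref) * ln(cur / ref)."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    psi = 0.0
    for category in set(ref_counts) | set(cur_counts):
        # Floor each proportion at eps so the logarithm stays finite
        # when a category is missing from one of the datasets.
        ref_p = max(ref_counts[category] / len(reference), eps)
        cur_p = max(cur_counts[category] / len(current), eps)
        psi += (cur_p - ref_p) * math.log(cur_p / ref_p)
    return psi


# The "country" column from the Quick Start.
psi = categorical_psi(["ES", "ES", "FR"], ["ES", "DE", "DE"])
print(round(psi, 4))  # 18.0152
```

Under these assumptions the sketch lands on the value shown in the report. Most of that magnitude comes from categories that are absent from one dataset ("DE" in the reference, "FR" in the current data), so the result is highly sensitive to the choice of `eps`.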

Logs

for line in report.to_logs():
    print(line)

Output:

[WARNING] psi_drift - country: PSI drift detected (18.0152)
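Instead of printing, lines in this format can be forwarded to Python's standard `logging` module at a matching level. The sketch below assumes every line follows the `[LEVEL] message` layout shown above; `parse_log_line`, `LEVELS`, and the logger name are hypothetical.

```python
import logging
import re

# Matches "[LEVEL] rest of line", as in the output shown above.
LINE_PATTERN = re.compile(r"^\[(?P<level>[A-Z]+)\]\s+(?P<body>.*)$")

# Severity labels mapped to standard logging levels.
LEVELS = {
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "CRITICAL": logging.CRITICAL,
}


def parse_log_line(line):
    """Split "[LEVEL] body" into (logging level, body).

    Lines that do not match the assumed layout fall back to INFO.
    """
    match = LINE_PATTERN.match(line)
    if match is None:
        return logging.INFO, line
    return LEVELS.get(match.group("level"), logging.INFO), match.group("body")


logger = logging.getLogger("mldebug_example")
level, body = parse_log_line(
    "[WARNING] psi_drift - country: PSI drift detected (18.0152)"
)
logger.log(level, body)
```

Routing through `logging` means existing handlers, formatters, and alerting rules apply to drift reports without extra plumbing.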

Supported Checks

mldebug runs a combination of:

Numeric features

Categorical features

Documentation

See documentation pages.

Status

Active development (v0.x). APIs may evolve before v1.0.0.

See CHANGELOG.md for version history and updates.

Development Setup

Requirements

Environment Setup

git clone https://github.com/anpenta/mldebug
cd mldebug
direnv allow

Development Workflow

Tasks are managed via poe (available in the project environment via direnv).

Run tests

poe test

Run linting

poe lint

Check linting

poe lint-check

Run full CI parity checks

poe test-all
poe lint-check-all

CI/CD

CI runs the test suite and linting across multiple Python versions. All pull requests must pass these checks before merging.

See CI workflow for details.

Contributing

We welcome contributions.

  1. Clone the repository
  2. Create a feature branch
  3. Make your changes
  4. Ensure all CI checks pass
  5. Open a pull request

Dependency Management

Dependencies are managed using uv and defined in pyproject.toml.

License

See LICENSE.
