
mldebug


A lightweight Python package for comparing datasets and detecting unexpected changes in machine learning systems.

Why mldebug

Machine learning systems often degrade silently when input data changes, even when models and code remain unchanged.

These issues are typically caused by changes in input data such as:

  • feature distribution drift
  • increasing missing values
  • unseen categorical values
  • mismatch between training and production data

mldebug makes these issues visible early by comparing datasets in a lightweight, schema-driven way and detecting unexpected changes before they impact model performance.
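To illustrate the kind of check involved, here is a minimal Population Stability Index (PSI) computation for a categorical feature. This is a generic sketch of the metric, not mldebug's internal implementation, and the smoothing constant `eps` is an assumption:

```python
import numpy as np

def psi(reference, current, eps=1e-4):
    """Population Stability Index between two categorical samples.

    A generic illustration of a drift metric; not mldebug's implementation.
    """
    categories = np.union1d(reference, current)
    p = np.array([np.mean(reference == c) for c in categories])
    q = np.array([np.mean(current == c) for c in categories])
    # Smooth zero proportions so the log ratio stays finite.
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum((q - p) * np.log(q / p)))

reference = np.array(["ES", "ES", "FR"])
current = np.array(["ES", "DE", "DE"])
print(psi(reference, current))  # well above the common 0.2 drift threshold
```

A PSI near zero means the two distributions agree; values above roughly 0.2 are commonly treated as significant drift.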

When To Use mldebug

Use mldebug for fast validation of ML datasets, especially in CI or pre-deployment checks.

It is a good fit for:

  • CI/CD validation pipelines
  • pre-deployment data checks
  • schema-based comparison between training and production data
  • lightweight integration into existing ML workflows

Not intended for:

  • full ML observability platforms
  • real-time production monitoring
  • long-term dashboards or alerting infrastructure

What It Does

mldebug compares:

  • a reference dataset (e.g. training data)
  • a current dataset (e.g. production data)

It runs a suite of checks and returns a structured report of detected issues.

Installation

pip install mldebug

Quick Start

Example Usage

from mldebug import run_checks, FeatureType
import numpy as np

reference = {
    "age": np.array([20, 21, 22]),
    "income": np.array([1000, 1200, 1100]),
    "country": np.array(["ES", "ES", "FR"]),
}

current = {
    "age": np.array([30, 35, 40]),
    "income": np.array([900, 800, 850]),
    "country": np.array(["ES", "DE", "DE"]),
}

schema = {
    "age": FeatureType.NUMERIC,
    "income": FeatureType.NUMERIC,
    "country": FeatureType.CATEGORICAL,
}

report = run_checks(reference=reference, current=current, schema=schema)

Output Inspection

Inspect Results

for issue in report.issues:
    print(issue)
[WARNING] variance_drift - age: variance drift detected (ratio=25.0000, threshold=2.0)
[WARNING] range_anomaly - age: 3 values outside [20.0000, 22.0000]
[WARNING] variance_drift - income: variance drift detected (ratio=0.2500, threshold=2.0)
[WARNING] range_anomaly - income: 3 values outside [1000.0000, 1200.0000]
[WARNING] psi_drift - country: PSI drift detected (18.0152)
[WARNING] unseen_categories - country: 1 unseen categories detected (e.g. ['DE'])
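The variance_drift ratio reported for age can be reproduced by hand, assuming the check compares population variances (ddof=0), which matches the reported value:

```python
import numpy as np

ref_age = np.array([20, 21, 22])
cur_age = np.array([30, 35, 40])

# Population variances: 2/3 for reference, 50/3 for current.
ratio = np.var(cur_age) / np.var(ref_age)
print(ratio)  # ~25.0, the ratio reported for variance_drift on age
```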

Summary

print(report.summary())
{
  "total": 6,
  "by_severity": {
    "info": 0,
    "warning": 6,
    "critical": 0
  },
  "status": "issues_detected"
}
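In a CI job, the summary can serve as a simple pass/fail gate. This sketch hard-codes the summary dict shown above instead of calling run_checks, so it stands alone:

```python
# Summary dict as printed above; in practice this comes from report.summary().
summary = {
    "total": 6,
    "by_severity": {"info": 0, "warning": 6, "critical": 0},
    "status": "issues_detected",
}

# Fail the job if any issue was detected.
exit_code = 0 if summary["total"] == 0 else 1
print(f"mldebug check exit code: {exit_code}")
```

In a real pipeline you would pass `exit_code` to `sys.exit()` so the CI step fails when issues are present.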

Structured Output

print(report.to_dict())
{
  "issues": [
    {
      "name": "variance_drift",
      "metric": "variance_ratio",
      "severity": "warning",
      "message": "age: variance drift detected (ratio=25.0000, threshold=2.0)",
      "feature": "age",
      "value": 25.000000000000004,
      "threshold": 2.0
    },
    {
      "name": "range_anomaly",
      "metric": "out_of_range_count",
      "severity": "warning",
      "message": "age: 3 values outside [20.0000, 22.0000]",
      "feature": "age",
      "value": 3.0,
      "threshold": 0.0
    },
    {
      "name": "variance_drift",
      "metric": "variance_ratio",
      "severity": "warning",
      "message": "income: variance drift detected (ratio=0.2500, threshold=2.0)",
      "feature": "income",
      "value": 0.25,
      "threshold": 2.0
    },
    {
      "name": "range_anomaly",
      "metric": "out_of_range_count",
      "severity": "warning",
      "message": "income: 3 values outside [1000.0000, 1200.0000]",
      "feature": "income",
      "value": 3.0,
      "threshold": 0.0
    },
    {
      "name": "psi_drift",
      "metric": "psi",
      "severity": "warning",
      "message": "country: PSI drift detected (18.0152)",
      "feature": "country",
      "value": 18.01521528247136,
      "threshold": 0.2
    },
    {
      "name": "unseen_categories",
      "metric": "unseen_category_count",
      "severity": "warning",
      "message": "country: 1 unseen categories detected (e.g. ['DE'])",
      "feature": "country",
      "value": 1.0,
      "threshold": 0.0
    }
  ]
}
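Because to_dict() returns plain Python types, the report can be serialized and archived as a build artifact. A sketch using one issue in the shape shown above:

```python
import json

# A single issue in the shape produced by report.to_dict() above.
report_dict = {
    "issues": [
        {
            "name": "psi_drift",
            "metric": "psi",
            "severity": "warning",
            "message": "country: PSI drift detected (18.0152)",
            "feature": "country",
            "value": 18.01521528247136,
            "threshold": 0.2,
        }
    ]
}

payload = json.dumps(report_dict, indent=2)
# In CI, write this to a file and upload it as a job artifact.
print(payload)
```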

Scoring

The score() method returns a dataset quality score based only on feature-level issues.

System-level issues (e.g. schema errors, missing features) are reported but excluded from scoring.

print(report.score())
{
  "overall_score": 70.0,
  "feature_scores": {
    "age": 70.0,
    "income": 70.0,
    "country": 70.0
  },
  "status": "warning",
  "system_issue_count": 0
}

Interpretation:

  • 100 = clean data
  • 80-99 = minor issues
  • 50-79 = degraded data quality
  • < 50 = severe issues
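The bands above can be expressed as a small helper. This is a hypothetical convenience for readability, not part of mldebug's API:

```python
def interpret_score(score: float) -> str:
    """Map an overall_score to the documented quality bands."""
    if score >= 100:
        return "clean data"
    if score >= 80:
        return "minor issues"
    if score >= 50:
        return "degraded data quality"
    return "severe issues"

print(interpret_score(70.0))  # degraded data quality
```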

Documentation

See documentation pages.

Status

Active development (v0.x). APIs may evolve before v1.0.0.

See CHANGELOG.md for version history.

Development

Requirements

Setup

git clone https://github.com/anpenta/mldebug
cd mldebug
uv sync

Workflow

All tasks are managed via poe.

Run Tests

uv run poe test

Run Linting

uv run poe lint

Check Linting

uv run poe lint-check

Dependency Management

Dependencies are managed using uv and defined in pyproject.toml.

For local development:

uv sync

This installs dependencies and updates the environment as needed.

For CI and reproducible environments:

uv sync --frozen

This ensures the environment exactly matches the lock file without modifying it.

CI

This project uses CI to ensure:

  • code quality (linting and type checking)
  • correctness across supported Python versions
  • test coverage thresholds
  • reproducible builds
  • automated publishing on release tags

Local development runs against the active Python environment only.

See CI workflow for details.

Contributing

We welcome contributions.

  1. Clone the repository
  2. Create a feature branch
  3. Make your changes
  4. Ensure all CI checks pass
  5. Open a pull request

Citation

If you use mldebug in your work, please cite this software.

Preferred citation format is available in CITATION.cff or via GitHub's “Cite this repository” button.

License

See LICENSE.
