
Lightweight ML dataset health checks and recommendations


mlcheck

mlcheck is a lightweight Python library for quickly auditing datasets and getting actionable insights before training machine learning models. It detects common data problems (missing values, class imbalance, high cardinality, potential leakage, etc.), scores dataset health, and produces human-friendly reports and recommendations.
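To make the listed problems concrete, here is a minimal standard-library sketch of how such checks might be detected. It is not mlcheck's implementation. The function name find_issues, the input shape (a dict mapping column name to a list of values, with None marking a missing value), the thresholds, and the leakage heuristic (a feature that exactly reproduces the target) are all illustrative assumptions.

```python
from collections import Counter


def find_issues(columns, target=None, max_unique_ratio=0.9, min_class_share=0.25):
    """Hypothetical sketch: return (kind, column, detail) tuples.

    `columns` maps a column name to a list of values; None marks a missing value.
    """
    issues = []
    for name, values in columns.items():
        n = len(values)
        # Missing values: count None entries in the column
        missing = sum(v is None for v in values)
        if missing:
            issues.append(("missing_values", name, f"{missing}/{n} missing"))
        present = [v for v in values if v is not None]
        # High cardinality: a string feature where almost every value is distinct
        if name != target and present and all(isinstance(v, str) for v in present):
            ratio = len(set(present)) / len(present)
            if ratio > max_unique_ratio:
                issues.append(("high_cardinality", name, f"{ratio:.0%} unique"))
        # Potential leakage (naive): a feature that exactly reproduces the target
        if target is not None and name != target and values == columns[target]:
            issues.append(("potential_leakage", name, f"identical to {target!r}"))
    # Class imbalance: the rarest target class falls below a minimum share
    if target is not None:
        counts = Counter(v for v in columns[target] if v is not None)
        total = sum(counts.values())
        if len(counts) > 1:
            share = min(counts.values()) / total
            if share < min_class_share:
                issues.append(("class_imbalance", target, f"minority share {share:.0%}"))
    return issues


data = {
    "age": [20, 21, None, 23, 24],
    "user_id": ["a", "b", "c", "d", "e"],
    "label": [0, 0, 0, 0, 1],
    "label_copy": [0, 0, 0, 0, 1],
}
for issue in find_issues(data, target="label"):
    print(issue)
```

Real leakage usually takes subtler forms than an identical column (for example, a feature computed after the outcome is known), so the equality test here is only the simplest possible case.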

Features

  • Run a suite of data health checks over a pandas DataFrame
  • Produce a concise human-readable summary and structured report
  • Compute a dataset health score
  • Generate recommendations mapping issues to remediation steps
  • Export reports as JSON, Markdown, or HTML and write them to disk
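The health score and recommendation features can be pictured with a toy model: start at 100, subtract a penalty for each detected issue, and look up one remediation step per issue kind. The penalty weights, the fallback values, and the remediation wording below are assumptions for illustration, not mlcheck's actual scoring rules.

```python
# Hypothetical penalty per issue kind (not mlcheck's real weights)
PENALTIES = {
    "missing_values": 10,
    "high_cardinality": 5,
    "class_imbalance": 15,
    "potential_leakage": 30,
}

# Hypothetical mapping from issue kind to a remediation step
REMEDIATIONS = {
    "missing_values": "Impute or drop rows/columns with missing values.",
    "high_cardinality": "Group rare categories or use target/hash encoding.",
    "class_imbalance": "Use stratified splits, class weights, or resampling.",
    "potential_leakage": "Drop features derived from the target before training.",
}


def health_score(issue_kinds):
    """Start at 100, subtract a penalty per issue (5 if unknown), clamp to 0-100."""
    score = 100 - sum(PENALTIES.get(kind, 5) for kind in issue_kinds)
    return max(0, min(100, score))


def recommendations(issue_kinds):
    """Return one remediation per distinct issue kind, in first-seen order."""
    unique_kinds = list(dict.fromkeys(issue_kinds))
    return [REMEDIATIONS.get(kind, "Review this issue manually.") for kind in unique_kinds]


print(health_score(["missing_values", "class_imbalance"]))
print(recommendations(["missing_values", "missing_values", "class_imbalance"]))
```

Clamping keeps the score inside the 0–100 range even when many severe issues are found.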

Installation

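Install the released package from PyPI (the distribution is named mlchecklib; the import name is mlcheck):

python -m pip install mlchecklib

Install from source (development):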

python -m pip install -r requirements-dev.txt
python -m pip install -e .
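After installing, the standard library's importlib.metadata offers a quick way to confirm which version is present. This sketch assumes the installed distribution is named mlchecklib, as in the release files on this page; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="mlchecklib"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # a version string if installed, otherwise None
```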

Quickstart

Basic usage with a pandas DataFrame:

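import os

import pandas as pd
from mlcheck import inspect

# Toy data: the "label" column is deliberately imbalanced (four 0s, one 1)
df = pd.DataFrame({
    "age": [20, 21, 22, 23, 24],
    "label": [0, 0, 0, 0, 1],
})

report = inspect(df, target="label")

report.summary()               # print a human-readable summary
print(report.health_score())   # numeric health score (0–100)
print(report.to_markdown())    # Markdown-formatted report
print(report.to_html())        # HTML-formatted report

os.makedirs("reports", exist_ok=True)  # make sure the output directory exists
report.to_file("reports/report.md")    # format inferred from the .md extension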

API (high level)

  • inspect(df, target=None) — run health checks and return an MLReport instance
  • MLReport.summary() — print a short human-readable summary
  • MLReport.to_dict() — get a Python dict representation
  • MLReport.to_json() / to_markdown() / to_html() — renderers for different formats
  • MLReport.to_file(path, format=None) — write report to disk (infers format from extension)
  • MLReport.download_summary(path="mlcheck_summary.txt") — save the textual summary to disk
  • MLReport.health_score() — compute numeric health score (0–100)
  • MLReport.show_issues() / MLReport.issues() — inspect detected issues
  • MLReport.recommendations() — get remediation suggestions
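MLReport.to_file(path, format=None) is described as inferring the output format from the file extension. The following is a hypothetical sketch of that behavior, not mlcheck's code: the extension table, the report dict layout (a health_score plus a list of issue strings), and the helpers infer_format, render, and write_report are all assumptions.

```python
import html
import json
from pathlib import Path

# Hypothetical mapping from file extension to output format
EXTENSION_FORMATS = {
    ".json": "json",
    ".md": "markdown",
    ".markdown": "markdown",
    ".html": "html",
    ".htm": "html",
}


def infer_format(path, format=None):
    """Use an explicit format if given, otherwise infer it from the file suffix."""
    if format is not None:
        return format
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"cannot infer a report format from {suffix!r}")
    return EXTENSION_FORMATS[suffix]


def render(report, fmt):
    """Render a report dict of the assumed form {"health_score": int, "issues": [str]}."""
    if fmt == "json":
        return json.dumps(report, indent=2)
    if fmt == "markdown":
        lines = ["# Dataset report", "", f"Health score: {report['health_score']}", ""]
        lines += [f"- {issue}" for issue in report["issues"]]
        return "\n".join(lines) + "\n"
    if fmt == "html":
        items = "".join(f"<li>{html.escape(issue)}</li>" for issue in report["issues"])
        return (
            "<h1>Dataset report</h1>"
            f"<p>Health score: {report['health_score']}</p>"
            f"<ul>{items}</ul>"
        )
    raise ValueError(f"unsupported format {fmt!r}")


def write_report(report, path, format=None):
    """Render the report and write it, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render(report, infer_format(path, format)), encoding="utf-8")
    return path
```

Whether the real to_file creates missing parent directories is not stated, so this sketch does so explicitly; an explicit format argument always wins over the extension.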

Contributing

Contributions are welcome. Please follow these steps:

  1. Fork the repository and create a feature branch.
  2. Run tests and linters locally.
  3. Open a pull request describing your change.

See CONTRIBUTING.md for more details.

License

This project is licensed under the terms in the LICENSE file.





Download files


Source Distribution

mlchecklib-0.1.0.tar.gz (17.2 kB)

Built Distribution


mlchecklib-0.1.0-py3-none-any.whl (20.8 kB)

File details

Details for the file mlchecklib-0.1.0.tar.gz.

File metadata

  • Download URL: mlchecklib-0.1.0.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for mlchecklib-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ea3926e07f92da05107dc2473241f830703653563a68c6f0b1619858e5d1336b
MD5 da3a2dea73a0eaeb851f8a8f60dc4123
BLAKE2b-256 64ebf3de13624b80d8a31d3e4bef8e950ee11d1d149b5f01388f239d59bf0b1b


File details

Details for the file mlchecklib-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mlchecklib-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for mlchecklib-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7296c272283fa1135cf4c283ecd6792588318e1e4a0f19275a0126d3b597f57b
MD5 9ad8f484cf38988cc9fa6608a019ee46
BLAKE2b-256 0f789fcfd6d19c862b1eece0416ff4c9d4a801e78a301feca17146e8b4c024f3
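The SHA256 digests above let you check a downloaded file before installing it. A minimal sketch using hashlib from the standard library; the helper names are hypothetical, and the commented call at the end assumes the wheel was saved in the current directory.

```python
import hashlib

# Published SHA256 digest of mlchecklib-0.1.0-py3-none-any.whl (from this page)
WHEEL_SHA256 = "7296c272283fa1135cf4c283ecd6792588318e1e4a0f19275a0126d3b597f57b"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(actual_hex, expected_hex):
    """Compare hex digests, ignoring surrounding whitespace and letter case."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


# Assumes the wheel was downloaded into the current directory:
# ok = digest_matches(sha256_of("mlchecklib-0.1.0-py3-none-any.whl"), WHEEL_SHA256)
```

pip can perform the same comparison at install time in hash-checking mode, for example with a requirements line such as mlchecklib==0.1.0 --hash=sha256:<digest>.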

