Lightweight ML dataset health checks and recommendations
mlcheck
mlcheck is a lightweight Python library for quickly auditing datasets and getting actionable insights
before training machine learning models. It detects common data problems (missing values, class imbalance,
high cardinality, potential leakage, etc.), scores dataset health, and produces human-friendly reports and
recommendations.
Features
- Run a suite of data health checks over a pandas DataFrame
- Produce a concise human-readable summary and structured report
- Compute a dataset health score
- Generate recommendations mapping issues to remediation steps
- Export reports as JSON, Markdown, or HTML and write them to disk
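To make two of the checks named above concrete, here is a minimal standard-library sketch of a missing-value ratio and a class-imbalance measure. It is not mlcheck's implementation: the function names `missing_ratio` and `minority_share` are hypothetical, and plain lists with `None` stand in for DataFrame columns.

```python
from collections import Counter


def missing_ratio(values):
    """Fraction of entries that are None (a stand-in for missing values)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def minority_share(labels):
    """Share of the rarest class among the non-missing labels."""
    counts = Counter(v for v in labels if v is not None)
    if not counts:
        return 0.0
    return min(counts.values()) / sum(counts.values())


# Illustrative columns only; the None entry is added to show a missing value.
age = [20, 21, None, 23, 24]
label = [0, 0, 0, 0, 1]

print(missing_ratio(age))     # 0.2
print(minority_share(label))  # 0.2
```

A real check would compare values like these against a threshold before reporting an issue; any such thresholds are left out here because the source does not state them.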
Installation
Install from source (development):
python -m pip install -r requirements-dev.txt
python -m pip install -e .
Quickstart
Basic usage with a pandas DataFrame:
import pandas as pd
from mlcheck import inspect

df = pd.DataFrame({
    "age": [20, 21, 22, 23, 24],
    "label": [0, 0, 0, 0, 1],
})

report = inspect(df, target="label")
report.summary()                     # print a human-readable summary
print(report.health_score())         # numeric health score (0–100)
print(report.to_markdown())          # Markdown-formatted report
print(report.to_html())              # HTML-formatted report
report.to_file("reports/report.md")  # write to disk; format inferred from the extension
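The health score is documented only as a number between 0 and 100. As a rough illustration of how a score of that kind could be derived, the sketch below starts from 100 and subtracts a penalty per detected issue. The severity labels, the penalty weights, and the function itself are assumptions made for this example, not mlcheck's actual scoring rules.

```python
# Hypothetical severity weights; mlcheck's real scoring rules may differ.
SEVERITY_PENALTY = {"low": 5, "medium": 15, "high": 30}


def health_score(issue_severities):
    """Start from 100, subtract a penalty per issue, and clamp at 0."""
    penalty = sum(SEVERITY_PENALTY[s] for s in issue_severities)
    return max(0, 100 - penalty)


print(health_score([]))               # 100
print(health_score(["high", "low"]))  # 65
print(health_score(["high"] * 4))     # 0
```

Clamping keeps the result inside the stated range even when many issues pile up.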
API (high level)
- inspect(df, target=None): run health checks and return an MLReport instance
- MLReport.summary(): print a short human-readable summary
- MLReport.to_dict(): get a Python dict representation
- MLReport.to_json() / to_markdown() / to_html(): renderers for different formats
- MLReport.to_file(path, format=None): write report to disk (infers format from extension)
- MLReport.download_summary(path="mlcheck_summary.txt"): save the textual summary to disk
- MLReport.health_score(): compute numeric health score (0–100)
- MLReport.show_issues() / MLReport.issues(): inspect detected issues
- MLReport.recommendations(): get remediation suggestions
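The format inference described for to_file can be pictured with a small standalone sketch. The helper name `infer_format`, the extension table, and the error raised for unknown extensions are all assumptions for illustration; the library's own behavior for unrecognized extensions is not documented here.

```python
from pathlib import Path

# Assumed mapping from file extension to the three documented export formats.
_EXT_TO_FORMAT = {".json": "json", ".md": "markdown", ".html": "html"}


def infer_format(path, format=None):
    """Return the explicit format if given, else guess it from the file extension."""
    if format is not None:
        return format
    suffix = Path(path).suffix.lower()
    if suffix not in _EXT_TO_FORMAT:
        raise ValueError(f"cannot infer report format from extension {suffix!r}")
    return _EXT_TO_FORMAT[suffix]


print(infer_format("reports/report.md"))          # markdown
print(infer_format("summary.txt", format="json"))  # json
```

Passing format explicitly takes precedence over the extension, which matches the format=None default in the signature above.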
Contributing
Contributions are welcome. Please follow these steps:
- Fork the repository and create a feature branch.
- Run tests and linters locally.
- Open a pull request describing your change.
See CONTRIBUTING.md for more details.
License
This project is licensed under the terms in the LICENSE file.
File details
Details for the file mlchecklib-0.1.0.tar.gz.
File metadata
- Download URL: mlchecklib-0.1.0.tar.gz
- Upload date:
- Size: 17.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea3926e07f92da05107dc2473241f830703653563a68c6f0b1619858e5d1336b |
| MD5 | da3a2dea73a0eaeb851f8a8f60dc4123 |
| BLAKE2b-256 | 64ebf3de13624b80d8a31d3e4bef8e950ee11d1d149b5f01388f239d59bf0b1b |
File details
Details for the file mlchecklib-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mlchecklib-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7296c272283fa1135cf4c283ecd6792588318e1e4a0f19275a0126d3b597f57b |
| MD5 | 9ad8f484cf38988cc9fa6608a019ee46 |
| BLAKE2b-256 | 0f789fcfd6d19c862b1eece0416ff4c9d4a801e78a301feca17146e8b4c024f3 |