datascore

ML readiness scoring for tabular datasets.

Point it at a DataFrame and get a structured report that tells you whether your data is ready for ML training and, if not, exactly why and in what order to fix it.

Install

pip install datascore

Usage

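import seaborn as sns
from datascore import score

# Load the Titanic sample dataset via seaborn (fetched on first use)
df = sns.load_dataset("titanic")
# Score the DataFrame, treating "survived" as the prediction target
report = score(df, target="survived")
# Display the structured report
report.show()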

Output

datascore Report

Rows: 891 | Features: 15 | Target: survived
Score: 45/100 — NOT READY

BLOCKERS

- age: 19.9% missing values
- deck: 77.2% missing values

WARNINGS

- Missing values detected: 6.5% overall
- 107 duplicate rows detected
- High skew in sibsp: 3.6891
- High skew in parch: 2.7445
- High skew in fare: 4.7793

INFO

- Outliers in age: 11 rows
- Outliers in sibsp: 46 rows
- Outliers in parch: 213 rows
- Outliers in fare: 116 rows
- No constant features detected
- No infinite values detected
- Class balance: 62/38

Save report to markdown

report.save("report.md")

What it checks

Category	Checks
Completeness	Missing values, high missing rate per column (>5%)
Integrity	Duplicate rows, constant features, infinite values
ML Readiness	Class imbalance, target leakage risk, high cardinality categoricals
Distribution	Skew per numerical column, outliers via IQR
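
To make the table concrete, here is a minimal pandas sketch of how several of these checks could be approximated by hand. It is not datascore's implementation: the function name `quick_checks`, its return keys, and the 1.5 × IQR outlier fence are assumptions made for illustration (only the 5% missing-rate threshold comes from the table above). The class imbalance, leakage, and cardinality checks are left out.

```python
import numpy as np
import pandas as pd


def quick_checks(df: pd.DataFrame, missing_threshold: float = 0.05) -> dict:
    """Rough, pandas-only approximations of some checks (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")

    # Completeness: fraction of missing values per column, keeping only
    # the columns above the threshold.
    missing_rate = df.isna().mean()
    high_missing = missing_rate[missing_rate > missing_threshold].round(3).to_dict()

    # Integrity: fully duplicated rows, single-valued columns, infinite values.
    duplicate_rows = int(df.duplicated().sum())
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    infinite_values = int(
        np.isinf(numeric.to_numpy(dtype=float, na_value=np.nan)).sum()
    )

    # Distribution: sample skewness and IQR-based outlier counts per numeric
    # column. The 1.5 * IQR fence is an assumption, not a documented setting.
    skew = numeric.skew().round(4).to_dict()
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    outliers = outside.sum().astype(int).to_dict()

    return {
        "high_missing": high_missing,
        "duplicate_rows": duplicate_rows,
        "constant_cols": constant_cols,
        "infinite_values": infinite_values,
        "skew": skew,
        "outliers": outliers,
    }
```

Calling `quick_checks(df)` on any DataFrame returns a plain dictionary, which makes it easy to compare against the lines of a datascore report. The numbers may differ if datascore uses other thresholds or definitions.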

Scoring

The score starts at 100. Each blocker deducts 15 points and each warning deducts 5. In the Titanic report above, 2 blockers and 5 warnings give 100 - 30 - 25 = 45.

Score	Verdict
80–100	READY
50–79	NEEDS WORK
0–49	NOT READY
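
The scoring rule and verdict bands can be sketched in a few lines. This is an illustration of the arithmetic described above, not datascore's own code: the function name `readiness_score` is hypothetical, and clamping the score at 0 is an assumption inferred from the 0–49 band.

```python
def readiness_score(blockers: int, warnings: int) -> tuple[int, str]:
    """Apply the documented deductions and map the result to a verdict."""
    raw = 100 - 15 * blockers - 5 * warnings
    score = max(0, raw)  # assumption: the score never drops below 0

    if score >= 80:
        verdict = "READY"
    elif score >= 50:
        verdict = "NEEDS WORK"
    else:
        verdict = "NOT READY"
    return score, verdict
```

With the Titanic counts (2 blockers, 5 warnings), `readiness_score(2, 5)` returns `(45, "NOT READY")`, which matches the report shown in the Output section.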

Why not Great Expectations or Pandera?

Those tools validate data against rules you define upfront.

datascore requires no configuration: it tells you what the problems are without you having to know what to look for first.

Assessment, not validation.

License

MIT

Project details

Release history

0.2.0 (this version): May 31, 2026
0.1.0: May 9, 2026

Download files

Source Distribution

datascore-0.2.0.tar.gz (7.3 kB)

Uploaded May 31, 2026 Source

Built Distribution

datascore-0.2.0-py3-none-any.whl (7.5 kB)

Uploaded May 31, 2026 Python 3

File details

Details for the file datascore-0.2.0.tar.gz.

File metadata

Download URL: datascore-0.2.0.tar.gz
Upload date: May 31, 2026
Size: 7.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for datascore-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`aa4cfd7f3e1dfd1185a021789416fa16f882d427058a829c2ee2487407aba477`
MD5	`8ae7a7774dff685dad95224c8581f4c3`
BLAKE2b-256	`d3765cf60c00fac2fe59dddf663ae088973e48c280a80509364405c686483792`

File details

Details for the file datascore-0.2.0-py3-none-any.whl.

File metadata

Download URL: datascore-0.2.0-py3-none-any.whl
Upload date: May 31, 2026
Size: 7.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for datascore-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc47b4a34fd8dea24c4bb8a6fc8309ed66eb305d4a5999a18e321f6dd4728141`
MD5	`1c8e91ca62c35399cc87fba538d677e7`
BLAKE2b-256	`c450aeac496b804a1be9f37bb031838717239695b45db2f9f4fd2f7d8449b54c`
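
A downloaded file can be checked against the published digests using only the standard library. The sketch below is one way to do it; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and it assumes the wheel has already been downloaded to a local path.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# SHA256 digest listed above for datascore-0.2.0-py3-none-any.whl
WHEEL_SHA256 = "bc47b4a34fd8dea24c4bb8a6fc8309ed66eb305d4a5999a18e321f6dd4728141"
```

For example, `matches_published_hash("datascore-0.2.0-py3-none-any.whl", WHEEL_SHA256)` should return `True` for an intact download. The same helper works for the source distribution with its own SHA256 digest.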
