ML readiness scoring for tabular datasets

datascore

Point it at a DataFrame and get a structured report telling you whether your data is ready for ML training — and if not, exactly why and in what order to fix it.

Install

pip install datascore

Usage

import seaborn as sns
from datascore import score

df = sns.load_dataset("titanic")
report = score(df, target="survived")
report.show()

Output

datascore Report

Rows: 891 | Features: 15 | Target: survived
Score: 45/100 — NOT READY

BLOCKERS

- age: 19.9% missing values
- deck: 77.2% missing values

WARNINGS

- Missing values detected: 6.5% overall
- 107 duplicate rows detected
- High skew in sibsp: 3.6891
- High skew in parch: 2.7445
- High skew in fare: 4.7793

INFO

- Outliers in age: 11 rows
- Outliers in sibsp: 46 rows
- Outliers in parch: 213 rows
- Outliers in fare: 116 rows
- No constant features detected
- No infinite values detected
- Class balance: 62/38

Save report to markdown

report.save("report.md")

What it checks

Category      Checks
Completeness  Missing values, high missing rate per column (>5%)
Integrity     Duplicate rows, constant features, infinite values
ML Readiness  Class imbalance, target leakage risk, high cardinality categoricals
Distribution  Skew per numerical column, outliers via IQR
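
A few of the checks in the table can be approximated with plain pandas. The following is a minimal sketch, not datascore's actual implementation: the function name `basic_checks`, its return keys, and the 1.5 × IQR outlier multiplier are assumptions made for illustration. Only the 5% missing-rate threshold comes from the table above.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame, missing_threshold: float = 0.05) -> dict:
    """Compute a handful of readiness statistics (illustrative only)."""
    # Completeness: fraction of missing values per column.
    missing_rate = df.isna().mean()
    high_missing = missing_rate[missing_rate > missing_threshold].to_dict()

    # Integrity: rows that exactly repeat an earlier row.
    duplicate_rows = int(df.duplicated().sum())

    # Integrity: columns with at most one distinct non-null value.
    constant = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]

    # Distribution: skew and IQR outliers for numerical columns only.
    numeric = df.select_dtypes(include="number")
    skew = numeric.skew().to_dict()

    outliers = {}
    for col in numeric.columns:
        q1, q3 = numeric[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
        mask = (numeric[col] < q1 - 1.5 * iqr) | (numeric[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())

    return {
        "high_missing": high_missing,
        "duplicate_rows": duplicate_rows,
        "constant": constant,
        "skew": skew,
        "outliers": outliers,
    }


# Small hand-made frame: one duplicate row, one sparse column, one constant column.
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 100, 1],
        "b": ["x", "y", None, "x", "y", "x"],
        "c": [7, 7, 7, 7, 7, 7],
    }
)
result = basic_checks(df)
print(result["high_missing"], result["duplicate_rows"], result["constant"])
```

How datascore assigns each finding to a blocker, warning, or info level is not covered by this sketch.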

Scoring

The score starts at 100. Each blocker deducts 15 points and each warning deducts 5.

Score   Verdict
80–100  READY
50–79   NEEDS WORK
0–49    NOT READY
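
The scoring rule and verdict bands can be written out as a short sketch. The helper names `readiness_score` and `verdict` are hypothetical and are not part of the datascore API, and clamping at 0 is an assumption based on the lowest band in the table. The example reproduces the Titanic report above, which lists 2 blockers and 5 warnings.

```python
BLOCKER_PENALTY = 15  # points deducted per blocker
WARNING_PENALTY = 5   # points deducted per warning


def readiness_score(blockers: int, warnings: int) -> int:
    """Start at 100 and subtract the penalties for blockers and warnings."""
    raw = 100 - BLOCKER_PENALTY * blockers - WARNING_PENALTY * warnings
    # Assumption: the score never drops below 0, the bottom of the verdict table.
    return max(0, raw)


def verdict(score: int) -> str:
    """Map a score to the verdict bands from the table."""
    if score >= 80:
        return "READY"
    if score >= 50:
        return "NEEDS WORK"
    return "NOT READY"


# Titanic report: 2 blockers and 5 warnings -> 100 - 30 - 25 = 45
example = readiness_score(blockers=2, warnings=5)
print(example, verdict(example))  # 45 NOT READY
```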

Why not Great Expectations or Pandera?

Those tools validate data against rules you define upfront.

datascore requires no configuration: it reports the problems it finds, so you do not need to know what to look for first.

Assessment, not validation.

License

MIT

Download files

Source Distribution

datascore-0.2.0.tar.gz (7.3 kB view details)

Uploaded Source

Built Distribution

datascore-0.2.0-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file datascore-0.2.0.tar.gz.

File metadata

  • Download URL: datascore-0.2.0.tar.gz
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for datascore-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       aa4cfd7f3e1dfd1185a021789416fa16f882d427058a829c2ee2487407aba477
MD5          8ae7a7774dff685dad95224c8581f4c3
BLAKE2b-256  d3765cf60c00fac2fe59dddf663ae088973e48c280a80509364405c686483792

File details

Details for the file datascore-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: datascore-0.2.0-py3-none-any.whl
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for datascore-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       bc47b4a34fd8dea24c4bb8a6fc8309ed66eb305d4a5999a18e321f6dd4728141
MD5          1c8e91ca62c35399cc87fba538d677e7
BLAKE2b-256  c450aeac496b804a1be9f37bb031838717239695b45db2f9f4fd2f7d8449b54c
