preflight-data

Dataset readiness checker for machine learning — run a pre-flight checklist on any DataFrame before training.

Dataset readiness checks for ML pipelines. Use preflight to catch data blockers before training and deployment.

Why preflight

Runs fast checks for data quality, target risk, schema/type issues, split integrity, and statistical anomalies.
Produces machine-readable findings and a CI gate decision (PASS/FAIL).
Keeps output stable with schema versioning for downstream tooling.

Quickstart

Install:

pip install preflight-data

Python API:

import pandas as pd
import preflight

df = pd.read_csv("data.csv")
report = preflight.run(df, target="churn", profile="ci-balanced")

print(report)

CLI:

preflight run data.csv --target churn --profile ci-balanced --format text

Example output:

Preflight Run Report
────────────────────────────────────────────────────────
Gate: PASS
Heuristic Score: 97.0/100
Profile: ci-balanced
Dataset: 1000/1000 rows analyzed across 6 columns
Target: churn
Summary: 8 info, 1 warn, 0 error, 0 critical

Gate reasons:
- No findings met fail conditions for this profile

Findings:
- [WARN] completeness.missingness: Overall missingness is 1.8% (108 cells missing across dataset). (confidence=0.95)
- [INFO] duplicates.exact: No exact duplicate rows detected. (confidence=0.95)
- [INFO] balance.class_imbalance: Class distribution is within configured tolerance. (confidence=0.90)

HTML report preview

To preview the generated HTML report, open the example notebook; it contains rendered report.to_html() output cells.

Core model

finding: one detected issue or advisory signal
severity: info | warn | error | critical
gate: policy decision based on severities (PASS | FAIL)
score: heuristic summary for trend/comparison, not statistical truth

Score guidance:

Good for rough trend tracking across runs
Not a probability of model success
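To make the relationship between severities and the gate concrete, here is a minimal sketch of a severity-based gate. It is an illustration of the idea, not preflight's actual policy engine: the function name gate_decision and the default fail_on set are assumptions (the default mirrors the --fail-on error,critical example used elsewhere in this README).

```python
from typing import Iterable, Sequence

# Severity names as listed in the core model.
SEVERITIES = ("info", "warn", "error", "critical")


def gate_decision(
    finding_severities: Iterable[str],
    fail_on: Sequence[str] = ("error", "critical"),  # assumed default
) -> str:
    """Return "FAIL" if any finding has a blocking severity, else "PASS"."""
    blocking = set(fail_on)
    unknown = blocking - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severities in fail_on: {sorted(unknown)}")
    return "FAIL" if any(s in blocking for s in finding_severities) else "PASS"


# Mirrors the sample summary: 8 info, 1 warn, 0 error, 0 critical.
sample = ["info"] * 8 + ["warn"]
decision = gate_decision(sample)
print(decision)
```

Under this sketch, a stricter policy is just a larger fail_on set, for example gate_decision(sample, fail_on=("warn", "error", "critical")).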

Common workflows

Dataset readiness (single table)

report = preflight.run(df, target="churn", profile="ci-balanced")

Split integrity (train/validation or train/test)

split_report = preflight.run_split(train_df, valid_df, profile="ci-balanced")
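One thing a split-integrity check typically looks for is rows that appear in both splits. The sketch below shows that single idea with plain pandas. It is not preflight's implementation, it covers only exact row overlap, and overlapping_rows along with the two small frames is hypothetical.

```python
import pandas as pd


def overlapping_rows(train: pd.DataFrame, valid: pd.DataFrame) -> pd.DataFrame:
    """Rows that occur in both frames, compared on their shared columns."""
    shared = [col for col in train.columns if col in valid.columns]
    if not shared:
        return pd.DataFrame()
    # De-duplicate each side first so one repeated row is counted once.
    left = train[shared].drop_duplicates()
    right = valid[shared].drop_duplicates()
    return left.merge(right, on=shared, how="inner")


# Illustrative splits: the row (3, 30) leaks from train into validation.
train_df = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
valid_df = pd.DataFrame({"id": [3, 4], "x": [30, 40]})

overlap = overlapping_rows(train_df, valid_df)
print(f"{len(overlap)} overlapping row(s)")
```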

Policy profiles

Built-in profiles:

exploratory: permissive, useful in notebooks
ci-balanced: practical CI default
ci-strict: highest sensitivity for blocking conditions

Example:

preflight run data.csv --target churn --profile ci-strict --format json

--fail-on override:

preflight run data.csv --target churn --profile ci-balanced --fail-on error,critical

Policy argument rules:

Use either --profile or --policy-file (mutually exclusive).
--fail-on is only supported with --profile.
Invalid policy/config files fail fast at load time.
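The argument rules above can be sketched with the standard library's argparse. This is an illustration of the rules, not the preflight CLI's own parser: the program name, build_parser, and parse_policy_args are hypothetical, and only the three flags named above are modeled.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="policy-args-sketch")
    # --profile and --policy-file are mutually exclusive.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--profile")
    source.add_argument("--policy-file")
    parser.add_argument("--fail-on")
    return parser


def parse_policy_args(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --fail-on overrides a profile, so reject it alongside a policy file.
    if args.fail_on is not None and args.policy_file is not None:
        parser.error("--fail-on is only supported with --profile")
    return args


args = parse_policy_args(["--profile", "ci-balanced", "--fail-on", "error,critical"])
print(args.profile, args.fail_on)
```

argparse reports both kinds of violation through parser.error, which prints a usage message and exits with status 2.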

CLI reference

# Recommended policy-first commands
preflight run data.csv --target churn --profile ci-balanced --format json
preflight run-split train.csv test.csv --profile ci-balanced --format markdown

# Optional artifacts
preflight run data.csv --target churn --format text --output report.txt --output-html report.html

# Compare against baseline JSON report
preflight compare current.json baseline.json --max-score-drop 3 --fail-on-new-error

# Suppressions
preflight suppress add --file suppressions.json --check-id leakage.high_correlation --reason "known safe"
preflight suppress list --file suppressions.json
preflight suppress validate --file suppressions.json --fail-on-expired
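The compare command above takes a score-drop budget and a new-error switch. As a rough mental model, the check could look like the sketch below. The exact semantics of preflight compare are not documented here, so everything in this block is an assumption: the function name compare_reasons, treating the budget as "fail when the drop exceeds it", and identifying errors by check_id. The scores other than 97.0 are made up for the example.

```python
from typing import Iterable, List, Optional


def compare_reasons(
    current_score: float,
    baseline_score: float,
    current_error_ids: Iterable[str],
    baseline_error_ids: Iterable[str],
    max_score_drop: Optional[float] = None,
    fail_on_new_error: bool = False,
) -> List[str]:
    """Return the reasons a comparison would fail; an empty list means pass."""
    reasons = []
    drop = baseline_score - current_score
    if max_score_drop is not None and drop > max_score_drop:
        reasons.append(f"score dropped by {drop:.1f} (budget {max_score_drop})")
    # Errors present now that were absent from the baseline.
    new_errors = sorted(set(current_error_ids) - set(baseline_error_ids))
    if fail_on_new_error and new_errors:
        reasons.append("new errors: " + ", ".join(new_errors))
    return reasons


ok = compare_reasons(97.0, 98.0, [], [], max_score_drop=3, fail_on_new_error=True)
bad = compare_reasons(
    90.0, 97.0, ["leakage.high_correlation"], [],
    max_score_drop=3, fail_on_new_error=True,
)
print(ok, bad)
```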

HTML output

CLI:

preflight run data.csv --target churn --profile ci-balanced --format text --output-html report.html

Python:

report = preflight.run(df, target="churn", profile="ci-balanced")
html = report.to_html()
with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)

This creates a shareable HTML report you can attach to CI artifacts, docs, or review tickets.

Exit codes:

0: gate pass
2: gate fail or explicit CLI validation failure
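In a CI script, the two exit codes above can be mapped to a boolean with a small wrapper around subprocess. The sketch below is a hypothetical helper (gate_passed is not part of preflight); to stay runnable without the CLI installed, the demonstration uses the current Python interpreter as a stand-in command that exits with a chosen status.

```python
import subprocess
import sys
from typing import List


def gate_passed(command: List[str]) -> bool:
    """Run a command and interpret exit status 0 as pass and 2 as fail."""
    result = subprocess.run(command, check=False)
    if result.returncode == 0:
        return True
    if result.returncode == 2:
        return False
    # Anything else is outside the documented codes, so surface it loudly.
    raise RuntimeError(f"unexpected exit code {result.returncode}")


# Stand-in commands; in CI this would be the preflight invocation instead.
passing = gate_passed([sys.executable, "-c", "raise SystemExit(0)"])
failing = gate_passed([sys.executable, "-c", "raise SystemExit(2)"])
print(passing, failing)
```

Because exit code 2 covers both a failed gate and a CLI validation failure, a wrapper like this cannot tell the two apart from the status alone; the report output is needed for that.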

Output schema contract

RunReport.to_dict() includes stable contract keys:

schema_version
run
dataset
gate
score
summary
findings

Per-finding payload includes evidence and explainability fields:

check_id, title, domain, severity, suppressed
suggested_action, docs_url
evidence.metrics, evidence.threshold, evidence.samples
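Downstream tooling can lean on these keys without importing preflight at all. The sketch below works on a plain dict shaped like the contract above. The helper names are hypothetical, the top-level values are placeholders (only the key names come from the contract), and treating findings as a list of dicts is an assumption.

```python
from typing import Any, Dict, List, Sequence

CONTRACT_KEYS = (
    "schema_version", "run", "dataset", "gate", "score", "summary", "findings",
)


def missing_contract_keys(report: Dict[str, Any]) -> List[str]:
    """Contract keys absent from a report dict, in contract order."""
    return [key for key in CONTRACT_KEYS if key not in report]


def active_findings(
    report: Dict[str, Any],
    severities: Sequence[str] = ("error", "critical"),
) -> List[Dict[str, Any]]:
    """Unsuppressed findings whose severity is in the given set."""
    return [
        finding
        for finding in report.get("findings", [])
        if not finding.get("suppressed", False)
        and finding.get("severity") in severities
    ]


# Placeholder report: values are illustrative, not real preflight output.
sample_report = {
    "schema_version": None, "run": None, "dataset": None,
    "gate": None, "score": None, "summary": None,
    "findings": [
        {"check_id": "completeness.missingness", "severity": "warn", "suppressed": False},
        {"check_id": "leakage.high_correlation", "severity": "error", "suppressed": True},
    ],
}
print(missing_contract_keys(sample_report), active_findings(sample_report))
```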

Examples

Realistic workflow notebook:
- simple_example.ipynb
Public dataset demo notebook:
- preflight_public_datasets_demo.ipynb
Script demo:
- public_datasets_demo.py

Legacy compatibility

Legacy check(...) and check_split(...) APIs are still available for compatibility, but run(...) and run_split(...) are recommended for policy-first workflows.

Migration status: the policy-first runner now uses native checks for class balance, completeness, leakage, duplicates, distributional health, correlations, and types. Legacy APIs remain supported during migration.

Compatibility namespace:

preflight.legacy.check(...)
preflight.legacy.check_split(...)
preflight.legacy.Report

Development

make env
conda activate preflight
make install-dev
make test
make lint
make typecheck
make build

Supported versions

Python: 3.9-3.13
pandas: >=1.3
numpy: >=1.21

License

Apache-2.0

Release history

0.1.1 (this version): Mar 28, 2026
0.1.0: Mar 27, 2026

Download files

Source Distribution

preflight_data-0.1.1.tar.gz (74.6 kB)

Uploaded Mar 28, 2026 Source

Built Distribution

preflight_data-0.1.1-py3-none-any.whl (79.9 kB)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file preflight_data-0.1.1.tar.gz.

File metadata

Download URL: preflight_data-0.1.1.tar.gz
Upload date: Mar 28, 2026
Size: 74.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for preflight_data-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b51183727dd377fa7dd4f11a8c7fef71ec769d7c23c9ea7ea2d2ac8edd082453`
MD5	`f08e4e8a575d3a843fb009f7029955cc`
BLAKE2b-256	`25a949915387cb0cadb55960de77c2f2e176cd9d11cd93bf1c58e807bdb02eb6`

Provenance

The following attestation bundles were made for preflight_data-0.1.1.tar.gz:

Publisher: ci.yml on ryan-wolbeck/preflight

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: preflight_data-0.1.1.tar.gz
- Subject digest: b51183727dd377fa7dd4f11a8c7fef71ec769d7c23c9ea7ea2d2ac8edd082453
- Sigstore transparency entry: 1190505183
- Sigstore integration time: Mar 28, 2026
Source repository:
- Permalink: ryan-wolbeck/preflight@d7dbf1dd625146222b4ba6a0eb06fefc7e11b960
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/ryan-wolbeck
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@d7dbf1dd625146222b4ba6a0eb06fefc7e11b960
- Trigger Event: release

File details

Details for the file preflight_data-0.1.1-py3-none-any.whl.

File metadata

Download URL: preflight_data-0.1.1-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 79.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for preflight_data-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57e24d22b4a7fc7aeb85b81417fb14667ab556214df7db4399ff3a3777d198b2`
MD5	`cf873c6a601cb3592ec671c66f52f92d`
BLAKE2b-256	`c6f851827e1d8d30c1ba07b187b08fcb1da2b16371dd10d401e41a736c9bd433`

Provenance

The following attestation bundles were made for preflight_data-0.1.1-py3-none-any.whl:

Publisher: ci.yml on ryan-wolbeck/preflight

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: preflight_data-0.1.1-py3-none-any.whl
- Subject digest: 57e24d22b4a7fc7aeb85b81417fb14667ab556214df7db4399ff3a3777d198b2
- Sigstore transparency entry: 1190505187
- Sigstore integration time: Mar 28, 2026
Source repository:
- Permalink: ryan-wolbeck/preflight@d7dbf1dd625146222b4ba6a0eb06fefc7e11b960
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/ryan-wolbeck
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@d7dbf1dd625146222b4ba6a0eb06fefc7e11b960
- Trigger Event: release
