framelint
A lightweight data-quality profiler and CI gate for tabular data.
framelint scans a pandas DataFrame or a CSV/Parquet file and produces a clear
data-quality report — nulls, duplicates, constant columns, likely-ID columns,
type inconsistencies, numeric outliers, format violations, and schema drift.
Its standout feature: it doubles as a CI gate. Point it at your data, set thresholds, and it exits non-zero when quality drops — so a bad dataset fails the build instead of silently flowing downstream.
Why this exists
Data pipelines break quietly. A column starts arriving 40% null, an upstream job
starts writing numbers as strings, a join silently doubles your rows — and
nobody notices until a dashboard looks wrong weeks later. framelint turns
those failures into loud, early, automated signals you can drop into CI in one
line.
Install
pip install framelint
# Parquet support:
pip install "framelint[parquet]"
Requires Python 3.9+.
30-second quickstart
import framelint
report = framelint.scan("sales.csv") # or pass a DataFrame
report.summary() # pretty console table
print(report.passed) # -> True / False
report.to_json("report.json") # machine-readable
report.to_html("report.html") # shareable report
Example console output:
framelint FAILED rows=1000 cols=6 errors=1 warnings=3 info=1
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Severity ┃ Check            ┃ Column ┃ Message                               ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ error    │ missingness      │ region │ Column 'region' is 62.0% null.        │
│ warning  │ duplicates       │ —      │ Found 12 duplicate rows (full-row).   │
│ warning  │ type_consistency │ price  │ Column 'price' holds numbers as ...   │
│ warning  │ outliers         │ amount │ Column 'amount' has 18 outliers ...   │
│ info     │ cardinality      │ id     │ Column 'id' looks like an identifier. │
└──────────┴──────────────────┴────────┴───────────────────────────────────────┘
Features
- Missingness — per-column null counts and rates, with severity thresholds.
- Duplicate rows — full-row or by a subset of key columns.
- Constant / zero-variance and all-null columns.
- Cardinality — likely-identifier and high-cardinality column detection.
- Type consistency — numbers stored as strings, mixed-type columns.
- Outliers — numeric outliers via IQR or z-score (configurable).
- Format validation (opt-in) — email, date/datetime, numeric ranges, regex, and allowed-value sets, per column.
- Schema drift — save a baseline, then detect added/removed columns, dtype changes, null-rate jumps, and distribution shifts.
- Severity levels — every finding is info, warning, or error.
- Pass/fail decision — based on configurable thresholds, for use in CI.
- Outputs — rich console, dict, JSON, HTML, and Markdown.
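To make the outlier check concrete, here is a minimal standalone sketch of the two methods named in the feature list, IQR fences and z-scores. It is not framelint's implementation: the function names `iqr_outliers` and `zscore_outliers` are hypothetical, the quantile method is an assumption, and the default factors (1.5 and 3.0) are borrowed from the documented configuration defaults.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # Assumption: inclusive quartiles; a real profiler may interpolate differently.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values more than `threshold` population std devs from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: nothing can be an outlier
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


amounts = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 500]
print(iqr_outliers(amounts))     # [500]
print(zscore_outliers(amounts))  # [500]
```

IQR fences are driven by the middle half of the data, so a single extreme value barely moves them. The z-score method uses the mean and standard deviation, which the extreme value itself inflates, so it tends to be less sensitive on small samples.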
CLI
# Scan and write reports
framelint scan sales.csv --html report.html --json report.json
# Fail the build if any error-level finding is present
framelint scan sales.csv --fail-on error
# Save a baseline, then scan a new file for drift
framelint baseline save sales.csv baseline.json
framelint scan new.csv --baseline baseline.json
Exit codes: 0 = passed, 1 = quality failure, 2 = usage error.
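If you drive the CLI from a Python script instead of a shell step, the exit codes above can be translated into readable outcomes. The sketch below is one way to do that; `run_gate` and `EXIT_MEANINGS` are hypothetical names, and a stand-in child process is used so the example runs without any data file.

```python
import subprocess
import sys

# Meanings copied from the exit codes listed above.
EXIT_MEANINGS = {0: "passed", 1: "quality failure", 2: "usage error"}


def run_gate(command):
    """Run a command and return its (exit code, meaning) pair."""
    result = subprocess.run(command, check=False)
    meaning = EXIT_MEANINGS.get(result.returncode, "unexpected exit code")
    return result.returncode, meaning


# Stand-in child process that exits with 1. In a real pipeline the command
# would be e.g. ["framelint", "scan", "sales.csv", "--fail-on", "error"].
code, meaning = run_gate([sys.executable, "-c", "raise SystemExit(1)"])
print(code, meaning)  # 1 quality failure
```

Separating code 1 from code 2 lets a pipeline tell a real data-quality failure apart from a misconfigured invocation.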
Use it in CI to gate data quality
# .github/workflows/data-quality.yml
name: data-quality
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install framelint
      - run: framelint scan data/sales.csv --fail-on error --baseline data/baseline.json
If quality drops below your thresholds, the step exits non-zero and the build fails — no extra glue code required.
Configuration
Thresholds and per-column rules can be set, in increasing order of precedence:
- Built-in defaults
- [tool.framelint] in pyproject.toml
- A standalone TOML file (--config rules.toml)
- A dict/Config passed to scan(...)
- Individual CLI flags (e.g. --fail-on, --outlier-method)
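The precedence order above amounts to a layered merge in which later sources override earlier ones. A minimal sketch of that idea follows; `resolve_config` is a hypothetical helper, the layer contents are made up for illustration, and framelint's real loader may differ (for example in how it merges nested per-column rules).

```python
# Built-in defaults (a subset of the documented keys).
DEFAULTS = {"null_rate_warning": 0.10, "null_rate_error": 0.50, "fail_on": "error"}


def resolve_config(*layers):
    """Merge config layers, lowest precedence first; empty layers are skipped."""
    merged = {}
    for layer in layers:
        if layer:
            merged.update(layer)  # shallow merge: later keys win
    return merged


pyproject = {"null_rate_error": 0.40}  # from [tool.framelint]
config_file = None                     # no --config given
inline = {"fail_on": "warning"}        # dict passed to scan(...)
cli_flags = {"fail_on": "error"}       # --fail-on error

config = resolve_config(DEFAULTS, pyproject, config_file, inline, cli_flags)
print(config)
# {'null_rate_warning': 0.1, 'null_rate_error': 0.4, 'fail_on': 'error'}
```

Here the CLI flag wins over the inline dict for fail_on, while null_rate_error keeps the pyproject.toml value because no higher layer sets it.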
# pyproject.toml (or a standalone --config file, same schema)
[tool.framelint]
null_rate_warning = 0.10
null_rate_error = 0.50
duplicate_rate_error = 0.05
outlier_method = "iqr" # or "zscore"
fail_on = "error"
[tool.framelint.columns.email]
type = "email"
[tool.framelint.columns.age]
min = 0
max = 120
| Key | Default | Meaning |
|---|---|---|
| null_rate_warning / null_rate_error | 0.10 / 0.50 | Null-rate thresholds |
| duplicate_rate_warning / duplicate_rate_error | 0.0 / 0.10 | Duplicate-row thresholds |
| duplicate_subset | null | Key columns for duplicate detection |
| id_cardinality_ratio | 0.95 | Unique-ratio to flag a likely ID |
| high_cardinality_ratio | 0.50 | Unique-ratio to flag high cardinality |
| outlier_method | "iqr" | iqr or zscore |
| iqr_factor / zscore_threshold | 1.5 / 3.0 | Outlier sensitivity |
| outlier_rate_warning / outlier_rate_error | 0.01 / 0.10 | Outlier-rate thresholds |
| drift_mean_shift | 3.0 | Mean shift (in baseline std) to flag drift |
| drift_null_rate_increase | 0.10 | Null-rate jump to flag drift |
| fail_on | "error" | Severity at/above which passed is False |
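To illustrate how a warning/error threshold pair and fail_on could interact, here is a small sketch. The names `null_rate_severity` and `passed` are hypothetical, and treating thresholds as inclusive (>=) is an assumption; check the actual behaviour before relying on boundary values.

```python
SEVERITY_ORDER = ["info", "warning", "error"]


def null_rate_severity(null_rate, warning=0.10, error=0.50):
    """Map a null rate to a severity, or None if below both thresholds."""
    # Assumption: thresholds are inclusive.
    if null_rate >= error:
        return "error"
    if null_rate >= warning:
        return "warning"
    return None


def passed(severities, fail_on="error"):
    """True unless some finding is at or above the fail_on severity."""
    limit = SEVERITY_ORDER.index(fail_on)
    return all(SEVERITY_ORDER.index(s) < limit for s in severities)


print(null_rate_severity(0.62))                        # error
print(null_rate_severity(0.20))                        # warning
print(passed(["warning", "info"]))                     # True
print(passed(["warning", "info"], fail_on="warning"))  # False
```

With the default fail_on of "error", warnings are reported but do not fail the run; lowering it to "warning" makes the gate stricter without touching any threshold.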
Per-column rules ([tool.framelint.columns.<name>]): type (email/date/datetime), min, max, regex, allowed.
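As a rough picture of what per-column rules express, the sketch below checks a single value against a rule dict using the same keys (type, min, max, regex, allowed). It is an illustration only: `violations` is a hypothetical function, the email pattern is deliberately loose, date/datetime handling is omitted, and whether regex rules must match the whole value is an assumption.

```python
import re

# Deliberately loose email pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def violations(value, rule):
    """Return the list of rule keys that `value` violates."""
    failed = []
    if rule.get("type") == "email" and not EMAIL_RE.match(str(value)):
        failed.append("type")
    if "min" in rule and value < rule["min"]:
        failed.append("min")
    if "max" in rule and value > rule["max"]:
        failed.append("max")
    # Assumption: the regex must match the entire value.
    if "regex" in rule and not re.fullmatch(rule["regex"], str(value)):
        failed.append("regex")
    if "allowed" in rule and value not in rule["allowed"]:
        failed.append("allowed")
    return failed


rules = {"email": {"type": "email"}, "age": {"min": 0, "max": 120}}
print(violations("ada@example.com", rules["email"]))  # []
print(violations("not-an-email", rules["email"]))     # ['type']
print(violations(150, rules["age"]))                  # ['max']
```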
Programmatic API
import framelint
# Baseline + drift
framelint.save_baseline("sales.csv", "baseline.json")
report = framelint.scan("new.csv", baseline="baseline.json")
# Inline configuration (df is a pandas DataFrame you have already loaded)
report = framelint.scan(df, config={"fail_on": "warning", "outlier_method": "zscore"})
report.to_dict() # full machine-readable result
report.to_markdown() # Markdown string
report.counts_by_severity()
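The drift comparison can be pictured as profiling a column once, storing the summary, and comparing later data against it. The sketch below shows that idea for a single numeric column, using the documented defaults for drift_mean_shift (3.0 baseline standard deviations) and drift_null_rate_increase (0.10). The `profile` and `drift_findings` functions and the finding labels are hypothetical, not framelint's baseline format.

```python
import statistics


def profile(values):
    """Summarise a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(present) / len(values),
        "mean": statistics.fmean(present),
        "std": statistics.pstdev(present),
    }


def drift_findings(baseline, current, mean_shift=3.0, null_rate_increase=0.10):
    """Compare two profiles and list which drift signals fire."""
    findings = []
    if current["null_rate"] - baseline["null_rate"] >= null_rate_increase:
        findings.append("null_rate_jump")
    if baseline["std"] > 0:  # skip the mean check for constant baselines
        shift = abs(current["mean"] - baseline["mean"]) / baseline["std"]
        if shift >= mean_shift:
            findings.append("mean_shift")
    return findings


baseline = profile([10, 12, 11, 13, 9, 11])
current = profile([30, None, 31, None, 29, 32])
print(drift_findings(baseline, current))  # ['null_rate_jump', 'mean_shift']
```

Measuring the mean shift in units of the baseline standard deviation keeps the threshold scale-free, so the same value can apply to columns with very different magnitudes.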
Contributing
Contributions are welcome — see CONTRIBUTING.md and the Code of Conduct. In short:
pip install -e ".[dev]"
ruff check . && ruff format --check .
mypy
pytest
License
MIT © Anoop Ibrampur