A lightweight data-quality profiler and CI gate for tabular data.

framelint

framelint scans a pandas DataFrame or a CSV/Parquet file and produces a clear data-quality report — nulls, duplicates, constant columns, likely-ID columns, type inconsistencies, numeric outliers, format violations, and schema drift.

Its standout feature: it doubles as a CI gate. Point it at your data, set thresholds, and it exits non-zero when quality drops — so a bad dataset fails the build instead of silently flowing downstream.


Why this exists

Data pipelines break quietly. A column starts arriving 40% null, an upstream job starts writing numbers as strings, a join silently doubles your rows — and nobody notices until a dashboard looks wrong weeks later. framelint turns those failures into loud, early, automated signals you can drop into CI in one line.

Install

pip install framelint
# Parquet support:
pip install "framelint[parquet]"

Requires Python 3.9+.

30-second quickstart

import framelint

report = framelint.scan("sales.csv")   # or pass a DataFrame
report.summary()                        # pretty console table
print(report.passed)                    # -> True / False

report.to_json("report.json")           # machine-readable
report.to_html("report.html")           # shareable report

Example console output:

framelint  FAILED  rows=1000 cols=6  errors=1 warnings=3 info=1
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Severity ┃ Check            ┃ Column  ┃ Message                               ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ error    │ missingness      │ region  │ Column 'region' is 62.0% null.        │
│ warning  │ duplicates       │ —       │ Found 12 duplicate rows (full-row).   │
│ warning  │ type_consistency │ price   │ Column 'price' holds numbers as ...   │
│ warning  │ outliers         │ amount  │ Column 'amount' has 18 outliers ...   │
│ info     │ cardinality      │ id      │ Column 'id' looks like an identifier. │
└──────────┴──────────────────┴─────────┴───────────────────────────────────────┘

Features

  • Missingness — per-column null counts and rates, with severity thresholds.
  • Duplicate rows — full-row or by a subset of key columns.
  • Constant / zero-variance and all-null columns.
  • Cardinality — likely-identifier and high-cardinality column detection.
  • Type consistency — numbers stored as strings, mixed-type columns.
  • Outliers — numeric outliers via IQR or z-score (configurable).
  • Format validation (opt-in) — email, date/datetime, numeric ranges, regex, and allowed-value sets, per column.
  • Schema drift — save a baseline, then detect added/removed columns, dtype changes, null-rate jumps, and distribution shifts.
  • Severity levels — every finding is info, warning, or error.
  • Pass/fail decision — based on configurable thresholds, for use in CI.
  • Outputs — rich console, dict, JSON, HTML, and Markdown.
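To make the outlier check concrete, here is a minimal sketch of the two methods named above, using only the standard library. It illustrates the general techniques and is not framelint's actual implementation. The function names `iqr_outliers` and `zscore_outliers` are hypothetical, and the defaults simply mirror the `iqr_factor` (1.5) and `zscore_threshold` (3.0) settings.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR].

    Hypothetical helper; the quartile method framelint uses is an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold.

    Uses the population standard deviation (an assumption).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


amounts = [10, 12, 11, 13, 12, 11, 10, 12, 500]
print(iqr_outliers(amounts))
print(zscore_outliers(amounts))
```

On this small sample the IQR rule flags 500 while the z-score rule flags nothing, because a single extreme value also inflates the standard deviation. That difference is one reason the method is configurable.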

CLI

# Scan and write reports
framelint scan sales.csv --html report.html --json report.json

# Fail the build if any error-level finding is present
framelint scan sales.csv --fail-on error

# Save a baseline, then scan a new file for drift
framelint baseline save sales.csv baseline.json
framelint scan new.csv --baseline baseline.json

Exit codes: 0 = passed, 1 = quality failure, 2 = usage error.
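The relationship between finding severities, `--fail-on`, and the exit code can be sketched as follows. This is an illustrative model only: the names `SEVERITY_ORDER`, `passed`, and `exit_code` are hypothetical and not part of framelint's API. It assumes a scan fails when any finding is at or above the `fail_on` severity.

```python
# Severity ranking assumed for illustration: info < warning < error.
SEVERITY_ORDER = {"info": 0, "warning": 1, "error": 2}


def passed(severities, fail_on="error"):
    """True when no finding is at or above the fail_on severity."""
    threshold = SEVERITY_ORDER[fail_on]
    return all(SEVERITY_ORDER[s] < threshold for s in severities)


def exit_code(severities, fail_on="error"):
    """Map the pass/fail decision onto exit codes 0 (passed) and 1 (failure)."""
    return 0 if passed(severities, fail_on) else 1


# Severities from the example console output: 1 error, 3 warnings, 1 info.
findings = ["error", "warning", "warning", "warning", "info"]
print(exit_code(findings))                               # fails on the error
print(exit_code(["warning", "info"]))                    # warnings alone pass
print(exit_code(["warning", "info"], fail_on="warning")) # stricter gate fails
```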

Use it in CI to gate data quality

# .github/workflows/data-quality.yml
name: data-quality
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install framelint
      - run: framelint scan data/sales.csv --fail-on error --baseline data/baseline.json

If quality drops below your thresholds, the step exits non-zero and the build fails — no extra glue code required.

Configuration

Thresholds and per-column rules can be set in several places. Later entries in this list override earlier ones:

  1. Built-in defaults
  2. [tool.framelint] in pyproject.toml
  3. A standalone TOML file (--config rules.toml)
  4. A dict / Config passed to scan(...)
  5. Individual CLI flags (e.g. --fail-on, --outlier-method)

# pyproject.toml  (or a standalone --config file, same schema)
[tool.framelint]
null_rate_warning = 0.10
null_rate_error = 0.50
duplicate_rate_error = 0.05
outlier_method = "iqr"      # or "zscore"
fail_on = "error"

[tool.framelint.columns.email]
type = "email"

[tool.framelint.columns.age]
min = 0
max = 120

| Key | Default | Meaning |
|---|---|---|
| null_rate_warning / null_rate_error | 0.10 / 0.50 | Null-rate thresholds |
| duplicate_rate_warning / duplicate_rate_error | 0.0 / 0.10 | Duplicate-row thresholds |
| duplicate_subset | null | Key columns for duplicate detection |
| id_cardinality_ratio | 0.95 | Unique-ratio to flag a likely ID |
| high_cardinality_ratio | 0.50 | Unique-ratio to flag high cardinality |
| outlier_method | "iqr" | iqr or zscore |
| iqr_factor / zscore_threshold | 1.5 / 3.0 | Outlier sensitivity |
| outlier_rate_warning / outlier_rate_error | 0.01 / 0.10 | Outlier-rate thresholds |
| drift_mean_shift | 3.0 | Mean shift (in baseline std) to flag drift |
| drift_null_rate_increase | 0.10 | Null-rate jump to flag drift |
| fail_on | "error" | Severity at/above which passed is False |

Per-column rules ([tool.framelint.columns.<name>]): type (email/date/datetime), min, max, regex, allowed.
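The precedence order listed in this section amounts to a layered merge in which each later source overrides the keys it sets. The sketch below models that with plain dicts. `resolve_config` and the layer variables are hypothetical names for illustration, and how framelint merges nested per-column rules is not shown here.

```python
def resolve_config(*layers):
    """Merge config layers left to right; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Layers in increasing order of precedence (values are illustrative).
defaults = {
    "null_rate_warning": 0.10,
    "null_rate_error": 0.50,
    "duplicate_rate_error": 0.10,
    "outlier_method": "iqr",
    "fail_on": "error",
}
pyproject = {"duplicate_rate_error": 0.05}   # [tool.framelint]
config_file = {}                             # --config rules.toml (unused here)
scan_config = {"outlier_method": "zscore"}   # dict passed to scan(...)
cli_flags = {"fail_on": "warning"}           # e.g. --fail-on warning

effective = resolve_config(defaults, pyproject, config_file, scan_config, cli_flags)
print(effective)
```

Keys that no layer overrides, such as `null_rate_warning` here, keep their built-in default.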

Programmatic API

import framelint

# Baseline + drift
framelint.save_baseline("sales.csv", "baseline.json")
report = framelint.scan("new.csv", baseline="baseline.json")

# Inline configuration
report = framelint.scan(df, config={"fail_on": "warning", "outlier_method": "zscore"})

report.to_dict()        # full machine-readable result
report.to_markdown()    # Markdown string
report.counts_by_severity()
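The drift settings `drift_mean_shift` (mean shift measured in baseline standard deviations) and `drift_null_rate_increase` (jump in null rate) can be read as the two comparisons sketched below. This is one plausible reading for illustration, not framelint's actual code. The helper names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def mean_shift(baseline, current):
    """Absolute change in mean, expressed in baseline standard deviations."""
    base_mean = statistics.fmean(baseline)
    base_std = statistics.pstdev(baseline)
    diff = abs(statistics.fmean(current) - base_mean)
    if base_std == 0:
        # Constant baseline: any change in mean is an unbounded shift.
        return 0.0 if diff == 0 else float("inf")
    return diff / base_std


def null_rate(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values)


baseline_amounts = [10, 12, 11, 13, 9]
new_amounts = [20, 22, 21, 23, 19]

# Compare against the documented defaults: 3.0 and 0.10.
shifted = mean_shift(baseline_amounts, new_amounts) > 3.0
null_jump = null_rate(["a", None, None, "b"]) - null_rate(["a", "b", "c", "d"]) > 0.10
print(shifted, null_jump)
```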

Contributing

Contributions are welcome — see CONTRIBUTING.md and the Code of Conduct. In short:

pip install -e ".[dev]"
ruff check . && ruff format --check .
mypy
pytest

License

MIT © Anoop Ibrampur
