CSV preflight validation and batch CSV quality checks that fail fast before pipeline runs.

csv-quality-gate: CSV preflight validation before pipeline runs

Fail fast before pipeline runs when the input CSV is broken, incomplete, duplicated, or obviously junk.

csv-quality-gate runs batch CSV quality checks and returns pass, warn, or fail before expensive pipeline steps burn time on bad input.

  • "We keep running expensive pipeline steps on broken CSVs."
  • "A batch run fails 20 minutes in because the input CSV was junk."
  • "We only discover missing required columns after the job already started."
  • "Duplicate rows and empty contact fields keep polluting our batch runs."
  • "I want CSV preflight validation, not a whole data platform."

Fastest install:

pip install csv-quality-gate

Fastest real usage:

csv-quality-gate check leads.csv --profile outreach

Example output:

csv-quality-gate: FAIL
file: leads.csv
profile: outreach
rows: 125
  ERROR: missing required column: person_name
  WARNING: duplicate rate 12% exceeds warning threshold 10%

csv-quality-gate is deliberately narrow: it is a preflight gate, not a full data quality platform.

Install

pip install csv-quality-gate

For development:

pip install -e ".[dev]"

Common use cases

  • CSV preflight validation
  • batch CSV quality checks
  • fail fast before pipeline runs
  • CSV validation before ETL or enrichment
  • detect junk CSV rows before batch jobs

Usage

csv-quality-gate check leads.csv
csv-quality-gate check leads.csv --profile outreach
csv-quality-gate check leads.csv --profile generic --json

Exit codes:

  • 0: pass
  • 1: warnings only
  • 2: fail
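
The exit codes make it straightforward to gate a pipeline from a wrapper script. The following is a minimal sketch, not part of the package: the helper names (`interpret_exit_code`, `should_continue`, `run_gate`) and the policy of letting warnings through by default are assumptions made for illustration. Only the command line and the three exit codes come from the documentation above.

```python
import subprocess

# Exit codes as documented above: 0 pass, 1 warnings only, 2 fail.
EXIT_STATUS = {0: "pass", 1: "warn", 2: "fail"}


def interpret_exit_code(code: int) -> str:
    """Map a csv-quality-gate exit code to a status label."""
    return EXIT_STATUS.get(code, "unknown")


def should_continue(code: int, allow_warnings: bool = True) -> bool:
    """Hypothetical policy: run the pipeline on pass, and optionally on warn."""
    status = interpret_exit_code(code)
    if status == "pass":
        return True
    return status == "warn" and allow_warnings


def run_gate(csv_path: str, profile: str = "generic") -> int:
    """Run the CLI as shown under Usage and return its exit code."""
    result = subprocess.run(
        ["csv-quality-gate", "check", csv_path, "--profile", profile],
        check=False,  # a non-zero exit is an expected outcome, not an exception
    )
    return result.returncode
```

A stricter caller can pass `allow_warnings=False` so that exit code 1 also stops the run.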

Profiles

Built-in profiles:

  • generic
    • checks for missing required columns, empty values, duplicate rows, and empty files
  • outreach
    • adds suspicious company-name heuristics for GTM/contact pipelines
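
To make the generic checks concrete, here is a rough sketch of what checks of this kind can look like using only the standard library. It is not the package's implementation: the function name `generic_checks`, the choice to treat empty values and duplicates as warnings, and the 10% default threshold (borrowed from the sample output) are all assumptions.

```python
import csv
import io


def generic_checks(text, required, dup_warn_rate=0.10):
    """Illustrative shape checks; returns (status, messages)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    rows = list(reader)
    errors, warnings = [], []

    # Empty file: no data rows at all.
    if not rows:
        errors.append("file has no data rows")

    # Required columns must appear in the header.
    for col in required:
        if col not in header:
            errors.append(f"missing required column: {col}")

    # Empty values: blank cells in required columns that do exist.
    empty_cells = sum(
        1
        for row in rows
        for col in required
        if col in header and not (row.get(col) or "").strip()
    )
    if empty_cells:
        warnings.append(f"{empty_cells} empty required values")

    # Duplicates: rows whose header fields exactly repeat an earlier row.
    seen, dupes = set(), 0
    for row in rows:
        key = tuple((row.get(col) or "") for col in header)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    if rows and dupes / len(rows) > dup_warn_rate:
        warnings.append(
            f"duplicate rate {dupes / len(rows):.0%} "
            f"exceeds warning threshold {dup_warn_rate:.0%}"
        )

    status = "fail" if errors else "warn" if warnings else "pass"
    return status, errors + warnings
```

Any error yields `fail`, warnings alone yield `warn`, and a clean file yields `pass`, mirroring the three outcomes the tool reports.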

Output

csv-quality-gate: FAIL
file: leads.csv
profile: outreach
rows: 125
  ERROR: missing required column: person_name
  WARNING: duplicate rate 12% exceeds warning threshold 10%

JSON mode

csv-quality-gate check leads.csv --json
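
JSON mode is meant for consumption by other programs. The sketch below shows one way a caller might capture and parse it. It assumes the report is written to standard output, which the documentation above does not state, and it deliberately makes no assumption about the report's field names. The function `check_json` and its `runner` parameter are hypothetical helpers, not part of the package.

```python
import json
import subprocess


def check_json(csv_path, profile="generic", runner=subprocess.run):
    """Run the gate with --json; return (exit_code, parsed_report_or_None)."""
    proc = runner(
        ["csv-quality-gate", "check", csv_path, "--profile", profile, "--json"],
        capture_output=True,
        text=True,
        check=False,  # exit codes 1 and 2 are normal results, not failures to run
    )
    try:
        # Assumption: the JSON report is printed to stdout.
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = None
    return proc.returncode, report
```

The `runner` parameter exists only so the function can be exercised without the CLI installed; in normal use the default `subprocess.run` is what executes the command.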

Limitations

  • Heuristics are intentionally simple.
  • The outreach profile is opinionated and should not be treated as universal truth.
  • The tool validates shape and obvious noise, not semantic correctness.

When To Use It

  • Before enrichment, outreach, ETL, or batch scoring runs
  • In CI for checked-in CSV inputs
  • As a preflight gate before expensive pipeline work

When Not To Use It

  • When you need semantic validation of the data itself
  • When your input is not CSV
  • When you need a full data quality framework with lineage and profiling

Development

ruff check .
python3 -m pytest -q
python3 -m py_compile src/csv_quality_gate/*.py
