CSV preflight validation and batch CSV quality checks that fail fast before pipeline runs.
csv-quality-gate: CSV preflight validation before pipeline runs
Fail fast before pipeline runs when the input CSV is broken, incomplete, duplicated, or obviously junk.
csv-quality-gate runs batch CSV quality checks and returns pass, warn, or fail before expensive pipeline steps burn time on bad input. It is built for problems like these:
- "We keep running expensive pipeline steps on broken CSVs."
- "A batch run fails 20 minutes in because the input CSV was junk."
- "We only discover missing required columns after the job already started."
- "Duplicate rows and empty contact fields keep polluting our batch runs."
- "I want CSV preflight validation, not a whole data platform."
Quick install:
pip install csv-quality-gate
Quick start:
csv-quality-gate check leads.csv --profile outreach
Example output:
csv-quality-gate: FAIL
file: leads.csv
profile: outreach
rows: 125
ERROR: missing required column: person_name
WARNING: duplicate rate 12% exceeds warning threshold 10%
It is deliberately narrow: a preflight gate, not a full data quality platform.
Install
pip install csv-quality-gate
For development:
pip install -e ".[dev]"
Common use cases
- CSV preflight validation
- batch CSV quality checks
- fail fast before pipeline runs
- CSV validation before ETL or enrichment
- detect junk CSV rows before batch jobs
Usage
csv-quality-gate check leads.csv
csv-quality-gate check leads.csv --profile outreach
csv-quality-gate check leads.csv --profile generic --json
Exit codes:
- 0: pass
- 1: warnings only
- 2: fail
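A minimal sketch of gating a Python pipeline step on those exit codes. It assumes the `csv-quality-gate` executable is on `PATH`; `gate_allows`, `preflight_ok`, and `allow_warnings` are hypothetical names for illustration, not part of the package's API.

```python
import subprocess

# Exit codes documented for `csv-quality-gate check`.
EXIT_PASS, EXIT_WARN, EXIT_FAIL = 0, 1, 2


def gate_allows(returncode: int, allow_warnings: bool = True) -> bool:
    """Decide whether the pipeline may continue, given the CLI's exit code."""
    if returncode == EXIT_PASS:
        return True
    if returncode == EXIT_WARN:
        return allow_warnings
    # 2 (fail) or any unexpected code (crash, signal): stop, i.e. fail closed.
    return False


def preflight_ok(csv_path: str, profile: str = "generic",
                 allow_warnings: bool = True) -> bool:
    """Hypothetical helper: run the gate as a subprocess and report whether to proceed."""
    proc = subprocess.run(
        ["csv-quality-gate", "check", csv_path, "--profile", profile],
        check=False,  # a non-zero exit is a verdict here, not an exception
    )
    return gate_allows(proc.returncode, allow_warnings)
```

A pipeline entry point could then call `preflight_ok("leads.csv", profile="outreach")` and stop before any expensive step when it returns `False`. Treating unknown exit codes as failures keeps the gate fail-closed if the tool itself crashes.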
Profiles
Built-in profiles:
- generic: validates required columns, empty values, duplicate rows, and empty files
- outreach: adds suspicious company-name heuristics for GTM/contact pipelines
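To make the generic checks concrete, here is a standard-library sketch of what checks of this kind can look like. It is not the package's implementation: the `preflight` name, checking empty values only in required columns, treating empty values as warnings, the whole-row duplicate definition, and the 25% fail threshold are all assumptions. Only the 10% warning threshold comes from the sample output.

```python
import csv
import io


def preflight(
    text: str,
    required: list[str],
    warn_dup_rate: float = 0.10,  # matches the 10% threshold in the sample output
    fail_dup_rate: float = 0.25,  # assumed; not documented by the package
) -> tuple[str, list[str]]:
    """Hypothetical sketch: return ("pass" | "warn" | "fail", messages) for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    header = reader.fieldnames or []
    errors: list[str] = []
    warnings: list[str] = []

    # Empty file: no header at all, or a header with no data rows.
    if not header or not rows:
        return "fail", ["empty file"]

    for col in required:
        if col not in header:
            errors.append(f"missing required column: {col}")

    # Empty values, checked only in required columns that are present (assumption).
    for col in required:
        if col in header:
            empties = sum(1 for r in rows if not (r.get(col) or "").strip())
            if empties:
                warnings.append(f"{empties} empty value(s) in column: {col}")

    # Duplicate rate: share of rows that exactly repeat an earlier row.
    seen: set[tuple] = set()
    dupes = 0
    for r in rows:
        key = tuple(r.get(c) for c in header)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    rate = dupes / len(rows)
    if rate > fail_dup_rate:
        errors.append(f"duplicate rate {rate:.0%} exceeds fail threshold {fail_dup_rate:.0%}")
    elif rate > warn_dup_rate:
        warnings.append(f"duplicate rate {rate:.0%} exceeds warning threshold {warn_dup_rate:.0%}")

    status = "fail" if errors else ("warn" if warnings else "pass")
    return status, errors + warnings
```

Errors and warnings are collected separately so that a single missing column fails the file while noisy-but-usable data only warns, mirroring the pass/warn/fail split of the CLI.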
Output
csv-quality-gate: FAIL
file: leads.csv
profile: outreach
rows: 125
ERROR: missing required column: person_name
WARNING: duplicate rate 12% exceeds warning threshold 10%
JSON mode
csv-quality-gate check leads.csv --json
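A minimal sketch for consuming JSON mode from Python. It assumes the JSON report is printed to stdout even when the check fails, and it deliberately does not rely on any field names, since the report schema is not documented here; the verdict is taken from the documented exit code instead. `interpret` and `run_json_check` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any

# Documented exit codes of `csv-quality-gate check`.
STATUS_BY_EXIT_CODE = {0: "pass", 1: "warn", 2: "fail"}


def interpret(returncode: int, stdout: str) -> tuple[str, Any]:
    """Pair the exit-code verdict with the parsed JSON report (None if stdout is empty)."""
    status = STATUS_BY_EXIT_CODE.get(returncode, "error")
    report = json.loads(stdout) if stdout.strip() else None
    return status, report


def run_json_check(csv_path: str, profile: str = "generic") -> tuple[str, Any]:
    """Run the CLI in JSON mode and return (status, report). Assumes JSON on stdout."""
    proc = subprocess.run(
        ["csv-quality-gate", "check", csv_path, "--profile", profile, "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    return interpret(proc.returncode, proc.stdout)
```

Reading the verdict from the exit code rather than from a JSON field keeps the sketch independent of the report's layout.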
Limitations
- Heuristics are intentionally simple.
- The outreach profile is opinionated and should not be treated as universal truth.
- The tool validates shape and obvious noise, not semantic correctness.
When To Use It
- Before enrichment, outreach, ETL, or batch scoring runs
- In CI for checked-in CSV inputs
- As a preflight gate before expensive pipeline work
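For the CI case, one way to gate every checked-in CSV is a small script that runs the check on each file and reports the worst exit code seen. The `data` directory, the `generic` profile default, and the helper names are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def worst_exit_code(codes: list[int]) -> int:
    """Highest code wins: fail (2) beats warnings (1), which beat pass (0)."""
    # Anything outside 0-2 (tool crash, killed by a signal) counts as a failure.
    return max((c if c in (0, 1, 2) else 2 for c in codes), default=0)


def check_all(root: str = "data", profile: str = "generic") -> int:
    """Run csv-quality-gate on every *.csv under root; return the worst exit code."""
    codes = []
    for path in sorted(Path(root).rglob("*.csv")):
        proc = subprocess.run(
            ["csv-quality-gate", "check", str(path), "--profile", profile],
            check=False,
        )
        codes.append(proc.returncode)
    return worst_exit_code(codes)
```

A CI step could call `sys.exit(check_all())` so the job fails on any file that fails the gate. Note that a directory with no CSV files yields 0 here; whether that should pass is a policy choice.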
When Not To Use It
- When you need semantic validation of the data itself
- When your input is not CSV
- When you need a full data quality framework with lineage and profiling
More From Hermes Labs
- intent-verify: repo intent verification and spec drift checks
Development
ruff check .
python3 -m pytest -q
python3 -m py_compile src/csv_quality_gate/*.py
Download files
File details
Details for the file csv_quality_gate-0.1.0.tar.gz.
File metadata
- Download URL: csv_quality_gate-0.1.0.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2543b34c13e716bc3113bb0da942a6740b6c307c703e8d73d69941d0d2e8f534 |
| MD5 | 6539abc7f26726d9d80e173d03bf45aa |
| BLAKE2b-256 | 7283645fa76b9957f724dc52ac87d2209f9ffe6f0c66ea2e76347a086e70b506 |
File details
Details for the file csv_quality_gate-0.1.0-py3-none-any.whl.
File metadata
- Download URL: csv_quality_gate-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5106ffdde939e12313f04c8f906268aeb508c55cf70f3df2a29e678e1f81ff24 |
| MD5 | ad8071cdcf7e70966eb1647e43119a9d |
| BLAKE2b-256 | 1ee7fc221762e4f5610d07bc2cc6ba42fff55ac4a84ae3cd67ef1cf6ea51ecb8 |