A Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors.
tha-csv-runner
A small Python library that runs a function against every row of a CSV — with a progress bar, required header validation, and structured error capture per row.
Install
pip install tha-csv-runner
Quick start
from tha_csv_runner import ThaCSV

def process(row: dict) -> None:
    """Raise any exception to mark the row as an error. Return value is ignored."""
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")

runner = ThaCSV()
rows = runner.read("Step 1 of 2", "data.csv", ["name", "email"], process)
runner.write("Step 2 of 2", "output.csv")
How it works
- Opens the CSV and validates that all required_headers are present; raises immediately if any are missing
- Iterates every row with a tqdm progress bar labelled with desc
- Calls your validator(row) function; if it raises, that row is marked as an error and processing continues
- Appends three columns to every row: row number, row status, and message
  - row number starts at 2 (row 1 is the header)
  - On success: row status and message are blank
  - On error: row status = "error", message = str(exception)
- write() writes all rows (success and error) to a CSV
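The steps above can be approximated with the standard library alone. The following is a minimal sketch of the described behaviour, not the library's actual implementation: the function name run_rows is hypothetical, ValueError stands in for the library's CsvError, and the tqdm progress bar is left out to keep the example dependency-free.

```python
import csv
import io
from typing import Callable, Optional


def run_rows(
    handle,
    required_headers: list[str],
    validator: Optional[Callable[[dict], None]] = None,
) -> list[dict]:
    """Hypothetical stand-in for the read step described above."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [h for h in required_headers if h not in present]
    if missing:
        # Assumption: the real library raises its own CsvError here.
        raise ValueError(f"missing required headers: {missing}")

    rows = []
    # Row 1 is the header, so data rows are numbered from 2.
    for number, row in enumerate(reader, start=2):
        status, message = "", ""
        if validator is not None:
            try:
                validator(row)
            except Exception as exc:  # capture the failure and keep going
                status, message = "error", str(exc)
        row["row number"] = number
        row["row status"] = status
        row["message"] = message
        rows.append(row)
    return rows


def check_email(row: dict) -> None:
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")


sample = io.StringIO("name,email\nAda,ada@example.com\nBob,bob@other.org\n")
result = run_rows(sample, ["name", "email"], check_email)
```

Here the second data row fails the check, so it carries row status "error" and the exception text in message, while the first row keeps both fields blank.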
API
ThaCSV
ThaCSV()
runner.read()
runner.read(
    "Step 1 of 2",      # progress bar label; pass None to use the filename
    "data.csv",         # path to input CSV
    ["a", "b"],         # columns that must exist; raises CsvError if missing
    validator=my_func,  # optional: callable(row: dict) -> None
    enrich=True,        # optional: set False to skip row number/status/message columns
)
Reads and processes all rows. Returns the rows as a list[dict] (same object as runner.rows).
The validator is designed for offline, in-memory checks — field presence, format, business rules. It runs synchronously on each row; don't use it for API calls or database lookups.
When enrich=False, validator exceptions are re-raised instead of captured.
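As an illustration of the kind of offline check the validator is meant for, here is a hypothetical validator combining field presence, a format check, and a business rule. The column names (name, email, age) and the rules themselves are invented for the example; only the callable(row: dict) -> None contract comes from the description above.

```python
import re

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_row(row: dict) -> None:
    """Raise ValueError on the first failed check; return None on success."""
    # Field presence: CSV values arrive as strings, so blank counts as missing.
    if not (row.get("name") or "").strip():
        raise ValueError("name is required")
    # Format: a simple shape check, not full RFC validation.
    if not EMAIL_RE.match(row.get("email") or ""):
        raise ValueError("email is not well formed")
    # Business rule (hypothetical): age must be a whole number of at least 18.
    age = (row.get("age") or "").strip()
    if not age.isdecimal() or int(age) < 18:
        raise ValueError("age must be a whole number of at least 18")


def outcome(row: dict) -> str:
    """Helper for trying the validator by hand: returns '' or the error text."""
    try:
        validate_row(row)
    except ValueError as exc:
        return str(exc)
    return ""
```

A function like validate_row would be passed as validator=validate_row; everything it does is in-memory, which matches the guidance to avoid API calls or database lookups.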
runner.write()
runner.write(
    "Step 2 of 2",                   # progress bar label; pass None for "Writing {stem} CSV"
    output_path="output.csv",        # optional; auto-named input_processed_TIMESTAMP.csv if omitted
    rows=my_rows,                    # optional; use these rows instead of runner.rows
    sort_by="name",                  # optional; column name, or list of column names
    ascending=True,                  # optional; bool or list of bools matching sort_by
    column_order=["name", "email"],  # optional; listed columns come first, rest follow
    keep=["name", "email"],          # optional; keep only these columns (mutually exclusive with drop)
    drop=["row number"],             # optional; remove these columns (mutually exclusive with keep)
    chunk_size=1000,                 # optional; split output into files of this many rows
)
Prints ✅ Done! CSV was written to: {path} on completion. Override by setting runner.status_cb = my_fn.
Returns the Path that was written, or a list[Path] when chunk_size is set.
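To make the column options concrete, the sketch below shows one plausible reading of keep, drop, and column_order using plain dicts. It illustrates the documented semantics (keep and drop are mutually exclusive; listed columns come first, the rest follow) and is not the library's code; the function name select_columns is hypothetical.

```python
from typing import Optional


def select_columns(
    rows: list[dict],
    keep: Optional[list[str]] = None,
    drop: Optional[list[str]] = None,
    column_order: Optional[list[str]] = None,
) -> list[dict]:
    """Return new row dicts with columns filtered and reordered."""
    if keep is not None and drop is not None:
        raise ValueError("keep and drop are mutually exclusive")
    if not rows:
        return []

    # Start from the column order of the first row.
    columns = list(rows[0].keys())
    if keep is not None:
        columns = [c for c in columns if c in keep]
    if drop is not None:
        columns = [c for c in columns if c not in drop]
    if column_order is not None:
        # Listed columns first, remaining columns in their original order.
        first = [c for c in column_order if c in columns]
        rest = [c for c in columns if c not in first]
        columns = first + rest
    return [{c: row[c] for c in columns} for row in rows]


rows = [{"name": "Ada", "email": "ada@example.com", "row number": 2}]
shaped = select_columns(rows, drop=["row number"], column_order=["email"])
```

In this sketch the email column moves to the front, name follows, and row number is removed.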
chunk_size
When provided, write() splits the output into multiple files named output_001.csv, output_002.csv, etc. and returns a list[Path].
paths = runner.write("Step 2 of 2", "output.csv", chunk_size=1000)
# ["output_001.csv", "output_002.csv", ...]
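The naming scheme can be sketched with pathlib. This is an assumption-laden illustration: the helper chunk_plan is hypothetical, and the three-digit zero-padded suffix is inferred from the example file names above rather than taken from the library's source.

```python
from pathlib import Path


def chunk_plan(output_path: str, total_rows: int, chunk_size: int) -> list[tuple[Path, range]]:
    """Pair each chunk file name with the range of row indexes it would hold."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    base = Path(output_path)
    plan = []
    for index, start in enumerate(range(0, total_rows, chunk_size), start=1):
        # output.csv -> output_001.csv, output_002.csv, ...
        name = f"{base.stem}_{index:03d}{base.suffix}"
        stop = min(start + chunk_size, total_rows)
        plan.append((base.with_name(name), range(start, stop)))
    return plan


plan = chunk_plan("output.csv", total_rows=2500, chunk_size=1000)
```

With 2500 rows and a chunk size of 1000, the plan holds three files, the last one carrying the remaining 500 rows.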
Planned
- Encoding support: read() and write() currently assume UTF-8; a future release will add an encoding= parameter for files exported from Excel (cp1252, latin-1, etc.)
- Delimiter support: comma is currently assumed; a future release will add a delimiter= parameter for TSV and other formats
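Until those parameters exist, one possible workaround is to convert the input to a UTF-8, comma-delimited file first and pass the converted file to read(). The sketch below uses only the standard library; the helper name to_utf8_csv and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def to_utf8_csv(src: Path, dst: Path, encoding: str = "cp1252", delimiter: str = "\t") -> Path:
    """Rewrite src as a UTF-8, comma-delimited CSV at dst."""
    with src.open("r", encoding=encoding, newline="") as fin, \
         dst.open("w", encoding="utf-8", newline="") as fout:
        writer = csv.writer(fout)  # default dialect: comma-delimited
        for record in csv.reader(fin, delimiter=delimiter):
            writer.writerow(record)
    return dst


# Demo with a throwaway cp1252, tab-separated file.
workdir = Path(tempfile.mkdtemp())
source = workdir / "export.tsv"
source.write_bytes("name\tcity\nZoë\tMálaga\n".encode("cp1252"))
converted = to_utf8_csv(source, workdir / "export_utf8.csv")
```

The converted path can then be used as the input CSV, at the cost of one extra pass over the file.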
Alternatives
This library is intentionally limited in scope — it handles row-by-row processing with error capture and a progress bar, not data analysis or transformation. For heavier workloads:
- pandas — the standard for CSV processing and in-memory data manipulation; use when you need filtering, grouping, joins, or vectorized operations
- polars — faster alternative to pandas for large files with a cleaner API and lazy evaluation
- csv (stdlib) — raw CSV reading/writing with no dependencies; sufficient when you don't need progress tracking or structured error capture
Choose this library when you need per-row error capture with row status and message columns baked in — pandas and polars process data, they don't track individual row failures.
License
MIT