A Tabular Helper API library that reads and writes CSVs with progress tracking, header validation, and structured per-row errors.

tha-csv-runner

A small Python library that runs a function against every row of a CSV — with a progress bar, required header validation, and structured error capture per row.

Install

pip install tha-csv-runner

Quick start

from tha_csv_runner import ThaCSV

def process(row: dict) -> None:
    """Raise any exception to mark the row as an error. Return value is ignored."""
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")

runner = ThaCSV()

rows = runner.read("Step 1 of 2", "data.csv", ["name", "email"], process)
runner.write("Step 2 of 2", "output.csv")

How it works

  1. Opens the CSV and validates that all required_headers are present — raises immediately if any are missing
  2. Iterates every row with a tqdm progress bar labelled with desc
  3. Calls your validator(row) function — if it raises, that row is marked as an error and processing continues
  4. Appends three columns to every row (skipped when enrich=False): row number, row status, and message
    • row number starts at 2 (row 1 is the header)
    • On success: row status and message are blank
    • On error: row status = "error", message = str(exception)
  5. write() writes all rows (success and error) to a CSV
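The steps above can be approximated with the standard library alone. The following is a minimal sketch of that flow, not the library's actual implementation: the names run_rows and check_email are hypothetical, the tqdm progress bar is left out, and ValueError stands in for the library's ConfigError.

```python
import csv
import io


def run_rows(text, required_headers, validator):
    """Illustrative stand-in for the read() steps; not the library's code."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [h for h in required_headers if h not in (reader.fieldnames or [])]
    if missing:
        # The library raises ConfigError here; this sketch uses ValueError.
        raise ValueError(f"missing required headers: {missing}")

    rows = []
    # Data rows are numbered from 2 because row 1 is the header.
    for number, row in enumerate(reader, start=2):
        status, message = "", ""
        try:
            validator(row)
        except Exception as exc:  # capture the failure and keep going
            status, message = "error", str(exc)
        row.update({"row number": number, "row status": status, "message": message})
        rows.append(row)
    return rows


def check_email(row):
    if not row["email"].endswith("@example.com"):
        raise ValueError("invalid email domain")


sample = "name,email\nAda,ada@example.com\nBob,bob@elsewhere.org\n"
rows = run_rows(sample, ["name", "email"], check_email)

# Failed rows can be picked out by the status column.
errors = [r for r in rows if r["row status"] == "error"]
```

Because every row carries its own status and message, filtering on row status is enough to separate failures from successes after the run.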

API

ThaCSV

ThaCSV()

runner.read()

runner.read(
    "Step 1 of 2",           # progress bar label — pass None to use the filename
    "data.csv",              # path to input CSV
    ["a", "b"],              # columns that must exist — raises ConfigError if missing
    validator=my_func,       # optional: callable(row: dict) -> None
    enrich=True,             # optional: set False to skip row number/status/message columns
)

Reads and processes all rows. Returns the rows as a list[dict] (same object as runner.rows).

The validator is designed for offline, in-memory checks — field presence, format, business rules. It runs synchronously on each row; don't use it for API calls or database lookups.
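As an illustration of the kind of offline check described above, here is a small validator covering field presence, format, and a business rule. The function name validate_contact, the loose email pattern, and the example.org rule are all assumptions made up for this sketch; only the callable(row: dict) -> None shape comes from the API above.

```python
import re

# Deliberately loose format check; not a full email address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_contact(row: dict) -> None:
    """Raise ValueError on the first failed check; return nothing on success."""
    name = (row.get("name") or "").strip()
    email = (row.get("email") or "").strip()
    if not name:  # field presence
        raise ValueError("name is required")
    if not EMAIL_RE.match(email):  # format
        raise ValueError(f"malformed email: {email!r}")
    if email.lower().endswith("@example.org"):  # hypothetical business rule
        raise ValueError("example.org addresses are not accepted")
```

A function like this would be passed as validator=validate_contact. Everything it needs is already in the row, so it stays fast and has no network or database dependency.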

When enrich=False, validator exceptions are re-raised instead of captured.

runner.write()

runner.write(
    "Step 2 of 2",                     # progress bar label — pass None for "Writing {stem} CSV"
    output_path="output.csv",          # optional — auto-named input_processed_TIMESTAMP.csv if omitted
    rows=my_rows,                      # optional — use these rows instead of runner.rows
    sort_by="name",                    # optional — column name, or list of column names
    ascending=True,                    # optional — bool or list of bools matching sort_by
    column_order=["name", "email"],    # optional — listed columns come first, rest follow
    keep=["name", "email"],            # optional — keep only these columns (mutually exclusive with drop)
    drop=["row number"],               # optional — remove these columns (mutually exclusive with keep)
    chunk_size=1000,                   # optional — split output into files of this many rows
)

Prints ✅ Done! CSV was written to: {path} on completion. Override by setting runner.status_cb = my_fn.

Returns the Path that was written, or a list[Path] when chunk_size is set.
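To make the sorting and column options concrete, the sketch below applies the same ideas to a plain list of dicts. It is one plausible reading of the parameters, not the library's code: the helper name arrange is hypothetical, and the order in which the options are applied (sort first, then keep/drop, then column_order) is an assumption.

```python
def arrange(rows, sort_by=None, ascending=True, column_order=None, keep=None, drop=None):
    """Hypothetical helper mirroring the write() options on a list of dicts."""
    if keep is not None and drop is not None:
        raise ValueError("keep and drop are mutually exclusive")

    result = [dict(r) for r in rows]  # copy so the caller's rows are untouched

    if sort_by is not None:
        keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)
        flags = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
        if len(flags) != len(keys):
            raise ValueError("ascending must match sort_by in length")
        # list.sort is stable, so sorting from the last key to the first yields
        # a multi-key sort with an independent direction per key.
        for key, asc in reversed(list(zip(keys, flags))):
            result.sort(key=lambda r, k=key: r[k], reverse=not asc)

    columns = list(result[0]) if result else []
    if keep is not None:
        columns = [c for c in columns if c in keep]
    if drop is not None:
        columns = [c for c in columns if c not in drop]
    if column_order is not None:
        # Listed columns come first; the rest follow in their existing order.
        first = [c for c in column_order if c in columns]
        columns = first + [c for c in columns if c not in first]

    return [{c: r[c] for c in columns} for r in result]


example = [
    {"name": "Bob", "email": "bob@example.com", "row number": 2},
    {"name": "Ada", "email": "ada@example.com", "row number": 3},
]
out = arrange(example, sort_by="name", column_order=["email"], drop=["row number"])
```

Sorting before filtering means a column can still be used as a sort key even if it is dropped from the output.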

chunk_size

When provided, write() splits the output into multiple files named output_001.csv, output_002.csv, etc. and returns a list[Path].

paths = runner.write("Step 2 of 2", "output.csv", chunk_size=1000)
# [Path("output_001.csv"), Path("output_002.csv"), ...]
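The naming scheme can be reproduced with pathlib, which helps when a downstream step needs to predict the file names. This is a sketch under assumptions: chunk_paths is a hypothetical helper, and returning no paths for zero rows is a guess about an edge case the description above does not cover.

```python
import math
from pathlib import Path


def chunk_paths(output_path, total_rows, chunk_size):
    """Return the numbered file names a chunked write would need (sketch only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    base = Path(output_path)
    count = math.ceil(total_rows / chunk_size)  # assumption: zero rows gives no files
    # Insert a zero-padded, 1-based index between the stem and the suffix.
    return [base.with_name(f"{base.stem}_{i:03d}{base.suffix}") for i in range(1, count + 1)]


paths = chunk_paths("output.csv", total_rows=2500, chunk_size=1000)
```

With 2500 rows and a chunk size of 1000, the last file holds the remaining 500 rows.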

Planned

  • Encoding support: read() and write() currently assume UTF-8; a future release will add an encoding= parameter for files exported from Excel (cp1252, latin-1, etc.)
  • Delimiter support: comma is currently assumed; a future release will add a delimiter= parameter for TSV and other formats
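Until those parameters exist, one possible workaround is to convert the input to a comma-delimited UTF-8 file first, using only the standard library. The helper name to_utf8_csv and its defaults are assumptions for this sketch; it is not part of tha-csv-runner.

```python
import csv
from pathlib import Path


def to_utf8_csv(src, dst, encoding="cp1252", delimiter=","):
    """Rewrite src as a comma-delimited UTF-8 CSV at dst and return dst as a Path."""
    src, dst = Path(src), Path(dst)
    # newline="" lets the csv module handle line endings itself.
    with src.open(newline="", encoding=encoding) as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for record in csv.reader(fin, delimiter=delimiter):
            writer.writerow(record)
    return dst
```

The converted file can then be passed to runner.read() as usual, for example after calling to_utf8_csv("export.tsv", "export.csv", delimiter="\t") on a tab-separated Excel export.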

Alternatives

This library is intentionally limited in scope — it handles row-by-row processing with error capture and a progress bar, not data analysis or transformation. For heavier workloads:

  • pandas — the standard for CSV processing and in-memory data manipulation; use when you need filtering, grouping, joins, or vectorized operations
  • polars — faster alternative to pandas for large files with a cleaner API and lazy evaluation
  • csv (stdlib) — raw CSV reading/writing with no dependencies; sufficient when you don't need progress tracking or structured error capture

Choose this library when you need per-row error capture with row status and message columns baked in. pandas and polars process data, but they don't track individual row failures.

License

MIT
