
Enhanced CSV reader and writer with automatic type inference.


philiprehberger-csv-kit



Installation

pip install philiprehberger-csv-kit

Usage

Reading CSV

from philiprehberger_csv_kit import read_csv

rows = read_csv("data.csv")
# [{"name": "Alice", "age": 30, "score": 9.5}, ...]

By default, values are cast to int, float, bool, or None (empty cells become None). Pass typed=False to keep every value as a string:

rows = read_csv("data.csv", typed=False)
# [{"name": "Alice", "age": "30", "score": "9.5"}, ...]

Writing CSV

from philiprehberger_csv_kit import write_csv

rows = [
    {"name": "Alice", "age": 30, "score": 9.5},
    {"name": "Bob", "age": 25, "score": 8.0},
]

write_csv("output.csv", rows)
write_csv("output.csv", rows, columns=["name", "age"])  # select columns
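The columns= filter can be pictured as projecting each row onto a subset of keys. The sketch below is a standard-library analogue, not csv-kit's implementation: write_selected is a hypothetical helper that writes to an in-memory buffer so the result is easy to inspect. csv.DictWriter with extrasaction="ignore" drops keys that are not listed instead of raising ValueError.

```python
import csv
import io

rows = [
    {"name": "Alice", "age": 30, "score": 9.5},
    {"name": "Bob", "age": 25, "score": 8.0},
]

def write_selected(rows, columns=None):
    """Hypothetical helper: serialise rows to CSV text, keeping only `columns`."""
    # Mirror columns=None by falling back to the keys of the first row.
    fieldnames = columns if columns is not None else list(rows[0].keys())
    buf = io.StringIO()
    # extrasaction="ignore" silently skips keys not in fieldnames.
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(write_selected(rows, columns=["name", "age"]))
```

Non-string values such as 30 and 9.5 are converted to text by the csv writer itself, so the rows do not need to be stringified first.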

Streaming large files

from philiprehberger_csv_kit import stream_csv

for chunk in stream_csv("large.csv", chunk_size=500):
    for row in chunk:
        process(row)
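Chunked streaming keeps at most chunk_size rows in memory at a time. A minimal sketch of the idea, assuming stream_csv works roughly like this; stream_chunks is a hypothetical name, it reads from any file-like object, and unlike the package it performs no type inference (every value stays a string).

```python
import csv
import io
from itertools import islice
from typing import Iterator

def stream_chunks(lines, chunk_size=1000) -> Iterator[list]:
    """Hypothetical sketch: yield lists of at most chunk_size row dicts."""
    reader = csv.DictReader(lines)
    while True:
        # islice pulls the next chunk_size rows lazily; the rest of the file is not read yet.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

sample = io.StringIO("id\n1\n2\n3\n4\n5\n")
sizes = [len(c) for c in stream_chunks(sample, chunk_size=2)]
print(sizes)  # [2, 2, 1]
```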

Column statistics

from philiprehberger_csv_kit import column_stats

stats = column_stats("data.csv")
# {"age": {"min": 25, "max": 30, "unique": 2, "nulls": 0, "count": 2}, ...}

# Analyse specific columns only
stats = column_stats("data.csv", columns=["age", "score"])
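As a rough model of what the per-column numbers mean, here is a sketch over rows that have already been typed (column_stats itself takes a path). The function name stats_for and the null handling (None is a null; count and unique ignore nulls) are assumptions, since the README does not define these fields precisely.

```python
def stats_for(rows, columns=None):
    """Hypothetical sketch of per-column stats over already-typed row dicts.

    Assumptions: None counts as null; `count` and `unique` ignore nulls;
    min/max are only reported when the non-null values can be ordered.
    """
    if columns is None:
        columns = list(rows[0].keys()) if rows else []
    result = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        entry = {
            "unique": len(set(present)),
            "nulls": len(values) - len(present),
            "count": len(present),
        }
        try:
            entry["min"] = min(present) if present else None
            entry["max"] = max(present) if present else None
        except TypeError:
            # Mixed types (e.g. int and str) cannot be ordered in Python 3.
            entry["min"] = entry["max"] = None
        result[col] = entry
    return result

rows = [
    {"name": "Alice", "age": 30, "score": 9.5},
    {"name": "Bob", "age": 25, "score": None},
]
print(stats_for(rows, columns=["age", "score"]))
```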

Dialect detection

from philiprehberger_csv_kit import detect_dialect

# Detect from a file
result = detect_dialect("data.tsv")
print(result.delimiter)   # "\t"
print(result.quotechar)   # '"'

# Detect from a raw text sample
result = detect_dialect("name;age;score\nAlice;30;9.5\n")
print(result.delimiter)   # ";"
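Python's standard library ships a heuristic delimiter detector, csv.Sniffer. The README does not say whether detect_dialect builds on it, so treat the following as an illustration of the technique rather than the package's behaviour; sniff is a hypothetical wrapper.

```python
import csv

def sniff(sample: str):
    """Sketch using the stdlib sniffer; csv-kit's own heuristics may differ."""
    # Restricting the candidate delimiters makes guesses on tiny samples more stable.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return dialect.delimiter, dialect.quotechar

print(sniff("name;age;score\nAlice;30;9.5\n"))
```

When a sample contains no quotes, csv.Sniffer falls back to '"' as the quote character. It raises csv.Error if no consistent delimiter is found.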

Column data quality

from philiprehberger_csv_kit import read_csv, column_quality

rows = read_csv("data.csv")
quality = column_quality(rows, "email")
print(quality.completeness)      # 87.5  (percentage of non-null values)
print(quality.cardinality_ratio)  # 0.95  (unique values / total rows)
print(quality.null_count)         # 2
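The three quality figures follow from the definitions in the comments above. A sketch that computes them for a list of row dicts, where Quality and quality_of are hypothetical stand-ins for QualityResult and column_quality, and a missing key is assumed to count as a null:

```python
from dataclasses import dataclass

@dataclass
class Quality:
    """Hypothetical stand-in for csv-kit's QualityResult."""
    completeness: float       # % of rows whose value is not None
    cardinality_ratio: float  # distinct non-null values / total rows
    null_count: int

def quality_of(rows, column):
    values = [r.get(column) for r in rows]  # missing key -> None (assumption)
    total = len(values)
    if total == 0:
        return Quality(0.0, 0.0, 0)
    non_null = [v for v in values if v is not None]
    return Quality(
        completeness=round(100 * len(non_null) / total, 2),
        cardinality_ratio=round(len(set(non_null)) / total, 2),
        null_count=total - len(non_null),
    )

rows = [{"email": "a@x.io"}, {"email": "b@x.io"}, {"email": "a@x.io"}, {"email": None}]
print(quality_of(rows, "email"))
```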

Transformation pipeline

from philiprehberger_csv_kit import read_csv, CsvPipeline

rows = read_csv("employees.csv")

result = (
    CsvPipeline(rows)
    .filter(lambda r: r["age"] > 18)
    .map_column("name", str.upper)
    .sort_by("age")
    .to_list()
)

# Group by department
groups = (
    CsvPipeline(rows)
    .filter(lambda r: r["active"] is True)
    .group_by("department")
)
# {"Engineering": [...], "Sales": [...]}
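A chainable pipeline like this is usually built by having each step return a pipeline object. The sketch below is a simplified, hypothetical MiniPipeline covering a few of the listed methods. It returns a new object from every step so earlier stages and the caller's rows are never mutated, which may differ from how CsvPipeline is implemented.

```python
from collections import defaultdict
from typing import Any, Callable

Row = dict[str, Any]

class MiniPipeline:
    """Hypothetical, simplified analogue of CsvPipeline (not the package's code)."""

    def __init__(self, rows: list[Row]):
        self._rows = list(rows)  # shallow copy so the caller's list is untouched

    def filter(self, pred: Callable[[Row], bool]) -> "MiniPipeline":
        return MiniPipeline([r for r in self._rows if pred(r)])

    def map_column(self, column: str, fn: Callable[[Any], Any]) -> "MiniPipeline":
        # Build new dicts so the input rows are not mutated.
        return MiniPipeline([{**r, column: fn(r[column])} for r in self._rows])

    def sort_by(self, column: str, reverse: bool = False) -> "MiniPipeline":
        return MiniPipeline(sorted(self._rows, key=lambda r: r[column], reverse=reverse))

    def group_by(self, column: str) -> dict[Any, list[Row]]:
        # Terminal step: groups keep the order in which keys first appear.
        groups = defaultdict(list)
        for r in self._rows:
            groups[r[column]].append(r)
        return dict(groups)

    def to_list(self) -> list[Row]:
        return list(self._rows)

rows = [
    {"name": "ann", "age": 34, "department": "Engineering"},
    {"name": "bo", "age": 17, "department": "Sales"},
    {"name": "cy", "age": 22, "department": "Sales"},
]
adults = (
    MiniPipeline(rows)
    .filter(lambda r: r["age"] > 18)
    .map_column("name", str.upper)
    .sort_by("age")
    .to_list()
)
print([r["name"] for r in adults])  # ['CY', 'ANN']
```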

Type inference

from philiprehberger_csv_kit import infer_types

raw = [{"val": "42"}, {"val": "3.14"}, {"val": "true"}, {"val": ""}]
typed = infer_types(raw)
# [{"val": 42}, {"val": 3.14}, {"val": True}, {"val": None}]
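Type inference of this kind boils down to trying each target type in turn. A minimal sketch, assuming the check order bool, then int, then float (the order csv-kit uses is not documented here) and that blank or whitespace-only cells become None; cast_value and infer are hypothetical names.

```python
def cast_value(value):
    """Hypothetical per-value caster; the order of checks is an assumption."""
    if not isinstance(value, str):
        return value  # already typed (or None): pass through unchanged
    s = value.strip()
    if s == "":
        return None
    lowered = s.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return value  # non-numeric text is left as the original string

def infer(rows):
    return [{k: cast_value(v) for k, v in row.items()} for row in rows]

raw = [{"val": "42"}, {"val": "3.14"}, {"val": "true"}, {"val": ""}]
print(infer(raw))  # [{'val': 42}, {'val': 3.14}, {'val': True}, {'val': None}]
```

Trying int before float keeps "42" an integer. Note that float() also accepts strings such as "nan" and "inf", so a stricter implementation might match numbers with a pattern instead.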

API

Function / Class | Description
--- | ---
read_csv(path, typed=True, encoding="utf-8") | Read a CSV file and return a list of dicts. Infers types when typed=True.
write_csv(path, rows, columns=None, encoding="utf-8") | Write a list of dicts to CSV, optionally restricted to the given columns.
stream_csv(path, chunk_size=1000, encoding="utf-8") | Generator yielding chunks of row dicts for memory-efficient reading.
column_stats(path, columns=None) | Compute per-column stats: min, max, unique, nulls, count.
infer_types(rows) | Cast string values to int, float, bool, or None where possible.
detect_dialect(filepath_or_sample) | Detect the delimiter, quotechar, and other formatting from a file or text sample. Returns DialectResult.
column_quality(rows, column) | Score column data quality: completeness %, cardinality ratio, null count. Returns QualityResult.
CsvPipeline(rows) | Chainable pipeline with .filter(), .map_column(), .add_column(), .rename_column(), .select_columns(), .sort_by(), .group_by(), .head(), .tail(), .to_list(), .count(), .first().

Development

pip install -e .
python -m pytest tests/ -v

Support

If you find this project useful:

Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT

Download files

Source Distribution

philiprehberger_csv_kit-0.3.1.tar.gz (10.9 kB)

Built Distribution

philiprehberger_csv_kit-0.3.1-py3-none-any.whl (8.3 kB)

File details

Details for the file philiprehberger_csv_kit-0.3.1.tar.gz.

File metadata

  • Download URL: philiprehberger_csv_kit-0.3.1.tar.gz
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for philiprehberger_csv_kit-0.3.1.tar.gz

Algorithm | Hash digest
--- | ---
SHA256 | 8b0eec0ba1dd81069f2c404bd309911bf289a7547b9e50373cce0563f6c367bd
MD5 | 2bcf69e8289df2ae71bc4578321cc682
BLAKE2b-256 | 914e99519e8dd7eab9b4cb7eb55295ff09a88806596b5f3d4e8f61faa0b0d275


File details

Details for the file philiprehberger_csv_kit-0.3.1-py3-none-any.whl.

File hashes

Hashes for philiprehberger_csv_kit-0.3.1-py3-none-any.whl

Algorithm | Hash digest
--- | ---
SHA256 | b44e27c0e2a6ab1777ed5c8d0cdd7de0c3725267adac12e79b92adac23cd4bdb
MD5 | e8e34ec59c0ce47ce6c0fac5209c3373
BLAKE2b-256 | e96cb379c168e975064f6cb6e6e8914679014a35ea07a1b3f9e3c57bd9a5cb8c

