Generate Python dataclasses and loaders from CSV/TSV files
CSV Dataclass Generator
Generate Python dataclasses and loader functions from CSV/TSV files.
Features
- Automatic Type Inference: Detects int, float, and str types based on a sample of rows.
- Dialect Detection: Automatically identifies CSV delimiters (including TSV).
- Name Sanitization: Converts CSV column headers into valid Python identifiers.
- Iterator Loading: Generates a loader function that yields dataclass instances one row at a time, suitable for larger datasets.
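The detection steps listed above can be approximated with the standard library alone. The sketch below is not the package's implementation: infer_type, sanitize_name, and detect_delimiter are hypothetical names chosen for illustration, and the widening order (int, then float, then str) is an assumption about how such inference might work.

```python
import csv
import keyword
import re


def infer_type(values):
    """Return the narrowest of int, float, str that parses every sampled value."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
        except ValueError:
            # At least one value does not parse; try the next, wider type.
            continue
        return candidate
    return str


def sanitize_name(header):
    """Turn an arbitrary column header into a valid lowercase identifier."""
    # Collapse every run of non-word characters into a single underscore.
    name = re.sub(r"\W+", "_", header.strip()).strip("_").lower()
    if not name or name[0].isdigit():
        name = f"col_{name}"  # identifiers cannot be empty or start with a digit
    if keyword.iskeyword(name):
        name += "_"  # avoid clashing with reserved words such as "class"
    return name


def detect_delimiter(sample):
    """Guess the delimiter from a text sample using csv.Sniffer."""
    return csv.Sniffer().sniff(sample, delimiters=",\t;|").delimiter
```

Under these assumptions, a column sampled as "95.5" and "88.0" is inferred as float, and a header such as "User Name" becomes user_name. An empty sample falls through to int here, which a real tool would probably handle differently.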
Installation
pip install csv-dataclass-gen
Usage
Command Line Interface (CLI)
The package provides a csv-dataclass-gen command.
# Generate code to stdout
csv-dataclass-gen data.csv
# Generate code and save to a directory
csv-dataclass-gen data.csv --output ./generated_models/
# Specify a custom class name and sample size for type inference
csv-dataclass-gen data.csv --name my_custom_data --sample-size 500
CLI Help Message:
Usage: csv-dataclass-gen [OPTIONS] INPUT_FILE
Generate dataclass and loader code from CSV files.
Options:
  -o, --output TEXT          Output directory for generated files. "-" outputs
                             the result to stdout.
  -s, --sample-size INTEGER  Number of rows to sample for type inference
  -n, --name TEXT            Alternative name for the generated name. Snake
                             case / spaced words is recommended.
  --help                     Show this message and exit.
Arguments:
INPUT_FILE: Path to the CSV/TSV file (required).
Example Generated Code
Given a CSV like users.csv:
id,user_name,score
1,alice,95.5
2,bob,88.0
The generator will produce:
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator
import csv

@dataclass
class Users:
    id: int  # Original: "id"
    user_name: str  # Original: "user_name"
    score: float  # Original: "score"

def load_users(csv_path: Path, max_rows: int | None = None, delimiter: str = ',') -> Iterator[Users]:
    # ... loading logic ...
    pass
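The loading logic is elided in the example above. As a rough idea of what a loader with that signature might do, here is a minimal sketch built on csv.DictReader; the body is an assumption, not the code the generator actually emits. Optional[int] stands in for int | None so the sketch also runs on Python versions before 3.10.

```python
import csv
from dataclasses import dataclass
from itertools import islice
from pathlib import Path
from typing import Iterator, Optional


@dataclass
class Users:
    id: int
    user_name: str
    score: float


def load_users(
    csv_path: Path, max_rows: Optional[int] = None, delimiter: str = ","
) -> Iterator[Users]:
    """Lazily yield one Users instance per data row (hypothetical body)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # islice with a stop of None reads through to the end of the file.
        for row in islice(reader, max_rows):
            yield Users(
                id=int(row["id"]),
                user_name=row["user_name"],
                score=float(row["score"]),
            )
```

Because the function is a generator, rows are converted only as the caller iterates, so something like `for user in load_users(Path("users.csv"), max_rows=100)` never holds the whole file in memory.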
Development
Dependencies
This project uses uv for dependency management, but it can also be installed using standard tools.
uv sync --all-groups
Running Tests
We use pytest for testing.
uv run pytest
Tests include:
- tests/test_name_sanitizer.py: Logic for sanitizing names into different formats.
- tests/test_type_inferrer.py: Logic for detecting data types.
- tests/test_csv_analyzer.py: Logic for CSV structure analysis.
- tests/test_code_gen_e2e.py: End-to-end tests that generate code and verify it by reconstructing the original CSV.
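The round-trip idea behind the end-to-end tests can be illustrated with a small pytest-style check. This is a sketch under assumptions, not the project's actual test: the Users class is written by hand here rather than generated, and round_trip is a hypothetical helper.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class Users:
    id: int
    user_name: str
    score: float


def round_trip(text: str) -> str:
    """Parse CSV text into Users rows, then serialize them back to CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [Users(int(r["id"]), r["user_name"], float(r["score"])) for r in reader]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([f.name for f in fields(Users)])  # header from field names
    writer.writerows(astuple(row) for row in rows)
    return out.getvalue()


def test_round_trip_reconstructs_csv():
    original = "id,user_name,score\n1,alice,95.5\n2,bob,88.0\n"
    assert round_trip(original) == original
```

A comparison like this is exact only when the input already uses Python's canonical number formatting; a value written as 88.00 would come back as 88.0, so a real test would likely compare parsed values rather than raw text in such cases.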
License
MIT License – see the LICENSE file for details.