Generate Python dataclasses and loaders from CSV/TSV files

CSV Dataclass Generator

Generate Python dataclasses and loader functions from CSV/TSV files.

Features

  • Automatic Type Inference: Detects int, float, and str types based on a sample of rows.
  • Dialect Detection: Automatically identifies CSV delimiters (including TSV).
  • Name Sanitization: Converts CSV column headers into valid Python identifiers.
  • Iterator Loading: Generates a loader function that lazily yields dataclass instances, which makes it suitable for larger datasets.
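
The first three features can be illustrated with a minimal standard-library sketch. This is not the package's actual implementation: the helper names infer_type, sanitize_name, and detect_delimiter are hypothetical, and the exact rules the tool applies (for example, how it treats empty cells or reserved words) are assumptions.

```python
import csv
import keyword
import re


def infer_type(values):
    """Return the narrowest of int, float, str that parses every non-empty sample."""
    samples = [v for v in values if v != ""]
    if not samples:
        return str  # assumption: a column with no data falls back to str
    for candidate in (int, float):
        try:
            for v in samples:
                candidate(v)
        except ValueError:
            continue  # at least one value failed, try the next wider type
        return candidate
    return str


def sanitize_name(header):
    """Turn a CSV column header into a valid, lowercase Python identifier."""
    name = re.sub(r"\W+", "_", header.strip()).strip("_").lower()
    if not name:
        name = "column"  # hypothetical placeholder for empty headers
    if name[0].isdigit():
        name = "_" + name  # identifiers cannot start with a digit
    if keyword.iskeyword(name):
        name += "_"  # avoid clashing with reserved words such as "class"
    return name


def detect_delimiter(sample_text):
    """Guess the delimiter from a text sample using csv.Sniffer."""
    return csv.Sniffer().sniff(sample_text, delimiters=",\t;|").delimiter
```

Because inference only looks at a sample of rows, a value further down the file may not fit the inferred type; a larger sample lowers that risk at the cost of reading more of the file up front.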

Installation

pip install csv-dataclass-gen

Usage

Command Line Interface (CLI)

The package provides a csv-dataclass-gen command.

# Generate code to stdout
csv-dataclass-gen data.csv

# Generate code and save to a directory
csv-dataclass-gen data.csv --output ./generated_models/

# Specify a custom class name and sample size for type inference
csv-dataclass-gen data.csv --name my_custom_data --sample-size 500

CLI Help Message:

Usage: csv-dataclass-gen [OPTIONS] INPUT_FILE

  Generate dataclass and loader code from CSV files.

Options:
  -o, --output TEXT          Output directory for generated files. "-" outputs
                             the result to stdout.
  -s, --sample-size INTEGER  Number of rows to sample for type inference
  -n, --name TEXT            Alternative name for the generated name. Snake
                             case / spaced words is recommended.
  --help                     Show this message and exit.

Arguments:

  • INPUT_FILE: Path to the CSV/TSV file (required).

Example Generated Code

Given a CSV like users.csv:

id,user_name,score
1,alice,95.5
2,bob,88.0

The generator will produce:

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator
import csv

@dataclass
class Users:
    id: int  # Original: "id"
    user_name: str  # Original: "user_name"
    score: float  # Original: "score"

def load_users(csv_path: Path, max_rows: int | None = None, delimiter: str = ',') -> Iterator[Users]:
    # ... loading logic ...
    pass
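
The loading logic is elided above. As a rough idea of what such a loader could look like, here is a minimal sketch built on csv.DictReader. The function body is an assumption for illustration only, and the code the tool actually emits may differ; Optional[int] is used in place of int | None so the sketch also runs on Python versions before 3.10.

```python
import csv
from dataclasses import dataclass
from itertools import islice
from pathlib import Path
from typing import Iterator, Optional


@dataclass
class Users:
    id: int
    user_name: str
    score: float


def load_users(
    csv_path: Path, max_rows: Optional[int] = None, delimiter: str = ","
) -> Iterator[Users]:
    """Lazily yield one Users instance per data row (hypothetical loader body)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # islice stops after max_rows rows; None means "read everything".
        for row in islice(reader, max_rows):
            yield Users(
                id=int(row["id"]),
                user_name=row["user_name"],
                score=float(row["score"]),
            )
```

Since the function is a generator, rows are converted one at a time as the caller iterates, for example with `for user in load_users(Path("users.csv"), max_rows=1000): ...`, so the whole file never has to be held in memory.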

Development

Dependencies

This project uses uv for dependency management, but it can also be installed with standard tools such as pip.

uv sync --all-groups

Running Tests

We use pytest for testing.

uv run pytest

Tests include:

  • tests/test_name_sanitizer.py: Logic for sanitizing names into different formats.
  • tests/test_type_inferrer.py: Logic for detecting data types.
  • tests/test_csv_analyzer.py: Logic for CSV structure analysis.
  • tests/test_code_gen_e2e.py: End-to-end tests that generate code and verify it by reconstructing the original CSV.
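
The round-trip idea behind the end-to-end tests can be sketched as follows. This is not taken from the project's test suite: the Users class is written by hand here rather than generated, and round_trip is a hypothetical helper.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class Users:
    id: int
    user_name: str
    score: float


def round_trip(csv_text: str) -> str:
    """Parse CSV text into Users instances, then serialise them back to CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    users = [Users(int(r["id"]), r["user_name"], float(r["score"])) for r in reader]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(f.name for f in fields(Users))  # header from field names
    writer.writerows(astuple(u) for u in users)     # one row per instance
    return out.getvalue()


def test_round_trip_reconstructs_original():
    original = "id,user_name,score\n1,alice,95.5\n2,bob,88.0\n"
    assert round_trip(original) == original
```

A textual comparison like this only holds when the original values are already in canonical form; a cell such as 1.50 would come back as 1.5, so a real test would need to compare parsed values or normalise the input first.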

License

MIT License – see the LICENSE file for details.

Download files

Source Distribution

csv_dataclass_gen-0.1.2.tar.gz (11.3 kB view details)

Uploaded Source

Built Distribution

csv_dataclass_gen-0.1.2-py3-none-any.whl (10.2 kB view details)

Uploaded Python 3

File details

Details for the file csv_dataclass_gen-0.1.2.tar.gz.

File metadata

  • Download URL: csv_dataclass_gen-0.1.2.tar.gz
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for csv_dataclass_gen-0.1.2.tar.gz
Algorithm Hash digest
SHA256 c1befb0fedad2f1283fb5e1cb8494cb5ee213bba8c6e10e8a5693f9968ee746b
MD5 5a01c3bbbdd68832fe4e38103aa70454
BLAKE2b-256 51d06c5e9ba17c49658f1fba57e35b2303cf504beee2329c8675f30c9cd2933b

File details

Details for the file csv_dataclass_gen-0.1.2-py3-none-any.whl.

File hashes

Hashes for csv_dataclass_gen-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4579da1d75525a7ace51676e39bd300837c83fddfa183a480c3f55df301069f9
MD5 e347f62a95ba9b8f156980c5d12af492
BLAKE2b-256 48fd7116a12c07de5756f7dd12377bc55a48bd34136200ec736a3f11e3d8f30f
