A tool for generating dataclass type files for pandas DataFrame rows

df-types

A Python tool for automatically generating dataclass definitions from pandas DataFrames.

Getting Started

pip install df-types

from df_types import DFTypes
import pandas as pd
import random

# Load your data
df = pd.DataFrame({
    "id": list(range(1, 301)),
    "name": ["Alice", "Bob", "Charlie"] * 100,
    "age": [random.randint(18, 100) for _ in range(300)],
    "prefers-pizza": [random.choice([True, False]) for _ in range(300)]  # Not a valid Python identifier, will be normalized
})

# Generate type definitions
dft = DFTypes(df)
dft.write_types()  # Creates typed_df.py

# Creates the following dataclass:
#
# @dataclass(slots=True)
# class TypedRowData:
#     id: int
#     name: Literal['Alice', 'Bob', 'Charlie']
#     age: int
#     prefers_pizza: bool

# Use the generated types
from typed_df import convert, iter_dataclasses

df_typed = convert(df)  # Converts NaNs to None, normalizes column names to Python identifiers
for row_data in iter_dataclasses(df_typed):
    # Each row_data is now a typed dataclass
    print(f"ID: {row_data.id}, Name: {row_data.name}, Age: {row_data.age}, Prefers Pizza: {row_data.prefers_pizza}")
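
The column-name normalization mentioned above (prefers-pizza becoming prefers_pizza) can be pictured with a small sketch. This is only an illustration of the idea, not the code df-types actually runs; normalize_column_name is a hypothetical helper and its handling of digits and keywords is an assumption.

```python
import keyword
import re


def normalize_column_name(name: str) -> str:
    """Hypothetical helper: turn a column label into a valid Python identifier."""
    # Replace every character that is not a letter, digit, or underscore.
    cleaned = re.sub(r"\W", "_", str(name))
    # Identifiers cannot be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    # Reserved words such as "class" cannot be used as field names.
    if keyword.iskeyword(cleaned):
        cleaned = f"{cleaned}_"
    return cleaned


print(normalize_column_name("prefers-pizza"))  # prefers_pizza
```

Any scheme like this can map two different labels to the same identifier (for example "a-b" and "a b"), so it is worth checking the generated file when column names differ only in punctuation.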

Features

Supported Types

Feature          Example                  Description
Literal Types    Literal["A", "B", "C"]   For categorical data with known values
Union Types      int | float              For columns with mixed numeric types
Optional Types   str | None               For columns with missing values
Custom Types     pd.Timestamp, Decimal    Import and use external types
Primitive Types  int, str, bool, float    Standard Python types
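
To make the table concrete, here is a rough sketch of how a type hint could be derived from a column's values. It is an assumption about the general approach, not df-types' implementation: infer_type_hint is a hypothetical function, and its max_literal_values argument only mirrors the name of the configuration option described under Configuration (the default of 10 here is chosen for the example).

```python
import math


def infer_type_hint(values, max_literal_values=10):
    """Hypothetical sketch: build a type-hint string from a column's values."""
    # Treat None and float NaN as "missing".
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    has_missing = len(present) < len(values)

    distinct = sorted(set(present), key=repr)
    all_strings = bool(present) and all(isinstance(v, str) for v in present)
    if all_strings and len(distinct) <= max_literal_values:
        # Few distinct strings: treat the column as categorical.
        hint = "Literal[" + ", ".join(repr(v) for v in distinct) + "]"
    else:
        # Otherwise join the distinct Python type names into a union.
        names = sorted({type(v).__name__ for v in present})
        hint = " | ".join(names) if names else "object"

    return f"{hint} | None" if has_missing else hint


print(infer_type_hint(["Alice", "Bob", "Charlie"] * 100))  # Literal['Alice', 'Bob', 'Charlie']
print(infer_type_hint([1, 2.5]))                           # float | int
print(infer_type_hint(["a", None]))                        # Literal['a'] | None
```

With more distinct strings than max_literal_values, the sketch falls back to plain str, which is the trade-off the option controls.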

Configuration

from df_types.config import DFTypesConfig

# Basic options
config = DFTypesConfig(
    filename="my_types.py",
    class_name="MyRow",  # Default is "TypedRowData"
    max_literal_values=10  # Increase if you have more categories you want to infer as Literal types
)

dft = DFTypes(df, config=config)
dft.write_types()

# Use the generated types
from my_types import convert, iter_dataclasses, MyRow

Considerations

If a type cannot be imported from the generated file, the column is given the type hint object and a warning is printed. This most often happens when the type is defined in the calling module (e.g., main.py imports df_types and also defines the CustomType stored in the DataFrame). Moving the type definition to a separate file avoids the warning.
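
The sketch below shows why a type defined in the calling script is hard to import from another file. It is a guess at the kind of check involved, not df-types' actual logic; is_importable_elsewhere is a hypothetical helper, and CustomType is built with type() only to simulate a class defined in a script that is run directly.

```python
from fractions import Fraction


def is_importable_elsewhere(tp: type) -> bool:
    """Hypothetical check: can another file import this type by its module path?"""
    # Classes defined in the script being run report "__main__" as their module,
    # which a separately generated file cannot reliably import from.
    return tp.__module__ != "__main__"


# Simulate a class defined in the calling script (e.g. main.py run directly).
CustomType = type("CustomType", (), {"__module__": "__main__"})

print(is_importable_elsewhere(Fraction))    # True
print(is_importable_elsewhere(CustomType))  # False
```

Defining the class in its own module (say, a file next to main.py) gives it a real module path that a generated file can import.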

Types are inferred from a sample of rows. If a column has many rows and a skewed distribution of values, the inferred literals may not include every possible value. Increase sample_middle_rows, sample_head_rows, or sample_tail_rows to sample more rows.
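
A small sketch shows how a rare value can slip past head/middle/tail sampling. The sample_rows function is hypothetical and its way of picking the middle block is an assumption; only the idea of sampling from the start, middle, and end comes from the option names above.

```python
def sample_rows(values, head=5, middle=5, tail=5):
    """Hypothetical sampler: take rows from the start, middle, and end of a column."""
    n = len(values)
    if n <= head + middle + tail:
        return list(values)
    mid_start = (n - middle) // 2
    return (
        list(values[:head])
        + list(values[mid_start:mid_start + middle])
        + list(values[n - tail:])
    )


# 1,000 rows that are almost all "common", with one rare value at index 200.
column = ["common"] * 1000
column[200] = "rare"

print(set(sample_rows(column)))                 # {'common'}
print("rare" in sample_rows(column, head=300))  # True
```

In the first call the rare value falls between the sampled blocks, so a Literal inferred from that sample would omit it. Widening the head sample to 300 rows picks it up.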

Future Features

  • Support for typed containers (e.g., List[int], Dict[str, int])
  • Support for nested dataclasses
  • More advanced configuration options

License

This project is licensed under the MIT License - see the LICENSE file for details.
