A tool for generating dataclass type files for pandas DataFrame rows
df-types
A Python tool for automatically generating dataclass definitions from pandas DataFrames.
Getting Started
pip install df-types
from df_types import DFTypes
import pandas as pd
import random
# Load your data
df = pd.DataFrame({
    "id": list(range(1, 301)),
    "name": ["Alice", "Bob", "Charlie"] * 100,
    "age": [random.randint(18, 100) for _ in range(300)],
    "prefers-pizza": [random.choice([True, False]) for _ in range(300)],  # Not a valid Python identifier, will be normalized
})
# Generate type definitions
dft = DFTypes(df)
dft.write_types() # Creates typed_df.py
# Creates the following dataclass:
#
# @dataclass(slots=True)
# class TypedRowData:
#     id: int
#     name: Literal['Alice', 'Bob', 'Charlie']
#     age: int
#     prefers_pizza: bool
# Use the generated types
from typed_df import convert, iter_dataclasses
df_typed = convert(df) # Converts NaNs to None, normalizes column names to Python identifiers
for row_data in iter_dataclasses(df_typed):
    # Each row_data is now a typed dataclass
    print(f"ID: {row_data.id}, Name: {row_data.name}, Age: {row_data.age}, Prefers Pizza: {row_data.prefers_pizza}")
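The example above relies on column names such as "prefers-pizza" being normalized to valid Python identifiers. The library's actual normalization rules are not documented here, so the following is only a minimal sketch of one plausible approach; `normalize_column_name` is a hypothetical helper, not part of df-types.

```python
import keyword
import re


def normalize_column_name(name: str) -> str:
    """Turn an arbitrary column label into a valid Python identifier (illustrative only)."""
    # Replace every character that is not a letter, digit, or underscore.
    ident = re.sub(r"\W", "_", name.strip())
    # Identifiers cannot be empty or start with a digit.
    if not ident or ident[0].isdigit():
        ident = "_" + ident
    # Avoid clashing with reserved words such as "class".
    if keyword.iskeyword(ident):
        ident += "_"
    return ident


print(normalize_column_name("prefers-pizza"))  # prefers_pizza
print(normalize_column_name("2nd place"))      # _2nd_place
```

Two different labels can collapse to the same identifier under a scheme like this (for example "a-b" and "a b"), so a real implementation would also need a rule for resolving collisions.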
Features
Supported Types
| Feature | Example | Description |
|---|---|---|
| Literal Types | `Literal["A", "B", "C"]` | For categorical data with known values |
| Union Types | `int \| float` | For columns with mixed numeric types |
| Optional Types | `str \| None` | For columns with missing values |
| Custom Types | `pd.Timestamp`, `Decimal` | Import and use external types |
| Primitive Types | `int`, `str`, `bool`, `float` | Standard Python types |
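To make the table concrete, here is a rough sketch of how a type hint could be derived from one column's values. It is not the library's implementation: `infer_annotation` is a hypothetical helper, it works on a plain list rather than a pandas Series, and it assumes that only string columns become `Literal` types.

```python
import math


def infer_annotation(values, max_literal_values=10):
    """Return a type-hint string for one column's values (hypothetical helper)."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    has_missing = len(present) < len(values)

    # Type names in first-seen order, without duplicates.
    type_names = list(dict.fromkeys(type(v).__name__ for v in present))
    # Assumption: only all-string columns are candidates for Literal.
    distinct = sorted(set(present)) if type_names == ["str"] else []

    if distinct and len(distinct) <= max_literal_values:
        hint = "Literal[" + ", ".join(repr(v) for v in distinct) + "]"
    else:
        hint = " | ".join(type_names) or "object"

    return hint + " | None" if has_missing else hint


print(infer_annotation(["A", "B", "C", "A"]))                # Literal['A', 'B', 'C']
print(infer_annotation([1, 2.5]))                            # int | float
print(infer_annotation(["x", None], max_literal_values=0))   # str | None
```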
Configuration
from df_types import DFTypes
from df_types.config import DFTypesConfig

# Basic options
config = DFTypesConfig(
    filename="my_types.py",
    class_name="MyRow",  # Default is "TypedRowData"
    max_literal_values=10,  # Increase to infer Literal types for columns with more categories
)
dft = DFTypes(df, config=config)
dft.write_types()

# Use the generated types
from my_types import convert, iter_dataclasses, MyRow
Considerations
If a type cannot be imported from the generated file, it is annotated as object and a warning is printed. This most often happens when the type is defined in the calling module (e.g., main.py imports df_types and also defines a CustomType that appears in the DataFrame). Moving the type definition to a separate module avoids the warning.
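The reason a type defined in the calling script cannot be imported is that its module is reported as `__main__`, which a separate file has no way to import from. The sketch below illustrates that check only; `import_line_for` is a hypothetical helper and is not claimed to be how df-types decides when to fall back to object.

```python
from decimal import Decimal


def import_line_for(tp):
    """Return an import statement for tp, or None if it lives in the entry-point script."""
    if tp.__module__ == "__main__":
        # A generated file cannot import from the script that is currently running.
        return None
    return f"from {tp.__module__} import {tp.__qualname__}"


# Simulate a class defined directly in main.py.
CustomType = type("CustomType", (), {"__module__": "__main__"})

print(import_line_for(Decimal))     # from decimal import Decimal
print(import_line_for(CustomType))  # None
```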
Types are inferred from a sample of rows. If a column has many rows and a skewed distribution of values, the inferred literals may miss values that occur rarely. Increase sample_head_rows, sample_middle_rows, or sample_tail_rows to sample more rows.
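The effect of sampling can be shown with a small stand-alone sketch. It assumes the sample is made of contiguous slices from the head, middle, and tail of the column; the library's actual strategy and default sizes may differ, and `sample_rows` with its `head`, `middle`, and `tail` parameters is hypothetical.

```python
def sample_rows(values, head=100, middle=100, tail=100):
    """Take contiguous slices from the start, centre, and end (assumed strategy)."""
    n = len(values)
    if head + middle + tail >= n:
        return list(values)
    mid_start = (n - middle) // 2
    # Slices may overlap for lopsided settings; duplicates do not matter
    # when only the set of distinct values is of interest.
    return values[:head] + values[mid_start:mid_start + middle] + values[n - tail:]


column = ["common"] * 10_000
column[2_500] = "rare"  # a single rare value outside the default slices

print(set(sample_rows(column)) == {"common"})                        # True
print(set(sample_rows(column, middle=6_000)) == {"common", "rare"})  # True
```

With the small default sample the rare value is never seen, so it would be left out of an inferred Literal; widening the middle slice picks it up.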
Future Features
- Support for typed containers (e.g., List[int], Dict[str, int])
- Support for nested dataclasses
- More advanced configuration options
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file df-types-0.0.2.tar.gz.
File metadata
- Download URL: df-types-0.0.2.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e12a5ed0f2a99a995067b98ea6dfeecafe4369f6812e3c492619f6f3009077b1 |
| MD5 | 5ca450b6e1c90acf76adae4a444899f7 |
| BLAKE2b-256 | 0bd0cb7490a7eb29539a3fce9574177e90ba41b706ceb04cc112a99853dfa300 |
File details
Details for the file df_types-0.0.2-py3-none-any.whl.
File metadata
- Download URL: df_types-0.0.2-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f972f64d1422e0085be6146e086c395f40cc25efa953e2a6b9b7fa1bee9f3e7 |
| MD5 | 8085299189a819b3c0868a411e610a4c |
| BLAKE2b-256 | bc1f8f8106e210548f71d43302ff5c3bd8676c8c444179a803c9339b4d359ea4 |