Rattata

Recursive And Type Transformation Automation for Type Annotations

Convert between Polars schemas and Python data structures (dataclasses, TypedDicts, namedtuples) with ease.

Rattata provides simple, bidirectional conversion functions that transform schemas between Polars and Python's type-annotated structures, covering primitive types as well as nested structs and lists.

✨ Key Features

  • Bidirectional Conversion: Convert schemas in both directions (Polars ↔ Python)
  • Multiple Schema Formats: Supports pl.Schema, dict[str, pl.DataType], and Iterable[tuple[str, pl.DataType]]
  • Native Polars Integration: Returns pl.Schema objects from from_* functions for seamless DataFrame integration
  • Comprehensive Type Support: Supports all primitive and complex Polars types
  • Nested Structures: Handles deeply nested structs and arrays recursively
  • Type Safety: Clear error messages with custom exceptions for unsupported types
  • Simple API: Functional, stateless functions - easy to use and understand
  • Multiple Output Formats: Support for dataclasses, TypedDicts, and NamedTuples
  • Python 3.8+ Compatible: Works with Python 3.8, 3.9, 3.10, 3.11, and 3.12

📦 Installation

pip install rattata

📋 Requirements

  • Python >= 3.8
  • polars >= 0.19.0

🚀 Quick Start

Converting Polars Schema to Dataclass

Rattata supports three Polars schema formats - use whichever is most convenient:

import polars as pl
from rattata import to_dataclass

# Format 1: Dictionary
polars_schema_dict = {
    "name": pl.String,
    "age": pl.Int32,
    "score": pl.Float64,
    "tags": pl.List(pl.String),
}

# Format 2: pl.Schema object
polars_schema_schema = pl.Schema({
    "name": pl.String,
    "age": pl.Int32,
    "score": pl.Float64,
    "tags": pl.List(pl.String),
})

# Format 3: List of tuples
polars_schema_list = [
    ("name", pl.String),
    ("age", pl.Int32),
    ("score", pl.Float64),
    ("tags", pl.List(pl.String)),
]

# All three formats work identically!
Person = to_dataclass(polars_schema_dict, class_name="Person")
# or: to_dataclass(polars_schema_schema, class_name="Person")
# or: to_dataclass(polars_schema_list, class_name="Person")

# Use the dataclass
person = Person(name="Alice", age=30, score=95.5, tags=["python", "data"])
print(person.name)
# Output: Alice

print(person)
# Output: Person(name='Alice', age=30, score=95.5, tags=['python', 'data'])
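For readers curious how a class can be generated from a list of (name, type) pairs at runtime, here is a minimal sketch using the standard library's `dataclasses.make_dataclass`. It is not rattata's implementation; the `field_spec` list is a hypothetical stand-in for the Python annotations that a converter might derive from the Polars types above.

```python
from dataclasses import fields, is_dataclass, make_dataclass
from typing import List, Optional

# Hypothetical field spec: plain Python annotations standing in for
# whatever a converter would derive from the Polars schema.
field_spec = [
    ("name", Optional[str]),
    ("age", Optional[int]),
    ("score", Optional[float]),
    ("tags", Optional[List[Optional[str]]]),
]

# make_dataclass builds the class at runtime from (name, type) pairs.
Person = make_dataclass("Person", field_spec)

person = Person(name="Alice", age=30, score=95.5, tags=["python", "data"])
print(person)
# Person(name='Alice', age=30, score=95.5, tags=['python', 'data'])
```

The generated class behaves like a hand-written dataclass, which is why the result of `to_dataclass` can be instantiated and printed as shown above.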

Converting Polars Schema to TypedDict

import polars as pl
from rattata import to_typeddict

# Define a Polars schema
polars_schema = {
    "name": pl.String,
    "age": pl.Int32,
    "scores": pl.List(pl.Float64),
}

# Convert to TypedDict
PersonDict = to_typeddict(polars_schema, dict_name="PersonDict")

# Use the TypedDict with type checking
person: PersonDict = {
    "name": "Bob",
    "age": 25,
    "scores": [88.5, 92.0, 85.5]
}

print(person["name"])
# Output: Bob

Converting Polars Schema to NamedTuple

import polars as pl
from rattata import to_namedtuple

# Define a Polars schema
polars_schema = {
    "name": pl.String,
    "age": pl.Int32,
    "active": pl.Boolean,
}

# Convert to NamedTuple
Person = to_namedtuple(polars_schema, tuple_name="Person")

# Use the NamedTuple
person = Person(name="Charlie", age=28, active=True)
print(person.name)  # Charlie
print(person[0])    # Charlie (also supports indexing)

Converting Dataclass to Polars Schema

from dataclasses import dataclass
from typing import List, Optional
from rattata import from_dataclass
import polars as pl

@dataclass
class Product:
    name: str
    price: float
    tags: List[str]
    description: Optional[str] = None

# Convert to Polars schema (returns pl.Schema)
polars_schema = from_dataclass(Product)
print(polars_schema)
# Output: Schema([('name', String), ('price', Float64), ('tags', List(String)), ('description', String)])

print(type(polars_schema))
# Output: <class 'polars.schema.Schema'>

# Access fields like a dictionary
print(polars_schema["name"])
# Output: String

# Use directly with Polars DataFrames
df = pl.DataFrame(
    {
        "name": ["Widget"],
        "price": [19.99],
        "tags": [["electronics", "gadgets"]],
        "description": ["A useful widget"]
    },
    schema=polars_schema
)
print(df)
# Output:
# shape: (1, 4)
# ┌────────┬───────┬────────────────────────────┬─────────────────┐
# │ name   ┆ price ┆ tags                       ┆ description     │
# │ ---    ┆ ---   ┆ ---                        ┆ ---             │
# │ str    ┆ f64   ┆ list[str]                  ┆ str             │
# ╞════════╪═══════╪════════════════════════════╪═════════════════╡
# │ Widget ┆ 19.99 ┆ ["electronics", "gadgets"] ┆ A useful widget │
# └────────┴───────┴────────────────────────────┴─────────────────┘

Converting TypedDict to Polars Schema

from typing import TypedDict, List
from rattata import from_typeddict
import polars as pl

class BookDict(TypedDict):
    title: str
    author: str
    pages: int
    genres: List[str]

# Convert to Polars schema (returns pl.Schema)
polars_schema = from_typeddict(BookDict)
print(polars_schema)
# Output: Schema([('title', String), ('author', String), ('pages', Int64), ('genres', List(String))])

print(type(polars_schema))
# Output: <class 'polars.schema.Schema'>

# Access fields like a dictionary
print(polars_schema["title"])
# Output: String

Converting NamedTuple to Polars Schema

from typing import NamedTuple
from rattata import from_namedtuple
import polars as pl

class Point(NamedTuple):
    x: float
    y: float
    z: float

# Convert to Polars schema (returns pl.Schema)
polars_schema = from_namedtuple(Point)
print(polars_schema)
# Output: Schema([('x', Float64), ('y', Float64), ('z', Float64)])

print(type(polars_schema))
# Output: <class 'polars.schema.Schema'>

# Access fields like a dictionary
print(polars_schema["x"])
# Output: Float64

💡 Use Cases

Rattata is perfect for:

  • Schema Definition: Define your data structure once as a Polars schema, then generate Python classes
  • Type-Safe Data Processing: Convert Polars schemas to dataclasses for type-safe data manipulation
  • API Development: Generate TypedDicts from Polars schemas for API request/response validation
  • Data Pipeline Integration: Seamlessly convert between Polars DataFrames and Python objects
  • Testing: Generate test fixtures from Polars schemas
  • Documentation: Automatically generate Python type definitions from Polars schemas

📚 Advanced Examples

Nested Structures

Rattata handles deeply nested structures automatically:

import polars as pl
from rattata import to_dataclass

# Define a nested Polars schema
polars_schema = {
    "user": pl.Struct([
        pl.Field("name", pl.String),
        pl.Field("address", pl.Struct([
            pl.Field("street", pl.String),
            pl.Field("city", pl.String),
            pl.Field("zip", pl.Int32),
        ])),
    ]),
}

User = to_dataclass(polars_schema, class_name="User")

# Access nested struct classes dynamically
UserStruct = User.UserStruct
AddressStruct = User.UserStructStruct

# Use nested structure
user = User(
    user=UserStruct(
        name="Alice",
        address=AddressStruct(
            street="123 Main St",
            city="Springfield",
            zip=12345
        )
    )
)

print(user.user.name)
# Output: Alice

print(user.user.address.city)
# Output: Springfield
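The recursion behind nested struct handling can be illustrated with the standard library alone. The sketch below is an assumption-laden illustration, not rattata's code: `build_dataclass` is a hypothetical helper, nested dicts stand in for `pl.Struct`, and the naming rule (capitalised field name plus "Struct") is invented for this example and may differ from the names rattata generates.

```python
from dataclasses import make_dataclass
from typing import Any, Dict, Optional


def build_dataclass(class_name: str, spec: Dict[str, Any]) -> type:
    """Recursively build a dataclass; nested dicts become nested dataclasses."""
    field_defs = []
    nested_classes = {}
    for field_name, field_type in spec.items():
        if isinstance(field_type, dict):
            # Hypothetical naming rule: capitalised field name + "Struct".
            nested_name = field_name.capitalize() + "Struct"
            nested_cls = build_dataclass(nested_name, field_type)
            nested_classes[nested_name] = nested_cls
            field_defs.append((field_name, Optional[nested_cls]))
        else:
            field_defs.append((field_name, Optional[field_type]))
    cls = make_dataclass(class_name, field_defs)
    # Expose each nested class as an attribute of its parent class.
    for nested_name, nested_cls in nested_classes.items():
        setattr(cls, nested_name, nested_cls)
    return cls


User = build_dataclass(
    "User",
    {"user": {"name": str, "address": {"street": str, "city": str, "zip": int}}},
)
UserStruct = User.UserStruct
AddressStruct = UserStruct.AddressStruct

user = User(
    user=UserStruct(
        name="Alice",
        address=AddressStruct(street="123 Main St", city="Springfield", zip=12345),
    )
)
print(user.user.address.city)  # Springfield
```

Attaching the nested classes to the parent keeps them reachable without a separate registry, at the cost of occupying attribute names on the generated class.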

Arrays with Nested Types

import polars as pl
from rattata import to_typeddict

# Nested arrays
polars_schema = {
    "matrix": pl.List(pl.List(pl.Float64)),
    "tags": pl.List(pl.String),
}

MatrixDict = to_typeddict(polars_schema, dict_name="MatrixDict")

# MatrixDict is a TypedDict with nested list types
print(type(MatrixDict))
# Output: <class 'typing._TypedDictMeta'>

print(MatrixDict.__annotations__)
# Output: {'matrix': typing.List[typing.List[typing.Union[float, NoneType]]], 'tags': typing.List[typing.Union[str, NoneType]]}
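An equivalent TypedDict can be written by hand with the functional `TypedDict` syntax from the standard library, which is one plausible way to build such a type at runtime. This is a sketch under that assumption; it does not claim to mirror how rattata constructs the type.

```python
from typing import List, Optional, TypedDict, get_type_hints

# Functional syntax: the field names and annotations are passed as a dict,
# so they can be computed at runtime.
MatrixDict = TypedDict(
    "MatrixDict",
    {
        "matrix": List[List[Optional[float]]],
        "tags": List[Optional[str]],
    },
)

hints = get_type_hints(MatrixDict)

# At runtime a TypedDict instance is an ordinary dict; the annotations
# only matter to static type checkers.
row: MatrixDict = {"matrix": [[1.0, None], [2.5, 3.0]], "tags": ["a", None]}
```

Because inner elements are annotated as `Optional`, a `None` inside a list type-checks, which matches the nullable element types shown in the annotations above.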

Round-Trip Conversion

Convert from Polars schema → Python class → Polars schema:

import polars as pl
from rattata import to_dataclass, from_dataclass

# Start with Polars schema
original = {
    "name": pl.String,
    "age": pl.Int32,
    "scores": pl.List(pl.Float64),
}

# Convert to dataclass and back
Person = to_dataclass(original, class_name="Person")
converted_back = from_dataclass(Person)  # Returns pl.Schema

# Verify types match (with some flexibility for Optional/nullability)
print(f"Original: {original}")
# Output: Original: {'name': String, 'age': Int32, 'scores': List(Float64)}

print(f"Converted back: {converted_back}")
# Output: Converted back: Schema([('name', String), ('age', Int64), ('scores', List(Float64))])

print(f"Type: {type(converted_back)}")
# Output: Type: <class 'polars.schema.Schema'>

assert converted_back["name"] == original["name"]
# Note: the round trip does not preserve integer width. Int32 becomes Python int,
# and Python int maps back to Int64 by default.

# converted_back is a pl.Schema, so you can use it directly with Polars
df = pl.DataFrame(
    {"name": ["Alice"], "age": [30], "scores": [[88.5, 92.0]]},
    schema=converted_back
)
print(df)
# Output:
# shape: (1, 3)
# ┌───────┬─────┬──────────────┐
# │ name  ┆ age ┆ scores       │
# │ ---   ┆ --- ┆ ---          │
# │ str   ┆ i64 ┆ list[f64]    │
# ╞═══════╪═════╪══════════════╡
# │ Alice ┆ 30  ┆ [88.5, 92.0] │
# └───────┴─────┴──────────────┘
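The Int32 to Int64 change in the round trip follows from the mapping being many-to-one in one direction. The sketch below models this with plain dictionaries; the dtype names are strings used as stand-ins so the example runs without Polars, and the two tables are illustrative subsets rather than rattata's actual mapping tables.

```python
# Several Polars dtypes collapse onto the same Python type.
POLARS_TO_PYTHON = {
    "Int8": int,
    "Int16": int,
    "Int32": int,
    "Int64": int,
    "Float32": float,
    "Float64": float,
    "String": str,
}

# The reverse direction can only pick one default per Python type
# (Int64, Float64, and String, as in the outputs shown in this README).
PYTHON_TO_POLARS = {int: "Int64", float: "Float64", str: "String"}


def round_trip(dtype_name: str) -> str:
    """Map a dtype name to its Python type and back again."""
    return PYTHON_TO_POLARS[POLARS_TO_PYTHON[dtype_name]]


# Dtypes that do not survive the round trip unchanged.
lossy = [name for name in POLARS_TO_PYTHON if round_trip(name) != name]
print(lossy)  # ['Int8', 'Int16', 'Int32', 'Float32']
```

If exact widths matter, keep the original Polars schema around rather than relying on the schema recovered from the generated class.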

Date and Time Types

import polars as pl
from datetime import date, datetime
from decimal import Decimal
from rattata import to_dataclass

polars_schema = {
    "event_date": pl.Date,
    "timestamp": pl.Datetime(time_unit="us"),
    "price": pl.Decimal(precision=10, scale=2),
}

Event = to_dataclass(polars_schema, class_name="Event")

event = Event(
    event_date=date(2024, 1, 15),
    timestamp=datetime(2024, 1, 15, 10, 30, 0),
    price=Decimal("99.99")
)

🔧 Error Handling

Rattata provides clear, actionable error messages through custom exceptions:

from rattata import ConversionError, UnsupportedTypeError, SchemaError, to_dataclass
import polars as pl

try:
    # Invalid: not a valid Python identifier
    schema = to_dataclass({"name": pl.String}, class_name="123invalid")
except SchemaError as e:
    print(f"Invalid name: {e}")
    # Output: Invalid name: class_name '123invalid' is not a valid Python identifier

try:
    # Invalid: Python keyword as class name
    schema = to_dataclass({"name": pl.String}, class_name="class")
except SchemaError as e:
    print(f"Invalid name: {e}")
    # Output: Invalid name: class_name 'class' is a Python keyword and cannot be used

📖 API Reference

to_dataclass(polars_schema, class_name="DataClass")

Convert a Polars schema to a dataclass.

Parameters:

  • polars_schema: Polars schema in any supported format:
    • pl.Schema: Polars Schema object
    • dict[str, pl.DataType]: Dictionary mapping field names to types
    • Iterable[tuple[str, pl.DataType]]: Iterable of (field_name, type) tuples (e.g., [("name", pl.String), ...])
  • class_name (str): Name for the generated dataclass (must be a valid Python identifier)

Returns:

  • type: A dataclass type

Raises:

  • SchemaError: If the schema structure is invalid or class_name is invalid
  • UnsupportedTypeError: If a type cannot be converted
  • ConversionError: If conversion fails

to_typeddict(polars_schema, dict_name="TypedDict")

Convert a Polars schema to a TypedDict.

Parameters:

  • polars_schema: Polars schema in any supported format (same as to_dataclass)
  • dict_name (str): Name for the generated TypedDict (must be a valid Python identifier)

Returns:

  • type: A TypedDict type

Raises:

  • SchemaError: If the schema structure is invalid or dict_name is invalid
  • UnsupportedTypeError: If a type cannot be converted
  • ConversionError: If conversion fails

to_namedtuple(polars_schema, tuple_name="NamedTuple")

Convert a Polars schema to a NamedTuple.

Parameters:

  • polars_schema: Polars schema in any supported format (same as to_dataclass)
  • tuple_name (str): Name for the generated NamedTuple (must be a valid Python identifier)

Returns:

  • type: A NamedTuple type (typing.NamedTuple preferred, collections.namedtuple as fallback)

Raises:

  • SchemaError: If the schema structure is invalid or tuple_name is invalid
  • UnsupportedTypeError: If a type cannot be converted
  • ConversionError: If conversion fails

from_dataclass(dataclass_cls)

Convert a dataclass to a Polars schema.

Parameters:

  • dataclass_cls (type): A dataclass type

Returns:

  • pl.Schema: Polars Schema object mapping field names to Polars types

Raises:

  • SchemaError: If the input is not a dataclass
  • ConversionError: If conversion fails
  • UnsupportedTypeError: If a type cannot be converted

from_typeddict(typeddict_cls)

Convert a TypedDict to a Polars schema.

Parameters:

  • typeddict_cls (type): A TypedDict type

Returns:

  • pl.Schema: Polars Schema object mapping field names to Polars types

Raises:

  • SchemaError: If the input is not a TypedDict
  • ConversionError: If conversion fails
  • UnsupportedTypeError: If a type cannot be converted

from_namedtuple(namedtuple_cls)

Convert a NamedTuple to a Polars schema.

Parameters:

  • namedtuple_cls (type): A NamedTuple type (typing.NamedTuple or collections.namedtuple)

Returns:

  • pl.Schema: Polars Schema object mapping field names to Polars types

Raises:

  • SchemaError: If the input is not a NamedTuple
  • ConversionError: If conversion fails
  • UnsupportedTypeError: If a type cannot be converted

🎯 Supported Types

Primitive Types

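| Polars        | Python   |
|---------------|----------|
| Int8          | int      |
| Int16         | int      |
| Int32         | int      |
| Int64         | int      |
| UInt8         | int      |
| UInt16        | int      |
| UInt32        | int      |
| UInt64        | int      |
| Float32       | float    |
| Float64       | float    |
| Boolean       | bool     |
| String / Utf8 | str      |
| Date          | date     |
| Datetime      | datetime |
| Decimal       | Decimal  |
| Binary        | bytes    |
| Null          | None     |
| Categorical   | str      |
| Enum          | str      |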

Complex Types

  • Arrays/Lists: Fully supported with nested arrays (List[List[T]], etc.)
  • Structs: Fully supported with nested structs (converts to nested dataclasses/TypedDicts)
  • Dicts: Python Dict[str, T] converts to Polars Struct with key and value fields

⚠️ Limitations

Type Conversions with Information Loss

Some type conversions result in information loss or semantic changes:

  • UInt64 → int: Python's int type can represent unsigned 64-bit integers, but the semantic meaning is lost
  • Decimal precision/scale: When converting from Python Decimal to Polars Decimal, defaults to precision=38, scale=10. Specific precision/scale from Polars schemas are preserved when converting to Python
  • Datetime time units: When converting from Python datetime to Polars Datetime, defaults to time_unit="ns". Specific time units from Polars schemas are preserved when converting to Python
  • Dict → Struct: Python Dict[str, T] is converted to Polars Struct with key (String) and value (T) fields
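Two of these losses can be seen with the standard library alone. The sketch below is illustrative only: `fits_uint64` is a hypothetical helper, and reading precision and scale off a single `Decimal` value is shown purely to highlight that the `Decimal` type annotation itself carries neither.

```python
from decimal import Decimal

UINT64_MAX = 2**64 - 1


def fits_uint64(n: int) -> bool:
    """Check the range constraint that a plain Python int no longer enforces."""
    return 0 <= n <= UINT64_MAX


# Python ints are arbitrary precision, so out-of-range values are accepted silently.
too_big = UINT64_MAX + 1
print(fits_uint64(UINT64_MAX), fits_uint64(too_big), fits_uint64(-1))
# True False False

# A Decimal value knows its own digits and exponent, but the annotation
# `Decimal` does not, so a schema derived from the type alone needs defaults.
price = Decimal("99.99")
sign, digits, exponent = price.as_tuple()
precision = len(digits)  # 4 significant digits
scale = -exponent        # 2 digits after the decimal point
```

When the range or the precision matters downstream, validate values explicitly or pass an explicit Polars schema instead of relying on the inferred one.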

Nullability

  • Polars → Python: All fields are created with Optional[...] to handle nullability, as Polars schemas don't explicitly track nullability at the schema definition level
  • Python → Polars: The Optional attribute from Python type annotations is handled, but all Polars fields can contain nulls by default
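Handling `Optional` on the Python side usually means stripping the `None` member from the annotation before looking up the inner type. A minimal sketch with `typing.get_origin` and `typing.get_args` follows; `unwrap_optional` is a hypothetical helper name, and the sketch only covers `typing.Optional[...]` and two-member `typing.Union[..., None]` annotations.

```python
from typing import List, Optional, Union, get_args, get_origin


def unwrap_optional(annotation):
    """Return (inner_type, is_optional) for a type annotation."""
    if get_origin(annotation) is Union:
        members = get_args(annotation)
        non_none = [m for m in members if m is not type(None)]
        # Optional[T] is Union[T, None]: exactly one non-None member.
        if len(members) == 2 and len(non_none) == 1:
            return non_none[0], True
    return annotation, False


inner, is_optional = unwrap_optional(Optional[str])
plain, plain_optional = unwrap_optional(List[str])
```

Since every Polars column can hold nulls, the `is_optional` flag has nowhere to go in the resulting schema, which is why `description: Optional[str]` and a plain `str` field both come out as `String`.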

NamedTuple Limitations

  • Nested structures in NamedTuples are converted to Dict[str, Any] due to NamedTuple's limitations with complex nested types
  • collections.namedtuple (without type annotations) loses type information during conversion
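The difference between the two namedtuple flavours is visible through `typing.get_type_hints`: only the `typing.NamedTuple` form records field types. The class names below are made up for this illustration.

```python
from collections import namedtuple
from typing import NamedTuple, get_type_hints


class TypedPoint(NamedTuple):
    x: float
    y: float


# The collections variant records field names only.
UntypedPoint = namedtuple("UntypedPoint", ["x", "y"])

typed_hints = get_type_hints(TypedPoint)      # {'x': float, 'y': float}
untyped_hints = get_type_hints(UntypedPoint)  # {}
```

Both classes expose the same `_fields` tuple, so field names survive either way; it is only the per-field type that a converter has nothing to work from in the untyped case.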

Input Validation

Rattata validates schemas before conversion:

  • Duplicate field names: Raises SchemaError if duplicate field names are detected
  • Empty field names: Raises SchemaError if any field name is an empty string
  • Invalid field types: Raises SchemaError if field types are None
  • Invalid field name types: Raises SchemaError if field names are not strings
  • Invalid class/dict/tuple names: Raises SchemaError if provided names are not valid Python identifiers or are Python keywords
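The checks listed above can be approximated with a few lines of standard-library Python. This is a sketch of the idea, not rattata's validation code: the local `SchemaError` class is a stand-in defined only so the example runs on its own, and `validate_class_name` and `validate_fields` are hypothetical helper names.

```python
import keyword


class SchemaError(Exception):
    """Local stand-in so the sketch is self-contained (not rattata's class)."""


def validate_class_name(name) -> None:
    """Reject names that cannot be used for a generated class."""
    if not isinstance(name, str) or not name.isidentifier():
        raise SchemaError(f"{name!r} is not a valid Python identifier")
    if keyword.iskeyword(name):
        raise SchemaError(f"{name!r} is a Python keyword and cannot be used")


def validate_fields(fields) -> None:
    """Validate an iterable of (field_name, field_type) pairs."""
    seen = set()
    for field_name, field_type in fields:
        if not isinstance(field_name, str):
            raise SchemaError(f"field name {field_name!r} is not a string")
        if field_name == "":
            raise SchemaError("field name is an empty string")
        if field_type is None:
            raise SchemaError(f"field {field_name!r} has type None")
        if field_name in seen:
            raise SchemaError(f"duplicate field name {field_name!r}")
        seen.add(field_name)


validate_class_name("Person")  # passes silently
validate_fields([("name", "String"), ("age", "Int32")])  # passes silently
```

Note that `"class".isidentifier()` is `True`, which is why the keyword check has to be a separate step.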

Schema Format Support

Rattata accepts Polars schemas in three formats:

  1. pl.Schema objects: Native Polars Schema objects

    schema = pl.Schema({"name": pl.String, "age": pl.Int32})
    
  2. dict[str, pl.DataType]: Dictionary mapping field names to types

    schema = {"name": pl.String, "age": pl.Int32}
    
  3. Iterable[tuple[str, pl.DataType]]: Iterable of (field_name, type) tuples

    schema = [("name", pl.String), ("age", pl.Int32)]
    schema = (("name", pl.String), ("age", pl.Int32))  # Also works
    

All three formats work with to_dataclass, to_typeddict, and to_namedtuple.
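Accepting several input shapes typically comes down to normalising them into one internal form first. The sketch below shows one way to do that; `normalize_schema` is a hypothetical helper, strings stand in for Polars dtypes so the example runs without Polars, and it assumes a `pl.Schema` can be treated as a mapping in the same way a dict can.

```python
from collections.abc import Mapping


def normalize_schema(schema):
    """Return a list of (field_name, dtype) pairs from a mapping or an iterable of pairs."""
    if isinstance(schema, Mapping):
        # Covers dicts and, by assumption, mapping-like schema objects.
        return list(schema.items())
    # Anything else is treated as an iterable of (name, dtype) pairs.
    return [(name, dtype) for name, dtype in schema]


as_dict = {"name": "String", "age": "Int32"}
as_list = [("name", "String"), ("age", "Int32")]
as_tuple = (("name", "String"), ("age", "Int32"))

pairs = normalize_schema(as_dict)
print(pairs)  # [('name', 'String'), ('age', 'Int32')]
```

Once every format is reduced to the same list of pairs, the rest of a conversion only has to handle that single shape.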

🛠️ Development

Setup

# Clone the repository
git clone https://github.com/eddiethedean/rattata.git
cd rattata

# Install in development mode with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=rattata --cov-report=html

# Run specific test file
pytest tests/test_converters.py

# Run with verbose output
pytest -v

Code Quality

# Format code
ruff format .

# Lint code
ruff check .

# Type check
mypy rattata/

Testing Across Python Versions

The project is tested across Python 3.8, 3.9, 3.10, 3.11, and 3.12. Use pyenv or tox to test locally:

# Example with pyenv
pyenv local 3.8 3.9 3.10 3.11 3.12
for version in 3.8 3.9 3.10 3.11 3.12; do
    pyenv local $version
    python -m pytest
done

Project Structure

rattata/
├── rattata/
│   ├── __init__.py           # Public API
│   ├── converters.py         # Core conversion functions
│   ├── type_mappings.py      # Type mapping dictionaries and utilities
│   └── errors.py             # Custom exceptions
├── tests/
│   ├── __init__.py
│   ├── conftest.py           # Shared fixtures and test utilities
│   ├── test_converters.py    # Conversion function tests
│   ├── test_type_mappings.py # Type mapping tests
│   ├── test_integration.py   # Integration tests with Polars DataFrames
│   └── test_edge_cases.py    # Edge case and error handling tests
├── pyproject.toml            # Package configuration
├── LICENSE                   # MIT License
└── README.md                 # This file

📄 License

MIT License - see LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🙏 Inspiration

This project is inspired by charmander, which provides similar functionality for converting between Polars schemas and PySpark schemas.

📧 Contact

Odos Matthews
