A fast dataframe implementation with Pydantic integration

FastDataFrame

FastDataFrame is a modern Python library that bridges Pydantic models and dataframe libraries, providing a standard interface for validating, transforming, and creating dataframes from Pydantic models. It currently supports Polars and Iceberg.

The goal is to make it easy to:

  • Validate a dataframe against a Pydantic model schema
  • Convert a dataframe to an iterable of Pydantic models
  • Create a dataframe from a list of Pydantic models, with correct types

Features

  • Schema Validation: Ensure your dataframe matches your Pydantic model's schema.
  • Model Conversion: Easily convert between dataframes and Pydantic models.
  • Type-Safe DataFrames: Automatically infer and enforce correct column types.
  • Multi-Backend: Works with Polars and Iceberg (with plans for Pandas, PyArrow, and more).

Polars Integration

1. Define Your Pydantic Model

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    age: int

2. Create a Polars DataFrame from Models

import polars as pl
from fastdataframe.polars.model import PolarsFastDataframeModel

# Create a Polars-compatible model
PolarsUser = PolarsFastDataframeModel.from_base_model(User)

# List of Pydantic models
users = [
    User(id=1, name="Alice", age=30),
    User(id=2, name="Bob", age=25),
]

# Create a DataFrame
df = pl.DataFrame([user.model_dump() for user in users])

# Cast DataFrame to match the model schema (enforces types)
df = PolarsUser.cast_to_model_schema(df)
print(df)

3. Validate a DataFrame Against the Model

# An empty (falsy) result means no schema mismatches were found
errors = PolarsUser.validate_schema(df)
if errors:
    print("Validation errors:", errors)
else:
    print("DataFrame is valid!")
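
As a rough mental model of what a schema check involves, the sketch below compares a model's declared fields with a column-to-type mapping. It is not FastDataFrame's implementation: `check_columns` is a hypothetical helper, and plain Python types stand in for real dataframe dtypes so the example only needs Pydantic.

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    age: int


def check_columns(model: type[BaseModel], column_types: dict[str, type]) -> list[str]:
    """Hypothetical helper: compare model fields with a column -> type mapping."""
    errors = []
    for name, field in model.model_fields.items():
        if name not in column_types:
            errors.append(f"missing column: {name}")
        elif column_types[name] is not field.annotation:
            errors.append(
                f"column {name}: expected {field.annotation.__name__}, "
                f"got {column_types[name].__name__}"
            )
    return errors


# Plain Python types stand in for a dataframe's column dtypes (an assumption).
good = {"id": int, "name": str, "age": int}
bad = {"id": int, "name": str, "age": str}

print(check_columns(User, good))
print(check_columns(User, bad))
```

Returning a list of messages rather than raising on the first mismatch mirrors the `if errors:` pattern used above, so every problem can be reported in one pass.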

4. Convert DataFrame Rows to Pydantic Models

# Each row dict is validated as its model is constructed
models = [PolarsUser(**row) for row in df.to_dicts()]
print(models)
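
When some rows may be invalid, a list comprehension stops at the first failure. The sketch below shows one way to collect the good rows and the failures separately. It uses a plain Pydantic `User` model and a list of dicts standing in for `df.to_dicts()`; `rows_to_models` is a hypothetical helper, not part of FastDataFrame.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    age: int


def rows_to_models(rows):
    """Hypothetical helper: split rows into models and (row index, error count) pairs."""
    models, failures = [], []
    for index, row in enumerate(rows):
        try:
            models.append(User.model_validate(row))
        except ValidationError as exc:
            failures.append((index, exc.error_count()))
    return models, failures


# These dicts stand in for the output of df.to_dicts().
rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": "not a number"},
]
models, failures = rows_to_models(rows)
print(models)
print(failures)
```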

Iceberg Integration

1. Define Your Pydantic Model

from pydantic import BaseModel

class Transaction(BaseModel):
    transaction_id: str
    amount: float
    timestamp: str

2. Generate Iceberg Schema from Model

from fastdataframe.iceberg.model import IcebergFastDataframeModel

IcebergTransaction = IcebergFastDataframeModel.from_base_model(Transaction)
iceberg_schema = IcebergTransaction.get_iceberg_schema()
print(iceberg_schema)
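
To give a feel for what deriving an Iceberg schema from a model involves, here is a minimal sketch that maps Python annotations to Iceberg primitive type names. The mapping table and the `sketch_iceberg_fields` helper are assumptions made for illustration. The actual return value of `get_iceberg_schema()` is not shown in this README and may differ.

```python
from pydantic import BaseModel


class Transaction(BaseModel):
    transaction_id: str
    amount: float
    timestamp: str


# Assumed mapping from Python types to Iceberg primitive type names.
PYTHON_TO_ICEBERG = {str: "string", float: "double", int: "long", bool: "boolean"}


def sketch_iceberg_fields(model: type[BaseModel]) -> dict:
    """Hypothetical helper: build a {"fields": [...]} dict from a model's fields."""
    return {
        "fields": [
            {"name": name, "type": PYTHON_TO_ICEBERG[field.annotation]}
            for name, field in model.model_fields.items()
        ]
    }


print(sketch_iceberg_fields(Transaction))
```

Only simple, non-optional annotations are handled here. Nested models, `Optional` fields, and date or timestamp types would need additional mapping rules.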

3. Validate Iceberg Table Schema

# Suppose you have an Iceberg table schema as a dict
table_schema = {
    "fields": [
        {"name": "transaction_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "timestamp", "type": "string"},
    ]
}

errors = IcebergTransaction.validate_schema(table_schema)
if errors:
    print("Schema validation errors:", errors)
else:
    print("Iceberg schema is valid!")

Why FastDataFrame?

  • Type Safety: Enforce your data contracts at the dataframe level.
  • Consistency: Use the same Pydantic models for both API and data processing.
  • Extensible: Designed to support multiple dataframe backends.

Installation

uv pip install fastdataframe
# or
pip install fastdataframe

Roadmap

  • Polars support
  • Iceberg support
  • Pandas support
  • PyArrow support
  • PyIceberg support

License

MIT

Provenance

The following attestation bundles were made for fastdataframe-0.0.5-py3-none-any.whl:

Publisher: ci.yml on davzucky/fastdataframe