
A fast dataframe implementation with Pydantic integration

FastDataFrame

FastDataFrame is a Python library that bridges Pydantic models and dataframe libraries. It provides a standard interface for validating, transforming, and creating dataframes from Pydantic models. It currently supports Polars and Iceberg.

The goal is to make it easy to:

  • Validate a dataframe against a Pydantic model schema
  • Convert a dataframe to an iterable of Pydantic models
  • Create a dataframe from a list of Pydantic models, with correct types

Features

  • Schema Validation: Ensure your dataframe matches your Pydantic model's schema.
  • Model Conversion: Easily convert between dataframes and Pydantic models.
  • Type-Safe DataFrames: Automatically infer and enforce correct column types.
  • Multi-Backend: Works with Polars and Iceberg (with plans for Pandas, PyArrow, and more).

Polars Integration

1. Define Your Pydantic Model

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    age: int

2. Create a Polars DataFrame from Models

import polars as pl
from fastdataframe.polars.model import PolarsFastDataframeModel

# Create a Polars-compatible model
PolarsUser = PolarsFastDataframeModel.from_base_model(User)

# List of Pydantic models
users = [
    User(id=1, name="Alice", age=30),
    User(id=2, name="Bob", age=25),
]

# Create a DataFrame
df = pl.DataFrame([user.model_dump() for user in users])

# Cast DataFrame to match the model schema (enforces types)
df = PolarsUser.cast_to_model_schema(df)
print(df)
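
To make "enforces types" concrete, here is a minimal, stdlib-only sketch of the same idea applied to a single row. It is not how FastDataFrame implements casting: `UserRecord` is a plain dataclass standing in for the Pydantic `User` model, and `coerce_row` is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, get_type_hints


@dataclass
class UserRecord:
    """Stand-in for the Pydantic User model (assumption: stdlib only)."""

    id: int
    name: str
    age: int


def coerce_row(row: dict[str, Any], model: type) -> dict[str, Any]:
    """Hypothetical helper: cast each value to the type annotated on the model."""
    hints = get_type_hints(model)
    # Only constructor-style types such as int, str and float are handled here.
    return {field: target(row[field]) for field, target in hints.items()}


# Values often arrive as strings, for example when read from a CSV file.
raw = {"id": "1", "name": "Alice", "age": "30"}
clean = coerce_row(raw, UserRecord)
print(clean)
```

A dataframe backend would apply the same cast column by column rather than row by row, but the contract is the same: the model's annotations decide the final types.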

3. Validate a DataFrame Against the Model

errors = PolarsUser.validate_schema(df)
if errors:
    print("Validation errors:", errors)
else:
    print("DataFrame is valid!")

4. Convert DataFrame Rows to Pydantic Models

models = [PolarsUser(**row) for row in df.to_dicts()]
print(models)
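
The list comprehension above builds every model at once. For larger frames, a generator keeps only one model in memory at a time. The sketch below is stdlib-only and uses assumed names: `UserRecord` is a dataclass standing in for the Pydantic model, and `iter_models` is a hypothetical helper, not part of FastDataFrame.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Type, TypeVar

T = TypeVar("T")


@dataclass
class UserRecord:
    """Stand-in for the Pydantic User model (assumption: stdlib only)."""

    id: int
    name: str
    age: int


def iter_models(rows: Iterable[dict[str, Any]], model: Type[T]) -> Iterator[T]:
    """Hypothetical helper: lazily build one model instance per row dict."""
    for row in rows:
        yield model(**row)


# Same shape as the output of df.to_dicts() in the example above.
rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

first = next(iter_models(rows, UserRecord))
print(first)
```

With Polars, `df.iter_rows(named=True)` yields row dictionaries one at a time and could be passed as `rows` in place of the list.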

Iceberg Integration

1. Define Your Pydantic Model

from pydantic import BaseModel

class Transaction(BaseModel):
    transaction_id: str
    amount: float
    timestamp: str

2. Generate Iceberg Schema from Model

from fastdataframe.iceberg.model import IcebergFastDataframeModel

IcebergTransaction = IcebergFastDataframeModel.from_base_model(Transaction)
iceberg_schema = IcebergTransaction.get_iceberg_schema()
print(iceberg_schema)

3. Validate Iceberg Table Schema

# Suppose you have an Iceberg table schema as a dict
table_schema = {
    "fields": [
        {"name": "transaction_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "timestamp", "type": "string"},
    ]
}

errors = IcebergTransaction.validate_schema(table_schema)
if errors:
    print("Schema validation errors:", errors)
else:
    print("Iceberg schema is valid!")
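
As a rough illustration of what a schema check like this compares, the stdlib-only sketch below diffs a model's annotations against a schema dict of the same shape. It is not FastDataFrame's implementation. `TransactionRecord`, `PY_TO_ICEBERG` and `diff_schema` are hypothetical names, the Python-to-Iceberg type mapping is an assumption, and the error format returned by the real `validate_schema` may differ.

```python
from dataclasses import dataclass
from typing import Any, get_type_hints


@dataclass
class TransactionRecord:
    """Stand-in for the Pydantic Transaction model (assumption: stdlib only)."""

    transaction_id: str
    amount: float
    timestamp: str


# Assumed mapping from Python types to Iceberg primitive type names.
PY_TO_ICEBERG = {str: "string", float: "double", int: "long", bool: "boolean"}


def diff_schema(model: type, table_schema: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list mismatches between a model and a schema dict."""
    actual = {field["name"]: field["type"] for field in table_schema["fields"]}
    errors = []
    for name, py_type in get_type_hints(model).items():
        expected = PY_TO_ICEBERG[py_type]
        if name not in actual:
            errors.append(f"missing column: {name}")
        elif actual[name] != expected:
            errors.append(f"{name}: expected {expected}, found {actual[name]}")
    return errors


good = {"fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "timestamp", "type": "string"},
]}
# 'amount' has the wrong type and 'timestamp' is absent.
bad = {"fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "amount", "type": "string"},
]}

print(diff_schema(TransactionRecord, good))
print(diff_schema(TransactionRecord, bad))
```

Returning a list of messages rather than raising on the first problem mirrors the `if errors:` pattern used above, so a caller can report every mismatch at once.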

Why FastDataFrame?

  • Type Safety: Enforce your data contracts at the dataframe level.
  • Consistency: Use the same Pydantic models for both API and data processing.
  • Extensible: Designed to support multiple dataframe backends.

Installation

uv pip install fastdataframe
# or
pip install fastdataframe

Roadmap

  • Polars support
  • Iceberg support
  • Pandas support
  • PyArrow support
  • PyIceberg support

License

MIT

Download files


Source Distributions

No source distribution files available for this release.

Built Distribution


fastdataframe-0.1.0-py3-none-any.whl (25.4 kB view details)

Uploaded Python 3

File details

Details for the file fastdataframe-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fastdataframe-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fastdataframe-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 403d50bd2e45728fad4eb328ca563232dc0ffaf226c0c046ef967f7762389d5c
MD5 35c58bc92fe2df52341055e6abe0b0bd
BLAKE2b-256 3ec4cefbe8aade862eb616bc79f4216e03057ca45dc29832768d4ab2d2ea5f02


Provenance

The following attestation bundles were made for fastdataframe-0.1.0-py3-none-any.whl:

Publisher: ci.yml on davzucky/fastdataframe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
