A fast dataframe implementation with Pydantic integration
FastDataFrame
FastDataFrame is a modern Python library that bridges the world of Pydantic models and dataframe libraries, providing a standard interface for validating, transforming, and creating dataframes from Pydantic models. It currently supports:
- Polars DataFrame and LazyFrame
- Apache Iceberg tables
The goal is to make it easy to:
- Validate a dataframe against a Pydantic model schema
- Convert a dataframe to an iterable of Pydantic models
- Create a dataframe from a list of Pydantic models, with correct types
Features
- Schema Validation: Ensure your dataframe matches your Pydantic model's schema.
- Model Conversion: Easily convert between dataframes and Pydantic models.
- Type-Safe DataFrames: Automatically infer and enforce correct column types.
- Multi-Backend: Works with Polars and Iceberg (with plans for Pandas, PyArrow, and more).
Polars Integration
1. Define Your Pydantic Model
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    age: int
2. Create a Polars DataFrame from Models
import polars as pl
from fastdataframe.polars.model import PolarsFastDataframeModel
# Create a Polars-compatible model
PolarsUser = PolarsFastDataframeModel.from_base_model(User)
# List of Pydantic models
users = [
    User(id=1, name="Alice", age=30),
    User(id=2, name="Bob", age=25),
]
# Create a DataFrame
df = pl.DataFrame([user.model_dump() for user in users])
# Cast DataFrame to match the model schema (enforces types)
df = PolarsUser.cast_to_model_schema(df)
print(df)
3. Validate a DataFrame Against the Model
errors = PolarsUser.validate_schema(df)
if errors:
    print("Validation errors:", errors)
else:
    print("DataFrame is valid!")
4. Convert DataFrame Rows to Pydantic Models
models = [PolarsUser(**row) for row in df.to_dicts()]
print(models)
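When rows may contain bad values, building models inside a list comprehension stops at the first failure. One possible pattern, shown here with plain Pydantic and a list of dicts standing in for `df.to_dicts()`, collects valid models and per-row errors separately. The helper name `rows_to_models` is illustrative only.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    age: int


def rows_to_models(rows: list[dict]) -> tuple[list[User], dict[int, str]]:
    """Validate each row; return the valid models and errors keyed by row index."""
    valid: list[User] = []
    errors: dict[int, str] = {}
    for index, row in enumerate(rows):
        try:
            valid.append(User.model_validate(row))
        except ValidationError as exc:
            errors[index] = str(exc)
    return valid, errors


rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": "not a number"},
]
models, errors = rows_to_models(rows)
print(models, list(errors))
```

Keying errors by row index makes it straightforward to trace a failure back to the offending row in the dataframe.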
Iceberg Integration
1. Define Your Pydantic Model
from pydantic import BaseModel

class Transaction(BaseModel):
    transaction_id: str
    amount: float
    timestamp: str
2. Generate Iceberg Schema from Model
from fastdataframe.iceberg.model import IcebergFastDataframeModel
IcebergTransaction = IcebergFastDataframeModel.from_base_model(Transaction)
iceberg_schema = IcebergTransaction.get_iceberg_schema()
print(iceberg_schema)
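The output format of `get_iceberg_schema` is not shown above. As a rough illustration of what such a translation involves, the sketch below maps Python annotations to Iceberg primitive type names and emits a plain dict. The mapping table and the `model_to_fields` helper are assumptions, not the library's implementation.

```python
from pydantic import BaseModel

# Assumed mapping from Python types to Iceberg primitive type names
PYTHON_TO_ICEBERG = {str: "string", float: "double", int: "long", bool: "boolean"}


class Transaction(BaseModel):
    transaction_id: str
    amount: float
    timestamp: str


def model_to_fields(model: type[BaseModel]) -> dict:
    """Describe a model's fields as {"fields": [{"name": ..., "type": ...}, ...]}."""
    fields = []
    for name, info in model.model_fields.items():
        if info.annotation not in PYTHON_TO_ICEBERG:
            raise TypeError(f"no Iceberg mapping for field {name!r}")
        fields.append({"name": name, "type": PYTHON_TO_ICEBERG[info.annotation]})
    return {"fields": fields}


schema_dict = model_to_fields(Transaction)
print(schema_dict)
```

Raising on unmapped annotations, rather than guessing a type, keeps the sketch honest about what it does not cover (optional fields, nested models, timestamps).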
3. Validate Iceberg Table Schema
# Suppose you have an Iceberg table schema as a dict
table_schema = {
    "fields": [
        {"name": "transaction_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "timestamp", "type": "string"},
    ]
}
errors = IcebergTransaction.validate_schema(table_schema)
if errors:
    print("Schema validation errors:", errors)
else:
    print("Iceberg schema is valid!")
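Conceptually, this kind of check is a field-by-field comparison between the types the model implies and the types the table declares. A dependency-free sketch follows, assuming both sides use the dict shape shown above; the `diff_fields` helper is hypothetical and says nothing about how `validate_schema` reports its results.

```python
def diff_fields(expected: dict, actual: dict) -> list[str]:
    """Return human-readable differences between two {"fields": [...]} schemas."""
    expected_types = {f["name"]: f["type"] for f in expected["fields"]}
    actual_types = {f["name"]: f["type"] for f in actual["fields"]}
    messages = []
    for name, type_name in expected_types.items():
        if name not in actual_types:
            messages.append(f"{name}: missing from table")
        elif actual_types[name] != type_name:
            messages.append(
                f"{name}: expected {type_name}, got {actual_types[name]}"
            )
    return messages


expected = {"fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "amount", "type": "double"},
]}
actual = {"fields": [
    {"name": "transaction_id", "type": "string"},
    {"name": "amount", "type": "float"},
]}
print(diff_fields(expected, actual))
```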
Why FastDataFrame?
- Type Safety: Enforce your data contracts at the dataframe level.
- Consistency: Use the same Pydantic models for both API and data processing.
- Extensible: Designed to support multiple dataframe backends.
Installation
uv pip install fastdataframe
# or
pip install fastdataframe
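To confirm the install from Python, the standard library's `importlib.metadata` can report the installed version. The helper below is a small convenience wrapper written for this example, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("fastdataframe"))
```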
Roadmap
- Polars support
- Iceberg support
- Pandas support
- PyArrow support
- PyIceberg support
License
MIT
File details
Details for the file fastdataframe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fastdataframe-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 403d50bd2e45728fad4eb328ca563232dc0ffaf226c0c046ef967f7762389d5c |
| MD5 | 35c58bc92fe2df52341055e6abe0b0bd |
| BLAKE2b-256 | 3ec4cefbe8aade862eb616bc79f4216e03057ca45dc29832768d4ab2d2ea5f02 |
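The published digests make it possible to verify a downloaded wheel before installing it. Below is a minimal sketch using the standard library's `hashlib`; the local file path is an assumption, and the helper name `sha256_of` is illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for fastdataframe-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "403d50bd2e45728fad4eb328ca563232dc0ffaf226c0c046ef967f7762389d5c"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed download location; nothing happens if the file is not there
wheel = Path("fastdataframe-0.1.0-py3-none-any.whl")
if wheel.exists():
    print("match" if sha256_of(wheel) == EXPECTED_SHA256 else "MISMATCH")
```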
Provenance
The following attestation bundles were made for fastdataframe-0.1.0-py3-none-any.whl:
Publisher: ci.yml on davzucky/fastdataframe
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fastdataframe-0.1.0-py3-none-any.whl
  - Subject digest: 403d50bd2e45728fad4eb328ca563232dc0ffaf226c0c046ef967f7762389d5c
- Sigstore transparency entry: 367114488
- Sigstore integration time:
- Permalink: davzucky/fastdataframe@28df8eb51fd438b78260202e7bf345a233a7631a
- Branch / Tag: refs/tags/V0.1.0
- Owner: https://github.com/davzucky
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@28df8eb51fd438b78260202e7bf345a233a7631a
- Trigger Event: release