A fast dataframe implementation with Pydantic integration

FastDataFrame

FastDataFrame bridges Pydantic models and dataframe/table backends. A FastDataFrame model owns backend-neutral column definitions, and backend modules expose stateless functions for Polars, PyArrow, and Apache Iceberg.

Supported backends:

Polars DataFrame and LazyFrame
PyArrow schemas
Apache Iceberg schemas/tables through PyIceberg

Core idea

Define the schema once:

from typing import Annotated

from pydantic import BaseModel, Field

from fastdataframe import ColumnInfo, FastDataFrameModel, Int32


class User(BaseModel):
    user_id: Annotated[
        int,
        Field(validation_alias="userId", serialization_alias="user_id"),
        ColumnInfo(dtype=Int32()),
    ]
    name: str
    score: float = 0.0
    nickname: str | None = None


FastUser = FastDataFrameModel.from_base_model(User)

Then generate backend-native schemas with stateless backend functions:

import fastdataframe.polars as fpl
import fastdataframe.pyarrow as farrow
import fastdataframe.iceberg as fice

polars_schema = fpl.schema(FastUser)
arrow_schema = farrow.schema(FastUser)
iceberg_schema = fice.schema(FastUser)

Column definitions

FastDataFrameModel owns immutable, backend-neutral column definitions:

FastUser.column_definitions
FastUser.column_map

ColumnInfo is optional user-authored metadata. Fields without ColumnInfo receive default metadata.

class Trade(FastDataFrameModel):
    trade_id: str
    quantity: Annotated[int, ColumnInfo(dtype=Int32(), is_unique=False)]

Name accessors

Resolved names are available as immutable accessors keyed by Python field name:

FastUser.serialization_names.user_id  # "user_id"
FastUser.validation_names.user_id     # "userId"
FastUser.storage_names.user_id        # "user_id"
FastUser.serialization_names["user_id"]

The storage name is the canonical dataframe/table column name and defaults to the Pydantic serialization name.

Dtype refinements

ColumnInfo(dtype=...) can refine backend schema generation while the Python annotation remains the semantic type.

Initial backend-neutral scalar dtypes include:

Boolean, String, Binary
Int8, Int16, Int32, Int64
Float32, Float64
Date, Time, Timestamp
Decimal

Unsigned integer dtypes are intentionally not included initially. Small signed integers (Int8, Int16) are widened when mapped to Iceberg, which has no 8-bit or 16-bit integer types.
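
As a rough sketch of what widening means: Iceberg defines a 32-bit int and a 64-bit long. The table and function below are hypothetical illustrations, and the choice of int as the target for Int8 and Int16 is an assumption based on Iceberg's type system rather than on the library's code.

```python
# Assumed mapping from backend-neutral integer dtype names to Iceberg
# primitive type names. Iceberg has 32-bit `int` and 64-bit `long` only.
ICEBERG_INTEGER_TYPE = {
    "Int8": "int",   # widened from 8 to 32 bits
    "Int16": "int",  # widened from 16 to 32 bits
    "Int32": "int",
    "Int64": "long",
}


def iceberg_integer_type(dtype_name: str) -> str:
    """Return the Iceberg integer type for a signed integer dtype name."""
    try:
        return ICEBERG_INTEGER_TYPE[dtype_name]
    except KeyError:
        # Unsigned names such as "UInt8" are rejected, not silently mapped.
        raise ValueError(f"unsupported integer dtype: {dtype_name}") from None
```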

Polars

import polars as pl
import fastdataframe.polars as fpl

raw = pl.DataFrame({"user_id": ["1"], "name": ["Alice"], "score": ["1.5"], "nickname": [None]})

cast_df = fpl.cast(FastUser, raw)
errors = fpl.validate_schema(FastUser, cast_df)

fpl.string_schema(FastUser) returns a schema in which every column is a string. This suits ingest flows that load raw text first and cast afterwards, as in the example above.

PyArrow

import fastdataframe.pyarrow as farrow

schema = farrow.schema(FastUser)
string_schema = farrow.string_schema(FastUser)

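PyArrow schemas derive nullability from Optional / None unions in the annotation. A Pydantic default alone does not imply nullable storage: in the User model above, score defaults to 0.0 but stays non-nullable, while nickname (str | None) is nullable.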

Iceberg

import fastdataframe.iceberg as fice

schema = fice.schema(FastUser)

Iceberg migration support is additive-only by default:

fice.apply_additive_migration(FastUser, table)

Destructive deletes are intentionally not automatic.
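
To make "additive-only" concrete, here is a minimal sketch of how such a plan could be computed from two name-to-type mappings. It does not describe what fice.apply_additive_migration does internally; plan_additive_migration and the string type names are hypothetical.

```python
def plan_additive_migration(
    model_columns: dict[str, str],
    table_columns: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Split a schema diff into columns to add and columns left untouched."""
    # Columns the model defines but the table lacks are safe to add.
    to_add = {
        name: dtype
        for name, dtype in model_columns.items()
        if name not in table_columns
    }
    # Columns only the table has are reported, never dropped.
    untouched = sorted(set(table_columns) - set(model_columns))
    return to_add, untouched


model = {"user_id": "int", "name": "string", "nickname": "string"}
table = {"user_id": "int", "name": "string", "legacy_flag": "boolean"}
to_add, untouched = plan_additive_migration(model, table)
```

In this example nickname would be added, and legacy_flag would stay in the table until someone removes it deliberately.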

For Polars-to-Iceberg persistence, data is written through the FastDataFrame-generated PyArrow schema boundary:

fice.append_polars(FastUser, table, cast_df)

This matters because a Polars schema records only column names and dtypes, whereas PyArrow and Iceberg fields also carry a nullable/required flag.
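
The kind of check a nullability-aware boundary implies can be sketched on plain Python data. This is an illustration under stated assumptions, not the behavior of fice.append_polars: null_violations is a hypothetical helper, and columns are modeled as a dict of lists rather than a real dataframe.

```python
def null_violations(columns: dict[str, list], required: set[str]) -> dict[str, int]:
    """Count None values in each required column; omit clean columns."""
    violations: dict[str, int] = {}
    for name in sorted(required):
        null_count = sum(value is None for value in columns.get(name, []))
        if null_count:
            violations[name] = null_count
    return violations


rows = {"user_id": [1, 2], "name": ["Alice", None], "nickname": [None, "Al"]}
# nickname is nullable, so only user_id and name are required here.
violations = null_violations(rows, required={"user_id", "name"})
```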

Column lifecycle

Deprecated fields remain model fields, remain in schemas, and must be nullable:

class UserV2(FastDataFrameModel):
    old_score: Annotated[float | None, ColumnInfo(deprecated=True)]
    score: float

Deprecated and removed column names can be reserved through model config to prevent unsafe reuse:

class UserV3(FastDataFrameModel):
    model_config = {
        "fastdataframe_deprecated_column_names": {"old_score"},
        "fastdataframe_removed_column_names": {"very_old_score"},
    }

    score: float

Removed names remain reserved even if a backend later physically deletes the column.
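
A reservation check of this kind could look like the sketch below. It is an assumption about the shape of the rule, not the library's validation code; check_reserved_names is a hypothetical helper operating on plain sets of names.

```python
def check_reserved_names(
    field_names: set[str],
    deprecated: set[str],
    removed: set[str],
) -> None:
    """Raise ValueError if a field reuses a deprecated or removed name."""
    clashes = sorted(field_names & (deprecated | removed))
    if clashes:
        raise ValueError(f"reserved column names reused: {clashes}")


reserved_deprecated = {"old_score"}
reserved_removed = {"very_old_score"}

# A model that only defines `score` passes the check.
check_reserved_names({"score"}, reserved_deprecated, reserved_removed)
```

Calling the helper with a field named very_old_score would raise, which is the guard against reusing a name whose historical data may have a different meaning.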

Installation

pip install fastdataframe
# or with optional backends
pip install 'fastdataframe[polars,pyarrow,iceberg]'

Development

uv sync --all-extras
uv run pytest tests/
uv run ruff check .
uv run ruff format .
uv run ty check

License

MIT

Release history

0.2.0 (this version): May 27, 2026
0.1.0: Aug 8, 2025
0.0.6: Aug 5, 2025
0.0.5: Jul 27, 2025
0.0.4: Jun 3, 2025
0.0.3: Jun 2, 2025
0.0.2: Jun 2, 2025
0.0.1: May 21, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

fastdataframe-0.2.0-py3-none-any.whl (37.1 kB)

Uploaded May 27, 2026 Python 3

File details

Details for the file fastdataframe-0.2.0-py3-none-any.whl.

File metadata

Download URL: fastdataframe-0.2.0-py3-none-any.whl
Upload date: May 27, 2026
Size: 37.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastdataframe-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`08d8ca03699110fbf53f3bd68fe84e992c66b34cc273ea076e985255ef4a3eb6`
MD5	`3a780c036b3c2c9169ed0dcdd5855a72`
BLAKE2b-256	`e24c56645a2f8e5311468f0ab785d3a707d3a036031bf092fc2634b531fd07ba`

Provenance

The following attestation bundles were made for fastdataframe-0.2.0-py3-none-any.whl:

Publisher: ci.yml on davzucky/fastdataframe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fastdataframe-0.2.0-py3-none-any.whl
- Subject digest: 08d8ca03699110fbf53f3bd68fe84e992c66b34cc273ea076e985255ef4a3eb6
- Sigstore transparency entry: 1643550156
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: davzucky/fastdataframe@4f481933a7e444810316760889c83ed2b135f7a6
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/davzucky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@4f481933a7e444810316760889c83ed2b135f7a6
- Trigger Event: release
