
PydanTable


Strongly typed DataFrames for Python, powered by Rust — Pydantic schemas, Polars-backed execution in the native extension, and an API built for services (including optional FastAPI integration).

Current release: 1.17.0 — highlights in the changelog.

Why PydanTable

  • One schema, many surfaces: define columns with Pydantic models; use DataFrameModel (SQLModel-style) or DataFrame[YourSchema].
  • Typed expressions: Expr and transform chains are validated and lowered in Rust; many errors fail fast at build/plan time.
  • Familiar operations: select, filter, join, group_by, windows, melt/pivot, and pandas-flavored helpers where they help.
  • Flexible materialization: row models via collect() / rows(), columnar dict[str, list], or Polars/PyArrow with the right extras.
  • I/O: lazy read_* / aread_*, streaming writes, NDJSON/JSON Lines, Parquet, CSV, IPC, HTTP, SQL (SQLModel-first fetch_sqlmodel / write_sqlmodel, explicit string SQL fetch_sql_raw / write_sql_raw, or deprecated unprefixed names), and MongoDB eager fetch_mongo / write_mongo (and async mirrors) with pydantable[mongo]. See the I/O overview, IO_SQL, MONGO_ENGINE, the SQLModel roadmap, and the decision tree.
  • JSON & struct columns: struct expressions, JSON encode/decode helpers, unnest/nested models — IO_JSON, SELECTORS.
  • FastAPI (optional): shared executor lifespan, NDJSON streaming from astream(), OpenAPI-friendly columnar bodies, register_exception_handlers (503 / 400 / 422). Start with the golden path and FastAPI guide.
  • Lazy SQL DataFrame (optional): install pydantable[sql] for SqlDataFrame / SqlDataFrameModel with the SQLAlchemy lazy-SQL ExecutionEngine (pydantable-protocol). The goal is to keep transforms on the SQL side (plans are compiled to SQL) instead of loading whole tables into Python, especially when you write results back to the same database. Guide: MOLTRES_SQL; for protocol authors: Custom engine packages.
  • Mongo engine (optional, 1.17.0+): pip install "pydantable[mongo]" brings in PyMongo, Beanie, and the Mongo plan stack for lazy frames. Define collections with Beanie Document models, then use MongoDataFrame.from_beanie / fetch_mongo(sync_pymongo_collection(...)) (see MONGO_ENGINE). A Pydantic Schema + from_collection remains supported if you use a raw Collection. Under the hood: MongoPydantableEngine (pydantable) and MongoRoot from the plan stack.
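The lazy-SQL idea above (compile the plan to SQL and run it inside the database, rather than pulling whole tables into Python) can be sketched in plain Python. This is only an illustration of the concept; the Plan class and its methods are invented here and are not pydantable's actual SqlDataFrame or engine API:

```python
import sqlite3

# Hypothetical sketch: a tiny "plan" object that records select/filter
# steps and compiles them to a single SQL statement, so the work runs in
# the database. NOT pydantable's real API; names are illustrative only.
class Plan:
    def __init__(self, table):
        self.table = table
        self.columns = ["*"]
        self.predicates = []

    def select(self, *cols):
        self.columns = list(cols)
        return self

    def filter(self, predicate_sql):
        self.predicates.append(predicate_sql)
        return self

    def to_sql(self):
        # Compile the recorded steps into one statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 20), (2, 5)])

plan = Plan("users").filter("age > 10").select("id", "age")
rows = conn.execute(plan.to_sql()).fetchall()
print(rows)  # [(1, 20)]
```

Only the final fetchall materializes anything in Python; the filter never sees the unmatched rows.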

Install

pip install pydantable

Common extras:

pip install "pydantable[polars]"   # to_polars
pip install "pydantable[arrow]"    # to_arrow / Arrow constructors
pip install "pydantable[io]"       # full file I/O convenience (arrow + polars)
pip install "pydantable[sql]"      # SQLModel + SQLAlchemy + moltres-core lazy SqlDataFrame; add a DB-API driver for your URL
pip install "pydantable[pandas]"   # pandas-flavored façade (pandas UI doc)
pip install "pydantable[fastapi]"  # FastAPI integration (pydantable.fastapi)
pip install "pydantable[mongo]"     # pymongo + Beanie + Mongo plan stack (lazy MongoDataFrame + I/O + from_beanie)

Quick start

from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

df = User({"id": [1, 2], "age": [20, None]})
result = (
    df.with_columns(age2=df.age * 2)
    .filter(df.age > 10)
    .select("id", "age2")
)

print(result.to_dict())
print([r.model_dump() for r in result.collect()])

Output (as verified by scripts/verify_doc_examples.py):

{'id': [1], 'age2': [40]}
[{'id': 1, 'age2': 40}]

Core concepts

  • DataFrameModel: table class with annotated columns (class Orders(DataFrameModel): ...).
  • DataFrame[Schema]: generic API over your own Pydantic BaseModel.
  • SqlDataFrame / SqlDataFrameModel: the same shapes with pydantable[sql]; the lazy-SQL bridge compiles plans to SQL so transforms can stay in the database (sql_config= / sql_engine=). Prefer these when you are not round-tripping full tables through Python (e.g. writing results back to the same DB).
  • MongoDataFrame / MongoDataFrameModel: the primary path with pydantable[mongo] is a Beanie Document plus from_beanie / sync_pymongo_collection for I/O; a Pydantic Schema with from_collection(sync_collection) also works without wiring Beanie. Lazy execution uses MongoPydantableEngine and MongoRoot. See MONGO_ENGINE.
  • Expr: typed expressions in with_columns, filter, etc.
  • Errors: ingest issues such as a column length mismatch raise ColumnLengthMismatchError (a ValueError subclass) from pydantable.errors; map it to HTTP 400 in FastAPI via register_exception_handlers.
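The error model in the last entry can be illustrated with a minimal stdlib sketch; the class shape and the check below are assumptions for illustration, not pydantable's exact definitions:

```python
# Illustrative sketch (not pydantable's real implementation): a ValueError
# subclass raised when ingested columns have mismatched lengths.
class ColumnLengthMismatchError(ValueError):
    def __init__(self, lengths):
        self.lengths = lengths
        super().__init__(f"column length mismatch: {lengths}")

def check_columns(data):
    # All columns in a table must have the same number of rows.
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ColumnLengthMismatchError(lengths)

check_columns({"id": [1, 2], "age": [20, None]})  # ok: both length 2

try:
    check_columns({"id": [1, 2], "age": [20]})
except ColumnLengthMismatchError as exc:
    print(exc)
```

Because the error subclasses ValueError, a plain `except ValueError` also catches it, which is what makes a generic 400 mapping in a web layer straightforward.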

Static typing

  • mypy: schema-evolving return types for many chains via the bundled mypy plugin (plugins in pyproject.toml).
  • Pyright / Pylance: use committed stubs under typings/; for explicit targets, as_model(...) / try_as_model(...) / assert_model(...) and typed escape hatches like agg_as_model(...) / rolling_agg_as_model(...). See TYPING.

Rich column types (Literal, ipaddress, WKB, Annotated, …) are covered in SUPPORTED_TYPES.

Materialization: collect() / rows() → row models; to_dict() → dict[str, list]; to_polars() / to_arrow() with matching extras.
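The two materialization shapes can be shown side by side, with plain dicts standing in for the row models (a generic sketch, not pydantable internals):

```python
# Row-oriented (one record per row, as collect()/rows() return) versus
# columnar dict[str, list] (as to_dict() returns). Plain dicts stand in
# for the validated row models here.
def to_rows(columns):
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]

def to_columns(rows):
    return {name: [row[name] for row in rows] for name in rows[0]}

cols = {"id": [1, 2], "age2": [40, 44]}
rows = to_rows(cols)
print(rows)                       # [{'id': 1, 'age2': 40}, {'id': 2, 'age2': 44}]
print(to_columns(rows) == cols)   # True: the conversion round-trips
```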

I/O at a glance

  • DataFrameModel / DataFrame[Schema]: lazy read_* / aread_*, export_*, write_*, and SQLModel helpers (fetch_sqlmodel, write_sqlmodel, …). For eager column loads, import materialize_*, fetch_sqlmodel, iter_sqlmodel, … from pydantable (same entrypoints as the internal pydantable.io package) and pass dict[str, list] into constructors.
  • SQL details: IO_SQL (recommended APIs, *_raw, deprecations) and SQLMODEL_SQL_ROADMAP (phased migration).
  • Large files & NDJSON patterns: IO_JSON, IO_NDJSON, EXECUTION.
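NDJSON / JSON Lines, referenced throughout, is simply one JSON object per line. A stdlib sketch of the wire format itself (not pydantable's read_* / write_* helpers, which target this format):

```python
import io
import json

def write_ndjson(fp, records):
    # One JSON document per line; no enclosing array.
    for record in records:
        fp.write(json.dumps(record) + "\n")

def read_ndjson(fp):
    # Streams line by line, so a large file never needs a full-file parse.
    for line in fp:
        if line.strip():
            yield json.loads(line)

buf = io.StringIO()
write_ndjson(buf, [{"id": 1, "age": 20}, {"id": 2, "age": None}])
buf.seek(0)
print(list(read_ndjson(buf)))  # [{'id': 1, 'age': 20}, {'id': 2, 'age': None}]
```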

Validation controls

  • Strict by default on constructors.
  • Optional ingest controls: trusted_mode, ignore_errors, on_validation_errors.
  • Missing optional fields: fill_missing_optional (default True).
  • Validation presets: validation_profile=... (or __pydantable__ = {"validation_profile": "..."}).
  • Per-column and nested strictness: STRICTNESS (field policies + profile defaults).
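As a rough sketch of what fill_missing_optional implies (the function name, signature, and behavior below are illustrative assumptions, not the real API): columns for absent optional fields are padded with None to the table length, while absent required columns still fail validation:

```python
# Hypothetical sketch of fill_missing_optional semantics; not pydantable's
# actual implementation or signature.
def fill_missing_optional(data, required, optional):
    n = max((len(values) for values in data.values()), default=0)
    missing_required = [col for col in required if col not in data]
    if missing_required:
        # Required columns can never be filled in.
        raise ValueError(f"missing required columns: {missing_required}")
    out = dict(data)
    for col in optional:
        # Absent optional columns are padded with None to the table length.
        out.setdefault(col, [None] * n)
    return out

print(fill_missing_optional({"id": [1, 2]}, required=["id"], optional=["age"]))
# {'id': [1, 2], 'age': [None, None]}
```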

Documentation

  • Docs home: pydantable.readthedocs.io
  • Map of all pages: DOCS_MAP
  • Quickstart: QUICKSTART
  • DataFrameModel: DATAFRAMEMODEL
  • Typing (mypy vs Pyright): TYPING
  • I/O overview: IO_OVERVIEW
  • SQL (SQLModel, raw string SQL): IO_SQL · SQLMODEL_SQL_ROADMAP
  • Lazy SQL DataFrame: MOLTRES_SQL
  • MongoDB (lazy MongoDataFrame + eager fetch_mongo): MONGO_ENGINE
  • Pandas-like API: PANDAS_UI
  • FastAPI path: GOLDEN_PATH_FASTAPI · FASTAPI · FASTAPI_ENHANCEMENTS
  • Service ergonomics (OpenAPI, aliases, redaction): SERVICE_ERGONOMICS
  • Custom dtypes: CUSTOM_DTYPES
  • Strictness: STRICTNESS
  • Cookbooks: Cookbook index (FastAPI, lazy pipelines, JSON logs, …)
  • Example multi-router app: docs/examples/fastapi/service_layout/ in this repo
  • Test helpers: pydantable.testing.fastapi (see FASTAPI)
  • Execution & async: EXECUTION · MATERIALIZATION
  • Behavioral contract: INTERFACE_CONTRACT
  • Troubleshooting: TROUBLESHOOTING
  • Versioning: VERSIONING
  • Changelog: CHANGELOG

Development

Use a virtual environment at .venv in the repo root (the Makefile defaults to .venv/bin/python). Full contributor setup, Maturin/Rust builds, and release notes: DEVELOPER.

make check-full      # ruff, ty, pyright, typing snippet tests, Sphinx, Rust

License

MIT
