Strongly-typed DataFrames for Python, powered by Rust.

PydanTable

Typed dataframe transformations for FastAPI and Pydantic services, backed by a Rust execution core (Polars inside the native extension).

Current release: 0.20.0 · Python 3.10+


At a glance

  • Schemas first: Pydantic field annotations define column types, nullability (T | None), and which expressions are legal. Many mistakes are caught when you build the Expr, not only when you run the query.
  • Two entry styles: DataFrameModel (SQLModel-like whole-table class with a generated row model) or DataFrame[YourSchema](data) with any Pydantic BaseModel schema.
  • Polars-shaped API: select, with_columns, filter, join, group_by, windows, reshape helpers — semantics are documented in the interface contract, not guaranteed identical to Polars on every edge case.
  • Optional extras: pydantable[polars] for to_polars(); pydantable[arrow] for read_parquet / read_ipc, to_arrow / ato_arrow, and pa.Table / RecordBatch constructors.
  • Optional façades: pydantable.pandas and pydantable.pyspark swap naming/imports; execution stays in the same in-process core (not a real Spark or pandas backend).
  • Service-ready: Sync and async materialization (collect, to_dict, acollect, ato_dict, …), FastAPI patterns, and trusted ingest modes for bulk JSON or Arrow.
  • REPL / discovery: repr(df) on DataFrame and DataFrameModel shows the parameterized class, schema type, and column dtypes (wide tables truncate with … and N more). columns, shape, empty, dtypes, info(), and describe() (numeric, bool, str) are on the core API (see the Interface contract for shape vs materialized rows). Expr and WhenChain have readable reprs for debugging pipelines. Row counts are omitted from repr; use collect() / to_dict() when you need data. In Jupyter / VS Code notebooks, _repr_html_() renders a bounded HTML table preview (no polars required); tune it via pydantable.display or PYDANTABLE_REPR_HTML_*. Details: Execution.
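For example, a minimal discovery pass using the names listed above (the printed output is illustrative, not verbatim):

from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

df = User({"id": [1, 2], "age": [20, None]})

print(repr(df))       # parameterized class, schema type, column dtypes
print(df.columns)     # schema column names
print(df.shape)       # see the Interface contract for shape vs materialized rows
print(df.dtypes)
df.info()             # per-column overview
print(df.describe())  # numeric / bool / str summaries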

Documentation

The canonical manual is on Read the Docs: https://pydantable.readthedocs.io/en/latest/

Topic → page on Read the Docs:

  • Home / overview → Documentation home
  • Five-minute tour → Quickstart
  • Changelog & versions → Changelog · Versioning (0.x)
  • DataFrameModel (inputs, transforms, collisions, materialization) → DataFrameModel
  • Column types (scalars, structs, list[T], maps, trusted ingest) → Supported data types
  • FastAPI (routers, bodies, async, multipart) → FastAPI integration
  • Execution (collect, to_dict, to_polars, to_arrow, async, repr) → Execution
  • Semantics (nulls, joins, windows, reshape) → Interface contract
  • Roadmap (shipped 0.20.0 UX / discovery, path to v1.0.0) → Roadmap
  • Why not Polars alone? → Why not just use Polars?
  • Pandas-style API (pydantable.pandas) → Pandas UI
  • PySpark-style API (pydantable.pyspark) → PySpark UI · Parity matrix
  • Polars parity → Scorecard · Workflows · Transformation roadmap
  • Contributors → Developer guide
  • Architecture plan → Plan document
  • Python API (autodoc) → API reference

Install

pip install pydantable

Optional dependencies (same package, feature extras):

pip install 'pydantable[polars]'   # to_polars()
pip install 'pydantable[arrow]'  # read_parquet/read_ipc, to_arrow, Table/RecordBatch constructors
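A minimal sketch of what each extra unlocks (assumes the extras above are installed; the Event schema here is illustrative):

from pydantable import DataFrameModel

class Event(DataFrameModel):
    name: str
    count: int

df = Event({"name": ["a", "b"], "count": [1, 2]})
pl_df = df.to_polars()  # requires pydantable[polars]
tbl = df.to_arrow()     # requires pydantable[arrow]
# read_parquet / read_ipc and the pa.Table / RecordBatch constructors also
# sit behind the arrow extra; see the Execution docs for their entry points.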

From a git checkout you need a Rust toolchain and a build of the extension (e.g. Maturin):

pip install .
# editable: maturin develop --manifest-path pydantable-core/Cargo.toml

Full setup, make check-full, and release notes: Developer guide.


Quick start

from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

df = User({"id": [1, 2], "age": [20, None]})
df2 = df.with_columns(age2=df.age * 2)
df3 = df2.select("id", "age2")
df4 = df3.filter(df3.age2 > 10)

# Columnar dict (good for JSON APIs)
print(df4.to_dict())
# {'age2': [40], 'id': [1]}

# List of Pydantic row models (default collect)
for row in df4.collect():
    print(row.id, row.age2)

Materialization: collect() → list of row models; to_dict() / collect(as_lists=True) → dict[str, list]; to_polars() / to_arrow() when the matching extra is installed. Async: acollect, ato_dict, ato_polars, ato_arrow offload blocking work from the event loop (Execution, FastAPI).
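A service-shaped sketch tying async materialization to FastAPI (illustrative wiring; the documented patterns are in the FastAPI integration guide):

from fastapi import FastAPI
from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

app = FastAPI()

@app.get("/users")
async def list_users() -> dict[str, list]:
    df = User({"id": [1, 2], "age": [20, None]})
    adults = df.filter(df.age > 18)
    # ato_dict keeps blocking materialization off the event loop
    return await adults.ato_dict()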

Alternate import styles (same engine):

from pydantable.pandas import DataFrameModel as PandasDataFrameModel
from pydantable.pyspark import DataFrameModel as PySparkDataFrameModel
from pydantable import DataFrameModel as DefaultDataFrameModel

More examples: FastAPI, Polars-style workflows.

Validation policy: Constructors validate strictly by default. For messy row lists, pass ignore_errors=True plus an on_validation_errors callback, which receives each failed row as (row_index, row, Pydantic errors). Trusted bulk paths use trusted_mode (off / shape_only / strict). Details: DataFrameModel, Supported types.
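A hedged sketch of lenient ingest; the callback arguments follow the (row_index, row, errors) shape described above, but check the DataFrameModel docs for the exact signature:

from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

def on_bad_row(row_index, row, errors):
    # errors carries the Pydantic validation errors for this row
    print(f"row {row_index} failed: {errors!r} ({row!r})")

df = User(
    [{"id": 1, "age": 20}, {"id": "not-an-int", "age": None}],
    ignore_errors=True,
    on_validation_errors=on_bad_row,
)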


Expression & API surface

Typed Expr builds a Rust AST. Highlights:

  • Globals in select: global_sum, global_mean, global_count, global_min, global_max, and global_row_count() for the total row count. PySpark façade: F.count() with no argument returns the row count.
  • Windows: row_number, rank, dense_rank, window_sum, window_mean, window_min, window_max, lag, lead with Window.partitionBy(...).orderBy(..., nulls_last=...); framed rowsBetween / rangeBetween where supported (window semantics).
  • Temporal & strings: strptime, unix_timestamp, cast to date/datetime, dt_* parts, strip / lower / upper, str_replace, strip_prefix / suffix / chars, list helpers (list_len, list_get, …).
  • Maps (string keys): map_len, map_get, map_contains_key, map_keys, map_values, map_entries, map_from_entries, element_at; binary_len for bytes columns.

PySpark-named wrappers: pydantable.pyspark.sql.functions mirrors much of the above (parity table).
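An illustrative window query through the PySpark façade. The functions module path is documented above; the Window import path and the .over(...) call shape are assumptions mirroring PySpark's layout, so verify both against the parity table:

from pydantable.pyspark import DataFrameModel
from pydantable.pyspark.sql import functions as F
from pydantable.pyspark.sql.window import Window  # assumed import path

class Sale(DataFrameModel):
    region: str
    amount: int

df = Sale({"region": ["a", "a", "b"], "amount": [10, 30, 20]})
# Rank rows within each region by amount, nulls last
w = Window.partitionBy("region").orderBy("amount", nulls_last=True)
ranked = df.with_columns(rn=F.row_number().over(w))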


Recent releases

0.20.0 — UX, discovery, docs, and display: Quickstart, Execution (materialization costs, import styles, copy-as / interchange); core columns, shape, info(), describe() (int/float/bool/str), value_counts, set_display_options / PYDANTABLE_REPR_HTML_*, _repr_mimebundle_, optional PYDANTABLE_VERBOSE_ERRORS; Expr / WhenChain repr; PySpark show() / summary(); multi-line DataFrame repr and _repr_html_. Details: Changelog · Interface contract (Introspection).

0.19.0 — Pre-1.0 documentation consolidation: Versioning (0.x), interface contract cross-links, parity/README/index refresh for the 0.19 → 1.0 path, PERFORMANCE benchmark spot-check note, release-hygiene alignment with CI; group_by tests sort output where row order is not guaranteed (stable pytest-xdist). No new Expr or PySpark façade methods.

0.18.0 — Clearer Polars error context for group_by().agg(); explicit deferral of non-string map keys (Supported types, Roadmap); parity/roadmap doc refresh (no new façade APIs); Hypothesis smoke for join / group_by.

0.17.0 — Tighter docs and tests for map_get / map_contains_key after PyArrow map<utf8, …> ingest; more pyspark.sql.functions thin wrappers (str_replace, regexp_replace, strip_*, strptime, binary_len, list_*). Non-string map keys (dict[int, T], etc.) remain future work (Roadmap Later).

0.16.x — Arrow interchange (read_parquet / read_ipc, to_arrow / ato_arrow, Table/RecordBatch constructors), FastAPI multipart and deployment docs, map-column arithmetic TypeError fix, DataFrame[Schema](pa.Table) constructor fix.

Older highlights: 0.15.0 async materialization and Arrow map ingest; 0.14.0 window null ordering and FastAPI TestClient coverage. Full history: Changelog.


Development

From a clone, with a .venv, pip install -e '.[dev]', and a built extension:

make check-full              # Ruff, mypy, Rust fmt / clippy / tests
PYTHONPATH=python pytest -q  # integration tests (see DEVELOPER.md)

Rust tests need the Makefile PYO3_PYTHON / PYTHONPATH wiring: make rust-test. Details: Developer guide.


License

MIT
