# PydanTable

Strongly-typed DataFrames for Python, powered by Rust. Typed dataframe transformations for FastAPI and Pydantic services, backed by a Rust execution core (Polars inside the native extension).
Current release: 0.23.0 · Python 3.10–3.13
## At a glance
- **Schemas first:** Pydantic field annotations define column types, nullability (`T | None`), and which expressions are legal. Many mistakes are caught when you build the `Expr`, not only when you run the query.
- **Two entry styles:** `DataFrameModel` (SQLModel-like whole-table class with a generated row model) or `DataFrame[YourSchema](data)` with any Pydantic `BaseModel` schema.
- **Polars-shaped API:** `select`, `with_columns`, `filter`, `join`, `group_by`, windows, reshape helpers — semantics are documented in the interface contract, not guaranteed identical to Polars on every edge case.
- **I/O:** Primary — `DataFrame` / `DataFrameModel` — lazy `read_*` / `write_*` (Parquet, CSV, NDJSON, IPC, JSON array-of-objects via `read_json`), eager `materialize_*` / `fetch_sql`, `export_*` / `write_sql`, SQL `from_sql`, Polars options via `scan_kwargs` / `write_kwargs`. HTTP Parquet: `read_parquet_url` leaves a temp file on disk; use `read_parquet_url_ctx` / `aread_parquet_url_ctx` (`pydantable.io` or `DataFrameModel`) to unlink it when the block exits. Secondary — `pydantable.io` — full mirror: lazy roots, column dicts, `fetch_*_url`, object store (`max_bytes` guards), extras. Top-level `pydantable` re-exports a small subset; `from pydantable.io import …` is the complete module API. If the native extension is missing, lazy scan/sink paths can raise `MissingRustExtensionError` (a subclass of `NotImplementedError`). Docs: I/O decision tree, I/O overview, HTTP & object stores, JSON files, Execution.
- **Optional extras:** `pydantable[polars]` for `to_polars()` and `export_*` (`dict[str, list]` → file); `pydantable[arrow]` for buffer/streaming Parquet/IPC, `to_arrow` / `ato_arrow`, and `pa.Table` / `RecordBatch` constructors; `pydantable[io]` bundles arrow + polars for full I/O; `[sql]` for `fetch_sql` / `write_sql` (SQLAlchemy; add psycopg, pymysql, etc. for your database URLs); `[cloud]`, `[excel]`, `[kafka]`, `[bq]`, `[snowflake]`, `[rap]` for other bridges in `docs/DATA_IO_SOURCES.md`.
- **Optional façades:** `pydantable.pandas` and `pydantable.pyspark` swap naming/imports; execution stays in the same in-process core (not a real Spark or pandas backend).
- **Service-ready:** Sync and async materialization (`collect`, `to_dict`, `acollect`, `ato_dict`, …), FastAPI patterns, and trusted ingest modes for bulk JSON or Arrow.
- **REPL / discovery:** `repr(df)` on `DataFrame` and `DataFrameModel` shows the parameterized class, schema type, and column dtypes (wide tables truncate with `… and N more`). `columns`, `shape`, `empty`, `dtypes`, `info()`, and `describe()` (numeric, bool, str) are on the core API (see Interface contract for `shape` vs materialized rows). `Expr` and `WhenChain` have readable `repr` for debugging pipelines. Row counts in `repr` are omitted — use `collect()` / `to_dict()` when you need data. Jupyter / VS Code notebooks: `_repr_html_()` renders a bounded HTML table preview (no `polars` required); tune via `pydantable.display` or `PYDANTABLE_REPR_HTML_*`. Details: Execution.
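The temp-file lifecycle that `read_parquet_url_ctx` is documented to provide (download to a temp file, yield its path, unlink on exit) can be sketched with the standard library alone. `fetch_to_tempfile` below is a hypothetical stand-in, not pydantable's implementation:

```python
# Illustrative sketch (stdlib only) of the context-manager temp-file
# lifecycle described above: write bytes to a temp file, yield the path,
# and unlink the file when the with-block exits.
import contextlib
import os
import tempfile

@contextlib.contextmanager
def fetch_to_tempfile(payload: bytes):
    """Hypothetical stand-in mirroring read_parquet_url_ctx cleanup."""
    fd, path = tempfile.mkstemp(suffix=".parquet")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)  # in the real API this would be an HTTP download
        yield path
    finally:
        os.unlink(path)  # guaranteed cleanup, even on exceptions

with fetch_to_tempfile(b"PAR1...") as p:
    assert os.path.exists(p)   # file is available inside the block
assert not os.path.exists(p)   # and removed once the block exits
```

Plain `read_parquet_url`, by contrast, leaves the temp file on disk for the caller to manage.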
## Upgrading

- From 0.22.x → 0.23.0: breaking I/O renames — eager file reads into `dict[str, list]` are `materialize_*` / `amaterialize_*` (not the old `read_*` / `aread_*` names); lazy local files use `read_*` / `aread_*` (`ScanFileRoot`); lazy plan output uses `DataFrame.write_*`; eager `dict[str, list]` → file uses `export_*` / `aexport_*`; `read_sql` / `aread_sql` → `fetch_sql` / `afetch_sql`; HTTP column readers `read_*_url` → `fetch_*_url`; lazy HTTP Parquet temp-file entry is `read_parquet_url` / `aread_parquet_url`. See Changelog and Execution.
- From 0.21.x → 0.22.0: no intended breaking changes (see changelog for 0.22.0 I/O additions).
## Documentation
The canonical manual is on Read the Docs: https://pydantable.readthedocs.io/en/latest/
| Topic | Read the Docs |
|---|---|
| Home / overview | Documentation home |
| Five-minute tour | Quickstart |
| Changelog & versions | Changelog · Versioning (0.x) |
| DataFrameModel (inputs, transforms, collisions, materialization) | DataFrameModel |
| Column types (scalars, structs, list[T], maps, trusted ingest) | Supported data types |
| FastAPI (routers, bodies, async, multipart) | FastAPI integration |
| Execution (collect, to_dict, to_polars, to_arrow, async, repr) | Execution |
| Data I/O (primary: DataFrame / DataFrameModel; secondary: pydantable.io) | I/O overview |
| Choosing an I/O API (lazy vs eager, io vs typed frame) | I/O decision tree |
| HTTP(S) & object stores (URLs, fsspec, temp Parquet lifecycle) | I/O HTTP |
| JSON (array of objects: lazy read_json, materialize/export) | I/O JSON |
| Data sources & transports (planning, FastAPI async stacks) | Data I/O sources |
| Streamlit (st.dataframe, interchange, editors) | Streamlit |
| Semantics (nulls, joins, windows, reshape) | Interface contract |
| Roadmap (shipped 0.23.x I/O, 0.20.x UX, path to v1.0.0) | Roadmap |
| Why not Polars alone? | Why not just use Polars? |
| Pandas-style API (pydantable.pandas) | Pandas UI |
| PySpark-style API (pydantable.pyspark) | PySpark UI · Parity matrix |
| Polars parity | Scorecard · Workflows · Transformation roadmap |
| Contributors | Developer guide |
| Architecture plan | Plan document |
| Python API (autodoc) | API reference |
## Install

```shell
pip install pydantable
```

Optional dependencies (same package, feature extras):

```shell
pip install 'pydantable[polars]'  # to_polars(); write_parquet/write_* from dict (IPC hop)
pip install 'pydantable[arrow]'   # materialize_parquet/ipc from bytes, to_arrow, Table/RecordBatch
pip install 'pydantable[io]'      # arrow + polars (recommended for mixed file I/O)
pip install 'pydantable[sql]'     # fetch_sql / write_sql (SQLAlchemy + your DB driver)
```

From a git checkout you need a Rust toolchain and a build of the extension (e.g. Maturin):

```shell
pip install .
# editable: maturin develop --manifest-path pydantable-core/Cargo.toml
```

Full setup, `make check-full`, and release notes: Developer guide.
## Quick start

```python
from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

df = User({"id": [1, 2], "age": [20, None]})
df2 = df.with_columns(age2=df.age * 2)
df3 = df2.select("id", "age2")
df4 = df3.filter(df3.age2 > 10)

# Columnar dict (good for JSON APIs)
print(df4.to_dict())
# {'age2': [40], 'id': [1]}

# List of Pydantic row models (default collect)
for row in df4.collect():
    print(row.id, row.age2)
```

Materialization: `collect()` → list of row models; `to_dict()` / `collect(as_lists=True)` → `dict[str, list]`; `to_polars()` / `to_arrow()` when the matching extra is installed. Async: `acollect`, `ato_dict`, `ato_polars`, `ato_arrow` offload blocking work from the event loop (Execution, FastAPI).
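The two materialization shapes can be related in plain Python — a columnar `dict[str, list]` (the `to_dict()` shape) versus a list of row records (what `collect()` returns as Pydantic models). The helper names below are illustrative, not pydantable API:

```python
# Plain-Python sketch of converting between the columnar shape and the
# row shape. Rows are shown as dicts; pydantable returns Pydantic models.
def columns_to_rows(cols: dict[str, list]) -> list[dict]:
    # zip column values positionally: one tuple per row
    names = list(cols)
    return [dict(zip(names, vals)) for vals in zip(*cols.values())]

def rows_to_columns(rows: list[dict]) -> dict[str, list]:
    # invert: gather each field across all rows
    return {k: [r[k] for r in rows] for k in rows[0]} if rows else {}

cols = {"id": [1], "age2": [40]}
rows = columns_to_rows(cols)
assert rows == [{"id": 1, "age2": 40}]
assert rows_to_columns(rows) == cols
```

The columnar shape serializes directly as a JSON body; the row shape is convenient for per-record logic and response models.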
Alternate import styles (same engine):

```python
from pydantable.pandas import DataFrameModel as PandasDataFrameModel
from pydantable.pyspark import DataFrameModel as PySparkDataFrameModel
from pydantable import DataFrameModel as DefaultDataFrameModel
```

More examples: FastAPI, Polars-style workflows.

Validation policy: Constructors validate strictly by default. For messy row lists, `ignore_errors=True` plus `on_validation_errors=callback` receives failed rows (`row_index`, `row`, Pydantic errors). Trusted bulk paths use `trusted_mode` (`off` / `shape_only` / `strict`). Details: DataFrameModel, Supported types.
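The `ignore_errors` + `on_validation_errors` pattern described above can be illustrated with a plain-Python stand-in (the `validate_age` check and `ingest` helper below are hypothetical, not pydantable's API):

```python
# Sketch of the validation-policy behavior: strict by default, or route
# failed rows to a callback as (row_index, row, errors) when ignoring.
def validate_age(row: dict) -> list[str]:
    # stand-in for Pydantic field validation
    return [] if isinstance(row.get("age"), int) else ["age must be int"]

def ingest(rows, ignore_errors=False, on_validation_errors=None):
    good = []
    for i, row in enumerate(rows):
        errs = validate_age(row)
        if errs:
            if not ignore_errors:
                raise ValueError(f"row {i}: {errs}")  # strict default
            if on_validation_errors:
                on_validation_errors(i, row, errs)    # failed-row callback
        else:
            good.append(row)
    return good

failed = []
kept = ingest(
    [{"age": 20}, {"age": "x"}],
    ignore_errors=True,
    on_validation_errors=lambda i, row, errs: failed.append((i, row, errs)),
)
assert kept == [{"age": 20}]
assert failed[0][0] == 1  # row index 1 failed
```

In the real API the errors argument carries structured Pydantic error details rather than plain strings.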
## Expression & API surface

Typed `Expr` builds a Rust AST. Highlights:

- Globals in `select`: `global_sum`, `global_mean`, `global_count`, `global_min`, `global_max`, `global_row_count()` (row count). PySpark façade: `F.count()` with no argument = row count.
- Windows: `row_number`, `rank`, `dense_rank`, `window_sum`, `window_mean`, `window_min`, `window_max`, `lag`, `lead` with `Window.partitionBy(...).orderBy(..., nulls_last=...)`; framed `rowsBetween` / `rangeBetween` where supported (window semantics).
- Temporal & strings: `strptime`, `unix_timestamp`, `cast` to `date` / `datetime`, `dt_*` parts, `strip` / `lower` / `upper`, `str_replace`, `strip_prefix` / `suffix` / `chars`, list helpers (`list_len`, `list_get`, …).
- Maps (string keys): `map_len`, `map_get`, `map_contains_key`, `map_keys`, `map_values`, `map_entries`, `map_from_entries`, `element_at`; `binary_len` for `bytes` columns.

PySpark-named wrappers: `pydantable.pyspark.sql.functions` mirrors much of the above (parity table).
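The `lag` / `lead` window helpers follow the usual offset semantics over an ordered partition (null where the offset falls outside the frame). A plain-Python sketch, illustrative only and not the `Expr` implementation:

```python
# Offset semantics of lag/lead over one ordered partition:
# positions shifted past either edge become None.
def lag(values: list, offset: int = 1) -> list:
    return [None] * offset + values[:-offset] if offset else list(values)

def lead(values: list, offset: int = 1) -> list:
    return values[offset:] + [None] * offset if offset else list(values)

vals = [10, 20, 30]
assert lag(vals) == [None, 10, 20]   # each row sees the previous value
assert lead(vals) == [20, 30, None]  # each row sees the next value
```

In pydantable these run per partition as defined by `Window.partitionBy(...).orderBy(...)`, not over the whole column.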
## Recent releases

0.23.0 — Lazy file roots and I/O vocabulary: out-of-core `read_*` / `aread_*` (`ScanFileRoot`) for Parquet, CSV, NDJSON, IPC, and JSON; `DataFrame` / `DataFrameModel` `write_*` for lazy pipeline sinks; eager reads named `materialize_*`; eager dict→file `export_*`; SQL `fetch_sql` / `write_sql` and `from_sql` on `DataFrameModel`; HTTP column readers `fetch_*_url`; lazy HTTP Parquet `read_parquet_url` with optional `read_parquet_url_ctx` / `aread_parquet_url_ctx` for temp-file cleanup; `fetch_bytes` / `read_from_object_store` support `max_bytes`. `MissingRustExtensionError` when the compiled extension is absent on scan/sink paths. Docs: I/O decision tree, I/O overview, I/O HTTP, I/O JSON, Changelog.

0.21.0 — Streamlit ergonomics: `DataFrame` / `DataFrameModel` implement the dataframe interchange protocol (`__dataframe__`) via PyArrow, so `st.dataframe(df)` can render typed frames directly when `pyarrow` is installed (`pip install 'pydantable[arrow]'`). See Execution (interchange) and Streamlit integration for editing fallbacks (`st.data_editor(df.to_arrow())` / `to_polars()`), costs, and limitations.

0.20.0 — UX, discovery, docs, and display: Quickstart, Execution (materialization costs, import styles, copy-as / interchange); core `columns`, `shape`, `info()`, `describe()` (int/float/bool/str), `value_counts`, `set_display_options` / `PYDANTABLE_REPR_HTML_*`, `_repr_mimebundle_`, optional `PYDANTABLE_VERBOSE_ERRORS`; `Expr` / `WhenChain` `repr`; PySpark `show()` / `summary()`; multi-line `DataFrame` `repr` and `_repr_html_`. Changelog, Interface contract Introspection.

0.19.0 — Pre-1.0 documentation consolidation: Versioning (0.x), interface contract cross-links, parity/README/index refresh for the 0.19 → 1.0 path, PERFORMANCE benchmark spot-check note, release-hygiene alignment with CI; `group_by` tests sort output where row order is not guaranteed (stable pytest-xdist). No new `Expr` or PySpark façade methods.

0.18.0 — Clearer Polars error context for `group_by().agg()`; explicit deferral of non-string map keys (Supported types, Roadmap); parity/roadmap doc refresh (no new façade APIs); Hypothesis smoke tests for `join` / `group_by`.

0.17.0 — Tighter docs and tests for `map_get` / `map_contains_key` after PyArrow `map<utf8, …>` ingest; more `pyspark.sql.functions` thin wrappers (`str_replace`, `regexp_replace`, `strip_*`, `strptime`, `binary_len`, `list_*`). Non-string map keys (`dict[int, T]`, etc.) remain future work (Roadmap Later).

0.16.x — Arrow interchange (eager Parquet/IPC readers, `to_arrow` / `ato_arrow`, `Table` / `RecordBatch` constructors), FastAPI multipart and deployment docs, map-column arithmetic `TypeError` fix, `DataFrame[Schema](pa.Table)` constructor fix.

Older highlights: 0.15.0 async materialization and Arrow map ingest; 0.14.0 window null ordering and FastAPI `TestClient` coverage. Full history: Changelog.
## Development

From a clone with `.venv` and `pip install -e ".[dev]"` plus a built extension:

```shell
make check-full                       # Ruff, mypy, Rust fmt / clippy / tests
PYTHONPATH=python pytest -q -n auto   # full suite; omit -n auto for single-process
```

Rust tests need the Makefile `PYO3_PYTHON` / `PYTHONPATH` wiring: `make rust-test`. Details: Developer guide.
## License
MIT