Strongly-typed DataFrames for Python, powered by Rust.


PydanTable


Typed dataframe transformations for FastAPI and Pydantic services, backed by a Rust execution core (Polars inside the native extension).

Current release: 0.23.0 · Python 3.10–3.13


At a glance

  • Schemas first: Pydantic field annotations define column types, nullability (T | None), and which expressions are legal. Many mistakes are caught when you build the Expr, not only when you run the query.
  • Two entry styles: DataFrameModel (an SQLModel-like whole-table class with a generated row model) or DataFrame[YourSchema](data) with any Pydantic BaseModel schema — a sketch of both follows this list.
  • Polars-shaped API: select, with_columns, filter, join, group_by, windows, reshape helpers — semantics are documented in the interface contract, not guaranteed identical to Polars on every edge case.
  • I/O: Primary — DataFrame / DataFrameModel: lazy read_* / write_* (Parquet, CSV, NDJSON, IPC, JSON array-of-objects via read_json), eager materialize_* / fetch_sql, export_* / write_sql, SQL from_sql, Polars options via scan_kwargs / write_kwargs. HTTP Parquet: read_parquet_url leaves a temp file on disk; use read_parquet_url_ctx / aread_parquet_url_ctx (pydantable.io or DataFrameModel) to unlink it when the block exits. Secondary — pydantable.io, the full mirror: lazy roots, column dicts, fetch_*_url, object store (max_bytes guards), extras. Top-level pydantable re-exports a small subset; from pydantable.io import … is the complete module API. If the native extension is missing, lazy scan/sink paths can raise MissingRustExtensionError (a subclass of NotImplementedError). Docs: I/O decision tree, I/O overview, HTTP & object stores, JSON files, Execution.
  • Optional extras: pydantable[polars] for to_polars() and export_* (dict[str, list] → file); pydantable[arrow] for buffer/streaming Parquet/IPC, to_arrow / ato_arrow, and pa.Table / RecordBatch constructors; pydantable[io] bundles arrow + polars for full I/O; [sql] for fetch_sql / write_sql (SQLAlchemy; add psycopg, pymysql, etc. for your database URLs); [cloud], [excel], [kafka], [bq], [snowflake], [rap] for other bridges in docs/DATA_IO_SOURCES.md.
  • Optional façades: pydantable.pandas and pydantable.pyspark swap only the naming and imports; execution stays in the same in-process core (not a real Spark or pandas backend).
  • Service-ready: Sync and async materialization (collect, to_dict, acollect, ato_dict, …), FastAPI patterns, and trusted ingest modes for bulk JSON or Arrow.
  • REPL / discovery: repr(df) on DataFrame and DataFrameModel shows the parameterized class, schema type, and column dtypes (wide tables truncate with … and N more). columns, shape, empty, dtypes, info(), and describe() (numeric, bool, str) are on the core API (see Interface contract for shape vs materialized rows). Expr and WhenChain have readable repr for debugging pipelines. Row counts in repr are omitted — use collect() / to_dict() when you need data. Jupyter / VS Code notebooks: _repr_html_() renders a bounded HTML table preview (no polars required); tune via pydantable.display or PYDANTABLE_REPR_HTML_*. Details: Execution.
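A minimal sketch of the two entry styles (assumes DataFrame is importable from the top-level package, as the docs' DataFrame[Schema] spelling suggests):

from pydantic import BaseModel
from pydantable import DataFrame, DataFrameModel

class UserModel(DataFrameModel):   # whole-table class with a generated row model
    id: int
    name: str | None

class UserSchema(BaseModel):       # any plain Pydantic schema works here
    id: int
    name: str | None

df1 = UserModel({"id": [1, 2], "name": ["a", None]})
df2 = DataFrame[UserSchema]({"id": [1, 2], "name": ["a", None]})

print(df1.columns, df1.shape)      # core introspection from the bullet above
df1.info()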

Upgrading

  • From 0.22.x → 0.23.0: breaking I/O renames — eager file reads into dict[str, list] are materialize_* / amaterialize_* (not the old read_* / aread_* names); lazy local files use read_* / aread_* (ScanFileRoot); lazy plan output uses DataFrame.write_*; eager dict[str, list] → file uses export_* / aexport_*; read_sql / aread_sql → fetch_sql / afetch_sql; HTTP column readers read_*_url → fetch_*_url; the lazy HTTP Parquet temp-file entry is read_parquet_url / aread_parquet_url. A condensed mapping follows this list. See Changelog and Execution.
  • From 0.21.x → 0.22.0: no intended breaking changes (see changelog for 0.22.0 I/O additions).
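A condensed cheat sheet of those renames (the * instantiations are illustrative; names per the changelog entry above, signatures omitted):

# 0.22.x                               0.23.0
# read_parquet / aread_parquet     →   materialize_parquet / amaterialize_parquet  (eager file → dict)
# read_sql / aread_sql             →   fetch_sql / afetch_sql
# read_csv_url (HTTP column read)  →   fetch_csv_url
# read_* / aread_*                 →   now the lazy local-file scans (ScanFileRoot)
# dict[str, list] → file           →   export_* / aexport_*; lazy plan sinks are DataFrame.write_*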

Documentation

The canonical manual is on Read the Docs: https://pydantable.readthedocs.io/en/latest/

Topic Read the Docs
Home / overview Documentation home
Five-minute tour Quickstart
Changelog & versions Changelog · Versioning (0.x)
DataFrameModel (inputs, transforms, collisions, materialization) DataFrameModel
Column types (scalars, structs, list[T], maps, trusted ingest) Supported data types
FastAPI (routers, bodies, async, multipart) FastAPI integration
Execution (collect, to_dict, to_polars, to_arrow, async, repr) Execution
Data I/O (primary: DataFrame / DataFrameModel; secondary: pydantable.io) I/O overview
Choosing an I/O API (lazy vs eager, io vs typed frame) I/O decision tree
HTTP(S) & object stores (URLs, fsspec, temp Parquet lifecycle) I/O HTTP
JSON (array of objects: lazy read_json, materialize/export) I/O JSON
Data sources & transports (planning, FastAPI async stacks) Data I/O sources
Streamlit (st.dataframe, interchange, editors) Streamlit
Semantics (nulls, joins, windows, reshape) Interface contract
Roadmap (shipped 0.23.x I/O, 0.20.x UX, path to v1.0.0) Roadmap
Why not Polars alone? Why not just use Polars?
Pandas-style API (pydantable.pandas) Pandas UI
PySpark-style API (pydantable.pyspark) PySpark UI · Parity matrix
Polars parity Scorecard · Workflows · Transformation roadmap
Contributors Developer guide
Architecture plan Plan document
Python API (autodoc) API reference

Install

pip install pydantable

Optional dependencies (same package, feature extras):

pip install 'pydantable[polars]'   # to_polars(); export_* (dict[str, list] → file, via an IPC hop)
pip install 'pydantable[arrow]'    # materialize_parquet/ipc from bytes, to_arrow, Table/RecordBatch
pip install 'pydantable[io]'       # arrow + polars (recommended for mixed file I/O)
pip install 'pydantable[sql]'      # fetch_sql / write_sql (SQLAlchemy + your DB driver)
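With the matching extras installed, the conversion methods become available on any typed frame; a small illustrative example:

from pydantable import DataFrameModel

class T(DataFrameModel):
    x: int

df = T({"x": [1, 2, 3]})
pl_df = df.to_polars()   # needs pydantable[polars]
tbl = df.to_arrow()      # needs pydantable[arrow]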

From a git checkout you need a Rust toolchain and a build of the extension (e.g. Maturin):

pip install .
# editable: maturin develop --manifest-path pydantable-core/Cargo.toml

Full setup, make check-full, and release notes: Developer guide.


Quick start

from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

df = User({"id": [1, 2], "age": [20, None]})
df2 = df.with_columns(age2=df.age * 2)
df3 = df2.select("id", "age2")
df4 = df3.filter(df3.age2 > 10)

# Columnar dict (good for JSON APIs)
print(df4.to_dict())
# {'age2': [40], 'id': [1]}

# List of Pydantic row models (default collect)
for row in df4.collect():
    print(row.id, row.age2)

Materialization: collect()list of row models; to_dict() / collect(as_lists=True)dict[str, list]; to_polars() / to_arrow() when the matching extra is installed. Async: acollect, ato_dict, ato_polars, ato_arrow offload blocking work from the event loop (Execution, FastAPI).
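A minimal async sketch reusing df4 from the quick start (acollect / ato_dict names from above):

import asyncio

async def main() -> None:
    rows = await df4.acollect()      # list of row models, off the event loop
    payload = await df4.ato_dict()   # dict[str, list], handy for a JSON response
    print(len(rows), payload)

asyncio.run(main())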

Alternate import styles (same engine):

from pydantable.pandas import DataFrameModel as PandasDataFrameModel
from pydantable.pyspark import DataFrameModel as PySparkDataFrameModel
from pydantable import DataFrameModel as DefaultDataFrameModel

More examples: FastAPI, Polars-style workflows.

Validation policy: constructors validate strictly by default. For messy row lists, pass ignore_errors=True together with an on_validation_errors callback, which receives each failed row as (row_index, row, Pydantic errors) — a sketch follows. Trusted bulk paths use trusted_mode (off / shape_only / strict). Details: DataFrameModel, Supported types.
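A hedged sketch of the callback hookup, reusing the User model from the quick start (the row-list constructor form and keyword placement are assumptions; the callback arguments are as documented above):

failed = []

def on_error(row_index, row, errors):   # (row_index, row, Pydantic errors)
    failed.append((row_index, row, errors))

df = User(
    [{"id": 1, "age": 20}, {"id": "oops", "age": None}],  # one bad row
    ignore_errors=True,
    on_validation_errors=on_error,
)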


Expression & API surface

Typed Expr builds a Rust AST. Highlights:

  • Globals in select: global_sum, global_mean, global_count, global_min, global_max, and global_row_count(). PySpark façade: F.count() with no argument returns the row count.
  • Windows: row_number, rank, dense_rank, window_sum, window_mean, window_min, window_max, lag, lead with Window.partitionBy(...).orderBy(..., nulls_last=...); framed rowsBetween / rangeBetween where supported (window semantics) — a sketch follows this list.
  • Temporal & strings: strptime, unix_timestamp, cast to date/datetime, dt_* parts, strip / lower / upper, str_replace, strip_prefix / suffix / chars, list helpers (list_len, list_get, …).
  • Maps (string keys): map_len, map_get, map_contains_key, map_keys, map_values, map_entries, map_from_entries, element_at; binary_len for bytes columns.
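A hedged window sketch in the PySpark-named style (the functions module path is from the façade note below; the Window import path and the .over(...) spelling are assumptions — check the parity table):

from pydantable import DataFrameModel
from pydantable.pyspark.sql import functions as F   # façade module named below
from pydantable.pyspark.sql.window import Window    # assumed path — see docs

class Sale(DataFrameModel):
    region: str
    amount: int

df = Sale({"region": ["a", "a", "b"], "amount": [10, 30, 20]})
w = Window.partitionBy("region").orderBy("amount", nulls_last=True)
out = df.with_columns(rn=F.row_number().over(w))
print(out.to_dict())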

PySpark-named wrappers: pydantable.pyspark.sql.functions mirrors much of the above (parity table).


Recent releases

0.23.0 — Lazy file roots and I/O vocabulary: out-of-core read_* / aread_* (ScanFileRoot) for Parquet, CSV, NDJSON, IPC, and JSON; DataFrame / DataFrameModel write_* for lazy pipeline sinks; eager reads named materialize_*; eager dict→file export_*; SQL fetch_sql / write_sql and from_sql on DataFrameModel; HTTP column readers fetch_*_url; lazy HTTP Parquet read_parquet_url with optional read_parquet_url_ctx / aread_parquet_url_ctx for temp-file cleanup; fetch_bytes / read_from_object_store support max_bytes. MissingRustExtensionError when the compiled extension is absent on scan/sink paths. Docs: I/O decision tree, I/O overview, I/O HTTP, I/O JSON, Changelog.
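A hedged sketch of the temp-file context manager on a DataFrameModel (that it is usable as a classmethod yielding a lazy typed frame is an assumption; the cleanup-on-exit behavior is as documented above):

from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    name: str | None

url = "https://example.com/users.parquet"   # illustrative URL
with User.read_parquet_url_ctx(url) as df:  # temp Parquet file exists inside the block
    data = df.to_dict()                     # assumed: df exposes the usual lazy API
# the temp file is unlinked here, when the block exits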

0.21.0 — Streamlit ergonomics: DataFrame / DataFrameModel implement the dataframe interchange protocol (__dataframe__) via PyArrow so st.dataframe(df) can render typed frames directly when pyarrow is installed (pip install 'pydantable[arrow]'). See Execution (interchange) and Streamlit integration for editing fallbacks (st.data_editor(df.to_arrow()) / to_polars()), costs, and limitations.
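A short Streamlit sketch of that path (st.dataframe and st.data_editor are standard Streamlit APIs; pyarrow must be installed for the interchange):

import streamlit as st
from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    name: str | None

df = User({"id": [1, 2], "name": ["a", None]})
st.dataframe(df)                        # rendered via the __dataframe__ interchange
edited = st.data_editor(df.to_arrow())  # editing fallback from the docs above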

0.20.0 — UX, discovery, docs, and display: Quickstart, Execution (materialization costs, import styles, copy-as / interchange); core columns, shape, info(), describe() (int/float/bool/str), value_counts, set_display_options / PYDANTABLE_REPR_HTML_*, _repr_mimebundle_, optional PYDANTABLE_VERBOSE_ERRORS; Expr / WhenChain repr; PySpark show() / summary(); multi-line DataFrame repr and _repr_html_. Changelog, Interface contract Introspection.

0.19.0 — Pre-1.0 documentation consolidation: Versioning (0.x), interface contract cross-links, parity/README/index refresh for the 0.19 → 1.0 path, PERFORMANCE benchmark spot-check note, release-hygiene alignment with CI; group_by tests sort output where row order is not guaranteed (stable pytest-xdist). No new Expr or PySpark façade methods.

0.18.0 — Clearer Polars error context for group_by().agg(); explicit deferral of non-string map keys (Supported types, Roadmap); parity/roadmap doc refresh (no new façade APIs); Hypothesis smoke for join / group_by.

0.17.0 — Tighter docs and tests for map_get / map_contains_key after PyArrow map<utf8, …> ingest; more pyspark.sql.functions thin wrappers (str_replace, regexp_replace, strip_*, strptime, binary_len, list_*). Non-string map keys (dict[int, T], etc.) remain future work (Roadmap Later).

0.16.x — Arrow interchange (eager Parquet/IPC readers, to_arrow / ato_arrow, Table/RecordBatch constructors), FastAPI multipart and deployment docs, map-column arithmetic TypeError fix, DataFrame[Schema](pa.Table) constructor fix.

Older highlights: 0.15.0 async materialization and Arrow map ingest; 0.14.0 window null ordering and FastAPI TestClient coverage. Full history: Changelog.


Development

From a clone, with a .venv, pip install -e ".[dev]", and a built extension:

make check-full                    # Ruff, mypy, Rust fmt / clippy / tests
PYTHONPATH=python pytest -q -n auto  # full suite; omit -n auto for single-process

Rust tests need the Makefile PYO3_PYTHON / PYTHONPATH wiring: make rust-test. Details: Developer guide.


License

MIT



Download files

Download the file for your platform.

Source Distribution

pydantable-0.23.0.tar.gz (179.6 kB) — Source

Built Distributions

pydantable-0.23.0-cp313-cp313-win_arm64.whl (20.2 MB) — CPython 3.13, Windows ARM64
pydantable-0.23.0-cp313-cp313-macosx_11_0_arm64.whl (20.2 MB) — CPython 3.13, macOS 11.0+ ARM64
pydantable-0.23.0-cp313-cp313-macosx_10_12_x86_64.whl (21.9 MB) — CPython 3.13, macOS 10.12+ x86-64
pydantable-0.23.0-cp312-cp312-win_amd64.whl (22.3 MB) — CPython 3.12, Windows x86-64
pydantable-0.23.0-cp312-cp312-musllinux_1_2_x86_64.whl (20.3 MB) — CPython 3.12, musllinux 1.2+ x86-64
pydantable-0.23.0-cp312-cp312-musllinux_1_2_aarch64.whl (18.6 MB) — CPython 3.12, musllinux 1.2+ ARM64
pydantable-0.23.0-cp312-cp312-macosx_11_0_arm64.whl (20.2 MB) — CPython 3.12, macOS 11.0+ ARM64
pydantable-0.23.0-cp312-cp312-macosx_10_12_x86_64.whl (21.9 MB) — CPython 3.12, macOS 10.12+ x86-64
pydantable-0.23.0-cp311-cp311-manylinux_2_28_aarch64.whl (21.1 MB) — CPython 3.11, manylinux glibc 2.28+ ARM64
pydantable-0.23.0-cp311-cp311-macosx_11_0_arm64.whl (20.2 MB) — CPython 3.11, macOS 11.0+ ARM64
pydantable-0.23.0-cp311-cp311-macosx_10_12_x86_64.whl (21.9 MB) — CPython 3.11, macOS 10.12+ x86-64
pydantable-0.23.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.2 MB) — CPython 3.8, manylinux glibc 2.17+ x86-64
