Bring polars data back to Python objects, safely. Validation, schema/query generation.

Pydantic, for Polars

Type-safe, maintainable interfaces between Polars and Python objects.

uv add pydantic-polars

pydantic_polars.validate

Go from Polars query -> Python objects

Learn the API by example:

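from pydantic_polars import validate as plv

# Assumed setup, not shown here: `lf` is a Polars LazyFrame, `c` is a shorthand
# for building column expressions, and `User` / `UserNamedTuple` are your own types.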

# Equivalent to `lf.collect().rows(named=True)`
users = plv.records.collect(lf)  # -> list[dict[str, Any]]

# I have a model now. Parse + validate a list of them.
users = plv.records[list[User]].collect(lf)  # -> list[User]

# Can there be an API around the list so I can model_dump?
users = plv.records[list[User]].collect_model(lf)  # -> pydantic.RootModel[list[User]]
users_json = users.model_dump_json()

# My query produces, at most, 1 user. But 0 rows may come back.
user = plv.get_record[User].collect(lf.filter(name='Mo').head(1))  # -> User | None

# My query produces *exactly* 1 user. It cannot produce 0 or 2.
user = plv.record[User].collect(lf.head(1))  # -> User

# Tuples instead of objects? Also...can we do async?
users = await plv.rows[list[UserNamedTuple]].collect_async(lf)  # -> list[UserNamedTuple]

# Need one huge {name: age} mapping. My query returns exactly 2 columns.
name_age_map = plv.map[dict[str, int]].collect(lf.select(c.name, c.age))

# Everyone's names, please
users_names = plv.column[list[str]].collect(lf.select(c.name))  # -> list[str]

# Age of oldest person?
oldest_age = plv.item[int | None].collect(lf.select(c.age.max()))  # -> int | None

# Can we parallelize those in Rust, on other threads?
users_names, oldest_age = await plv.collect_all_async(
    plv.column[list[str]].defer(lf.select(c.name)),
    plv.item[int | None].defer(lf.select(c.age.max())),
)  # -> (list[str], int | None)

# Only need his age, but 0 rows may come back. Safely get int or None.
age = plv.get_item[int].collect(
    lf.filter(c.name == 'jeff').select(c.age).head(1)
)  # -> int | None

1. Pick a Shape

A shape is a fixed, non-configurable representation of a dataframe as plain Python objects.

records produces a list of row dicts. It is equivalent to df.rows(named=True).

records[T] produces T by passing that list of row dicts as input to Pydantic validation.
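As a rough illustration of the validation step described above, the sketch below uses Pydantic's own TypeAdapter on hand-written row dicts. It does not call pydantic_polars, and it is not the library's implementation; the User model and the sample rows are assumptions made up for the example.

```python
from pydantic import BaseModel, TypeAdapter, ValidationError


class User(BaseModel):
    # Hypothetical model, invented for this example.
    name: str
    age: int


# Row dicts in the form that df.rows(named=True) returns (hand-written here).
row_dicts = [{"name": "Mo", "age": 41}, {"name": "jeff", "age": 29}]

# Validating against list[User] turns each dict into a User instance.
users = TypeAdapter(list[User]).validate_python(row_dicts)

# A row that does not fit the model is rejected instead of passing through.
try:
    TypeAdapter(list[User]).validate_python([{"name": "Mo", "age": "unknown"}])
    rejected = False
except ValidationError:
    rejected = True
```

The point of the T parameter is that the same row dicts can be validated into whatever type you name, so the type checker and the runtime agree on what came back from the query.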

plv.<shape>.collect(lf)     # Returns Default T for <shape>
plv.<shape>[T].collect(lf)  # Returns T
  • Scalar
    • item: One value.
  • Row-oriented
    • record: One row as a dict. records: List of many.
    • row: One row as a tuple. rows: List of many.
    • map: The rows of 2 columns, as one {col0: col1} dict.
    • keyed_records: Rows as one {col0: record} dict.
    • keyed_rows: Rows as one {col0: row} dict.
    • record_map: Rows of 2+ columns, as one {col0: {**rest_record}} dict.
    • row_map: Rows of 2+ columns, as one {col0: (*rest_row)} dict.
  • Column-oriented
    • column: One column as a list of values. columns: Tuple of many.
    • keys: One unique column as a list of values.
    • column_entry: One (name, column). column_entries: Tuple of many.
    • column_map: Many columns, as one {name: column} dict.
  • With table header
    • table_records: (names, records)
    • table_rows: (names, rows)
    • table_columns: (names, columns)
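To make the shapes listed above concrete, here is a plain-Python illustration that builds several of them from a tiny hand-written table. It uses no Polars and no pydantic_polars; it only shows my reading of what each shape looks like, and the sample data is invented.

```python
# A tiny table held as column names plus row tuples (a stand-in for a dataframe).
names = ("name", "age", "city")
rows = [("Mo", 41, "Oslo"), ("jeff", 29, "Lima")]

# Row-oriented shapes
records = [dict(zip(names, row)) for row in rows]
keyed_records = {row[0]: dict(zip(names, row)) for row in rows}
keyed_rows = {row[0]: row for row in rows}
record_map = {row[0]: dict(zip(names[1:], row[1:])) for row in rows}
row_map = {row[0]: row[1:] for row in rows}

# map needs exactly 2 columns, so project the table down to (name, age) first.
two_column_rows = [row[:2] for row in rows]
name_age_map = dict(two_column_rows)

# Column-oriented shapes
columns = tuple(list(col) for col in zip(*rows))
column_entries = tuple(zip(names, columns))
column_map = dict(zip(names, columns))

# With table header
table_rows = (names, rows)
```

Note the difference between the keyed and the map variants in this reading: keyed_records keeps the key column inside each record, while record_map leaves it out.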
| Shape | Default T | Returns | Input query must produce |
|---|---|---|---|
| item | Any | T | height == 1, width == 1 |
| column | list[item] | T | width == 1 |
| keys | list[item] | T | width == 1, col0 UNIQUE |
| row | tuple[item, ...] | T | height == 1 |
| record | dict[name, item] | T | height == 1 |
| column_entry | tuple[name, column] | T | width == 1 |
| records | list[record] | T | |
| rows | list[row] | T | |
| columns | tuple[column, ...] | T | |
| keyed_records | dict[item, record] | T | width >= 1, col0 UNIQUE |
| keyed_rows | dict[item, row] | T | width >= 1, col0 UNIQUE |
| map | dict[item, item] | T | width == 2, col0 UNIQUE |
| record_map | dict[item, partial_record] | T | width >= 2, col0 UNIQUE |
| row_map | dict[item, partial_row] | T | width >= 2, col0 UNIQUE |
| column_entries | tuple[column_entry, ...] | T | |
| column_map | dict[name, column] | T | |
| table_columns | tuple[names, columns] | T | |
| table_rows | tuple[names, rows] | T | |
| table_records | tuple[names, records] | T | |
| get_item | item | T or None | height <= 1, width == 1 |
| get_row | row | T or None | height <= 1 |
| get_record | record | T or None | height <= 1 |
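The input requirements for item, get_item, and keys can be read as simple checks on height, width, and uniqueness. The sketch below expresses them in plain Python over a list of row tuples. The function bodies are hypothetical, and raising ValueError on a violation is an assumption of this sketch; the text here does not say how the library reports one.

```python
from typing import Any, Optional

Row = tuple[Any, ...]


def item(rows: list[Row]) -> Any:
    """One value: the input must have exactly 1 row and 1 column."""
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("item needs height == 1 and width == 1")
    return rows[0][0]


def get_item(rows: list[Row]) -> Optional[Any]:
    """One value or None: at most 1 row, exactly 1 column."""
    if len(rows) > 1:
        raise ValueError("get_item needs height <= 1")
    if not rows:
        # With zero rows, this representation cannot check the width.
        return None
    if len(rows[0]) != 1:
        raise ValueError("get_item needs width == 1")
    return rows[0][0]


def keys(rows: list[Row]) -> list[Any]:
    """One column whose values must all be distinct."""
    if any(len(row) != 1 for row in rows):
        raise ValueError("keys needs width == 1")
    values = [row[0] for row in rows]
    if len(set(values)) != len(values):
        raise ValueError("keys needs col0 UNIQUE")
    return values
```

For example, item([(7,)]) returns 7, get_item([]) returns None, and keys([("a",), ("a",)]) raises because the column repeats a value.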

2. Call a method to create T

All shapes have the same methods.

# Single query
result = shape.collect(lf)
result = await shape.collect_async(lf)
result = shape.validate(df)  # DataFrame equivalent

# Parallel queries
result1, result2 = plv.collect_all(shape.defer(lf1), shape.defer(lf2))
result1, result2 = await plv.collect_all_async(shape.defer(lf1), shape.defer(lf2))
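The defer-then-collect pattern above can be pictured as pairing each query with the step that shapes its result, running all queries together, and shaping each result afterwards. The sketch below shows that general idea with the standard library only. Deferred and gather_deferred are hypothetical names invented for this illustration, and nothing here reflects how pydantic_polars or Polars actually schedules the work.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Deferred:
    """Hypothetical stand-in: a query to run later plus a function that shapes its result."""

    run_query: Callable[[], list[tuple]]
    shape: Callable[[list[tuple]], Any]


async def gather_deferred(*deferred: Deferred) -> tuple[Any, ...]:
    # Run every query on a worker thread concurrently; gather keeps input order.
    raw = await asyncio.gather(*(asyncio.to_thread(d.run_query) for d in deferred))
    # Shape each raw result with the function that was paired with its query.
    return tuple(d.shape(rows) for d, rows in zip(deferred, raw))


# Invented sample data standing in for a query source.
table = [("Mo", 41), ("jeff", 29)]

names_query = Deferred(
    run_query=lambda: [(row[0],) for row in table],
    shape=lambda rows: [row[0] for row in rows],  # column-like: list of values
)
max_age_query = Deferred(
    run_query=lambda: [(max(row[1] for row in table),)],
    shape=lambda rows: rows[0][0],  # item-like: one value
)

users_names, oldest_age = asyncio.run(gather_deferred(names_query, max_age_query))
```

Because results come back in the order the deferred queries were passed in, they can be unpacked positionally, which matches the result1, result2 pattern shown above.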
