Bring Polars data back to Python objects, safely. Validation, schema/query generation.
Pydantic, for Polars
Type-safe, maintainable interfaces between Polars and Python objects.
uv add pydantic-polars
pydantic_polars.validate
Go from Polars query -> Python objects
Learn the API by example:
from pydantic_polars import validate as plv
# Equivalent to `lf.collect().rows(named=True)`
users = plv.records.collect(lf) # -> list[dict[str, Any]]
# I have a model now. Parse + validate a list of them.
users = plv.records[list[User]].collect(lf) # -> list[User]
# Is there an API around the list so I can call model_dump?
users = plv.records[list[User]].collect_model(lf) # -> pydantic.RootModel[list[User]]
users_json = users.model_dump_json()
# My query produces, at most, 1 user. But 0 rows may come back.
user = plv.get_record[User].collect(lf.filter(name='Mo').head(1)) # -> User | None
# My query produces *exactly* 1 user. It cannot produce 0 or 2.
user = plv.record[User].collect(lf.head(1)) # -> User
# Tuples instead of objects? Also...can we do async?
users = await plv.rows[list[UserNamedTuple]].collect_async(lf) # -> list[UserNamedTuple]
# Need one huge {name: age} mapping. My query returns exactly 2 columns.
name_age_map = plv.map[dict[str, int]].collect(lf.select(c.name, c.age))
# Everyone's names, please
users_names = plv.column[list[str]].collect(lf.select(c.name)) # -> list[str]
# Age of oldest person?
oldest_age = plv.item[int | None].collect(lf.select(c.age.max())) # -> int | None
# Can we parallelize those in Rust, on other threads?
users_names, oldest_age = await plv.collect_all_async(
    plv.column[list[str]].defer(lf.select(c.name)),
    plv.item[int | None].defer(lf.select(c.age.max())),
) # -> (list[str], int | None)
# Only need his age, but 0 rows may come back. Safely get int or None.
age = plv.get_item[int].collect(
    lf.filter(c.name == 'jeff').select(c.age).head(1)
) # -> int | None
1. Pick a Shape
A shape is a fixed, non-configurable representation of a dataframe as plain Python objects.
`records` means: produce a list of row dicts. It translates to `df.rows(named=True)`.
`records[T]` means: produce `T` by passing that list of row dicts as input to Pydantic validation.
plv.<shape>.collect(lf) # Returns Default T for <shape>
plv.<shape>[T].collect(lf) # Returns T
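To make "passing a list of row dicts as input to Pydantic validation" concrete, here is a sketch using Pydantic alone. The row dicts are written by hand in the form `df.rows(named=True)` returns them, and the `User` fields are assumed. This shows the kind of validation described, not necessarily how the package implements it.

```python
from pydantic import BaseModel, RootModel, TypeAdapter


class User(BaseModel):
    # Assumed fields for illustration.
    name: str
    age: int


# Row dicts, shaped like the output of df.rows(named=True).
row_dicts = [{"name": "Mo", "age": 31}, {"name": "jeff", "age": 45}]

# T = list[User]: validate the whole list in one step.
users = TypeAdapter(list[User]).validate_python(row_dicts)

# RootModel[list[User]] wraps the same list and adds model_dump_json().
users_model = RootModel[list[User]].model_validate(row_dicts)
users_json = users_model.model_dump_json()
```

If a row is missing a field or holds a value of the wrong type, Pydantic raises `ValidationError` instead of returning a partially valid list.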
- Scalar
  - item: One value.
- Row-oriented
  - record: One row as a dict.
  - records: List of many.
  - row: One row as a tuple.
  - rows: List of many.
  - map: The rows of 2 columns, as one {col0: col1} dict.
  - keyed_records: Rows as one {col0: record} dict.
  - keyed_rows: Rows as one {col0: row} dict.
  - record_map: Rows of 2+ columns, as one {col0: {**rest_record}} dict.
  - row_map: Rows of 2+ columns, as one {col0: (*rest_row)} dict.
- Column-oriented
  - column: One column as a list of values.
  - columns: Tuple of many.
  - keys: One unique column as a list of values.
  - column_entry: One (name, column).
  - column_entries: Tuple of many.
  - column_map: Many columns, as one {name: column} dict.
- With table header
  - table_records: (names, records)
  - table_rows: (names, rows)
  - table_columns: (names, columns)
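The shapes listed above can be pictured with plain Python data. The sketch below builds several of them by hand from a small table. It reflects one reading of the descriptions, in particular that `keyed_*` keeps the key column inside each value while `*_map` drops it. The container type used for `names` is an assumption.

```python
# A small table: column names plus rows as tuples (illustrative data).
names = ("name", "age", "city")
rows = [("Mo", 31, "Oslo"), ("jeff", 45, "Lima")]

# Row-oriented shapes
records = [dict(zip(names, row)) for row in rows]
keyed_records = {row[0]: dict(zip(names, row)) for row in rows}
keyed_rows = {row[0]: row for row in rows}
record_map = {row[0]: dict(zip(names[1:], row[1:])) for row in rows}
row_map = {row[0]: row[1:] for row in rows}

# map needs exactly two columns, so select name and age first.
name_age_map = dict(row[:2] for row in rows)

# Column-oriented shapes
columns = tuple(list(col) for col in zip(*rows))
column_entries = tuple(zip(names, columns))
column_map = dict(zip(names, columns))

# With table header
table_records = (names, records)
```

The first column acts as the dictionary key in every keyed shape, which is why those shapes need it to be unique.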
| Shape | Default T | Returns | Input query must produce |
|---|---|---|---|
| item | Any | T | height == 1, width == 1 |
| column | list[item] | T | width == 1 |
| keys | list[item] | T | width == 1, col0 UNIQUE |
| row | tuple[item, ...] | T | height == 1 |
| record | dict[name, item] | T | height == 1 |
| column_entry | tuple[name, column] | T | width == 1 |
| records | list[record] | T | |
| rows | list[row] | T | |
| columns | tuple[column, ...] | T | |
| keyed_records | dict[item, record] | T | width >= 1, col0 UNIQUE |
| keyed_rows | dict[item, row] | T | width >= 1, col0 UNIQUE |
| map | dict[item, item] | T | width == 2, col0 UNIQUE |
| record_map | dict[item, partial_record] | T | width >= 2, col0 UNIQUE |
| row_map | dict[item, partial_row] | T | width >= 2, col0 UNIQUE |
| column_entries | tuple[column_entry, ...] | T | |
| column_map | dict[name, column] | T | |
| table_columns | tuple[names, columns] | T | |
| table_rows | tuple[names, rows] | T | |
| table_records | tuple[names, records] | T | |
| get_item | item | T or None | height <= 1, width == 1 |
| get_row | row | T or None | height <= 1 |
| get_record | record | T or None | height <= 1 |
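The "Input query must produce" column amounts to a set of preconditions on the result's height and width. The sketch below checks three of them in plain Python over rows given as tuples. The function names mirror the shapes, but the bodies are hypothetical, and raising `ValueError` is an assumption: the error the package actually raises is not stated here.

```python
def item(rows):
    """item: exactly one row and one column."""
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("item needs height == 1 and width == 1")
    return rows[0][0]


def get_item(rows):
    """get_item: at most one row with one column; None when no row came back."""
    if len(rows) > 1 or any(len(row) != 1 for row in rows):
        raise ValueError("get_item needs height <= 1 and width == 1")
    return rows[0][0] if rows else None


def keys(rows):
    """keys: one column whose values are all distinct."""
    if any(len(row) != 1 for row in rows):
        raise ValueError("keys needs width == 1")
    values = [row[0] for row in rows]
    if len(set(values)) != len(values):
        raise ValueError("keys needs col0 UNIQUE")
    return values
```

The `get_*` shapes differ from their strict counterparts only in accepting zero rows, which is why their return type widens to `T or None`.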
2. Call a method to create T
All shapes have the same methods.
# Single query
result = shape.collect(lf)
result = await shape.collect_async(lf)
result = shape.validate(df) # DataFrame equivalent
# Parallel queries
result1, result2 = plv.collect_all(shape.defer(lf1), shape.defer(lf2))
result1, result2 = await plv.collect_all_async(shape.defer(lf1), shape.defer(lf2))