Bring Polars data back to Python objects, safely. Validation, schema/query generation.
Pydantic, for Polars
Type-safe, maintainable interfaces between Polars and Python objects.
uv add pydantic-polars
pydantic_polars.validate
Go from Polars query -> Python objects
Provides an exhaustive set of distinct shape contracts.
Each is a structural guarantee, to which you can attach a type-form for parsing/validation.
from pydantic_polars import validate as plv
# Equivalent to `lf.collect().rows(named=True)`
users = plv.records.collect(lf) # -> list[dict[str, Any]]
# "I have a model now. Parse + validate a list of them."
users = plv.records[list[User]].collect(lf) # -> list[User]
# Can there be an API around that list, so I can model_dump?
users = plv.records[list[User]].collect_model(lf) # -> pydantic.RootModel[list[User]]
# My query produces, at most, 1 user. But 0 rows may come back.
user = plv.get_record[User | None].collect(lf.filter(name='Mo').head(1)) # -> User | None
# My query produces *exactly* 1 user. It cannot produce 0.
user = plv.record[User].collect(lf.head(1)) # -> User
# Tuples instead of objects? Also...can we do async?
users = await plv.rows[list[UserNamedTuple]].collect_async(lf) # -> list[UserNamedTuple]
# Need one huge {name: age} mapping. My query returns exactly 2 columns.
name_age_map = plv.map[dict[str, int]].collect(lf.select(c.name, c.age)) # -> dict[str, int]
# Everyone's names, please
users_names = plv.column[list[str]].collect(lf.select(c.name)) # -> list[str]
# Age of oldest person?
oldest_age = plv.item[int | None].collect(lf.select(c.age.max())) # -> int | None
# Can we parallelize those in Rust, on other threads?
users_names, oldest_age = await plv.collect_all_async(
plv.column[list[str]].defer(lf.select(c.name)),
plv.item[int | None].defer(lf.select(c.age.max())),
) # -> (list[str], int | None)
# Only need his age, but 0 rows may come back. Safely get int or None.
age = plv.get_item[int | None].collect(
lf.filter(c.name == 'jeff').select(c.age).head(1)
) # -> int | None
Each name after plv. is a shape.
plv.<shape>.collect(lf) # Returns primitive T for <shape>
plv.<shape>[T].collect(lf) # Returns T (yes, T can be any type form)
A shape is a fixed contract.
Part of the contract is that all values in the dataframe are returned (materializing data you don't need is a bug).
Example: map makes a dict from 2 columns, but only if column 0 is unique (that is, len(result) == input_df.height).
A shape has only one meaning. It can't be configured to change the structure.
Example: item doesn't just grab a value, it asserts the dataframe has exactly 1 value. If it may have 0, that's a different shape: get_item.
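To make the contract idea concrete, here is a minimal plain-Python sketch. It is not the library's implementation: a list of tuples stands in for a collected DataFrame, and the function names `item`, `get_item`, and `to_map` are illustrative only. Each function either returns every value it was given in the promised structure or raises.

```python
from __future__ import annotations

from typing import Any

# Stand-in for a collected DataFrame: one tuple per row (an assumption of this sketch).
Rows = list[tuple[Any, ...]]


def item(rows: Rows) -> Any:
    """Exactly one row and one column, or fail."""
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("item requires height == 1 and width == 1")
    return rows[0][0]


def get_item(rows: Rows) -> Any | None:
    """At most one row with exactly one column; zero rows yields None."""
    if len(rows) > 1 or any(len(r) != 1 for r in rows):
        raise ValueError("get_item requires height <= 1 and width == 1")
    return rows[0][0] if rows else None


def to_map(rows: Rows) -> dict[Any, Any]:
    """Two columns with a unique column 0, so no row is silently dropped."""
    if any(len(r) != 2 for r in rows):
        raise ValueError("map requires width == 2")
    result = dict(rows)
    if len(result) != len(rows):
        raise ValueError("map requires column 0 to be unique")
    return result
```

The uniqueness check in `to_map` is the same `len(result) == input_df.height` comparison described above: a duplicate key would overwrite an earlier row, which would break the "all values are returned" part of the contract.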
Step 1. Pick a Shape
- Scalar
item: One value.
- Row-oriented
record, records: Row(s) as dict(s).
row, rows: tuple(s) instead of dict(s).
(record|row)_entry, (record|row)_entries: Row(s) as '(col0, other_cols)' pair(s).
keyed_(record|row)_entries: As '(col0, all_cols)' pairs.
- Uniquely-keyed rows
map: Rows as one tall dict from 2 columns: {unique_col0: col1}.
(record|row)_map: Rows as one tall dict: {unique_col0: other_cols}.
keyed_(record|row)_map: As {unique_col0: all_cols}.
- Column-oriented
column: One column as a list (use keys if unique).
columns: Tuple of any columns.
column_map: One {name: column} dict of any columns.
- With table header
table_<shape>: (names, shape T). For example, table_records: (names, records).
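As a rough illustration of how the shapes in this list relate to each other, the sketch below derives several of them from the same column names and rows using only the standard library. It mirrors the structures described here, not the library's code path, and the sample data is the three-row id/name/age table used in the examples further down.

```python
# Column names and row tuples standing in for a small collected DataFrame.
names = ("id", "name", "age")
rows = [("A", "Joy", 59), ("B", "Ben", 25), ("C", "Jin", 40)]

# Row-oriented: each row as a dict, or as a (col0, other_cols) pair.
records = [dict(zip(names, row)) for row in rows]
row_entries = [(row[0], row[1:]) for row in rows]

# Uniquely-keyed rows: col0 becomes the dict key.
record_map = {row[0]: dict(zip(names[1:], row[1:])) for row in rows}
keyed_row_map = {row[0]: row for row in rows}

# Column-oriented: transpose the rows.
columns = tuple(list(col) for col in zip(*rows))
column_map = dict(zip(names, columns))

# With table header: pair the names with any other shape.
table_rows = (names, rows)
```

The difference between the plain and `keyed_` variants is only whether col0 is repeated inside the value: `record_map` drops it from each record, while `keyed_row_map` keeps the full row.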
Step 2 (optional). Set a custom T for Pydantic to validate into
Examples:
plv.column[list[float]]
plv.column[tuple[float, ...]]
plv.column[list[float] | list[Decimal] | list[int]]
plv.column[MyCustomArrayType[float | None]]
[!TIP] Skipping this step (e.g. plv.column.collect(lf)) means skipping Pydantic validation. For column, this means you get lf.collect().to_series().to_list() directly.
Step 3. Call a method to create T
All shapes have the same methods. These ones return T:
# Single query
result = shape.collect(lf)
result = await shape.collect_async(lf)
result = shape.validate(df) # DataFrame equivalent to collect
# Parallel queries
result1, result2 = plv.collect_all(shape.defer(lf1), shape.defer(lf2))
result1, result2 = await plv.collect_all_async(shape.defer(lf1), shape.defer(lf2))
*_model variants of each method also exist, to return T wrapped in pydantic.RootModel[T].
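The defer / collect_all pairing can be pictured with a small standard-library sketch. Everything here is hypothetical: `Deferred`, `run_query`, and this `collect_all` are stand-ins written for illustration, Python threads replace the Rust-side parallelism mentioned earlier, and lists of tuples replace DataFrames. The point is only the pattern: bundle each query with its shape, run the queries together, then shape each result in order.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable

Rows = list[tuple[Any, ...]]  # stand-in for a collected DataFrame


@dataclass(frozen=True)
class Deferred:
    """Hypothetical pairing of a not-yet-run query with the shape to apply."""

    run_query: Callable[[], Rows]
    shape: Callable[[Rows], Any]


def collect_all(*deferred: Deferred) -> tuple[Any, ...]:
    """Run every query concurrently, then shape each result in argument order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(d.run_query) for d in deferred]
        collected = [f.result() for f in futures]
    return tuple(d.shape(rows) for d, rows in zip(deferred, collected))


def column(rows: Rows) -> list[Any]:
    """One column as a list."""
    return [row[0] for row in rows]


def item(rows: Rows) -> Any:
    """Exactly one value; tuple unpacking raises ValueError otherwise."""
    (row,) = rows
    (value,) = row
    return value


users = [("Joy", 59), ("Ben", 25), ("Jin", 40)]
names, oldest_age = collect_all(
    Deferred(lambda: [(name,) for name, _ in users], column),
    Deferred(lambda: [(max(age for _, age in users),)], item),
)
```

Because results come back as a tuple in the same order as the arguments, each position can keep its own type, which matches the `(list[str], int | None)` result shown for the async example near the top.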
Shapes
| Shape | Default T | Input df must have |
|---|---|---|
| item | Any | height == 1, width == 1 |
| get_item | Any or None | height <= 1, width == 1 |
| map | dict[item, item] | width == 2, col0 UNIQUE |
| table_map | (names, map) | width == 2, col0 UNIQUE |
| column | list[item] | width == 1 |
| keys | list[item] | width == 1, col0 UNIQUE |
| columns | (column, ...) | |
| column_map | dict[name, column] | |
| table_columns | (names, columns) | |
| record | dict[name, item] | height == 1 |
| get_record | dict[name, item] or None | height <= 1 |
| record_entry | (item, rest_record) | height == 1, width >= 2 |
| get_record_entry | (item, rest_record) or None | height <= 1, width >= 2 |
| records | list[record] | |
| record_map | dict[item, rest_record] | width >= 2, col0 UNIQUE |
| keyed_record_map | dict[item, record] | width >= 1, col0 UNIQUE |
| record_entries | list[record_entry] | width >= 2 |
| keyed_record_entries | list[(item, record)] | width >= 1 |
| table_records | (names, records) | |
| table_record_map | (names, record_map) | width >= 2, col0 UNIQUE |
| table_keyed_record_map | (names, keyed_record_map) | width >= 1, col0 UNIQUE |
| table_record_entries | (names, record_entries) | width >= 2 |
| table_keyed_record_entries | (names, keyed_record_entries) | width >= 1 |
| row | (item, ...) | height == 1 |
| get_row | (item, ...) or None | height <= 1 |
| row_entry | (item, rest_row) | height == 1, width >= 2 |
| get_row_entry | (item, rest_row) or None | height <= 1, width >= 2 |
| rows | list[row] | |
| row_map | dict[item, rest_row] | width >= 2, col0 UNIQUE |
| keyed_row_map | dict[item, row] | width >= 1, col0 UNIQUE |
| row_entries | list[row_entry] | width >= 2 |
| keyed_row_entries | list[(item, row)] | width >= 1 |
| table_rows | (names, rows) | |
| table_row_map | (names, row_map) | width >= 2, col0 UNIQUE |
| table_keyed_row_map | (names, keyed_row_map) | width >= 1, col0 UNIQUE |
| table_row_entries | (names, row_entries) | width >= 2 |
| table_keyed_row_entries | (names, keyed_row_entries) | width >= 1 |
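One way to read the "Input df must have" column is as a predicate over the frame's height, width, and whether column 0 is unique. The sketch below encodes a handful of rows from the table that way. The `Frame` tuple, the `CONSTRAINTS` dict, and `allowed_shapes` are names invented for this illustration and are not part of the library.

```python
from typing import Callable, NamedTuple


class Frame(NamedTuple):
    """The three facts about a DataFrame that the table's constraints refer to."""

    height: int
    width: int
    col0_unique: bool


# A subset of the table, transcribed as predicates. An empty cell means no constraint.
CONSTRAINTS: dict[str, Callable[[Frame], bool]] = {
    "item": lambda f: f.height == 1 and f.width == 1,
    "get_item": lambda f: f.height <= 1 and f.width == 1,
    "map": lambda f: f.width == 2 and f.col0_unique,
    "column": lambda f: f.width == 1,
    "keys": lambda f: f.width == 1 and f.col0_unique,
    "record": lambda f: f.height == 1,
    "record_map": lambda f: f.width >= 2 and f.col0_unique,
    "records": lambda f: True,
}


def allowed_shapes(frame: Frame) -> list[str]:
    """Names of the shapes (from the subset above) whose constraint the frame meets."""
    return [name for name, ok in CONSTRAINTS.items() if ok(frame)]


# A 3-row, 2-column frame with a unique first column.
print(allowed_shapes(Frame(height=3, width=2, col0_unique=True)))
# ['map', 'record_map', 'records']
```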
item, get_item

    'Joy'

map

    {
        'A': 'Joy',
        'B': 'Ben',
        'C': 'Jin',
    }

table_map

    (
        ('id', 'name'),
        {
            'A': 'Joy',
            'B': 'Ben',
            'C': 'Jin',
        },
    )

column, keys

    [
        'Joy',
        'Ben',
        'Jin',
    ]

columns

    (
        ['A', 'B', 'C'],
        ['Joy', 'Ben', 'Jin'],
        [59, 25, 40],
    )

column_map

    {
        id: ['A', 'B', 'C'],
        name: ['Joy', 'Ben', 'Jin'],
        age: [59, 25, 40],
    }

table_columns

    (
        ('id', 'name', 'age'),
        (
            ['A', 'B', 'C'],
            ['Joy', 'Ben', 'Jin'],
            [59, 25, 40],
        ),
    )

record, get_record

    { id: 'A', name: 'Joy', age: 59 }

record_entry, get_record_entry

    ('A', { name: 'Joy', age: 59 })

records

    [
        { id: 'A', name: 'Joy', age: 59 },
        { id: 'B', name: 'Ben', age: 25 },
        { id: 'C', name: 'Jin', age: 40 },
    ]

record_map

    {
        'A': { name: 'Joy', age: 59 },
        'B': { name: 'Ben', age: 25 },
        'C': { name: 'Jin', age: 40 },
    }

keyed_record_map

    {
        'A': { id: 'A', name: 'Joy', age: 59 },
        'B': { id: 'B', name: 'Ben', age: 25 },
        'C': { id: 'C', name: 'Jin', age: 40 },
    }

record_entries

    [
        ('A', { name: 'Joy', age: 59 }),
        ('B', { name: 'Ben', age: 25 }),
        ('C', { name: 'Jin', age: 40 }),
    ]

keyed_record_entries

    [
        ('A', { id: 'A', name: 'Joy', age: 59 }),
        ('B', { id: 'B', name: 'Ben', age: 25 }),
        ('C', { id: 'C', name: 'Jin', age: 40 }),
    ]

table_records

    (
        ('id', 'name', 'age'),
        [
            { id: 'A', name: 'Joy', age: 59 },
            { id: 'B', name: 'Ben', age: 25 },
            { id: 'C', name: 'Jin', age: 40 },
        ],
    )

table_record_map

    (
        ('id', 'name', 'age'),
        {
            'A': { name: 'Joy', age: 59 },
            'B': { name: 'Ben', age: 25 },
            'C': { name: 'Jin', age: 40 },
        },
    )

table_keyed_record_map

    (
        ('id', 'name', 'age'),
        {
            'A': { id: 'A', name: 'Joy', age: 59 },
            'B': { id: 'B', name: 'Ben', age: 25 },
            'C': { id: 'C', name: 'Jin', age: 40 },
        },
    )

table_record_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', { name: 'Joy', age: 59 }),
            ('B', { name: 'Ben', age: 25 }),
            ('C', { name: 'Jin', age: 40 }),
        ],
    )

table_keyed_record_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', { id: 'A', name: 'Joy', age: 59 }),
            ('B', { id: 'B', name: 'Ben', age: 25 }),
            ('C', { id: 'C', name: 'Jin', age: 40 }),
        ],
    )

row, get_row

    ('A', 'Joy', 59)

row_entry, get_row_entry

    ('A', ('Joy', 59))

rows

    [
        ('A', 'Joy', 59),
        ('B', 'Ben', 25),
        ('C', 'Jin', 40),
    ]

row_map

    {
        'A': ('Joy', 59),
        'B': ('Ben', 25),
        'C': ('Jin', 40),
    }

keyed_row_map

    {
        'A': ('A', 'Joy', 59),
        'B': ('B', 'Ben', 25),
        'C': ('C', 'Jin', 40),
    }

row_entries

    [
        ('A', ('Joy', 59)),
        ('B', ('Ben', 25)),
        ('C', ('Jin', 40)),
    ]

keyed_row_entries

    [
        ('A', ('A', 'Joy', 59)),
        ('B', ('B', 'Ben', 25)),
        ('C', ('C', 'Jin', 40)),
    ]

table_rows

    (
        ('id', 'name', 'age'),
        [
            ('A', 'Joy', 59),
            ('B', 'Ben', 25),
            ('C', 'Jin', 40),
        ],
    )

table_row_map

    (
        ('id', 'name', 'age'),
        {
            'A': ('Joy', 59),
            'B': ('Ben', 25),
            'C': ('Jin', 40),
        },
    )

table_keyed_row_map

    (
        ('id', 'name', 'age'),
        {
            'A': ('A', 'Joy', 59),
            'B': ('B', 'Ben', 25),
            'C': ('C', 'Jin', 40),
        },
    )

table_row_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', ('Joy', 59)),
            ('B', ('Ben', 25)),
            ('C', ('Jin', 40)),
        ],
    )

table_keyed_row_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', ('A', 'Joy', 59)),
            ('B', ('B', 'Ben', 25)),
            ('C', ('C', 'Jin', 40)),
        ],
    )