Bring Polars data back to Python objects, safely. Validation, schema/query generation.
Pydantic, for Polars
Type-safe, maintainable interfaces between Polars and Python objects.
uv add pydantic-polars
pydantic_polars.validate
Go from Polars query -> Python objects
Provides an exhaustive set of distinct shape contracts.
Each shape is a structural guarantee about the query result, and you can attach a type form to it for parsing and validation.
from polars import col as c  # assumption: `c` in the examples below is Polars' column selector
from pydantic_polars import validate as plv
# Equivalent to `lf.collect().rows(named=True)`
users = plv.records.collect(lf) # -> list[dict[str, Any]]
# "I have a model now. Parse + validate a list of them."
users = plv.records[list[User]].collect(lf) # -> list[User]
# Can I have model methods on that list, so I can model_dump() it later?
users = plv.records[list[User]].collect_model(lf) # -> pydantic.RootModel[list[User]]
# My query produces *exactly* 1 user. It cannot produce 0.
user = plv.record[User].collect(lf.head(1)) # -> User
# Correction: Zero rows is possible
user = plv.get_record[User | None].collect(lf.head(1)) # -> User | None
# Tuples instead of objects? Also...can we do async?
users = await plv.rows[list[UserTuple]].collect_async(lf) # -> list[UserTuple]
# Ok, that query was too big to fit into memory. Let's stream-compute + batch-collect
for users in plv.rows[list[UserTuple]].collect_batches(lf): # -> Iterator[list[UserTuple]]
    write_somewhere(users)
# I need one huge {name: age} mapping. My query returns exactly 2 columns.
name_age_map = plv.map[dict[str, int]].collect(lf.select(c.name, c.age)) # -> dict[str, int]
# Everyone's names?
users_names = plv.column[list[str]].collect(lf.select(c.name)) # -> list[str]
# Age of oldest person?
oldest_age = plv.item[int | None].collect(lf.select(c.age.max())) # -> int | None
# Can we parallelize those and await them, without confusing the type-checker?
users_names, oldest_age = await plv.collect_all_async(
    plv.column[list[str]].defer(lf.select(c.name)),
    plv.item[int | None].defer(lf.select(c.age.max())),
) # -> (list[str], int | None)
Each name after plv. is a shape.
plv.<shape>.collect(lf) # Returns the default T for <shape> (no Pydantic validation)
plv.<shape>[T].collect(lf) # Returns T (yes, T can be any type form)
A shape is a fixed contract. Part of the contract is that every value in the dataframe is returned, so materializing data you don't need is a bug.
Example: map builds a dict from 2 columns, but only if column 0 is unique (checked as len(result) == input_df.height).
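The uniqueness check can be pictured with plain Python. The sketch below is not the library's implementation: rows_to_map is a hypothetical helper that works on a list of 2-tuples rather than a DataFrame, and it only illustrates why comparing the result length with the input height detects a non-unique column 0.

```python
from typing import Any


def rows_to_map(rows: list[tuple[Any, Any]]) -> dict[Any, Any]:
    """Build {col0: col1} from 2-column rows, rejecting duplicate keys."""
    for row in rows:
        if len(row) != 2:
            raise ValueError(f"expected width == 2, got width == {len(row)}")
    result = dict(rows)
    # dict() lets later duplicates overwrite earlier ones, so a result that is
    # shorter than the input means column 0 was not unique and rows were lost.
    if len(result) != len(rows):
        raise ValueError("column 0 is not unique")
    return result


name_age = rows_to_map([("Joy", 59), ("Ben", 25), ("Jin", 40)])
```

Raising instead of silently dropping rows is what keeps the "all values are returned" part of the contract honest.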
Step 1. Pick a Shape
- Scalar
  - item: One value.
- Row-oriented
  - record, records: Row(s) as dict(s).
  - row, rows: Row(s) as tuple(s) instead of dict(s).
  - (record|row)_entry, (record|row)_entries: Row(s) as '(col0, other_cols)' pair(s).
  - keyed_(record|row)_entries: Rows as '(col0, all_cols)' pairs.
- Uniquely-keyed rows
  - map: Rows as one tall dict from 2 columns: {unique_col0: col1}.
  - (record|row)_map: Rows as one tall dict: {unique_col0: other_cols}.
  - keyed_(record|row)_map: Rows as one tall dict: {unique_col0: all_cols}.
- Column-oriented
  - column: One column as a list (use keys if unique).
  - columns: Tuple of any columns.
  - column_map: One {name: column} dict of any columns.
- With table header
  - table_<shape>: (names, shapeT). For example, table_records: (names, records).
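To make the shape names concrete, here is a minimal plain-Python sketch that derives several of them from one small table. It does not call pydantic_polars; the names and rows variables are illustrative stand-ins for a collected DataFrame's header and rows, and the comprehensions only mirror the descriptions in the list above.

```python
# Hypothetical stand-ins for a collected DataFrame's header and rows.
names = ("id", "name", "age")
rows = [("A", "Joy", 59), ("B", "Ben", 25), ("C", "Jin", 40)]

# Row-oriented shapes
records = [dict(zip(names, row)) for row in rows]
row_entries = [(row[0], row[1:]) for row in rows]  # (col0, other_cols)
keyed_row_entries = [(row[0], row) for row in rows]  # (col0, all_cols)

# Uniquely-keyed shapes (these assume column 0 has no duplicates)
record_map = {row[0]: dict(zip(names[1:], row[1:])) for row in rows}
keyed_row_map = {row[0]: row for row in rows}

# Column-oriented shapes
columns = tuple(list(col) for col in zip(*rows))
column_map = dict(zip(names, columns))

# With table header
table_rows = (names, rows)
```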
Step 2 (optional). Set a custom T for Pydantic to validate into
Examples:
plv.column[list[float]]
plv.column[tuple[float, ...]]
plv.column[list[float] | list[Decimal] | list[int]]
plv.column[MyCustomArrayType[float | None]]
> [!TIP]
> Skipping this step (e.g. plv.column.collect(lf)) means skipping Pydantic validation. For column, this means you get lf.collect().to_series().to_list() directly.
Step 3. Call a method to create T
All shapes have the same interface. These produce T:
# Single query
result = shape.collect(lf)
result = await shape.collect_async(lf)
result = shape.validate(df) # DataFrame equivalent to collect
# Parallel queries
result1, result2 = plv.collect_all(shape.defer(lf1), shape.defer(lf2))
result1, result2 = await plv.collect_all_async(shape.defer(lf1), shape.defer(lf2))
# Streaming-compute, with batch materialization
for result_batch in shape.collect_batches(lf, chunk_size=1_000_000):
    ... # Each batch is still T
Each method also has a *_model variant, which returns T wrapped in pydantic.RootModel[T].
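As a rough picture of what "each batch is still T" means for a rows-like shape, the following sketch chunks an iterable of rows into lists. in_batches is a hypothetical helper, not part of the library, and it says nothing about how Polars streams the computation; it only shows that every yielded batch has the same type as the full result would.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def in_batches(rows: Iterable[tuple], chunk_size: int) -> Iterator[list[tuple]]:
    """Yield successive lists of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    iterator = iter(rows)
    # islice pulls at most chunk_size rows per pass; an empty list ends the loop.
    while batch := list(islice(iterator, chunk_size)):
        yield batch


batches = list(in_batches([("A",), ("B",), ("C",)], chunk_size=2))
```

Because only one batch is held at a time, the consumer can write each one out before the next is materialized.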
Shapes
| Shape | Default T | Input df must have |
|---|---|---|
| item | Any | height == 1, width == 1 |
| get_item | Any or None | height <= 1, width == 1 |
| map | dict[item, item] | width == 2, col0 UNIQUE |
| table_map | (names, map) | width == 2, col0 UNIQUE |
| column | list[item] | width == 1 |
| keys | list[item] | width == 1, col0 UNIQUE |
| columns | (column, ...) | |
| column_map | dict[name, column] | |
| table_columns | (names, columns) | |
| record | dict[name, item] | height == 1 |
| get_record | dict[name, item] or None | height <= 1 |
| record_entry | (item, rest_record) | height == 1, width >= 2 |
| get_record_entry | (item, rest_record) or None | height <= 1, width >= 2 |
| records | list[record] | |
| record_map | dict[item, rest_record] | width >= 2, col0 UNIQUE |
| keyed_record_map | dict[item, record] | width >= 1, col0 UNIQUE |
| record_entries | list[record_entry] | width >= 2 |
| keyed_record_entries | list[(item, record)] | width >= 1 |
| table_records | (names, records) | |
| table_record_map | (names, record_map) | width >= 2, col0 UNIQUE |
| table_keyed_record_map | (names, keyed_record_map) | width >= 1, col0 UNIQUE |
| table_record_entries | (names, record_entries) | width >= 2 |
| table_keyed_record_entries | (names, keyed_record_entries) | width >= 1 |
| row | (item, ...) | height == 1 |
| get_row | (item, ...) or None | height <= 1 |
| row_entry | (item, rest_row) | height == 1, width >= 2 |
| get_row_entry | (item, rest_row) or None | height <= 1, width >= 2 |
| rows | list[row] | |
| row_map | dict[item, rest_row] | width >= 2, col0 UNIQUE |
| keyed_row_map | dict[item, row] | width >= 1, col0 UNIQUE |
| row_entries | list[row_entry] | width >= 2 |
| keyed_row_entries | list[(item, row)] | width >= 1 |
| table_rows | (names, rows) | |
| table_row_map | (names, row_map) | width >= 2, col0 UNIQUE |
| table_keyed_row_map | (names, keyed_row_map) | width >= 1, col0 UNIQUE |
| table_row_entries | (names, row_entries) | width >= 2 |
| table_keyed_row_entries | (names, keyed_row_entries) | width >= 1 |

item, get_item

    'Joy'

map

    {
        'A': 'Joy',
        'B': 'Ben',
        'C': 'Jin',
    }

table_map

    (
        ('id', 'name'),
        {
            'A': 'Joy',
            'B': 'Ben',
            'C': 'Jin',
        },
    )

column, keys

    [
        'Joy',
        'Ben',
        'Jin',
    ]

columns

    (
        ['A', 'B', 'C'],
        ['Joy', 'Ben', 'Jin'],
        [59, 25, 40],
    )

column_map

    {
        id: ['A', 'B', 'C'],
        name: ['Joy', 'Ben', 'Jin'],
        age: [59, 25, 40],
    }

table_columns

    (
        ('id', 'name', 'age'),
        (
            ['A', 'B', 'C'],
            ['Joy', 'Ben', 'Jin'],
            [59, 25, 40],
        ),
    )

record, get_record

    { id: 'A', name: 'Joy', age: 59 }

record_entry, get_record_entry

    ('A', { name: 'Joy', age: 59 })

records

    [
        { id: 'A', name: 'Joy', age: 59 },
        { id: 'B', name: 'Ben', age: 25 },
        { id: 'C', name: 'Jin', age: 40 },
    ]

record_map

    {
        'A': { name: 'Joy', age: 59 },
        'B': { name: 'Ben', age: 25 },
        'C': { name: 'Jin', age: 40 },
    }

keyed_record_map

    {
        'A': { id: 'A', name: 'Joy', age: 59 },
        'B': { id: 'B', name: 'Ben', age: 25 },
        'C': { id: 'C', name: 'Jin', age: 40 },
    }

record_entries

    [
        ('A', { name: 'Joy', age: 59 }),
        ('B', { name: 'Ben', age: 25 }),
        ('C', { name: 'Jin', age: 40 }),
    ]

keyed_record_entries

    [
        ('A', { id: 'A', name: 'Joy', age: 59 }),
        ('B', { id: 'B', name: 'Ben', age: 25 }),
        ('C', { id: 'C', name: 'Jin', age: 40 }),
    ]

table_records

    (
        ('id', 'name', 'age'),
        [
            { id: 'A', name: 'Joy', age: 59 },
            { id: 'B', name: 'Ben', age: 25 },
            { id: 'C', name: 'Jin', age: 40 },
        ],
    )

table_record_map

    (
        ('id', 'name', 'age'),
        {
            'A': { name: 'Joy', age: 59 },
            'B': { name: 'Ben', age: 25 },
            'C': { name: 'Jin', age: 40 },
        },
    )

table_keyed_record_map

    (
        ('id', 'name', 'age'),
        {
            'A': { id: 'A', name: 'Joy', age: 59 },
            'B': { id: 'B', name: 'Ben', age: 25 },
            'C': { id: 'C', name: 'Jin', age: 40 },
        },
    )

table_record_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', { name: 'Joy', age: 59 }),
            ('B', { name: 'Ben', age: 25 }),
            ('C', { name: 'Jin', age: 40 }),
        ],
    )

table_keyed_record_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', { id: 'A', name: 'Joy', age: 59 }),
            ('B', { id: 'B', name: 'Ben', age: 25 }),
            ('C', { id: 'C', name: 'Jin', age: 40 }),
        ],
    )

row, get_row

    ('A', 'Joy', 59)

row_entry, get_row_entry

    ('A', ('Joy', 59))

rows

    [
        ('A', 'Joy', 59),
        ('B', 'Ben', 25),
        ('C', 'Jin', 40),
    ]

row_map

    {
        'A': ('Joy', 59),
        'B': ('Ben', 25),
        'C': ('Jin', 40),
    }

keyed_row_map

    {
        'A': ('A', 'Joy', 59),
        'B': ('B', 'Ben', 25),
        'C': ('C', 'Jin', 40),
    }

row_entries

    [
        ('A', ('Joy', 59)),
        ('B', ('Ben', 25)),
        ('C', ('Jin', 40)),
    ]

keyed_row_entries

    [
        ('A', ('A', 'Joy', 59)),
        ('B', ('B', 'Ben', 25)),
        ('C', ('C', 'Jin', 40)),
    ]

table_rows

    (
        ('id', 'name', 'age'),
        [
            ('A', 'Joy', 59),
            ('B', 'Ben', 25),
            ('C', 'Jin', 40),
        ],
    )

table_row_map

    (
        ('id', 'name', 'age'),
        {
            'A': ('Joy', 59),
            'B': ('Ben', 25),
            'C': ('Jin', 40),
        },
    )

table_keyed_row_map

    (
        ('id', 'name', 'age'),
        {
            'A': ('A', 'Joy', 59),
            'B': ('B', 'Ben', 25),
            'C': ('C', 'Jin', 40),
        },
    )

table_row_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', ('Joy', 59)),
            ('B', ('Ben', 25)),
            ('C', ('Jin', 40)),
        ],
    )

table_keyed_row_entries

    (
        ('id', 'name', 'age'),
        [
            ('A', ('A', 'Joy', 59)),
            ('B', ('B', 'Ben', 25)),
            ('C', ('C', 'Jin', 40)),
        ],
    )