Bring polars data back to Python objects, safely. Validation, schema/query generation.

Pydantic, for Polars

Type-safe, maintainable interfaces between Polars and Python objects.

uv add pydantic-polars

pydantic_polars.validate

Go from Polars query -> Python objects

Provides an exhaustive set of distinct shape contracts.

Each is a structural guarantee, to which you can attach a type-form for parsing/validation.

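# Assumed context for the snippets below (not defined in this README):
#   lf: a polars LazyFrame; c: a column-selection shorthand (presumably an alias such as `c = pl.col`);
#   User: a Pydantic model; UserTuple: a NamedTuple row type.
from pydantic_polars import validate as plv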

# Equivalent to `lf.collect().rows(named=True)`
users = plv.records.collect(lf)  # -> list[dict[str, Any]]

# "I have a model now. Parse + validate a list of them."
users = plv.records[list[User]].collect(lf)  # -> list[User]
# Can I have model methods on that list, so I can model_dump() it later?
users = plv.records[list[User]].collect_model(lf)  # -> pydantic.RootModel[list[User]]

# My query produces *exactly* 1 user. It cannot produce 0.
user = plv.record[User].collect(lf.head(1))  # -> User
# Correction: Zero rows is possible
user = plv.get_record[User | None].collect(lf.head(1))  # -> User | None

# Tuples instead of objects? Also...can we do async?
users = await plv.rows[list[UserTuple]].collect_async(lf)  # -> list[UserTuple]

# Ok, that query was too big to fit into memory. Let's stream-compute + batch-collect
for users in plv.rows[list[UserTuple]].collect_batches(lf):  # -> Iterator[list[UserTuple]]
    write_somewhere(users)

# I need one huge {name: age} mapping. My query returns exactly 2 columns.
name_age_map = plv.map[dict[str, int]].collect(lf.select(c.name, c.age))  # -> dict[str, int]

# Everyone's names?
users_names = plv.column[list[str]].collect(lf.select(c.name))  # -> list[str]
# Age of oldest person?
oldest_age = plv.item[int | None].collect(lf.select(c.age.max()))  # -> int | None
# Can we parallelize those and await them, without confusing the type-checker?
users_names, oldest_age = await plv.collect_all_async(
    plv.column[list[str]].defer(lf.select(c.name)),
    plv.item[int | None].defer(lf.select(c.age.max())),
)  # -> (list[str], int | None)

Each name after plv. is a shape.

plv.<shape>.collect(lf)     # Returns primitive T for <shape>
plv.<shape>[T].collect(lf)  # Returns T  (yes, T can be any type form)

A shape is a fixed contract. Part of the contract is that all values in the dataframe are returned (materializing data you don't need is a bug).

Example: map builds a dict from 2 columns, but only if column 0 is unique (that is, len(result) == input_df.height), so no row is silently overwritten.
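
The uniqueness check can be pictured in plain Python. This is a minimal sketch of the contract only, not the library's implementation; the helper name `to_map` and the choice of `ValueError` are assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence
from typing import Any


def to_map(keys: Sequence[Hashable], values: Sequence[Any]) -> dict[Hashable, Any]:
    """Build {key: value} from two equal-length columns, rejecting duplicate keys."""
    if len(keys) != len(values):
        raise ValueError("both columns must have the same height")
    result = dict(zip(keys, values))
    # A duplicate key overwrites an earlier row, which makes the dict
    # shorter than the input: exactly the data loss the contract forbids.
    if len(result) != len(keys):
        raise ValueError("column 0 is not unique")
    return result


print(to_map(["A", "B", "C"], ["Joy", "Ben", "Jin"]))
# {'A': 'Joy', 'B': 'Ben', 'C': 'Jin'}
```

Calling `to_map(["A", "A"], ["Joy", "Ben"])` raises instead of quietly returning `{"A": "Ben"}`.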

Step 1. Pick a Shape

  • Scalar
    • item: One value.
  • Row-oriented
    • record, records: Row(s) as dict(s). row, rows: tuple(s) instead of dict(s).
    • (record|row)_entry, (record|row)_entries: Row(s) as '(col0, other_cols)' pair(s).
      • keyed_(record|row)_entries: As '(col0, all_cols)' pairs.
  • Uniquely-keyed rows
    • map: Rows as one tall dict from 2 columns: {unique_col0: col1}.
    • (record|row)_map: Rows as one tall dict: {unique_col0: other_cols}.
      • keyed_(record|row)_map: As {unique_col0: all_cols}.
  • Column-oriented
    • column: One column as a list (use keys if the values are unique). columns: A tuple of any number of columns.
    • column_map: One {name: column} dict over any number of columns.
  • With table header
    • table_<shape>: (names, shapeT). For example, table_records: (names, records)
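
To make the naming scheme concrete, the following sketch derives several of the row-oriented shapes from the same small table using only plain Python. It illustrates what each name means and does not use or reproduce the library; the variable names are chosen to mirror the shape names.

```python
names = ("id", "name", "age")
rows = [("A", "Joy", 59), ("B", "Ben", 25), ("C", "Jin", 40)]

# records: each row as a {column_name: value} dict
records = [dict(zip(names, row)) for row in rows]

# row_entries: each row as a (col0, other_cols) pair
row_entries = [(row[0], row[1:]) for row in rows]

# keyed_row_entries: each row as a (col0, all_cols) pair
keyed_row_entries = [(row[0], row) for row in rows]

# record_map: one dict keyed by col0, values are the remaining columns as a dict
record_map = {row[0]: dict(zip(names[1:], row[1:])) for row in rows}

# table_records: the header alongside the body
table_records = (names, records)

print(row_entries[0])   # ('A', ('Joy', 59))
print(record_map["B"])  # {'name': 'Ben', 'age': 25}
```

The "keyed" variants repeat the key inside the value, which is convenient when the value is later parsed into a model that also needs the key field.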

Step 2 (optional). Set a custom T for Pydantic to validate into

Examples:

plv.column[list[float]]
plv.column[tuple[float, ...]]
plv.column[list[float] | list[Decimal] | list[int]]
plv.column[MyCustomArrayType[float | None]]

Tip: Skipping this step (e.g. plv.column.collect(lf)) skips Pydantic validation. For column, this means you get lf.collect().to_series().to_list() directly.

Step 3. Call a method to create T

All shapes have the same interface. These produce T:

# Single query
result = shape.collect(lf)
result = await shape.collect_async(lf)
result = shape.validate(df)  # DataFrame equivalent to collect

# Parallel queries
result1, result2 = plv.collect_all(shape.defer(lf1), shape.defer(lf2))
result1, result2 = await plv.collect_all_async(shape.defer(lf1), shape.defer(lf2))

# Streaming-compute, with batch materialization
for result_batch in shape.collect_batches(lf, chunk_size=1_000_000):
    ...  # Each batch is still T

*_model variants of each method exist, to return T wrapped in pydantic.RootModel[T].
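
The idea behind batch collection, where every batch has the same type as a full result, can be sketched with the standard library. How pydantic-polars batches internally is not described here, so the helper `in_batches` below is a hypothetical stand-in that only illustrates the shape of the iteration.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

Row = TypeVar("Row")


def in_batches(rows: Iterable[Row], chunk_size: int) -> Iterator[list[Row]]:
    """Yield successive lists holding at most `chunk_size` rows each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    # islice pulls the next chunk lazily; an empty list means the source is exhausted.
    while batch := list(islice(iterator, chunk_size)):
        yield batch


batches = list(in_batches(range(5), chunk_size=2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Because each batch is an ordinary list, code written for a full result (for example a function taking `list[Row]`) can be reused unchanged inside the loop.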

Shapes

| Shape | Default T | Input df must have |
| --- | --- | --- |
| item | Any | height == 1, width == 1 |
| get_item | Any or None | height <= 1, width == 1 |
| map | dict[item, item] | width == 2, col0 UNIQUE |
| table_map | (names, map) | width == 2, col0 UNIQUE |
| column | list[item] | width == 1 |
| keys | list[item] | width == 1, col0 UNIQUE |
| columns | (column, ...) | |
| column_map | dict[name, column] | |
| table_columns | (names, columns) | |
| record | dict[name, item] | height == 1 |
| get_record | dict[name, item] or None | height <= 1 |
| record_entry | (item, rest_record) | height == 1, width >= 2 |
| get_record_entry | (item, rest_record) or None | height <= 1, width >= 2 |
| records | list[record] | |
| record_map | dict[item, rest_record] | width >= 2, col0 UNIQUE |
| keyed_record_map | dict[item, record] | width >= 1, col0 UNIQUE |
| record_entries | list[record_entry] | width >= 2 |
| keyed_record_entries | list[(item, record)] | width >= 1 |
| table_records | (names, records) | |
| table_record_map | (names, record_map) | width >= 2, col0 UNIQUE |
| table_keyed_record_map | (names, keyed_record_map) | width >= 1, col0 UNIQUE |
| table_record_entries | (names, record_entries) | width >= 2 |
| table_keyed_record_entries | (names, keyed_record_entries) | width >= 1 |
| row | (item, ...) | height == 1 |
| get_row | (item, ...) or None | height <= 1 |
| row_entry | (item, rest_row) | height == 1, width >= 2 |
| get_row_entry | (item, rest_row) or None | height <= 1, width >= 2 |
| rows | list[row] | |
| row_map | dict[item, rest_row] | width >= 2, col0 UNIQUE |
| keyed_row_map | dict[item, row] | width >= 1, col0 UNIQUE |
| row_entries | list[row_entry] | width >= 2 |
| keyed_row_entries | list[(item, row)] | width >= 1 |
| table_rows | (names, rows) | |
| table_row_map | (names, row_map) | width >= 2, col0 UNIQUE |
| table_keyed_row_map | (names, keyed_row_map) | width >= 1, col0 UNIQUE |
| table_row_entries | (names, row_entries) | width >= 2 |
| table_keyed_row_entries | (names, keyed_row_entries) | width >= 1 |
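
The height and width requirements for item and get_item can be read as simple precondition checks. The sketch below expresses them in plain Python over a column-oriented list of lists; the function bodies and error types are assumptions for illustration and say nothing about how the library reports violations.

```python
from __future__ import annotations

from typing import Any


def item(columns: list[list[Any]]) -> Any:
    """Return the single value of a table with height == 1 and width == 1."""
    if len(columns) != 1:
        raise ValueError(f"expected width == 1, got {len(columns)}")
    if len(columns[0]) != 1:
        raise ValueError(f"expected height == 1, got {len(columns[0])}")
    return columns[0][0]


def get_item(columns: list[list[Any]]) -> Any | None:
    """Like item(), but an empty column (height == 0) yields None."""
    if len(columns) != 1:
        raise ValueError(f"expected width == 1, got {len(columns)}")
    height = len(columns[0])
    if height > 1:
        raise ValueError(f"expected height <= 1, got {height}")
    return columns[0][0] if height == 1 else None


print(item([["Joy"]]))  # Joy
print(get_item([[]]))   # None
```

The get_ prefix therefore changes only the lower bound on height; every other constraint of the base shape still applies.
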
item, get_item
┌──────┐
│ name │
╞══════╡
│ Joy  │
└──────┘
 
 
 
'Joy'
 
 
map
┌────┬──────┐
│ id ┆ name │
╞════╪══════╡
│ A  ┆ Joy  │
│ B  ┆ Ben  │
│ C  ┆ Jin  │
└────┴──────┘
 
 
{
  'A': 'Joy',
  'B': 'Ben',
  'C': 'Jin',
}
 
table_map
┌────┬──────┐
│ id ┆ name │
╞════╪══════╡
│ A  ┆ Joy  │
│ B  ┆ Ben  │
│ C  ┆ Jin  │
└────┴──────┘
(
  ('id', 'name'),
  {
    'A': 'Joy',
    'B': 'Ben',
    'C': 'Jin',
  },
)
column, keys
┌──────┐
│ name │
╞══════╡
│ Joy  │
│ Ben  │
│ Jin  │
└──────┘
 
 
[
  'Joy',
  'Ben',
  'Jin',
]
 
columns
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
(
  ['A', 'B', 'C'],
  ['Joy', 'Ben', 'Jin'],
  [59, 25, 40],
)
 
column_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  id:   ['A', 'B', 'C'],
  name: ['Joy', 'Ben', 'Jin'],
  age:  [59, 25, 40],
}
 
table_columns
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  (
    ['A', 'B', 'C'],
    ['Joy', 'Ben', 'Jin'],
    [59, 25, 40],
  ),
)
record, get_record
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 

 
{ id: 'A', name: 'Joy', age: 59 }
 
 
record_entry, get_record_entry
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 

 
('A', { name: 'Joy', age: 59 })
 
 
records
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  { id: 'A', name: 'Joy', age: 59 },
  { id: 'B', name: 'Ben', age: 25 },
  { id: 'C', name: 'Jin', age: 40 },
]
 
record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': { name: 'Joy', age: 59 },
  'B': { name: 'Ben', age: 25 },
  'C': { name: 'Jin', age: 40 },
}
 
keyed_record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': { id: 'A', name: 'Joy', age: 59 },
  'B': { id: 'B', name: 'Ben', age: 25 },
  'C': { id: 'C', name: 'Jin', age: 40 },
}
 
record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', { name: 'Joy', age: 59 }),
  ('B', { name: 'Ben', age: 25 }),
  ('C', { name: 'Jin', age: 40 }),
]
 
keyed_record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', { id: 'A', name: 'Joy', age: 59 }),
  ('B', { id: 'B', name: 'Ben', age: 25 }),
  ('C', { id: 'C', name: 'Jin', age: 40 }),
]
 
table_records
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    { id: 'A', name: 'Joy', age: 59 },
    { id: 'B', name: 'Ben', age: 25 },
    { id: 'C', name: 'Jin', age: 40 },
  ],
)
table_record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': { name: 'Joy', age: 59 },
    'B': { name: 'Ben', age: 25 },
    'C': { name: 'Jin', age: 40 },
  },
)
table_keyed_record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': { id: 'A', name: 'Joy', age: 59 },
    'B': { id: 'B', name: 'Ben', age: 25 },
    'C': { id: 'C', name: 'Jin', age: 40 },
  },
)
table_record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', { name: 'Joy', age: 59 }),
    ('B', { name: 'Ben', age: 25 }),
    ('C', { name: 'Jin', age: 40 }),
  ],
)
table_keyed_record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', { id: 'A', name: 'Joy', age: 59 }),
    ('B', { id: 'B', name: 'Ben', age: 25 }),
    ('C', { id: 'C', name: 'Jin', age: 40 }),
  ],
)
row, get_row
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 
 
 
('A', 'Joy', 59)
 
 
row_entry, get_row_entry
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 
 
 
('A', ('Joy', 59))
 
 
rows
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', 'Joy', 59),
  ('B', 'Ben', 25),
  ('C', 'Jin', 40),
]
 
row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': ('Joy', 59),
  'B': ('Ben', 25),
  'C': ('Jin', 40),
}
 
keyed_row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': ('A', 'Joy', 59),
  'B': ('B', 'Ben', 25),
  'C': ('C', 'Jin', 40),
}
 
row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', ('Joy', 59)),
  ('B', ('Ben', 25)),
  ('C', ('Jin', 40)),
]
 
keyed_row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', ('A', 'Joy', 59)),
  ('B', ('B', 'Ben', 25)),
  ('C', ('C', 'Jin', 40)),
]
 
table_rows
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', 'Joy', 59),
    ('B', 'Ben', 25),
    ('C', 'Jin', 40),
  ],
)
table_row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': ('Joy', 59),
    'B': ('Ben', 25),
    'C': ('Jin', 40),
  },
)
table_keyed_row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': ('A', 'Joy', 59),
    'B': ('B', 'Ben', 25),
    'C': ('C', 'Jin', 40),
  },
)
table_row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', ('Joy', 59)),
    ('B', ('Ben', 25)),
    ('C', ('Jin', 40)),
  ],
)
table_keyed_row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', ('A', 'Joy', 59)),
    ('B', ('B', 'Ben', 25)),
    ('C', ('C', 'Jin', 40)),
  ],
)
