Bring polars data back to Python objects, safely. Validation, schema/query generation.

Pydantic, for Polars

Type-safe, maintainable interfaces between Polars and Python objects.

uv add pydantic-polars

pydantic_polars.validate

Go from Polars query -> Python objects

Provides an exhaustive set of distinct shape contracts.

Each is a structural guarantee, to which you can attach a type-form for parsing/validation.

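# Assumed setup, not shown here: `lf` is a polars LazyFrame, `c` is `polars.col`,
# and `User` / `UserNamedTuple` are your own model types.
from pydantic_polars import validate as plv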

# Equivalent to `lf.collect().rows(named=True)`
users = plv.records.collect(lf)  # -> list[dict[str, Any]]

# "I have a model now. Parse + validate a list of them."
users = plv.records[list[User]].collect(lf)  # -> list[User]

# Can there be an API around that list, so I can call model_dump?
users = plv.records[list[User]].collect_model(lf)  # -> pydantic.RootModel[list[User]]

# My query produces, at most, 1 user. But 0 rows may come back.
user = plv.get_record[User | None].collect(lf.filter(name='Mo').head(1))  # -> User | None

# My query produces *exactly* 1 user. It cannot produce 0.
user = plv.record[User].collect(lf.head(1))  # -> User

# Tuples instead of objects? Also...can we do async?
users = await plv.rows[list[UserNamedTuple]].collect_async(lf)  # -> list[UserNamedTuple]

# Need one huge {name: age} mapping. My query returns exactly 2 columns.
name_age_map = plv.map[dict[str, int]].collect(lf.select(c.name, c.age))  # -> dict[str, int]

# Everyone's names, please
users_names = plv.column[list[str]].collect(lf.select(c.name))  # -> list[str]

# Age of oldest person?
oldest_age = plv.item[int | None].collect(lf.select(c.age.max()))  # -> int | None

# Can we parallelize those in Rust, on other threads?
users_names, oldest_age = await plv.collect_all_async(
    plv.column[list[str]].defer(lf.select(c.name)),
    plv.item[int | None].defer(lf.select(c.age.max())),
)  # -> (list[str], int | None)

# Only need his age, but 0 rows may come back. Safely get int or None.
age = plv.get_item[int | None].collect(
    lf.filter(c.name == 'jeff').select(c.age).head(1)
)  # -> int | None

Each name after plv. is a shape.

plv.<shape>.collect(lf)     # Returns primitive T for <shape>
plv.<shape>[T].collect(lf)  # Returns T  (yes, T can be any type form)

A shape is a fixed contract.

Part of the contract is that every value in the DataFrame ends up in the result. Materializing data you don't need is treated as a bug.

Example: map makes a dict from 2 columns, but only if column 0 is unique, that is, only if len(result) == input_df.height.

A shape has only one meaning. It can't be configured to change the structure.

Example: item doesn't just grab a value, it asserts the dataframe has exactly 1 value. If it may have 0, that's a different shape: get_item.
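
To make these contracts concrete, here is a minimal pure-Python sketch of what item, get_item, and map guarantee. It works on a plain list of row tuples rather than a Polars DataFrame. The function names and the ShapeError exception are hypothetical and are not the library's implementation; only the height, width, and uniqueness conditions come from the text above.

```python
from typing import Any

Row = tuple[Any, ...]


class ShapeError(ValueError):
    """Hypothetical error raised when the data does not fit the shape."""


def item(rows: list[Row]) -> Any:
    # `item` contract: exactly one row and exactly one column.
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ShapeError("item requires height == 1 and width == 1")
    return rows[0][0]


def get_item(rows: list[Row]) -> Any:
    # `get_item` contract: zero or one row, exactly one column.
    # Returns None when no row came back.
    if len(rows) > 1 or any(len(row) != 1 for row in rows):
        raise ShapeError("get_item requires height <= 1 and width == 1")
    return rows[0][0] if rows else None


def two_column_map(rows: list[Row]) -> dict[Any, Any]:
    # `map` contract: exactly two columns, and column 0 must be unique.
    if any(len(row) != 2 for row in rows):
        raise ShapeError("map requires width == 2")
    result = {key: value for key, value in rows}
    # A duplicate key would silently overwrite an earlier row, so the
    # dict would end up shorter than the input.
    if len(result) != len(rows):
        raise ShapeError("map requires unique values in column 0")
    return result
```

Because a failed contract raises instead of dropping rows, the caller never receives a result that silently omits data.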

Step 1. Pick a Shape

  • Scalar
    • item: One value.
  • Row-oriented
    • record, records: Row(s) as dict(s). row, rows: tuple(s) instead of dict(s).
    • (record|row)_entry, (record|row)_entries: Row(s) as '(col0, other_cols)' pair(s).
      • keyed_(record|row)_entries: As '(col0, all_cols)' pairs.
  • Uniquely-keyed rows
    • map: Rows as one tall dict from 2 columns: {unique_col0: col1}.
    • (record|row)_map: Rows as one tall dict: {unique_col0: other_cols}.
      • keyed_(record|row)_map: As {unique_col0: all_cols}.
  • Column-oriented
    • column: One column as a list (use keys if its values are unique). columns: A tuple of any number of columns.
    • column_map: One {name: column} dict of any number of columns.
  • With table header
    • table_<shape>: (names, shapeT). For example, table_records: (names, records)
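
The difference between the row-oriented, keyed, and column-oriented shapes is easiest to see on a tiny table. The sketch below builds several of them by hand in plain Python from a header tuple and a list of row tuples. It only illustrates the output structures and is not how the library computes them; the uniqueness check on column 0 that the map shapes require is left out for brevity.

```python
names = ("id", "name", "age")
rows = [("A", "Joy", 59), ("B", "Ben", 25), ("C", "Jin", 40)]

# records: each row as a {column_name: value} dict.
records = [dict(zip(names, row)) for row in rows]

# record_entries: (col0, other_cols) pairs. The key column is left out of the record.
record_entries = [(row[0], dict(zip(names[1:], row[1:]))) for row in rows]

# keyed_record_entries: (col0, all_cols) pairs. The key column stays in the record.
keyed_record_entries = [(row[0], dict(zip(names, row))) for row in rows]

# row_map: {unique_col0: other_cols}, with the remaining columns as a tuple.
row_map = {row[0]: row[1:] for row in rows}

# columns: one list per column, in column order.
columns = tuple(list(column) for column in zip(*rows))

# table_records: the table_<shape> form pairs the header with the shape's value.
table_records = (names, records)
```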

Step 2 (optional). Set a custom T for Pydantic to validate into

Examples:

plv.column[list[float]]
plv.column[tuple[float, ...]]
plv.column[list[float] | list[Decimal] | list[int]]
plv.column[MyCustomArrayType[float | None]]

Tip: Skipping this step (e.g. plv.column.collect(lf)) means skipping Pydantic validation. For column, this means you get lf.collect().to_series().to_list() directly.

Step 3. Call a method to create T

All shapes have the same methods. These return T:

# Single query
result = shape.collect(lf)
result = await shape.collect_async(lf)
result = shape.validate(df)  # DataFrame equivalent of collect
# Parallel queries
result1, result2 = plv.collect_all(shape.defer(lf1), shape.defer(lf2))
result1, result2 = await plv.collect_all_async(shape.defer(lf1), shape.defer(lf2))

*_model variants of each method also exist, to return T wrapped in pydantic.RootModel[T].
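
One way to picture the defer-then-collect pattern is as a pairing of "how to run the query" with "how to shape the result", where all queries run concurrently and the results come back in argument order. The sketch below models that idea with the standard library only. Deferred and run_all are hypothetical names invented for this illustration; they say nothing about how pydantic_polars or Polars implement parallel collection.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Deferred:
    """Hypothetical stand-in for a query paired with a shape."""

    run_query: Callable[[], Any]    # produces the raw query result
    to_shape: Callable[[Any], Any]  # turns the raw result into T


def run_all(*deferred: Deferred) -> tuple[Any, ...]:
    # Run every query on a worker thread, then shape each raw result.
    # pool.map preserves input order, so results line up with arguments.
    with ThreadPoolExecutor() as pool:
        raw_results = list(pool.map(lambda d: d.run_query(), deferred))
    return tuple(d.to_shape(raw) for d, raw in zip(deferred, raw_results))


# Usage: a "column" of names and a single "item", collected together.
users_names, oldest_age = run_all(
    Deferred(lambda: [("Joy",), ("Ben",)], lambda rows: [row[0] for row in rows]),
    Deferred(lambda: [(59,)], lambda rows: rows[0][0]),
)
```

Keeping the shaping step separate from the query step is what lets several differently shaped results be requested in one call.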

Shapes

| Shape | Default T | Input df must have |
|---|---|---|
| item | Any | height == 1, width == 1 |
| get_item | Any or None | height <= 1, width == 1 |
| map | dict[item, item] | width == 2, col0 UNIQUE |
| table_map | (names, map) | width == 2, col0 UNIQUE |
| column | list[item] | width == 1 |
| keys | list[item] | width == 1, col0 UNIQUE |
| columns | (column, ...) | |
| column_map | dict[name, column] | |
| table_columns | (names, columns) | |
| record | dict[name, item] | height == 1 |
| get_record | dict[name, item] or None | height <= 1 |
| record_entry | (item, rest_record) | height == 1, width >= 2 |
| get_record_entry | (item, rest_record) or None | height <= 1, width >= 2 |
| records | list[record] | |
| record_map | dict[item, rest_record] | width >= 2, col0 UNIQUE |
| keyed_record_map | dict[item, record] | width >= 1, col0 UNIQUE |
| record_entries | list[record_entry] | width >= 2 |
| keyed_record_entries | list[(item, record)] | width >= 1 |
| table_records | (names, records) | |
| table_record_map | (names, record_map) | width >= 2, col0 UNIQUE |
| table_keyed_record_map | (names, keyed_record_map) | width >= 1, col0 UNIQUE |
| table_record_entries | (names, record_entries) | width >= 2 |
| table_keyed_record_entries | (names, keyed_record_entries) | width >= 1 |
| row | (item, ...) | height == 1 |
| get_row | (item, ...) or None | height <= 1 |
| row_entry | (item, rest_row) | height == 1, width >= 2 |
| get_row_entry | (item, rest_row) or None | height <= 1, width >= 2 |
| rows | list[row] | |
| row_map | dict[item, rest_row] | width >= 2, col0 UNIQUE |
| keyed_row_map | dict[item, row] | width >= 1, col0 UNIQUE |
| row_entries | list[row_entry] | width >= 2 |
| keyed_row_entries | list[(item, row)] | width >= 1 |
| table_rows | (names, rows) | |
| table_row_map | (names, row_map) | width >= 2, col0 UNIQUE |
| table_keyed_row_map | (names, keyed_row_map) | width >= 1, col0 UNIQUE |
| table_row_entries | (names, row_entries) | width >= 2 |
| table_keyed_row_entries | (names, keyed_row_entries) | width >= 1 |
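
The "Input df must have" column can be read as data: each shape is a small set of height, width, and uniqueness conditions. The sketch below encodes a subset of the table that way and reports which conditions a given input would violate. Constraint, CONSTRAINTS, and violations are hypothetical names for this illustration, not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    exact_height: Optional[int] = None  # height == n
    max_height: Optional[int] = None    # height <= n
    exact_width: Optional[int] = None   # width == n
    min_width: int = 0                  # width >= n
    col0_unique: bool = False           # col0 UNIQUE


# A subset of the shapes table, transcribed row by row.
CONSTRAINTS = {
    "item": Constraint(exact_height=1, exact_width=1),
    "get_item": Constraint(max_height=1, exact_width=1),
    "map": Constraint(exact_width=2, col0_unique=True),
    "column": Constraint(exact_width=1),
    "keys": Constraint(exact_width=1, col0_unique=True),
    "record": Constraint(exact_height=1),
    "get_record": Constraint(max_height=1),
    "record_entry": Constraint(exact_height=1, min_width=2),
    "records": Constraint(),
    "record_map": Constraint(min_width=2, col0_unique=True),
    "keyed_record_map": Constraint(min_width=1, col0_unique=True),
}


def violations(shape: str, height: int, width: int, col0_is_unique: bool) -> list[str]:
    """Return the conditions from the table that the input fails to meet."""
    c = CONSTRAINTS[shape]
    problems = []
    if c.exact_height is not None and height != c.exact_height:
        problems.append(f"height == {c.exact_height}")
    if c.max_height is not None and height > c.max_height:
        problems.append(f"height <= {c.max_height}")
    if c.exact_width is not None and width != c.exact_width:
        problems.append(f"width == {c.exact_width}")
    if width < c.min_width:
        problems.append(f"width >= {c.min_width}")
    if c.col0_unique and not col0_is_unique:
        problems.append("col0 UNIQUE")
    return problems
```

For example, an empty one-column result fails item but satisfies get_item, which is the distinction the two shapes exist to express.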
item, get_item
┌──────┐
│ name │
╞══════╡
│ Joy  │
└──────┘
 
 
 
'Joy'
 
 
map
┌────┬──────┐
│ id ┆ name │
╞════╪══════╡
│ A  ┆ Joy  │
│ B  ┆ Ben  │
│ C  ┆ Jin  │
└────┴──────┘
 
 
{
  'A': 'Joy',
  'B': 'Ben',
  'C': 'Jin',
}
 
table_map
┌────┬──────┐
│ id ┆ name │
╞════╪══════╡
│ A  ┆ Joy  │
│ B  ┆ Ben  │
│ C  ┆ Jin  │
└────┴──────┘
(
  ('id', 'name'),
  {
    'A': 'Joy',
    'B': 'Ben',
    'C': 'Jin',
  },
)
column, keys
┌──────┐
│ name │
╞══════╡
│ Joy  │
│ Ben  │
│ Jin  │
└──────┘
 
 
[
  'Joy',
  'Ben',
  'Jin',
]
 
columns
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
(
  ['A', 'B', 'C'],
  ['Joy', 'Ben', 'Jin'],
  [59, 25, 40],
)
 
column_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  id:   ['A', 'B', 'C'],
  name: ['Joy', 'Ben', 'Jin'],
  age:  [59, 25, 40],
}
 
table_columns
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  (
    ['A', 'B', 'C'],
    ['Joy', 'Ben', 'Jin'],
    [59, 25, 40],
  ),
)
record, get_record
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 

 
{ id: 'A', name: 'Joy', age: 59 }
 
 
record_entry, get_record_entry
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 

 
('A', { name: 'Joy', age: 59 })
 
 
records
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  { id: 'A', name: 'Joy', age: 59 },
  { id: 'B', name: 'Ben', age: 25 },
  { id: 'C', name: 'Jin', age: 40 },
]
 
record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': { name: 'Joy', age: 59 },
  'B': { name: 'Ben', age: 25 },
  'C': { name: 'Jin', age: 40 },
}
 
keyed_record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': { id: 'A', name: 'Joy', age: 59 },
  'B': { id: 'B', name: 'Ben', age: 25 },
  'C': { id: 'C', name: 'Jin', age: 40 },
}
 
record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', { name: 'Joy', age: 59 }),
  ('B', { name: 'Ben', age: 25 }),
  ('C', { name: 'Jin', age: 40 }),
]
 
keyed_record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', { id: 'A', name: 'Joy', age: 59 }),
  ('B', { id: 'B', name: 'Ben', age: 25 }),
  ('C', { id: 'C', name: 'Jin', age: 40 }),
]
 
table_records
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    { id: 'A', name: 'Joy', age: 59 },
    { id: 'B', name: 'Ben', age: 25 },
    { id: 'C', name: 'Jin', age: 40 },
  ],
)
table_record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': { name: 'Joy', age: 59 },
    'B': { name: 'Ben', age: 25 },
    'C': { name: 'Jin', age: 40 },
  },
)
table_keyed_record_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': { id: 'A', name: 'Joy', age: 59 },
    'B': { id: 'B', name: 'Ben', age: 25 },
    'C': { id: 'C', name: 'Jin', age: 40 },
  },
)
table_record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', { name: 'Joy', age: 59 }),
    ('B', { name: 'Ben', age: 25 }),
    ('C', { name: 'Jin', age: 40 }),
  ],
)
table_keyed_record_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', { id: 'A', name: 'Joy', age: 59 }),
    ('B', { id: 'B', name: 'Ben', age: 25 }),
    ('C', { id: 'C', name: 'Jin', age: 40 }),
  ],
)
row, get_row
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 
 
 
('A', 'Joy', 59)
 
 
row_entry, get_row_entry
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
└────┴──────┴─────┘
 
 
 
('A', ('Joy', 59))
 
 
rows
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', 'Joy', 59),
  ('B', 'Ben', 25),
  ('C', 'Jin', 40),
]
 
row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': ('Joy', 59),
  'B': ('Ben', 25),
  'C': ('Jin', 40),
}
 
keyed_row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
{
  'A': ('A', 'Joy', 59),
  'B': ('B', 'Ben', 25),
  'C': ('C', 'Jin', 40),
}
 
row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', ('Joy', 59)),
  ('B', ('Ben', 25)),
  ('C', ('Jin', 40)),
]
 
keyed_row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
 
 
[
  ('A', ('A', 'Joy', 59)),
  ('B', ('B', 'Ben', 25)),
  ('C', ('C', 'Jin', 40)),
]
 
table_rows
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', 'Joy', 59),
    ('B', 'Ben', 25),
    ('C', 'Jin', 40),
  ],
)
table_row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': ('Joy', 59),
    'B': ('Ben', 25),
    'C': ('Jin', 40),
  },
)
table_keyed_row_map
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  {
    'A': ('A', 'Joy', 59),
    'B': ('B', 'Ben', 25),
    'C': ('C', 'Jin', 40),
  },
)
table_row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', ('Joy', 59)),
    ('B', ('Ben', 25)),
    ('C', ('Jin', 40)),
  ],
)
table_keyed_row_entries
┌────┬──────┬─────┐
│ id ┆ name ┆ age │
╞════╪══════╪═════╡
│ A  ┆ Joy  ┆ 59  │
│ B  ┆ Ben  ┆ 25  │
│ C  ┆ Jin  ┆ 40  │
└────┴──────┴─────┘
(
  ('id', 'name', 'age'),
  [
    ('A', ('A', 'Joy', 59)),
    ('B', ('B', 'Ben', 25)),
    ('C', ('C', 'Jin', 40)),
  ],
)
