Type hints for schemas of polars data frames
Polars-typed
Let your static type checker help you remember which columns are present in your dataframe, and which types they (should) have!
Why polars-typed?
- Static schema validation. No more typos in string column names.
- Autocomplete on the available columns of your dataframe!
- Catch schema errors before your code hits prod. Yes, you wrote a unit test, but you hardcoded some naive datetimes in the test data, and now it turns out your data lake is serving datetimes with time zones, causing exceptions in production.
polars-typed forces you to think about this ahead of release, so the problem never reaches prod.
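The naive-versus-aware mismatch from the last bullet can be reproduced without any dataframe library. The sketch below is a plain-Python illustration, not polars-typed code; `is_before` is a hypothetical helper standing in for any logic that was only ever tested with naive datetimes.

```python
from datetime import datetime, timezone

# What the unit test hardcoded: a naive datetime (no time zone attached).
naive = datetime(2024, 1, 1, 12, 0)
# What production delivers in this scenario: a time-zone-aware datetime.
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


def is_before(a: datetime, b: datetime) -> bool:
    """Hypothetical business logic that compares two timestamps."""
    return a < b


# Naive against naive works, so the unit test passes.
test_result = is_before(naive, naive.replace(hour=13))

# Naive against aware raises TypeError, which is the production failure.
try:
    is_before(naive, aware)
    outcome = "ok"
except TypeError as exc:
    outcome = f"TypeError: {exc}"
```

Pinning the time zone in the schema moves this decision to the point where the schema is written, instead of leaving it to whatever data arrives first.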
What about alternatives?
What about pandera? Pandera does data validation, but not schema validation. polars-typed does schema validation, not data validation.
What about poldantic? Poldantic does not integrate with static type checkers. polars-typed aims to prevent a whole class of problems from ever reaching the first unit test (or prod) by letting you validate all your data transforms with a single pyright call, or through live diagnostics if you have a type checker configured in your IDE.
Statically enforced, runtime-checked schemas for dataframes
Mypy/pyright/ty reminds you to add schema validation, and tells you what schema you can expect from a function. The actual schema validation happens at runtime.
Schemas offer two modes of validation:
- DataFrameSchema.validate performs strict validation, failing if the schema is not exactly as specified.
- DataFrameSchema.coerce attempts to coerce before validation, so unnecessary columns are dropped and columns can be reordered. Optionally, datatypes are cast (as long as this can be done without loss of information; casting 12345 to pl.Int8 will fail).
A DataFrameSchema must specify each column using polars.DataType types; any other type (e.g. str or int) will result in a type error.
Both DataFrames and LazyFrames are supported. Note that validation on LazyFrames is potentially expensive.
from datetime import datetime

import polars as pl
from polars_typed import Column, DataFrame, DataFrameSchema


class TestSchema(DataFrameSchema):
    foo = Column(pl.Boolean)
    # Some polars datatypes carry metadata on the object level instead of the type level.
    # For that reason it is necessary to assign (=) the columns, instead of defining them through type annotations (:)
    bar = Column(pl.Datetime(time_unit="us", time_zone=None))


def typed_function(df: DataFrame[TestSchema]) -> DataFrame[TestSchema]:
    df_untyped = df.filter(pl.col("foo"))
    # return df_untyped  # mypy/pyright/ty complains
    return TestSchema.validate(df_untyped)  # mypy/pyright/ty is happy


# "bar" holds a naive datetime so that the frame matches TestSchema at runtime
untyped_df = pl.DataFrame({"foo": [False], "bar": [datetime(2024, 1, 1)]})
typed_function(untyped_df)  # mypy/pyright/ty complains
typed_df = typed_function(TestSchema.validate(untyped_df))  # mypy/pyright/ty is happy

# the Column wrapper type lets us use the schema as an enum of column identifiers as well
typed_df.select(TestSchema.foo)
File details
Details for the file polars_typed-0.1.0.tar.gz.
File metadata
- Download URL: polars_typed-0.1.0.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e3067b1064063ec2b8e6c8173625d43a990da53b6073e505d9f8cf862a2ea26 |
| MD5 | 206e55ed476c74c94a253a4b8f7ccfbe |
| BLAKE2b-256 | 3bdbdd867b3aad76eab10d7103748134854ce7d3299991f7d2f39ed9622ccc45 |
Provenance
The following attestation bundles were made for polars_typed-0.1.0.tar.gz:
Publisher: release.yml on sebasv/polars-typed
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polars_typed-0.1.0.tar.gz
  - Subject digest: 0e3067b1064063ec2b8e6c8173625d43a990da53b6073e505d9f8cf862a2ea26
- Sigstore transparency entry: 955860503
- Sigstore integration time:
- Permalink: sebasv/polars-typed@77e173956e0e821256ab0c16f6dc951383d1db6f
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/sebasv
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@77e173956e0e821256ab0c16f6dc951383d1db6f
- Trigger Event: push
File details
Details for the file polars_typed-0.1.0-py3-none-any.whl.
File metadata
- Download URL: polars_typed-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 480f6dbe1f2eb35c5fd3b7d477863ae48fd767e81360f852b22b0edb0f172cd9 |
| MD5 | e0eddf414e0a4b6f0fb19061d325b0cd |
| BLAKE2b-256 | 46e1b28b656240c32b15f64b3917be9463bb479a7aa3b0d16f03ea6caed8c91b |
Provenance
The following attestation bundles were made for polars_typed-0.1.0-py3-none-any.whl:
Publisher: release.yml on sebasv/polars-typed
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polars_typed-0.1.0-py3-none-any.whl
  - Subject digest: 480f6dbe1f2eb35c5fd3b7d477863ae48fd767e81360f852b22b0edb0f172cd9
- Sigstore transparency entry: 955860506
- Sigstore integration time:
- Permalink: sebasv/polars-typed@77e173956e0e821256ab0c16f6dc951383d1db6f
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/sebasv
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@77e173956e0e821256ab0c16f6dc951383d1db6f
- Trigger Event: push