Type hints for schemas of polars data frames

Polars-typed

Let your static type checker help you remember which columns are present in your dataframe, and which types they (should) have!

Why polars-typed?

  • Static schema validation. No more typos in string column names.
  • Autocomplete on the available columns of your dataframe!
  • Catch schema errors before your code hits prod. A unit test will not save you if its test data hardcodes naive datetimes while your data lake serves datetimes with time zones; that mismatch only surfaces as exceptions in production. polars-typed forces you to think about this before release, so the problem never reaches prod.

What about alternatives?

What about pandera? Pandera does data validation, but not schema validation. polars-typed does schema validation, not data validation.

What about poldantic? Poldantic does not integrate with static type checkers. polars-typed aims to prevent a whole class of problems from ever reaching the first unit test (or prod) by letting you validate all your data transforms with a single pyright call, or with live diagnostics if you have a type checker configured in your IDE.

Statically enforced, runtime-checked schemas for dataframes

Your type checker (mypy, pyright, or ty) reminds you where schema validation is missing and tells you which schema to expect from a function. The actual validation happens at runtime.

Schemas offer two modes of validation:

  • DataFrameSchema.validate performs strict validation, failing if the schema is not exactly as specified.
  • DataFrameSchema.coerce attempts to coerce before validation: unnecessary columns are dropped and columns can be reordered. Optionally, datatypes are cast, as long as this can be done without loss of information (casting 12345 to pl.Int8 will fail).

A DataFrameSchema must specify each column using polars.DataType types; any other type (e.g., str or int) will result in a type error.

Both DataFrames and LazyFrames are supported. Note that validation on LazyFrames is potentially expensive.

from datetime import datetime

import polars as pl
from polars_typed import Column, DataFrame, DataFrameSchema

class TestSchema(DataFrameSchema):
    foo = Column(pl.Boolean)
    # Some polars datatypes carry metadata on the object level instead of the type level.
    # For that reason columns must be assigned (=) rather than declared through type annotations (:).
    bar = Column(pl.Datetime(time_unit="us", time_zone=None))

def typed_function(df: DataFrame[TestSchema]) -> DataFrame[TestSchema]:
    df_untyped = df.filter(pl.col("foo"))
    # return df_untyped  # mypy/pyright/ty complains
    return TestSchema.validate(df_untyped)  # mypy/pyright/ty is happy

# "bar" holds a naive datetime, which polars stores as pl.Datetime(time_unit="us", time_zone=None)
untyped_df = pl.DataFrame({"foo": [False], "bar": [datetime(2024, 1, 1)]})

typed_function(untyped_df)  # mypy/pyright/ty complains
typed_df = typed_function(TestSchema.validate(untyped_df))  # mypy/pyright/ty is happy

# the Column wrapper type lets us use the schema as an enum of column identifiers as well
typed_df.select(TestSchema.foo)
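
To illustrate why columns are assigned rather than annotated, and how a schema class can double as an enum of column identifiers, here is a standard-library sketch. It is not the polars-typed implementation: the Column, Schema, and Events classes below are hypothetical stand-ins, and dtypes are represented as plain strings and tuples so the example runs without polars.

```python
from typing import Any


class Column:
    """Hypothetical stand-in: stores a dtype object and learns its attribute name."""

    def __init__(self, dtype: Any) -> None:
        self.dtype = dtype
        self.name = ""

    def __set_name__(self, owner: type, name: str) -> None:
        # Python calls this when the class body assigns the object to `name`.
        self.name = name

    def __str__(self) -> str:
        return self.name


class Schema:
    """Hypothetical base class that collects the Column attributes of a subclass."""

    @classmethod
    def columns(cls) -> dict[str, Any]:
        # vars(cls) only sees attributes defined directly on this class.
        return {
            col.name: col.dtype
            for col in vars(cls).values()
            if isinstance(col, Column)
        }


class Events(Schema):
    foo = Column("Boolean")
    # A dtype with object-level metadata: an annotation alone could not carry it at runtime.
    bar = Column(("Datetime", "us", None))


print(Events.foo.name)
print(Events.columns())
```

Because each column is a real object on the class, an attribute such as Events.foo is checked by the type checker and offered by autocomplete, unlike a string literal.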

Download files

Source Distribution

polars_typed-0.1.0.tar.gz (19.3 kB view details)

Uploaded Source

Built Distribution

polars_typed-0.1.0-py3-none-any.whl (6.1 kB view details)

Uploaded Python 3

File details

Details for the file polars_typed-0.1.0.tar.gz.

File metadata

  • Download URL: polars_typed-0.1.0.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polars_typed-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       0e3067b1064063ec2b8e6c8173625d43a990da53b6073e505d9f8cf862a2ea26
MD5          206e55ed476c74c94a253a4b8f7ccfbe
BLAKE2b-256  3bdbdd867b3aad76eab10d7103748134854ce7d3299991f7d2f39ed9622ccc45

Provenance

The following attestation bundles were made for polars_typed-0.1.0.tar.gz:

Publisher: release.yml on sebasv/polars-typed

File details

Details for the file polars_typed-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: polars_typed-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polars_typed-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       480f6dbe1f2eb35c5fd3b7d477863ae48fd767e81360f852b22b0edb0f172cd9
MD5          e0eddf414e0a4b6f0fb19061d325b0cd
BLAKE2b-256  46e1b28b656240c32b15f64b3917be9463bb479a7aa3b0d16f03ea6caed8c91b

Provenance

The following attestation bundles were made for polars_typed-0.1.0-py3-none-any.whl:

Publisher: release.yml on sebasv/polars-typed
