Polars DataFrame validation using type hints
polars_validate: Polars DataFrame validation using type hints
Simple DataFrame validation, based on type hints.
from typing import Annotated

import polars as pl
from polars_validate import (
    THIS,
    ContainsPattern,
    FrameValidators,
    IntegerType,
    IsBetween,
    IsIn,
    IsNotNull,
    UniqueTogether,
    validate,
)

tips = pl.DataFrame(
    {
        "restaurant": [1, 1, 1, 2],
        "table": [1, 2, 3, 1],
        "bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.5, None],
        "sex": ["Female", "Male", None, "Male"],
        "smoker": [False, True, True, False],
        "time": ["45 min", "30 mins", "60 min", "50 min"],
    }
)

class TipsSchema:
    restaurant: Annotated[IntegerType, IsNotNull()]
    table: Annotated[IntegerType, IsNotNull()]
    bill: Annotated[pl.Float64, IsNotNull(), IsBetween(0.0, 50.0, closed="right")]
    tip: pl.Float64
    sex: Annotated[pl.String, IsNotNull(), IsIn(("Female", "Male"))]
    smoker: Annotated[pl.Boolean, IsNotNull()]
    time: Annotated[
        pl.String,
        ContainsPattern("^\\d+ min$"),
        THIS.str.strip_suffix(" min").cast(pl.Int64, strict=False) < 120,
    ]
    dataframe: FrameValidators = (
        UniqueTogether(("restaurant", "table")),
        pl.col("bill") > pl.col("tip"),
    )

validate(TipsSchema, tips, eager=False)
#> polars_validate.base.ValidationError: Validation failed with 2 errors (16 passed):
#> ❌ 'sex': 'not null' check failed at offsets: [2]
#> ❌ 'time': 'pattern ^\d+ min$' check failed at offsets: [1]
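The two reported failures can be checked by hand. The sketch below uses plain Python only (no polars_validate calls, and the variable names are illustrative) to apply the same not-null and pattern checks to the `sex` and `time` columns and collect the failing row offsets.

```python
import re

# Column values copied from the `tips` example above.
sex = ["Female", "Male", None, "Male"]
time = ["45 min", "30 mins", "60 min", "50 min"]

# 'not null' check: offsets where the value is missing.
null_offsets = [i for i, value in enumerate(sex) if value is None]

# 'pattern' check: offsets where a non-null value does not match the regex.
pattern = re.compile(r"^\d+ min$")
pattern_offsets = [
    i
    for i, value in enumerate(time)
    if value is not None and pattern.search(value) is None
]

print(null_offsets)     # [2]
print(pattern_offsets)  # [1]
```

Offset 1 fails because "30 mins" ends in "mins", which the anchored pattern rejects.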
Installation
pip install git+https://github.com/chris-mcdo/polars-validate
Features
Validate using built-in validation types, polars expressions, or arbitrary functions.
Built-in validation:
- IsNotNull: check for missing values
- IsIn: check for set membership
- IsBetween: check values lie within an interval
- ContainsPattern: check a string contains / matches a regex pattern
- TypeValidator: check type
- UniqueTogether: check some columns uniquely identify rows
For inspiration, a few examples of how polars expressions can be used for validation:
# THIS represents the current Series / column.
import polars as pl
from polars_validate import THIS

# series-based validation
is_even = (THIS % 2) == 0
# polars spells title-casing `str.to_titlecase()`
is_in_title_case = THIS.str.to_titlecase() == THIS
starts_with_foo = THIS.str.starts_with("foo")
is_close_to_mean = (THIS - THIS.mean()).abs() < 5.0
is_unique = THIS.is_unique()
# polars spells string length `str.len_chars()`
is_short_string = THIS.str.len_chars() < 10

# dataframe-based validation
bounded = pl.col("col_a").is_between("col_b", "col_c")
at_least_one = pl.any_horizontal("a", "b", "c")
Arbitrary custom validation is also supported:
# Assumption: both callable validators are importable from the package top level.
from polars_validate import CallableValidator, SeriesCallableValidator

def is_valid_index(s: pl.Series) -> bool:
    return s.is_sorted() and s[0] == 1

def smokers_tip_more(d: pl.DataFrame):
    # arbitrary logic ...
    ...

class TipsSchema:
    restaurant: Annotated[IntegerType, IsNotNull(), SeriesCallableValidator(is_valid_index, "valid index")]
    # ...
    dataframe: FrameValidators = (
        # ...
        CallableValidator(smokers_tip_more, "smokers tip more"),
    )
User Guide
Define validation for individual Series (or DataFrame columns) using type annotations as shown above.
E.g. for series:
# simple schema - just validate type
SimpleSeriesSchema = pl.Float32
# add more complex validation using type metadata
StrictSeriesSchema = Annotated[pl.Float32, IsNotNull(), THIS.sqrt().round().mod(7).eq(0), ...]
validate_series(StrictSeriesSchema, my_series)
#> ...
To validate DataFrames, combine Series type annotations in a class as shown above.
To add validation which applies to the whole dataframe, add fields with the FrameValidators
annotation.
Internally, type annotations and metadata are translated into a sequence of Validator objects.
You can also use these objects directly.
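The standard library already exposes everything needed to read `Annotated` metadata from a class, which is one way such a translation could work. The sketch below is not the polars_validate implementation: `Float64`, `NotNull`, `Between` and `collect_checks` are hypothetical stand-ins defined here so the example runs without any third-party package.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


class Float64:
    """Hypothetical stand-in for a dtype such as pl.Float64."""


@dataclass(frozen=True)
class NotNull:
    """Hypothetical stand-in for a not-null check."""


@dataclass(frozen=True)
class Between:
    """Hypothetical stand-in for an interval check."""
    low: float
    high: float


class ExampleSchema:
    bill: Annotated[Float64, NotNull(), Between(0.0, 50.0)]
    tip: Float64


def collect_checks(schema: type) -> dict:
    """Map each field name to (dtype, [metadata objects])."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(schema, include_extras=True)
    checks = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            dtype, *metadata = get_args(hint)
        else:
            dtype, metadata = hint, []
        checks[name] = (dtype, metadata)
    return checks


checks = collect_checks(ExampleSchema)
print(checks["bill"][1])  # [NotNull(), Between(low=0.0, high=50.0)]
```

A bare annotation such as `tip: Float64` carries no metadata, so only the type remains to be checked.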
License
Licensed under the MIT License.
File details
Details for the file polars_validate-0.1.0.tar.gz.
File metadata
- Download URL: polars_validate-0.1.0.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ebafbce8426c67c2b9e10749a4cc187da7d997d53250010de195a3942124940 |
| MD5 | 3cac5bf3d304e39731f567934b570715 |
| BLAKE2b-256 | 9d528beb8d20e345db217a8292a4d6e4735b0bd5fda768cc401d9ed0dfaef5f1 |
File details
Details for the file polars_validate-0.1.0-py3-none-any.whl.
File metadata
- Download URL: polars_validate-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e384d951c22b6ca55cd0c6bd89ae2fcc737fa43ae0621cb28839533efaef9c4 |
| MD5 | be1c60293186c6ee7d29b678686b9e8e |
| BLAKE2b-256 | a44f1ff35b42e0bc900e27cb65d458eef6dd5fee7f72a078cca29518b4bac963 |