Pavise
DataFrame validation library using Python Protocol for structural subtyping.
About the Name
A pavise was a large shield used by medieval crossbowmen, big enough to cover the entire body and provide strong protection.
Like its namesake, this library serves as a shield for your data. Whether you're working with small datasets or big data, pavise protects your code with type safety and validation.
Features
- Use Python Protocol to define DataFrame schemas
- DataFrame[Schema] type annotation for static type checking
- Structural subtyping: validate only required columns, ignore extra columns
- Covariant type parameters: DataFrame[ChildSchema] is compatible with DataFrame[ParentSchema]
- Optional runtime validation
- No inheritance required
- Support for both pandas and polars backends
Documentation
Full documentation is available at https://pavise.readthedocs.io/
Installation
# For pandas support (quotes keep shells such as zsh from expanding the brackets)
pip install "pavise[pandas]"
# For polars support
pip install "pavise[polars]"
# For both
pip install "pavise[all]"
Usage
Pandas Backend
from typing import Protocol
import pandas as pd
from pavise.pandas import DataFrame

class UserSchema(Protocol):
    name: str
    age: int

# Runtime validation when creating DataFrame[Schema]
raw_df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 17]})
validated_df = DataFrame[UserSchema](raw_df)  # Validates column types at runtime

# Type hints work with static type checkers (mypy, pyright, etc.)
def process_users(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    return df[df['age'] >= 18]

result = process_users(validated_df)
Polars Backend
from typing import Protocol
import polars as pl
from pavise.polars import DataFrame

class UserSchema(Protocol):
    name: str
    age: int

# Runtime validation when creating DataFrame[Schema]
raw_df = pl.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 17]})
validated_df = DataFrame[UserSchema](raw_df)  # Validates column types at runtime

# Type hints work with static type checkers (mypy, pyright, etc.)
def process_users(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    return df.filter(df['age'] >= 18)

result = process_users(validated_df)
Structural Subtyping
from typing import Protocol
import pandas as pd
from pavise.pandas import DataFrame

class UserSchema(Protocol):
    name: str

class UserWithEmailSchema(Protocol):
    name: str
    email: str

def process_user(df: DataFrame[UserSchema]) -> None:
    print(df['name'])

# This works! UserWithEmailSchema has all required columns of UserSchema
df = DataFrame[UserWithEmailSchema](pd.DataFrame({
    'name': ['Alice'],
    'email': ['alice@example.com']
}))
process_user(df)  # OK - covariant type parameter
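Covariance is what allows a frame typed with a richer schema to be passed where a narrower schema is expected. The sketch below illustrates the idea with standard typing constructs only. Frame and names are hypothetical stand-ins written for this illustration, not pavise's DataFrame class, and how pavise declares its own type parameter is an assumption here.

```python
from typing import Generic, Protocol, TypeVar


class UserSchema(Protocol):
    name: str


class UserWithEmailSchema(Protocol):
    name: str
    email: str


# covariant=True tells static type checkers that Frame[Child] may be used
# wherever Frame[Parent] is expected.
SchemaT_co = TypeVar("SchemaT_co", covariant=True)


class Frame(Generic[SchemaT_co]):
    """Hypothetical stand-in for a schema-parameterized DataFrame."""

    def __init__(self, columns: dict) -> None:
        self.columns = columns


def names(frame: Frame[UserSchema]) -> list:
    """Needs only the 'name' column, so any richer schema is acceptable."""
    return frame.columns["name"]


rich: Frame[UserWithEmailSchema] = Frame(
    {"name": ["Alice"], "email": ["alice@example.com"]}
)
print(names(rich))  # accepted because the type parameter is covariant
```

With an invariant type variable, a static type checker would be expected to reject the final call, since Frame[UserWithEmailSchema] and Frame[UserSchema] would then be unrelated types.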
Using Validators
Add validators using typing.Annotated to enforce data quality constraints:
from typing import Annotated, Protocol
import pandas as pd
from pavise.pandas import DataFrame
from pavise.validators import Range, Regex

class UserSchema(Protocol):
    name: str
    age: Annotated[int, Range(0, 150)]
    email: Annotated[str, Regex(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')]

# Valid data passes validation
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'email': ['alice@example.com', 'bob@example.com']
})
validated_df = DataFrame[UserSchema](df)  # OK

# Invalid data raises ValidationError
invalid_df = pd.DataFrame({
    'name': ['Charlie'],
    'age': [200],  # Exceeds maximum age
    'email': ['invalid-email']  # Invalid email format
})
DataFrame[UserSchema](invalid_df)  # ValidationError
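Validators attached through Annotated are stored as metadata on the annotation, and runtime code can read them back with typing.get_type_hints(..., include_extras=True). The sketch below shows that mechanism using only the standard library. InRange and failed_columns are hypothetical names invented for this illustration; they are not pavise's Range validator or its internals, and plain dicts of lists stand in for DataFrames.

```python
from dataclasses import dataclass
from typing import Annotated, Protocol, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class InRange:
    """Hypothetical validator: every value must lie in [low, high]."""

    low: float
    high: float

    def check(self, values: list) -> bool:
        return all(self.low <= v <= self.high for v in values)


class UserSchema(Protocol):
    name: str
    age: Annotated[int, InRange(0, 150)]


def failed_columns(schema: type, data: dict) -> list:
    """Return the names of columns that violate an attached validator."""
    failures = []
    hints = get_type_hints(schema, include_extras=True)
    for column, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue  # no validators attached to this column
        _base_type, *validators = get_args(hint)
        if not all(v.check(data[column]) for v in validators):
            failures.append(column)
    return failures


print(failed_columns(UserSchema, {"name": ["Alice"], "age": [25, 30]}))  # []
print(failed_columns(UserSchema, {"name": ["Charlie"], "age": [200]}))   # ['age']
```

Keeping the validator in the annotation means the base type (int) stays visible to static type checkers while the constraint is only consulted at runtime.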
Union Types
Union types allow columns to accept multiple different types:
from typing import Protocol, Union
import pandas as pd
from pavise.pandas import DataFrame

class MixedSchema(Protocol):
    code: Union[int, str]  # Can be int or str
    value: float

# Accept int values
df1 = pd.DataFrame({'code': [1, 2, 3], 'value': [1.0, 2.0, 3.0]})
validated1 = DataFrame[MixedSchema](df1)  # OK

# Accept str values
df2 = pd.DataFrame({'code': ['A', 'B', 'C'], 'value': [1.0, 2.0, 3.0]})
validated2 = DataFrame[MixedSchema](df2)  # OK

# Accept mixed int/str values
df3 = pd.DataFrame({'code': [1, 'B', 3, 'D'], 'value': [1.0, 2.0, 3.0, 4.0]})
validated3 = DataFrame[MixedSchema](df3)  # OK

# Union with None for nullable union types
class NullableUnionSchema(Protocol):
    code: Union[int, str, None]  # Can be int, str, or None

df4 = pd.DataFrame({'code': [1, 'B', None, 4]})
validated4 = DataFrame[NullableUnionSchema](df4)  # OK
Extra Columns are Ignored
from typing import Protocol
import pandas as pd
from pavise.pandas import DataFrame

class SimpleSchema(Protocol):
    a: int

# Extra columns are ignored during validation
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': ['x', 'y', 'z'],  # Extra column - ignored
    'c': [10.0, 20.0, 30.0]  # Extra column - ignored
})
validated = DataFrame[SimpleSchema](df)  # OK
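The rule that only schema columns matter can be sketched in a few lines: read the schema's annotations and look for each one in the data, without ever iterating over the data's own columns. The helper missing_columns below is hypothetical and uses only the standard library; it illustrates the structural check and is not a description of how pavise implements it.

```python
from typing import Protocol, get_type_hints


class SimpleSchema(Protocol):
    a: int


def missing_columns(schema: type, column_names: list) -> list:
    """List schema columns absent from the data; extras are never reported."""
    required = get_type_hints(schema)
    return [name for name in required if name not in column_names]


print(missing_columns(SimpleSchema, ["a", "b", "c"]))  # [] (b and c are ignored)
print(missing_columns(SimpleSchema, ["b", "c"]))       # ['a'] (a is required)
```

Because the loop runs over the schema rather than the data, adding columns to a frame can never turn a passing check into a failing one.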
Supported Types
Basic Types
- int - Integer values
- float - Floating point values
- str - String values
- bool - Boolean values
Date/Time Types
- datetime - Date and time values
- date - Date-only values
- timedelta - Time duration values
Generic Types
- Optional[T] - Nullable types (e.g., Optional[int], Optional[str])
- Union[T1, T2, ...] - Union types allowing multiple types (e.g., Union[int, str], Union[int, str, float])
  - Can be combined with None for nullable unions: Union[int, str, None]
- Literal[...] - Specific literal values (e.g., Literal["a", "b", "c"], Literal[1, 2, 3])
- NotRequiredColumn[T] - Optional columns (e.g., NotRequiredColumn[int], NotRequiredColumn[Optional[str]])
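Generic annotations such as Optional, Union, and Literal can be taken apart at runtime with typing.get_origin and typing.get_args, which is the kind of information a validator needs in order to decide which values and nulls a column may hold. The describe function below is a hypothetical helper for illustration only, built on the standard library. It does not cover NotRequiredColumn, which is specific to pavise, and it is not a description of pavise's internals.

```python
from typing import Literal, Optional, Union, get_args, get_origin


def describe(annotation) -> dict:
    """Break a column annotation into accepted types/values and nullability."""
    origin = get_origin(annotation)
    if origin is Literal:
        # Literal arguments are the permitted values themselves.
        return {"kind": "literal", "allowed": list(get_args(annotation)), "nullable": False}
    if origin is Union:
        # Optional[T] is Union[T, None], so both forms land here.
        members = get_args(annotation)
        return {
            "kind": "union",
            "allowed": [m for m in members if m is not type(None)],
            "nullable": type(None) in members,
        }
    return {"kind": "plain", "allowed": [annotation], "nullable": False}


for hint in (int, Optional[int], Union[int, str, None], Literal["a", "b", "c"]):
    print(describe(hint))
```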
Backend-Specific Types
- pandas: pd.CategoricalDtype, pd.Int64Dtype, and other Extension dtypes
- polars: pl.Categorical, pl.Int64, and other polars DataTypes
Development
# Install with dev dependencies (includes both pandas and polars)
uv pip install -e ".[dev]"
# Run all tests
uv run pytest
# Run tests for specific backend
uv run pytest tests/test_pandas.py
uv run pytest tests/test_polars.py
Testing with tox
# Run tests for all Python versions and backends
tox
# Run tests for specific environment
tox -e py312-pandas # Test pandas backend with Python 3.12
tox -e py312-polars # Test polars backend with Python 3.12
tox -e py312-all # Test both backends with Python 3.12
# Run linting
tox -e lint
# Run type checking
tox -e type
# Available Python versions: py39, py310, py311, py312
# Available backends: pandas, polars, all