pdschema
A Python library for validating pandas DataFrames using schemas, with support for type checking, custom validators, and function input/output validation.
Features
- Define schemas for pandas DataFrames with type checking and validation
- Support for custom validators (e.g., IsPositive, IsNonEmptyString, Range, etc.)
- Function decorator for validating input and output DataFrames
- PyArrow type integration for efficient type checking
- Schema inference from existing DataFrames
- Nullable column support
- Comprehensive type mapping between Python, pandas, and PyArrow types
Installation
Using pip
pip install pdschema
Using Poetry
poetry add pdschema
Quick Start
import pandas as pd
from pdschema import Column, IsNonEmptyString, IsPositive, Range, Schema
# Create a DataFrame
df = pd.DataFrame(
    {
        "idx": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "score": [85.5, 92.0, 78.5],
    }
)
# Define a schema using programmatic syntax
schema = Schema(
    [
        Column("idx", int, nullable=False),
        Column("name", str, nullable=False, validators=[IsNonEmptyString()]),
        Column("age", int, validators=[IsPositive()]),
        Column("score", float, validators=[Range(0, 100)]),
    ]
)
# Validate the DataFrame: raises ValueError if validation fails
schema.validate(df)
# Declarative Schema Definition
class MySchema(Schema):
    # Validators are passed as instances, matching the programmatic example above
    idx = Column(dtype=int, nullable=False)
    name = Column(dtype=str, nullable=False, validators=[IsNonEmptyString()])
    age = Column(dtype=int, nullable=False, validators=[IsPositive()])
    score = Column(dtype=float, nullable=False, validators=[Range(0, 100)])

MySchema().validate(df)
Function Validation
Use the @pdfunction decorator to validate function inputs and outputs:
from pdschema import pdfunction
@pdfunction(
    arguments={
        "df": Schema([Column("id", int), Column("value", float)]),
        "threshold": float,
    },
    outputs={
        "result": Schema([Column("id", int), Column("filtered_value", float)])
    },
)
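def filter_values(df, threshold):
    # Keep rows above the threshold, then rename "value" so the result
    # matches the "filtered_value" column declared in the output schema
    result = df[df["value"] > threshold].rename(columns={"value": "filtered_value"})
    return {"result": result}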
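To make the idea behind such a decorator concrete, here is a minimal plain-Python sketch of the general pattern: check named arguments before the call, then check named entries of the returned dict. The `checked` decorator and its predicate-based checks are hypothetical stand-ins for illustration; this is not how pdschema implements `@pdfunction`, and it uses lists instead of DataFrames to stay dependency-free.

```python
import functools
import inspect


def checked(arguments, outputs):
    """Hypothetical validating decorator (illustrative only, not pdschema's code)."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Check each declared argument before running the function.
            for name, check in arguments.items():
                if not check(bound.arguments[name]):
                    raise ValueError(f"argument {name!r} failed validation")
            result = func(*args, **kwargs)
            # Check each declared key of the returned dict afterwards.
            for name, check in outputs.items():
                if not check(result[name]):
                    raise ValueError(f"output {name!r} failed validation")
            return result

        return wrapper

    return decorator


def all_floats(values):
    return all(isinstance(v, float) for v in values)


@checked(
    arguments={"values": all_floats, "threshold": lambda t: isinstance(t, float)},
    outputs={"result": all_floats},
)
def filter_values(values, threshold):
    return {"result": [v for v in values if v > threshold]}


kept = filter_values([1.0, 5.0, 9.0], 4.0)["result"]
```

In this sketch, calling `filter_values([1.0], 4)` raises `ValueError` because the integer `4` fails the `threshold` check before the function body runs.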
Available Validators
The package ships with several built-in validators:
- IsPositive: Ensures numeric values are positive
- IsNonEmptyString: Ensures strings are non-empty
- Max: Ensures values are less than or equal to a maximum
- Min: Ensures values are greater than or equal to a minimum
- GreaterThan: Ensures values are greater than a threshold
- GreaterThanOrEqual: Ensures values are greater than or equal to a threshold
- LessThan: Ensures values are less than a threshold
- LessThanOrEqual: Ensures values are less than or equal to a threshold
- Choice: Ensures values are in a list of allowed choices
- Length: Ensures values have a specific length or length range
- Range: Ensures values are within a range
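As a rough guide to what these checks mean per value, the sketch below expresses a few of them as plain Python predicates. The function names are hypothetical and are not pdschema's classes; in particular, treating the range bounds as inclusive is an assumption, since the description above does not say whether the endpoints are allowed.

```python
# Conceptual stand-ins only; these are NOT pdschema's validator classes.

def is_positive(value):
    # Strictly greater than zero.
    return value > 0


def is_non_empty_string(value):
    return isinstance(value, str) and len(value) > 0


def in_range(value, low, high):
    # Assumption: both bounds are inclusive.
    return low <= value <= high


def is_choice(value, choices):
    return value in choices


def has_length(value, min_len, max_len):
    return min_len <= len(value) <= max_len


scores = [85.5, 92.0, 78.5]
names = ["Alice", "Bob", "Charlie"]

all_scores_ok = all(in_range(s, 0, 100) for s in scores)
all_names_ok = all(is_non_empty_string(n) for n in names)
```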
Schema Inference
You can infer a schema from an existing DataFrame:
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})
schema = Schema.infer_schema(df)
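For intuition about what inference involves, the following plain-Python sketch derives a type and a nullable flag for each column from its values. The `infer_columns` helper is hypothetical and works on a dict of lists; it is not how `Schema.infer_schema` is implemented, and the rule that a column is nullable when it contains `None` is an assumption made for illustration.

```python
def infer_columns(data):
    """Hypothetical helper: map column name -> (Python type, nullable flag)."""
    inferred = {}
    for name, values in data.items():
        # Ignore missing values when deciding the column type.
        present = [v for v in values if v is not None]
        types = {type(v) for v in present}
        if len(types) != 1:
            raise ValueError(f"column {name!r} has mixed or no types")
        # Assumption: a column is nullable if any value is missing.
        nullable = len(present) < len(values)
        inferred[name] = (types.pop(), nullable)
    return inferred


inferred = infer_columns(
    {
        "id": [1, 2, 3],
        "name": ["Alice", None, "Charlie"],
        "age": [25, 30, 35],
    }
)
```

Here `inferred["name"]` is `(str, True)` because one name is missing, while `id` and `age` are inferred as non-nullable integers.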
License
This project is licensed under the MIT License - see the LICENSE file for details.