PdSchema
A Python library for schema validation and type inference of pandas DataFrames using PyArrow types.
Features
- Schema validation for pandas DataFrames
- Type inference from pandas Series to PyArrow types
- Rich set of built-in validators
- Support for both Python types and pandas dtypes
- Nullability checks
- Custom validator support
Installation
pip install pdschema
Quick Start
import pandas as pd
from pyschema import Schema, Column
from pyschema.validators import IsPositive, IsNonEmptyString, Range
# Define your schema
schema = Schema([
    Column("age", int, nullable=False, validators=[IsPositive(), Range(0, 120)]),
    Column("name", str, nullable=False, validators=[IsNonEmptyString()]),
    Column("score", float, validators=[Range(0.0, 100.0)]),
])
# Create a DataFrame
df = pd.DataFrame({
    "age": [25, 30, 35],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [95.5, 88.0, 91.2],
})
# Validate the DataFrame
schema.validate(df) # Returns True if valid, raises ValueError if invalid
Built-in Validators
PdSchema provides the following built-in validators:
from pyschema.validators import (
    IsPositive,
    IsNonEmptyString,
    Max,
    Min,
    Range,
    GreaterThan,
    LessThan,
    Choice,
    Length,
)
# Examples
Column("age", int, validators=[IsPositive(), Max(120)])
Column("score", float, validators=[Range(0.0, 100.0)])
Column("status", str, validators=[Choice(["active", "inactive", "pending"])])
Column("description", str, validators=[Length(min_length=10, max_length=500)])
Type Support
PdSchema supports both Python types and pandas dtypes, mapping each to a corresponding PyArrow type:
Python Types
- int → pa.int64()
- float → pa.float64()
- str → pa.string()
- bool → pa.bool_()
- datetime → pa.timestamp("us")
- date → pa.date32()
- time → pa.time64("us")
- Decimal → pa.decimal128(38, 18)
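The Python-type mapping above can be pictured as a lookup table. The following is a minimal sketch, not the library's own code: `PYTHON_TO_ARROW` and `to_arrow_type` are hypothetical names, and each entry stores the name of the PyArrow factory function plus its arguments so that the table itself can be inspected without PyArrow installed.

```python
from datetime import date, datetime, time
from decimal import Decimal

# Python type -> (pyarrow factory name, factory arguments), copied from the list above.
PYTHON_TO_ARROW = {
    int: ("int64", ()),
    float: ("float64", ()),
    str: ("string", ()),
    bool: ("bool_", ()),
    datetime: ("timestamp", ("us",)),
    date: ("date32", ()),
    time: ("time64", ("us",)),
    Decimal: ("decimal128", (38, 18)),
}


def to_arrow_type(py_type):
    """Hypothetical helper: build the PyArrow type listed for a Python type."""
    import pyarrow as pa  # imported lazily; only needed when a type is built

    try:
        factory_name, args = PYTHON_TO_ARROW[py_type]
    except KeyError:
        raise TypeError(f"no PyArrow mapping for {py_type!r}") from None
    # e.g. ("timestamp", ("us",)) becomes pa.timestamp("us")
    return getattr(pa, factory_name)(*args)
```

The dictionary lookup is by exact type, so `bool` and `datetime` resolve to their own entries even though they subclass `int` and `date`.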
Pandas Dtypes
- Int64Dtype → pa.int64()
- Float64Dtype → pa.float64()
- StringDtype → pa.string()
- BooleanDtype → pa.bool_()
- DatetimeTZDtype → pa.timestamp("us")
- CategoricalDtype → pa.dictionary(pa.int32(), pa.string())
API Reference
Schema
class Schema:
    def __init__(self, columns: list[Column]):
        """Initialize a schema with a list of columns."""
        pass

    def validate(self, df: pd.DataFrame) -> bool:
        """Validate a DataFrame against the schema.

        Returns:
            bool: True if valid
        Raises:
            ValueError: If validation fails, with detailed error messages
        """
        pass
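Because validate either returns True or raises ValueError, callers that prefer a result value over an exception can wrap it. The following is a minimal sketch that relies only on the contract in the docstring above; `check` and the stand-in class `_RejectEverything` are hypothetical names, not part of the library, and the error message is made up for the demo.

```python
def check(schema, df) -> tuple[bool, str]:
    """Return (True, "") if df passes validation, else (False, error message)."""
    try:
        schema.validate(df)
    except ValueError as exc:
        return False, str(exc)
    return True, ""


class _RejectEverything:
    """Stand-in schema for the demo only; its message is invented."""

    def validate(self, df):
        raise ValueError("column 'age' failed IsPositive")


# The stand-in ignores its argument, so no real DataFrame is needed here.
ok, message = check(_RejectEverything(), df=None)
```

With a real schema, `check(schema, df)` would take the place of a bare `schema.validate(df)` call wherever a failed validation should not interrupt the caller.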
Column
class Column:
    def __init__(
        self,
        name: str,
        dtype: type,
        nullable: bool = True,
        validators: list[Validator] | None = None,
    ):
        """Initialize a column definition.

        Args:
            name: Column name
            dtype: Python type or pandas dtype
            nullable: Whether the column can contain null values
            validators: List of validators to apply
        """
        pass
Validators
All validators inherit from the Validator abstract base class and implement the validate method:
class Validator(ABC):
    @abstractmethod
    def validate(self, value) -> bool:
        """Return True if value is valid, else False."""
        pass
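A custom validator follows the same pattern: subclass Validator and implement validate. The sketch below redefines the base class locally so that it runs on its own; in real use the library's Validator would be imported instead. `IsEven` is a hypothetical example, not one of the built-in validators.

```python
from abc import ABC, abstractmethod


class Validator(ABC):
    """Local stand-in mirroring the documented base class."""

    @abstractmethod
    def validate(self, value) -> bool:
        """Return True if value is valid, else False."""


class IsEven(Validator):
    """Hypothetical custom validator: accepts even integers only."""

    def validate(self, value) -> bool:
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        return value % 2 == 0


validator = IsEven()
results = [validator.validate(v) for v in (2, 3, "4")]
# results == [True, False, False]
```

An instance such as `IsEven()` would then go in a column's validators list alongside the built-in ones.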
Development
- Install Poetry (if not already installed):
  curl -sSL https://install.python-poetry.org | python3 -
- Install dependencies:
  poetry install
- Install pre-commit hooks:
  poetry run pre-commit install
- Run tests:
  poetry run pytest
License
[Your chosen license]