
A Python library for validating pandas DataFrames using schemas


pdschema

A Python library for validating pandas DataFrames using schemas, with support for type checking, custom validators, and function input/output validation.

Features

  • Define schemas for pandas DataFrames with type checking and validation
  • Support for custom validators such as IsPositive, IsNonEmptyString, and Range
  • Function decorator for validating input and output DataFrames
  • PyArrow type integration for efficient type checking
  • Schema inference from existing DataFrames
  • Nullable column support
  • Comprehensive type mapping between Python, pandas, and PyArrow types

Installation

Using pip

pip install pdschema

Using Poetry

poetry add pdschema

Quick Start

import pandas as pd
from pdschema import Schema, Column, IsPositive, IsNonEmptyString, Range

# Define a schema
schema = Schema([
    Column("id", int, nullable=False),
    Column("name", str, nullable=False, validators=[IsNonEmptyString()]),
    Column("age", int, validators=[IsPositive()]),
    Column("score", float, validators=[Range(0, 100)])
])

# Create a DataFrame
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85.5, 92.0, 78.5]
})

# Validate the DataFrame
schema.validate(df)  # Raises ValueError if validation fails
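To make the checks in this schema concrete, here is a rough pure-pandas equivalent. It is a sketch under assumptions: it does not call pdschema and is not how the library is implemented. The `SPEC` layout and the `check_frame` helper are hypothetical, and `Range(0, 100)` is assumed to be inclusive at both ends.

```python
import pandas as pd
from pandas.api import types as pdt

# Hypothetical spec mirroring the Quick Start schema (NOT the pdschema API):
# column name -> (expected type, nullable, list of per-value checks).
SPEC = {
    "id": (int, False, []),
    "name": (str, False, [lambda v: isinstance(v, str) and v != ""]),
    "age": (int, True, [lambda v: v > 0]),
    "score": (float, True, [lambda v: 0 <= v <= 100]),
}

# Map each Python type to a pandas dtype predicate.
TYPE_CHECKS = {
    int: pdt.is_integer_dtype,
    float: pdt.is_float_dtype,
    str: lambda s: pdt.is_object_dtype(s) or pdt.is_string_dtype(s),
}


def check_frame(df: pd.DataFrame, spec: dict) -> list:
    """Return human-readable problems; an empty list means the frame passed."""
    problems = []
    for col, (typ, nullable, checks) in spec.items():
        if col not in df.columns:
            problems.append(f"missing column {col!r}")
            continue
        series = df[col]
        if not nullable and series.isna().any():
            problems.append(f"{col!r} contains nulls")
        if not TYPE_CHECKS[typ](series):
            problems.append(f"{col!r} has dtype {series.dtype}, expected {typ.__name__}")
            continue  # skip value checks on a column of the wrong type
        values = series.dropna()  # value checks only look at non-null entries
        for check in checks:
            passed = values.map(check).astype(bool)
            if not passed.all():
                rows = values[~passed].index.tolist()
                problems.append(f"{col!r} failed a check at rows {rows}")
    return problems


bad = pd.DataFrame({"id": [1, 2], "name": ["Alice", ""], "age": [25, -1], "score": [85.5, 120.0]})
print(check_frame(bad, SPEC))  # one problem each for name, age and score
```

One caveat this sketch makes visible: in plain pandas an integer column with a missing value is stored as `float64`, so a nullable integer column generally needs the `"Int64"` extension dtype to pass an integer type check.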

Function Validation

Use the @pdfunction decorator to validate function inputs and outputs:

from pdschema import Column, Schema, pdfunction

@pdfunction(
    arguments={
        "df": Schema([Column("id", int), Column("value", float)]),
        "threshold": float
    },
    outputs={
        "result": Schema([Column("id", int), Column("filtered_value", float)])
    }
)
def filter_values(df, threshold):
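    # Keep rows above the threshold and rename "value" so the result
    # matches the output schema, which expects a "filtered_value" column
    result = df[df["value"] > threshold].rename(columns={"value": "filtered_value"})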
    return {"result": result}
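The general idea of an input/output-validating decorator can be approximated in plain Python. This is a simplified sketch, not pdschema's implementation: the decorator name `require_columns` is hypothetical, each schema is reduced to a list of required column names, and non-DataFrame arguments such as `threshold` are not checked.

```python
import functools
import inspect

import pandas as pd


def _check(label: str, frame, cols: list) -> None:
    """Raise if `frame` is not a DataFrame or lacks any required column."""
    if not isinstance(frame, pd.DataFrame):
        raise TypeError(f"{label} must be a DataFrame, got {type(frame).__name__}")
    missing = [c for c in cols if c not in frame.columns]
    if missing:
        raise ValueError(f"{label} is missing columns {missing}")


def require_columns(arguments: dict, outputs: dict):
    """Hypothetical decorator: check required columns on inputs and outputs.

    `arguments` maps parameter names to required column lists;
    `outputs` maps keys of the returned dict to required column lists.
    """
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind positional and keyword arguments to parameter names.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, cols in arguments.items():
                _check(f"argument {name!r}", bound.arguments[name], cols)
            result = func(*args, **kwargs)
            for key, cols in outputs.items():
                _check(f"output {key!r}", result[key], cols)
            return result

        return wrapper

    return decorator


@require_columns(
    arguments={"df": ["id", "value"]},
    outputs={"result": ["id", "filtered_value"]},
)
def filter_values(df, threshold):
    result = df[df["value"] > threshold].rename(columns={"value": "filtered_value"})
    return {"result": result}
```

Binding through `inspect.signature` lets the wrapper find `df` whether it is passed positionally or by keyword.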

Available Validators

  • IsPositive: Ensures numeric values are positive
  • IsNonEmptyString: Ensures strings are non-empty
  • Max: Ensures values are less than or equal to a maximum
  • Min: Ensures values are greater than or equal to a minimum
  • GreaterThan: Ensures values are greater than a threshold
  • GreaterThanOrEqual: Ensures values are greater than or equal to a threshold
  • LessThan: Ensures values are less than a threshold
  • LessThanOrEqual: Ensures values are less than or equal to a threshold
  • Choice: Ensures values are in a list of allowed choices
  • Length: Ensures values have a specific length or length range
  • Range: Ensures values are within a range
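One plausible way to model validators like these is as small objects that return a boolean mask over a pandas Series. The classes below (`RangeCheck`, `ChoiceCheck`, `LengthCheck`) and the helper `failing_rows` are hypothetical and are not pdschema's classes; in particular, inclusive bounds on both ends are an assumption.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class RangeCheck:
    low: float
    high: float

    def mask(self, s: pd.Series) -> pd.Series:
        # Inclusive on both ends (assumption, not confirmed for pdschema's Range).
        return s.between(self.low, self.high)


@dataclass(frozen=True)
class ChoiceCheck:
    choices: tuple

    def mask(self, s: pd.Series) -> pd.Series:
        return s.isin(self.choices)


@dataclass(frozen=True)
class LengthCheck:
    min_len: int
    max_len: int

    def mask(self, s: pd.Series) -> pd.Series:
        # String length must fall within [min_len, max_len].
        return s.str.len().between(self.min_len, self.max_len)


def failing_rows(s: pd.Series, checks) -> list:
    """Index labels of non-null values that fail at least one check."""
    s = s.dropna()
    ok = pd.Series(True, index=s.index)
    for check in checks:
        ok &= check.mask(s)
    return s.index[~ok.to_numpy()].tolist()
```

Returning masks rather than raising immediately makes it easy to combine several validators on one column and report every offending row at once.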

Schema Inference

You can infer a schema from an existing DataFrame:

import pandas as pd
from pdschema import Schema

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

schema = Schema.infer_schema(df)
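Inference of this kind typically inspects each column's dtype and whether it contains nulls. The rough pure-pandas sketch below illustrates that idea only; `infer_columns` and its `(name, type, nullable)` output are hypothetical and are not what `Schema.infer_schema` returns.

```python
import pandas as pd
from pandas.api import types as pdt


def infer_columns(df: pd.DataFrame) -> list:
    """Return (name, python_type, nullable) per column (hypothetical format)."""
    inferred = []
    for name in df.columns:
        s = df[name]
        if pdt.is_bool_dtype(s):
            py_type = bool
        elif pdt.is_integer_dtype(s):
            py_type = int
        elif pdt.is_float_dtype(s):
            py_type = float
        elif pdt.is_datetime64_any_dtype(s):
            py_type = pd.Timestamp
        else:
            py_type = str  # assumption: treat object/string columns as str
        inferred.append((name, py_type, bool(s.isna().any())))
    return inferred
```

Note that an integer column containing a missing value arrives as `float64`, so a sketch like this would infer it as a nullable `float`.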

Contributing

  1. Fork the repository
  2. Create a new branch for your feature
  3. Install development dependencies: poetry install --with dev
  4. Make your changes
  5. Run tests: poetry run pytest
  6. Run linting: poetry run ruff check . && poetry run ruff format .
  7. Submit a pull request

License

This project is licensed under the MIT License; see the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

pdschema-0.1.1.tar.gz (8.2 kB)

Uploaded Source

Built Distribution


pdschema-0.1.1-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file pdschema-0.1.1.tar.gz.

File metadata

  • Download URL: pdschema-0.1.1.tar.gz
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.12 Linux/6.6.87.1-microsoft-standard-WSL2

File hashes

Hashes for pdschema-0.1.1.tar.gz
Algorithm Hash digest
SHA256 8127b9d66401793508f0dc51cb10f7c37cf74582829dfffe0021421cb4b51763
MD5 e34f3f37765ec47fa68ff61869a5e4a8
BLAKE2b-256 d0f47d1c7d5c15562b23035f6789c2598088353360e3b3ff575e6ebb89051da9


File details

Details for the file pdschema-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pdschema-0.1.1-py3-none-any.whl
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.12 Linux/6.6.87.1-microsoft-standard-WSL2

File hashes

Hashes for pdschema-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5bf9b6e5e73faed4b5eea176cebee7f74b5bedc8e28e34316eb4fd2850fb5e27
MD5 0a9317522885cecd20e0556bdb183494
BLAKE2b-256 325df66c1e4bde98127e0cb53a92c8a2e1ec35e7f182347b520d0b3cb3ec81e7
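To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` module is enough. This is a minimal sketch; the helper names `sha256_of` and `verify` are illustrative, and the expected digests are copied from the hash tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the hash tables above.
EXPECTED = {
    "pdschema-0.1.1.tar.gz": "8127b9d66401793508f0dc51cb10f7c37cf74582829dfffe0021421cb4b51763",
    "pdschema-0.1.1-py3-none-any.whl": "5bf9b6e5e73faed4b5eea176cebee7f74b5bedc8e28e34316eb4fd2850fb5e27",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's digest matches the published SHA256 for its name."""
    return sha256_of(path) == EXPECTED[path.name]
```

Alternatively, pip can enforce the digest at install time through hash-checking mode, e.g. a requirements line such as `pdschema==0.1.1 --hash=sha256:<digest>`.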

