pdschema

A Python library for validating pandas DataFrames using schemas, with support for type checking, custom validators, and function input/output validation.

Features

  • Define schemas for pandas DataFrames with type checking and validation
  • Support for custom validators (e.g., IsPositive, IsNonEmptyString, Range)
  • Function decorator for validating input and output DataFrames
  • PyArrow type integration for efficient type checking
  • Schema inference from existing DataFrames
  • Nullable column support
  • Comprehensive type mapping between Python, pandas, and PyArrow types

Installation

Using pip

pip install pdschema

Using Poetry

poetry add pdschema

Quick Start

import pandas as pd
from pdschema import Schema, Column, IsPositive, IsNonEmptyString, Range

# Define a schema
schema = Schema([
    Column("id", int, nullable=False),
    Column("name", str, nullable=False, validators=[IsNonEmptyString()]),
    Column("age", int, validators=[IsPositive()]),
    Column("score", float, validators=[Range(0, 100)])
])

# Create a DataFrame
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85.5, 92.0, 78.5]
})

# Validate the DataFrame
schema.validate(df)  # Raises ValueError if validation fails
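To make the failure path concrete, here is a rough plain-pandas sketch of the kind of checks a schema performs: column presence, dtype, and nullability. It is not pdschema's implementation. The `check_frame` function and the layout of `spec` are hypothetical names used only for illustration.

```python
import pandas as pd


def check_frame(df, spec):
    """Hypothetical stand-in for a schema check; not part of pdschema.

    spec maps a column name to (dtype predicate, nullable flag).
    """
    errors = []
    for name, (is_expected_dtype, nullable) in spec.items():
        if name not in df.columns:
            errors.append(f"missing column: {name}")
            continue
        if not is_expected_dtype(df[name]):
            errors.append(f"wrong dtype for {name}: {df[name].dtype}")
        if not nullable and df[name].isna().any():
            errors.append(f"nulls in non-nullable column: {name}")
    if errors:
        # Collect every problem first, then report them in one exception.
        raise ValueError("; ".join(errors))


spec = {
    "id": (pd.api.types.is_integer_dtype, False),
    "score": (pd.api.types.is_float_dtype, True),
}

good = pd.DataFrame({"id": [1, 2], "score": [85.5, None]})
check_frame(good, spec)  # passes silently

bad = pd.DataFrame({"id": [1.5, 2.5]})  # float ids and no "score" column
try:
    check_frame(bad, spec)
    message = ""
except ValueError as exc:
    message = str(exc)
print(message)
```

Gathering all problems before raising means a caller sees every failing column at once. Whether pdschema reports failures this way is not stated here.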

Function Validation

Use the @pdfunction decorator to validate function inputs and outputs:

from pdschema import pdfunction

@pdfunction(
    arguments={
        "df": Schema([Column("id", int), Column("value", float)]),
        "threshold": float
    },
    outputs={
        "result": Schema([Column("id", int), Column("filtered_value", float)])
    }
)
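def filter_values(df, threshold):
    # Keep rows above the threshold, then rename "value" so the result
    # matches the "filtered_value" column declared in the output schema.
    result = df[df["value"] > threshold]
    result = result.rename(columns={"value": "filtered_value"})
    return {"result": result}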
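The general pattern behind such a decorator can be sketched in plain Python and pandas. The `require_columns` decorator below is a hypothetical, simplified illustration that only checks column names. It is not how `@pdfunction` is implemented, and it returns the DataFrame directly rather than a dict.

```python
import functools

import pandas as pd


def require_columns(input_cols, output_cols):
    """Hypothetical decorator; shows the pattern, not pdschema's code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            # Validate the input DataFrame before calling the function.
            missing = set(input_cols) - set(df.columns)
            if missing:
                raise ValueError(f"input is missing columns: {sorted(missing)}")
            result = func(df, *args, **kwargs)
            # Validate the returned DataFrame before handing it back.
            missing = set(output_cols) - set(result.columns)
            if missing:
                raise ValueError(f"output is missing columns: {sorted(missing)}")
            return result
        return wrapper
    return decorator


@require_columns(input_cols=["id", "value"], output_cols=["id", "filtered_value"])
def filter_values(df, threshold):
    kept = df[df["value"] > threshold]
    return kept.rename(columns={"value": "filtered_value"})


df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})
out = filter_values(df, 1.0)
print(out)
```

Checking on both sides of the call catches a malformed input early, and catches a function that silently drops or renames a column it promised to return.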

Available Validators

  • IsPositive: Ensures numeric values are positive
  • IsNonEmptyString: Ensures strings are non-empty
  • Max: Ensures values are less than or equal to a maximum
  • Min: Ensures values are greater than or equal to a minimum
  • GreaterThan: Ensures values are greater than a threshold
  • GreaterThanOrEqual: Ensures values are greater than or equal to a threshold
  • LessThan: Ensures values are less than a threshold
  • LessThanOrEqual: Ensures values are less than or equal to a threshold
  • Choice: Ensures values are in a list of allowed choices
  • Length: Ensures values have a specific length or length range
  • Range: Ensures values are within a range
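As an illustration of what validators such as Range and Choice check, the sketch below expresses the same ideas directly in pandas. The class names `RangeCheck` and `ChoiceCheck` and the `failures` method are hypothetical, and treating the range as inclusive on both ends is an assumption. pdschema's own validator interface may differ.

```python
import pandas as pd


class RangeCheck:
    """Hypothetical range check, assumed inclusive; not pdschema's Range."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def failures(self, series):
        # Return the values that fall outside [low, high].
        return series[~series.between(self.low, self.high)]


class ChoiceCheck:
    """Hypothetical membership check; not pdschema's Choice."""

    def __init__(self, allowed):
        self.allowed = list(allowed)

    def failures(self, series):
        # Return the values that are not among the allowed choices.
        return series[~series.isin(self.allowed)]


scores = pd.Series([85.5, 101.0, -3.0])
grades = pd.Series(["A", "B", "Z"])

bad_scores = RangeCheck(0, 100).failures(scores)
bad_grades = ChoiceCheck(["A", "B", "C"]).failures(grades)
print(list(bad_scores), list(bad_grades))
```

Returning the offending values, index included, makes it easy to point at the exact rows that failed.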

Schema Inference

You can infer a schema from an existing DataFrame:

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

schema = Schema.infer_schema(df)
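One way to picture inference is as a mapping from each column's pandas dtype to a Python type, plus a check for missing values. The sketch below does this with `pandas.api.types`. The `infer_columns` helper is hypothetical and is not the logic behind `Schema.infer_schema`, which may also consult PyArrow types.

```python
import pandas as pd
from pandas.api import types as pdt


def infer_columns(df):
    """Hypothetical inference helper; not pdschema's infer_schema.

    Returns a dict mapping column name to (python type, nullable flag).
    """
    inferred = {}
    for name in df.columns:
        series = df[name]
        if pdt.is_bool_dtype(series):
            py_type = bool
        elif pdt.is_integer_dtype(series):
            py_type = int
        elif pdt.is_float_dtype(series):
            py_type = float
        elif pdt.is_string_dtype(series):
            py_type = str
        else:
            py_type = object
        # A column counts as nullable if it currently holds any missing value.
        inferred[name] = (py_type, bool(series.isna().any()))
    return inferred


df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

columns = infer_columns(df)
print(columns)
```

A schema inferred this way reflects only the data at hand: a column with no missing values in the sample is marked non-nullable even if nulls are legitimate in general, so inferred schemas are worth reviewing before use.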

Contributing

  1. Fork the repository
  2. Create a new branch for your feature
  3. Install development dependencies: poetry install --with dev
  4. Make your changes
  5. Run tests: poetry run pytest
  6. Run linting: poetry run ruff check . && poetry run ruff format .
  7. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.
