pdschema

A Python library for validating pandas DataFrames using schemas, with support for type checking, custom validators, and function input/output validation.

Features

  • Define schemas for pandas DataFrames with type checking and validation
  • Support for custom validators (e.g., IsPositive, IsNonEmptyString, Range)
  • Function decorator for validating input and output DataFrames
  • PyArrow type integration for efficient type checking
  • Schema inference from existing DataFrames
  • Nullable column support
  • Comprehensive type mapping between Python, pandas, and PyArrow types

Installation

Using pip

pip install pdschema

Using Poetry

poetry add pdschema

Quick Start

import pandas as pd

from pdschema import Column, IsNonEmptyString, IsPositive, Range, Schema

# Create a DataFrame
df = pd.DataFrame(
    {
        "idx": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "score": [85.5, 92.0, 78.5],
    }
)

# Define a schema using programmatic syntax
schema = Schema(
    [
        Column("idx", int, nullable=False),
        Column("name", str, nullable=False, validators=[IsNonEmptyString()]),
        Column("age", int, validators=[IsPositive()]),
        Column("score", float, validators=[Range(0, 100)]),
    ]
)

# Validate the DataFrame: raises ValueError if validation fails
schema.validate(df)

# Define a schema declaratively as a Schema subclass
class MySchema(Schema):
    idx = Column(dtype=int, nullable=False)
    name = Column(dtype=str, nullable=False, validators=[IsNonEmptyString()])
    age = Column(dtype=int, nullable=False, validators=[IsPositive()])
    score = Column(dtype=float, nullable=False, validators=[Range(0, 100)])

MySchema().validate(df)

Function Validation

Use the @pdfunction decorator to validate function inputs and outputs:

from pdschema import pdfunction

@pdfunction(
    arguments={
        "df": Schema([Column("id", int), Column("value", float)]),
        "threshold": float
    },
    outputs={
        "result": Schema([Column("id", int), Column("filtered_value", float)])
    }
)
def filter_values(df, threshold):
    result = df[df["value"] > threshold]
    return {"result": result}
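
To make the mechanism concrete, here is a minimal sketch of how a decorator can check named arguments before the call and named outputs after it. It uses plain isinstance checks and a hypothetical `checked` decorator. It is not pdschema's implementation, and it does not handle Schema objects the way @pdfunction does above.

```python
import functools
import inspect


def checked(arguments, outputs):
    """Hypothetical decorator: type-check named inputs and dict outputs."""

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to parameter names.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, expected in arguments.items():
                if not isinstance(bound.arguments[name], expected):
                    raise TypeError(f"argument {name!r} must be {expected.__name__}")

            result = func(*args, **kwargs)

            # The wrapped function returns a dict keyed by output name.
            for name, expected in outputs.items():
                if not isinstance(result[name], expected):
                    raise TypeError(f"output {name!r} must be {expected.__name__}")
            return result

        return wrapper

    return decorator


@checked(arguments={"values": list, "threshold": float}, outputs={"result": list})
def filter_values(values, threshold):
    return {"result": [v for v in values if v > threshold]}


print(filter_values([1.0, 2.5, 4.0], 2.0))  # {'result': [2.5, 4.0]}
```

Passing an int threshold such as `filter_values([1.0], 2)` raises TypeError in this sketch, because the check runs before the function body.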

Available Validators

The package ships with the following built-in validators:

  • IsPositive: Ensures numeric values are positive
  • IsNonEmptyString: Ensures strings are non-empty
  • Max: Ensures values are less than or equal to a maximum
  • Min: Ensures values are greater than or equal to a minimum
  • GreaterThan: Ensures values are greater than a threshold
  • GreaterThanOrEqual: Ensures values are greater than or equal to a threshold
  • LessThan: Ensures values are less than a threshold
  • LessThanOrEqual: Ensures values are less than or equal to a threshold
  • Choice: Ensures values are in a list of allowed choices
  • Length: Ensures values have a specific length or length range
  • Range: Ensures values are within a range
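
The difference between the strict and inclusive bounds in the list above is easy to miss. The following plain-Python sketch mirrors the descriptions for a few of the validators. The function and parameter names (`exact`, `min_len`, `max_len`) are hypothetical stand-ins, not pdschema's classes or signatures, which operate on DataFrame columns.

```python
# Hypothetical stand-ins that mirror the descriptions in the list above.
# These are NOT pdschema's classes; they only illustrate the stated semantics.

def greater_than(value, threshold):
    """Strict bound: the threshold itself is rejected."""
    return value > threshold


def greater_than_or_equal(value, threshold):
    """Inclusive bound: the threshold itself is accepted."""
    return value >= threshold


def choice(value, allowed):
    """Value must be one of the allowed choices."""
    return value in allowed


def length(value, exact=None, min_len=None, max_len=None):
    """Length must equal `exact`, or fall within [min_len, max_len]."""
    n = len(value)
    if exact is not None:
        return n == exact
    if min_len is not None and n < min_len:
        return False
    if max_len is not None and n > max_len:
        return False
    return True


ages = [17, 18, 30]
print([greater_than(a, 18) for a in ages])           # [False, False, True]
print([greater_than_or_equal(a, 18) for a in ages])  # [False, True, True]
print(choice("red", ["red", "green"]))               # True
print(length("abc", exact=3), length("abcdef", min_len=2, max_len=4))  # True False
```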

Schema Inference

You can infer a schema from an existing DataFrame:

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

schema = Schema.infer_schema(df)
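
Conceptually, inference walks each column, picks a type from its values, and records whether any value is missing. The sketch below shows that idea on a plain dict of lists so that it runs without dependencies. `infer_columns` is a hypothetical helper, and the rules pdschema actually applies (for example, how it maps pandas dtypes) may differ.

```python
def infer_columns(data):
    """Hypothetical helper: return {column: (type, nullable)} for a dict of lists."""
    inferred = {}
    for name, values in data.items():
        present = [v for v in values if v is not None]
        nullable = len(present) < len(values)
        types = {type(v) for v in present}
        # Fall back to `object` when the column is empty or mixes types.
        col_type = types.pop() if len(types) == 1 else object
        inferred[name] = (col_type, nullable)
    return inferred


data = {
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, None, 35],
}
for name, (col_type, nullable) in infer_columns(data).items():
    print(name, col_type.__name__, nullable)
# id int False
# name str False
# age int True
```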

License

This project is licensed under the MIT License - see the LICENSE file for details.
