Gone are the days of black-box dataframes in otherwise type-safe code! Pandantic builds on the Pydantic API to enable validation and filtering of common dataframe types (e.g., pandas).

pandantic

pandantic makes it possible to validate (pandas) DataFrames against a pydantic.BaseModel. The package is still in development and aims to support more dataframe types (such as polars and spark) in the future. Currently, only pandas is supported, along with a pandas plugin.

First, install pandantic using pip (or any other package manager).

pip install pandantic

Docs

Documentation can be found here

import pandas as pd
from pydantic import BaseModel
from pydantic.types import StrictInt

from pandantic import Pandantic


# Define your schema using Pydantic BaseModel
class DataFrameSchema(BaseModel):
    """Example schema for testing."""
    example_str: str
    example_int: StrictInt

# Create a validator instance
validator = Pandantic(schema=DataFrameSchema)

# Example DataFrame with some invalid data
df_invalid = pd.DataFrame(
    data={
        "example_str": ["foo", "bar", 1],  # Last value is invalid (int instead of str)
        "example_int": ["1", 2, 3.0],      # First and last values are invalid (str and float)
    }
)

# Validate with error raising
try:
    validator.validate(dataframe=df_invalid, errors="raise")
except ValueError:
    print("Validation failed!")

# Or filter out invalid rows
df_valid = validator.validate(dataframe=df_invalid, errors="skip")
# Only the second row remains as it's the only valid one

The validator supports two modes:

  • errors="raise": Raises a ValueError if any row fails validation
  • errors="skip": Returns a new DataFrame with only the valid rows
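
To make the two modes concrete, here is a minimal sketch of row-wise validation written with plain pandas and pydantic (v2). It is not pandantic's implementation; the helper name validate_rows and the schema name RowSchema are hypothetical and exist only for illustration.

```python
import pandas as pd
from pydantic import BaseModel, StrictInt, ValidationError


class RowSchema(BaseModel):
    """Hypothetical schema mirroring the example above."""
    example_str: str
    example_int: StrictInt


def validate_rows(df: pd.DataFrame, schema: type[BaseModel], errors: str = "raise") -> pd.DataFrame:
    """Hypothetical helper: validate each row of `df` against `schema`."""
    bad_index = []
    # Turn every row into a plain dict and let pydantic check it.
    for idx, row in zip(df.index, df.to_dict(orient="records")):
        try:
            schema.model_validate(row)
        except ValidationError:
            bad_index.append(idx)
    if bad_index and errors == "raise":
        raise ValueError(f"{len(bad_index)} row(s) failed validation: {bad_index}")
    # In "skip" mode, return a new DataFrame without the failing rows.
    return df.drop(index=bad_index)


df = pd.DataFrame({"example_str": ["foo", "bar", 1], "example_int": ["1", 2, 3.0]})
kept = validate_rows(df, RowSchema, errors="skip")
print(kept.index.tolist())
```

Calling the same helper with errors="raise" on this DataFrame would raise a ValueError instead of returning the filtered frame.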

Pandas plugin

Another way to use pandantic is via the pandas.DataFrame extension plugin. Once registered with import pandantic.plugins.pandas, it adds the following methods to pandas:

  • DataFrame.pandantic.validate(schema:PandanticBaseModel), which returns True if every row is valid and False otherwise.
  • DataFrame.pandantic.filter(schema:PandanticBaseModel), which wraps PandanticBaseModel.parse_obj(errors="filter") and returns the result as a DataFrame.

Example:

import pandas as pd
from pydantic import BaseModel

import pandantic.plugins.pandas  # importing registers the DataFrame.pandantic accessor


class MyModel(BaseModel):
    a: int
    b: str


df1: pd.DataFrame = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

df1.pandantic.validate(MyModel)  # returns True
df1.pandantic.filter(MyModel)  # returns the same dataframe

# but if we have a mixed DataFrame
df2: pd.DataFrame = pd.DataFrame({"a": [1, 2, "3"], "b": ["a", 3, "c"]})

df2.pandantic.validate(MyModel)  # returns False
df2.pandantic.filter(MyModel)  # returns the filtered DataFrame with only the first row
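
For readers curious how such a plugin can hook into pandas, the sketch below uses pandas' public pd.api.extensions.register_dataframe_accessor decorator. The accessor name rowcheck, the class RowCheckAccessor, and the model PlainModel are hypothetical; this illustrates the general mechanism under the assumption of pydantic v2 and is not pandantic's own code.

```python
import pandas as pd
from pydantic import BaseModel, ValidationError


@pd.api.extensions.register_dataframe_accessor("rowcheck")  # hypothetical accessor name
class RowCheckAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def _valid_mask(self, schema: type[BaseModel]) -> pd.Series:
        """Return a boolean Series marking the rows that pass `schema`."""
        flags = []
        for row in self._df.to_dict(orient="records"):
            try:
                schema.model_validate(row)
                flags.append(True)
            except ValidationError:
                flags.append(False)
        return pd.Series(flags, index=self._df.index, dtype=bool)

    def validate(self, schema: type[BaseModel]) -> bool:
        # True only when every row is valid.
        return bool(self._valid_mask(schema).all())

    def filter(self, schema: type[BaseModel]) -> pd.DataFrame:
        # Keep only the rows that pass validation.
        return self._df[self._valid_mask(schema)]


class PlainModel(BaseModel):
    a: int
    b: str


good = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
mixed = pd.DataFrame({"a": [1, 2, 3], "b": ["a", 3, "c"]})  # 3 is not a str

print(good.rowcheck.validate(PlainModel))
print(mixed.rowcheck.filter(PlainModel).index.tolist())
```

Registering an accessor keeps the validation logic in a namespace of its own on the DataFrame, so it does not collide with built-in pandas methods.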

Advanced Features

Strict Type Validation

The validator supports Pydantic's strict types for more rigorous validation:

import pandas as pd
from pydantic import BaseModel
from pydantic.types import StrictInt

from pandantic import Pandantic

class StrictSchema(BaseModel):
    example_str: str
    example_int: StrictInt  # Will only accept actual integers

validator = Pandantic(schema=StrictSchema)
df = pd.DataFrame({
    "example_str": ["foo", "bar"],
    "example_int": [1, "2"]  # Second value will fail as it's a string
})

# This will only keep the first row
df_valid = validator.validate(dataframe=df, errors="skip")
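
The difference between a plain int annotation and StrictInt can be seen with pydantic (v2) alone. The following sketch uses two hypothetical models, LaxRow and StrictRow, to show that the lax type coerces a numeric string while the strict type rejects it.

```python
from pydantic import BaseModel, StrictInt, ValidationError


class LaxRow(BaseModel):
    example_int: int  # lax mode: numeric strings are coerced


class StrictRow(BaseModel):
    example_int: StrictInt  # strict: only real integers pass


lax = LaxRow(example_int="2")
print(type(lax.example_int).__name__, lax.example_int)

try:
    StrictRow(example_int="2")
    strict_rejected = False
except ValidationError:
    strict_rejected = True
print(strict_rejected)
```

Strict types are therefore the safer choice when a column of mixed strings and integers should count as invalid rather than be silently converted.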

Custom Validators

You can still use all of Pydantic's validation features in your schema:

from pydantic import BaseModel, field_validator
from pandantic import Pandantic

class CustomSchema(BaseModel):
    example_str: str
    example_int: int

    @field_validator("example_int")
    @classmethod
    def must_be_even(cls, v: int) -> int:
        if v % 2 != 0:
            raise ValueError("Number must be even")
        return v

validator = Pandantic(schema=CustomSchema)
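
To see what such a validator does to individual rows, the sketch below runs an equivalent pydantic (v2) model over two plain dict records, without pandantic. The model name EvenSchema and the sample rows are made up for illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator


class EvenSchema(BaseModel):
    example_str: str
    example_int: int

    @field_validator("example_int")
    @classmethod
    def must_be_even(cls, v: int) -> int:
        if v % 2 != 0:
            raise ValueError("Number must be even")
        return v


rows = [
    {"example_str": "foo", "example_int": 2},  # passes the custom check
    {"example_str": "bar", "example_int": 3},  # odd, so the validator rejects it
]

results = []
for row in rows:
    try:
        EvenSchema.model_validate(row)
        results.append(True)
    except ValidationError:
        results.append(False)

print(results)
```

A ValueError raised inside a field validator is reported by pydantic as a ValidationError, which is why the loop catches that exception type.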

Optional Fields

Because the DataFrame is parsed into a dict, a None value is treated as a NaN value when the column also contains other values. Optional columns (where the value can be empty) should therefore be specified using the custom pandantic.Optional type, which is a replacement for typing.Optional.

import pandas as pd
from pydantic import BaseModel

from pandantic import Optional, Pandantic  # Optional replaces typing.Optional


# Column "a" may be empty; column "b" must always hold an int
class Model(BaseModel):
    a: Optional[int] = None
    b: int

df_example = pd.DataFrame({"a": [1, None, 2], "b": ["str", 2, 3]})

validator = Pandantic(schema=Model)
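
The None-to-NaN conversion described above is standard pandas behaviour and can be observed directly. This short sketch uses only pandas and the standard library to show what the missing value looks like once the rows are turned back into dicts.

```python
import math

import pandas as pd

df_example = pd.DataFrame({"a": [1, None, 2], "b": ["str", 2, 3]})

# Mixing integers with None makes pandas store the column as floats.
print(df_example["a"].dtype)

records = df_example.to_dict(orient="records")
missing = records[1]["a"]

# The original None is gone; the record now carries a float NaN.
print(missing is None, math.isnan(missing))
```

Since the record holds NaN rather than None, an annotation that only allows None for empty values would not match, which is the gap the custom Optional type is meant to close.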
