Gone are the days of black-box dataframes in otherwise type-safe code! Pandantic builds on the Pydantic API to enable validation and filtering of common dataframe types (e.g., pandas).

pandantic

pandantic lets you validate (pandas) DataFrames using pydantic.BaseModel schemas. It is built on Pydantic V2, which brings significant improvements over V1, including performance gains of up to 50 times.

First, install pandantic using pip (or any other package manager).

pip install pandantic

Docs

Documentation can be found here

parse_df

To validate pd.DataFrames using Pydantic BaseModels, make sure to import the BaseModel class from the pandantic package.

from pandantic import BaseModel

The pandantic.BaseModel subclasses the original pydantic.BaseModel. It therefore includes all of the original functionality and adds the parse_df class method, which should be used to parse DataFrames.
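
The subclassing idea can be sketched with plain Pydantic V2 and pandas. This is not pandantic's source. The class name DataFrameModel and the method name parse_rows are hypothetical, and the sketch assumes a DataFrame with unique index labels. Each row is validated with model_validate. Failing rows are then either dropped (errors="filter") or reported together in one ValueError after every row has been checked (errors="raise").

```python
from typing import Literal

import pandas as pd
from pydantic import BaseModel, ValidationError


class DataFrameModel(BaseModel):
    """Hypothetical stand-in for pandantic.BaseModel: a regular pydantic model plus one classmethod."""

    @classmethod
    def parse_rows(
        cls,
        dataframe: pd.DataFrame,
        errors: Literal["raise", "filter"] = "raise",
    ) -> pd.DataFrame:
        """Validate every row against the model (illustrative sketch, not pandantic's parse_df)."""
        # Cast to object dtype so each cell reaches pydantic as a plain Python value.
        records = dataframe.astype(object).to_dict(orient="records")
        valid_labels = []
        failed_labels = []
        for label, record in zip(dataframe.index, records):
            try:
                cls.model_validate(record)
            except ValidationError:
                failed_labels.append(label)
            else:
                valid_labels.append(label)
        # Raise only once the whole DataFrame has been checked.
        if failed_labels and errors == "raise":
            raise ValueError(f"{len(failed_labels)} row(s) failed validation: {failed_labels}")
        return dataframe.loc[valid_labels]


class Person(DataFrameModel):
    name: str
    age: int


people = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, "unknown"]})
print(Person.parse_rows(people, errors="filter"))  # keeps only the "Ann" row
```

Because Person still is a pydantic model, everything else (model_validate, defaults, validators) keeps working unchanged. The extra classmethod only adds the DataFrame entry point.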

A quick example

Enough talk; a short example will make things clearer. Import the BaseModel class from pandantic and create a schema just as you normally would with pydantic.

import pandas as pd
from pydantic.types import StrictInt

from pandantic import BaseModel


class DataFrameSchema(BaseModel):
    """Example schema for testing."""

    example_str: str
    example_int: StrictInt

Let's try this schema on a simple pandas.DataFrame. Call the parse_df class method of the freshly defined DataFrameSchema and pass the DataFrame to validate as an argument. In this example we want to filter out the invalid records. Other options exist, such as the classic errors="raise", which raises a ValueError after the whole DataFrame has been validated. Here, only the second record is kept in the returned DataFrame.
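
To see why only the second record survives, each row can be checked on its own with plain Pydantic V2. This is a sketch rather than pandantic's implementation. row_errors is a hypothetical helper, and the schema is repeated with BaseModel imported from pydantic so the block runs without pandantic installed. The three rows behave as follows:

* The first row fails because StrictInt rejects the string "1".
* The third row fails twice: Pydantic V2 does not coerce the integer 1 to str, and StrictInt rejects the float 3.0.

```python
import pandas as pd
from pydantic import BaseModel, ValidationError
from pydantic.types import StrictInt


class DataFrameSchema(BaseModel):
    example_str: str
    example_int: StrictInt


df_invalid = pd.DataFrame(
    data={
        "example_str": ["foo", "bar", 1],
        "example_int": ["1", 2, 3.0],
    }
)


def row_errors(schema: type[BaseModel], dataframe: pd.DataFrame) -> dict:
    """Map each index label to the names of the fields that failed (empty list means valid)."""
    result = {}
    records = dataframe.astype(object).to_dict(orient="records")
    for label, record in zip(dataframe.index, records):
        try:
            schema.model_validate(record)
            result[label] = []
        except ValidationError as exc:
            # Each error's "loc" is a tuple whose first item is the field name.
            result[label] = [err["loc"][0] for err in exc.errors()]
    return result


print(row_errors(DataFrameSchema, df_invalid))
# {0: ['example_int'], 1: [], 2: ['example_str', 'example_int']}
```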

df_invalid = pd.DataFrame(
    data={
        "example_str": ["foo", "bar", 1],
        "example_int": ["1", 2, 3.0],
    }
)

df_filtered = DataFrameSchema.parse_df(
    dataframe=df_invalid,
    errors="filter",
)

Pandas plugin

Another way to use pandantic is via its pandas.DataFrame extension plugin. Once it is registered by running import pandantic.plugins.pandas, it adds the following methods to pandas DataFrames:

DataFrame.pandantic.validate(schema: PandanticBaseModel), which returns True if every row is valid and False otherwise.
DataFrame.pandantic.filter(schema: PandanticBaseModel), which wraps PandanticBaseModel.parse_df(errors="filter") and returns the filtered DataFrame.

Example:

import pandas as pd

from pandantic import BaseModel
import pandantic.plugins.pandas  # registers the DataFrame.pandantic accessor


class MyModel(BaseModel):
    a: int
    b: str


df1: pd.DataFrame = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

df1.pandantic.validate(MyModel)  # returns True
df1.pandantic.filter(MyModel)  # returns the same DataFrame

# but if we have a mixed DataFrame
df2: pd.DataFrame = pd.DataFrame({"a": [1, 2, "3"], "b": ["a", 3, "c"]})

df2.pandantic.validate(MyModel)  # returns False
# Drops the second row (b=3 is not a str). In Pydantic's default lax mode the
# string "3" in column a is coerced to 3, so the first and third rows are kept.
df2.pandantic.filter(MyModel)
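
Accessors such as DataFrame.pandantic are registered through pandas' extension API, pd.api.extensions.register_dataframe_accessor. The sketch below uses that pandas API with a hypothetical accessor name, rowcheck, so it does not clash with the real plugin. It mirrors the two methods described above but is not pandantic's code, and it validates rows with plain Pydantic V2.

```python
import pandas as pd
from pydantic import BaseModel, ValidationError


@pd.api.extensions.register_dataframe_accessor("rowcheck")  # hypothetical accessor name
class RowCheckAccessor:
    """Illustrative accessor exposing validate/filter for any pydantic model."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._obj = pandas_obj

    def _valid_mask(self, schema: type[BaseModel]) -> list[bool]:
        """One boolean per row: True when the row passes schema validation."""
        mask = []
        for record in self._obj.astype(object).to_dict(orient="records"):
            try:
                schema.model_validate(record)
                mask.append(True)
            except ValidationError:
                mask.append(False)
        return mask

    def validate(self, schema: type[BaseModel]) -> bool:
        return all(self._valid_mask(schema))

    def filter(self, schema: type[BaseModel]) -> pd.DataFrame:
        return self._obj.loc[self._valid_mask(schema)]


class MyModel(BaseModel):
    a: int
    b: str


df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
df2 = pd.DataFrame({"a": [1, 2, "3"], "b": ["a", 3, "c"]})

print(df1.rowcheck.validate(MyModel))  # True
print(df2.rowcheck.validate(MyModel))  # False
print(df2.rowcheck.filter(MyModel))  # rows 0 and 2
```

Under Pydantic's default lax mode, this sketch coerces the string "3" to 3 and keeps the first and third rows of df2. It drops only the row where b is the integer 3.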

Custom validator example

One of the great features of Pydantic is the ability to create custom validators, and those validators also work when parsing DataFrames with pandantic. Import the original decorator from the pydantic package, and keep in mind that pandantic uses Pydantic V2 (so it is field_validator, not the V1 validator). In the example below, the model validates the example_int field and makes sure it is an even number.

from pydantic import field_validator

from pandantic import BaseModel


class DataFrameSchema(BaseModel):
    """Example schema for testing."""

    example_str: str
    example_int: int

    @field_validator("example_int")
    @classmethod
    def validate_even_integer(cls, x: int) -> int:
        """Example custom validator to validate if int is even."""
        if x % 2 != 0:
            # Raise ValueError (not ValidationError); Pydantic wraps it in a ValidationError.
            raise ValueError(f"example_int must be even, is {x}.")
        return x

Setting the errors argument to "raise" makes parse_df raise a ValueError after validating every row, because the first row contains an odd number.

example_df_invalid = pd.DataFrame(
    data={
        "example_str": ["foo", "bar", "baz"],
        "example_int": [1, 4, 12],
    }
)

df_raised_error = DataFrameSchema.parse_df(
    dataframe=example_df_invalid,
    errors="raise",
)
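
The same validator can be exercised on single records with plain Pydantic V2, which shows how a ValueError raised inside a field_validator surfaces. Pydantic wraps it in a ValidationError of type "value_error" and prefixes the message with "Value error, ". The schema is renamed to EvenIntSchema and first_error is a hypothetical helper, both only to keep this sketch self-contained.

```python
from pydantic import BaseModel, ValidationError, field_validator


class EvenIntSchema(BaseModel):
    """Same fields and validator as DataFrameSchema above, renamed for this sketch."""

    example_str: str
    example_int: int

    @field_validator("example_int")
    @classmethod
    def validate_even_integer(cls, x: int) -> int:
        if x % 2 != 0:
            raise ValueError(f"example_int must be even, is {x}.")
        return x


def first_error(record: dict):
    """Return the first pydantic error dict for a record, or None when it is valid."""
    try:
        EvenIntSchema.model_validate(record)
    except ValidationError as exc:
        return exc.errors()[0]
    return None


print(first_error({"example_str": "bar", "example_int": 4}))  # None
error = first_error({"example_str": "foo", "example_int": 1})
print(error["type"], error["msg"])  # value_error Value error, example_int must be even, is 1.
```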

Special fields and types

Optional

Because the DataFrame is converted to dicts before validation, a None value is treated as NaN whenever the column also contains other values (for example, pandas stores None in a numeric column as NaN). Therefore, optional columns (where values may be empty) should be specified with the custom pandantic.Optional type. This type is a replacement for typing.Optional.

import pandas as pd

from pandantic import BaseModel, Optional

class Model(BaseModel):
    a: Optional[int] = None
    b: int

df_example = pd.DataFrame({"a": [1, None, 2], "b": ["str", 2, 3]})

df_filtered = Model.parse_df(df_example, errors="filter", verbose=True)
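
The need for a NaN-aware Optional can be reproduced with plain pandas and Pydantic V2. In this sketch, a typing.Optional[int] field rejects the NaN that pandas stores for the missing value. An Annotated type with a BeforeValidator that maps NaN to None accepts it. nan_to_none and NanOptionalInt are hypothetical names. We are assuming, not confirming, that pandantic.Optional addresses the same problem in a comparable way.

```python
import math
from typing import Annotated, Any, Optional

import pandas as pd
from pydantic import BaseModel, BeforeValidator, ValidationError


def nan_to_none(value: Any) -> Any:
    """Turn a float NaN (pandas' missing-value marker) into None before type validation."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# Hypothetical alias for illustration only; pandantic ships its own Optional type.
NanOptionalInt = Annotated[Optional[int], BeforeValidator(nan_to_none)]


class PlainModel(BaseModel):
    a: Optional[int] = None


class NanAwareModel(BaseModel):
    a: NanOptionalInt = None


def is_valid(model: type[BaseModel], record: dict) -> bool:
    """Return True when the record passes the model's validation."""
    try:
        model.model_validate(record)
    except ValidationError:
        return False
    return True


# pandas stores None as NaN here because the rest of the column is numeric (float64).
df = pd.DataFrame({"a": [1, None, 2]})
records = df.astype(object).to_dict(orient="records")

print([is_valid(PlainModel, r) for r in records])  # [True, False, True]
print([is_valid(NanAwareModel, r) for r in records])  # [True, True, True]
```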

Release history

1.0.1 (Apr 14, 2025)
1.0.0 (Jan 10, 2025, this version)
0.3.1 (Oct 3, 2024)
0.3.0 (Sep 5, 2023)
0.2.2 (Aug 28, 2023)
0.2.1 (Jul 6, 2023)
0.2.0 (May 2, 2023)
0.1.2 (Apr 14, 2023)

Download files

Source Distribution

pandantic-1.0.0.tar.gz (7.3 kB)

Uploaded Jan 10, 2025 Source

Built Distribution

pandantic-1.0.0-py3-none-any.whl (9.8 kB)

Uploaded Jan 10, 2025 Python 3

File details

Details for the file pandantic-1.0.0.tar.gz.

File metadata

Download URL: pandantic-1.0.0.tar.gz
Upload date: Jan 10, 2025
Size: 7.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.0 CPython/3.10.16 Linux/6.8.0-1017-azure

File hashes

Hashes for pandantic-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`8d6f9093f49b82a8e9297382c4c321ef258e72480b9809516951cd1fb765a653`
MD5	`afb56251cf9143bb7dbecafd2074433e`
BLAKE2b-256	`e3f3dd9c02b5240aa8bcadeb8119d68bd38029f964182aa4bf8037d7b6524825`

File details

Details for the file pandantic-1.0.0-py3-none-any.whl.

File metadata

Download URL: pandantic-1.0.0-py3-none-any.whl
Upload date: Jan 10, 2025
Size: 9.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.0.0 CPython/3.10.16 Linux/6.8.0-1017-azure

File hashes

Hashes for pandantic-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`106bf4a19cca4f72a3347f3ee33544b33330e91f5b5f84be2a599f05fbd08d6c`
MD5	`6b2098096e5ee0dddc4a394bc03a2198`
BLAKE2b-256	`fa54240ffd2164dc995e6cb98e2719416fe7865f5bc38f9ecbaa6286567a8e66`
