
A lightweight, flexible data validation and testing tool for statistical data objects.



The Open-source Framework for Dataset Validation

📊 🔎 ✅

Data validation for scientists, engineers, and analysts seeking correctness.



Pandera is a Union.ai open source project that provides a flexible and expressive API for performing data validation on dataframe-like objects. The goal of Pandera is to make data processing pipelines more readable and robust with statistically typed dataframes.

Install

Pandera supports multiple dataframe libraries, including pandas, polars, pyspark, and more. To validate pandas DataFrames, install Pandera with the pandas extra:

With pip:

pip install 'pandera[pandas]'

With uv:

uv pip install 'pandera[pandas]'

With conda:

conda install -c conda-forge pandera-pandas
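To confirm that the install succeeded, one option is to query the installed distribution's version with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `pandera`, which may not hold for every installation method.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "pandera"):
    """Return the installed version string, or None if the package is missing.

    `installed_version` is an illustrative helper, not part of pandera.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


pandera_version = installed_version()
print(pandera_version or "pandera is not installed in this environment")
```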

Get started

First, create a dataframe:

import pandas as pd
import pandera.pandas as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column2": [1.1, 1.2, 1.3],
    "column3": ["a", "b", "c"],
})

Validate the data using the object-based API:

# define a schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, pa.Check.ge(0)),
    "column2": pa.Column(float, pa.Check.lt(10)),
    "column3": pa.Column(
        str,
        [
            pa.Check.isin([*"abc"]),
            pa.Check(lambda series: series.str.len() == 1),
        ]
    ),
})

print(schema.validate(df))
#    column1  column2 column3
# 0        1      1.1       a
# 1        2      1.2       b
# 2        3      1.3       c

Or validate the data using the class-based API:

# define a schema
class Schema(pa.DataFrameModel):
    column1: int = pa.Field(ge=0)
    column2: float = pa.Field(lt=10)
    column3: str = pa.Field(isin=[*"abc"])

    @pa.check("column3")
    def custom_check(cls, series: pd.Series) -> pd.Series:
        return series.str.len() == 1

print(Schema.validate(df))
#    column1  column2 column3
# 0        1      1.1       a
# 1        2      1.2       b
# 2        3      1.3       c

[!WARNING] Pandera v0.24.0 introduces the pandera.pandas module, which is now the (highly) recommended way of defining DataFrameSchemas and DataFrameModels for pandas data structures like DataFrames. Defining a dataframe schema from the top-level pandera module will produce a FutureWarning:

import pandera as pa

schema = pa.DataFrameSchema({"col": pa.Column(str)})

Update your import to:

import pandera.pandas as pa

After that, the rest of your pandera code should work unchanged. Accessing DataFrameSchema and the other pandera classes or functions through the top-level pandera module will be deprecated in version 0.29.0.

Next steps

See the official documentation to learn more.


Download files

Download the file for your platform.

Source Distribution

pandera-0.31.0rc1.tar.gz (718.0 kB)

Uploaded Source

Built Distribution


pandera-0.31.0rc1-py3-none-any.whl (379.2 kB)

Uploaded Python 3

File details

Details for the file pandera-0.31.0rc1.tar.gz.

File metadata

  • Download URL: pandera-0.31.0rc1.tar.gz
  • Upload date:
  • Size: 718.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pandera-0.31.0rc1.tar.gz

Algorithm     Hash digest
SHA256        9d2b57db8e9d51b8b6749ecca0ec71db18920ca4748f6fd1ef3662b3bb83557a
MD5           6eb914c9ea54341b615d633901871a47
BLAKE2b-256   8378a4f0d1eab94b1ffe217559538aff6abddecaa309de47418dee81f5d77e68


File details

Details for the file pandera-0.31.0rc1-py3-none-any.whl.

File metadata

  • Download URL: pandera-0.31.0rc1-py3-none-any.whl
  • Upload date:
  • Size: 379.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pandera-0.31.0rc1-py3-none-any.whl

Algorithm     Hash digest
SHA256        9f5c4c54d22da6a9b816f1e39ee72bec0f0b3a1d4b266914bd76d1b856d17561
MD5           af45f178253f822a7487e6c150b5fc39
BLAKE2b-256   fa965f636b3bb9cc248cd17988c5b9ff65ba4c4f7fc257507dc9470dec77d7f7

