
A light-weight and flexible validation package for pandas data structures.

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Supports: Python 2.7, 3.5, 3.6

Why?

Because pandas data structures hide a lot of information, and explicitly validating them in production-critical or reproducible research settings is a good idea.

And it also makes it easier to review pandas code :)

Documentation

The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io

Install

pip install pandera

Example Usage

DataFrameSchema

import pandas as pd

from pandera import Column, DataFrameSchema, Float, Int, String, Check


# validate columns
schema = DataFrameSchema({
    # the check function expects a series argument and should output a boolean
    # or a boolean Series.
    "column1": Column(Int, Check(lambda s: s <= 10)),
    "column2": Column(Float, Check(lambda s: s < -1.2)),
    # you can provide a list of validators
    "column3": Column(String, [
        Check(lambda s: s.str.startswith("value_")),
        Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

# alternatively, you can pass strings representing the legal pandas datatypes:
# http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
schema = DataFrameSchema({
    "column1": Column("int64", Check(lambda s: s <= 10)),
    # ... (other columns elided)
})

df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})

validated_df = schema.validate(df)
print(validated_df)

#     column1  column2  column3
#  0        1     -1.3  value_1
#  1        4     -1.4  value_2
#  2        0     -2.9  value_3
#  3       10    -10.1  value_2
#  4        9    -20.4  value_1
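
To make the check semantics concrete, the sketch below runs the same check functions directly against the example columns using plain pandas, without going through pandera. It only illustrates the contract stated in the comments above (a check receives a Series and returns a boolean or a boolean Series). It is not pandera's implementation, and `check_passes` is a hypothetical helper introduced here for illustration.

```python
import pandas as pd

# Same example data as in the usage section above.
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

# The same check functions, keyed by the column they apply to.
checks = {
    "column1": [lambda s: s <= 10],
    "column2": [lambda s: s < -1.2],
    "column3": [
        lambda s: s.str.startswith("value_"),
        lambda s: s.str.split("_", expand=True).shape[1] == 2,
    ],
}


def check_passes(check_fn, series):
    """Hypothetical helper: reduce a check's output to a single bool.

    A boolean Series passes only if every element is True; a plain
    boolean is used as-is.
    """
    result = check_fn(series)
    if isinstance(result, pd.Series):
        return bool(result.all())
    return bool(result)


# Evaluate every check against its column.
results = {
    column: [check_passes(fn, df[column]) for fn in fns]
    for column, fns in checks.items()
}
print(results)

# A value above 10 makes the column1 check fail.
bad_column_passes = check_passes(checks["column1"][0], pd.Series([1, 11]))
print(bad_column_passes)
```

Every check in `results` evaluates to true for the example data, which is why `schema.validate(df)` returns the DataFrame unchanged. The last call shows a single out-of-range value turning an element-wise check into a failure.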

Tests

pip install pytest
pytest tests

Contributing to pandera

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview of how to contribute can be found in the contributing guide on GitHub.

Issues

Submit feature requests or bug reports through the project's issue tracker on GitHub.

Download files


Source Distribution

pandera-0.1.3.tar.gz (12.4 kB)

Built Distribution

pandera-0.1.3-py3-none-any.whl (11.6 kB, Python 3)
