A light-weight and flexible validation package for pandas data structures.


Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Supports: Python 2.7, 3.5, 3.6

Why?

Because pandas data structures hide a lot of information, such as the column names, dtypes, and value ranges they actually contain, explicitly validating them is a good idea in production-critical or reproducible research settings.

And it also makes it easier to review pandas code :)

Documentation

The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io

Install

pip install pandera

Example Usage

DataFrameSchema

import pandas as pd

from pandera import Column, DataFrameSchema, Float, Int, String, Check


# validate columns
schema = DataFrameSchema({
    # the check function expects a series argument and should output a boolean
    # or a boolean Series.
    "column1": Column(Int, Check(lambda s: s <= 10)),
    "column2": Column(Float, Check(lambda s: s < -1.2)),
    # you can provide a list of validators
    "column3": Column(String, [
        Check(lambda s: s.str.startswith("value_")),
        Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

# alternatively, you can pass strings representing the legal pandas datatypes:
# http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
schema = DataFrameSchema({
    "column1": Column("int64", Check(lambda s: s <= 10)),
    # ... (remaining columns defined as above)
})

df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})

validated_df = schema.validate(df)
print(validated_df)

#     column1  column2  column3
#  0        1     -1.3  value_1
#  1        4     -1.4  value_2
#  2        0     -2.9  value_3
#  3       10    -10.1  value_2
#  4        9    -20.4  value_1
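
To make the contract of a check function concrete, here is a minimal sketch that uses plain pandas only and does not call pandera. It applies the same lambdas from the schema above to each column and reduces every result to a single boolean. The `run_checks` helper and the `checks` dictionary are hypothetical names introduced for illustration; how pandera itself aggregates and reports check results is not shown here.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

# The same check functions used in the schema above, keyed by column name.
checks = {
    "column1": [lambda s: s <= 10],
    "column2": [lambda s: s < -1.2],
    "column3": [
        lambda s: s.str.startswith("value_"),
        lambda s: s.str.split("_", expand=True).shape[1] == 2,
    ],
}


def run_checks(frame, checks):
    """Hypothetical helper: apply each check and reduce its result to one bool."""
    results = {}
    for column, functions in checks.items():
        outcomes = []
        for fn in functions:
            out = fn(frame[column])
            # A check may return a single boolean or a boolean Series.
            if isinstance(out, pd.Series):
                outcomes.append(bool(out.all()))
            else:
                outcomes.append(bool(out))
        results[column] = outcomes
    return results


results = run_checks(df, checks)
print(results)
# {'column1': [True], 'column2': [True], 'column3': [True, True]}
```

Changing a single value so that it violates a check, for example setting one entry of `column1` to 11, turns the corresponding result to `False`.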

Tests

pip install pytest
pytest tests

Contributing to pandera

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview of how to contribute can be found in the contributing guide on GitHub.

Issues

Submit feature requests and bug fixes through the project's issue tracker.
