A light-weight and flexible validation package for pandas data structures.
Supports: Python 2.7, 3.5, 3.6
Why?
pandas data structures hide a lot of information, so explicitly validating them in production-critical or reproducible research settings is a good idea.
Explicit schemas also make pandas code easier to review :)
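As a small illustration of the kind of information that can stay hidden, the sketch below uses only pandas (not pandera) and made-up column names. It shows two columns that were meant to hold integers but end up with different dtypes, without any warning from pandas.

```python
import pandas as pd

# Hypothetical data: both columns were intended to hold integers.
df = pd.DataFrame({
    "with_missing": [1, 2, None],  # one missing value
    "with_string": [1, "2", 3],    # one value is accidentally a string
})

# pandas does not complain; it quietly picks a dtype that fits the data.
dtypes = {name: str(dtype) for name, dtype in df.dtypes.items()}
print(dtypes)
```

Neither column ends up with an integer dtype, which is the sort of surprise an explicit schema is meant to catch early.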
Documentation
The official documentation is hosted on ReadTheDocs: https://pandera.readthedocs.io
Install
pip install pandera
Example Usage
DataFrameSchema
import pandas as pd
from pandera import Column, DataFrameSchema, Float, Int, String, Check
# validate columns
schema = DataFrameSchema({
    # the check function expects a Series argument and should output a boolean
    # or a boolean Series.
    "column1": Column(Int, Check(lambda s: s <= 10)),
    "column2": Column(Float, Check(lambda s: s < -1.2)),
    # you can provide a list of validators
    "column3": Column(String, [
        Check(lambda s: s.str.startswith("value_")),
        Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})
# alternatively, you can pass strings representing the legal pandas datatypes:
# http://pandas.pydata.org/pandas-docs/stable/basics.html#dtypes
schema = DataFrameSchema({
    "column1": Column("int64", Check(lambda s: s <= 10)),
    # ...
})
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})
validated_df = schema.validate(df)
print(validated_df)
#    column1  column2  column3
# 0        1     -1.3  value_1
# 1        4     -1.4  value_2
# 2        0     -2.9  value_3
# 3       10    -10.1  value_2
# 4        9    -20.4  value_1
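To make the two kinds of check results concrete, the sketch below applies the same lambda bodies directly to pandas Series, without pandera. The variable names and the failing Series `bad` are illustrative assumptions, and the way pandera itself reports a failed check is not shown here.

```python
import pandas as pd

column1 = pd.Series([1, 4, 0, 10, 9])
column3 = pd.Series(["value_1", "value_2", "value_3", "value_2", "value_1"])

# An element-wise check returns a boolean Series, one result per row.
elementwise = column1 <= 10

# A check on the Series as a whole returns a single boolean: here,
# splitting on "_" must produce exactly two columns.
whole_series = column3.str.split("_", expand=True).shape[1] == 2

# Hypothetical failing data: select the rows that violate the condition.
bad = pd.Series([1, 4, 11])
failures = bad[~(bad <= 10)]

print(elementwise.all(), whole_series, failures.tolist())
```

Applying a check by hand like this is a quick way to find out which values would fail before wiring the check into a schema.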
Tests
pip install pytest
pytest tests
Contributing to pandera
All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
A detailed overview on how to contribute can be found in the contributing guide on GitHub.
Issues
Submit feature requests and bug reports on the project's GitHub issue tracker.
File details
Details for the file pandera-0.1.3.tar.gz.
File metadata
- Download URL: pandera-0.1.3.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8d9f4f3afb88bbe9331be6bbd5fb6d23a877a4c8bbb27e213c4635a6cc9b0a46
MD5 | 4f8f00eb7390ed754bdc06584b0e1771
BLAKE2b-256 | 583fcf51553bdfe64ff507bb731e6f68de7084d08130bc2c560d46bd57cfd22e
File details
Details for the file pandera-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pandera-0.1.3-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 94d3e560deb42c353dd0ef5eaa5281e446fd4d64ab09969b818b0dc367d6b22a
MD5 | 88e7c14a81240cf68e291b61f3ab9e85
BLAKE2b-256 | 58a649420e359ff9c855f9401dabe19e6b290b5506d987b04d49bdf8071c1513