A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It builds on the Pandas data analysis library to perform these checks quickly and efficiently.
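PandasSchema checks a Pandas DataFrame rather than a file, so whether the input is a CSV or a TSV only matters when the data is loaded. A minimal sketch using plain Pandas (the variable names are illustrative): the same rows read with a comma separator and with a tab separator produce identical frames, and either could then be passed to a schema for validation.

```python
import pandas as pd
from io import StringIO

# The same rows, once comma-separated and once tab-separated.
csv_text = "Given Name,Family Name,Age\nGerald,Hampton,82\nYuuwa,Miyake,27\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv handles both formats; only the separator argument differs.
csv_df = pd.read_csv(StringIO(csv_text))
tsv_df = pd.read_csv(StringIO(tsv_text), sep="\t")

# Both loads yield the same DataFrame, so validation downstream is unaffected.
print(csv_df.equals(tsv_df))  # True
```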

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To check that incoming data actually follows this format, define a schema and validate a DataFrame against it. The sample data below contains deliberate mistakes:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
    MatchesPatternValidation,
    InRangeValidation,
    InListValidation,
)

# One Column per CSV header; each lists the validations applied to that column's cells.
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

# Sample data with deliberate mistakes: stray spaces, an impossible age,
# a lowercase sex value and a malformed customer ID.
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# validate() returns the problems it finds; each one prints as a readable message.
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then print:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
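For intuition about what each rule tests, the failures above can be approximated with plain Pandas boolean masks. This is an illustrative sketch, not PandasSchema's implementation: the `checks` dictionary, the strip-based whitespace test and the use of `str.match` for the pattern are assumptions made for this example. The Age test follows the `[0, 120)` notation in the message, a half-open range in which 0 passes and 120 fails.

```python
import pandas as pd
from io import StringIO

# Same deliberately flawed data as in the example above.
df = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# One boolean Series per column: True means the cell passes.
checks = {
    # Leading and trailing whitespace combined: stripping must not change the value.
    "Given Name": df["Given Name"].str.strip() == df["Given Name"],
    "Family Name": df["Family Name"].str.strip() == df["Family Name"],
    # Half-open range [0, 120): 0 is allowed, 120 is not.
    "Age": (df["Age"] >= 0) & (df["Age"] < 120),
    "Sex": df["Sex"].isin(["Male", "Female", "Other"]),
    # str.match anchors the pattern at the start of the string only.
    "Customer ID": df["Customer ID"].str.match(r"\d{4}[A-Z]{4}"),
}

# Collect (row, column, value) for every failing cell, then order by row.
failures = [
    (row, column, df.at[row, column])
    for column, passed in checks.items()
    for row in passed[~passed].index
]
failures.sort(key=lambda failure: failure[0])

for row, column, value in failures:
    print(f'{{row: {row}, column: "{column}"}}: "{value}" failed')
```

Because the sort is stable, cells in the same row keep the column order of `checks`. Since `str.match` only anchors at the start, a stricter sketch could use `str.fullmatch` to also reject trailing characters.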

Download files

Source Distribution

pandas_schema-0.2.1.tar.gz (11.0 kB), uploaded as source

Built Distributions

pandas_schema-0.2.1-py3.5.egg (18.4 kB), uploaded as source
pandas_schema-0.2.1-py3-none-any.whl (10.4 kB), uploaded for Python 3

File details

Hashes for pandas_schema-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       c3feab651e81b8e1640b5f0252ded5c78af65e898f4f77e3d35d5b51690526e9
MD5          d7600ba95a9b8b0883fae6aabdecf774
BLAKE2b-256  c753b09804eaa55818d60dc53ea43810c675cc5eb9c9ae9bf866cb3a23795aef

Hashes for pandas_schema-0.2.1-py3.5.egg
Algorithm    Hash digest
SHA256       f9a48e0ee689ae589259fa755087594506fa098e02a9848e5cd2cca1c68c52f3
MD5          83a01c2b9d2499ac4effc02c78a659da
BLAKE2b-256  001f04246d1d4c0ca2df6faf4f32b88e6c9da47c0742351d55414c5d4664c6c4

Hashes for pandas_schema-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       0b6d2953f744bcc6871cb35c0ef9d6de509a04b4289ac6140a9ec68d4f65470e
MD5          dab5d6d2f751a5ddd0ebc7fd015713a5
BLAKE2b-256  f560f00e957d92b231d049159008cafc9257a6e696a9ddab3cb6d571b24f2e62
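The digests listed for each file can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` module; the function name `file_sha256` and the local file path are illustrative assumptions, and the expected value is the SHA256 digest listed for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for pandas_schema-0.2.1.tar.gz.
EXPECTED_SHA256 = "c3feab651e81b8e1640b5f0252ded5c78af65e898f4f77e3d35d5b51690526e9"


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("pandas_schema-0.2.1.tar.gz")  # hypothetical local download path
    if archive.exists():
        matches = file_sha256(archive) == EXPECTED_SHA256
        print("hash matches" if matches else "HASH MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can compute the same digest with `pip hash pandas_schema-0.2.1.tar.gz`, which uses SHA256 by default.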