A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It uses the data analysis library pandas to run these checks quickly and efficiently.

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To ensure that the data in your CSV is in the correct format, define a schema and validate the data against it:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
    MatchesPatternValidation,
    InRangeValidation,
    InListValidation,
)

# One Column per CSV header, each with the list of validations to apply
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

# Sample data with deliberate errors: trailing spaces, an out-of-range age,
# a lowercase 'male', and a malformed customer ID
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# validate() returns one entry per failed check
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
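
To make concrete what each of the checks above is testing, here is a minimal sketch that reproduces the same five findings using only the Python standard library. It is not how PandasSchema is implemented: the helper names (no_edge_whitespace, in_range, in_list, matches, validate) are hypothetical, the half-open range is inferred from the "[0, 120)" in the output above, and the messages are only approximations of the library's wording.

```python
import csv
import re
from io import StringIO

RAW = '''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''

# Hypothetical helpers: each check takes a cell string and returns
# an error message, or None when the cell passes.
def no_edge_whitespace(value):
    if value != value.lstrip():
        return 'contains leading whitespace'
    if value != value.rstrip():
        return 'contains trailing whitespace'
    return None

def in_range(low, high):
    def check(value):
        try:
            number = float(value)
        except ValueError:
            return 'is not a number'
        # Half-open interval, assumed from the "[0, 120)" message
        if low <= number < high:
            return None
        return f'was not in the range [{low}, {high})'
    return check

def in_list(options):
    def check(value):
        if value in options:
            return None
        return f'is not in the list of legal options ({", ".join(options)})'
    return check

def matches(pattern):
    compiled = re.compile(pattern)
    def check(value):
        # re.match anchors at the start of the string only
        if compiled.match(value):
            return None
        return f'does not match the pattern "{pattern}"'
    return check

CHECKS = {
    'Given Name': [no_edge_whitespace],
    'Family Name': [no_edge_whitespace],
    'Age': [in_range(0, 120)],
    'Sex': [in_list(['Male', 'Female', 'Other'])],
    'Customer ID': [matches(r'\d{4}[A-Z]{4}')],
}

def validate(text):
    """Return (row, column, value, message) for every failed check."""
    errors = []
    for row_index, row in enumerate(csv.DictReader(StringIO(text))):
        for column, checks in CHECKS.items():
            for check in checks:
                message = check(row[column])
                if message is not None:
                    errors.append((row_index, column, row[column], message))
    return errors

errors = validate(RAW)
for row_index, column, value, message in errors:
    print(f'{{row: {row_index}, column: "{column}"}}: "{value}" {message}')
```

Running the sketch reports one problem per bad cell, in the same row and column positions as the PandasSchema output above. In real code the schema approach is preferable, since the checks run on whole pandas columns rather than cell by cell.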

Download files

Source Distribution

pandas_schema-0.3.2.tar.gz (13.5 kB view details)

Uploaded Source

Built Distributions

pandas_schema-0.3.2-py3.6.egg (19.4 kB view details)

Uploaded Source

pandas_schema-0.3.2-py3-none-any.whl (10.8 kB view details)

Uploaded Python 3

File details

Details for the file pandas_schema-0.3.2.tar.gz.

File hashes

Hashes for pandas_schema-0.3.2.tar.gz
Algorithm Hash digest
SHA256 21293757a29052c15fdc09e54539a34b94fd0d93792792411fa911c4702ae95c
MD5 dbcda14b8d47fd563d99048d298aad9b
BLAKE2b-256 516fd609e55ee9821c96b43b2ac654384c46de5360a4e617b4497d9e6b90b35d
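
A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using Python's standard hashlib module. The function names sha256_of and matches_published_hash are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib

# SHA256 digest published above for pandas_schema-0.3.2.tar.gz
EXPECTED_SHA256 = '21293757a29052c15fdc09e54539a34b94fd0d93792792411fa911c4702ae95c'

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()

# Usage (assumes the archive is in the current directory):
# matches_published_hash('pandas_schema-0.3.2.tar.gz')
```

pip can perform the same comparison itself when a requirements file pins the digest with --hash and the install is run with --require-hashes.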

File details

Details for the file pandas_schema-0.3.2-py3.6.egg.

File hashes

Hashes for pandas_schema-0.3.2-py3.6.egg
Algorithm Hash digest
SHA256 97350845fc26f9574cefe4c8e826dfd8a49049d5e708e44be8d654c84828f0f7
MD5 f50e8f6fd0565b8dc72df103811cd3ae
BLAKE2b-256 7a2113a0f648832814408e48b2c6d42acf64f8c88f093abe7c12b80d8d58ca92

File details

Details for the file pandas_schema-0.3.2-py3-none-any.whl.

File hashes

Hashes for pandas_schema-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d1e5ac4b3794e5d7b5cbadd91c1a61453459356c1c4915804f17f781e2a99b45
MD5 075ca7a8fae950f62055e94492e34386
BLAKE2b-256 45acc41262d809a9935668503108ed9d09283759ec3942c01bad9d103cba7643
