A validation library for Pandas data frames using user-friendly schemas

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It uses the pandas data analysis library to do so quickly and efficiently.

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To ensure that the data in your CSV is in the correct format, define a schema and validate the data against it:

from io import StringIO

import pandas as pd
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    InListValidation,
    InRangeValidation,
    LeadingWhitespaceValidation,
    MatchesRegexValidation,
    TrailingWhitespaceValidation,
)

schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesRegexValidation(r'\d{4}[A-Z]{4}')])
])

test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the regex "\d{4}[A-Z]{4}"
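
To make the rules behind these messages concrete, here is a minimal standard-library sketch of the same five checks. It is not PandasSchema's implementation and does not use pandas. The helper names (`in_range`, `CHECKS`, `find_problems`) are hypothetical. The half-open range mirrors the `[0, 120)` shown in the output above, and the use of `re.search` (rather than an anchored match) is an assumption.

```python
import csv
import re
from io import StringIO

CSV_TEXT = '''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''


def in_range(value, low, high):
    """True if value parses as a number in the half-open range [low, high)."""
    try:
        return low <= float(value) < high
    except ValueError:
        return False


# Each check is (description, predicate returning True when the value is valid).
no_leading_ws = ('leading whitespace', lambda v: v == v.lstrip())
no_trailing_ws = ('trailing whitespace', lambda v: v == v.rstrip())

# Hypothetical stand-in for the schema: column name -> list of checks.
CHECKS = {
    'Given Name': [no_leading_ws, no_trailing_ws],
    'Family Name': [no_leading_ws, no_trailing_ws],
    'Age': [('out of range', lambda v: in_range(v, 0, 120))],
    'Sex': [('not in list', lambda v: v in ('Male', 'Female', 'Other'))],
    # Assumption: the pattern may match anywhere in the value (re.search).
    'Customer ID': [('regex mismatch',
                     lambda v: re.search(r'\d{4}[A-Z]{4}', v) is not None)],
}


def find_problems(text):
    """Return (row, column, description) for every cell that fails a check."""
    problems = []
    for row_number, row in enumerate(csv.DictReader(StringIO(text))):
        for column, checks in CHECKS.items():
            for description, is_valid in checks:
                if not is_valid(row[column]):
                    problems.append((row_number, column, description))
    return problems


for row_number, column, description in find_problems(CSV_TEXT):
    print(f'row {row_number}, column "{column}": {description}')
```

Run against the test data, this sketch flags the same five cells that PandasSchema reports above. The library itself operates on whole pandas columns rather than looping over rows.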
