
A validation library for Pandas data frames using user-friendly schemas


For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It uses the Pandas data analysis library to perform these checks quickly and efficiently.

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID
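PandasSchema validates a DataFrame that has already been loaded, so the file's delimiter only matters at load time. The following minimal sketch uses plain pandas only, with no PandasSchema calls. It shows that the sample above parses to the same DataFrame whether it arrives comma-separated or tab-separated. The TSV text is generated here purely for illustration.

```python
from io import StringIO

import pandas as pd

CSV_TEXT = """Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID
"""

# Illustrative TSV variant: safe here because no field contains a comma.
TSV_TEXT = CSV_TEXT.replace(",", "\t")

csv_df = pd.read_csv(StringIO(CSV_TEXT))
tsv_df = pd.read_csv(StringIO(TSV_TEXT), sep="\t")

# Same columns, dtypes and values, so one schema could validate either file.
print(csv_df.equals(tsv_df))
print(csv_df.dtypes)
```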

You can then check that the data in your CSV is in the correct format:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
    MatchesPatternValidation,
    InRangeValidation,
    InListValidation,
)

# One Column per CSV header, each paired with the list of checks to run on it
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

# Sample data containing deliberate mistakes
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# validate() returns a list of errors; an empty list means every check passed
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
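The following hypothetical pandas-only sketch makes explicit what each check above tests. It reproduces the same five checks with boolean masks. It is not PandasSchema's implementation. The helper names (`no_leading_ws`, `in_range`, `in_list`, `matches`) are invented for illustration, and the pattern check is deliberately strict: it requires the whole ID to match.

```python
from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO("""Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
"""))

# Hypothetical stand-ins: each maps a column to a boolean Series, True = passes.
def no_leading_ws(s):
    text = s.astype(str)
    return text == text.str.lstrip()

def no_trailing_ws(s):
    text = s.astype(str)
    return text == text.str.rstrip()

def in_range(lo, hi):
    # Half-open interval [lo, hi), matching the "[0, 120)" wording in the output above
    return lambda s: (s >= lo) & (s < hi)

def in_list(options):
    return lambda s: s.isin(options)

def matches(pattern):
    # Stricter choice for this sketch: the entire value must match the pattern
    return lambda s: s.astype(str).str.fullmatch(pattern)

checks = {
    "Given Name": [no_leading_ws, no_trailing_ws],
    "Family Name": [no_leading_ws, no_trailing_ws],
    "Age": [in_range(0, 120)],
    "Sex": [in_list(["Male", "Female", "Other"])],
    "Customer ID": [matches(r"\d{4}[A-Z]{4}")],
}

failures = []
for column, fns in checks.items():
    for fn in fns:
        passed = fn(df[column])
        # passed[~passed] keeps only the rows that failed this check
        for row in passed[~passed].index:
            failures.append((int(row), column, df.at[row, column]))

# Order by row, then by the column's position in the frame
failures.sort(key=lambda f: (f[0], df.columns.get_loc(f[1])))
for row, column, value in failures:
    print(f'{{row: {row}, column: "{column}"}}: {value!r}')
```

Each check produces one boolean per row, so a whole column is tested in a single vectorized operation; the Python loop only walks the rows that failed.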

