Skip to main content

A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the Github Pages Website.


PandasSchema is a module for validating tabulated data, such as CSVs (Comma Separated Value files), and TSVs (Tab Separated Value files). It uses the incredibly powerful data analysis tool Pandas to do so quickly and efficiently.

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

Now you want to be able to ensure that the data in your CSV is in the correct format:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation, CanConvertValidation, MatchesPatternValidation, InRangeValidation, InListValidation

schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas_schema-0.3.5.tar.gz (13.0 kB view details)

Uploaded Source

Built Distribution

pandas_schema-0.3.5-py3-none-any.whl (21.7 kB view details)

Uploaded Python 3

File details

Details for the file pandas_schema-0.3.5.tar.gz.

File metadata

  • Download URL: pandas_schema-0.3.5.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for pandas_schema-0.3.5.tar.gz
Algorithm Hash digest
SHA256 e68e93862cf728d2c4cec5237cf62e410d19be168c1c6d68fa5ed13bf543c6d6
MD5 6e74d6dc5d43029151a35fdd9f57642a
BLAKE2b-256 e4fea208f23ec8d91c91f16a35fd4dceb0ba01a9adbacb27a07a5119119f4c86

See more details on using hashes here.

File details

Details for the file pandas_schema-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: pandas_schema-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 21.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for pandas_schema-0.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 2018a5843f9a8facfcf9332d557eb29dcb5f0758b8ba124d137d021a6e633f74
MD5 a8985858411e6e969db7c338e5f9b1fc
BLAKE2b-256 9c036d87ce8719dc57e44688096c05fb0efa61a08c6838816c9d991b1ece5b24

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page