
A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It uses Pandas, a powerful data analysis library, to do so quickly and efficiently.
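PandasSchema checks a DataFrame that has already been loaded, so CSV and TSV input differ only at load time; with pandas, a TSV is typically read with `pd.read_csv(path, sep='\t')`. As a standard-library-only sketch of that point, the hypothetical helper `parse_table` below parses the same small table from both formats and gets identical rows.

```python
import csv
from io import StringIO


def parse_table(text: str, delimiter: str):
    """Hypothetical helper: parse delimited text into row dicts keyed by header."""
    return list(csv.DictReader(StringIO(text), delimiter=delimiter))


csv_text = "Given Name,Family Name,Age\nGerald,Hampton,82\n"
tsv_text = "Given Name\tFamily Name\tAge\nGerald\tHampton\t82\n"

csv_rows = parse_table(csv_text, ",")
tsv_rows = parse_table(tsv_text, "\t")

# Only the delimiter differs, so the parsed rows are identical
print(csv_rows == tsv_rows)  # True
print(csv_rows[0]["Age"])    # 82
```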

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To ensure that the data in your CSV is in the correct format, define a schema and validate the data against it:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
    MatchesPatternValidation,
    InRangeValidation,
    InListValidation,
)

# One Column per CSV header, each with the list of checks its values must pass
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

# Deliberately malformed sample: stray spaces, an out-of-range age,
# a lowercase "male" and a malformed customer ID
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# validate() returns a list of warnings, one per failed check
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
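To make each rule concrete, here is a minimal sketch that re-implements the five checks from the schema above in plain Python (standard library only) and reproduces the same messages. It is an illustration, not PandasSchema's implementation; `RULES` and `validate` are hypothetical names. Two details it makes explicit: the `[0, 120)` notation in the age message indicates a half-open range, so 0 is accepted and 120 is not; and this sketch's pattern check uses `re.search`, which accepts a match anywhere in the value. Whether `MatchesPatternValidation` anchors its pattern is not stated here, so writing the pattern as `r'^\d{4}[A-Z]{4}$'` is a safer way to require the whole ID to match.

```python
import csv
import re
from io import StringIO

DATA = """Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
"""

# Hypothetical rule table: (column, predicate returning True for valid values, message)
RULES = [
    ("Given Name", lambda v: v == v.lstrip(), "contains leading whitespace"),
    ("Given Name", lambda v: v == v.rstrip(), "contains trailing whitespace"),
    ("Family Name", lambda v: v == v.lstrip(), "contains leading whitespace"),
    ("Family Name", lambda v: v == v.rstrip(), "contains trailing whitespace"),
    # Half-open interval [0, 120); assumes the value is an integer
    ("Age", lambda v: 0 <= int(v) < 120, "was not in the range [0, 120)"),
    ("Sex", lambda v: v in ("Male", "Female", "Other"),
     "is not in the list of legal options (Male, Female, Other)"),
    # re.search accepts the pattern anywhere in the value (unanchored)
    ("Customer ID", lambda v: re.search(r"\d{4}[A-Z]{4}", v) is not None,
     'does not match the pattern "\\d{4}[A-Z]{4}"'),
]


def validate(text):
    """Return one message per failed check, ordered by row and then by rule."""
    errors = []
    for row_index, row in enumerate(csv.DictReader(StringIO(text))):
        for column, is_valid, message in RULES:
            value = row[column]
            if not is_valid(value):
                errors.append(f'{{row: {row_index}, column: "{column}"}}: "{value}" {message}')
    return errors


for line in validate(DATA):
    print(line)

# Unanchored search versus a whole-value match
print(re.search(r"\d{4}[A-Z]{4}", "X2582GABKX") is not None)     # True
print(re.fullmatch(r"\d{4}[A-Z]{4}", "X2582GABKX") is not None)  # False
```

Running it prints the same five lines as the PandasSchema output above, followed by the two regex comparisons.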



Download files


Source Distribution

pandas_schema-0.3.4.tar.gz (14.2 kB view details)

Uploaded Source

Built Distributions

pandas_schema-0.3.4-py3.6.egg (19.8 kB view details)

Uploaded Source

pandas_schema-0.3.4-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file pandas_schema-0.3.4.tar.gz.

File metadata

  • Download URL: pandas_schema-0.3.4.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pandas_schema-0.3.4.tar.gz
Algorithm     Hash digest
SHA256        1b4e86332d3f8e4efa162472fe87005a4fbf2613cacb292b8ba613484d6999ee
MD5           35d8fe25568709f7d18684c5ea285ade
BLAKE2b-256   3af97710e5be3819956f06730b3b6920e658eb25610e1d75b4e5e0890bd196d4
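The SHA256 digest for pandas_schema-0.3.4.tar.gz listed here lets you check that a downloaded copy has not been altered. A minimal sketch using the standard library `hashlib`, assuming the archive sits in the current directory; `sha256_of` is a hypothetical helper name. (pip can also enforce digests itself, via `--hash=sha256:...` entries in a requirements file installed with `--require-hashes`.)

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of the source distribution
EXPECTED_SHA256 = "1b4e86332d3f8e4efa162472fe87005a4fbf2613cacb292b8ba613484d6999ee"


def sha256_of(path, chunk_size=65536):
    """Hypothetical helper: return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("pandas_schema-0.3.4.tar.gz")  # assumed download location
    if archive.exists():
        matches = sha256_of(archive) == EXPECTED_SHA256
        print("hash OK" if matches else "hash MISMATCH")
    else:
        print(f"{archive} not found")
```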


File details

Details for the file pandas_schema-0.3.4-py3.6.egg.

File metadata

  • Download URL: pandas_schema-0.3.4-py3.6.egg
  • Upload date:
  • Size: 19.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pandas_schema-0.3.4-py3.6.egg
Algorithm     Hash digest
SHA256        e205758624c34911768393b65aeebcaaa5aa0f3e937fa9627b2e1456781ec5eb
MD5           e1cde107a054c4dba101c22b4cd40c64
BLAKE2b-256   2032a6e52db96a66c47dfa14bc67de1153a6a20657452a5979e58d67cc29dfde


File details

Details for the file pandas_schema-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: pandas_schema-0.3.4-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pandas_schema-0.3.4-py3-none-any.whl
Algorithm     Hash digest
SHA256        33043e8bc102b28cde641a7c73466f725e19d73f198c514a1bbfc6cd9c85d204
MD5           c779f52e0d29c5484e13b72de4e8881c
BLAKE2b-256   591933bb47cc354a540980a6f3ec5a15d69cca5fbb240eecced4524c4e934e0c

