A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It builds on the Pandas data analysis library to do so quickly and efficiently.

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To ensure that the data in your CSV is in the correct format, define a schema with one set of validations per column and validate the data against it:

from io import StringIO

import pandas as pd
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    InListValidation,
    InRangeValidation,
    LeadingWhitespaceValidation,
    MatchesPatternValidation,
    TrailingWhitespaceValidation,
)

# Each Column pairs a column name with the validations its values must pass
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

# Sample data with deliberate errors: trailing whitespace, an out-of-range age,
# a lowercase 'male', and a malformed customer ID
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# validate() returns a list of errors, one for each failed validation
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
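
To make the rules behind these messages concrete, here is a minimal sketch of the same five checks written with only the Python standard library. It is not how PandasSchema is implemented; the helper names (`in_range`, `in_list`, `matches_pattern`, `find_failures`) and the `RULES` table are hypothetical and exist only for this illustration. The range check assumes the half-open interval [0, 120) shown in the output above, and the pattern check is assumed to be anchored at the start of the value.

```python
import csv
import re
from io import StringIO

CSV_TEXT = """Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
"""


def no_leading_whitespace(value):
    return value == value.lstrip()


def no_trailing_whitespace(value):
    return value == value.rstrip()


def in_range(low, high):
    # Half-open interval [low, high); raises ValueError on non-numeric input.
    return lambda value: low <= float(value) < high


def in_list(options):
    return lambda value: value in options


def matches_pattern(pattern):
    compiled = re.compile(pattern)
    # re.match anchors at the start of the string only (an assumption here).
    return lambda value: compiled.match(value) is not None


# Hypothetical rule table: column name -> list of (label, check) pairs.
WHITESPACE_CHECKS = [
    ("leading whitespace", no_leading_whitespace),
    ("trailing whitespace", no_trailing_whitespace),
]
RULES = {
    "Given Name": WHITESPACE_CHECKS,
    "Family Name": WHITESPACE_CHECKS,
    "Age": [("range", in_range(0, 120))],
    "Sex": [("list", in_list(["Male", "Female", "Other"]))],
    "Customer ID": [("pattern", matches_pattern(r"\d{4}[A-Z]{4}"))],
}


def find_failures(csv_text):
    """Return (row index, column, label) for every check that fails."""
    failures = []
    for row_index, row in enumerate(csv.DictReader(StringIO(csv_text))):
        for column, checks in RULES.items():
            for label, check in checks:
                if not check(row[column]):
                    failures.append((row_index, column, label))
    return failures


failures = find_failures(CSV_TEXT)
for failure in failures:
    print(failure)
```

Run against the sample data, this sketch flags the same five cells as the PandasSchema output above. The library version has the advantage of declaring the rules once in a schema and reporting them with readable messages.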

Download files


Source Distribution

pandas_schema-0.3.3.tar.gz (13.9 kB view details)

Uploaded Source

Built Distributions

pandas_schema-0.3.3-py3.6.egg (19.6 kB view details)

Uploaded Source

pandas_schema-0.3.3-py3-none-any.whl (10.9 kB view details)

Uploaded Python 3

File details

Details for the file pandas_schema-0.3.3.tar.gz.

File metadata

  • Download URL: pandas_schema-0.3.3.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pandas_schema-0.3.3.tar.gz
Algorithm     Hash digest
SHA256        9c9ee8c4ed7d15c69c4779f995cf18733b83604d289d4de787e3b4bbe9861e89
MD5           c4cac51af47b721b8454872d5a6dca24
BLAKE2b-256   7abacee9ccdce8093cbce4535cc8f3aaadfc4c4778805da9eb4e962762537bbd


File details

Details for the file pandas_schema-0.3.3-py3.6.egg.

File metadata

  • Download URL: pandas_schema-0.3.3-py3.6.egg
  • Upload date:
  • Size: 19.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pandas_schema-0.3.3-py3.6.egg
Algorithm     Hash digest
SHA256        439912ea57d7d8f9343d4ab85a4c330bde690fe506526de0c1e69c428a9b57ff
MD5           78b9ffc5c2fa75d32e6d8f5ea9f7efc9
BLAKE2b-256   5dcd69c52acb797b898a3de71100eddd34f3a7b8f792b941639fb3650fa3ebad


File details

Details for the file pandas_schema-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: pandas_schema-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.20.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.7

File hashes

Hashes for pandas_schema-0.3.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        74f499b428c25322b6198c269fe42c6a74d0ba9f7d0aee0713445240ba745a2d
MD5           f29d142f867701e410c2847eb2614f3b
BLAKE2b-256   3ef8090e00bbf4de86d2c608f21235d96b0e867c26a6134fe679c4e1dd8d73a1

