A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It builds on the pandas data analysis library to do so quickly and efficiently.

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To check that the data in the CSV is in the correct format, define a schema and validate the data against it:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
    MatchesPatternValidation,
    InRangeValidation,
    InListValidation,
)

# One Column per CSV header, each with the validations its values must pass
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])

# Sample data with deliberate errors: trailing whitespace, an out-of-range age,
# a lower-case "male", and a malformed customer ID
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# Collect the validation errors, one per failing value
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
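
To make the rules behind these messages concrete, here is a minimal standard-library sketch of the same five checks. It is not how PandasSchema is implemented, and the helper names (no_outer_whitespace, in_range, in_list, matches_id, find_failures) are hypothetical. It assumes the range is half-open, as the "[0, 120)" in the output suggests, that list membership is case-sensitive, and that the whole Customer ID must match the pattern; that last point is a choice made for this sketch.

```python
import csv
import re
from io import StringIO

CSV_TEXT = """Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
"""

# Hypothetical helpers: each returns True when the value is acceptable.
def no_outer_whitespace(value):
    return value == value.strip()

def in_range(value, low=0, high=120):
    # Half-open interval [low, high): 0 passes, 120 does not.
    return low <= float(value) < high

def in_list(value, options=("Male", "Female", "Other")):
    # Case-sensitive membership, so "male" is rejected.
    return value in options

def matches_id(value):
    # Assumption of this sketch: the entire value must match the pattern.
    return re.fullmatch(r"\d{4}[A-Z]{4}", value) is not None

# Column name -> check, in the same order as the schema above.
CHECKS = {
    "Given Name": no_outer_whitespace,
    "Family Name": no_outer_whitespace,
    "Age": in_range,
    "Sex": in_list,
    "Customer ID": matches_id,
}

def find_failures(text):
    """Return (row index, column, value) for every cell that fails its check."""
    failures = []
    for row_index, row in enumerate(csv.DictReader(StringIO(text))):
        for column, check in CHECKS.items():
            if not check(row[column]):
                failures.append((row_index, column, row[column]))
    return failures

failures = find_failures(CSV_TEXT)
for failure in failures:
    print(failure)
```

Run against the sample data, this flags the same five cells that PandasSchema reports, in the same row and column order.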

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas_schema-0.3.1.tar.gz (11.4 kB), uploaded as Source

Built Distributions

pandas_schema-0.3.1-py3.5.egg (18.9 kB), uploaded as Egg

pandas_schema-0.3.1-py3-none-any.whl (10.5 kB), uploaded as Python 3

File details

Details for the file pandas_schema-0.3.1.tar.gz.

File metadata

  • Download URL: pandas_schema-0.3.1.tar.gz
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pandas_schema-0.3.1.tar.gz
Algorithm     Hash digest
SHA256        b751a2fd7da28129417cfef7a2823653d41b83a10f4adec27e210986bc273371
MD5           544588ac9b519e0d82640c6a5cd0bcd0
BLAKE2b-256   86068a3ec6fc1e8696b469d5a8f51594b5622ff49f579ce95763fc4a1fda548c

See more details on using hashes here.
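
As an illustration of how a published digest can be used, the following sketch computes the SHA256 of a downloaded file with Python's hashlib and compares it to an expected value. The function names sha256_of_file and matches_published_digest are hypothetical, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path, expected_hex):
    """Compare the file's digest to a published one, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, matches_published_digest('pandas_schema-0.3.1.tar.gz', 'b751a2fd7da28129417cfef7a2823653d41b83a10f4adec27e210986bc273371') should return True for an intact download.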

File details

Details for the file pandas_schema-0.3.1-py3.5.egg.

File hashes

Hashes for pandas_schema-0.3.1-py3.5.egg
Algorithm     Hash digest
SHA256        018fb01812ed62a248ec287c4db1e7214e72a33bf7285d7d313fc19997ce323a
MD5           b61132ab7320f6083cd61eed52303ef0
BLAKE2b-256   994a8bdcddcefba1c1290689487705fb34b7076a5fa7165261aeeac25ad8a2d5

File details

Details for the file pandas_schema-0.3.1-py3-none-any.whl.

File hashes

Hashes for pandas_schema-0.3.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        61d2d19350989f56d19d3b4c65049dc40035b9c792812c7de0b01894b8def157
MD5           4f4dd1fa10cf2892cdaaac656df2bf45
BLAKE2b-256   f28c5aef19521e6a1d2006721497acbb100ef93d13602cc93da3ccc56a4a6e18
