
A validation library for Pandas data frames using user-friendly schemas

Project description

For the full documentation, refer to the GitHub Pages website.


PandasSchema is a module for validating tabular data, such as CSV (comma-separated values) and TSV (tab-separated values) files. It uses the powerful data analysis library Pandas to do so quickly and efficiently.
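
PandasSchema validates a Pandas data frame rather than a file (the example below passes the result of `pd.read_csv` to `schema.validate`), so the choice between CSV and TSV only matters when the data is loaded. The following is a minimal sketch using plain Pandas, with made-up sample text: passing `sep='\t'` to `pd.read_csv` is the only change needed for tab-separated input, and both files yield the same frame.

```python
from io import StringIO

import pandas as pd

# The same records written once with commas and once with tabs
CSV_TEXT = 'Given Name,Family Name,Age\nGerald,Hampton,82\nYuuwa,Miyake,27\n'
TSV_TEXT = 'Given Name\tFamily Name\tAge\nGerald\tHampton\t82\nYuuwa\tMiyake\t27\n'

csv_frame = pd.read_csv(StringIO(CSV_TEXT))
# sep='\t' tells Pandas to split fields on tabs instead of commas
tsv_frame = pd.read_csv(StringIO(TSV_TEXT), sep='\t')

# Identical frames, so one schema could validate either source
print(csv_frame.equals(tsv_frame))
```

Because the schema only sees the resulting data frame, the same `Schema` object can be reused for both formats.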

For example, say your code expects a CSV that looks a bit like this:

Given Name,Family Name,Age,Sex,Customer ID
Gerald,Hampton,82,Male,2582GABK
Yuuwa,Miyake,27,Male,7951WVLW
Edyta,Majewska,50,Female,7758NSID

To ensure that the data in your CSV is in the correct format, define a schema that lists the checks for each column, then validate the data against it:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import (
    LeadingWhitespaceValidation,
    TrailingWhitespaceValidation,
    MatchesPatternValidation,
    InRangeValidation,
    InListValidation,
)

# One Column per CSV column, each paired with the validations to run on it
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),  # accepts 0 <= age < 120
    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])  # four digits, then four capital letters
])

# Sample data with deliberate mistakes: stray spaces, an out-of-range age,
# a lowercase "male" and a malformed customer ID
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''))

# validate() returns the problems it found (nothing if the data is valid)
errors = schema.validate(test_data)

for error in errors:
    print(error)

PandasSchema would then output one message per failed check:

{row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
{row: 1, column: "Age"}: "270" was not in the range [0, 120)
{row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
{row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
{row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
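
To make each rule concrete, here is a plain-Python sketch that applies the same five checks to the sample data using only the standard library (`csv` and `re`). It is not how PandasSchema is implemented (the library works on Pandas columns); the helper names `check_record` and `check_rows` are hypothetical, and the message wording simply mimics the output above. As in that output, the age range is treated as half-open, [0, 120).

```python
import csv
import re
from io import StringIO

# Sample data from the example above, including its deliberate errors
DATA = '''Given Name,Family Name,Age,Sex,Customer ID
Gerald ,Hampton,82,Male,2582GABK
Yuuwa,Miyake,270,male,7951WVLW
Edyta,Majewska ,50,Female,775ANSID
'''

LEGAL_SEX = ['Male', 'Female', 'Other']
ID_PATTERN = r'\d{4}[A-Z]{4}'


def check_record(record):
    """Hypothetical helper: yield (column, message) for every failed check."""
    for column in ('Given Name', 'Family Name'):
        value = record[column]
        if value != value.lstrip():
            yield column, 'contains leading whitespace'
        if value != value.rstrip():
            yield column, 'contains trailing whitespace'
    # Half-open range: 0 <= age < 120
    if not 0 <= int(record['Age']) < 120:
        yield 'Age', 'was not in the range [0, 120)'
    if record['Sex'] not in LEGAL_SEX:
        yield 'Sex', 'is not in the list of legal options ({})'.format(', '.join(LEGAL_SEX))
    # re.search looks for the pattern anywhere in the value
    if not re.search(ID_PATTERN, record['Customer ID']):
        yield 'Customer ID', 'does not match the pattern "{}"'.format(ID_PATTERN)


def check_rows(text):
    """Hypothetical helper: format one message per failure, row by row."""
    errors = []
    # csv keeps surrounding spaces, so "Gerald " is read with its trailing space
    for row, record in enumerate(csv.DictReader(StringIO(text))):
        for column, message in check_record(record):
            errors.append('{{row: {}, column: "{}"}}: "{}" {}'.format(
                row, column, record[column], message))
    return errors


for error in check_rows(DATA):
    print(error)
```

Run on the sample data, this sketch reports the same five problems, in the same order, as the PandasSchema output shown above.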

Download files

Source Distribution

pandas_schema-0.3.0.tar.gz (11.2 kB)

Uploaded Source

Built Distributions

pandas_schema-0.3.0-py3.5.egg (18.8 kB)

Uploaded Source

pandas_schema-0.3.0-py3-none-any.whl (10.5 kB)

Uploaded Python 3

File details

Details for the file pandas_schema-0.3.0.tar.gz.

File hashes

Hashes for pandas_schema-0.3.0.tar.gz
Algorithm Hash digest
SHA256 2b10de22f61e4e4119e6de3e41871d4b484176eb70e36b820530cde58e2fc7a6
MD5 d41117a8ac1f2aee8fe4999be1e87c7d
BLAKE2b-256 f11dd77f27258cc99377d27e81f7bd8b5b3cbedc0ea3a88f6412cb4020060f3d

File details

Details for the file pandas_schema-0.3.0-py3.5.egg.

File hashes

Hashes for pandas_schema-0.3.0-py3.5.egg
Algorithm Hash digest
SHA256 b634ab16fcd1d1d09c5157cfa39c9121f02da29b8ec45c28eb0deabf29218fe3
MD5 437b11cc3f0cd0ff50df2d7a4b1720a9
BLAKE2b-256 1b7f6234c9697a131ea9cb66958e00c5b665afde39aae19c4053614cfa7dc78b

File details

Details for the file pandas_schema-0.3.0-py3-none-any.whl.

File hashes

Hashes for pandas_schema-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 da82fdfa54ff128d055e89c18b3bfbcb9524772f68488b4b3e7ce8a4320c87ea
MD5 79e2f7988756adcb70cdb794e7917e53
BLAKE2b-256 9188729f6c138e1b083db1a14fe31a57c0fe70f1ff805cb28359fb6179e99bf5
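
Any of the SHA256 digests above can be checked locally with Python's standard `hashlib` module before installing. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, and the dictionary only copies the three digests published above.

```python
import hashlib

# SHA256 digests published above, keyed by file name
PUBLISHED_SHA256 = {
    'pandas_schema-0.3.0.tar.gz':
        '2b10de22f61e4e4119e6de3e41871d4b484176eb70e36b820530cde58e2fc7a6',
    'pandas_schema-0.3.0-py3.5.egg':
        'b634ab16fcd1d1d09c5157cfa39c9121f02da29b8ec45c28eb0deabf29218fe3',
    'pandas_schema-0.3.0-py3-none-any.whl':
        'da82fdfa54ff128d055e89c18b3bfbcb9524772f68488b4b3e7ce8a4320c87ea',
}


def sha256_of(path, chunk_size=65536):
    """Hypothetical helper: hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a b'' sentinel keeps reading until end of file
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Hypothetical helper: True if the file has the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify('pandas_schema-0.3.0-py3-none-any.whl', PUBLISHED_SHA256['pandas_schema-0.3.0-py3-none-any.whl'])` should return `True` for an intact download of the wheel. pip's hash-checking mode performs the same comparison at install time when a requirements file pins the package with `--hash=sha256:<digest>`.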