
Are your data meeting all your expectations?


Data Expectations

Are your data meeting your expectations?



Data Expectations is a Python library which takes a declarative approach to asserting qualities of your datasets. Instead of a test like is_sorted to determine whether a column is ordered, the expectation is expect_column_values_to_be_increasing. Most of the time you don't need to know how the data got that way; you are only interested in what the data looks like.

Expectations can be used alongside, or in place of, a schema validator. However, Data Expectations is intended to validate the data in a dataset, not just the structure of a table. Records should be Python dictionaries (or dictionary-like objects) and can be evaluated one at a time or as an entire list of dictionaries.
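To illustrate the difference between checking one record and checking a list of records, here is a minimal sketch in plain Python. It does not use the Data Expectations API; column_exists, record_meets and first_failing_index are hypothetical helpers invented for this illustration.

```python
from typing import Any, Callable, Iterable, Mapping, Optional

Record = Mapping[str, Any]
Check = Callable[[Record], bool]


def column_exists(column: str) -> Check:
    """Hypothetical helper: build a check that passes when `column` is a key."""
    return lambda record: column in record


def record_meets(checks: Iterable[Check], record: Record) -> bool:
    """Evaluate a single dictionary-like record against every check."""
    return all(check(record) for check in checks)


def first_failing_index(
    checks: Iterable[Check], records: Iterable[Record]
) -> Optional[int]:
    """Evaluate a whole list; return the position of the first bad record."""
    checks = list(checks)  # allow the checks to be reused for each record
    for index, record in enumerate(records):
        if not record_meets(checks, record):
            return index
    return None


checks = [column_exists("name"), column_exists("age")]
single = {"name": "charles", "age": 12}
batch = [single, {"name": "ada"}]  # the second record has no "age" column

print(record_meets(checks, single))
print(first_failing_index(checks, batch))
```

Checking the list is just the single-record check applied in order, which is why the same expectations can be reused for both modes.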

Data Expectations was inspired by the great Great Expectations library, but we wanted something lighter and easier to quickly set up and run. Data Expectations can do less, but it does it with a fraction of the effort and has zero dependencies. Data Expectations was written to run as a step in data processing pipelines, testing the data before it is committed to the warehouse.
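As a rough sketch of what "testing the data before it is committed to the warehouse" could look like, the following plain-Python gate validates every record in a batch before writing any of them. The names gate_and_commit and BatchRejected are assumptions made for this example, the warehouse is stood in for by a list, and is_valid is any callable you supply (for instance a wrapper around your expectations).

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class BatchRejected(Exception):
    """Hypothetical error raised when a batch fails validation."""


def gate_and_commit(
    batch: List[Record],
    is_valid: Callable[[Record], bool],
    warehouse: List[Record],
) -> int:
    """Validate the whole batch first, then commit it in one go."""
    for position, record in enumerate(batch):
        if not is_valid(record):
            # Nothing has been written yet, so the warehouse stays clean.
            raise BatchRejected(f"record {position} did not meet expectations")
    warehouse.extend(batch)
    return len(batch)
```

Validating before writing means a single bad record rejects the batch without leaving a partial load behind; whether all-or-nothing is the right policy depends on your pipeline.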

Provided Expectations

  • expect_column_to_exist (column)
  • expect_column_names_to_match_set (columns, ignore_excess:true)
  • expect_column_values_to_not_be_null (column)
  • expect_column_values_to_be_of_type (column, expected_type, ignore_nulls:true)
  • expect_column_values_to_be_in_type_list (column, type_list, ignore_nulls:true)
  • expect_column_values_to_be_more_than (column, threshold, ignore_nulls:true)
  • expect_column_values_to_be_less_than (column, threshold, ignore_nulls:true)
  • expect_column_values_to_be_between (column, maximum, minimum, ignore_nulls:true)
  • expect_column_values_to_be_increasing (column, ignore_nulls:true)
  • expect_column_values_to_be_decreasing (column, ignore_nulls:true)
  • expect_column_values_to_be_in_set (column, symbols, ignore_nulls:true)
  • expect_column_values_to_match_regex (column, regex, ignore_nulls:true)
  • expect_column_values_to_match_like (column, like, ignore_nulls:true)
  • expect_column_values_length_to_be (column, length, ignore_nulls:true)
  • expect_column_values_length_to_be_between (column, maximum, minimum, ignore_nulls:true)
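Most of the expectations above take an ignore_nulls option. The sketch below shows one plausible reading of it in plain Python; it is not the library's implementation. It assumes that a null is None (or a missing column), that bounds are inclusive, and that "increasing" means each value is at least as large as the previous non-null value. The names values_between and IncreasingCheck are hypothetical.

```python
from typing import Any, Mapping


def values_between(
    record: Mapping[str, Any],
    column: str,
    minimum: Any,
    maximum: Any,
    ignore_nulls: bool = True,
) -> bool:
    """Assumed semantics: nulls pass only when ignore_nulls is set."""
    value = record.get(column)
    if value is None:
        return ignore_nulls
    return minimum <= value <= maximum  # assumption: inclusive bounds


class IncreasingCheck:
    """An ordering check has to remember the previous value between records."""

    def __init__(self, column: str, ignore_nulls: bool = True) -> None:
        self.column = column
        self.ignore_nulls = ignore_nulls
        self._previous = None

    def __call__(self, record: Mapping[str, Any]) -> bool:
        value = record.get(self.column)
        if value is None:
            return self.ignore_nulls  # nulls do not update the remembered value
        ok = self._previous is None or value >= self._previous
        self._previous = value
        return ok


increasing = IncreasingCheck("n")
results = [increasing({"n": v}) for v in [1, 2, None, 2, 1]]
print(results)
```

Unlike the range check, the ordering check carries state from one record to the next, so it only makes sense when records are evaluated in order.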

Install

pip install data_expectations

Data Expectations has no external dependencies and can be used ad hoc, without complex setup.

Example Usage

import data_expectations as de

TEST_DATA = {"name":"charles","age":12}

set_of_expectations = [
    {"expectation": "expect_column_to_exist", "column": "name"},
    {"expectation": "expect_column_to_exist", "column": "age"},
    {"expectation": "expect_column_values_to_be_between", "column": "age", "minimum": 0, "maximum": 120},
]

expectations = de.Expectations(set_of_expectations)
try:
    de.evaluate_record(expectations, TEST_DATA)
except de.errors.ExpectationNotMetError:
    print("Data Didn't Meet Expectations")
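The example defines each expectation as a dictionary with an "expectation" name plus its parameters. One way such definitions could be mapped onto checks is a name-to-function registry, sketched below in plain Python. This is an illustration under assumptions, not how the library is implemented: REGISTRY and unmet_expectations are hypothetical, and the two check functions are simplified stand-ins that share names with entries in the list of provided expectations.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping

Record = Mapping[str, Any]


def expect_column_to_exist(record: Record, column: str) -> bool:
    """Stand-in check: the column is present as a key."""
    return column in record


def expect_column_values_to_be_in_set(
    record: Record, column: str, symbols: Iterable[Any], ignore_nulls: bool = True
) -> bool:
    """Stand-in check: the value is one of the allowed symbols."""
    value = record.get(column)
    if value is None:
        return ignore_nulls
    return value in symbols


# Hypothetical registry: expectation name -> function implementing it.
REGISTRY: Dict[str, Callable[..., bool]] = {
    "expect_column_to_exist": expect_column_to_exist,
    "expect_column_values_to_be_in_set": expect_column_values_to_be_in_set,
}


def unmet_expectations(
    definitions: Iterable[Mapping[str, Any]], record: Record
) -> List[str]:
    """Return the names of every expectation the record fails."""
    failures = []
    for definition in definitions:
        params = dict(definition)          # copy so the definition is not mutated
        name = params.pop("expectation")   # the remaining keys are the parameters
        if not REGISTRY[name](record, **params):
            failures.append(name)
    return failures


definitions = [
    {"expectation": "expect_column_to_exist", "column": "name"},
    {"expectation": "expect_column_values_to_be_in_set",
     "column": "status", "symbols": ["active", "closed"]},
]

print(unmet_expectations(definitions, {"name": "charles", "status": "active"}))
print(unmet_expectations(definitions, {"status": "pending"}))
```

Collecting every failure, rather than stopping at the first one, is a design choice for this sketch; the usage example above instead relies on an exception being raised.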

