Data quality checks that don't suck.

Data Checks

Create, schedule, and deploy data quality checks.

Overview

Existing data observability solutions are painfully static. data_checks provides a dynamic data observability framework that lets you reuse existing Python code, write new Python code, or both, to define data quality checks that can then be easily scheduled and monitored. Inspired by Python's unittest, data_checks lets you write data quality checks as easily as you would write unit tests for your code.

Quickstart

1) Installation

Install the latest version of data_checks using pip:

pip install pydata-checks

2) Start a Data Check project

Initialize a new data_checks project by running the init command from your project directory (for example, /Users/USERNAME/Desktop/PROJECT_NAME):

python -m data_checks.init

This will start a series of prompts that will guide you through the process of initializing a new data_checks project. For example:

$ python -m data_checks.init
Enter the relative file path of the directory where suites will be stored: my_first_data_checks_project/suites
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/suites' does not exist.
Would you like to create it? [y/n]: y
Enter the relative file path of the directory where checks will be stored: my_first_data_checks_project/checks
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/checks' does not exist.
Would you like to create it? [y/n]: y
Enter the default CRON schedule: * * * * *
Enter the database URL: database_url
Enter the alerting endpoint URL:
check_settings.py generated.
my_first_data_check.py generated.

This will create a new directory with the following structure:

PROJECT_NAME
├── my_first_data_checks_project
│   ├── __init__.py
│   ├── checks
│   │   ├── __init__.py
│   │   └── my_first_data_check.py
│   └── suites
│       └── __init__.py
└── check_settings.py
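
The `* * * * *` value entered at the CRON prompt above is a standard five-field cron expression that matches every minute. As a minimal sketch of how to read such an expression, the snippet below labels each field. The helper name `describe_cron` is hypothetical and is not part of data_checks; it only splits the string and does not validate field values.

```python
# Standard cron field order: minute, hour, day of month, month, day of week.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each of the five cron fields to its name (hypothetical helper)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # "* * * * *" runs every minute; "0 6 * * 1" runs at 06:00 every Monday.
    print(describe_cron("* * * * *"))
    print(describe_cron("0 6 * * 1"))
```

A schedule of every minute is convenient for a first test run, but a less frequent default is probably a better fit for real checks.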

3) Set the CHECK_SETTINGS_MODULE environment variable to point to the check_settings.py file

export CHECK_SETTINGS_MODULE=check_settings
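
If you would rather set the variable from Python, for example at the top of a script or notebook, something like the following sketch may help. The helper `configure_settings` is hypothetical and not part of data_checks; it only sets the environment variable and reports whether the named module can be found on the import path. Whether data_checks reads the variable early enough for this to take effect in your setup is an assumption to verify.

```python
import importlib.util
import os


def configure_settings(module_name="check_settings"):
    """Set CHECK_SETTINGS_MODULE and report whether the module is importable.

    Hypothetical helper: it does not import or validate the settings themselves.
    """
    os.environ["CHECK_SETTINGS_MODULE"] = module_name
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found.
        return False


if __name__ == "__main__":
    if not configure_settings("check_settings"):
        print("check_settings not found; run this from the project directory")
```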

4) Run the autogenerated data check

python -m data_checks.do.run_check MyFirstDataCheck

Output:

[1/1 checks] MyFirstDataCheck
	[1/2 Rules] rule_my_first_successful_rule
		rule_my_first_successful_rule took 9.5367431640625e-07 seconds
	[2/2 Rules] rule_my_first_failed_rule
This rule failed
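
To make the output above easier to interpret, here is a rough sketch of a unittest-style runner that finds methods whose names start with `rule_`, times each one, and prints the assertion message when a rule fails. This is not how data_checks is implemented; `DemoCheck` and `run_rules` are hypothetical names, and the discovery-by-prefix behavior is an assumption based on the rule names shown in the output.

```python
import time


class DemoCheck:
    """Hypothetical check with one passing and one failing rule."""

    def rule_my_first_successful_rule(self):
        assert True

    def rule_my_first_failed_rule(self):
        assert False, "This rule failed"


def run_rules(check):
    """Run every rule_* method; return {rule name: failure message or None}."""
    names = [
        name
        for name in dir(check)  # dir() returns names in alphabetical order
        if name.startswith("rule_") and callable(getattr(check, name))
    ]
    results = {}
    for index, name in enumerate(names, start=1):
        print(f"[{index}/{len(names)} Rules] {name}")
        start = time.perf_counter()
        try:
            getattr(check, name)()
        except AssertionError as exc:
            print(exc)
            results[name] = str(exc)
        else:
            print(f"\t{name} took {time.perf_counter() - start} seconds")
            results[name] = None
    return results


if __name__ == "__main__":
    run_rules(DemoCheck())
```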

5) Modify the autogenerated data check

Open the my_first_data_checks_project/checks/my_first_data_check.py file and customize the data check to your liking. For instance, you can make rule_my_first_failed_rule always pass by replacing the failing assertion with one that succeeds:

from data_checks.classes.data_check import DataCheck


class MyFirstDataCheck(DataCheck):
    ...

    def rule_my_first_failed_rule(self):
        # This rule will always pass
        assert True, "This rule failed"

    ...

Rerun the data check:

python -m data_checks.do.run_check MyFirstDataCheck

Output:

[1/1 checks] MyFirstDataCheck
	[1/2 Rules] rule_my_first_successful_rule
		rule_my_first_successful_rule took 9.5367431640625e-07 seconds
	[2/2 Rules] rule_my_first_failed_rule
		rule_my_first_failed_rule took 9.5367431640625e-07 seconds

🎉 Congrats! 🎉 You've created and executed your first data check! See the documentation for more information on writing more advanced checks and suites, and on other features such as scheduling and alerting.
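
As a small step beyond the generated example, a rule can assert something about actual data. The sketch below is illustrative only: `ORDERS`, `rows_missing_id`, and `OrdersDataCheck` are hypothetical names, the in-memory rows stand in for a real query, and the fallback base class exists only so the snippet runs where pydata-checks is not installed. It assumes, as in the generated check, that rules are methods whose names start with `rule_` and that a failed assertion marks the rule as failed.

```python
try:
    from data_checks.classes.data_check import DataCheck
except ImportError:
    class DataCheck:
        """Stand-in so this sketch runs without pydata-checks installed."""


# Hypothetical sample rows; a real check would load these from your database.
ORDERS = [
    {"id": 1, "amount": 20.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": 12.5},
]


def rows_missing_id(rows):
    """Return the positions of rows whose 'id' is missing or None."""
    return [index for index, row in enumerate(rows) if row.get("id") is None]


class OrdersDataCheck(DataCheck):
    def rule_every_order_has_an_id(self):
        missing = rows_missing_id(ORDERS)
        # Fails with the sample data above because the second row has no id.
        assert not missing, f"Orders at positions {missing} have no id"
```

Keeping the data logic in a plain function such as `rows_missing_id` makes it easy to unit test separately from the check class.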
