Data quality checks that don't suck.
Data Checks
Create, schedule, and deploy data quality checks.
Overview
Existing data observability solutions are painfully static. data_checks provides a dynamic data observability framework that lets you reuse existing Python code, or write new Python code, to define data quality checks that can then be easily scheduled and monitored. Inspired by Python's unittest, data_checks lets you write data quality checks as easily as you would write unit tests for your code.
Quickstart
1) Installation
Install the latest version of data_checks using pip:
pip install pydata-checks
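To confirm that the install succeeded, one option is to ask Python's standard importlib.metadata for the installed distribution's version. This is a minimal sketch that uses only the standard library; the helper name installed_version is made up for illustration and is not part of data_checks.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent.

    Hypothetical helper for illustration; not part of data_checks.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "pydata-checks" is the distribution name used in the pip command above.
version = installed_version("pydata-checks")
print(version if version is not None else "pydata-checks is not installed")
```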
2) Start a Data Check project
Initialize a new data_checks project by running the init command from your project directory (/Users/USERNAME/Desktop/PROJECT_NAME):
python -m data_checks.init
This will start a series of prompts that will guide you through the process of initializing a new data_checks project. For example:
$ python -m data_checks.init
Enter the relative file path of the directory where suites will be stored: my_first_data_checks_project/suites
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/suites' does not exist.
Would you like to create it? [y/n]: y
Enter the relative file path of the directory where checks will be stored: my_first_data_checks_project/checks
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/checks' does not exist.
Would you like to create it? [y/n]: y
Enter the default CRON schedule: * * * * *
Enter the database URL: database_url
Enter the alerting endpoint URL:
check_settings.py generated.
my_first_data_check.py generated.
This will create a new directory with the following structure:
PROJECT_NAME
├── my_first_data_checks_project
│   ├── __init__.py
│   ├── checks
│   │   ├── __init__.py
│   │   └── my_first_data_check.py
│   └── suites
│       └── __init__.py
└── check_settings.py
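The default CRON schedule entered at the prompt above (* * * * *) uses the standard five-field cron layout, which means "every minute". The sketch below only labels the five fields so the value is easier to read; it does not use data_checks, and the function name split_cron is an assumption made for illustration.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Label each field of a five-field cron expression (illustrative helper)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# The schedule from the init prompts: every field is a wildcard, so every minute.
every_minute = split_cron("* * * * *")
print(every_minute)

# A less aggressive alternative: minute 0 of hour 6, i.e. once a day at 06:00.
daily_at_six = split_cron("0 6 * * *")
print(daily_at_six)
```

A schedule that fires every minute is convenient for a first test, but a daily or hourly expression is usually a more sensible default for real checks.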
3) Set the CHECK_SETTINGS_MODULE environment variable to point to the check_settings.py file
export CHECK_SETTINGS_MODULE=check_settings
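The value is a Python module name (check_settings), not a file path, so the directory containing check_settings.py needs to be importable. As a rough illustration of the general pattern, and not necessarily how data_checks itself resolves the variable, a module named by an environment variable can be loaded like this. The function name load_settings is hypothetical.

```python
import importlib
import os


def load_settings(env_var="CHECK_SETTINGS_MODULE"):
    """Import and return the module named by an environment variable.

    Illustrative sketch of the pattern only; data_checks may do this differently.
    """
    module_name = os.environ.get(env_var)
    if not module_name:
        raise RuntimeError(f"{env_var} is not set")
    # import_module takes a dotted module name such as "check_settings",
    # so the module must be reachable via sys.path (e.g. the current directory).
    return importlib.import_module(module_name)
```

Calling load_settings() from the project directory after the export above would return the check_settings module, or raise ModuleNotFoundError if Python cannot find it.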
4) Run the autogenerated data check
python -m data_checks.do.run_check MyFirstDataCheck
Output:
[1/1 checks] MyFirstDataCheck
[1/2 Rules] rule_my_first_successful_rule
rule_my_first_successful_rule took 9.5367431640625e-07 seconds
[2/2 Rules] rule_my_first_failed_rule
This rule failed
5) Modify the autogenerated data check
Open the my_first_data_checks_project/checks/my_first_data_check.py file and customize the data check to your liking. For instance, you can make rule_my_first_failed_rule always pass by removing the exception:
from data_checks.classes.data_check import DataCheck


class MyFirstDataCheck(DataCheck):
    ...

    def rule_my_first_failed_rule(self):
        # This rule will always pass
        assert True, "This rule failed"

    ...
Rerun the data check:
python -m data_checks.do.run_check MyFirstDataCheck
Output:
[1/1 checks] MyFirstDataCheck
[1/2 Rules] rule_my_first_successful_rule
rule_my_first_successful_rule took 9.5367431640625e-07 seconds
[2/2 Rules] rule_my_first_failed_rule
rule_my_first_failed_rule took 9.5367431640625e-07 seconds
🎉 Congrats! 🎉 You've created and executed your first data check! See the documentation for more information on writing more advanced checks and suites, and on other features such as scheduling and alerting.
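As a rough idea of what a less trivial rule body might look like, the sketch below asserts a property of real data instead of a constant. It uses an in-memory SQLite database from the standard library so that it runs on its own; the users table, its columns, and the function rule_no_missing_emails are invented for illustration. Inside a data_checks project, the same assertion would presumably live in a rule_ method of a DataCheck subclass.

```python
import sqlite3


def rule_no_missing_emails(conn):
    """Fail if any row in the (hypothetical) users table has a NULL email."""
    missing = conn.execute(
        "SELECT COUNT(*) FROM users WHERE email IS NULL"
    ).fetchone()[0]
    assert missing == 0, f"{missing} user row(s) have no email"


# Build a tiny throwaway dataset with one bad row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), (None,)],
)

try:
    rule_no_missing_emails(conn)
    outcome = "passed"
except AssertionError as exc:
    outcome = f"failed: {exc}"

print(outcome)  # failed: 1 user row(s) have no email
```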