🐼 Patrol your data tests
Panda Patrol
Dashboards, alerts, and silencing
Questions and feedback:
Email: ivanzhangofficial@gmail.com
Call: https://calendly.com/aivanzhang/chat
See Examples for examples such as anomaly detection and PII detection.
See Docs for a comprehensive list of all available features.
Overview
Existing data observability solutions are painfully static. data_checks provides a dynamic data observability framework that lets you reuse existing Python code, write new Python code, or both, to define data quality checks that can then be easily scheduled and monitored. Inspired by Python's unittest, data_checks lets you write data quality checks as easily and seamlessly as you would write unit tests for your code.
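Since data_checks takes its cue from unittest, it may help to see a data quality check written with the standard library alone. The sketch below uses plain unittest, not the data_checks API; the ORDERS rows and both rules are hypothetical.

```python
import unittest

# Hypothetical sample data standing in for rows pulled from a real table.
ORDERS = [
    {"id": 1, "amount": 19.99},
    {"id": 2, "amount": 5.00},
]


class OrdersQualityTest(unittest.TestCase):
    """Each test_* method is one data quality rule, just as in a unit test suite."""

    def test_amounts_are_positive(self):
        bad = [row["id"] for row in ORDERS if row["amount"] <= 0]
        self.assertEqual(bad, [], f"Non-positive amounts in rows {bad}")

    def test_ids_are_unique(self):
        ids = [row["id"] for row in ORDERS]
        self.assertEqual(len(ids), len(set(ids)), "Duplicate order ids")


if __name__ == "__main__":
    unittest.main()
```

In data_checks the equivalent methods are prefixed rule_ and live on a DataCheck subclass, as the Quickstart below shows.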
Quickstart
1) Installation
Install the latest version of data_checks using pip:
pip install pydata-checks
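To confirm that the package installed, you can ask the standard library's importlib.metadata for its version. The distribution name pydata-checks comes from the pip command above; installed_version is a hypothetical helper, not part of data_checks.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("pydata-checks") or "pydata-checks is not installed")
```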
2) Start a Data Check project
Initialize a new data_checks project by using the init command from your project directory (/Users/USERNAME/Desktop/PROJECT_NAME):
python -m data_checks.init
This will start a series of prompts that will guide you through the process of initializing a new data_checks project. For example:
$ python -m data_checks.init
Enter the relative file path of the directory where suites will be stored: my_first_data_checks_project/suites
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/suites' does not exist.
Would you like to create it? [y/n]: y
Enter the relative file path of the directory where checks will be stored: my_first_data_checks_project/checks
Directory '/Users/USERNAME/Desktop/PROJECT_NAME/my_first_data_checks_project/checks' does not exist.
Would you like to create it? [y/n]: y
Enter the default CRON schedule: * * * * *
Enter the database URL: database_url
Enter the alerting endpoint URL:
check_settings.py generated.
my_first_data_check.py generated.
This will create a new directory with the following structure:
PROJECT_NAME
├── my_first_data_checks_project
│   ├── __init__.py
│   ├── checks
│   │   ├── __init__.py
│   │   └── my_first_data_check.py
│   └── suites
│       └── __init__.py
└── check_settings.py
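To make the layout concrete, the sketch below recreates the same directories and files under a given root using pathlib. It illustrates the tree above and is not what data_checks.init runs internally; the generated contents of check_settings.py and my_first_data_check.py are not reproduced, so those files are left empty, and scaffold is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List

# Relative paths from the tree above; every file is created empty here.
LAYOUT = [
    "my_first_data_checks_project/__init__.py",
    "my_first_data_checks_project/checks/__init__.py",
    "my_first_data_checks_project/checks/my_first_data_check.py",
    "my_first_data_checks_project/suites/__init__.py",
    "check_settings.py",
]


def scaffold(root: Path) -> List[Path]:
    """Create the skeleton under root (hypothetical helper) and return the file paths."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        # Create missing parent directories, like answering "y" to the init prompts.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path.relative_to(tmp))
```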
3) Set the CHECK_SETTINGS_MODULE environment variable to point to the check_settings.py file
export CHECK_SETTINGS_MODULE=check_settings
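The value is a module name (check_settings) rather than a file path. Assuming CHECK_SETTINGS_MODULE works like Django's DJANGO_SETTINGS_MODULE, where the dotted name is imported at startup, the sketch below illustrates the idea with importlib. Under that assumption check_settings must be importable; python -m puts the current directory on sys.path, so running commands from PROJECT_NAME satisfies this. load_check_settings is a hypothetical helper, not part of data_checks.

```python
import importlib
import os
from types import ModuleType


def load_check_settings(env_var: str = "CHECK_SETTINGS_MODULE") -> ModuleType:
    """Import and return the module named by env_var (e.g. "check_settings").

    Hypothetical helper illustrating an assumed convention, not data_checks internals.
    """
    module_name = os.environ.get(env_var)
    if not module_name:
        raise RuntimeError(f"{env_var} is not set; run: export {env_var}=check_settings")
    return importlib.import_module(module_name)
```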
4) Run the autogenerated data check
python -m data_checks.do.run_check MyFirstDataCheck
Output:
[1/1 checks] MyFirstDataCheck
[1/2 Rules] rule_my_first_failed_rule
This rule failed
DataCheckException(severity=1.0, exception=This rule failed, metadata={'rule': 'rule_my_first_failed_rule', 'params': {'args': (), 'kwargs': {}}})
[2/2 Rules] rule_my_first_successful_rule
rule_my_first_successful_rule took 0.0 seconds
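The output above follows a simple pattern: each rule_ method runs in turn, a failing rule prints its message, and a passing rule prints its runtime. The sketch below reproduces that pattern in plain Python to show how such a runner can work. It is a hypothetical illustration (MiniCheck and run_checks are not part of data_checks), it does not reproduce the real DataCheckException with its severity and metadata, and it runs rules in alphabetical order, which the real tool may not follow.

```python
import time
from typing import Dict, List, Tuple


class MiniCheck:
    """Hypothetical stand-in for a data check: each rule_* method is one rule."""

    def rule_my_first_failed_rule(self):
        raise Exception("This rule failed")

    def rule_my_first_successful_rule(self):
        assert True


def run_checks(checks: List[object]) -> Dict[Tuple[str, str], bool]:
    """Run every rule_* method of every check, printing progress like the CLI output.

    Returns a mapping of (check name, rule name) -> whether the rule passed.
    """
    results: Dict[Tuple[str, str], bool] = {}
    for c_idx, check in enumerate(checks, start=1):
        check_name = type(check).__name__
        print(f"[{c_idx}/{len(checks)} checks] {check_name}")
        # dir() returns names sorted alphabetically, so rules run in that order here.
        rules = [n for n in dir(check) if n.startswith("rule_") and callable(getattr(check, n))]
        for r_idx, rule in enumerate(rules, start=1):
            print(f"[{r_idx}/{len(rules)} Rules] {rule}")
            start = time.perf_counter()
            try:
                getattr(check, rule)()
            except Exception as exc:  # a failing rule is reported; the run continues
                print(exc)
                results[(check_name, rule)] = False
            else:
                print(f"{rule} took {time.perf_counter() - start} seconds")
                results[(check_name, rule)] = True
    return results


if __name__ == "__main__":
    run_checks([MiniCheck()])
```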
5) Modify the autogenerated data check
Open the my_first_data_checks_project/checks/my_first_data_check.py file and customize the data check to your liking. For instance, you can make rule_my_first_failed_rule always pass by removing the exception:
from data_checks.classes.data_check import DataCheck

class MyFirstDataCheck(DataCheck):
    ...
    def rule_my_first_failed_rule(self):
        # This rule will now succeed
        assert True, "This rule now succeeds"
    ...
Rerun the data check:
python -m data_checks.do.run_check MyFirstDataCheck
Output:
[1/1 checks] MyFirstDataCheck
[1/2 Rules] rule_my_first_successful_rule
rule_my_first_successful_rule took 9.5367431640625e-07 seconds
[2/2 Rules] rule_my_first_failed_rule
rule_my_first_failed_rule took 9.5367431640625e-07 seconds
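Once the generated check passes, you can put any Python you already have into a rule. The sketch below follows the same pattern as my_first_data_check.py (a DataCheck subclass whose rule_ methods fail by raising, here through assert) but checks something concrete: that no user row is missing an email. The rows attribute, the missing_emails helper, and the sample data are hypothetical; a real check would load rows from your own database. The fallback base class exists only so the snippet runs where pydata-checks is not installed and is not the library's DataCheck.

```python
try:
    from data_checks.classes.data_check import DataCheck
except ImportError:
    # Placeholder so this sketch runs without pydata-checks; NOT the real base class.
    class DataCheck:  # type: ignore[no-redef]
        pass


def missing_emails(rows):
    """Return the ids of rows whose "email" value is missing or empty."""
    return [row["id"] for row in rows if not row.get("email")]


class UserEmailCheck(DataCheck):
    # Hypothetical sample data; replace with a query against your own source.
    rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
    ]

    def rule_every_user_has_an_email(self):
        # A failed assert raises, which is assumed to mark the rule as failed.
        missing = missing_emails(self.rows)
        assert not missing, f"Users without an email: {missing}"
```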
🎉 Congrats! 🎉 You've created and executed your first data check! See the documentation for more information on writing more advanced checks and suites, and on other features such as scheduling and alerting.
Hashes for panda_patrol-0.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8387a2a936b46ddc906706fa3a25983119298596e48ac0d3653784d370edeb87
MD5 | f37ec664cad461327106876c3ac9459b
BLAKE2b-256 | 73ebfd7f9fd940ca59ebaa5edffdfaa1371aa2b2c7569ebb7cc2c8be3a9270db