🐼 Patrol your data tests
Panda Patrol
Add dashboards, alerting, and silencing to your data tests with < 10 lines of code.
Questions and feedback
Email: ivanzhangofficial@gmail.com
Call: https://calendly.com/aivanzhang/chat
Overview
Wrap your existing data tests to automatically generate dashboards, alerting, and silencing. This library does not currently handle the orchestration of these data tests; that may be added in the future, depending on demand.
Quickstart
1) Installation
Install the latest version of panda-patrol using pip:
pip install panda-patrol
2) Set up the environment variables
In an existing or new .env file, set the following environment variables:
PANDA_PATROL_URL
PANDA_PATROL_ENV
PANDA_DATABASE_URL
SMTP_SERVER
SMTP_PORT
SMTP_USER
SMTP_PASS
PATROL_EMAIL
See .env.example for more information about how to set these environment variables. See Environment Variables for more information about each environment variable.
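Before starting the server, it can help to confirm that all eight variables are set. The following is a minimal sketch and not part of panda-patrol: the helper name missing_env_vars is hypothetical, it reads the process environment (so it assumes your .env file has already been loaded by whatever mechanism you use), and it only checks that each variable is present and non-empty, not that its value is valid.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = [
    "PANDA_PATROL_URL",
    "PANDA_PATROL_ENV",
    "PANDA_DATABASE_URL",
    "SMTP_SERVER",
    "SMTP_PORT",
    "SMTP_USER",
    "SMTP_PASS",
    "PATROL_EMAIL",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not a panda-patrol API.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All panda-patrol environment variables are set.")
```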
3) Start the panda-patrol server. This will spin up a website at PANDA_PATROL_URL.
python -m panda_patrol
4) Wrap your existing data tests
Spin up a new data test dashboard by wrapping your existing data tests with patrol_group and @patrol. The following example shows how to wrap a data test in a dagster pipeline, but you can use any Python-based data pipeline.
At a high level, you do the following:
- Import patrol_group and @patrol
- Group several data tests with patrol_group
- Wrap each individual existing data test with @patrol
from panda_patrol.patrols import patrol_group
...
with patrol_group(PATROL_GROUP_NAME) as patrol:
    @patrol(PATROL_NAME)
    def DATA_TEST_NAME(patrol_id):
        ...
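To make the pattern above more concrete, here is a toy stand-in showing how a context manager can hand out a decorator that runs each wrapped test and records the outcome. This is a sketch under stated assumptions, not panda-patrol's implementation: demo_patrol_group, the patrol_id format, and the in-memory results list are all hypothetical, and running the test at decoration time is an assumption inferred from the fact that the wrapped functions in this README are never called explicitly.

```python
from contextlib import contextmanager


@contextmanager
def demo_patrol_group(group_name, results):
    """Hypothetical stand-in for patrol_group; NOT the library's code."""

    def patrol(patrol_name):
        def decorator(test_fn):
            # Assumption: the test runs as soon as it is decorated.
            patrol_id = f"{group_name}/{patrol_name}"  # made-up id format
            try:
                value = test_fn(patrol_id)
                results.append((patrol_id, "success", value))
            except AssertionError as exc:
                results.append((patrol_id, "failure", str(exc)))
            return test_fn

        return decorator

    yield patrol


results = []
with demo_patrol_group("Demo group", results) as patrol:

    @patrol("Values are positive")
    def values_are_positive(patrol_id):
        values = [1, 2, 3]
        assert all(v > 0 for v in values)
        return len(values)

    @patrol("Values are small")
    def values_are_small(patrol_id):
        assert max([1, 2, 300]) < 100, "value too large"


for row in results:
    print(row)
```

In this toy version a failing assertion is recorded rather than raised, so the remaining tests in the group still run. Whether panda-patrol behaves the same way is not stated here; check the documentation.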
Here is a more detailed example of how to wrap a data test in a dagster pipeline. Before (hello-dagster.py from https://docs.dagster.io/getting-started/hello-dagster):
def hackernews_top_stories(context: AssetExecutionContext):
    """Get items based on story ids from the HackerNews items endpoint."""
    with open("hackernews_top_story_ids.json", "r") as f:
        hackernews_top_story_ids = json.load(f)
    results = []
    # Get information about each item including the url
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)
    # DATA TEST: Make sure that the item's URL is a valid URL
    for item in results:
        print(item["url"])
        get_item_response = requests.get(item["url"])
        assert get_item_response.status_code == 200
    ...
After:
+ from panda_patrol.patrols import patrol_group
  ...
  def hackernews_top_stories(context: AssetExecutionContext):
      """Get items based on story ids from the HackerNews items endpoint."""
      with open("hackernews_top_story_ids.json", "r") as f:
          hackernews_top_story_ids = json.load(f)
      results = []
      # Get information about each item including the url
      for item_id in hackernews_top_story_ids:
          item = requests.get(
              f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
          ).json()
          results.append(item)
      # DATA TEST: Make sure that the item's URL is a valid URL
+     with patrol_group("Hackernews Items are Valid") as patrol:
+         @patrol("URLs work")
+         def urls_work(patrol_id):
              """URLs for stories should work."""
              for item in results:
                  print(item["url"])
                  get_item_response = requests.get(item["url"])
                  assert get_item_response.status_code == 200
              return len(results)
      ...
❗ IMPORTANT
Note that each data test function (e.g. urls_work) should take exactly one parameter, patrol_id. This parameter is used when defining adjustable parameters for your data tests; see Parameters.
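If you wrap many tests, a quick check with the standard library can catch a function that does not follow this one-parameter rule before it is wrapped. This is a minimal sketch; has_patrol_signature is a hypothetical helper and not something panda-patrol provides.

```python
import inspect


def has_patrol_signature(test_fn):
    """Return True if test_fn takes exactly one parameter named patrol_id.

    Hypothetical helper for illustration; not a panda-patrol API.
    """
    params = list(inspect.signature(test_fn).parameters)
    return params == ["patrol_id"]


def urls_work(patrol_id):
    """Follows the convention: one parameter, patrol_id."""
    return 0


def bad_test(patrol_id, results):
    """Breaks the convention: takes an extra parameter."""
    return 0


print(has_patrol_signature(urls_work))
print(has_patrol_signature(bad_test))
```

Data that a test needs, such as results in the dagster example, can be read from the enclosing scope instead of being passed as an extra argument.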
5) Run your data pipeline
Start your data pipelines as you normally would, then run the pipeline step that contains the test. Here we use dagster to run the data tests, but you can use any Python-based data pipeline.
dagster dev -f hello-dagster.py
6) View the results
Go to PANDA_PATROL_URL to view the results of your data tests. You should see something like this:
Dashboard
Run Details
🎉 Congrats! 🎉 You have created your first data test dashboard! See the documentation for more information on other features like adjustable parameters, alerting, and silencing.