🐼 Patrol your data tests

Panda Patrol

Gain greater visibility and context into your data pipelines with dashboards, alerting, silencing, and other features built around your existing data tests and data profiling tools, within each step of your data pipelines. Add fewer than 5 lines of code and run your pipelines as you normally would. Panda Patrol will take care of the rest.

Questions and feedback

Email: ivanzhangofficial@gmail.com

Call: https://calendly.com/aivanzhang/chat

Integrations

  • Custom Python data pipeline
  • Airflow
  • Dagster
  • Prefect
  • dbt Core (>=1.5)
  • dbt Cloud

For examples of each integration, see examples.

Features

This section describes the features of Panda Patrol at a high level. See demo for a short walkthrough of each feature. See wiki to learn how to implement each feature and more details.

AI-Generated Data Tests

Don't know what data tests to write? No problem. Panda Patrol can generate data tests for you. Just pass in the headers, a preview of the data, and optional additional context.

General Data Tests

Want to get started with a few quick, easy, general, and important data tests? Panda Patrol comes pre-built with a few data tests that run on your data. The best part? It only takes one function call to run these tests.

Anomaly Detection

Want to quickly check for anomalies in a column? Panda Patrol can do that for you. It uses the ECOD anomaly detection model from pyod, an open-source anomaly detection library. Just pass in the expected distribution of the column and its current distribution, and Panda Patrol will detect and surface any anomalies. Even better, you can customize your own anomaly detection model and pass it in to Panda Patrol.
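
To give a feel for the idea, here is a minimal sketch that scores current values by how far into the tails of the expected distribution they fall. It uses only the standard library and is a simplified, single-column stand-in; it is not pyod's ECOD implementation or Panda Patrol's API. The names `tail_scores` and `find_anomalies` and the default threshold are assumptions made for illustration.

```python
import math
from bisect import bisect_left, bisect_right


def tail_scores(expected, current):
    """Score each current value by its tail probability under the expected sample.

    Higher scores mean the value sits further into the left or right tail.
    """
    ordered = sorted(expected)
    n = len(ordered)
    floor = 1.0 / (n + 1)  # Avoids log(0) for values outside the expected range.
    scores = []
    for value in current:
        # Fraction of expected values <= value (left tail) and >= value (right tail).
        left = max(bisect_right(ordered, value) / n, floor)
        right = max((n - bisect_left(ordered, value)) / n, floor)
        scores.append(max(-math.log(left), -math.log(right)))
    return scores


def find_anomalies(expected, current, threshold=3.0):
    """Return current values whose tail score exceeds the (hypothetical) threshold.

    A score above 3.0 corresponds to a tail probability below roughly 5%.
    """
    scores = tail_scores(expected, current)
    return [v for v, s in zip(current, scores) if s > threshold]
```

For example, with `expected = list(range(1, 101))`, calling `find_anomalies(expected, [50, 51, 1000])` flags only `1000`, since it lies beyond every expected value.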

Data Test Results

Write Python-based data tests right within your pipelines. Panda Patrol will store the results of each data test in a database, including the test code, logs, return value, start time, end time, exception (if any), and more. These results can be tracked in a general dashboard (with high-level context like test status) and in a dashboard for each pipeline run (with the full context for each test).
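
The sketch below illustrates what capturing this kind of context around a test could look like, using only the standard library. It is not Panda Patrol's API: `run_data_test`, the record fields, and `check_no_negative_prices` are hypothetical names, and the record is returned as a dict rather than written to a database.

```python
import contextlib
import inspect
import io
import traceback
from datetime import datetime, timezone


def run_data_test(test_fn, *args, **kwargs):
    """Run a data test and capture its code, logs, result, timing, and exception."""
    try:
        source = inspect.getsource(test_fn)
    except (OSError, TypeError):
        source = None  # Source is not always available (e.g. in a REPL).

    record = {
        "name": test_fn.__name__,
        "code": source,
        "start_time": datetime.now(timezone.utc),
        "return_value": None,
        "exception": None,
        "status": "success",
    }
    logs = io.StringIO()
    try:
        # Treat anything the test prints as its log output.
        with contextlib.redirect_stdout(logs):
            record["return_value"] = test_fn(*args, **kwargs)
    except Exception:
        record["status"] = "failure"
        record["exception"] = traceback.format_exc()
    record["logs"] = logs.getvalue()
    record["end_time"] = datetime.now(timezone.utc)
    return record


def check_no_negative_prices(prices):
    """Example data test: fails if any price is negative."""
    print(f"checking {len(prices)} prices")
    assert all(p >= 0 for p in prices), "found a negative price"
    return len(prices)
```

Calling `run_data_test(check_no_negative_prices, [1.0, -2.0])` returns a record with status `"failure"` and the traceback text, instead of raising, so the surrounding pipeline decides what to do next.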

Data Test Parameters

Data changes all the time. Your data tests should change to accommodate these changes. With Panda Patrol, you can pass in parameters to your data tests and later change these parameters on the frontend, all with just one function call.
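
As a rough sketch of the pattern, a test can read its thresholds from a store instead of hard-coding them, so the values can change without a code change. The in-memory `PARAMETER_STORE` and the helpers `get_parameter` and `set_parameter` are assumptions for illustration, not Panda Patrol functions.

```python
# In-memory stand-in for wherever parameter values are persisted.
PARAMETER_STORE = {}


def get_parameter(test_name, parameter, default):
    """Return the stored value for a test parameter, or the default if none was set."""
    return PARAMETER_STORE.get((test_name, parameter), default)


def set_parameter(test_name, parameter, value):
    """Override a parameter, as a user might do from a frontend."""
    PARAMETER_STORE[(test_name, parameter)] = value


def check_row_count(rows):
    """Example data test whose threshold lives outside the test body."""
    minimum = get_parameter("check_row_count", "min_rows", default=100)
    return len(rows) >= minimum
```

With the default of 100, a 50-row batch fails the check; after `set_parameter("check_row_count", "min_rows", 10)`, the same batch passes without touching the test code.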

Monitor Data Pipeline Steps

Monitor each step of your pipeline so that you know each step is running as expected. Panda Patrol will store the start time, end time, and status of each step in a database. This gives you a high-level overview of your pipeline and allows you to drill down into each step to see more details.
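
One way to picture step monitoring is a context manager that records the start time, end time, and status around each step. This is a minimal sketch under stated assumptions: `monitored_step` and the `STEP_RUNS` list are hypothetical, and a real setup would write to a database rather than a list.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

STEP_RUNS = []  # Stand-in for a database table of step runs.


@contextmanager
def monitored_step(name):
    """Record start time, end time, and status for one pipeline step."""
    run = {"step": name, "start_time": datetime.now(timezone.utc), "status": "running"}
    STEP_RUNS.append(run)
    try:
        yield run
        run["status"] = "success"
    except Exception:
        run["status"] = "failure"
        raise  # Let the pipeline see the original error.
    finally:
        run["end_time"] = datetime.now(timezone.utc)
```

Wrapping a step as `with monitored_step("extract"): ...` leaves one entry in `STEP_RUNS` per run, which is the kind of record a high-level overview can be built from.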

Alerting

Be notified when your data tests fail. Configure your own email and Slack settings to receive alerts. Alerts provide all the details you see in the dashboards so you get all the context you need to debug pipelines.

Silencing

Want to skip and silence a data test? No problem. Silencing a data test is as easy as clicking a button and choosing a time.
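
Conceptually, silencing amounts to storing an expiry time per test and skipping the test until that time passes. The sketch below shows that logic with hypothetical helpers (`silence`, `is_silenced`, `run_unless_silenced`); it does not reflect how Panda Patrol implements the feature.

```python
from datetime import datetime, timedelta, timezone

SILENCED_UNTIL = {}  # Test name -> time at which the silence expires.


def silence(test_name, duration, now=None):
    """Silence a test for the given timedelta, starting from `now`."""
    now = now or datetime.now(timezone.utc)
    SILENCED_UNTIL[test_name] = now + duration


def is_silenced(test_name, now=None):
    """Return True while the test's silence has not yet expired."""
    now = now or datetime.now(timezone.utc)
    until = SILENCED_UNTIL.get(test_name)
    return until is not None and now < until


def run_unless_silenced(test_name, test_fn, now=None):
    """Skip the test while it is silenced; otherwise run it."""
    if is_silenced(test_name, now):
        return "skipped"
    test_fn()
    return "ran"
```

For instance, after `silence("check_row_count", timedelta(hours=2))`, calls to `run_unless_silenced` return `"skipped"` for the next two hours and resume running the test afterwards.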

Data Profiles

Using a custom data profiling tool? Or an open-source tool like ydata-profiling? Store data profiles (that are in JSON or HTML format) and check them to see what your data looks like at each step of your pipeline.

Fully Self-Hosted

The best part? Panda Patrol can be fully self-hosted; this repository contains its backend and frontend code. You can run it on your own infrastructure and have full control over your data. No need to worry about data privacy and security.

Demo

See demo here: https://www.loom.com/share/0468aef48b1843f381146399f1652b81?sid=107df0c0-3e53-4d3c-b9f2-1159d3f23bdf

Getting Started

Check out the Quickstart guide to get started.

You can also look at examples of how Panda Patrol fits into your data pipeline. For example, if you use dagster, see examples/dagster for a guide on how to get started with dagster. All guides should take no longer than 10 minutes to complete. See examples for all examples.

For documentation on how to use Panda Patrol and more details on each feature, see the wiki.
