Open Source Data Quality Monitoring

Open Source Data Quality Monitoring.

⭐️ If you like it, star the repo

| Documentation | Slack Community |

Why Data Monitoring?

APM (Application Performance Monitoring) tools are used to monitor the performance of applications. They are a mandatory part of the dev stack: without APM tools, it is very difficult to monitor how applications perform.

But for data products, regular APM tools are not enough. We need a new kind of tool that can monitor the performance of data applications. Data monitoring tools monitor the data quality of databases and data pipelines. They identify potential issues in those databases and pipelines, help trace data quality issues to their root cause, and help improve data quality.

What is datachecks?

Datachecks is an open-source data monitoring tool that monitors the data quality of databases and data pipelines. It identifies potential issues in those databases and pipelines, helps trace data quality issues to their root cause, and helps improve data quality.

Datachecks can generate reliability, uniqueness, and completeness metrics from several data sources.

Reports: Data Quality Visualisation

You can generate a report with just one command. Datachecks produces a beautiful data quality report with all the metrics. This HTML report can be shared with the team.

CLI: Data Quality Visualisation in Bash

A data quality report can also be generated in the terminal, which is very useful for debugging. All it takes is one command.

Getting Started

Install datachecks with the command that is specific to the database.

Install Datachecks

To install Datachecks with all of its dependencies, use the command below.

pip install dcs-core -U
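After installing, it can be handy to confirm from Python that the distribution is actually present in the active environment. The sketch below uses only the standard library and assumes the distribution name is `dcs-core`, as in the pip command above; the helper name `installed_version` is made up for illustration and is not part of Datachecks.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "dcs-core") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"dcs-core version: {found}")
    else:
        print("dcs-core is not installed in this environment")
```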

Create the config file

With a simple config file, you can generate data quality reports for your data sources. For a sample configuration and more details, please visit the config guide.

Run from CLI

Generate Report in Terminal

dcs-core inspect -C config.yaml

Generate HTML Report

dcs-core inspect -C config.yaml --html-report

For more details, please visit the Quick Start Guide.
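If the report needs to be produced from a script (for example, a scheduled job), the two commands above can be wrapped with `subprocess`. This is a minimal sketch that uses only the `inspect` subcommand and the `-C` and `--html-report` flags shown above; the function names `build_inspect_command` and `run_inspect` are hypothetical helpers, not part of Datachecks.

```python
import shutil
import subprocess
from typing import List


def build_inspect_command(config_path: str, html_report: bool = False) -> List[str]:
    """Assemble the argument list for `dcs-core inspect`."""
    command = ["dcs-core", "inspect", "-C", config_path]
    if html_report:
        command.append("--html-report")
    return command


def run_inspect(config_path: str, html_report: bool = False) -> int:
    """Run the CLI and return its exit code; raise if the executable is missing."""
    if shutil.which("dcs-core") is None:
        raise FileNotFoundError("dcs-core not found on PATH; install it first")
    completed = subprocess.run(build_inspect_command(config_path, html_report))
    return completed.returncode


if __name__ == "__main__":
    # Only prints the command; call run_inspect("config.yaml") to execute it.
    print(" ".join(build_inspect_command("config.yaml", html_report=True)))
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.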

Supported Data Sources

Datachecks supports SQL and search data sources. Below is the list of supported data sources.

| Data Source | Type | Supported |
|---|---|---|
| Postgres | Transactional Database | 👍 |
| MySQL | Transactional Database | 👍 |
| MS SQL Server | Transactional Database | 👍 |
| Oracle | Transactional Database | 👍 |
| DB2 | Transactional Database | 👍 |
| SAP Sybase | Transactional Database | 👍 |
| OpenSearch | Search Engine | 👍 |
| Elasticsearch | Search Engine | 👍 |
| GCP BigQuery | Data Warehouse | 👍 |
| Databricks | Data Warehouse | 👍 |
| Snowflake | Data Warehouse | 👍 |
| AWS Redshift | Data Warehouse | 👍 |

Metric Types

| Validation Functions | Description |
|---|---|
| Reliability | Reliability functions detect whether tables/indices/collections are updating with timely data |
| Numeric Distribution | Numeric Distribution functions detect changes in numeric distributions, e.g. in values, variance, skew and more |
| Uniqueness | Uniqueness functions detect when data constraints are breached, e.g. duplicates or the number of distinct values |
| Completeness | Completeness functions detect when there are missing values in datasets, e.g. null or empty values |
| Validity | Validity functions detect whether data is formatted correctly and represents a valid value |
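To make the metric families above concrete, here is a small plain-Python sketch that computes one example of each over an in-memory list of rows. It is not the Datachecks API or its implementation: the function names, the sample rows, and the email pattern are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from statistics import mean, pvariance

# Hypothetical sample rows; a real check would read from a data source.
ROWS = [
    {"id": 1, "email": "a@example.com", "price": 10.0,
     "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "email": None, "price": 20.0,
     "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"id": 2, "email": "not-an-email", "price": 30.0,
     "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    {"id": 3, "email": "", "price": None,
     "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]

# Deliberately simple pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def present_values(rows, column):
    """Values of a column that are neither None nor the empty string."""
    return [r[column] for r in rows if r.get(column) not in (None, "")]


def null_count(rows, column):
    """Completeness: number of rows where the column is None or empty."""
    return len(rows) - len(present_values(rows, column))


def duplicate_count(rows, column):
    """Uniqueness: number of values beyond the first occurrence of each."""
    counts = Counter(present_values(rows, column))
    return sum(n - 1 for n in counts.values())


def numeric_summary(rows, column):
    """Numeric distribution: mean and population variance of present values."""
    values = present_values(rows, column)
    if not values:
        return {"mean": None, "variance": None}
    return {"mean": mean(values), "variance": pvariance(values)}


def invalid_count(rows, column, pattern):
    """Validity: number of present values that do not fully match the pattern."""
    return sum(1 for v in present_values(rows, column) if not pattern.fullmatch(v))


def freshness(rows, column, now) -> timedelta:
    """Reliability: time elapsed since the most recent timestamp in the column."""
    return now - max(present_values(rows, column))


if __name__ == "__main__":
    reference = datetime(2024, 1, 4, tzinfo=timezone.utc)
    print("null emails:", null_count(ROWS, "email"))
    print("duplicate ids:", duplicate_count(ROWS, "id"))
    print("price summary:", numeric_summary(ROWS, "price"))
    print("invalid emails:", invalid_count(ROWS, "email", EMAIL_PATTERN))
    print("freshness:", freshness(ROWS, "updated_at", reference))
```

In practice each of these would be compared against a threshold, so a metric only raises an alert when it leaves its expected range.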

Overview

[Figure: Datachecks architecture]

What Datachecks does not do

Community & Support

For additional information and help, you can use one of these channels:

  • Slack (Live chat with the team, support, discussions, etc.)
  • GitHub issues (Bug reports, feature requests)

Contributions

🙌 We greatly appreciate contributions, be it a bug fix, a new feature, or documentation!

Check out the contributions guide and open issues.

Datachecks contributors: 💙

Telemetry

Usage Analytics & Data Privacy

License

This project is licensed under the terms of the Apache 2.0 License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dcs_core-0.9.2.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dcs_core-0.9.2-py3-none-any.whl (1.2 MB)

Uploaded Python 3

File details

Details for the file dcs_core-0.9.2.tar.gz.

File metadata

  • Download URL: dcs_core-0.9.2.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.11 Linux/6.11.0-1018-azure

File hashes

Hashes for dcs_core-0.9.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7b73c65ac3c06a2b3b44aafc32dfc218a31d31d556b0e075b9e83f9a568db5a |
| MD5 | 7205a79777d8a71ee6e073391a9844a6 |
| BLAKE2b-256 | 6bb4c78308048966834d78e5278a69e7867f0ee6561b5beb06f61d6145f29bf4 |

See more details on using hashes here.

File details

Details for the file dcs_core-0.9.2-py3-none-any.whl.

File metadata

  • Download URL: dcs_core-0.9.2-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.11 Linux/6.11.0-1018-azure

File hashes

Hashes for dcs_core-0.9.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6dd089c1affb9a5c83accf08858d9c12fd2e6fd8b30663e52f8ea8f9ac4d612a |
| MD5 | 11e5fd5731323eea3af6d90116ab3ed6 |
| BLAKE2b-256 | 164fe795e87f62121387d2be4a221ee44dd348a0c6bfbd19b710051b70616c62 |

See more details on using hashes here.
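A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib`. The sketch below assumes the wheel has been saved in the current directory under the file name shown above; the helper names `sha256_of` and `matches_published` are hypothetical and exist only for this example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # File name and digest copied from the wheel's "File hashes" entry above.
    wheel = Path("dcs_core-0.9.2-py3-none-any.whl")
    published = "6dd089c1affb9a5c83accf08858d9c12fd2e6fd8b30663e52f8ea8f9ac4d612a"
    if wheel.exists():
        print("digest matches:", matches_published(wheel, published))
    else:
        print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size.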
