Open Source Data Quality Monitoring
Datachecks
What is datachecks?
Datachecks is an open-source data quality monitoring tool. It monitors the quality of the data moving through data pipelines and helps identify data quality issues in databases and pipelines.
Getting Started
Install datachecks
Install datachecks with the command that is specific to the database.
Postgres
pip install datachecks 'datachecks[Postgres]' -U
OpenSearch
pip install datachecks 'datachecks[OpenSearch]' -U
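After running one of the commands above, one way to confirm that the package is present is to ask Python's packaging metadata for its version. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of datachecks.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "datachecks") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"datachecks {found}" if found else "datachecks is not installed")
```

The check reads distribution metadata only, so it does not import datachecks or open any database connection.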
Running Datachecks
Datachecks runs from the command line. The command line interface takes a config file as input; the config file declares the data sources and the metrics to monitor.
datachecks inspect -C config.yaml
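If the check needs to be triggered from a script or scheduler rather than typed by hand, the same command can be wrapped with the standard `subprocess` module. The sketch below assumes that `datachecks` is on the `PATH`; the helper names `build_inspect_command` and `run_inspect` are hypothetical, and no particular meaning is assumed for the exit code beyond what the CLI itself reports.

```python
import subprocess
from pathlib import Path
from typing import List


def build_inspect_command(config_path: str) -> List[str]:
    """Build the argument list for `datachecks inspect -C <config>`."""
    return ["datachecks", "inspect", "-C", config_path]


def run_inspect(config_path: str) -> int:
    """Run the CLI and return its exit code (needs datachecks on PATH)."""
    # Fail early with a clear error instead of letting the CLI report it.
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    completed = subprocess.run(build_inspect_command(config_path), check=False)
    return completed.returncode
```

Calling `run_inspect("config.yaml")` would then execute the command shown above. Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.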
Example Config
Data Source Configuration
Declare the data sources in the data_sources section of the config file.
The data sources can be of type postgres or opensearch.
data_sources:
  - name: search
    type: opensearch
    connection:
      host: 127.0.0.1
      port: 9201
      username: admin
      password: admin
  - name: content
    type: postgres
    connection:
      host: 127.0.0.1
      port: 5431
      username: postgres
      password: changeme
      database: postgres
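A small pre-flight check can catch typos in this section before the tool runs. The sketch below is illustrative only: it works on the parsed config as plain Python data, the function `validate_data_sources` is hypothetical, and the set of required connection keys is an assumption taken from the example above, not a statement of what datachecks itself requires.

```python
from typing import Any, Dict, List

SUPPORTED_TYPES = {"postgres", "opensearch"}  # the two types named in the text
# Assumed required keys, taken from the example; the real tool may differ.
REQUIRED_CONNECTION_KEYS = {"host", "port", "username", "password"}


def validate_data_sources(sources: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    seen_names = set()
    for index, source in enumerate(sources):
        name = source.get("name")
        label = name or f"entry {index}"
        if not name:
            problems.append(f"{label}: missing 'name'")
        elif name in seen_names:
            problems.append(f"{label}: duplicate name")
        seen_names.add(name)
        if source.get("type") not in SUPPORTED_TYPES:
            problems.append(f"{label}: unsupported type {source.get('type')!r}")
        missing = REQUIRED_CONNECTION_KEYS - set(source.get("connection") or {})
        if missing:
            problems.append(f"{label}: connection lacks {sorted(missing)}")
    return problems


# Same content as the YAML example, written as plain Python data.
example = [
    {"name": "search", "type": "opensearch",
     "connection": {"host": "127.0.0.1", "port": 9201,
                    "username": "admin", "password": "admin"}},
    {"name": "content", "type": "postgres",
     "connection": {"host": "127.0.0.1", "port": 5431, "username": "postgres",
                    "password": "changeme", "database": "postgres"}},
]
print(validate_data_sources(example))  # prints []
```

Returning a list of problems instead of raising on the first one lets a single run report every issue in the file.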
Metric Configuration
Metrics are defined in the metrics section of the config file.
metrics:
  content:
    count_content_hat:
      metric_type: row_count
      table: table_1
      filter:
        sql_query: "category = 'HAT' AND is_valid is True"
    count_content_non_valid:
      metric_type: row_count
      table: table_1
      filter:
        sql_query: "is_valid is False"
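To make the two metrics concrete, the sketch below shows what a row_count with a filter plausibly amounts to: a `SELECT COUNT(*)` with the filter as the `WHERE` clause. This is an assumption about the semantics, not a description of how datachecks computes the value. An in-memory SQLite database stands in for the Postgres source only so that the example runs anywhere (the `is True` / `is False` syntax needs SQLite 3.23 or later); the helper `row_count` and the sample rows are hypothetical.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str, sql_filter: str = "") -> int:
    """Count rows in `table`, optionally restricted by a WHERE fragment.

    `table` and `sql_filter` are interpolated as-is, so they must come from
    a trusted config file, never from user input.
    """
    query = f"SELECT COUNT(*) FROM {table}"
    if sql_filter:
        query += f" WHERE {sql_filter}"
    return conn.execute(query).fetchone()[0]


# In-memory SQLite stands in for the Postgres source named `content`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (category TEXT, is_valid BOOLEAN)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [("HAT", 1), ("HAT", 0), ("SHOE", 1), ("HAT", 1)],  # made-up sample rows
)

count_content_hat = row_count(
    conn, "table_1", "category = 'HAT' AND is_valid is True"
)
count_content_non_valid = row_count(conn, "table_1", "is_valid is False")
print(count_content_hat, count_content_non_valid)  # prints: 2 1
```

With these sample rows, two of the three HAT rows are valid and one row in the whole table is invalid, which gives the two counts printed at the end.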
Hashes for datachecks-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 60ca635dd69ca22b06eb3b8a9caa469e73eadef9d70e65d5e773b929d9f58d42
MD5 | a7fe23db093a0e9aab25e8a3f86f9114
BLAKE2b-256 | 30530d9b4fb85966b3780d6bd1b7aa16412d291ad28567ebc6215bb3d51349f7