
Data quality inspection tool. Identify issues before your CTO detects them!

Project description

data_watchtower

A data monitoring and validation tool.

Find the problems before your CTO does.

Installation

pip install data-watchtower

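Data loaders

A data loader loads data into memory for the validators to use.

Validators

A validator checks whether the data loaded by a data loader meets expectations.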

Built-in validators

  • ExpectColumnValuesToNotBeNull
  • ExpectColumnRecentlyUpdated
  • ExpectColumnStdToBeBetween
  • ExpectColumnMeanToBeBetween
  • ExpectColumnNullRatioToBeBetween
  • ExpectRowCountToBeBetween
  • ExpectColumnDistinctValuesToContainSet
  • ExpectColumnDistinctValuesToEqualSet
  • ExpectColumnDistinctValuesToBeInSet
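
To make the validator names above more concrete, here is a minimal sketch in plain Python of the kind of check that names such as ExpectColumnNullRatioToBeBetween and ExpectRowCountToBeBetween suggest. It is not the library's implementation: the helpers `null_ratio` and `is_between` are hypothetical, and treating a bound of `None` as "unbounded" is an assumption based on the `max_value=None` argument in the example section.

```python
from typing import Any, Optional, Sequence


def null_ratio(values: Sequence[Any]) -> float:
    """Return the fraction of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def is_between(value: float,
               min_value: Optional[float],
               max_value: Optional[float]) -> bool:
    """Inclusive range check; a bound of None is treated as unbounded (assumption)."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


# A hypothetical "name" column with one missing value out of four.
names = ["alice", None, "bob", "carol"]
ratio = null_ratio(names)

print(ratio)                            # 0.25
print(is_between(ratio, 0.0, 0.1))      # False: too many nulls
print(is_between(len(names), 3, None))  # True: at least 3 rows, no upper bound
```
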

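Custom loaders

...

Custom macros let you reference user-defined variables, such as dates or values from a configuration file, inside a monitoring item.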

Where macros take effect

  • The Watchtower name
  • Validator parameters
  • Data loader parameters
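
As a rough illustration of how `${name}` placeholders could be expanded, the sketch below uses the standard-library `string.Template`, which happens to use the same `${name}` syntax. This is an assumption made for illustration only; the library's actual expansion mechanism may differ. The helper `expand_macros` is hypothetical, and the macro map mirrors the shape of `custom_macro_map` in the example section (either a plain value or a dict with an `impl` callable). The date is fixed so the output is reproducible.

```python
import datetime
from string import Template


def expand_macros(text, macro_map):
    """Replace ${name} placeholders in text using macro_map (illustrative only)."""
    values = {}
    for name, spec in macro_map.items():
        if isinstance(spec, dict) and callable(spec.get("impl")):
            # Callable macros are evaluated at expansion time.
            values[name] = spec["impl"]()
        else:
            # Plain values are used as-is.
            values[name] = spec
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(text).safe_substitute(values)


macro_map = {
    "today": {"impl": lambda: datetime.date(2024, 4, 1).strftime("%Y-%m-%d")},
    "column": "name",
}

print(expand_macros("SELECT * FROM score WHERE date='${today}'", macro_map))
# SELECT * FROM score WHERE date='2024-04-01'
print(expand_macros("score of ${today}", macro_map))
# score of 2024-04-01
```
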

Custom macros

...

Supported databases

  • MySQL
  • PostgreSQL
  • SQLite
  • ...

Example

import datetime
from data_watchtower import (DbServices, Watchtower, DatabaseLoader,
                             ExpectRowCountToBeBetween, ExpectColumnValuesToNotBeNull)

dw_test_data_db_url = "sqlite:///test.db"
dw_backend_db_url = "sqlite:///data.db"

# Custom macro templates
custom_macro_map = {
    'today': {'impl': lambda: datetime.datetime.today().strftime("%Y-%m-%d")},
    'start_date': '2024-04-01',
    'column': 'name',
}
# Set up the data loader that loads the data to validate
query = "SELECT * FROM score where date='${today}'"
data_loader = DatabaseLoader(query=query, connection=dw_test_data_db_url)
data_loader.load()
# Create the monitoring item
wt = Watchtower(name='score of ${today}', data_loader=data_loader, custom_macro_map=custom_macro_map)
# Add validators
params = ExpectRowCountToBeBetween.Params(min_value=20, max_value=None)
wt.add_validator(ExpectRowCountToBeBetween(params))

params = ExpectColumnValuesToNotBeNull.Params(column='${column}')
wt.add_validator(ExpectColumnValuesToNotBeNull(params))

result = wt.run()
print(result['success'])

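# Save the monitoring configuration and the monitoring results
db_svr = DbServices(dw_backend_db_url)
# Create the tables
db_svr.create_tables()
# Save the monitoring configuration
db_svr.add_watchtower(wt)
# Save the monitoring result
db_svr.save_result(wt, result)
# Recompute the success status of the monitoring item
db_svr.update_watchtower_success_status(wt)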
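
The example above queries a `score` table but does not show its schema. The following is a minimal sketch for seeding such a table with the standard-library `sqlite3` module, so that the query for today's rows has something to return. The helper `seed_score_table` and the columns `name`, `score`, and `date` are assumptions for illustration; only `name` and `date` are referenced in the example.

```python
import datetime
import sqlite3


def seed_score_table(db_path, rows=25):
    """Create a hypothetical score table and insert rows dated today.

    Returns the number of rows dated today after seeding.
    """
    today = datetime.date.today().strftime("%Y-%m-%d")
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS score (name TEXT, score REAL, date TEXT)"
            )
            conn.executemany(
                "INSERT INTO score (name, score, date) VALUES (?, ?, ?)",
                [(f"student_{i}", 60.0 + i, today) for i in range(rows)],
            )
        cursor = conn.execute(
            "SELECT COUNT(*) FROM score WHERE date = ?", (today,)
        )
        return cursor.fetchone()[0]
    finally:
        conn.close()


# An in-memory database keeps this demo free of side effects.
print(seed_score_table(":memory:"))  # 25
```

Passing `"test.db"` instead of `":memory:"` would write to the file that the `sqlite:///test.db` URL in the example appears to point at. With 25 rows, a row-count check with `min_value=20` would be expected to pass.
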
