
Enterprise-grade data quality framework with YAML configuration, LLM-friendly design, and advanced statistical validation


Weiser

Data Quality Framework

Introduction

Weiser is a data quality framework that helps you ensure the integrity and accuracy of your data. You define checks in a YAML configuration file, and Weiser runs them against your datasources to validate the data and detect anomalies. A dashboard visualizes the check results.

Installation

To install Weiser, use the following command:

pip install weiser-ai

Usage

Run example checks

Connections are defined in the datasources section of the config file (see examples/example.yaml).

Run checks in verbose mode:

weiser run examples/example.yaml -v

Watch the CLI Demo

Compile the checks without running them, in verbose mode:

weiser compile examples/example.yaml -v

Run the dashboard

cd weiser-ui
pip install -r requirements.txt
streamlit run app.py

Watch the Dashboard Demo

Configuration

Simple row count check definition

- name: test row_count
  dataset: orders
  type: row_count
  condition: gt
  threshold: 0
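Conceptually, a row_count check counts the rows in the dataset and compares the result against threshold using condition. The sketch below illustrates that idea with Python's built-in sqlite3 module standing in for a real datasource. The orders table, its sample rows, the CONDITIONS mapping, and the evaluate helper are all hypothetical; this is not Weiser's implementation, and Weiser's set of condition names may differ.

```python
import operator
import sqlite3

# Hypothetical mapping from condition names to comparisons; Weiser's own
# operator set and semantics may differ.
CONDITIONS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}


def evaluate(value, condition, threshold):
    """Return True when `value <condition> threshold` holds."""
    return CONDITIONS[condition](value, threshold)


# In-memory SQLite database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, budgeted_amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# row_count: count every row of the dataset, then compare to the threshold.
(row_count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
passed = evaluate(row_count, "gt", 0)
print(row_count, passed)  # 2 True
```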

Custom SQL measure definition

- name: test numeric
  dataset: orders
  type: numeric
  measure: sum(budgeted_amount::numeric::float)
  condition: gt
  threshold: 0
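With type: numeric, the measure is a raw SQL aggregate expression. A minimal sketch, assuming the expression is simply placed in the SELECT list of a query over the dataset, looks like this. The build_measure_query helper and the sample data are hypothetical, and because SQLite lacks PostgreSQL's :: cast syntax, CAST(... AS REAL) stands in for budgeted_amount::numeric::float.

```python
import sqlite3


def build_measure_query(measure, dataset):
    """Hypothetical helper: place a raw SQL aggregate into a SELECT list.

    The measure comes from trusted configuration, not user input, which is
    why plain string interpolation is acceptable in this sketch.
    """
    return f"SELECT {measure} AS value FROM {dataset}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, budgeted_amount TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "10.5"), (2, "4.5")])

# CAST(...) replaces the PostgreSQL-style `::numeric::float` used in the YAML.
query = build_measure_query("SUM(CAST(budgeted_amount AS REAL))", "orders")
(value,) = conn.execute(query).fetchone()
print(query)
print(value, value > 0)  # 15.0 True
```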

Target multiple datasets with the same check definition

- name: test row_count
  dataset: [orders, vendors]
  type: row_count
  condition: gt
  threshold: 0
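One way to think about a list-valued dataset is that the single definition fans out into one independent check per dataset. The sketch below shows that expansion on a plain Python dict mirroring the YAML above; the expand_datasets function is hypothetical and only illustrates the idea, not how Weiser processes its configuration.

```python
def expand_datasets(check):
    """Hypothetical: turn one check with a list of datasets into one check per dataset."""
    datasets = check["dataset"]
    # A single dataset (including a multi-line SQL string) is treated as a list of one.
    if isinstance(datasets, str):
        datasets = [datasets]
    # Copy the definition for each dataset without mutating the original dict.
    return [{**check, "dataset": name} for name in datasets]


check = {
    "name": "test row_count",
    "dataset": ["orders", "vendors"],
    "type": "row_count",
    "condition": "gt",
    "threshold": 0,
}

for expanded in expand_datasets(check):
    print(expanded["name"], "->", expanded["dataset"])
```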

Check each group-by value individually

- name: test row_count groupby
  dataset: vendors
  type: row_count
  dimensions:
    - tenant_id
  condition: gt
  threshold: 0
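Listing dimensions turns the metric into a GROUP BY query, so each value of tenant_id is evaluated on its own. The SQLite sketch below illustrates that idea with hypothetical vendors data. Note that with gt 0 every returned group passes, since GROUP BY only yields groups that have rows; the threshold is raised to 1 here (a hypothetical value) so that one group fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id INTEGER, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)",
    [(1, "acme"), (2, "acme"), (3, "globex")],
)

# Hypothetical threshold, raised from 0 so that one group fails.
threshold = 1

# One row count per tenant_id; each group is compared to the threshold
# separately, so a single failing tenant is reported on its own.
rows = conn.execute(
    "SELECT tenant_id, COUNT(*) FROM vendors GROUP BY tenant_id ORDER BY tenant_id"
).fetchall()
results = {tenant: count > threshold for tenant, count in rows}
print(rows)     # [('acme', 2), ('globex', 1)]
print(results)  # {'acme': True, 'globex': False}
```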

Time aggregation check with granularity

- name: test numeric gt sum yearly
  dataset: orders
  type: sum
  measure: budgeted_amount::numeric::float
  condition: gt
  threshold: 0
  time_dimension:
    name: _updated_at
    granularity: year
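A time_dimension with a granularity presumably buckets rows by the truncated timestamp and evaluates the aggregate per bucket. The sketch below shows yearly buckets in SQLite, where strftime('%Y', ...) stands in for something like PostgreSQL's date_trunc('year', ...). The table contents are hypothetical, and how Weiser actually truncates timestamps is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (budgeted_amount REAL, _updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10.0, "2023-03-01"), (5.0, "2023-11-20"), (7.5, "2024-01-15")],
)

# Truncate the time dimension to the year, then sum the measure per bucket.
rows = conn.execute(
    """
    SELECT strftime('%Y', _updated_at) AS period, SUM(budgeted_amount) AS value
    FROM orders
    GROUP BY period
    ORDER BY period
    """
).fetchall()

# Each yearly bucket is compared to the threshold (gt 0) independently.
for period, value in rows:
    print(period, value, value > 0)
```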

Custom SQL expression as the dataset, with a filter

- name: test numeric completed
  dataset: >
    SELECT * FROM orders o LEFT JOIN orders_status os ON o.order_id = os.order_id
  type: numeric
  measure: sum(budgeted_amount::numeric::float)
  condition: gt
  threshold: 0
  filter: status = 'FULFILLED'
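One plausible way to combine a custom SQL dataset with a filter is to wrap the dataset SQL as a subquery and append the filter as a WHERE clause. This is an assumption about how such a query could be composed, not a description of the SQL Weiser generates. The build_query helper and the sample data are hypothetical, and the dataset SQL lists its columns explicitly (instead of SELECT *) to keep column names unique inside the subquery.

```python
import sqlite3


def build_query(measure, dataset_sql, filter_sql=None):
    """Hypothetical helper: wrap the dataset SQL as a subquery and apply the filter."""
    query = f"SELECT {measure} AS value FROM ({dataset_sql}) AS ds"
    if filter_sql:
        query += f" WHERE {filter_sql}"
    return query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, budgeted_amount REAL)")
conn.execute("CREATE TABLE orders_status (order_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 50.0), (3, 20.0)])
# Order 3 has no status row, so the LEFT JOIN yields NULL and the filter drops it.
conn.executemany("INSERT INTO orders_status VALUES (?, ?)", [(1, "FULFILLED"), (2, "PENDING")])

dataset_sql = (
    "SELECT o.order_id, o.budgeted_amount, os.status "
    "FROM orders o LEFT JOIN orders_status os ON o.order_id = os.order_id"
)
query = build_query("SUM(budgeted_amount)", dataset_sql, "status = 'FULFILLED'")
(value,) = conn.execute(query).fetchone()
print(value, value > 0)
```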

Missing values check

- name: customer data quality
  dataset: orders
  type: not_empty
  dimensions: ["customer_id", "product_id", "order_date"]
  condition: le
  # Allow up to 5 NULL values per dimension
  threshold: 5
  filter: "status = 'active'"
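For a not_empty check, each listed column gets its own NULL count, restricted by the filter and compared using condition (here, at most 5 NULLs). The SQLite sketch below computes those counts as COUNT(*) - COUNT(column), since COUNT(column) skips NULLs. The table contents are hypothetical, and whether Weiser counts NULLs this exact way is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, product_id INTEGER, order_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2024-01-01", "active"),
        (None, 11, "2024-01-02", "active"),
        (None, None, "2024-01-03", "active"),
        (3, 12, None, "active"),
        (None, None, None, "cancelled"),  # excluded by the filter
    ],
)

columns = ["customer_id", "product_id", "order_date"]
threshold = 5  # condition: le
null_counts = {}
for column in columns:
    # COUNT(*) counts rows, COUNT(column) skips NULLs; the difference is the NULL count.
    (nulls,) = conn.execute(
        f"SELECT COUNT(*) - COUNT({column}) FROM orders WHERE status = 'active'"
    ).fetchone()
    null_counts[column] = nulls
    print(column, nulls, nulls <= threshold)
```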

Anomaly detection check

- name: test anomaly
  # Anomaly checks should always target the metrics metadata dataset.
  dataset: metrics
  type: anomaly
  # ID of the check to monitor (here, the orders row_count check).
  check_id: c5cee10898e30edd1c0dde3f24966b4c47890fcf247e5b630c2c156f7ac7ba22
  condition: between
  # Z-score bounds: values outside [-3.5, 3.5] fall in the long tails of the normal distribution.
  threshold: [-3.5, 3.5]
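The YAML comments suggest the anomaly check scores the latest value of the referenced metric as a Z-score against its history and passes when the score lies between the two bounds. The sketch below shows that calculation with Python's statistics module. The history values are hypothetical, and the baseline window, the use of the sample standard deviation, and inclusive bounds are all assumptions rather than documented Weiser behavior.

```python
import statistics


def z_score(history, latest):
    """Standard score of `latest` relative to past metric values."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs >= 2 points
    if stdev == 0:
        raise ValueError("history has no variance; Z-score is undefined")
    return (latest - mean) / stdev


# Hypothetical past row counts recorded for the referenced check.
history = [100, 102, 98, 101, 99, 100]
low, high = -3.5, 3.5  # condition: between (assumed inclusive)

for latest in (103, 140):
    z = z_score(history, latest)
    print(latest, round(z, 2), low <= z <= high)
```

A value of 103 stays well inside the bounds, while a jump to 140 scores far outside them and would be flagged.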

Contributing

We welcome contributions!

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for more details.



Download files

Source Distribution

weiser_ai-0.1.14.tar.gz (36.8 kB)

Built Distribution

weiser_ai-0.1.14-py3-none-any.whl (27.5 kB)

File details

Details for the file weiser_ai-0.1.14.tar.gz.

File metadata

  • Download URL: weiser_ai-0.1.14.tar.gz
  • Size: 36.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for weiser_ai-0.1.14.tar.gz

Algorithm    Hash digest
SHA256       a7b3ac3ee11e3a9de4aa6df5490c436c9a922c16f2b8c89904aaac2460b54d05
MD5          739445a0520f3b815e93c4fc49c115bc
BLAKE2b-256  022d47525c247be2f8efc2c55fdf325f4ed38ccecff43c7f8025ff9935cf8b1d


Provenance

The following attestation bundles were made for weiser_ai-0.1.14.tar.gz:

Publisher: publish.yaml on weiser-ai/weiser-ai


File details

Details for the file weiser_ai-0.1.14-py3-none-any.whl.

File metadata

  • Download URL: weiser_ai-0.1.14-py3-none-any.whl
  • Size: 27.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for weiser_ai-0.1.14-py3-none-any.whl

Algorithm    Hash digest
SHA256       b08465bd1393a5e384a8f077db6477bb16e2a1f2174302f0a7211260268bd7ed
MD5          d0519866a29b40432366f752994b9134
BLAKE2b-256  51a993c28e3a3ec6716890f925485b92c20a70976bfce8e7ec7276cf4ce5bfca


Provenance

The following attestation bundles were made for weiser_ai-0.1.14-py3-none-any.whl:

Publisher: publish.yaml on weiser-ai/weiser-ai

