
dq_tool

Data Quality Tool, built on top of Great Expectations.

Demo

To see DQ Tool in action, or to show it to someone, use the Demo Guide.

Build

DQ Tool uses Poetry for dependency management and wheel building. Please follow the Poetry installation notes first, then run:

poetry build

The wheel will end up in the dist folder.

Databricks Installation

As of now, only Databricks runtime 7.x is supported; there have been issues installing the package on 6.x. However, if you need to use 6.x, get in touch and we'll figure it out.
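
If you want a notebook to warn early when it runs on an unsupported runtime, a check like the one below can help. This is a minimal sketch, assuming the cluster exposes the runtime version through the DATABRICKS_RUNTIME_VERSION environment variable (e.g. "7.3"); the helper `is_supported_runtime` is hypothetical and not part of dq_tool.

```python
import os
from typing import Optional


def is_supported_runtime(version: Optional[str] = None) -> bool:
    """Return True if the Databricks runtime major version is 7.

    Assumption: when no version string is passed, the version is read from
    the DATABRICKS_RUNTIME_VERSION environment variable.
    """
    if version is None:
        version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if not version:
        # Not on Databricks, or the variable is not set.
        return False
    # Compare only the major version, so "7.0" through "7.6" all pass.
    major = version.split(".", 1)[0]
    return major == "7"
```

For example, `if not is_supported_runtime(): print("dq_tool is only supported on Databricks runtime 7.x")`.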

Install dq_tool from the wheel you built, either as a cluster library or as a notebook-scoped library.

Storing Expectations

We support two approaches to storing expectations: in a database or in notebooks. The two approaches can be combined.

Expectation Store

Expectations can be stored in an external database, which holds both expectation definitions and validation results. Validation results can be viewed in our frontend. For the infrastructure setup, see our Deployment Guide.

Usage - Expectation Store

Start with the following code to check that you can connect to the database. Replace the host, port, database, username and password with the credentials for your database. We strongly recommend storing the password securely, in Databricks secrets (dbutils.secrets) or Azure Key Vault.

Running this code also creates the database schema if it's not there yet.

from dq_tool import DQTool
# `spark` and `dbutils` are predefined in Databricks notebooks.
dq_tool = DQTool(
    spark=spark,
    db_store_connection={
        'drivername': 'postgresql',
        'host': 'apostgres.postgres.database.azure.com',
        'port': '5432',
        'database': 'postgres',
        'username': 'postgres@apostgres',
        # Read the password from a Databricks secret scope instead of hard-coding it.
        'password': dbutils.secrets.get(scope='dq_tool', key='postgres_store_password')
    }
)
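
The connection dict passed as db_store_connection uses the keys drivername, host, port, database, username and password. If you connect from several notebooks, a small helper can assemble it from a secret lookup so the password never appears in notebook source. This is a minimal sketch: `make_store_connection` and its parameters are hypothetical, not part of dq_tool, and the defaults simply mirror the example above.

```python
from typing import Callable, Dict

# Keys used by the db_store_connection example above.
REQUIRED_KEYS = ("drivername", "host", "port", "database", "username", "password")


def make_store_connection(
    get_secret: Callable[[str, str], str],
    host: str,
    database: str,
    username: str,
    port: int = 5432,
    scope: str = "dq_tool",
    key: str = "postgres_store_password",
) -> Dict[str, str]:
    """Build a db_store_connection dict without hard-coding the password."""
    connection = {
        "drivername": "postgresql",
        "host": host,
        "port": str(port),  # the example above passes the port as a string
        "database": database,
        "username": username,
        "password": get_secret(scope, key),  # looked up at runtime
    }
    # Fail fast on empty values instead of letting the connection attempt fail later.
    missing = [k for k in REQUIRED_KEYS if not connection[k]]
    if missing:
        raise ValueError(f"missing connection values: {missing}")
    return connection
```

In a Databricks notebook you would pass `get_secret=lambda scope, key: dbutils.secrets.get(scope=scope, key=key)` and hand the result to `DQTool(spark=spark, db_store_connection=...)`.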

See the expectation store guide for details on how to use the store.

Expectations in Notebooks

Expectation definitions can also be stored in notebooks as Python dicts or code.
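
To illustrate what a dict-based definition expresses, the sketch below assumes the shape of a Great Expectations expectation configuration (an "expectation_type" plus "kwargs"); the exact format dq_tool accepts is described in the notebook expectations guide. The `check_not_null` helper is hypothetical and only evaluates the definition on plain Python rows; dq_tool itself validates Spark data through Great Expectations.

```python
from typing import Any, Dict, List

# Hypothetical definition, shaped like a Great Expectations expectation configuration.
not_null_id: Dict[str, Any] = {
    "expectation_type": "expect_column_values_to_not_be_null",
    "kwargs": {"column": "id"},
}


def check_not_null(rows: List[Dict[str, Any]], expectation: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate a not-null expectation on a list of row dicts (illustration only)."""
    if expectation["expectation_type"] != "expect_column_values_to_not_be_null":
        raise ValueError("only expect_column_values_to_not_be_null is handled here")
    column = expectation["kwargs"]["column"]
    # A missing key counts as null, as does an explicit None.
    unexpected = [row for row in rows if row.get(column) is None]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}
```

For example, `check_not_null([{"id": 1}, {"id": None}], not_null_id)` reports one unexpected row and `success` set to False.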

Usage - no Store

from dq_tool import DQTool
dq_tool = DQTool(spark=spark)

See the notebook expectations guide for details on how to work with expectation definitions in notebooks.

Guides

The following guides apply to expectations stored both in a database and in notebooks.

Expectations with Expressions

See the expressions guide

Custom Expectations

See the custom expectations guide

Profiling (beta)

See the profiling guide


Download files

Source distribution: dq_tool-0.0.4.tar.gz (23.7 kB)

Built distribution: dq_tool-0.0.4-py3-none-any.whl (29.2 kB, Python 3)
