dq_tool

Data Quality Tool, built on top of Great Expectations.

Demo

To see DQ Tool in action, or to show it to someone else, use the Demo Guide.

Build

DQ Tool uses Poetry for dependency management and wheel building. Follow the installation notes first, then build the wheel:

poetry build

The wheel will end up in the dist folder.
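
If you script the upload to Databricks, it can help to pick up the built wheel programmatically. The following is a minimal sketch, assuming the default dist folder next to pyproject.toml; find_latest_wheel is a hypothetical helper for illustration and is not part of dq_tool.

```python
from pathlib import Path
from typing import Optional


def find_latest_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently modified .whl file in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


# Usage: run from the project root after `poetry build`.
wheel = find_latest_wheel()
print(wheel if wheel else "No wheel found; run `poetry build` first.")
```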

Databricks Installation

As of now, only Databricks Runtime 7.x is supported. There have been issues installing the package on 6.x; if you need to use 6.x, get in touch and we'll figure it out.

Install dq_tool from the wheel you built, either on a whole cluster or for a single notebook.

Storing Expectations

We support two approaches to storing your expectations: in a Database or in notebooks. These approaches can be combined.

Expectation Store

Expectations can be stored in an external database. This database can store both expectation definitions and validation results. The validation results can be viewed using our frontend. For the infrastructure setup, see our Deployment Guide.

Usage - Expectation Store

Start with the following code to check that you can connect to the database. Replace the host, port, database, username and password with the credentials for your database. We highly recommend storing your password securely, for example in Databricks secrets (dbutils.secrets) or Azure Key Vault.

Running this code also creates the database schema if it's not there yet.

from dq_tool import DQTool
dq_tool = DQTool(
    spark=spark,
    db_store_connection={
        'drivername': 'postgresql',
        'host': 'apostgres.postgres.database.azure.com',
        'port': '5432',
        'database': 'postgres',
        'username': 'postgres@apostgres',
        'password': dbutils.secrets.get(scope='dq_tool', key='postgres_store_password')
    }
)
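
To keep the connection settings in one place and make sure the password is only ever read from a secret store, the dict can be assembled by a small helper. This is a sketch under assumptions: build_store_connection and its get_password callable are hypothetical names introduced here, not part of dq_tool. Only the dict keys are taken from the example above.

```python
from typing import Callable, Dict


def build_store_connection(
    host: str,
    database: str,
    username: str,
    get_password: Callable[[], str],
    port: str = "5432",
    drivername: str = "postgresql",
) -> Dict[str, str]:
    """Assemble a db_store_connection dict with the same keys as the example above."""
    return {
        "drivername": drivername,
        "host": host,
        "port": port,
        "database": database,
        "username": username,
        # The password is fetched when the dict is built, so it never
        # appears as a literal in the notebook source.
        "password": get_password(),
    }


# In a Databricks notebook, the callable would typically wrap dbutils.secrets:
#
# connection = build_store_connection(
#     host='apostgres.postgres.database.azure.com',
#     database='postgres',
#     username='postgres@apostgres',
#     get_password=lambda: dbutils.secrets.get(
#         scope='dq_tool', key='postgres_store_password'
#     ),
# )
# dq_tool = DQTool(spark=spark, db_store_connection=connection)
```

Passing a callable rather than the password itself also makes the helper easy to exercise outside Databricks, where dbutils is not available.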

See the expectation store guide for details on how to use the store.

Expectations in Notebooks

Expectation definitions can also be stored in notebooks as Python dicts or code.
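
As an illustration, a dict-based definition might look like the sketch below. It assumes the layout follows the Great Expectations expectation configuration format (an expectation_type plus kwargs); the exact shape dq_tool accepts is covered in the notebook expectations guide. The column names and the is_well_formed helper are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical definitions for a table with `id` and `age` columns.
# The expectation names are standard Great Expectations expectations.
expectation_definitions: List[Dict[str, Any]] = [
    {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {"column": "id"},
    },
    {
        "expectation_type": "expect_column_values_to_be_between",
        "kwargs": {"column": "age", "min_value": 0, "max_value": 120},
    },
]


def is_well_formed(definition: Dict[str, Any]) -> bool:
    """Check the two keys this sketch assumes: a string type and a kwargs dict."""
    return (
        isinstance(definition.get("expectation_type"), str)
        and isinstance(definition.get("kwargs"), dict)
    )


# A quick sanity check before handing the definitions to any validation step.
malformed = [d for d in expectation_definitions if not is_well_formed(d)]
print(f"{len(expectation_definitions)} definitions, {len(malformed)} malformed")
```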

Usage - no Store

from dq_tool import DQTool
dq_tool = DQTool(spark=spark)

See the notebook expectations guide for details on how to work with expectation definitions in notebooks.

Guides

The following guides can be used both for expectations stored in a database and in a notebook.

Expectations with Expressions

See the expressions guide

Custom Expectations

See the custom expectations guide

Profiling (beta)

See the profiling guide
