dq_tool
Data Quality Tool. Built on top of Great Expectations
Demo
To see DQ Tool in action, or to show it to someone else, use the Demo Guide.
Build
DQ Tool uses Poetry for dependency management and wheel building. Follow the installation notes first, then build the wheel:
poetry build
The wheel will end up in the dist folder.
Databricks Installation
Currently, only Databricks runtime 7.x is supported. There have been issues installing the package on 6.x. If you need to use 6.x, get in touch and we'll figure it out.
Install dq_tool from the wheel you built, either on a cluster or just for a single notebook.
Storing Expectations
We support two approaches to storing your expectations: in a Database or in notebooks. These approaches can be combined.
Expectation Store
Expectations can be stored in an external database. This database can hold both expectation definitions and validation results. The validation results can be viewed using our frontend. For the infrastructure setup, see our Deployment Guide.
Usage - Expectation Store
Start with the following code to check that you can connect to the database. Replace the host, port, database, username and password with the values for your database. We highly recommend storing your password securely, in dbutils secrets or Azure Key Vault.
Running this code also creates the database schema if it's not there yet.
from dq_tool import DQTool

dq_tool = DQTool(
    spark=spark,
    db_store_connection={
        'drivername': 'postgresql',
        'host': 'apostgres.postgres.database.azure.com',
        'port': '5432',
        'database': 'postgres',
        'username': 'postgres@apostgres',
        'password': dbutils.secrets.get(scope='dq_tool', key='postgres_store_password')
    }
)
See the expectation store guide for details on how to use the store.
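The connection settings above can be assembled and checked before they are handed to DQTool. The following is a minimal sketch, not part of dq_tool: `build_store_connection` and `REQUIRED_KEYS` are hypothetical names, and the key list is simply copied from the example above. The password is fetched through a callable so the same code runs both in a Databricks notebook (with `dbutils.secrets.get`) and elsewhere (with a stand-in).

```python
from typing import Callable, Dict

# Keys copied from the db_store_connection example above.
REQUIRED_KEYS = ('drivername', 'host', 'port', 'database', 'username', 'password')


def build_store_connection(
    host: str,
    database: str,
    username: str,
    get_password: Callable[[], str],
    drivername: str = 'postgresql',
    port: str = '5432',
) -> Dict[str, str]:
    """Hypothetical helper: assemble the connection dict and fail early on blanks."""
    connection = {
        'drivername': drivername,
        'host': host,
        'port': str(port),
        'database': database,
        'username': username,
        'password': get_password(),
    }
    missing = [key for key in REQUIRED_KEYS if not connection[key]]
    if missing:
        raise ValueError(f"Empty connection settings: {', '.join(missing)}")
    return connection


# Outside Databricks, a stand-in secret getter keeps the sketch runnable.
connection = build_store_connection(
    host='apostgres.postgres.database.azure.com',
    database='postgres',
    username='postgres@apostgres',
    get_password=lambda: 'example-password',
)

# In a Databricks notebook (assumption: `spark` and `dbutils` are provided),
# pass get_password=lambda: dbutils.secrets.get(
#     scope='dq_tool', key='postgres_store_password')
# and then call: DQTool(spark=spark, db_store_connection=connection)
```

Keeping the secret lookup behind a callable means the password never has to appear as a literal in the notebook.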
Expectations in Notebooks
Expectation definitions can also be stored in notebooks as Python dicts or code.
Usage - No Store
from dq_tool import DQTool
dq_tool = DQTool(spark=spark)
See the notebook expectations guide for details on how to work with expectation definitions in notebooks.
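To give a rough idea of what a dict-based definition might look like, the sketch below uses the `expectation_type` / `kwargs` layout found in Great Expectations suite JSON. Whether dq_tool accepts exactly this shape is an assumption; the notebook expectations guide is the authoritative source. The column names and the `check_definitions` helper are hypothetical and only illustrate a quick sanity check you could run in a notebook cell.

```python
from typing import Any, Dict, List

# Assumed layout: expectation_type plus kwargs, as in Great Expectations suite JSON.
# The column names are made up for illustration.
expectations: List[Dict[str, Any]] = [
    {
        'expectation_type': 'expect_column_values_to_not_be_null',
        'kwargs': {'column': 'customer_id'},
    },
    {
        'expectation_type': 'expect_column_values_to_be_between',
        'kwargs': {'column': 'age', 'min_value': 0, 'max_value': 120},
    },
]


def check_definitions(definitions: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical sanity check: return a description of each problem found."""
    problems = []
    for index, definition in enumerate(definitions):
        expectation_type = str(definition.get('expectation_type', ''))
        if not expectation_type.startswith('expect_'):
            problems.append(f'definition {index}: missing or malformed expectation_type')
        if not isinstance(definition.get('kwargs'), dict):
            problems.append(f'definition {index}: kwargs must be a dict')
    return problems


problems = check_definitions(expectations)
```

An empty `problems` list means every definition has the two expected fields; it says nothing about whether the expectation itself is valid for your data.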
Guides
The following guides can be used both for expectations stored in a database and in a notebook.
Expectations with Expressions
See the expressions guide.
Custom Expectations
See the custom expectations guide.
Profiling (beta)
See the profiling guide.