DataKitchen Inc. Data Quality Engine

Project description

DataOps Data Quality TestGen

DataOps Data Quality TestGen, or "TestGen" for short, helps you find data issues so you can alert your users and notify your suppliers. It delivers simple, fast data quality test generation and execution through data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.

Features

What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and find data issues in new data.

It constantly watches your data for data quality anomalies and lets you drill into problems.

A single place to manage Data Quality across data sets, locations, and teams.

Installation with dk-installer

The dk-installer program installs DataOps Data Quality TestGen.

Install the prerequisite software

Software | Tested Versions | Command to check version
Python | 3.9, 3.10, 3.11, 3.12 | python3 --version
Docker | 25.0.3, 26.1.1 | docker -v
Docker Compose | 2.24.6, 2.27.0, 2.28.1 | docker compose version

Python notes:
- Most Linux and macOS systems have Python pre-installed.
- On Windows machines, you will need to download and install it.
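The version checks above can be combined into one script. The following is a minimal sketch, not part of dk-installer: it treats the lowest tested version of each tool as a floor, which is an assumption, since the table only lists versions that were tested. The helper names `version_ge` and `report` are hypothetical.

```shell
#!/bin/sh
# Lowest tested versions, copied from the prerequisites table.
MIN_PYTHON="3.9"
MIN_DOCKER="25.0.3"

version_ge() {
    # True when version $1 is greater than or equal to version $2.
    # Relies on `sort -V` (version sort), as provided by GNU coreutils.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

report() {
    # $1 = tool label, $2 = detected version, $3 = lowest tested version
    if version_ge "$2" "$3"; then
        echo "$1 $2: at or above lowest tested version $3"
    else
        echo "$1 $2: older than lowest tested version $3"
    fi
}

if command -v python3 >/dev/null 2>&1; then
    # `python3 --version` prints e.g. "Python 3.11.4"; keep field 2.
    report "Python" "$(python3 --version 2>&1 | awk '{print $2}')" "$MIN_PYTHON"
else
    echo "python3 not found"
fi

if command -v docker >/dev/null 2>&1; then
    # `docker -v` prints e.g. "Docker version 26.1.1, build ..."; keep field 3.
    report "Docker" "$(docker -v | awk '{print $3}' | tr -d ',')" "$MIN_DOCKER"
else
    echo "docker not found"
fi
```

A version newer than the tested ones also passes this check, so the output is a hint rather than a guarantee of compatibility.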

Download the installer

On Unix-based operating systems, use the following command to download it to the current directory. We recommend creating a new, empty directory.

curl -o dk-installer.py 'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py'
  • Alternatively, you can manually download the dk-installer.py file from the data-observability-installer repository.
  • All commands listed below should be run from the folder containing this file.
  • For usage help and command options, run python3 dk-installer.py --help or python3 dk-installer.py <command> --help.

Install the TestGen application

The installation downloads the latest Docker images for TestGen and deploys a new Docker Compose application. The process may take 5 to 10 minutes depending on your machine and network connection.

python3 dk-installer.py tg install

The --port option may be used to set a custom localhost port for the application (default: 8501).

To enable SSL for HTTPS support, use the --ssl-cert-file and --ssl-key-file options to specify local file paths to your SSL certificate and key files.
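As an illustration of combining these options, the sketch below assembles an install command from a port and optional certificate paths and prints it for review instead of running it. The variable names `TG_PORT`, `TG_SSL_CERT`, and `TG_SSL_KEY` and the helper `build_install_args` are hypothetical; only the `--port`, `--ssl-cert-file`, and `--ssl-key-file` options come from the installer. Paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical settings; replace with your own port and file paths.
TG_PORT="${TG_PORT:-8502}"
TG_SSL_CERT="${TG_SSL_CERT:-}"
TG_SSL_KEY="${TG_SSL_KEY:-}"

build_install_args() {
    # Print the dk-installer arguments for the current settings.
    # SSL options are added only when both files are specified.
    args="tg install --port $TG_PORT"
    if [ -n "$TG_SSL_CERT" ] && [ -n "$TG_SSL_KEY" ]; then
        args="$args --ssl-cert-file $TG_SSL_CERT --ssl-key-file $TG_SSL_KEY"
    fi
    printf '%s\n' "$args"
}

# Print the full command so it can be reviewed before it is run by hand.
echo "python3 dk-installer.py $(build_install_args)"
```

With the defaults above, the script prints an install command that uses port 8502 and no SSL options.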

Once the installation completes, verify that you can log in to the UI with the URL and credentials provided in the output.

Optional: Run the TestGen demo setup

The Data Observability quickstart walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.

python3 dk-installer.py tg run-demo

In the TestGen UI, you will see that new data profiling and test results have been generated.

Installation with pip

Install the prerequisite software

Software | Tested Versions | Command to check version
Python | 3.10, 3.11, 3.12 | python3 --version

Python notes:
- Most Linux and macOS systems have Python pre-installed.
- On Windows machines, you will need to download and install it.

Install the package from a terminal:

pip install dataops-testgen

Set the following environment variables:

TESTGEN_USERNAME=
TESTGEN_PASSWORD=
TG_DECRYPT_SALT=
TG_DECRYPT_PASSWORD=
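Before running any `testgen` command, it can help to confirm that all four variables are set. The sketch below only checks for presence; it makes no assumption about what values TestGen accepts. The helper name `missing_vars` is hypothetical.

```shell
#!/bin/sh
# Variable names are taken from the list above.
REQUIRED_VARS="TESTGEN_USERNAME TESTGEN_PASSWORD TG_DECRYPT_SALT TG_DECRYPT_PASSWORD"

missing_vars() {
    # Print the name of every required variable that is unset or empty.
    for name in $REQUIRED_VARS; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

missing=$(missing_vars)
if [ -n "$missing" ]; then
    echo "Set these variables before running testgen:"
    echo "$missing"
else
    echo "All TestGen environment variables are set."
fi
```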

Optional: Run the TestGen demo setup

Create a database.

With an Observability API key:

testgen quick-start --delete-target-db --observability-api-key <OBSERVABILITY_API_KEY> --observability-api-url <OBSERVABILITY_API_URL>

Without an Observability API key:

testgen quick-start --delete-target-db

Run profile:

testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d

Run test generation:

testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d

Run test execution:

testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1
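The profiling, test generation, and test execution commands above can be chained so that a failing step stops the sequence. This is a minimal sketch using the demo IDs and keys shown in those commands; `TESTGEN_BIN` and `run_demo_cycle` are hypothetical names introduced here, not part of the testgen CLI.

```shell
#!/bin/sh
# Demo values copied from the commands above.
TABLE_GROUP_ID="0ea85e17-acbe-47fe-8394-9970725ad37d"
PROJECT_KEY="DEFAULT"
TEST_SUITE_KEY="default-suite-1"

# Hypothetical override: set TESTGEN_BIN=echo to preview the commands
# without running them.
TESTGEN_BIN="${TESTGEN_BIN:-testgen}"

run_demo_cycle() {
    # Each step runs only if the previous one succeeded.
    "$TESTGEN_BIN" run-profile --table-group-id "$TABLE_GROUP_ID" &&
    "$TESTGEN_BIN" run-test-generation --table-group-id "$TABLE_GROUP_ID" &&
    "$TESTGEN_BIN" run-tests --project-key "$PROJECT_KEY" --test-suite-key "$TEST_SUITE_KEY"
}

if command -v "$TESTGEN_BIN" >/dev/null 2>&1; then
    run_demo_cycle
else
    echo "testgen not found on PATH; install it with pip first." >&2
fi
```

Running the script with `TESTGEN_BIN=echo` prints the three subcommands and their arguments in order, which is a quick way to confirm the IDs before touching the database.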

Simulate a fast forward in the target data:

testgen quick-start --simulate-fast-forward

Before running the UI for the first time, patch Streamlit's internals with the custom static files:

testgen ui patch-streamlit -f

Run the TestGen UI (pip installation)

Start the UI:

testgen ui run

Product Documentation

DataOps Data Quality TestGen

Useful Commands

The dk-installer and docker compose CLIs can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the dk-installer.py and docker-compose.yml files used by the installation.

Remove demo data

After completing the quickstart, you can remove the demo data from the application with the following command.

python3 dk-installer.py tg delete-demo

Upgrade to latest version

New releases of TestGen are announced on the #releases channel on Data Observability Slack, and release notes can be found on the DataKitchen documentation portal. Use the following command to upgrade to the latest released version.

python3 dk-installer.py tg upgrade

Uninstall the application

The following command uninstalls the Docker Compose application and removes all data, containers, and images related to TestGen from your machine.

python3 dk-installer.py tg delete

Access the testgen CLI

The testgen command line can be accessed within the running container.

docker compose exec engine bash

Use exit to return to the regular terminal.

Stop the application

docker compose down

Restart the application

docker compose up -d

What Next?

Getting started guide

We recommend you start by going through the Data Observability Overview Demo.

Support

For support requests, join the Data Observability Slack 👋 and post on the #support channel.

Connect to your database

Follow these instructions to improve the quality of data in your database.

Community

Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.

Join our community here:

Contributing

For details on contributing or running the project for development, check out our contributing guide.

License

DataKitchen's DataOps Data Quality TestGen is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataops-testgen-2.2.0.tar.gz (4.3 MB)

Uploaded Source

Built Distribution

dataops_testgen-2.2.0-py3-none-any.whl (4.4 MB)

Uploaded Python 3

File details

Details for the file dataops-testgen-2.2.0.tar.gz.

File metadata

  • Download URL: dataops-testgen-2.2.0.tar.gz
  • Upload date:
  • Size: 4.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.0

File hashes

Hashes for dataops-testgen-2.2.0.tar.gz

Algorithm | Hash digest
SHA256 | 1aa90d936a4595926140cc3a7f1286170880f20b1368ca06ba04e92be9ba1851
MD5 | e2b647e74ac2c8dd566d56df98d19d0c
BLAKE2b-256 | e8ea7ccd6e3ea9230b696f78d53bb7387900148d7222771f6e67adebec174ab7

See more details on using hashes here.
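A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The sketch below assumes the source distribution has already been saved to the current directory; the helper names `sha256_of` and `verify_sha256` are hypothetical.

```shell
#!/bin/sh
# File name and digest are copied from the hash listing above.
FILE="dataops-testgen-2.2.0.tar.gz"
EXPECTED="1aa90d936a4595926140cc3a7f1286170880f20b1368ca06ba04e92be9ba1851"

sha256_of() {
    # Print the SHA256 hex digest of a file using whichever tool is present.
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

verify_sha256() {
    # Succeed only when the file's digest equals the expected digest.
    [ "$(sha256_of "$1")" = "$2" ]
}

if [ -f "$FILE" ]; then
    if verify_sha256 "$FILE" "$EXPECTED"; then
        echo "OK: $FILE matches the published SHA256"
    else
        echo "MISMATCH: do not install $FILE" >&2
    fi
else
    echo "$FILE not found in the current directory" >&2
fi
```

The same function works for the wheel by swapping in its file name and SHA256 digest.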

File details

Details for the file dataops_testgen-2.2.0-py3-none-any.whl.

File hashes

Hashes for dataops_testgen-2.2.0-py3-none-any.whl

Algorithm | Hash digest
SHA256 | f1351e766893ff44333cebc0bace22c714a7a7cc54c1a84efc18d10fc0de9ce4
MD5 | e5c603bc521b86605b057efb69de9bf8
BLAKE2b-256 | a0e76b96e23036cc15b4d00ba1673eb5578e2ed7d7f5bc96e3e7e66f54f10e38

See more details on using hashes here.
