Versatile Data Kit Heartbeat and Health Test
Versatile Data Kit Heartbeat tool
The heartbeat tool verifies that the deployed SDK and Control Service are functional and working correctly.
It checks that a job can be created, deployed, run and deleted.
It can be used for:
- Health monitoring: schedule the run every few minutes to get a heartbeat of your installation.
- Regression testing after a change in Versatile Data Kit. Because clients can customize vdk, they need a way to check for regressions.
- Acceptance testing after an installation or upgrade of the VDK Control Service.
What does it do?
It simulates a Data Engineer's workflow:
- Creates a data job and downloads its keytab
- Deploys the data job with pre-defined scripts to run on a scheduled basis (every minute)
- Different data jobs and run tests can be run depending on the configuration.
- This way it can be run in different modes. See config.py and the DATAJOB_DIRECTORY_* and JOB_RUN_TEST_* configuration options.
- Undeploys and deletes the data job.
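The workflow above can be pictured as a sequence of steps followed by cleanup. The following is a minimal sketch, not the tool's actual implementation: the step callables (`create_job`, `deploy_job`, `run_test`, `delete_job`) are hypothetical placeholders, and attempting cleanup even when an earlier step fails is an assumption made for illustration.

```python
from typing import Callable, List


def run_heartbeat(
    create_job: Callable[[], None],
    deploy_job: Callable[[], None],
    run_test: Callable[[], None],
    delete_job: Callable[[], None],
) -> bool:
    """Run the hypothetical steps in order; return True if all succeed."""
    try:
        create_job()
        deploy_job()
        run_test()
        return True
    except Exception as error:
        print(f"heartbeat failed: {error}")
        return False
    finally:
        # Assumption: always attempt cleanup so no test job is left behind.
        delete_job()


# Usage with stand-in steps that only record the order in which they ran.
calls: List[str] = []
ok = run_heartbeat(
    lambda: calls.append("create"),
    lambda: calls.append("deploy"),
    lambda: calls.append("run_test"),
    lambda: calls.append("delete"),
)
print(ok, calls)
```

Passing the steps in as callables keeps the sketch independent of any particular client library; a real run would call the Control Service instead.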
Prerequisites
See heartbeat_config_example.ini and complete the TODOs inside it.
Installation
# TODO: Change to public PYPI index
pip install -i https://test.pypi.org/simple/ vdk-heartbeat
Configuration
See config.py for details on what can be configured.
Running
You can run the test locally, as part of your CI/CD pipeline, or schedule it to run periodically.
The test is a pass/fail test.
If it fails, it returns a non-zero exit code and prints the error.
It also produces a tests.xml file in junit xml format.
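Because the report is in JUnit XML format, a CI step can summarize it with the Python standard library. This is a minimal sketch, assuming the file follows the common JUnit layout of `testcase` elements with optional `failure` or `error` children. The exact contents of the file produced by vdk-heartbeat are not documented here, and the sample XML below is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def failed_tests(junit_xml: str) -> List[str]:
    """Return the names of test cases that contain a failure or error."""
    root = ET.fromstring(junit_xml)
    failed = []
    # iter() walks the whole tree, so this works whether the root
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name", "<unnamed>"))
    return failed


# Made-up sample report; a real run would read tests.xml from disk instead.
SAMPLE = """<testsuite name="heartbeat" tests="2" failures="1">
  <testcase name="deploy_job"/>
  <testcase name="run_job"><failure message="timeout"/></testcase>
</testsuite>
"""
print(failed_tests(SAMPLE))
```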
- Specify configuration in environment variables or in a file (use the file for things that can be in source control)
- Example:
export DATABASE_PASS=xxx
vdk-heartbeat -f heartbeat_config.ini
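When the check is driven from a script rather than directly from a shell, the exit code can be inspected programmatically. The following is a minimal sketch, assuming only what is stated above: the command exits with a non-zero code on failure. The `run_check` helper is hypothetical, and the stand-in commands exist only so the sketch runs without vdk-heartbeat installed.

```python
import subprocess
import sys
from typing import List


def run_check(command: List[str]) -> bool:
    """Run a command and report success based on its exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface whatever the command printed so the failure shows up in CI logs.
        print(result.stdout + result.stderr)
    return result.returncode == 0


# The command from the example above; requires vdk-heartbeat to be installed.
HEARTBEAT_COMMAND = ["vdk-heartbeat", "-f", "heartbeat_config.ini"]

# Stand-in commands that exit with a chosen code, so the sketch runs anywhere.
passing = run_check([sys.executable, "-c", "raise SystemExit(0)"])
failing = run_check([sys.executable, "-c", "raise SystemExit(3)"])
print(passing, failing)
```

In a real pipeline, `run_check(HEARTBEAT_COMMAND)` would replace the stand-in calls.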
Extensibility
Users can replace both the data job that is deployed and executed and the run test that verifies the job execution.
See the DATAJOB_DIRECTORY_* and JOB_RUN_TEST_* configuration options in config.py.
Build, test, and release
See or run cicd/build.sh to build and test the project locally.
Release
Releases are made to PyPI.
Versioning follows https://semver.org.
- A release step in GitLab CI is triggered automatically after changes are merged, provided the build and tests succeed.
- Update the major or minor version only when necessary.
Tests
Database ingestion
The testing job ingests data into a database and reads it from that database to verify the results.
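The ingest-then-read-back idea can be illustrated with a self-contained sketch. This is not the heartbeat's own test job: it uses an in-memory SQLite database as a stand-in for the real target, and the table layout and the `ingest_and_verify` helper are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple


def ingest_and_verify(rows: List[Tuple[int, str]]) -> bool:
    """Write rows to a table, read them back, and compare."""
    # Stand-in database; the real test targets the configured database.
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE TABLE destination_table (id INTEGER, payload TEXT)"
        )
        connection.executemany(
            "INSERT INTO destination_table VALUES (?, ?)", rows
        )
        read_back = connection.execute(
            "SELECT id, payload FROM destination_table ORDER BY id, payload"
        ).fetchall()
        # The check passes only if every ingested row is read back unchanged.
        return read_back == sorted(rows)
    finally:
        connection.close()


verified = ingest_and_verify([(2, "b"), (1, "a")])
print(verified)
```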
Configuration
Target
The target identifies where the data is ingested.
The value of this parameter depends on the ingestion method chosen.
- The "http" method requires an HTTP URL. Example: http://example.com///
- The "file" method requires a file name or path.
export VDK_HEARTBEAT_INGEST_TARGET="datasource"
Method
Indicates the ingestion method to use. Examples:
- method="file" -> ingest to file
- method="http" -> ingest using HTTP POST requests
- method="kafka" -> ingest to kafka endpoint
export VDK_HEARTBEAT_INGEST_METHOD="http"
Destination table
The name of the table into which the data is ingested. This parameter can be omitted if the table is included in the payload itself.
export VDK_HEARTBEAT_INGEST_DESTINATION_TABLE="destination_table"
Database type
export DB_DEFAULT_TYPE="trino"
Database name
export DATABASE_TEST_DB="memory.default"
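The ingestion settings above can also be assembled in a script before launching the test. The following is a minimal sketch: the variable names come from this section, while the `ingest_settings` and `as_exports` helpers are hypothetical. It reflects the note that the destination table may be omitted when the payload already names the table.

```python
import shlex
from typing import Dict, Optional


def ingest_settings(
    method: str, target: str, destination_table: Optional[str] = None
) -> Dict[str, str]:
    """Collect the ingestion variables described in this section."""
    settings = {
        "VDK_HEARTBEAT_INGEST_METHOD": method,
        "VDK_HEARTBEAT_INGEST_TARGET": target,
    }
    # The destination table is optional, so it is only set when provided.
    if destination_table is not None:
        settings["VDK_HEARTBEAT_INGEST_DESTINATION_TABLE"] = destination_table
    return settings


def as_exports(settings: Dict[str, str]) -> str:
    """Render the settings as shell export lines with safe quoting."""
    return "\n".join(
        f"export {name}={shlex.quote(value)}" for name, value in settings.items()
    )


print(as_exports(ingest_settings("http", "datasource", "destination_table")))
```

The resulting dictionary can also be merged into `os.environ` and passed to a subprocess instead of being rendered as export lines.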
Scenarios
Trino ingestion
VDK_HEARTBEAT_INGEST_METHOD is set to "TRINO" and DB_DEFAULT_TYPE is set to "trino", and both use the same connection settings (the same Trino database instance).
export VDK_HEARTBEAT_INGEST_TARGET="trino-http-datasource"
export VDK_HEARTBEAT_INGEST_METHOD="TRINO"
export VDK_HEARTBEAT_INGEST_DESTINATION_TABLE="sample_destination_table"
export DB_DEFAULT_TYPE="trino"
export DATABASE_TEST_DB="memory.default"
Trino HTTP ingestion
VDK_HEARTBEAT_INGEST_METHOD is set to "http" and DB_DEFAULT_TYPE is set to "trino", and the connection settings point to the same Trino instance.
export VDK_HEARTBEAT_INGEST_TARGET="trino-http-datasource"
export VDK_HEARTBEAT_INGEST_METHOD="http"
export VDK_HEARTBEAT_INGEST_DESTINATION_TABLE="sample_destination_table"
export DB_DEFAULT_TYPE="trino"
export DATABASE_TEST_DB="memory.default"
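The two scenarios share most of their settings. A small sketch makes the difference explicit by encoding both sets of exports as dictionaries and comparing them; the values are copied from the examples above, and the `differing_settings` helper is hypothetical.

```python
from typing import Dict, Tuple

TRINO_INGESTION = {
    "VDK_HEARTBEAT_INGEST_TARGET": "trino-http-datasource",
    "VDK_HEARTBEAT_INGEST_METHOD": "TRINO",
    "VDK_HEARTBEAT_INGEST_DESTINATION_TABLE": "sample_destination_table",
    "DB_DEFAULT_TYPE": "trino",
    "DATABASE_TEST_DB": "memory.default",
}

# Same settings, except that ingestion goes through HTTP POST requests.
TRINO_HTTP_INGESTION = {**TRINO_INGESTION, "VDK_HEARTBEAT_INGEST_METHOD": "http"}


def differing_settings(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return the settings whose values differ between two scenarios."""
    return {
        name: (first.get(name, ""), second.get(name, ""))
        for name in sorted(set(first) | set(second))
        if first.get(name) != second.get(name)
    }


print(differing_settings(TRINO_INGESTION, TRINO_HTTP_INGESTION))
```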