Versatile Data Kit Heartbeat and Health Test

Versatile Data Kit Heartbeat tool

Heartbeat tool that verifies that a deployed SDK and Control Service are working correctly.
It checks that a data job can be created, deployed, run, and deleted.

Typical use cases:

  • Health monitoring: schedule it to run every few minutes to get a heartbeat of your installation.
  • Regression testing after changes to Versatile Data Kit: because clients can customize VDK, they need a way to check for regressions.
  • Acceptance testing after an installation or upgrade of the VDK Control Service.

What does it do?

It simulates a data engineer's workflow:

  • Creates a data job, downloads keytab
  • Deploys the data job with pre-defined scripts to run on a scheduled basis (every minute)
  • Runs the configured data job and run test; which ones are used depends on the configuration.
    • This lets the tool run in different modes. See config.py and the DATAJOB_DIRECTORY_* and JOB_RUN_TEST_* configuration options.
  • Undeploys and deletes the data job.
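
The workflow above can be sketched as a single function that always cleans up after itself. This is only an illustration: JobClient and its methods are hypothetical names made up for this sketch and are not part of vdk-heartbeat or the Control Service API.

```python
from typing import Protocol


class JobClient(Protocol):
    """Hypothetical client interface, invented for this sketch."""

    def create(self, job_name: str) -> None: ...
    def deploy(self, job_name: str) -> None: ...
    def wait_for_run(self, job_name: str) -> bool: ...
    def undeploy(self, job_name: str) -> None: ...
    def delete(self, job_name: str) -> None: ...


def run_heartbeat(client: JobClient, job_name: str) -> bool:
    """Run one create/deploy/verify cycle and clean up afterwards."""
    created = False
    try:
        client.create(job_name)
        created = True
        client.deploy(job_name)
        return client.wait_for_run(job_name)
    finally:
        if created:
            # Clean up even if deploy or the run check raised,
            # so a failed heartbeat does not leave a job behind.
            client.undeploy(job_name)
            client.delete(job_name)
```

Putting the cleanup in a finally block means a failing check still removes the job, which matters when the heartbeat is scheduled every few minutes.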

Prerequisites

See heartbeat_config_example.ini and complete the TODOs inside

Installation

# TODO: Change to public PYPI index
pip install -i https://test.pypi.org/simple/ vdk-heartbeat

Configuration

See config.py for details on what can be configured.

Running

You can run the test locally, as part of your CI/CD pipeline, or schedule it to run periodically.

The test either passes or fails.
If it fails, it returns a non-zero exit code and prints the error.
It also produces a tests.xml file in JUnit XML format.

  • Specify configuration in environment variables or in a file (use the file for things that can be in source control)
  • Example:
export DATABASE_PASS=xxx
vdk-heartbeat -f heartbeat_config.ini
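
If you drive the tool from a script, a wrapper along these lines may be useful. It is a minimal sketch: it passes the secret through the environment, returns the exit code, and counts failures in the JUnit report. The function names are made up for this example, and it assumes tests.xml is written to the current working directory, which is not confirmed here.

```python
import os
import subprocess
import xml.etree.ElementTree as ET
from typing import Union


def summarize_junit(xml_data: Union[str, bytes]) -> dict:
    """Count test cases and failed test cases in a JUnit XML report."""
    root = ET.fromstring(xml_data)
    cases = list(root.iter("testcase"))
    failed = [
        case for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"total": len(cases), "failed": len(failed)}


def run_heartbeat_cli(config_file: str, database_pass: str) -> int:
    """Run vdk-heartbeat with the secret supplied via the environment."""
    env = dict(os.environ, DATABASE_PASS=database_pass)
    completed = subprocess.run(["vdk-heartbeat", "-f", config_file], env=env)
    return completed.returncode


def main() -> int:
    exit_code = run_heartbeat_cli("heartbeat_config.ini", os.environ["DATABASE_PASS"])
    # Assumption: the report is written as tests.xml in the working directory.
    if os.path.exists("tests.xml"):
        with open("tests.xml", "rb") as report:
            print(summarize_junit(report.read()))
    return exit_code
```

Call main() from your own entry point and pass its return value to sys.exit so that a failed heartbeat also fails the surrounding pipeline step.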

Extensibility

Users can replace both the data job that is deployed and executed and the run test that verifies the job run.

See config.py DATAJOB_DIRECTORY_* and JOB_RUN_TEST_* configuration options.
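
Selecting a run test by module and class name (as the JOB_RUN_TEST_MODULE_NAME and JOB_RUN_TEST_CLASS_NAME options do) is commonly done with a dynamic import. The sketch below shows that general technique with a hypothetical load_class helper; it is not taken from vdk-heartbeat, and the interface a custom test class must implement is not covered here.

```python
import importlib


def load_class(module_name: str, class_name: str) -> type:
    """Import module_name and return the class called class_name."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name, None)
    if not isinstance(cls, type):
        raise ImportError(
            f"{module_name} does not define a class named {class_name}"
        )
    return cls
```

For example, load_class("collections", "OrderedDict") returns the OrderedDict class. A custom test module only needs to be importable from the environment in which the heartbeat runs.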

Build, test, and release

See or run cicd/build.sh to build and test the project locally.

Release

Releases are made to PyPI.
Versioning follows https://semver.org.

  • A release step in GitLab CI is triggered automatically after changes are merged, provided the build and tests succeed.
  • To trigger a rebuild of the control-service integration tests image, commit a change to any location listed under control_service_change_locations in CONTRIBUTING.md, other than version.txt. This is a manual step until it is automated.
  • Update the major or minor version only when necessary.

Tests

Database ingestion

The testing job ingests data into a database and reads it from that database to verify the results.
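
The write-then-read-back idea can be illustrated with a small round trip. This sketch uses an in-process SQLite database purely as a stand-in for the real target database, and ingest_and_verify is a hypothetical helper; the actual test job ingests through VDK and is not shown here.

```python
import sqlite3
import uuid


def ingest_and_verify(connection: sqlite3.Connection, table: str) -> bool:
    """Write one uniquely tagged row, read it back, and compare."""
    marker = uuid.uuid4().hex  # unique per run, so stale rows cannot match
    # The table name is interpolated directly; only pass trusted names.
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (run_id TEXT, value INTEGER)"
    )
    connection.execute(
        f"INSERT INTO {table} (run_id, value) VALUES (?, ?)", (marker, 42)
    )
    connection.commit()
    rows = connection.execute(
        f"SELECT value FROM {table} WHERE run_id = ?", (marker,)
    ).fetchall()
    return rows == [(42,)]
```

Tagging each run with a unique identifier keeps repeated heartbeat runs from passing on data left over from an earlier run.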

Configuration

Target

Target identifies where the data should be ingested into.

The value for this parameter depends on the ingest method chosen.

  • For "http" method, it would require an HTTP URL. Example: http://example.com///
  • For "file" method, it would require a file name or path.

export VDK_HEARTBEAT_INGEST_TARGET="datasource"

Method

Indicates the ingestion method to be used. Example:

  • method="file" -> ingest to file
  • method="http" -> ingest using HTTP POST requests
  • method="kafka" -> ingest to kafka endpoint

export VDK_HEARTBEAT_INGEST_METHOD="http"

Destination table

The name of the table into which the data should be ingested. This parameter can be omitted if the table name is included in the payload itself.

export VDK_HEARTBEAT_INGEST_DESTINATION_TABLE="destination_table"

Database type

export DB_DEFAULT_TYPE="trino"

Database name

export DATABASE_TEST_DB="memory.default"

Scenarios

Trino ingestion

VDK_HEARTBEAT_INGEST_METHOD is set to "TRINO" and DB_DEFAULT_TYPE is set to "trino". Both use the same connection settings (the same Trino database instance).

export VDK_HEARTBEAT_INGEST_TARGET="trino-http-datasource"
export VDK_HEARTBEAT_INGEST_METHOD="TRINO"
export VDK_HEARTBEAT_INGEST_DESTINATION_TABLE="sample_destination_table"
export DB_DEFAULT_TYPE="trino"
export DATABASE_TEST_DB="memory.default"

Trino HTTP ingestion

VDK_HEARTBEAT_INGEST_METHOD is set to "http" and DB_DEFAULT_TYPE is set to "trino". The connection settings point to the same Trino instance.

export VDK_HEARTBEAT_INGEST_TARGET="trino-http-datasource"
export VDK_HEARTBEAT_INGEST_METHOD="http"
export VDK_HEARTBEAT_INGEST_DESTINATION_TABLE="sample_destination_table"
export DB_DEFAULT_TYPE="trino"
export DATABASE_TEST_DB="memory.default"

Ping the frontend

No additional configuration is needed for this test.

Point the heartbeat to the correct module and class:

JOB_RUN_TEST_MODULE_NAME=vdk.internal.heartbeat.ping_frontend_test
JOB_RUN_TEST_CLASS_NAME=PingFrontendTest

The CONTROL_API_URL environment variable doubles as the frontend URL, since the Helm chart deploys both the control service and the frontend to the same host.

CONTROL_API_URL=http://cicd-control-service-svc.cicd.svc.cluster.local:8092

The test sends a GET request to the URL and expects a success response.
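
A check of this kind can be sketched with the Python standard library. The ping function below is a hypothetical illustration, not the PingFrontendTest implementation, and it assumes that "success" means a 2xx status code.

```python
import urllib.error
import urllib.request


def ping(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to url yields a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False
```

Note that urlopen follows redirects, so a frontend that redirects to a login page which then answers with a 2xx status would also count as reachable in this sketch.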
