Run DataHub metadata ingestion tasks remotely via subprocess isolation with S3 log storage
Acryl Executor
Remote execution agent for running DataHub metadata ingestion tasks via subprocess isolation — with S3 log storage, plugin-based extensibility, and first-class support for the DataHub UI-triggered ingestion workflow.
Features
- Subprocess isolation: each ingestion run executes in a dedicated subprocess and virtual environment, preventing dependency conflicts between connectors
- UI-triggered ingestion: integrates directly with the DataHub UI to execute `RUN_INGEST` tasks on demand
- Connection testing: validates source/destination connectivity via `TEST_CONNECTION` tasks before a full ingestion run
- S3 log storage: automatically compresses execution logs and artifacts and uploads them to S3; optionally cleans up local files after a successful upload
- Plugin architecture: register custom task implementations via Python entry points
- AWS Secrets Manager and GCP Secret Manager: built-in secret store plugins for retrieving credentials from either service
Installation
```
pip install acryl-executor
```
Quick Start
```
python3 -m venv venv
source venv/bin/activate
pip install acryl-executor
```
Task Types
| Task | Description |
|---|---|
| `RUN_INGEST` (`SubProcessIngestionTask`) | Runs metadata ingestion in a subprocess; supports per-run DataHub versions and connector plugins |
| `TEST_CONNECTION` | Validates connectivity to a data source before ingestion |
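To make the idea behind `RUN_INGEST` concrete, here is a minimal sketch of per-run isolation: create a fresh virtual environment, install a pinned `acryl-datahub` with the requested connector extras, and run a recipe with the `datahub ingest -c` CLI, each step in its own child process. This is an illustration under assumptions, not acryl-executor's implementation: the function names are hypothetical, the venv layout assumes POSIX (`bin/`), and the real `SubProcessIngestionTask` may install and invoke things differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def build_isolated_ingest_commands(
    venv_dir: Path, datahub_version: str, plugins: Sequence[str], recipe: Path
) -> List[List[str]]:
    """Hypothetical helper: argv lists to create a venv, install a pinned CLI, run one recipe."""
    bin_dir = venv_dir / "bin"  # POSIX venv layout (assumption)
    extras = f"[{','.join(plugins)}]" if plugins else ""
    return [
        # 1. Fresh, throwaway environment for this run only.
        [sys.executable, "-m", "venv", str(venv_dir)],
        # 2. Pin the DataHub version and connector extras for this run.
        [str(bin_dir / "python"), "-m", "pip", "install", f"acryl-datahub{extras}=={datahub_version}"],
        # 3. Run the ingestion recipe with the venv's own CLI.
        [str(bin_dir / "datahub"), "ingest", "-c", str(recipe)],
    ]


def run_isolated(commands: List[List[str]]) -> None:
    """Run each step in its own child process; a non-zero exit raises CalledProcessError."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run_isolated(build_isolated_ingest_commands(Path("/tmp/run-42/venv"), "0.13.0", ["mysql"], Path("recipe.yml")))` would build an environment containing only the MySQL connector at an example version, so a second run with a different version or connector set cannot conflict with it.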
Cloud Logging (S3)
Set these environment variables to enable S3 log uploads:
| Variable | Description |
|---|---|
| `DATAHUB_CLOUD_LOG_BUCKET` | S3 bucket to write logs to |
| `DATAHUB_CLOUD_LOG_PATH` | S3 path prefix for logs |
| `DATAHUB_CLOUD_LOG_CLEANUP` | Set to `true` to remove local files after a successful upload (default: `false`) |
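A minimal sketch of reading these variables into a typed config. The parsing rules are assumptions, not confirmed behavior of acryl-executor: uploads are treated as enabled whenever the bucket is set, the path prefix is optional and has surrounding slashes stripped, and cleanup turns on only for the value `true` (case-insensitive). `CloudLogConfig` and `load_cloud_log_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CloudLogConfig:
    bucket: str
    path: str
    cleanup: bool


def load_cloud_log_config(env: Mapping[str, str] = os.environ) -> Optional[CloudLogConfig]:
    """Hypothetical loader; returns None when S3 uploads are not configured."""
    bucket = env.get("DATAHUB_CLOUD_LOG_BUCKET")
    if not bucket:
        return None  # assumption: no bucket means uploads are disabled
    return CloudLogConfig(
        bucket=bucket,
        path=env.get("DATAHUB_CLOUD_LOG_PATH", "").strip("/"),
        # Documented default is false; only "true" (any case) enables cleanup here.
        cleanup=env.get("DATAHUB_CLOUD_LOG_CLEANUP", "false").strip().lower() == "true",
    )
```

Passing the environment as a `Mapping` keeps the function easy to test with a plain dict instead of mutating `os.environ`.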
Logs are tar-gzipped and stored at:

```
s3://<BUCKET>/<PATH>/<pipeline_id>/year=<Y>/month=<M>/day=<D>/<run_id>/
```
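The layout above can be sketched as two small helpers: one that builds the destination prefix and one that tar-gzips a log file or directory with the standard-library `tarfile` module. Both are hypothetical illustrations; in particular, zero-padding of month and day is an assumption, since the layout does not say whether `<M>` and `<D>` are padded.

```python
import tarfile
from datetime import date
from pathlib import Path


def s3_log_prefix(bucket: str, path: str, pipeline_id: str, run_date: date, run_id: str) -> str:
    """Build the documented key layout; month/day zero-padding is an assumption."""
    segments = [
        path.strip("/"),
        pipeline_id,
        f"year={run_date.year}",
        f"month={run_date.month:02d}",
        f"day={run_date.day:02d}",
        run_id,
    ]
    # Skip an empty path prefix so the URI never contains "//".
    key = "/".join(s for s in segments if s)
    return f"s3://{bucket}/{key}/"


def tar_gz(src: Path, dest: Path) -> Path:
    """Compress a file or directory into a .tar.gz archive rooted at src.name."""
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(src, arcname=src.name)  # directories are added recursively
    return dest
```

The resulting archive could then be uploaded with boto3's `S3.Client.upload_file`, using the part of the URI after the bucket name (plus the archive's file name) as the object key.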
When cleanup is enabled, a `.s3` sentinel file replaces each uploaded file, recording the S3 URI, upload timestamp, and original file size.
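A minimal sketch of that cleanup step. The sentinel's exact name and format are not documented here, so this assumes a `<original name>.s3` file containing JSON with hypothetical keys `s3_uri`, `uploaded_at`, and `original_size`; the real executor may use a different layout.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def replace_with_sentinel(local_file: Path, s3_uri: str) -> Path:
    """Hypothetical helper: swap an uploaded file for a small '<name>.s3' JSON marker."""
    sentinel = local_file.with_name(local_file.name + ".s3")
    record = {
        "s3_uri": s3_uri,
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
        "original_size": local_file.stat().st_size,  # bytes, read before deletion
    }
    sentinel.write_text(json.dumps(record, indent=2))
    # Delete the local copy only after the marker has been written successfully.
    local_file.unlink()
    return sentinel
```

Writing the marker before deleting the original means a crash between the two steps leaves an extra file rather than losing track of where the data went.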
Plugin Registration
Custom tasks are registered through the `datahub.executor.task.plugins` entry point group:

```python
# In setup.py, pass this dict as setup(entry_points=entry_points)
entry_points = {
    "datahub.executor.task.plugins": [
        "my_task = my_package.tasks:MyTask",
    ]
}
```
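On the host side, entry points in this group can be discovered with the standard-library `importlib.metadata` module. The sketch below shows only that discovery step; how acryl-executor itself loads plugins, and which interface `MyTask` must implement, are not described here, so `load_task_plugins` is a hypothetical helper.

```python
import sys
from importlib.metadata import entry_points
from typing import Dict

PLUGIN_GROUP = "datahub.executor.task.plugins"


def load_task_plugins(group: str = PLUGIN_GROUP) -> Dict[str, object]:
    """Hypothetical helper: map each registered entry point name to its loaded object."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)  # selectable API, Python 3.10+
    else:
        eps = entry_points().get(group, [])  # dict-of-groups API, Python 3.8/3.9
    # ep.load() imports "my_package.tasks" and returns the MyTask attribute.
    return {ep.name: ep.load() for ep in eps}
```

With the registration above installed, `load_task_plugins()` would return `{"my_task": MyTask}`.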
License
Apache License 2.0
Download files
File details
Details for the file acryl_executor-0.3.17.tar.gz.
File metadata
- Download URL: acryl_executor-0.3.17.tar.gz
- Upload date:
- Size: 54.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `042c73d398001c5d419206d6795e364baa46dfe84c767027b822c4464abfa1be` |
| MD5 | `6705bbc67f6ba024762c099928a0f396` |
| BLAKE2b-256 | `0f849724459acfa8f7d62c1cc40bd07cf937fbc702c576a4c248ee6434007978` |
File details
Details for the file acryl_executor-0.3.17-py3-none-any.whl.
File metadata
- Download URL: acryl_executor-0.3.17-py3-none-any.whl
- Upload date:
- Size: 76.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c9351d8ebe1c0bad633f254b16b885d8bc8c49357e06d4c6eaf1d9a8c9c10f2a` |
| MD5 | `c99c2c565f02d0a23bae18757c151056` |
| BLAKE2b-256 | `521573fcc217d9e07fb7737405695cfabf5797d7d3ef7e1d8bdb76d89e1cdb0b` |
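To check a downloaded file against the SHA256 digests listed above, a small helper built on `hashlib` is enough. It reads the file in chunks so large archives do not need to fit in memory; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading 1 MiB at a time."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify_sha256(Path("acryl_executor-0.3.17-py3-none-any.whl"), "<SHA256 from the table>")` should return `True` for an intact download. For installs, pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same check automatically.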