Run DataHub metadata ingestion tasks remotely via subprocess isolation with S3 log storage

Acryl Executor

Remote execution agent for running DataHub metadata ingestion tasks via subprocess isolation — with S3 log storage, plugin-based extensibility, and first-class support for the DataHub UI-triggered ingestion workflow.

Features

  • Subprocess isolation — each ingestion run executes in a dedicated subprocess and virtual environment, preventing dependency conflicts between connectors
  • UI-triggered ingestion — integrates directly with the DataHub UI to execute RUN_INGEST tasks on demand
  • Connection testing — validate source/destination connectivity before running full ingestion via TEST_CONNECTION tasks
  • S3 log storage — automatically compress and upload execution logs and artifacts to S3; optionally clean up local files after a successful upload
  • Plugin architecture — register custom task implementations via Python entry points
  • AWS Secrets Manager & GCP Secret Manager — built-in secret store plugins for retrieving credentials from AWS SM or GCP SM
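
The subprocess isolation idea in the feature list can be illustrated with a minimal sketch. This is the general pattern only, not the executor's own implementation: `run_isolated` and its parameters are hypothetical names, and the sketch reuses the current interpreter by default rather than creating a dedicated virtual environment. Passing the path of a virtual environment's interpreter as `python` would approximate per-run environments.

```python
import subprocess
import sys


def run_isolated(
    code: str,
    python: str = sys.executable,
    timeout: float = 60.0,
) -> subprocess.CompletedProcess:
    """Run a Python snippet in a separate interpreter process.

    Hypothetical helper for illustration; not part of acryl-executor.
    `python` may point at a virtual environment's interpreter so the
    child process sees that environment's installed packages.
    """
    return subprocess.run(
        [python, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        timeout=timeout,      # fail instead of hanging forever
        check=False,          # let the caller inspect the return code
    )


result = run_isolated("print('ingestion finished')")
print(result.returncode, result.stdout.strip())
```

A crash or dependency problem in the child process surfaces as a non-zero return code and captured stderr, and the parent process keeps running.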

Installation

pip install acryl-executor

Quick Start

python3 -m venv venv
source venv/bin/activate
pip install acryl-executor

Task Types

Task                                   Description
RUN_INGEST (SubProcessIngestionTask)   Runs metadata ingestion in a subprocess; supports per-run DataHub versions and connector plugins
TEST_CONNECTION                        Validates connectivity to a data source before ingestion

Cloud Logging (S3)

Set these environment variables to enable S3 log uploads:

Variable                    Description
DATAHUB_CLOUD_LOG_BUCKET    S3 bucket to write logs to
DATAHUB_CLOUD_LOG_PATH      S3 path prefix for logs
DATAHUB_CLOUD_LOG_CLEANUP   Set to true to remove local files after a successful upload (default: false)
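
As a rough sketch, the three variables above could be read into a small settings object like this. `CloudLogConfig` and `load_cloud_log_config` are hypothetical names, and two behaviours are assumptions rather than documented facts: that uploads are treated as disabled when the bucket is unset, and that the cleanup flag is compared case-insensitively against `true`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CloudLogConfig:
    """Hypothetical container for the S3 logging settings."""

    bucket: str
    path: str
    cleanup: bool = False


def load_cloud_log_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[CloudLogConfig]:
    """Build a config from environment variables, or None if no bucket is set."""
    bucket = env.get("DATAHUB_CLOUD_LOG_BUCKET")
    if not bucket:
        # Assumption: no bucket means S3 uploads are disabled.
        return None
    path = env.get("DATAHUB_CLOUD_LOG_PATH", "").strip("/")
    # Assumption: only the literal value "true" (any case) enables cleanup.
    cleanup = env.get("DATAHUB_CLOUD_LOG_CLEANUP", "false").strip().lower() == "true"
    return CloudLogConfig(bucket=bucket, path=path, cleanup=cleanup)
```

Taking the environment as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.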

Logs are tar-gzipped and stored at:

s3://<BUCKET>/<PATH>/<pipeline_id>/year=<Y>/month=<M>/day=<D>/<run_id>/
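
To make the layout concrete, here is a small sketch that assembles a prefix in that shape. `build_log_prefix` is a hypothetical helper, and zero-padding the month and day is an assumption, since the template above does not say how `<M>` and `<D>` are formatted.

```python
from datetime import date


def build_log_prefix(
    bucket: str, path: str, pipeline_id: str, run_id: str, day: date
) -> str:
    """Assemble an S3 prefix following the documented layout.

    Hypothetical helper; zero-padded month/day is an assumption.
    """
    parts = [
        path.strip("/"),
        pipeline_id,
        f"year={day.year}",
        f"month={day.month:02d}",
        f"day={day.day:02d}",
        run_id,
    ]
    # Skip empty segments so an empty path prefix does not produce "//".
    key = "/".join(p for p in parts if p)
    return f"s3://{bucket}/{key}/"


print(build_log_prefix("my-logs", "executor/logs", "pipeline-1", "run-42", date(2024, 3, 7)))
# → s3://my-logs/executor/logs/pipeline-1/year=2024/month=03/day=07/run-42/
```

The `key=value` path segments follow the Hive-style partitioning convention, which lets query engines prune by date when scanning the logs.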

When cleanup is enabled, a .s3 sentinel file replaces each uploaded file, recording the S3 URI, upload timestamp, and original file size.

Plugin Registration

Register custom tasks under the datahub.executor.task.plugins entry point group, for example through the setuptools entry_points argument:

entry_points = {
    "datahub.executor.task.plugins": [
        "my_task = my_package.tasks:MyTask"
    ]
}
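
Once a package declaring such an entry point is installed, the registration can be inspected with the standard library. The sketch below shows how any consumer could enumerate the group; it is not the executor's actual loading code, and `discover_task_plugins` is a hypothetical name.

```python
import sys
from importlib.metadata import entry_points

GROUP = "datahub.executor.task.plugins"


def discover_task_plugins(group: str = GROUP) -> dict:
    """Return a mapping of entry point name to EntryPoint for the group."""
    if sys.version_info >= (3, 10):
        # Python 3.10+ supports selecting a group by keyword.
        eps = entry_points(group=group)
    else:
        # Python 3.8/3.9 return a dict keyed by group name.
        eps = entry_points().get(group, [])
    return {ep.name: ep for ep in eps}


for name, ep in discover_task_plugins().items():
    # ep.value is the "module:attribute" string; ep.load() would import it.
    print(name, ep.value)
```

With the example registration above installed, the loop would list `my_task` pointing at `my_package.tasks:MyTask`; with no plugins installed it prints nothing.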

License

Apache License 2.0
