An action framework to work with DataHub real time changes.
⚡ DataHub Actions Framework
Welcome to DataHub Actions! The Actions framework makes it easy to respond to changes to your Metadata Graph in real time, so you can integrate DataHub into a broader events-based architecture.
For a detailed introduction, check out the original announcement of the DataHub Actions Framework at the DataHub April 2022 Town Hall. For a more in-depth look at use cases and concepts, check out DataHub Actions Concepts.
Quickstart
To get started right away, check out the DataHub Actions Quickstart Guide.
Installation
Prerequisites
The DataHub Actions CLI commands are an extension of the base datahub CLI commands. We recommend first installing the datahub CLI:
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
datahub --version
Next, install the acryl-datahub-actions package from PyPI:
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub-actions
datahub actions --version
Configuring an Action
Actions are configured using a YAML file, in much the same way as DataHub ingestion sources. An action configuration file consists of the following:
- Action Pipeline Name (Should be unique and static)
- Source Configurations
- Transform + Filter Configurations
- Action Configuration
Each component is independently pluggable and configurable.
# 1. Required: Action Pipeline Name
name: <action-pipeline-name>

# 2. Required: Event Source - Where to source event from.
source:
  type: <source-type>
  config:
    # Event Source specific configs (map)

# 3a. Optional: Filter to run on events (map)
filter:
  event_type: <filtered-event-type>
  event:
    # Filter event fields by exact-match
    <filtered-event-fields>

# 3b. Optional: Custom Transformers to run on events (array)
transform:
  - type: <transformer-type>
    config:
      # Transformer-specific configs (map)

# 4. Required: Action - What action to take on events.
action:
  type: <action-type>
  config:
    # Action-specific configs (map)
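To make the required and optional parts of this layout concrete, here is a minimal Python sketch that checks the shape of an already-parsed configuration (for example, the dictionary a YAML parser would return). The function name `check_pipeline_config` and the wording of the messages are hypothetical and are not part of the framework; the real framework performs its own validation and may accept additional keys.

```python
from typing import Any, Dict, List

# Top-level keys marked "Required" in the template above.
REQUIRED_KEYS = ("name", "source", "action")


def check_pipeline_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical helper for illustration only, not the framework's validator.
    """
    problems: List[str] = []

    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required key: {key}")

    # Both the source and the action are selected by a 'type' field.
    for section in ("source", "action"):
        block = config.get(section)
        if block is not None and (not isinstance(block, dict) or "type" not in block):
            problems.append(f"{section} needs a 'type' field")

    # Transformers are declared as an array; the filter is a single map.
    transform = config.get("transform")
    if transform is not None and not isinstance(transform, list):
        problems.append("transform must be a list of transformers")

    return problems
```

Calling `check_pipeline_config` on a dictionary with `name`, `source.type`, and `action.type` set returns an empty list; leaving out `source` or `action` yields one message per missing key.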
Example: Hello World
A simple configuration file for a "Hello World" action, which prints every event it receives, is:
# 1. Action Pipeline Name
name: "hello_world"
# 2. Event Source: Where to source event from.
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
# 3. Action: What action to take on events.
action:
  type: "hello_world"
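The `${VAR:-default}` placeholders in the connection block use shell-style default syntax: the environment variable is used when it is set and non-empty, otherwise the text after `:-` is used. The following Python sketch illustrates that behaviour under this assumption; the `expand` function is hypothetical and is not how the framework itself resolves the placeholders.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; the default may contain ':' and '/'.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand(value: str, env: Mapping[str, str]) -> str:
    """Resolve shell-style placeholders in value using the env mapping.

    Hypothetical helper for illustration only.
    """

    def replace(match: "re.Match[str]") -> str:
        name = match.group("name")
        default = match.group("default")
        current = env.get(name)
        if default is not None:
            # ':-' falls back to the default when the variable is unset or empty.
            return current if current else default
        # No default given: leave the placeholder untouched if the variable is unset.
        return current if current is not None else match.group(0)

    return _PLACEHOLDER.sub(replace, value)
```

With an empty environment, `expand("${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}", {})` resolves to `localhost:9092`; passing `os.environ` as `env` would pick up a broker address exported in the shell instead.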
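We can modify this configuration further to filter for specific events by adding a "filter" block. The example below only passes along EntityChangeEvent_v1 events in which the tag urn:li:tag:pii is added: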
# 1. Action Pipeline Name
name: "hello_world"
# 2. Event Source - Where to source event from.
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
# 3. Filter - Filter events that reach the Action
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    category: "TAG"
    operation: "ADD"
    modifier: "urn:li:tag:pii"
# 4. Action - What action to take on events.
action:
  type: "hello_world"
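The exact-match behaviour of the filter block can be pictured with a short Python sketch. It assumes an event is available as a flat dictionary of fields plus an event type string; that representation and the `passes_filter` name are assumptions made for illustration, and the framework's own filter may support more than this sketch does.

```python
from typing import Any, Dict, Optional


def passes_filter(
    event_type: str,
    event: Dict[str, Any],
    filter_config: Optional[Dict[str, Any]],
) -> bool:
    """Return True if the event should reach the action.

    Hypothetical helper: every field listed under 'event' must match exactly.
    """
    if not filter_config:
        # No filter configured: every event is passed through.
        return True

    wanted_type = filter_config.get("event_type")
    if wanted_type is not None and wanted_type != event_type:
        return False

    for field, expected in (filter_config.get("event") or {}).items():
        if event.get(field) != expected:
            return False
    return True


# The filter from the configuration above, as a parsed dictionary.
pii_filter = {
    "event_type": "EntityChangeEvent_v1",
    "event": {"category": "TAG", "operation": "ADD", "modifier": "urn:li:tag:pii"},
}
```

Under these assumptions, an event with category "TAG", operation "ADD", and modifier "urn:li:tag:pii" passes, while the same event with operation "REMOVE" or a different tag URN is dropped.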
Running an Action
To run a new action, use the datahub actions CLI command to start an actions listener:
datahub actions -c <config.yml>
If successful, you'll see a message like the following in the CLI output:
Actions Pipeline with name '<name>' is now running.
Running multiple Actions
You can run multiple action pipelines within the same command. Simply provide multiple config files by repeating the "-c" command line argument.
For example,
datahub actions -c <config-1.yaml> -c <config-2.yaml>
Running in debug mode
Simply append the --debug flag to the CLI to run your action in debug mode.
datahub actions -c <config.yaml> --debug
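If you launch pipelines from a script rather than by hand, the command lines shown above can be assembled programmatically. The sketch below only builds the argument list from the `-c` and `--debug` flags described in this section; the `build_actions_command` helper is hypothetical and not part of the CLI.

```python
from typing import List, Sequence


def build_actions_command(config_paths: Sequence[str], debug: bool = False) -> List[str]:
    """Build the argument list for 'datahub actions' from one or more config files.

    Hypothetical helper for illustration only.
    """
    if not config_paths:
        raise ValueError("at least one config file is required")

    command = ["datahub", "actions"]
    for path in config_paths:
        # The -c flag is repeated once per pipeline config.
        command += ["-c", path]
    if debug:
        command.append("--debug")
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the standard library, which blocks for as long as the listener is running.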
Supported Events
Two event types are currently supported. Read more about them below.
Supported Event Sources
Currently, the only event source that is officially supported is kafka, which polls for events via a Kafka Consumer.
Supported Actions
By default, DataHub supports a set of standard actions plugins. These can be found inside the folder src/datahub_actions/plugins.
Some pre-included Actions include
Development
Build and Test
Note that all actions commands are also available through a separate datahub-actions CLI entry point. Feel free to use this during development.
# Build datahub-actions module
./gradlew datahub-actions:build
# Drop into virtual env
cd datahub-actions && source venv/bin/activate
# Start hello world action
datahub-actions actions -c ../examples/hello_world.yaml
# Start ingestion executor action
datahub-actions actions -c ../examples/executor.yaml
# Start multiple actions
datahub-actions actions -c ../examples/executor.yaml -c ../examples/hello_world.yaml
Developing a Transformer
To develop a new Transformer, check out the Developing a Transformer guide.
Developing an Action
To develop a new Action, check out the Developing an Action guide.
Contributing
Contributing guidelines follow those of the main DataHub project. We are accepting contributions for Actions, Transformers, and general framework improvements (tests, error handling, etc.).
Resources
Check out the original announcement of the DataHub Actions Framework at the DataHub April 2022 Town Hall.