Airflow provider for Versatile Data Kit.
Versatile Data Kit Airflow provider
A set of Airflow operators, sensors and a connection hook intended to help schedule Versatile Data Kit jobs using Apache Airflow.
Usage
To install the provider, run:
pip install airflow-provider-vdk
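As a quick sanity check after installation, the sketch below tests whether the provider's top-level module can be imported. It assumes the module is named `vdk_provider`, as in the import used in the workflow example in this README. The helper `provider_installed` is a hypothetical name introduced here for illustration and is not part of the package.

```python
import importlib.util


def provider_installed(module_name="vdk_provider"):
    """Return True if a top-level module with this name can be found.

    Hypothetical helper for illustration only; the default name is taken
    from the import statement in this README.
    """
    return importlib.util.find_spec(module_name) is not None


if provider_installed():
    print("vdk_provider is importable")
else:
    print("vdk_provider not found; try: pip install airflow-provider-vdk")
```

Run it with the same Python interpreter (or virtual environment) that Airflow uses, since that is where the package needs to be installed.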
Then you can create a workflow of data jobs (deployed by VDK Control Service) like this:
from datetime import datetime

from airflow import DAG
from vdk_provider.operators.vdk import VDKOperator

with DAG(
    "airflow_example_vdk",
    schedule_interval=None,
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["example", "vdk"],
) as dag:
    trino_job1 = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-trino-job1",
        team_name="taurus",
        task_id="trino-job1",
    )

    trino_job2 = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-trino-job2",
        team_name="taurus",
        task_id="trino-job2",
    )

    transform_job = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-transform-job",
        team_name="taurus",
        task_id="transform-job",
    )

    # Both Trino jobs must finish before the transform job starts.
    [trino_job1, trino_job2] >> transform_job
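The last line of the workflow declares a fan-in dependency: `transform_job` waits for both Trino jobs, which do not depend on each other. The sketch below models that ordering using only the Python standard library, so it runs without Airflow or VDK installed. It is an illustration of the dependency graph, not how Airflow schedules tasks internally. The `upstream` mapping and the `execution_waves` helper are hypothetical names introduced here; only the task ids come from the example above.

```python
from graphlib import TopologicalSorter

# Each task id maps to the set of tasks it waits for (its upstream tasks).
# Task ids are copied from the DAG above; the mapping itself is illustrative.
upstream = {
    "trino-job1": set(),
    "trino-job2": set(),
    "transform-job": {"trino-job1", "trino-job2"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock downstream tasks
    return waves


waves = execution_waves(upstream)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Under this model the two Trino jobs land in the first wave and can run in parallel, while the transform job lands in the second wave.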
Example
Demo
You can watch a demo from one of the community meetings here: https://www.youtube.com/watch?v=c3j1aOALjVU&t=690s
Architecture