Airflow provider for Versatile Data Kit.
Versatile Data Kit Airflow provider
A set of Airflow operators, sensors and a connection hook intended to help schedule Versatile Data Kit jobs using Apache Airflow.
Usage
To install the provider, run:
pip install airflow-provider-vdk
Then you can create a workflow of data jobs (deployed by VDK Control Service) like this:
from datetime import datetime

from airflow import DAG
from vdk_provider.operators.vdk import VDKOperator

with DAG(
    "airflow_example_vdk",
    schedule_interval=None,  # no schedule; the DAG runs only when triggered
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["example", "vdk"],
) as dag:
    # Each operator runs one data job deployed by the VDK Control Service.
    trino_job1 = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-trino-job1",
        team_name="taurus",
        task_id="trino-job1",
    )

    trino_job2 = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-trino-job2",
        team_name="taurus",
        task_id="trino-job2",
    )

    transform_job = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-transform-job",
        team_name="taurus",
        task_id="transform-job",
    )

    # transform_job is downstream of both Trino jobs.
    [trino_job1, trino_job2] >> transform_job
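The dependency line `[trino_job1, trino_job2] >> transform_job` declares a fan-in: the two Trino jobs have no dependency on each other, and the transform job waits for both. The sketch below illustrates the resulting run order using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's scheduler and does not call the provider; the `upstream` mapping and the `execution_waves` helper are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: task_id -> set of task_ids it must wait for.
# It mirrors `[trino_job1, trino_job2] >> transform_job` from the DAG above.
upstream = {
    "trino-job1": set(),
    "trino-job2": set(),
    "transform-job": {"trino-job1", "trino-job2"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # get_ready() returns every task whose upstream tasks are all done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(upstream)
print(waves)
```

Under these assumptions, the first wave contains both Trino jobs and the second wave contains only the transform job, which matches the order the DAG above requests.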
Example
Demo
You can watch a demo from one of the community meetings here: https://www.youtube.com/watch?v=c3j1aOALjVU&t=690s
Architecture