Marquez integration with Airflow
Project description
marquez-airflow
A library that integrates Airflow DAGs
with Marquez for automatic metadata collection.
Status
This library is under active development at Datakin.
Requirements
Installation
$ pip3 install marquez-airflow
To install from source, run:
$ python3 setup.py install
Usage
from datetime import datetime
from marquez_airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator
from airflow.utils.dates import days_ago
default_args = {
'owner': 'datascience',
'depends_on_past': False,
'start_date': days_ago(1),
'email_on_failure': False,
'email_on_retry': False,
'email': ['datascience@datakin.com']
}
dag = DAG(
'orders_popular_day_of_week',
schedule_interval='@weekly',
default_args=default_args,
description='Determines the popular day of week orders are placed.'
)
t1 = PostgresOperator(
task_id='if_not_exists',
postgres_conn_id='food_delivery_db',
sql='''
CREATE TABLE IF NOT EXISTS popular_orders_day_of_week (
order_day_of_week VARCHAR(64) NOT NULL,
order_placed_on TIMESTAMP NOT NULL,
orders_placed INTEGER NOT NULL
);''',
dag=dag
)
t2 = PostgresOperator(
task_id='insert',
postgres_conn_id='food_delivery_db',
sql='''
INSERT INTO popular_orders_day_of_week (order_day_of_week, order_placed_on, orders_placed)
SELECT EXTRACT(ISODOW FROM order_placed_on) AS order_day_of_week,
order_placed_on,
COUNT(*) AS orders_placed
FROM top_delivery_times
GROUP BY order_placed_on;
''',
dag=dag
)
t1 >> t2
Contributing
See CONTRIBUTING.md for more details about how to contribute.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
marquez-airflow-0.2.1.tar.gz
(11.5 kB
view hashes)