Marquez integration with Airflow

marquez-airflow

A library that integrates Airflow DAGs with Marquez for automatic metadata collection.

Status

This library is under active development at The We Company.

Requirements

Installation

$ pip install marquez-airflow

To install from source run:

$ python setup.py install

Usage

Once the library is installed, modify your existing DAGs by replacing the import of airflow.models.DAG with marquez_airflow.DAG, as in the example below:

from marquez_airflow import DAG
from airflow.operators.dummy_operator import DummyOperator


DAG_NAME = 'my_DAG_name'

default_args = {
    'marquez_location': 'github://data-dags/dag_location/',
    'marquez_input_urns': ["s3://some_data", "s3://more_data"],
    'marquez_output_urns': ["s3://output_data"],
    
    'owner': ...,
    'depends_on_past': False,
    'start_date': ...,
}

dag = DAG(DAG_NAME, schedule_interval='*/10 * * * *',
          default_args=default_args, description="yet another DAG")

run_this = DummyOperator(task_id='run_this', dag=dag)
run_this_too = DummyOperator(task_id='run_this_too', dag=dag)
run_this_too.set_upstream(run_this)
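
Because the Marquez metadata in the example above travels through default_args, a small check before constructing the DAG can catch a missing entry early. The following is a minimal sketch only: missing_marquez_keys and MARQUEZ_KEYS are hypothetical names, not part of marquez-airflow, and it assumes the three marquez_* keys shown in the example are the ones you want present. Whether the library itself requires them is not stated here.

```python
# Hypothetical helper, NOT part of marquez-airflow.
# Assumption: these are the keys of interest, taken from the usage example.
MARQUEZ_KEYS = (
    'marquez_location',
    'marquez_input_urns',
    'marquez_output_urns',
)


def missing_marquez_keys(default_args):
    """Return the marquez_* keys absent from default_args, in MARQUEZ_KEYS order."""
    return [key for key in MARQUEZ_KEYS if key not in default_args]


default_args = {
    'marquez_location': 'github://data-dags/dag_location/',
    'marquez_input_urns': ["s3://some_data", "s3://more_data"],
    'marquez_output_urns': ["s3://output_data"],
    'depends_on_past': False,
}

# Fail fast, before the DAG object is created, if an entry was forgotten.
missing = missing_marquez_keys(default_args)
if missing:
    raise ValueError('default_args is missing: ' + ', '.join(missing))
```

The check is kept separate from the DAG definition so it can run in a unit test without Airflow installed.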

Contributing

See CONTRIBUTING.md for more details about how to contribute.
