
An Airflow plugin to launch and monitor Spark applications on the Data Mechanics platform


# Data Mechanics Airflow integration

## Spin up Airflow

If you don’t already have `just` on your machine, install it with

```bash
brew install just
```

Then run Airflow with

```bash
just serve
```

> The script will ask you to install docker-compose if you haven’t got it on your machine already.
> You can find it [here](https://docs.docker.com/install/).

> The first run will be long because Docker images are downloaded.

Shut down Airflow with Ctrl+C.
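
For a rough mental model of what `just serve` does, here is a hypothetical approximation; the actual recipe in the repo’s justfile may differ (for instance, it also checks that docker-compose is installed, as noted above):

```bash
# Hypothetical approximation of the `serve` recipe: start the Airflow services
# defined in the repo's docker-compose file (assumed to be at the repo root).
docker-compose up --build
```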

## Before the demo: one-time operations

Open Airflow at [http://localhost:8080](http://localhost:8080).

These are some of the gotchas you might run into when you’re not used to Airflow.

  1. Activate the DAGs you want to run by toggling their state from Off to On (on the left in the DAGs page)

  2. Create a Data Mechanics connection. To do this, click on Admin in the navbar, then Connections in the dropdown menu, then go to the Create tab. The connection should have `datamechanics_default` as the connection ID, `https://demo.datamechanics.co/` as the host, and our usual API key for the demo cluster as the password. Leave the rest blank. The same connection can also be created from the CLI, as sketched after the note below.

> As long as you don’t trash the anonymous Docker Compose volume on which the Airflow DB is persisted, you shouldn’t have to repeat the operations above, even if you restart Airflow.
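
As mentioned in step 2, the connection can also be created from the command line instead of the UI. This is a minimal sketch assuming an Airflow 2.x CLI (on Airflow 1.10 the equivalent is `airflow connections --add ...`); replace `<API_KEY>` with the demo cluster key:

```bash
# Hypothetical CLI alternative to the UI steps above (Airflow 2.x syntax)
airflow connections add datamechanics_default \
    --conn-type http \
    --conn-host https://demo.datamechanics.co/ \
    --conn-password <API_KEY>
```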

## Do the demo

  1. Open Airflow at [http://localhost:8080](http://localhost:8080).

  2. Open the `full-example` DAG.

  3. Switch to Graph view

  4. Trigger the DAG (Trigger DAG); this can also be done from the CLI, as sketched after this list.

  5. Explain that the first two tasks are run in parallel (you can show the dashboard at this point)

  6. The last two tasks are meant to fail. Click on the failed execution of the `failed-app` task, click on View logs, and show that the URL to the dashboard is provided in the logs.
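
As mentioned in step 4, triggering the DAG can also be done from the command line; a hedged sketch, assuming an Airflow 2.x CLI (on 1.10 the equivalent is `airflow trigger_dag full-example`):

```bash
# Hypothetical CLI alternative to clicking "Trigger DAG" in the UI (Airflow 2.x syntax)
airflow dags trigger full-example
```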

## Turn this demo into a library

The code that should be turned into an Airflow plugin library is contained in the plugins/ folder.
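
Once packaged (see the changelog below), the library is published on PyPI as `datamechanics_airflow_plugin` and can be pulled into any Airflow deployment with pip:

```bash
pip install datamechanics_airflow_plugin
```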

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

Date format is YYYY-MM-DD

## [1.0.0] - 2020-09-11

### Changed

  - Converted the existing plugin into a Python package

[unreleased]: https://github.com/datamechanics/datamechanics_airflow_plugin/compare/v1.0.0...master
[1.0.0]: https://github.com/datamechanics/datamechanics_airflow_plugin/compare/...v1.0.0



