
[![Build Status](https://travis-ci.org/Barski-lab/cwl-airflow.svg?branch=master)](https://travis-ci.org/Barski-lab/cwl-airflow) - **Travis CI**
[![Build Status](https://ci.commonwl.org/buildStatus/icon?job=airflow-conformance)](https://ci.commonwl.org/job/airflow-conformance) - **CWL conformance tests**

# cwl-airflow

### About
Python package to extend **[Apache-Airflow 1.9.0](https://github.com/apache/incubator-airflow)**
functionality with **[CWL v1.0](http://www.commonwl.org/v1.0/)** support.

### Check it out
1. Install *cwl-airflow*
```sh
$ pip3 install cwl-airflow --user --find-links https://michael-kotliar.github.io/cwl-airflow-wheels/
```
2. Run *demo*
```sh
$ cwl-airflow demo --auto
```
3. Open the [Airflow web UI](http://localhost:8080/admin/) in your browser to monitor progress


### Read this if you have trouble with the installation
1. Check the requirements
- Ubuntu 16.04.4
- python 3.5.2
- pip3
```bash
wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py --user
```
- setuptools
```bash
pip3 install setuptools --user
```
- docker
```bash
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo groupadd docker
sudo usermod -aG docker $USER
```
Log out and log back in so that your group membership is re-evaluated.
- python3-dev
```bash
sudo apt-get install python3-dev
```


### Configuration
1. Initialize `cwl-airflow` with the following command
```sh
$ cwl-airflow init # consider using --refresh=5 --workers=4 options if you want the webserver to react faster
```
2. If **[Apache-Airflow v1.9.0](https://github.com/apache/incubator-airflow)** is
already installed and configured, you may skip this step; otherwise initialize the Airflow database
```sh
$ airflow initdb
```
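After initialization, the batch-mode settings described below live in the `cwl` section of `~/airflow/airflow.cfg`. A minimal sketch, assuming the documented default layout (adjust the path to your setup):

```ini
[cwl]
# Folder monitored for new JSON/YAML job files in batch mode
jobs = ~/airflow/cwl/jobs
```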

### Running
#### Batch mode
To automatically monitor and process all the job files present in a specific folder
1. Make sure your job files include the following mandatory fields:
- `uid` - unique ID, string
- `output_folder` - absolute path to the folder where results will be saved, string
- `workflow` - absolute path to the workflow to be run, string

Additionally, a job file may include the optional `tmp_folder` field
with the absolute path to a temporary folder.
2. Put your JSON/YAML job files into the directory
set as `jobs` in `cwl` section of `airflow.cfg` file
(by default `~/airflow/cwl/jobs`)
3. Run Airflow scheduler:
```sh
$ airflow scheduler
```
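A batch-mode job file combining the fields above might look like this (all paths and the `uid` value are illustrative):

```yaml
uid: "run-0001"
workflow: "/home/user/workflows/super-enhancer.cwl"
output_folder: "/home/user/results/run-0001"
tmp_folder: "/tmp/cwl-airflow/run-0001"
# ...plus whatever inputs the workflow itself expects
```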

#### Manual mode
To perform a single run of a specific CWL workflow with a job file

```bash
cwl-airflow run WORKFLOW_FILE JOB_FILE
```
If the `uid`, `output_folder`, `workflow`, and `tmp_folder` fields are not present
in the job file, you may set them with the following arguments:
```bash
-o, --outdir Output directory, default current directory
-t, --tmp Folder to store temporary data, default /tmp
-u, --uid Unique ID, default random uuid
```
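For example, a single run that overrides all three defaults might look like this (file names and paths are illustrative):

```sh
$ cwl-airflow run super-enhancer.cwl job.yaml \
    -o /home/user/results \
    -t /home/user/tmp \
    -u run-0001
```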
#### Demo mode
1. Get the list of the available demo workflows to run
```bash
$ cwl-airflow demo
```
2. Run a demo workflow from the list (if running on macOS, consider adding the directory where
the cwl-airflow package is installed to the _**Docker / Preferences / File sharing**_ options)
```bash
$ cwl-airflow demo super-enhancer.cwl
```
3. Optionally, run `airflow webserver` to check workflow status (default [webserver link](http://localhost:8080/))
```bash
$ airflow webserver
```

Latest release: **cwl-airflow 1.0.7**, source distribution `cwl-airflow-1.0.7.tar.gz` (6.2 MB).

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page