Python package to extend Airflow functionality with CWL v1.0 support

cwl-airflow

Python package to extend Apache-Airflow 1.9.0 functionality with CWL v1.0 support.


Quick guide

  1. Install cwl-airflow
    $ pip install cwl-airflow --user --find-links https://michael-kotliar.github.io/cwl-airflow-wheels/
    
  2. Init configuration
    $ cwl-airflow init
    
  3. Run demo
    $ cwl-airflow demo --auto
    
  4. When the console output shows that the Airflow webserver has started, open the URL it provides

Installation requirements

OS specific

Ubuntu 16.04.4
  • Python 2.7/3.5 (tested with the system Python 2.7.12 and the latest available 3.5.2)
  • docker
    sudo apt-get update
    sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install docker-ce
    sudo groupadd docker
    sudo usermod -aG docker $USER
    
    Log out and log back in so that your group membership is re-evaluated.
  • python-dev (or python3-dev if using Python 3.5)
    sudo apt-get install python-dev # or python3-dev when using Python 3.5
    
macOS High Sierra 10.13.5
  • Python 2.7/3.6 (tested with the system Python 2.7.10 and the latest available 3.6.5)
  • docker (follow the official install documentation)

Common

  • pip
    wget https://bootstrap.pypa.io/get-pip.py
    python get-pip.py --user
    
    When using the system Python, you might need to update your PATH variable as instructed in the console output.
  • setuptools
    pip install -U setuptools --user
    
  • Apple Command Line Tools (macOS only)
    xcode-select --install
    
    Click Install in the pop-up when it appears.
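
As a quick sanity check before installing, the sketch below reports which of the prerequisites listed above are already on your PATH. The helper name check_tool is hypothetical and not part of cwl-airflow; on some systems the commands may be named python3 and pip3 instead.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the prerequisites named in this section; keep going if one is missing.
for tool in python pip docker; do
    check_tool "$tool" || true
done
```

Anything reported as missing should be installed following the steps for your OS before running pip install cwl-airflow.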

Configuration

When running cwl-airflow init, the following parameters can be specified:

  • -l LIMIT, --limit LIMIT sets the number of processed jobs kept in history. Default 10 for each category: Running, Success, Failed
  • -j JOBS, --jobs JOBS sets the path to the folder where all the new jobs will be added. Default ~/airflow/jobs
  • -t DAG_TIMEOUT, --timeout DAG_TIMEOUT sets the timeout (in seconds) for importing all the DAGs from the DAG folder. Default 30 seconds
  • -r WEB_INTERVAL, --refresh WEB_INTERVAL sets the webserver workers refresh interval (in seconds). Default 30 seconds
  • -w WEB_WORKERS, --workers WEB_WORKERS sets the number of webserver workers to be refreshed at the same time. Default 1
  • -p THREADS, --threads THREADS sets the number of threads for Airflow Scheduler. Default 2

If you update the Airflow configuration file (default location ~/airflow/airflow.cfg) manually, run cwl-airflow init afterwards to apply the changes, especially if the core/dags_folder or cwl/jobs parameters were changed.
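
As an illustration, the sketch below combines several of the options listed above in one call. The values (20 jobs, a 60 second timeout) and the wrapper function init_with_overrides are hypothetical. The RUN variable is a dry-run convenience added for this example, not a cwl-airflow feature: by default the command is only printed.

```shell
#!/bin/sh
# Dry-run by default: the command is printed, not executed.
# Set RUN="" in the environment to actually run it.
RUN=${RUN-echo}

init_with_overrides() {
    # $1 = history limit, $2 = jobs folder, $3 = DAG import timeout (seconds)
    $RUN cwl-airflow init --limit "$1" --jobs "$2" --timeout "$3"
}

# Hypothetical values; options that are left out keep their defaults.
init_with_overrides 20 "$HOME/airflow/jobs" 60
```

Printing the command first makes it easy to review the values before they are written into the Airflow configuration.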

Submitting a job

To submit a new CWL workflow descriptor and input parameters file for execution, it is recommended to use the cwl-airflow submit command.

Manual mode

To perform a single run of a specific CWL workflow and job file:

cwl-airflow run WORKFLOW_FILE JOB_FILE

If the uid, output_folder, workflow and tmp_folder fields are not present in the job file, you may set them with the following arguments:

  -o, --outdir      Output directory, default current directory
  -t, --tmp         Folder to store temporary data, default /tmp
  -u, --uid         Unique ID, default random uuid
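
A minimal sketch of a manual run that sets the output and temporary folders on the command line. The file names workflow.cwl and job.yml, the results folder, and the wrapper function run_once are hypothetical, and placing the options before the two file arguments is an assumption. As a safeguard added for this example, the command is only printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry-run by default: the command is printed, not executed.
# Set RUN="" in the environment to actually run it.
RUN=${RUN-echo}

run_once() {
    # $1 = workflow file, $2 = job file, $3 = output directory, $4 = temporary folder
    $RUN cwl-airflow run --outdir "$3" --tmp "$4" "$1" "$2"
}

# Hypothetical file names; --uid is omitted, so a random uuid is used.
run_once workflow.cwl job.yml "$PWD/results" /tmp
```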

Demo mode

  1. Get the list of the available demo workflows to run
    $ cwl-airflow demo
    
  2. Run a demo workflow from the list (if running on macOS, consider adding the directory where you installed the cwl-airflow package to the Docker / Preferences / File sharing options)
    $ cwl-airflow demo super-enhancer.cwl
    
  3. Optionally, run airflow webserver to check workflow status (default webserver link)
    $ airflow webserver
    
