


cwl-airflow

Python package to extend Apache Airflow 1.9.0 functionality with CWL v1.0 support.


Try it out

  1. Install cwl-airflow
    $ pip3 install cwl-airflow --user --find-links https://michael-kotliar.github.io/cwl-airflow-wheels/
    
  2. Init configuration
    $ cwl-airflow init
    
  3. Run demo
    $ cwl-airflow demo --auto
    
  4. When the console output shows that the Airflow webserver has started, open the provided link.

Installation requirements

  • Ubuntu 16.04.4
  • Python 3.5.2
  • pip3
    wget https://bootstrap.pypa.io/get-pip.py
    python3 get-pip.py --user
    
  • setuptools
    pip3 install setuptools --user
    
  • docker
    sudo apt-get update
    sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install docker-ce
    sudo groupadd docker
    sudo usermod -aG docker $USER
    
    Log out and log back in so that your group membership is re-evaluated.
  • python3-dev
    sudo apt-get install python3-dev
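The usermod step only takes effect in new login sessions, because a running process keeps the group list it was started with. A minimal Python sketch for confirming the change after logging back in, assuming a Linux host where the group is named docker (in_docker_group is a hypothetical helper, not part of cwl-airflow):

```python
import grp
import os


def in_docker_group(group_name: str = "docker") -> bool:
    """Return True if the current process already carries the given group.

    os.getgroups() reflects the session the process was started in, so a
    user just added with `usermod -aG docker $USER` is not reported here
    until they log out and back in.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine (e.g. Docker not installed)
        return False
    # Check both supplementary groups and the primary group of the process
    return gid in os.getgroups() or gid == os.getgid()


if __name__ == "__main__":
    if in_docker_group():
        print("docker group is active in this session")
    else:
        print("docker group not active yet; log out and log back in")
```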
    

Configuration

When running cwl-airflow init, the following parameters can be specified:

  • -l LIMIT, --limit LIMIT sets the number of processed jobs kept in history. Default: 10 for each category (Running, Success, Failed)
  • -j JOBS, --jobs JOBS sets the path to the folder where all new jobs will be added. Default: ~/airflow/jobs
  • -t DAG_TIMEOUT, --timeout DAG_TIMEOUT sets the timeout (in seconds) for importing all DAGs from the DAG folder. Default: 30 seconds
  • -r WEB_INTERVAL, --refresh WEB_INTERVAL sets the refresh interval (in seconds) for the webserver workers. Default: 30 seconds
  • -w WEB_WORKERS, --workers WEB_WORKERS sets the number of webserver workers to be refreshed at the same time. Default: 1
  • -p THREADS, --threads THREADS sets the number of threads for the Airflow scheduler. Default: 2
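The same options can be assembled programmatically, for example when provisioning several machines. A sketch in Python, assuming cwl-airflow is on PATH; build_init_command is a hypothetical helper that uses only the long option names listed above, with the documented defaults:

```python
import os
from typing import List


def build_init_command(limit: int = 10,
                       jobs: str = "~/airflow/jobs",
                       timeout: int = 30,
                       refresh: int = 30,
                       workers: int = 1,
                       threads: int = 2) -> List[str]:
    """Assemble a `cwl-airflow init` call from the documented options.

    Default values mirror the documented defaults. The jobs path is
    expanded here because subprocess does not perform shell ~ expansion.
    """
    return [
        "cwl-airflow", "init",
        "--limit", str(limit),
        "--jobs", os.path.expanduser(jobs),
        "--timeout", str(timeout),
        "--refresh", str(refresh),
        "--workers", str(workers),
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    print(" ".join(build_init_command(limit=20, threads=4)))
```

Passing the returned list to subprocess.run(cmd, check=True) executes it and raises an error if the command fails.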

If the core/dags_folder parameter in the Airflow configuration file (default location ~/airflow/airflow.cfg) has been updated manually, make sure to rerun cwl-airflow init.
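To see which DAG folder the configuration currently points to, the value can be read with Python's standard configparser module. A sketch assuming the default file location; read_dags_folder is a hypothetical helper:

```python
import configparser
import os


def read_dags_folder(cfg_path: str = "~/airflow/airflow.cfg") -> str:
    """Return the core/dags_folder value from an Airflow configuration file."""
    # interpolation=None: airflow.cfg contains %-style log format strings
    # that configparser's default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    path = os.path.expanduser(cfg_path)
    if not parser.read(path):
        # read() returns the list of files it parsed; empty means not found
        raise FileNotFoundError(path)
    return os.path.expanduser(parser.get("core", "dags_folder"))


if __name__ == "__main__":
    try:
        print(read_dags_folder())
    except FileNotFoundError as err:
        print("configuration file not found:", err)
```

Comparing this value before and after a manual edit shows whether cwl-airflow init needs to be rerun.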

Running

Batch mode

To automatically monitor and process all job files present in a specific folder:

  1. Make sure your job files include the following mandatory fields:

    • uid - unique ID, string
    • output_folder - absolute path to the folder where results will be saved, string
    • workflow - absolute path to the workflow to be run, string

    Additionally, job files may include the optional tmp_folder field with the absolute path to a temporary folder.

  2. Put your JSON/YAML job files into the directory set as jobs in the cwl section of the airflow.cfg file (by default ~/airflow/cwl/jobs)

  3. Run the Airflow scheduler:

    $ airflow scheduler
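A job file that satisfies the rules above can be checked and saved from Python before the scheduler picks it up. A minimal sketch, assuming JSON job files; validate_job and write_job are hypothetical helpers, jobs_dir should be the folder configured as jobs in the cwl section of airflow.cfg, and naming the file <uid>.json is an illustrative choice, not a documented requirement. Any workflow inputs stored in the same file are written unchanged:

```python
import json
import os
from typing import Any, Dict

MANDATORY_FIELDS = ("uid", "output_folder", "workflow")
PATH_FIELDS = ("output_folder", "workflow", "tmp_folder")


def validate_job(job: Dict[str, Any]) -> None:
    """Raise ValueError if a mandatory field is missing or a path is relative."""
    for field in MANDATORY_FIELDS:
        if not isinstance(job.get(field), str) or not job[field]:
            raise ValueError("missing or non-string field: {}".format(field))
    # tmp_folder is optional, but when present it must be an absolute path too
    for field in PATH_FIELDS:
        if field in job and not (isinstance(job[field], str)
                                 and os.path.isabs(job[field])):
            raise ValueError("{} must be an absolute path".format(field))


def write_job(job: Dict[str, Any], jobs_dir: str) -> str:
    """Validate the job and save it as <uid>.json inside jobs_dir."""
    validate_job(job)
    os.makedirs(jobs_dir, exist_ok=True)
    path = os.path.join(jobs_dir, job["uid"] + ".json")
    with open(path, "w") as handle:
        json.dump(job, handle, indent=2)
    return path
```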
    

Manual mode

To perform a single run of a specific CWL workflow with a job file:

$ cwl-airflow run WORKFLOW_FILE JOB_FILE

If the uid, output_folder, workflow and tmp_folder fields are not present in the job file, you may set them with the following arguments:

  -o, --outdir      Output directory, default current directory
  -t, --tmp         Folder to store temporary data, default /tmp
  -u, --uid         Unique ID, default random uuid
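The defaults listed above can be made explicit when calling the command from a script. A sketch in Python; build_run_command is a hypothetical helper that fills in the current directory, /tmp and a random UUID as the option descriptions say, using only the documented long flags:

```python
import os
import uuid
from typing import List, Optional


def build_run_command(workflow: str, job: str,
                      outdir: Optional[str] = None,
                      tmp: Optional[str] = None,
                      uid: Optional[str] = None) -> List[str]:
    """Assemble `cwl-airflow run WORKFLOW_FILE JOB_FILE` with --outdir/--tmp/--uid.

    Unset values fall back to the documented defaults: current directory,
    /tmp and a random UUID.
    """
    return [
        "cwl-airflow", "run", workflow, job,
        "--outdir", outdir or os.getcwd(),
        "--tmp", tmp or "/tmp",
        "--uid", uid or str(uuid.uuid4()),
    ]


if __name__ == "__main__":
    print(" ".join(build_run_command("/data/workflow.cwl", "/data/job.json")))
```

Generating the UUID on the caller side means the uid is known before the run starts, which can help when tracking results for that run.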

Demo mode

  1. Get the list of available demo workflows to run
    $ cwl-airflow demo
    
  2. Run a demo workflow from the list (if running on macOS, consider adding the directory where the cwl-airflow package is installed to Docker > Preferences > File Sharing)
    $ cwl-airflow demo super-enhancer.cwl
    
  3. Optionally, run the Airflow webserver to check the workflow status (default webserver link)
    $ airflow webserver
    



Download files


Files for cwl-airflow, version 1.0.8
  • cwl-airflow-1.0.8.tar.gz (6.2 MB), source distribution
