cwl-airflow
Python package to extend Apache Airflow 1.9.0 functionality with CWL v1.0 support.
Quick guides
- Install cwl-airflow
$ pip install cwl-airflow --user --find-links https://michael-kotliar.github.io/cwl-airflow-wheels/
- Init configuration
$ cwl-airflow init
- Run demo
$ cwl-airflow demo --auto
- When the console output shows that the Airflow Webserver has started, open the URL it provides
Installation requirements
OS specific
Ubuntu 16.04.4
- python 2.7/3.5 (tested on the system Python 2.7.12 and the latest available 3.5.2)
- docker
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo groupadd docker
sudo usermod -aG docker $USER
Log out and log back in so that your group membership is re-evaluated.
- python-dev (or python3-dev if using Python 3.5)
sudo apt-get install python-dev # python3-dev
macOS High Sierra 10.13.5
- python 2.7/3.6 (tested on the system Python 2.7.10 and the latest available 3.6.5)
- docker (follow the official install documentation)
Common
- pip
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py --user
When using the system Python, you might need to update your PATH variable following the instructions printed to the console.
- setuptools
pip install -U setuptools --user
- Apple Command Line Tools
xcode-select --install
Click Install on the pop-up when it appears.
Configuration
When running cwl-airflow init, the following parameters can be specified:
-l LIMIT, --limit LIMIT
sets the number of processed jobs kept in history. Default: 10 for each of the categories Running, Success, and Failed
-j JOBS, --jobs JOBS
sets the path to the folder where all the new jobs will be added. Default: ~/airflow/jobs
-t DAG_TIMEOUT, --timeout DAG_TIMEOUT
sets the timeout (in seconds) for importing all the DAGs from the DAG folder. Default: 30 seconds
-r WEB_INTERVAL, --refresh WEB_INTERVAL
sets the webserver workers refresh interval (in seconds). Default: 30 seconds
-w WEB_WORKERS, --workers WEB_WORKERS
sets the number of webserver workers to be refreshed at the same time. Default: 1
-p THREADS, --threads THREADS
sets the number of threads for the Airflow Scheduler. Default: 2
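To make the semantics of the --limit option concrete, the sketch below shows one way "keep the N most recent jobs per category" could be implemented. This is an illustration only, not the package's actual code; the record layout (category, timestamp, job id) is an assumption made for the example.

```python
from collections import defaultdict


def prune_history(records, limit=10):
    """Keep only the `limit` most recent job records per category.

    `records` is a list of (category, timestamp, job_id) tuples; the
    categories mirror the ones cwl-airflow tracks: Running, Success,
    Failed. Illustrative sketch, not the package's internal logic.
    """
    by_category = defaultdict(list)
    for category, timestamp, job_id in records:
        by_category[category].append((timestamp, job_id))
    kept = []
    for category, entries in by_category.items():
        entries.sort(reverse=True)  # newest first
        for timestamp, job_id in entries[:limit]:
            kept.append((category, timestamp, job_id))
    return kept


# Example: 12 Success records with the default limit of 10 ->
# only the 10 newest survive
records = [("Success", i, "job-%d" % i) for i in range(12)]
kept = prune_history(records, limit=10)
print(len(kept))  # 10
```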
If you update the Airflow configuration file (default location ~/airflow/airflow.cfg) manually, make sure to run the cwl-airflow init command again to apply the changes, especially if the core/dags_folder or cwl/jobs parameters have changed.
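For orientation, a minimal airflow.cfg fragment covering the two parameters named above might look as follows. The section/key names core/dags_folder and cwl/jobs come from this README; all paths and comments are illustrative assumptions, not values the package is guaranteed to write.

```ini
[core]
; folder Airflow scans for DAGs; re-run cwl-airflow init if this changes
dags_folder = /home/user/airflow/dags

[cwl]
; folder watched for new job files (default ~/airflow/jobs)
jobs = /home/user/airflow/jobs
```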
Submitting a job
To submit a new CWL workflow descriptor and input parameters file for execution, it is recommended to use the cwl-airflow submit command.
Manual mode
To perform a single run of a specific CWL workflow and job file:
cwl-airflow run WORKFLOW_FILE JOB_FILE
If the uid, output_folder, workflow, and tmp_folder fields are not present in the job file, you may set them with the following arguments:
-o, --outdir Output directory, default current directory
-t, --tmp Folder to store temporary data, default /tmp
-u, --uid Unique ID, default random uuid
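Putting the four fields together, a job file might look like the sketch below. The field names (uid, workflow, output_folder, tmp_folder) come from the list above; all values are illustrative placeholders, and your workflow's actual input parameters would sit alongside them.

```json
{
  "uid": "5d1b3c4e-0000-0000-0000-000000000000",
  "workflow": "/absolute/path/to/workflow.cwl",
  "output_folder": "/absolute/path/to/results",
  "tmp_folder": "/tmp/cwl_tmp"
}
```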
Demo mode
- Get the list of the available demo workflows to run
$ cwl-airflow demo
- Run a demo workflow from the list (if running on macOS, consider adding the directory where you installed the cwl-airflow package to the Docker / Preferences / File sharing options)
$ cwl-airflow demo super-enhancer.cwl
- Optionally, run the Airflow webserver to check the workflow status (default webserver link)
$ airflow webserver