# BioWardrobe backend (airflow+cwl)

Replaces BioWardrobe's backend with CWL Airflow.

## About

Python package that replaces BioWardrobe's python/cron scripts. It uses Apache Airflow functionality with CWL v1.0.
## Install

- Add the biowardrobe MySQL connection into Airflow connections:

  ```sql
  -- check existing connections
  select * from airflow.connection;
  -- add the biowardrobe connection
  insert into airflow.connection values
    (NULL, 'biowardrobe', 'mysql', 'localhost', 'ems', 'wardrobe', '', null,
     '{"cursor":"dictcursor"}', 0, 0);
  ```

- Install the package:

  ```bash
  sudo pip3 install .
  ```
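The `extra` field of the connection above is JSON that configures the MySQL client; a minimal sketch of what it encodes (the `"dictcursor"` value tells the MySQL hook to return rows as dicts keyed by column name instead of tuples):

```python
import json

# The "extra" column stored for the biowardrobe connection (see the insert above)
extra = '{"cursor":"dictcursor"}'

settings = json.loads(extra)
print(settings["cursor"])  # dictcursor
```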
## Requirements

Make sure your system satisfies the following criteria:

- Ubuntu 16.04.3
- python3.6

  ```bash
  sudo add-apt-repository ppa:jonathonf/python-3.6
  sudo apt-get update
  sudo apt-get install python3.6
  ```

- pip3

  ```bash
  curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
  pip3 install --upgrade pip
  ```

- setuptools

  ```bash
  pip3 install setuptools
  ```

- docker

  ```bash
  sudo apt-get update
  sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  sudo apt-get update
  sudo apt-get install docker-ce
  sudo groupadd docker
  sudo usermod -aG docker $USER
  ```

  Log out and log back in so that your group membership is re-evaluated.

- libmysqlclient-dev

  ```bash
  sudo apt-get install libmysqlclient-dev
  ```

- nodejs

  ```bash
  sudo apt-get install nodejs
  ```
- Get the latest version of [cwl-airflow-parser](https://github.com/datirium/cwl-airflow-parser). If Apache Airflow or cwltool aren't installed, they will be installed automatically at the recommended versions. Set the `AIRFLOW_HOME` environment variable to the Airflow config directory (default is `~/airflow/`).

  ```bash
  git clone https://github.com/datirium/cwl-airflow-parser.git
  cd cwl-airflow-parser
  sudo pip3 install .
  ```

- If required, add extra Airflow packages to extend Airflow functionality, for instance with MySQL support:

  ```bash
  pip3 install apache-airflow[mysql]
  ```
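The `AIRFLOW_HOME` setting mentioned above can be made persistent; a minimal sketch, assuming the default location and a bash shell:

```bash
# Point Airflow (and cwl-airflow-parser) at the config directory;
# ~/airflow/ is the default used when the variable is unset.
echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc
export AIRFLOW_HOME=~/airflow
```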
## Running

- To create BioWardrobe's DAGs, run `biowardrobe-init` in Airflow's dags directory:

  ```bash
  cd ~/airflow/dags
  ./biowardrobe-init
  ```

- Run the Airflow scheduler:

  ```bash
  airflow scheduler
  ```
- Use `airflow trigger_dag` with the input parameter `--conf "JSON"`, where JSON is either a job definition, or a `biowardrobe_uid` together with an explicitly specified CWL descriptor `dag_id`:

  ```bash
  airflow trigger_dag --conf "{\"job\":$(cat ./hg19.job)}" "bowtie-index"
  ```

  where `hg19.job` is:

  ```json
  {
    "fasta_input_file": {
      "class": "File",
      "location": "file:///wardrobe/indices/bowtie/hg19/chrM.fa",
      "format": "http://edamontology.org/format_1929",
      "size": 16909,
      "basename": "chrM.fa",
      "nameroot": "chrM",
      "nameext": ".fa"
    },
    "output_folder": "/wardrobe/indices/bowtie/hg19/",
    "threads": 6,
    "genome": "hg19"
  }
  ```
- All output files will be moved from the temporary directory into the `output_folder` parameter of the job.
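A job file like `hg19.job` can also be generated with a short script. A minimal sketch: `cwl_file` is a hypothetical helper (not part of this package) that fills in the CWL `File` naming fields the same way cwltool derives them; the `size` and `format` fields from the example above are omitted here because they require inspecting the real file.

```python
import json
import os

def cwl_file(path):
    """Build a CWL v1.0 File record for an absolute path (hypothetical helper)."""
    base = os.path.basename(path)            # e.g. "chrM.fa"
    root, ext = os.path.splitext(base)       # e.g. ("chrM", ".fa")
    return {
        "class": "File",
        "location": "file://" + path,
        "basename": base,
        "nameroot": root,
        "nameext": ext,
    }

job = {
    "fasta_input_file": cwl_file("/wardrobe/indices/bowtie/hg19/chrM.fa"),
    "output_folder": "/wardrobe/indices/bowtie/hg19/",
    "threads": 6,
    "genome": "hg19",
}

# Write the job definition that airflow trigger_dag --conf will consume
print(json.dumps(job, indent=2))
```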
## Hashes for biowardrobe-airflow-analysis-1.0.20181211064034.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | ba0ad97e7facaf63238a6278f72ef21242651701b4c63b4b754e5f9b83f9fa14 |
| MD5 | ccec7142bc0c6854b0ad05af48e23a95 |
| BLAKE2b-256 | 992b72a310ac4dc9722c96b18b3de841758a41e9b69c7a1c15306120244f3793 |