# BioWardrobe backend (airflow+cwl)

Replaces BioWardrobe's backend with CWL-Airflow.

## About

Python package to replace BioWardrobe's python/cron scripts. It uses Apache Airflow functionality with CWL v1.0.
## Install

- Add the biowardrobe MySQL connection to Airflow's connections table:

  ```sql
  select * from airflow.connection;
  insert into airflow.connection
  values (NULL, 'biowardrobe', 'mysql', 'localhost', 'ems', 'wardrobe', '', null,
          '{"cursor":"dictcursor"}', 0, 0);
  ```
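  The same connection can also be added through Airflow's CLI instead of raw
  SQL; a sketch, assuming the flag names of the Airflow 1.10-era
  `airflow connections` subcommand:

  ```bash
  # Equivalent to the INSERT above; flag names follow Airflow 1.10,
  # adjust for your Airflow version
  airflow connections --add \
      --conn_id biowardrobe \
      --conn_type mysql \
      --conn_host localhost \
      --conn_schema ems \
      --conn_login wardrobe \
      --conn_extra '{"cursor":"dictcursor"}'
  ```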
- Install the package:

  ```bash
  sudo pip3 install .
  ```
## Requirements

Make sure your system satisfies the following criteria:

- Ubuntu 16.04.3

- python3.6

  ```bash
  sudo add-apt-repository ppa:jonathonf/python-3.6
  sudo apt-get update
  sudo apt-get install python3.6
  ```
- pip3

  ```bash
  curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
  pip3 install --upgrade pip
  ```
- setuptools

  ```bash
  pip3 install setuptools
  ```
- docker

  ```bash
  sudo apt-get update
  sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  sudo apt-get update
  sudo apt-get install docker-ce
  sudo groupadd docker
  sudo usermod -aG docker $USER
  ```

  Log out and log back in so that your group membership is re-evaluated.

- libmysqlclient-dev

  ```bash
  sudo apt-get install libmysqlclient-dev
  ```
- nodejs

  ```bash
  sudo apt-get install nodejs
  ```
- Get the latest version of [cwl-airflow-parser](https://github.com/datirium/cwl-airflow-parser).
  If Apache Airflow or cwltool are not installed, they will be installed
  automatically at the recommended versions. Set the `AIRFLOW_HOME` environment
  variable to the Airflow config directory (the default is `~/airflow/`).

  ```bash
  git clone https://github.com/datirium/cwl-airflow-parser.git
  cd cwl-airflow-parser
  sudo pip3 install .
  ```
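  For example, to use the default location explicitly (set it before Airflow
  is first run so its configuration is created where you expect):

  ```bash
  # AIRFLOW_HOME tells Airflow where to keep airflow.cfg, logs, and the
  # dags/ directory; ~/airflow is Airflow's built-in default
  export AIRFLOW_HOME=~/airflow
  ```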
- If required, add extra Airflow packages to extend Airflow's functionality,
  for instance with MySQL support:

  ```bash
  pip3 install 'apache-airflow[mysql]'
  ```
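  If you also want Airflow's own metadata database on MySQL, that is configured
  separately in `airflow.cfg`; a sketch, assuming a local MySQL database named
  `airflow` (the credentials shown are placeholders):

  ```bash
  # In $AIRFLOW_HOME/airflow.cfg, under [core], point Airflow at MySQL, e.g.:
  #   sql_alchemy_conn = mysql://airflow_user:airflow_pass@localhost:3306/airflow
  # then re-initialize the metadata tables (Airflow 1.x CLI):
  airflow initdb
  ```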
## Running

- To create BioWardrobe's DAGs, run `biowardrobe-init` in Airflow's DAGs
  directory:

  ```bash
  cd ~/airflow/dags
  ./biowardrobe-init
  ```
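  To confirm the DAGs were registered, the stock Airflow CLI can list them
  (command name per Airflow 1.10; later versions renamed it):

  ```bash
  # The BioWardrobe DAGs should show up in this listing
  airflow list_dags
  ```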
- Run the Airflow scheduler:

  ```bash
  airflow scheduler
  ```
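  Optionally, start the Airflow web UI alongside the scheduler to monitor DAG
  runs (it serves on port 8080 by default):

  ```bash
  airflow webserver
  ```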
- Use `airflow trigger_dag` with the input parameter `--conf "JSON"`, where
  JSON is either a job definition or a `biowardrobe_uid`, with the CWL
  descriptor explicitly specified by `dag_id`:

  ```bash
  airflow trigger_dag --conf "{\"job\":$(cat ./hg19.job)}" "bowtie-index"
  ```

  where `hg19.job` is:

  ```json
  {
    "fasta_input_file": {
      "class": "File",
      "location": "file:///wardrobe/indices/bowtie/hg19/chrM.fa",
      "format": "http://edamontology.org/format_1929",
      "size": 16909,
      "basename": "chrM.fa",
      "nameroot": "chrM",
      "nameext": ".fa"
    },
    "output_folder": "/wardrobe/indices/bowtie/hg19/",
    "threads": 6,
    "genome": "hg19"
  }
  ```
- All output is moved from the temporary directory into the directory given by
  the job's `output_folder` parameter.