
Replaces BioWardrobe's backend with CWL Airflow


BioWardrobe backend (airflow+cwl)

About

A Python package that replaces BioWardrobe's Python/cron scripts. It uses Apache Airflow functionality with CWL v1.0.

Install

  1. Add the biowardrobe MySQL connection to Airflow's connections table (a verification sketch follows this list):
    -- columns: id, conn_id, conn_type, host, schema, login, password, port, extra, is_encrypted, is_extra_encrypted
    select * from airflow.connection;
    insert into airflow.connection values(NULL,'biowardrobe','mysql','localhost','ems','wardrobe','',null,'{"cursor":"dictcursor"}',0,0);
    
  2. Install
    sudo pip3 install .
    

Requirements

  1. Make sure your system satisfies the following criteria:

    • Ubuntu 16.04.3
      • python3.6
        sudo add-apt-repository ppa:jonathonf/python-3.6
        sudo apt-get update
        sudo apt-get install python3.6
        
      • pip3
        curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
        pip3 install --upgrade pip
        
      • setuptools
        pip3 install setuptools
        
      • docker
        sudo apt-get update
        sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
        curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
        sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
        sudo apt-get update
        sudo apt-get install docker-ce
        sudo groupadd docker
        sudo usermod -aG docker $USER
        
        Log out and log back in so that your group membership is re-evaluated.
      • libmysqlclient-dev
        sudo apt-get install libmysqlclient-dev
        
      • nodejs
        sudo apt-get install nodejs
        
  2. Get the latest version of cwl-airflow-parser. If Apache Airflow or cwltool is not installed, it will be installed automatically at the recommended version. Set the AIRFLOW_HOME environment variable to Airflow's config directory (the default is ~/airflow/). A sanity-check sketch follows this list.

    git clone https://github.com/datirium/cwl-airflow-parser.git
    cd cwl-airflow-parser
    sudo pip3 install .
    
  3. If required, add extra Airflow packages to extend Airflow's functionality; for instance, pip3 install apache-airflow[mysql] adds MySQL support.
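
As a quick sanity check after steps 1-2, the following sketch (not part of the package) confirms the toolchain is importable and shows the effective Airflow config directory:

    #!/usr/bin/env python3
    # Sanity-check sketch: confirm airflow and cwltool were installed
    # (automatically, if missing, by cwl-airflow-parser) and print the
    # effective AIRFLOW_HOME.
    import os
    import pkg_resources

    import airflow

    print("airflow:", airflow.__version__)
    print("cwltool:", pkg_resources.get_distribution("cwltool").version)
    print("AIRFLOW_HOME:", os.environ.get("AIRFLOW_HOME",
                                          os.path.expanduser("~/airflow")))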

Running

  1. To create BioWardrobe's DAGs, run biowardrobe-init in Airflow's DAGs directory:

    cd ~/airflow/dags
    ./biowardrobe-init 
    
  2. Run Airflow scheduler:

    airflow scheduler
    
  3. Use airflow trigger_dag with the input parameter --conf "JSON", where JSON is either a job definition or a biowardrobe_uid, together with an explicitly specified CWL descriptor dag_id (a scripted equivalent follows this list).

    airflow trigger_dag --conf "{\"job\":$(cat ./hg19.job)}" "bowtie-index"
    

    where hg19.job is:

    {
      "fasta_input_file": {
        "class": "File", 
        "location": "file:///wardrobe/indices/bowtie/hg19/chrM.fa", 
        "format":"http://edamontology.org/format_1929",
        "size": 16909,
        "basename": "chrM.fa",
        "nameroot": "chrM",
        "nameext": ".fa"
      },
      "output_folder": "/wardrobe/indices/bowtie/hg19/",
      "threads": 6,
      "genome": "hg19"
    }
    
  4. All output will be moved from the temporary directory into the directory specified by the job's output_folder parameter.
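
Step 3 can also be scripted. The sketch below (standard library only, not part of the package) wraps a job file in the {"job": ...} payload and shells out to the same CLI:

    #!/usr/bin/env python3
    # Scripted equivalent of step 3: build the --conf JSON from a job file
    # and hand it to `airflow trigger_dag`.
    import json
    import subprocess

    with open("hg19.job") as f:      # the job definition shown above
        job = json.load(f)

    conf = json.dumps({"job": job})
    # Alternatively, pass {"biowardrobe_uid": "<uid>"} as described in step 3.
    subprocess.run(
        ["airflow", "trigger_dag", "--conf", conf, "bowtie-index"],
        check=True,
    )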
