
A platform-agnostic, cloud-ready framework for simplified deployment of the Common Workflow Language using a graphical web interface


CWLab - An open-source framework for simplified deployment of the Common Workflow Language using a graphical web interface

Background and Scope:

The Common Workflow Language (CWL) allows bioinformatics software to be wrapped and linked up in a standardized and portable way. However, setting up and operating a CWL-based workflow management system can be a labor-intensive challenge for many data-driven laboratories. To this end, we developed CWLab: a framework for simplified, graphical deployment of CWL.

CWLab allows life-science researchers of all levels of computational proficiency to create, execute, and monitor jobs for CWL-wrapped tools and workflows. Input parameters for large sample batches are specified using a simple HTML form or a spreadsheet and are automatically validated. The integrated web server allows execution to be controlled remotely, on clusters as well as on single workstations. Moreover, automatic infrastructure provisioning and scaling for OpenStack-based clouds is being implemented. CWLab can also be used as a local desktop application that supports Linux, MacOS, and Windows by leveraging Docker containerization. Our Python-based framework is easy to set up and, via a flexible API, it can be integrated with any CWL runner and adapted to custom software environments.

With CWLab, we would like to hide the complexity of workflow management so that scientific users can focus on their data analyses. This might promote the adoption of CWL in multi-professional life-science laboratories.

Installation and Quick Start:

Attention: CWLab is currently in an alpha state and not all features are available yet. However, the core functionalities are working and we would be happy for you to test them. We are working hard to push out a stable version in the coming weeks. Please press the watch button (on GitHub) so you don't miss it.

Installation can be done using pip:
python3 -m pip install cwlab
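
If you prefer to keep CWLab and its dependencies isolated, you can instead install it into a virtual environment. A minimal sketch (the environment path is arbitrary):
python3 -m venv ~/cwlab_venv
source ~/cwlab_venv/bin/activate
python3 -m pip install cwlab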

Please see the section "Configuration" for a discussion of available options.

Start the web server with your custom configuration (or leave out the --config flag to use the default one):
cwlab up --config config.yaml
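
For a first test, you can generate the example configuration and start the server with it (both commands are explained in the "Configuration" section below):
cwlab print_config > config.yaml
cwlab up --config config.yaml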

The usage of the web interface should be self-explanatory thanks to built-in instructions. The following sections give an overview of the basic usage scenario.

Usage:

Connect to the web interface:

Open a modern browser of your choice, such as Chrome, Firefox, Safari, or Edge (Internet Explorer might be partially incompatible).

Type in the URL of your web server. The URL depends on your configuration:

  • If the web server is running on the same machine and port 5000 is used (this is the default), type:
    http://localhost:5000/

  • If CWLab is running on a remote machine in the same network, type in the machine's IP address and the used port. For instance, if the IP address is 172.22.0.1 and port 5000 is used:
    http://172.22.0.1:5000/

You should see a Welcome page like this:
[Screenshot: welcome page]

Import a CWL workflow or tool:

CWLab can be used to run any workflow or tool that has been wrapped using the Common Workflow Language. Of course, you can write workflows or tool wrappers yourself (we recommend rabix-composer: https://rabix.io/); however, for many tasks, especially in bioinformatics, existing CWL solutions are publicly available. Check the CWL website as a starting point:
https://www.commonwl.org/#Repositories_of_CWL_Tools_and_Workflows.

To import a CWL document:

  • Click on the button "Import CWL Workflow/Tool" in the top bar
  • Choose a CWL document (workflow or tool)*
  • Press the import button

The workflow will be automatically validated:
[Screenshot: CWL import]

*Please note: Currently, workflows can only be imported in the "packed" format. We will add support for the unpacked format soon. To pack a CWL workflow, use:
cwltool --pack my_workflow.cwl > my_workflow_packed.cwl

Create a new Job:

To run a workflow or tool with your data, you have to create a new job. One job may contain multiple runs (for instance, multiple samples or conditions). CWLab will automatically present you with a list of required input parameters. For each parameter, you can choose whether to specify it globally (all runs of a job get the same value) or per run.

  • Click on the button "Create New Job" in the top bar and select the desired CWL document in the side bar
  • Specify a descriptive job name (the job ID will be composed of the date, time, and the name)
  • If the job shall contain multiple runs, toggle the "runs per job" switch, then:
    • Specify run names as a comma-separated list in the dedicated text field
    • In the parameter list, select which parameters should be run-specific
  • CWLab will automatically create a parameter form for you to fill in:
    • Export/download the form in the desired format
    • Open it in a spreadsheet editor (e.g., Microsoft Excel or OpenOffice)
    • The file may contain the following sheets (depending on the types of the input parameters and your "global"/"run-specific" selections):
      • global single values: parameters that take only one value and are defined globally (one for all runs)
      • run-specific single values: parameters that take only one value but are specified per run
      • global arrays: array parameters (takes a list of values) that are defined globally
      • A separate sheet will be created for each run-specific array parameter, titled with the parameter's name
      • config: this sheet contains configuration options that only need adaptation in advanced use cases
    • Fill in the sheet and import/upload the edited file to CWLab *
  • Your parameter settings are automatically validated. (E.g., it is checked whether the specified values match the parameter's type and whether the paths of specified files or directories exist.)
  • If valid, you can press the "create job" button and head over to "Job Execution & Results" in the top bar

* Please note: For specifying file or directory parameters, there are two options:

  • Either specify the absolute path
  • Or specify a character string that can be uniquely matched to a file/directory in the default input directory (please see the INPUT_DIR option in the "Configuration" section); an example is sketched below.
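
For illustration, a run-specific single values sheet could look like the following sketch (the run and parameter names are made up for this example). For sample_A, the file is given as an absolute path; for sample_B, a character string is used that CWLab uniquely matches against the contents of the default input directory:

run        fastq1                             adapter
sample_A   /data/fastq/sample_A_R1.fastq.gz   AGATCGGAAGAGC
sample_B   sample_B_R1                        AGATCGGAAGAGC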

This is an example screenshot for creating a job for an ATAC-seq workflow:
[Screenshot: job creation]

Job execution:

  • Click on "Job Execution & Results" in the top bar and choose the job of interest in the side bar
  • Select the runs you want to start
  • Select an execution profile (see the "Configuration" section for details) and press "start"
  • The execution status will be displayed in the run list
  • Pressing the "Details/Results" button will show (not implemented yet):
    • the deployed input parameters
    • execution logs (from the CWL runner)
    • a QC report
  • Once finished, the output can be found in the "exec" directory (set in the configuration) along with the used parameter values, the CWL document, and log files; a sketch of the layout follows below
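
As a rough orientation, the contents of the "exec" directory for a finished job might be organized like the following sketch (the exact names and nesting are assumptions for illustration; the job and run names are made up):

exec/
  2019-08-21_14-30_my_job/                      # job ID composed of date, time, and job name
    <CWL document, input parameter YAMLs, and log files>
    <run-specific output directories with the CWL runner's results>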

An example screenshot of the execution interface:
[Screenshot: execution interface]

Configuration:

CWLab is a highly versatile package and makes almost no assumptions about the hard- and software environment used for the execution of CWL. To adapt it to your system and use case, a set of configuration options is available:

  • General configs, including:
    • web server (host IP address and port, remote or local availability, login-protected or not)
    • paths of working directories
  • Execution profiles:
    This flexible API allows you to adapt CWLab to your local software environment and to integrate a CWL runner of your choice (such as cwltool, Toil, or Cromwell).

All configuration options can be specified in a single YAML file which is provided to CWLab upon start:
cwlab up --config my_config.yaml

To get an example config file, run the following command (or see the example below):
cwlab print_config > config.yaml

General Configs:

  • WEB_SERVER_HOST:
    Specify the host or IP address on which the web server shall run. Use localhost for local usage on your machine only. Use 0.0.0.0 to allow access from other machines in the same network.
    Default: localhost

  • WEB_SERVER_PORT:
    Specify the port used by the web server.
    Default: 5000

  • TEMP_DIR:
    Directory for temporary files.
    Default: a subfolder "cwlab/temp" in the home directory

  • CWL_DIR:
    Directory for saving CWL documents.
    Default: a subfolder "cwlab/cwl" in the home directory

  • EXEC_DIR:
    Directory for saving execution data including output files.
    Default: a subfolder "cwlab/exec" in the home directory

  • INPUT_DIR:
    Directory where input files are expected by default (if the full path is not specified).
    Default: a subfolder "cwlab/input" in the home directory

  • DB_DIR:
    Directory for databases.
    Default: a subfolder "cwlab/db" in the home directory

  • DEBUG:
    If set to True, debugging mode is turned on. Do not use on production systems.
    Default: False
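
For instance, a minimal configuration file that only opens the web server to other machines in the network and keeps all directory defaults could look like this:

WEB_SERVER_HOST: 0.0.0.0
WEB_SERVER_PORT: 5000
DEBUG: False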

Exec Profiles:

This is where you configure how CWL jobs are executed on your system. A profile consists of four steps: pre_exec, exec, eval, and post_exec (only exec is required; the rest are optional). For each step, you can specify commands that are executed in a bash or cmd shell.

You can define multiple execution profiles, as shown in the config example below. This allows front-end users to choose between different execution options (e.g., using different CWL runners, different dependency-management systems, or even choosing between multiple available batch execution infrastructures such as LSF or PBS). For each execution profile, the following configuration parameters are available (only shell and exec are required):

  • shell:
    Specify which shell to use. For Linux or MacOS use bash. For Windows, use cmd.
    Required.

  • timeout:
    For each step in the execution profile, you can set a timeout limit (in seconds).
    Defaults:
      pre_exec: 120
      exec: 86400
      eval: 120
      post_exec: 120

  • pre_exec*:
    Shell commands that are executed before the actual CWL execution, for instance, to load required Python/conda environments.
    Optional.

  • exec*:
    Shell commands to start the CWL execution. Usually this is only the command line to execute the CWL runner. The stdout and stderr of the CWL runner should be redirected to the predefined log file.
    Required.

  • eval*:
    The exit status at the end of the exec step is automatically checked. Here you can specify shell commands to additionally evaluate the content of the execution log to determine if the execution succeeded. To communicate failure to CWLab, set the SUCCESS variable to False.
    Optional.

  • post_exec*:
    Shell commands that are executed after exec and eval. For instance, this can be used to clean up temporary files.
    Optional.

* Additional notes regarding execution profile steps:

  • In each step, the following predefined variables are available:
    • JOB_ID
    • RUN_ID (please note: only unique within a job)
    • CWL (the path to the used CWL document)
    • RUN_YAML (the path to the YAML file containing input parameters)
    • OUTPUT_DIR (the path of the run-specific output directory)
    • LOG_FILE (the path of the log file that should receive the stdout and stderr of the CWL runner)
    • SUCCESS (if set to False the run will be marked as failed and terminated)
  • The four steps will be executed in the same shell session and can therefore be treated as one connected script. (Between the steps, CWLab communicates the status to the database, allowing users to get status notifications via the front end.)
  • Thus, you may define your own variables; they will also be available in all downstream steps (see the sketch below).
  • At the end of each step, the exit code is checked. If it is non-zero, the run will be marked as failed. Please note: if a step consists of multiple commands and an intermediate command fails, this will not be recognized by CWLab as long as the final command of the step succeeds. To manually communicate a failure to CWLab, please set the SUCCESS variable to False.
  • The steps are executed using pexpect (https://pexpect.readthedocs.io/en/stable/overview.html); this also allows you to connect to a remote infrastructure via ssh (using an ssh key is recommended). Please be aware that the paths of files or directories specified in the input parameter YAML will not be adapted to the new host. We are working on solutions to achieve automated path correction and/or upload functionality when the execution host is not the CWLab server host.
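
For illustration, here is a minimal sketch of a profile that defines a custom variable in pre_exec and reuses it downstream (the profile name and the scratch directory are made up for this example; cwltool's --tmpdir-prefix option is used to point temporary files there):

EXEC_PROFILES:
  cwltool_with_scratch:
    shell: bash
    pre_exec: |
      SCRATCH_DIR="${OUTPUT_DIR}/scratch"    # custom variable, stays defined in all downstream steps
      mkdir -p "${SCRATCH_DIR}"
    exec: |
      cwltool --tmpdir-prefix "${SCRATCH_DIR}/" --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}" >> "${LOG_FILE}" 2>&1
    post_exec: |
      rm -rf "${SCRATCH_DIR}"    # same shell session, so SCRATCH_DIR is still set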

Example configuration file:

WEB_SERVER_HOST: localhost 
WEB_SERVER_PORT: 5000

DEBUG: False  

TEMP_DIR: '/home/cwlab_user/cwlab/temp'
CWL_DIR: '/home/cwlab_user/cwlab/cwl'
EXEC_DIR: '/home/cwlab_user/cwlab/exec'
INPUT_DIR: '/home/cwlab_user/cwlab/input'
DB_DIR: '/home/cwlab_user/cwlab/db'

EXEC_PROFILES:

  cwltool_local:
    shell: bash
    timeout:
      pre_exec: 120
      exec: 86400
      eval: 120
      post_exec: 120
    exec: |
      cwltool --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}" >> "${LOG_FILE}" 2>&1
    eval: |
      LAST_LINE=$(tail -n 1 "${LOG_FILE}")
      if [[ "${LAST_LINE}" == *"Final process status is success"* ]]
      then
        SUCCESS=True
      else
        SUCCESS=False
        ERR_MESSAGE="cwltool failed - ${LAST_LINE}"
      fi
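
To offer users more than one execution option, additional profiles can be added under the same EXEC_PROFILES key. The following sketch is a hypothetical second profile that activates a conda environment before execution and deactivates it afterwards (the profile and environment names are made up; depending on your conda setup, conda activate may be needed instead of source activate):

  cwltool_conda:
    shell: bash
    pre_exec: |
      source activate cwl_env    # hypothetical conda environment providing cwltool
    exec: |
      cwltool --outdir "${OUTPUT_DIR}" "${CWL}" "${RUN_YAML}" >> "${LOG_FILE}" 2>&1
    eval: |
      LAST_LINE=$(tail -n 1 "${LOG_FILE}")
      if [[ "${LAST_LINE}" == *"Final process status is success"* ]]
      then
        SUCCESS=True
      else
        SUCCESS=False
      fi
    post_exec: |
      source deactivate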

Documentation:

Please note: Much more detailed documentation is on the way. In the meantime, please contact us if you have any questions (see the "Contact and Contribution" section). We are happy to help.

Contact and Contribution:

If you have any questions or are experiencing problems with CWLab, please contact us at k.breuer@dkfz.de or open an issue on GitHub.

If you would like to contribute to the development and extend the functionality of CWLab to meet your requirements, you are more than welcome. We will do our best to support you, and your contribution will be acknowledged.

About Us:

CWLab is developed with love in the Division of Cancer Epigenomics at the German Cancer Research Center (DKFZ) in the beautiful university city of Heidelberg. We are an interdisciplinary team of wet-lab scientists and bioinformaticians working closely together. Our DNA-sequencing-driven methodologies produce challenging amounts of data. CWLab helps us by giving all members of our team the ability to perform common bioinformatic analyses autonomously, without having to acquire programming skills. This allows our bioinformatics staff to focus on method development and on the interpretation and integration of computationally complex data.

If you would like to know more about us, please visit our website: https://www.dkfz.de/en/CanEpi/contact.html.

License:

This package is free to use and modify under the Apache 2.0 License.
