A software tool to extract and analyze the structure and associated metadata of a Nextflow workflow.

BioFlow-Insight

MIT licensed Version 0.1

Description

This repository contains BioFlow-Insight, a Python software tool. BioFlow-Insight automatically analyses Nextflow workflow code, extracting useful information, notably in the form of visual graphs illustrating the workflow's structure and its various steps.

BioFlow-Insight is easily installable as a Python package (see the Installation section below). It is also accessible as a free web service. For more information and to start using BioFlow-Insight, visit https://bioflow-insight.pasteur.cloud/.

Installation

Using from source

BioFlow-Insight's dependencies are given in the requirements.txt file.

Note: to install Graphviz on Linux (Debian/Ubuntu), you might need to run the following command: sudo apt install graphviz
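
Whether Graphviz is installed can be checked from Python before running an analysis. The following is a minimal sketch using only the standard library; it assumes that PNG rendering relies on the Graphviz `dot` executable being on the PATH, and the helper name `graphviz_available` is hypothetical (it is not part of BioFlow-Insight).

```python
import shutil


def graphviz_available(executable="dot"):
    """Return True if the given Graphviz executable is found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if graphviz_available():
        print("Graphviz 'dot' found")
    else:
        print("Graphviz 'dot' not found; install Graphviz to render PNG images")
```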

Using the Python package

BioFlow-Insight is easily installable as a Python package.

To install it using pip, use the following command:

pip install bioflow-insight

TODO

Usage

BioFlow-Insight automatically analyses the code of Nextflow workflows and extracts useful information, particularly in the form of visual graphs depicting the workflow's structure and representing its different steps.

For an explanation of the different elements composing a Nextflow workflow, see its documentation.

The three graphs generated by BioFlow-Insight are:

  1. The specification graph which represents all elements of the workflow, including processes and operations, and their interactions through channels. Within the specification graph, we define two types of operations: those without inputs and those with inputs (called branch operations).
  2. The second graph represents operations without any inputs, along with processes and their dependencies. This graph, called the dependency graph without branch operations, is obtained by removing the branch operations and linking the remaining elements if a path exists between them in the original specification graph.
  3. The final graph, called the process dependency graph, represents only processes and their dependencies. Like the previous graph, it is constructed by removing elements from the specification graph: here all operations are removed, leaving only processes, which are linked based on their dependencies in the original specification graph.
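
The way the second and third graphs are derived from the specification graph can be illustrated with a small sketch. This is not BioFlow-Insight's implementation: the function `reduce_graph`, the toy node names, and the reading of "a path exists" as "a path that passes only through removed nodes" are all assumptions made for illustration.

```python
from collections import defaultdict


def reduce_graph(edges, keep):
    """Remove every node not in `keep`; link two kept nodes when a path
    between them exists that passes only through removed nodes."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    reduced = set()
    for start in keep:
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                # A kept node ends the walk: record the link and stop here.
                reduced.add((start, node))
            else:
                # A removed node is traversed to reach what lies behind it.
                stack.extend(successors[node])
    return reduced


# Toy specification graph (hypothetical names): "read_files" stands for an
# operation without inputs, "collect" for a branch operation, the rest are processes.
edges = [
    ("read_files", "INDEX"),
    ("INDEX", "collect"),
    ("collect", "QUANT"),
    ("read_files", "QUANT"),
    ("QUANT", "MULTIQC"),
]
processes = {"INDEX", "QUANT", "MULTIQC"}

# Dependency graph without branch operations: drop "collect" only.
print(sorted(reduce_graph(edges, processes | {"read_files"})))
# Process dependency graph: keep processes only.
print(sorted(reduce_graph(edges, processes)))
```

In this toy example, removing the branch operation `collect` links `INDEX` directly to `QUANT`, and keeping only processes additionally drops the links that start at `read_files`.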

For a more in-depth explanation of BioFlow-Insight's functionalities, visit its webpage (https://bioflow-insight.pasteur.cloud/).

To illustrate the use of BioFlow-Insight, let's use the rnaseq-nf workflow provided by Nextflow (its source code can be found here). Examples of the output are given below.

Input

In this example, we are going to use the BioFlow-Insight source code. After cloning both repositories (this one and the rnaseq-nf workflow), we can run the following code to perform the analysis (the different steps are described below):

import os
current_path = os.getcwd()
os.chdir("bioflow-insight/")
from src.workflow import Workflow
os.chdir(current_path)

w = Workflow("./rnaseq-nf/main.nf", duplicate=False, display_info=True)
w.initialise()
w.generate_all_graphs(render_graphs=True, processes_2_remove=[])
  1. lines 1 to 5 (counting only the non-blank lines of the code above): import the Workflow object used to run the analysis
  2. line 6: create the object w, an instance of Workflow
    1. line 6: the first parameter is the path to the main Nextflow file (mandatory parameter).
    2. line 6: parameter duplicate (False by default): when some processes and subworkflows are duplicated in the workflow through the include as option, setting this parameter to True duplicates the corresponding elements in the graphs.
    3. line 6: parameter display_info (True by default): displays the files which are being analysed.
  3. line 7: initialise runs the entire analysis of the Nextflow workflow
  4. line 8: generate_all_graphs generates all the graphs in the mermaid and dot formats, along with the associated metadata for the graphs
    1. line 8: parameter render_graphs (True by default): if True, the png images of the dot graphs are generated with Graphviz. For large workflows this can sometimes fail (depending on the hardware).
    2. line 8: parameter processes_2_remove ([] by default): a list of processes to remove from the graphs. This is useful in the case of MULTIQC processes (they don't really serve a functional role but can clutter the structure since they are connected to the majority of processes).
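
The import in the example above changes the working directory so that the src package of the cloned repository can be found, which only works when the current directory is on sys.path (this depends on how the interpreter was started). As an alternative sketch, the repository folder can be added to sys.path explicitly. The helper name `add_repo_to_path` is hypothetical, and the folder name bioflow-insight is the clone location assumed in the example above.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put the cloned repository at the front of sys.path so that
    `from src.workflow import Workflow` can resolve without os.chdir."""
    repo_path = str(Path(repo_dir).resolve())
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    return repo_path


# Assumed clone location, as in the example above.
add_repo_to_path("bioflow-insight")
# from src.workflow import Workflow  # uncomment once the repository is cloned
```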

Output

After the workflow has been analysed and the graphs generated, the outputs are saved in the results folder.

The structure of this folder is organised as follows:

.
├── debug
│   ├── calls.nf
│   ├── operations_in_call.nf
│   └── operations.nf
├── graphs
│   ├── dependency_graph_wo_branch_operations.dot
│   ├── dependency_graph_wo_branch_operations.json
│   ├── dependency_graph_wo_branch_operations.mmd
│   ├── dependency_graph_wo_branch_operations.png
│   ├── dependency_graph_wo_branch_operations_wo_lables.dot
│   ├── dependency_graph_wo_branch_operations_wo_lables.mmd
│   ├── dependency_graph_wo_branch_operations_wo_lables.png
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.dot
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.mmd
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.png
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.dot
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.mmd
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.png
│   ├── metadata_dependency_graph_wo_branch_operations.json
│   ├── metadata_process_dependency_graph.json
│   ├── metadata_specification_graph.json
│   ├── process_dependency_graph.dot
│   ├── process_dependency_graph.json
│   ├── process_dependency_graph.mmd
│   ├── process_dependency_graph.png
│   ├── specification_graph.dot
│   ├── specification_graph.json
│   ├── specification_graph.mmd
│   ├── specification_graph.png
│   ├── specification_graph_wo_labels.dot
│   ├── specification_graph_wo_labels.mmd
│   ├── specification_graph_wo_labels.png
│   ├── specification_wo_orphan_operations.dot
│   ├── specification_wo_orphan_operations.mmd
│   ├── specification_wo_orphan_operations.png
│   ├── specification_wo_orphan_operations_wo_labels.dot
│   ├── specification_wo_orphan_operations_wo_labels.mmd
│   └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata-rnaseq-nf.json
  • The ro-crate-metadata-rnaseq-nf.json file describes the workflow following an extended Workflow RO-Crate profile. The description of this extended profile can be found here (TODO)
  • The debug folder contains various intermediary files which are useful for debugging
  • The graphs folder contains the generated graphs. For each of the 3 graphs described above, BioFlow-Insight generates:
    • A json file which describes the graph using a BioFlow-Insight specific format
    • A json file which describes the metadata extracted from the graph
    • Where possible, BioFlow-Insight also generates the graphs without labels on the operations and channels. Additionally, there is a variant where the orphan operations (operations which don't have any inputs or outputs) are not represented.

BioFlow-Insight generates each graph in the mermaid format and the dot format. If the render_graphs option is set to True, the png image is also generated.
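
As a rough sketch of how the generated files could be inspected afterwards, the following groups the files of the graphs folder by graph name, using only the standard library. The function name `list_graph_outputs` is hypothetical; the results/graphs path follows the folder structure shown above.

```python
from collections import defaultdict
from pathlib import Path


def list_graph_outputs(graphs_dir):
    """Map each graph name (file stem) to the formats found for it."""
    outputs = defaultdict(list)
    for path in sorted(Path(graphs_dir).iterdir()):
        if path.is_file():
            outputs[path.stem].append(path.suffix.lstrip("."))
    return dict(outputs)


if __name__ == "__main__":
    graphs_dir = Path("results") / "graphs"
    if graphs_dir.is_dir():
        for name, formats in list_graph_outputs(graphs_dir).items():
            print(f"{name}: {', '.join(formats)}")
```

This makes it easy to see, for example, whether the png files were produced for a given graph.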

Here are some of the graphs generated by BioFlow-Insight, rendered using Graphviz (png).

[Figures: Specification Graph | Dependency Graph without branch operations | Process Dependency Graph]

License

This project is licensed under the GNU Affero General Public License.

TODO -> add license to git repo

Funding

This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.





