A software tool to extract and analyze the structure and associated metadata of a Nextflow workflow.
BioFlow-Insight
Description
This repository contains BioFlow-Insight, a Python software tool. BioFlow-Insight automatically analyses Nextflow workflow code, extracting useful information, notably in the form of visual graphs illustrating the workflow's structure and its various steps.
BioFlow-Insight is easily installable as a Python package (see Installation below). It is also accessible as a free web service. For more information and to start using BioFlow-Insight, visit https://bioflow-insight.pasteur.cloud/.
Installation
Using from source
BioFlow-Insight's dependencies are given in the requirements.txt file.
Note: on Linux, you might need to run the following command to install Graphviz:
sudo apt install graphviz
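Since png rendering relies on Graphviz, it can help to check that it is installed before running an analysis. The snippet below is a minimal sketch, not part of BioFlow-Insight: the helper name graphviz_available is hypothetical, and it only assumes that Graphviz exposes its usual dot executable on the PATH.

```python
import shutil


def graphviz_available():
    """Return True if the Graphviz `dot` executable is found on the PATH."""
    return shutil.which("dot") is not None


if not graphviz_available():
    # Hypothetical warning message, not emitted by BioFlow-Insight itself
    print("Graphviz 'dot' not found: png rendering will likely fail.")
```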
Using the Python package
BioFlow-Insight is easily installable as a Python package.
To install it using pip, use the following command:
pip install bioflow-insight
Usage
BioFlow-Insight automatically analyses the code of Nextflow workflows and extracts useful information, particularly in the form of visual graphs depicting the workflow's structure and representing its different steps.
For an explanation of the different elements composing a Nextflow workflow, see its documentation.
The three graphs generated by BioFlow-Insight are:
- The specification graph which represents all elements of the workflow, including processes and operations, and their interactions through channels. Within the specification graph, we define two types of operations: those without inputs and those with inputs (called branch operations).
- The second graph represents operations without any inputs, along with processes and their dependencies. This graph, called the dependency graph without branch operations, is obtained by removing the branch operations and linking the remaining elements if a path exists between them in the original specification graph.
- The final graph, called the process dependency graph, represents only processes and their dependencies. Like the previous graph, it is constructed from the original specification graph: all operations are removed, leaving only processes, which are then linked according to their dependencies in the specification graph.
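The way the two dependency graphs are derived from the specification graph can be sketched on a toy example. This is an illustration only, not BioFlow-Insight's implementation: the function collapse_graph, the edge-list representation and the node names are hypothetical, and the sketch assumes that two kept nodes are linked when a path exists between them that passes only through removed nodes.

```python
from collections import deque


def collapse_graph(edges, keep):
    """Link kept nodes joined by a path whose intermediate nodes are all removed."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    collapsed = set()
    for start in keep:
        queue = deque(successors.get(start, []))
        seen = set()
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                # Reached another kept node: record the link and stop here
                collapsed.add((start, node))
            else:
                # Removed node: keep walking through it
                queue.extend(successors.get(node, []))
    return sorted(collapsed)


# Toy specification graph (hypothetical): processes in upper case, operations otherwise
spec_edges = [
    ("Channel.fromPath", "INDEX"),
    ("INDEX", "map"),
    ("map", "QUANT"),
    ("Channel.fromFilePairs", "QUANT"),
    ("QUANT", "mix"),
    ("mix", "MULTIQC"),
]
processes = {"INDEX", "QUANT", "MULTIQC"}

# Keeping only processes gives a toy process dependency graph
process_edges = collapse_graph(spec_edges, processes)
print(process_edges)  # [('INDEX', 'QUANT'), ('QUANT', 'MULTIQC')]
```

Passing a larger keep set (processes plus the operations without inputs) would give the toy equivalent of the dependency graph without branch operations.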
For a more in-depth explanation of BioFlow-Insight's functionalities, visit its webpage (https://bioflow-insight.pasteur.cloud/).
To illustrate how BioFlow-Insight is used, let's take the rnaseq-nf workflow proposed by Nextflow (its source code can be found here). Examples of the output are given below.
Input
In this example, we are going to use the BioFlow-Insight source code. After cloning both repositories (this one and the rnaseq-nf workflow), we can run the following Python code to perform the analysis (the different steps are described below):
import os
current_path = os.getcwd()
os.chdir("bioflow-insight/")
from src.workflow import Workflow
os.chdir(current_path)
w = Workflow("./rnaseq-nf/main.nf", duplicate=False, display_info=True)
w.initialise()
w.generate_all_graphs(render_graphs=True, processes_2_remove=[])
- lines 1 to 5: import the Workflow object allowing the analysis
- line 6: create the object w, an instance of Workflow
  - line 6: the first parameter is the path of the main Nextflow file (mandatory parameter).
  - line 6: parameter duplicate (by default False): if some processes and subworkflows are duplicated in the workflow through the include as option, this parameter duplicates the corresponding elements in the graphs.
  - line 6: parameter display_info (by default True): shows the files which are being analysed.
- line 7: initialise runs the entire analysis of the Nextflow workflow
- line 8: generate_all_graphs generates all the graphs in the mermaid and dot formats, plus the associated metadata for the graphs
  - line 8: parameter render_graphs (by default True): if true, the png images of the dot graphs are generated using Graphviz. For large workflows this can sometimes fail (depending on the hardware).
  - line 8: parameter processes_2_remove (by default []): a list of processes which are to be removed from the graphs. This is useful in the case of MULTIQC processes (they don't really serve a functional role but can clutter the structure, since they are connected to the majority of processes).
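The os.chdir calls on lines 1 to 5 rely on the current directory being part of Python's import search path, which is typically the case in an interactive session but not when the code is run as a script from another directory. A possible alternative, shown here as a sketch under assumptions, is to add the cloned repository to sys.path directly. The helper add_source_dir is hypothetical, and the commented usage assumes that the clone lives in ./bioflow-insight and that processes_2_remove accepts process names as strings.

```python
import sys
from pathlib import Path


def add_source_dir(repo_dir):
    """Put the cloned repository at the front of the import search path."""
    repo_path = str(Path(repo_dir).resolve())
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    return repo_path


# Usage (assumption: the repository was cloned to ./bioflow-insight):
# add_source_dir("bioflow-insight")
# from src.workflow import Workflow
# w = Workflow("./rnaseq-nf/main.nf", duplicate=False, display_info=True)
# w.initialise()
# w.generate_all_graphs(render_graphs=True, processes_2_remove=["MULTIQC"])
```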
Output
After the workflow has been analysed and the graphs generated, the outputs are saved in the results folder.
The structure of this folder is organised as follows:
.
├── debug
│ ├── calls.nf
│ ├── operations_in_call.nf
│ └── operations.nf
├── graphs
│ ├── dependency_graph_wo_branch_operations.dot
│ ├── dependency_graph_wo_branch_operations.json
│ ├── dependency_graph_wo_branch_operations.mmd
│ ├── dependency_graph_wo_branch_operations.png
│ ├── dependency_graph_wo_branch_operations_wo_lables.dot
│ ├── dependency_graph_wo_branch_operations_wo_lables.mmd
│ ├── dependency_graph_wo_branch_operations_wo_lables.png
│ ├── dependency_graph_wo_branch_operations_wo_orphan_operations.dot
│ ├── dependency_graph_wo_branch_operations_wo_orphan_operations.mmd
│ ├── dependency_graph_wo_branch_operations_wo_orphan_operations.png
│ ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.dot
│ ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.mmd
│ ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.png
│ ├── metadata_dependency_graph_wo_branch_operations.json
│ ├── metadata_process_dependency_graph.json
│ ├── metadata_specification_graph.json
│ ├── process_dependency_graph.dot
│ ├── process_dependency_graph.json
│ ├── process_dependency_graph.mmd
│ ├── process_dependency_graph.png
│ ├── specification_graph.dot
│ ├── specification_graph.json
│ ├── specification_graph.mmd
│ ├── specification_graph.png
│ ├── specification_graph_wo_labels.dot
│ ├── specification_graph_wo_labels.mmd
│ ├── specification_graph_wo_labels.png
│ ├── specification_wo_orphan_operations.dot
│ ├── specification_wo_orphan_operations.mmd
│ ├── specification_wo_orphan_operations.png
│ ├── specification_wo_orphan_operations_wo_labels.dot
│ ├── specification_wo_orphan_operations_wo_labels.mmd
│ └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata-rnaseq-nf.json
- The ro-crate-metadata-rnaseq-nf.json file describes the workflow following an extended Workflow RO-Crate profile. The description of this extended profile can be found here (TODO).
- The debug folder contains different intermediary files which are useful for debugging.
- The graphs folder contains the different graphs which are generated. For each of the 3 graphs described above, BioFlow-Insight generates:
  - A json file which describes the graph using a BioFlow-Insight specific format.
  - A json file which describes the metadata extracted from the graph.
  - Where possible, BioFlow-Insight also generates the graphs without labels on the operations and channels. Additionally, there is a variant where the orphan operations (operations which don't have any inputs or outputs) are not represented.
Each graph is generated in the mermaid format and the dot format. If the render_graphs option is set to True, the png image is also generated.
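To get an overview of which formats were produced for which graph, the file names of the graphs folder can be grouped by extension. The following is a small sketch, not part of BioFlow-Insight: the function group_by_format is hypothetical and works on plain file names, here a few taken from the listing above.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_format(filenames):
    """Map each file extension (dot, json, mmd, png) to the sorted file stems."""
    grouped = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        grouped[path.suffix.lstrip(".")].append(path.stem)
    return {ext: sorted(stems) for ext, stems in grouped.items()}


# A few file names taken from the listing above
example = [
    "process_dependency_graph.dot",
    "process_dependency_graph.json",
    "process_dependency_graph.mmd",
    "process_dependency_graph.png",
    "metadata_process_dependency_graph.json",
]
formats = group_by_format(example)
print(formats["json"])  # ['metadata_process_dependency_graph', 'process_dependency_graph']
```

For a real run, the same function could be given os.listdir("results/graphs"), assuming the results folder sits in the working directory.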
Here are some of the graphs generated by BioFlow-Insight, rendered as png images using Graphviz.
Specification Graph | Dependency Graph without branch operations | Process Dependency Graph |
License
This project is licensed under the GNU Affero General Public License.
TODO -> add license to git repo
Funding
This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.