Software to extract and analyze the structure and associated metadata of a Nextflow workflow.
BioFlow-Insight
Description
BioFlow-Insight is a Python-based, open-source command-line tool that automatically analyses Nextflow workflow code and gathers useful information about it, particularly in the form of visual graphs that illustrate the workflow's structure and its various steps. It can also detect certain programming errors, and it generates a RO-Crate JSON-LD file that describes the workflow.
BioFlow-Insight can be installed as a CLI (see Installation) and is also available as a free web service. For more information and to start using BioFlow-Insight, visit https://bioflow-insight.pasteur.cloud/.
Installation
Installing via pip
BioFlow-Insight can be installed as a CLI using pip with the following command:
pip install bioflow-insight
Using from source
To access the source code, clone the BioFlow-Insight GitLab repository. BioFlow-Insight is developed in Python 3, and its dependencies are listed in the requirements.txt file.
Note: to install Graphviz on Linux (Debian-based distributions), you may need to run the following command:
sudo apt install graphviz
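To check that both prerequisites are in place, here is a small standard-library sketch. It assumes the installed distribution is named bioflow-insight (as in the pip command above) and that Graphviz provides the dot executable on the PATH; check_environment is a hypothetical helper, not part of BioFlow-Insight.

```python
import shutil
from importlib import metadata


def check_environment(package: str = "bioflow-insight") -> dict:
    """Hypothetical helper: report the installed package version and the path to Graphviz `dot`."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed in this Python environment
    return {
        "package_version": version,
        # shutil.which returns the full path to `dot`, or None if it is not on PATH
        "graphviz_dot": shutil.which("dot"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```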
Usage
For an explanation of the different elements that make up a Nextflow workflow, see the Nextflow documentation.
BioFlow-Insight generates three different graphs:
- Specification graph: BioFlow-Insight reconstructs the workflow's specification graph from its source code, without executing the workflow. The specification graph is a directed graph whose nodes are processes and operations and whose edges are channels, each directed from one node to another (the steps of the workflow are ordered). It represents all the possible interactions between processes and operations through the channels defined in the workflow code. Within the specification graph, operations are divided into two groups: following operations, which have at least one input, and starting operations, which have no inputs.
- Dependency graph: From the specification graph, BioFlow-Insight also generates the dependency graph, whose nodes are the starting operations and the processes, and whose edges are their dependencies. This graph is obtained by removing the following operations and linking the remaining elements if a path exists between them in the original specification graph. In this representation, edges no longer represent interactions between elements, but their dependencies.
- Process dependency graph: Finally, BioFlow-Insight generates the process dependency graph, whose nodes are only processes and whose edges are their dependencies. Like the dependency graph, it is constructed by removing operations (here, all of them), keeping only the processes, and linking them based on their dependencies in the original specification graph. Again, edges represent dependencies rather than interactions.
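The reduction from the specification graph to the two dependency graphs can be sketched as follows. This is an illustration under stated assumptions, not BioFlow-Insight's implementation: the graph is a hypothetical toy (node names are invented), an operation's inputs are approximated by its incoming edges, and two kept nodes are linked when a path between them passes only through removed nodes, which avoids adding every transitive edge.

```python
from collections import deque

# Hypothetical toy specification graph: node -> kind, plus directed channel edges.
NODES = {
    "reads": "operation",   # no incoming edge -> starting operation
    "INDEX": "process",
    "QUANT": "process",
    "mix": "operation",     # has an input -> following operation
    "REPORT": "process",
}
EDGES = [("reads", "QUANT"), ("INDEX", "QUANT"), ("QUANT", "mix"), ("mix", "REPORT")]


def starting_operations(nodes, edges):
    """Operations without any input (approximated here as no incoming edge)."""
    has_input = {dst for _, dst in edges}
    return {n for n, kind in nodes.items() if kind == "operation" and n not in has_input}


def reduce_graph(nodes, edges, keep):
    """Keep only the nodes in `keep`; link u -> v when v is reachable from u
    through removed nodes only (assumed notion of a direct dependency)."""
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
    reduced = set()
    for u in keep:
        seen, queue = set(), deque(succ[u])
        while queue:
            v = queue.popleft()
            if v in seen:
                continue
            seen.add(v)
            if v in keep:
                reduced.add((u, v))    # stop here: v is a kept node
            else:
                queue.extend(succ[v])  # walk through a removed node
    return reduced


processes = {n for n, kind in NODES.items() if kind == "process"}
dependency_graph = reduce_graph(NODES, EDGES, processes | starting_operations(NODES, EDGES))
process_dependency_graph = reduce_graph(NODES, EDGES, processes)
print(sorted(dependency_graph))
print(sorted(process_dependency_graph))
```

In this toy example the following operation mix disappears from both reduced graphs and QUANT is linked directly to REPORT; the starting operation reads survives only in the dependency graph.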
For a more in-depth explanation of BioFlow-Insight's functionalities, visit https://bioflow-insight.pasteur.cloud/specification/.
To illustrate how BioFlow-Insight is used, let's take the rnaseq-nf workflow provided by Nextflow (its source code can be found here). Examples of the output are given below.
Input
In this example, we use BioFlow-Insight to analyse the rnaseq-nf workflow. After installing BioFlow-Insight via pip and cloning the rnaseq-nf repository, simply run the following command:
bioflow-insight rnaseq-nf/main.nf
Output
Once the workflow has been analysed and the graphs generated, the outputs are saved in the results folder.
This folder is organised as follows:
.
├── debug
│ ├── calls.nf
│ ├── operations_in_call.nf
│ └── operations.nf
├── graphs
│ ├── dependency_graph.dot
│ ├── dependency_graph.json
│ ├── dependency_graph.mmd
│ ├── dependency_graph.png
│ ├── dependency_graph_wo_labels.dot
│ ├── dependency_graph_wo_labels.mmd
│ ├── dependency_graph_wo_labels.png
│ ├── dependency_graph_wo_orphan_operations.dot
│ ├── dependency_graph_wo_orphan_operations.mmd
│ ├── dependency_graph_wo_orphan_operations.png
│ ├── dependency_graph_wo_orphan_operations_wo_labels.dot
│ ├── dependency_graph_wo_orphan_operations_wo_labels.mmd
│ ├── dependency_graph_wo_orphan_operations_wo_labels.png
│ ├── metadata_dependency_graph.json
│ ├── metadata_process_dependency_graph.json
│ ├── metadata_specification_graph.json
│ ├── process_dependency_graph.dot
│ ├── process_dependency_graph.json
│ ├── process_dependency_graph.mmd
│ ├── process_dependency_graph.png
│ ├── specification_graph.dot
│ ├── specification_graph.json
│ ├── specification_graph.mmd
│ ├── specification_graph.png
│ ├── specification_graph_wo_labels.dot
│ ├── specification_graph_wo_labels.mmd
│ ├── specification_graph_wo_labels.png
│ ├── specification_wo_orphan_operations.dot
│ ├── specification_wo_orphan_operations.mmd
│ ├── specification_wo_orphan_operations.png
│ ├── specification_wo_orphan_operations_wo_labels.dot
│ ├── specification_wo_orphan_operations_wo_labels.mmd
│ └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata.json
- The ro-crate-metadata.json file describes the workflow following an extended Workflow RO-Crate profile. The description of this extended profile can be found here.
- The debug folder contains intermediary files that are useful for debugging.
- The graphs folder contains the generated graphs. For each of the 3 graphs described above, BioFlow-Insight generates:
  - A json file that describes the graph in a BioFlow-Insight-specific format.
  - A json file that describes the metadata extracted from the graph.
  - Where possible, variants of the graph without labels on the operations and channels (the _wo_labels files). There is also a variant in which orphan operations (operations that have no inputs or outputs) are not represented (the _wo_orphan_operations files).
  - Each graph in both the mermaid (.mmd) and dot (.dot) formats. If the render_graphs option is set to True, the png image is also generated.
Here are some of the graphs generated by BioFlow-Insight for rnaseq-nf, rendered with Graphviz (png):
[Images: Specification Graph | Dependency Graph | Process Dependency Graph]
Citing BioFlow-Insight
Please cite BioFlow-Insight in any research that uses or extends it, using the following publication:
George Marchment, Bryan Brancotte, Marie Schmit, Frédéric Lemoine, Sarah Cohen-Boulakia, BioFlow-Insight: facilitating reuse of Nextflow workflows with structure reconstruction and visualization, NAR Genomics and Bioinformatics, Volume 6, Issue 3, September 2024, lqae092, https://doi.org/10.1093/nargab/lqae092
License
This project is licensed under the GNU Affero General Public License.
Funding
This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.
Hashes for bioflow_insight-1.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c06275bd1f8dfa36094e7958b266f7ef8ba444e6dee1878add5f1bea1f734b32
MD5 | e21eaae8822e1648113e1e79a33de300
BLAKE2b-256 | a32c140afcb08ec3ef2a345d40ded937d48f72e40a69748842bddb796fec45a7
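A downloaded wheel can be checked against the digests in the table above with the standard hashlib module. This is a minimal sketch: file_digests and verify are hypothetical helpers, and the usage assumes the wheel sits in the current directory. BLAKE2b-256 corresponds to hashlib.blake2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# Digests copied from the table above for bioflow_insight-1.0.4-py3-none-any.whl
EXPECTED = {
    "sha256": "c06275bd1f8dfa36094e7958b266f7ef8ba444e6dee1878add5f1bea1f734b32",
    "md5": "e21eaae8822e1648113e1e79a33de300",
    "blake2b_256": "a32c140afcb08ec3ef2a345d40ded937d48f72e40a69748842bddb796fec45a7",
}


def file_digests(path: Path, chunk_size: int = 1 << 16) -> dict:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests, reading the file in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path: Path, expected: dict = EXPECTED) -> bool:
    """Return True only if every computed digest matches the expected one."""
    return file_digests(path) == expected


if __name__ == "__main__":
    wheel = Path("bioflow_insight-1.0.4-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("verified" if verify(wheel) else "MISMATCH")
```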