
A tool to extract and analyse the structure and associated metadata of a Nextflow workflow.

Project description

BioFlow-Insight

License: GPL v3 | Version: 1.0 | Zenodo DOI

Description

BioFlow-Insight is a Python-based open-source command-line tool that automatically analyses Nextflow workflow code and gathers useful information, in particular visual graphs that illustrate the workflow's structure and its various steps. It can also detect certain programming errors and generate an RO-Crate JSON-LD file that describes the workflow.

BioFlow-Insight can be installed as a CLI (see Installation). It is also available as a free web service. For more information and to start using BioFlow-Insight, visit https://bioflow-insight.pasteur.cloud/.


Installation

Installing via pip

BioFlow-Insight is easily installable as a CLI.

To install it with pip, run the following command:

pip install bioflow-insight

Using from source

To access the source code, clone the GitLab repository. BioFlow-Insight is developed in Python 3.

BioFlow-Insight's dependencies are given in the requirements.txt file.

Note: on Debian/Ubuntu-based Linux distributions, you may need to install Graphviz with: sudo apt install graphviz

Usage


For an explanation of the different elements composing a Nextflow workflow, see its documentation.

BioFlow-Insight generates three graphs:

  • Specification graph: BioFlow-Insight reconstructs the workflow’s specification graph from its source code, without executing it. The specification graph is a directed graph whose nodes are processes and operations and whose edges are channels, directed from one vertex to another (the steps of the workflow are ordered). This graph represents all possible interactions between processes and operations through the channels defined in the workflow code. Within the specification graph, operations are divided into two groups: following operations, defined as operations that have at least one input, and starting operations, defined as operations without any inputs.

  • Dependency graph: From the specification graph, BioFlow-Insight also generates the dependency graph, whose nodes are the starting operations and the processes, and whose edges are their dependencies. This graph is obtained by removing the following operations and linking the remaining elements if a path exists between them in the original specification graph. In this representation, edges no longer represent interactions between elements, but dependencies between them.

  • Process dependency graph: Finally, BioFlow-Insight generates the process dependency graph, which contains only processes (nodes) and their dependencies (edges). Like the dependency graph, it is built from the specification graph, here by removing all operations and linking the remaining processes according to their dependencies in the original specification graph. Again, edges represent dependencies rather than interactions.
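The relationship between the three graphs can be sketched in a few lines of Python. This is a minimal illustration, not BioFlow-Insight's implementation: the graph, its node names and the helper functions `split_operations` and `project` are hypothetical. It also assumes that "linking the remaining elements if a path exists" means contracting the removed nodes away, i.e. u is linked to v only when every intermediate node on a path from u to v was removed.

```python
from collections import deque

# Hypothetical, simplified specification graph (illustrative names only,
# not real BioFlow-Insight output). Each edge is a channel: (source, target).
EDGES = [
    ("fromPath", "INDEX"),
    ("INDEX", "QUANT"),
    ("fromFilePairs", "QUANT"),
    ("fromFilePairs", "FASTQC"),
    ("QUANT", "mix"),
    ("FASTQC", "mix"),
    ("mix", "collect"),
    ("collect", "MULTIQC"),
]
PROCESSES = {"INDEX", "QUANT", "FASTQC", "MULTIQC"}
OPERATIONS = {"fromPath", "fromFilePairs", "mix", "collect"}


def split_operations(edges, operations):
    """Starting operations have no incoming edge; following operations have at least one."""
    has_input = {target for _, target in edges}
    starting = {op for op in operations if op not in has_input}
    return starting, operations - starting


def project(edges, keep):
    """Keep only the nodes in `keep`, linking u -> v when the original graph has
    a path from u to v whose intermediate nodes were all removed (assumption)."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
    projected = set()
    for start in keep:
        seen = set()
        queue = deque(successors.get(start, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                projected.add((start, node))  # reached a kept node: stop here
            else:
                queue.extend(successors.get(node, ()))  # walk through a removed node
    return projected


starting_ops, following_ops = split_operations(EDGES, OPERATIONS)
# Dependency graph: drop the following operations, keep processes and starting operations.
dependency_graph = project(EDGES, PROCESSES | starting_ops)
# Process dependency graph: drop every operation.
process_dependency_graph = project(EDGES, PROCESSES)
print(sorted(process_dependency_graph))
```

In this toy graph, `QUANT` and `MULTIQC` are connected only through the `mix` and `collect` operations, so the edge `QUANT -> MULTIQC` appears in both derived graphs even though no channel links them directly.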

For a more in-depth explanation of BioFlow-Insight's functionalities, visit its webpage (https://bioflow-insight.pasteur.cloud/specification/).

To illustrate how BioFlow-Insight is used, let's analyse the rnaseq-nf workflow provided by Nextflow (its source code can be found here). Example outputs are given below.

Input

In this example, we use BioFlow-Insight to analyse the rnaseq-nf workflow. After installing BioFlow-Insight via pip and cloning the rnaseq-nf repository, run the following command:

bioflow-insight rnaseq-nf/main.nf

Output

After the workflow has been analysed and the graphs generated, the outputs are saved in the results folder.

This folder is organised as follows:

.
├── debug
│   ├── calls.nf
│   ├── operations_in_call.nf
│   └── operations.nf
├── graphs
│   ├── dependency_graph.dot
│   ├── dependency_graph.json
│   ├── dependency_graph.mmd
│   ├── dependency_graph.png
│   ├── dependency_graph_wo_labels.dot
│   ├── dependency_graph_wo_labels.mmd
│   ├── dependency_graph_wo_labels.png
│   ├── dependency_graph_wo_orphan_operations.dot
│   ├── dependency_graph_wo_orphan_operations.mmd
│   ├── dependency_graph_wo_orphan_operations.png
│   ├── dependency_graph_wo_orphan_operations_wo_labels.dot
│   ├── dependency_graph_wo_orphan_operations_wo_labels.mmd
│   ├── dependency_graph_wo_orphan_operations_wo_labels.png
│   ├── metadata_dependency_graph.json
│   ├── metadata_process_dependency_graph.json
│   ├── metadata_specification_graph.json
│   ├── process_dependency_graph.dot
│   ├── process_dependency_graph.json
│   ├── process_dependency_graph.mmd
│   ├── process_dependency_graph.png
│   ├── specification_graph.dot
│   ├── specification_graph.json
│   ├── specification_graph.mmd
│   ├── specification_graph.png
│   ├── specification_graph_wo_labels.dot
│   ├── specification_graph_wo_labels.mmd
│   ├── specification_graph_wo_labels.png
│   ├── specification_wo_orphan_operations.dot
│   ├── specification_wo_orphan_operations.mmd
│   ├── specification_wo_orphan_operations.png
│   ├── specification_wo_orphan_operations_wo_labels.dot
│   ├── specification_wo_orphan_operations_wo_labels.mmd
│   └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata-rnaseq-nf.json
  • The ro-crate-metadata-rnaseq-nf.json file describes the workflow following an extended Workflow RO-Crate profile. The description of this extended profile can be found here.
  • The debug folder contains intermediate files that are useful for debugging.
  • The graphs folder contains the generated graphs. For each of the 3 graphs described above, BioFlow-Insight generates:
    • A JSON file describing the graph in a BioFlow-Insight-specific format
    • A JSON file describing the metadata extracted from the graph
    • Where possible, BioFlow-Insight also generates variants of the graphs without labels on the operations and channels. Additionally, there is a variant in which orphan operations (operations with neither inputs nor outputs) are not represented.
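The RO-Crate file is plain JSON-LD, so it can be inspected with the standard library. The sketch below assumes only the generic RO-Crate layout (an "@context" plus a flat "@graph" list of entities carrying "@id" and "@type"); it does not rely on any field of BioFlow-Insight's extended profile. The helper names `summarise_crate` and `load_crate` are hypothetical, and the `example` crate is hand-written for illustration, not real BioFlow-Insight output.

```python
import json
from collections import Counter


def summarise_crate(crate):
    """Count the entities of an RO-Crate by @type.

    Assumes the generic RO-Crate JSON-LD layout: a flat "@graph" list whose
    entities have an "@type" that is either a string or a list of strings.
    """
    counts = Counter()
    for entity in crate.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


def load_crate(path):
    """Read an RO-Crate metadata file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Minimal hand-written crate used for illustration only.
example = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork"},
        {"@id": "./", "@type": "Dataset"},
        {"@id": "main.nf", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    ],
}
print(summarise_crate(example))

# After a run (path assumed from the folder layout above):
# summarise_crate(load_crate("results/ro-crate-metadata-rnaseq-nf.json"))
```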

For each graph, BioFlow-Insight generates a Mermaid (.mmd) file and a DOT (.dot) file. If the render_graphs option is set to True, a PNG image is also generated.
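If the PNG images were not produced (for instance because render_graphs was not enabled), the .dot files can still be rendered manually with the Graphviz `dot` command. A minimal sketch, assuming Graphviz is installed and the outputs sit in results/graphs; `dot_render_command` and `render_all` are hypothetical helpers, not part of BioFlow-Insight.

```python
import shutil
import subprocess
from pathlib import Path


def dot_render_command(dot_file, fmt="png"):
    """Build the Graphviz command that renders `dot_file` to a sibling file of format `fmt`."""
    dot_file = Path(dot_file)
    output = dot_file.with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", str(dot_file), "-o", str(output)]


def render_all(graphs_dir, fmt="png"):
    """Render every .dot file in `graphs_dir`; requires the Graphviz `dot` binary on PATH."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' not found on PATH")
    for dot_file in sorted(Path(graphs_dir).glob("*.dot")):
        subprocess.run(dot_render_command(dot_file, fmt), check=True)


# Usage after a run: render_all("results/graphs")
```

Calling the `dot` executable directly keeps the sketch free of extra Python dependencies; passing `fmt="svg"` produces scalable images instead of PNGs.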

Below are some of the graphs generated by BioFlow-Insight, rendered as PNG images with Graphviz.

[Figures: Specification Graph | Dependency Graph | Process Dependency Graph]

License

This project is licensed under the GNU Affero General Public License.

Funding

This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.









Download files


Source Distribution

bioflow-insight-1.0.tar.gz (68.1 kB)

Built Distribution

bioflow_insight-1.0-py3-none-any.whl (74.4 kB, Python 3)
