Software to extract and analyze the structure and associated metadata of a Nextflow workflow.

BioFlow-Insight

License: GPL v3 Version 1.0

Description

BioFlow-Insight is a Python-based open-source command-line tool that automatically analyses Nextflow workflow code and gathers useful information about it, particularly in the form of visual graphs that illustrate the workflow's structure and its various steps. It can also detect certain programming errors, and it generates an RO-Crate JSON-LD file that describes the workflow.

BioFlow-Insight can be installed as a CLI (see the Installation section below). It is also available as a free web service. For more information and to start using BioFlow-Insight, visit https://bioflow-insight.pasteur.cloud/.

Installation

Installing via pip

To install the BioFlow-Insight CLI with pip, run:

pip install bioflow-insight
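
To confirm that the installation succeeded, the installed version can be queried from Python. This is a minimal sketch that only uses the standard library; the helper name installed_version is hypothetical and not part of BioFlow-Insight.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="bioflow-insight"):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only asks pip's metadata and
    does not import or run BioFlow-Insight itself.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "bioflow-insight is not installed")
```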

Using from source

To access the source code, clone the GitLab repository. BioFlow-Insight is developed using Python 3.

BioFlow-Insight's dependencies are given in the requirements.txt file.

Note: on Debian-based Linux distributions, you might need to install Graphviz with the command sudo apt install graphviz

Usage

For an explanation of the different elements composing a Nextflow workflow, see the Nextflow documentation.

BioFlow-Insight generates three different graphs:

  • Specification graph: BioFlow-Insight reconstructs the workflow's specification graph from its source code without having to execute it. The specification graph is a directed graph whose nodes are processes and operations, and whose edges are channels directed from one node to another (the steps of the workflow are ordered). This graph represents all the possible interactions between processes and operations through the channels defined in the workflow code. Within the specification graph, operations are categorised into two groups: following operations, which have at least one input, and starting operations, which have no inputs.

  • Dependency graph: From the specification graph, BioFlow-Insight also generates the dependency graph, which represents starting operations and processes (nodes) along with their dependencies (edges). This graph is obtained by removing the following operations and linking the remaining elements if a path exists between them in the original specification graph. In this representation, the edges no longer represent interactions between elements, but their dependencies.

  • Process dependency graph: Finally, BioFlow-Insight generates the process dependency graph, which represents only processes (nodes) and their dependencies (edges). Like the dependency graph, it is constructed by removing all operations, leaving only processes, and linking them based on their dependencies in the original specification graph. Again, the edges in this representation no longer represent interactions between elements, but their dependencies.
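
The two derivations described above can be sketched in a few lines of Python. This is an illustrative sketch, not BioFlow-Insight's actual implementation: the toy graph, the node names and the helper functions are all hypothetical. It also assumes one particular reading of "a path exists", namely that two remaining nodes are linked when a path joins them through removed nodes only, so that edges implied by transitivity are not added.

```python
from collections import deque

# Toy specification graph (hypothetical): edges are channels between nodes.
edges = [
    ("fromPath", "INDEX"),
    ("fromFilePairs", "map"),
    ("map", "QUANT"),
    ("INDEX", "QUANT"),
    ("QUANT", "collect"),
    ("collect", "MULTIQC"),
]
operations = {"fromPath", "fromFilePairs", "map", "collect"}


def classify_operations(edges, operations):
    """Split operations into starting (no inputs) and following (>= 1 input)."""
    has_input = {dst for _, dst in edges}
    starting = {op for op in operations if op not in has_input}
    return starting, set(operations) - starting


def collapse(edges, removed):
    """Drop `removed` nodes and link the kept nodes that were joined by a
    path going through removed nodes only (assumed interpretation)."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    nodes = {node for edge in edges for node in edge}
    kept = nodes - set(removed)
    result = set()
    for start in kept:
        queue = deque(successors.get(start, ()))
        seen = set()
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node in kept:
                result.add((start, node))  # stop here: do not walk past kept nodes
            else:
                queue.extend(successors.get(node, ()))
    return result


starting, following = classify_operations(edges, operations)
dependency_graph = collapse(edges, following)            # keeps starting operations
process_dependency_graph = collapse(edges, operations)   # keeps processes only

print(sorted(dependency_graph))
print(sorted(process_dependency_graph))
```

In this toy example the following operations map and collect disappear from the dependency graph, and the channels that passed through them become direct dependency edges between the remaining nodes.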

For a more in-depth explanation of BioFlow-Insight's functionalities, visit its webpage (https://bioflow-insight.pasteur.cloud/specification/).

To illustrate the use of BioFlow-Insight, let's take the rnaseq-nf workflow proposed by Nextflow (its source code is available in the rnaseq-nf repository). Examples of the output are given below.

Input

In this example, we use BioFlow-Insight to analyse the rnaseq-nf workflow. After installing BioFlow-Insight via pip and cloning the rnaseq-nf repository, run the following command:

bioflow-insight rnaseq-nf/main.nf
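
The same command can be launched from a Python script, for example to analyse several workflows in a loop. The sketch below simply wraps the command line shown above with the standard subprocess module; the helper names are hypothetical, and it assumes the bioflow-insight executable is on the PATH and that the command is invoked exactly as above, with no extra options.

```python
import shutil
import subprocess
from pathlib import Path


def analysis_command(workflow_file):
    """Build the command line shown above for a given main workflow file."""
    return ["bioflow-insight", str(workflow_file)]


def run_analysis(workflow_file):
    """Run BioFlow-Insight on a workflow; raises if the CLI is missing or fails."""
    if shutil.which("bioflow-insight") is None:
        raise RuntimeError("bioflow-insight executable not found on PATH")
    return subprocess.run(analysis_command(workflow_file), check=True)


workflow = Path("rnaseq-nf") / "main.nf"
# Only run when both the CLI and the cloned workflow are actually present.
if shutil.which("bioflow-insight") and workflow.is_file():
    run_analysis(workflow)
```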

Output

After the workflow has been analysed and the graphs generated, the outputs are saved in the results folder.

This folder is organised as follows:

.
├── debug
│   ├── calls.nf
│   ├── operations_in_call.nf
│   └── operations.nf
├── graphs
│   ├── dependency_graph.dot
│   ├── dependency_graph.json
│   ├── dependency_graph.mmd
│   ├── dependency_graph.png
│   ├── dependency_graph_wo_labels.dot
│   ├── dependency_graph_wo_labels.mmd
│   ├── dependency_graph_wo_labels.png
│   ├── dependency_graph_wo_orphan_operations.dot
│   ├── dependency_graph_wo_orphan_operations.mmd
│   ├── dependency_graph_wo_orphan_operations.png
│   ├── dependency_graph_wo_orphan_operations_wo_labels.dot
│   ├── dependency_graph_wo_orphan_operations_wo_labels.mmd
│   ├── dependency_graph_wo_orphan_operations_wo_labels.png
│   ├── metadata_dependency_graph.json
│   ├── metadata_process_dependency_graph.json
│   ├── metadata_specification_graph.json
│   ├── process_dependency_graph.dot
│   ├── process_dependency_graph.json
│   ├── process_dependency_graph.mmd
│   ├── process_dependency_graph.png
│   ├── specification_graph.dot
│   ├── specification_graph.json
│   ├── specification_graph.mmd
│   ├── specification_graph.png
│   ├── specification_graph_wo_labels.dot
│   ├── specification_graph_wo_labels.mmd
│   ├── specification_graph_wo_labels.png
│   ├── specification_wo_orphan_operations.dot
│   ├── specification_wo_orphan_operations.mmd
│   ├── specification_wo_orphan_operations.png
│   ├── specification_wo_orphan_operations_wo_labels.dot
│   ├── specification_wo_orphan_operations_wo_labels.mmd
│   └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata.json
  • The ro-crate-metadata.json file describes the workflow following an extended Workflow RO-Crate profile. The description of this extended profile can be found here.
  • The debug folder contains intermediary files that are useful for debugging.
  • The graphs folder contains the generated graphs. For each of the 3 graphs described above, BioFlow-Insight generates:
    • A JSON file that describes the graph in a BioFlow-Insight-specific format
    • A JSON file that describes the metadata extracted from the graph
    • Where possible, a variant of the graph without labels on the operations and channels. Additionally, there is a variant in which the orphan operations (operations which don't have any inputs or outputs) are not represented.
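
To get a quick overview of what was produced, the contents of the graphs folder can be grouped by file format. This is a minimal sketch using only the standard library; the helper name inventory is hypothetical, and the path results/graphs is taken from the folder structure shown above.

```python
from collections import defaultdict
from pathlib import Path


def inventory(graphs_dir):
    """Map each file extension (dot, json, mmd, png, ...) to its file names."""
    by_format = defaultdict(list)
    for path in sorted(Path(graphs_dir).iterdir()):
        if path.is_file():
            by_format[path.suffix.lstrip(".")].append(path.name)
    return dict(by_format)


graphs_dir = Path("results") / "graphs"
# Only list the folder if an analysis has already been run here.
if graphs_dir.is_dir():
    for fmt, names in inventory(graphs_dir).items():
        print(f"{fmt}: {len(names)} file(s)")
```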

BioFlow-Insight generates each graph in the Mermaid and DOT formats. If the render_graphs option is set to True, a PNG image is also generated.

Here are some of the graphs generated by BioFlow-Insight, rendered as PNG images using Graphviz.
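
If the PNG images were not generated, a DOT file can still be rendered afterwards with Graphviz's own dot command. The sketch below is independent of BioFlow-Insight: the helper names are hypothetical, and it assumes Graphviz is installed so that the dot executable is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_file, fmt="png"):
    """Build a Graphviz command that renders dot_file next to itself."""
    dot_file = Path(dot_file)
    output = dot_file.with_suffix(f".{fmt}")
    return ["dot", f"-T{fmt}", str(dot_file), "-o", str(output)]


def render(dot_file, fmt="png"):
    """Render a DOT file with Graphviz; raises if dot is missing or fails."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    subprocess.run(dot_command(dot_file, fmt), check=True)


spec = Path("results") / "graphs" / "specification_graph.dot"
# Only render when Graphviz and the DOT file are both available.
if shutil.which("dot") and spec.is_file():
    render(spec)
```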

[Figures: Specification Graph, Dependency Graph, Process Dependency Graph]

Citing BioFlow-Insight

Please cite BioFlow-Insight in any research that uses or extends BioFlow-Insight.

To cite BioFlow-Insight, please use the following publication:

George Marchment, Bryan Brancotte, Marie Schmit, Frédéric Lemoine, Sarah Cohen-Boulakia, BioFlow-Insight: facilitating reuse of Nextflow workflows with structure reconstruction and visualization, NAR Genomics and Bioinformatics, Volume 6, Issue 3, September 2024, lqae092, https://doi.org/10.1093/nargab/lqae092

License

This project is licensed under the GNU Affero General Public License.

Funding

This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.






