Multilayer networks for biological multimodal data fusion and analysis.

BioFusion

A tool for multimodal biological data integration and analysis with the help of multilayer networks.

This repository contains code developed during a collaboration between Fujitsu Research of Europe and Barcelona Supercomputing Center.

Installation

You can install the package from PyPI:

pip install biofusion

For developers, to install the latest version of the package in editable mode, run the following command from the package root directory:

pip install -e .

End-to-end example

1. Set up the project

1.1. Install the uv package manager

Follow the official uv installation instructions. For Linux/macOS the command is:

curl -LsSf https://astral.sh/uv/install.sh | sh

1.2. Create the project directory and the corresponding Python environment

mkdir biofusion-demo
cd biofusion-demo
uv venv --python=3.12.9

The last command creates a .venv folder with a local Python environment. Let’s activate it:

source .venv/bin/activate

Let’s install the biofusion package:

uv pip install biofusion

2. Create the data files

2.1. Create the data folder

mkdir data

2.2. Populate the data folder

In the root of the project, create a notebook (e.g. 01_demo.ipynb). Open the notebook in your favorite IDE (e.g. VS Code) and select the Jupyter kernel from the environment created above. We are now ready to generate some synthetic data for testing the community detection algorithms. In the notebook, enter and run the following cells:

from BioFusion.utils import generate_and_save_graphs
# Each layer/graph is described by a tuple of parameters:
# (number of unique nodes, probability of an edge between two random nodes, label string).
graph_params = [(300, 0.2, ""), (500, 0.2, ""), (400, 0.2, ""), (300, 0.4, "")]
# All generated graphs are stored in the directory below as `1.csv`, ..., `<N>.csv`,
# where <N> is the number of tuples in the list `graph_params`.
path_dir_to = "./data/"
generate_and_save_graphs(graph_params, path_dir_to)
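To make the parameter tuples concrete, here is a minimal standard-library sketch of what generating such layers could look like. It assumes each layer is an Erdős–Rényi style random graph (every node pair is linked independently with the given probability) written as an edge-list CSV with a `source,target` header. The helper names `random_edge_list` and `save_layers`, the CSV layout, and the seeding are illustrative assumptions and may differ from what `generate_and_save_graphs` actually does.

```python
import csv
import itertools
import random
from pathlib import Path


def random_edge_list(n_nodes, edge_prob, seed=None):
    """Return the edges of a G(n, p) random graph as (source, target) pairs."""
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in itertools.combinations(range(n_nodes), 2)
        if rng.random() < edge_prob  # keep each candidate edge with probability p
    ]


def save_layers(graph_params, out_dir, seed=0):
    """Write one edge-list CSV per parameter tuple: 1.csv, 2.csv, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (n_nodes, edge_prob, _label) in enumerate(graph_params, start=1):
        path = out_dir / f"{index}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["source", "target"])  # assumed header, not BioFusion's
            writer.writerows(random_edge_list(n_nodes, edge_prob, seed + index))
        paths.append(path)
    return paths
```

Calling `save_layers([(300, 0.2, ""), (500, 0.2, "")], "./data/")` would write `1.csv` and `2.csv` under this assumed layout.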

2.3. Create the output folder

Create a folder to store the results of the analysis:

mkdir out

After running the commands in this section, the project will contain the following files:

biofusion-demo$ tree
.
├── 01_demo.ipynb
├── data
│   ├── 1.csv
│   ├── 2.csv
│   ├── 3.csv
│   └── 4.csv
└── out

3. Run community detection

Import required dependencies:

import os
from BioFusion.cmmd import cmmd

Define the layers of the multilayer network:

prefix = "./data/"
input_layers = [prefix + x for x in os.listdir(prefix) if x.endswith(".csv")]
# sort the input layers: os.listdir() returns entries in arbitrary order
input_layers.sort()
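Note that a plain `sort()` orders file names lexicographically, so with ten or more layers `10.csv` would come before `2.csv`. If the layer order matters for your analysis, a numeric-aware sort key is one option. The sketch below uses a hypothetical helper, `natural_key`, which is not part of BioFusion:

```python
import re


def natural_key(path):
    """Split a path into text and integer chunks so that 2 sorts before 10."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", path)
    ]


layers = ["./data/10.csv", "./data/2.csv", "./data/1.csv"]
print(sorted(layers))                   # lexicographic order: 1, 10, 2
print(sorted(layers, key=natural_key))  # numeric-aware order: 1, 2, 10
```

With only four layers, as in this example, both orderings coincide.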

Define parameters of the community detection algorithm:

gamma_min = 0
gamma_max = 10
gamma_step = 0.5
path_to_communities = "./out/"
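To see how many gamma values these settings imply, the following sketch enumerates a regular grid from `gamma_min` to `gamma_max`. It assumes that `cmmd` scans gamma in equal steps; whether the upper bound is included is not stated here, so the hypothetical helper `gamma_grid` exposes that choice as a flag.

```python
def gamma_grid(gamma_min, gamma_max, gamma_step, include_max=True):
    """Return evenly spaced gamma values, computed by index to limit float drift."""
    n_steps = round((gamma_max - gamma_min) / gamma_step)
    count = n_steps + 1 if include_max else n_steps
    return [gamma_min + i * gamma_step for i in range(count)]


grid = gamma_grid(0, 10, 0.5)
print(len(grid), grid[0], grid[-1])
```

Under these assumptions the settings above give 21 values (20 if the upper bound is excluded), which gives a rough idea of how many community detection runs to expect.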

Run the community detection algorithm:

cmmd_output = cmmd(
    nodelist = None,
    input_layers = input_layers,
    gamma_min = gamma_min,
    gamma_max = gamma_max,
    gamma_step = gamma_step,
    path_to_communities = path_to_communities,
    distmethod = "hamming")

The output of the algorithm is stored in the ./out folder.
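As an illustration of the `distmethod = "hamming"` option, the sketch below computes a pairwise Hamming distance matrix between nodes. It assumes each node is described by its sequence of community labels, one per gamma value. The trajectories and node names are made up for the example, and the exact computation inside `cmmd` may differ.

```python
def hamming_distance(labels_a, labels_b):
    """Count positions at which two equal-length label sequences differ."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    return sum(a != b for a, b in zip(labels_a, labels_b))


# Hypothetical trajectories: one community label per gamma value for each node.
trajectories = {
    "geneA": [0, 0, 1, 2],
    "geneB": [0, 0, 1, 3],
    "geneC": [0, 1, 4, 5],
}

nodes = sorted(trajectories)
distance_matrix = [
    [hamming_distance(trajectories[u], trajectories[v]) for v in nodes]
    for u in nodes
]

for node, row in zip(nodes, distance_matrix):
    print(node, row)
```

Nodes that stay in the same community across most gamma values end up with a small distance, which is what makes the matrix usable for downstream clustering.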

The whole script:

import os
from BioFusion.utils import generate_and_save_graphs
from BioFusion.cmmd import cmmd

graph_params = [(300, 0.2, ""), (500, 0.2, ""), (400, 0.2, ""), (300, 0.4, "")]

path_dir_to = "./data/"
generate_and_save_graphs(graph_params, path_dir_to)
prefix = "./data/"

input_layers = [prefix + x for x in os.listdir(prefix) if x.endswith(".csv")]
input_layers.sort()

gamma_min = 0
gamma_max = 10
gamma_step = 0.5

path_to_communities = "./out/"

cmmd_output = cmmd(
    nodelist = None,
    input_layers = input_layers,
    gamma_min = gamma_min,
    gamma_max = gamma_max,
    gamma_step = gamma_step,
    path_to_communities = path_to_communities,
    distmethod = "hamming")

Organisation

The directory structure is as follows:

.
|-- data
|   |-- GeneCelltypes
|   |   |-- gene_celltypes_all_common.txt
|   |   |-- gene_celltypes_all_common_cnv.txt
|   |   |-- gene_celltypes_all_common_rna.txt
|   |   |-- gene_celltypes_all_unique.txt
|   |   |-- gene_celltypes_all_unique_cnv.txt
|   |   `-- gene_celltypes_all_unique_rna.txt
|   |-- MultilayerCommunities
|   |   |-- <BSC-community-trajectories.tsv>
|   |   `-- <BSC-distance-matrix.tsv>
|   |-- MultilayerGraphs
|   |   |-- <BSC-MLN-layer-1.json>
|   |   |-- :
|   |   `-- <BSC-MLN-layer-5.json>
|   |-- TCGA_BRCA_Dic_Hover_files
|   |   `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d.pt
|   |-- TopGenesWSI
|   |   |-- common_genes
|   |   |   |-- box_level
|   |   |   |   `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d
|   |   |   |       `-- stats.csv
|   |   |   `-- wsi_level
|   |   `-- unique_genes
|   |       |-- box_level
|   |       `-- wsi_level
|   |-- cnv.csv
|   `-- rna.csv
|-- outputs
|   |-- TCGA_BRCA_spatial
|   |-- TCGA_Gene_Graphs
|   `-- TopGenesMLN
|-- scripts
|   |-- create_gene_graph.py
|   |-- create_gene_list.py
|   |-- get_WSI_celltype_weights.py
|   `-- get_WSI_gene_info.py
|-- README.md
`-- requirements.txt

Usage

The Python scripts can be run from the /scripts directory after installing all necessary Python modules as listed in requirements.txt.

The following scripts are provided:

create_gene_list.py
- Description: This script finds the set of genes that are common between the MLN and the genomic data (CNV or RNA). Files in /data/GeneCelltypes with the suffix “_cnv” or “_rna” are generated by this script.
- Input: /data/GeneCelltypes, /data/cnv.csv
- Output: /data/GeneCelltypes

get_WSI_gene_info.py
- Description: This script/module reads the top genes from WSI patches and retrieves gene associations and significant neighbourhood communities from the multilayer network.
- Input: /data/TopGenesWSI
- Output: /outputs/TopGenesMLN

get_WSI_celltype_weights.py
- Description: This script takes WSI graphs (where patches correspond to groups of nodes), gene-celltype associations, and bulk RNA data, and produces heatmaps of approximated spatial gene expression.
- Input: /data/TCGA_BRCA_Dic_Hover_files, /data/GeneCelltypes, /data/rna.csv
- Output: /outputs/TCGA_BRCA_spatial

create_gene_graph.py
- Description: This script takes the genomic data (CNV or RNA) and the MLN graphs (along with the computed Louvain-community-based Hamming distance matrix) and generates a hierarchical-clustering-based similarity matrix for the genes, plus a gene graph whose edge attributes reflect the gene-gene similarities.
- Input: /data/cnv.csv, /data/MultilayerGraphs, /data/MultilayerCommunities
- Output: /outputs/TCGA_Gene_Graphs
