Multilayer networks for biological multimodal data fusion and analysis.
BioFusion
A tool for multimodal biological data integration and analysis with the help of multilayer networks.
This repository contains code developed during a collaboration between Fujitsu Research of Europe and Barcelona Supercomputing Center.
Installation
You can install the package from PyPI:
pip install biofusion
For developers: to install the latest version of the package in editable mode, run the following command from the package root directory:
pip install -e .
End-to-end example
1. Set up the project
1.1. Install uv package manager
Follow the official uv installation instructions. On Linux/macOS the command is:
curl -LsSf https://astral.sh/uv/install.sh | sh
1.2. Create project dir and corresponding Python environment
mkdir biofusion-demo
cd biofusion-demo
uv venv --python=3.12.9
The last command creates a .venv folder containing a local Python
environment. Activate it:
source .venv/bin/activate
Install the biofusion package:
uv pip install biofusion
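Optionally, you can confirm from Python that the virtual environment is active and that biofusion is installed. This sketch uses only the standard library (`sys`, `importlib.metadata`); the helper names are hypothetical, and it assumes the distribution name is `biofusion`, as in the install command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str = "biofusion"):
    # Return the installed version string, or None if the distribution is missing.
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("biofusion version:", installed_version())
```

If `in_virtualenv()` returns False, the `.venv` environment was probably not activated in the shell (or kernel) you are using.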
2. Create the data files
2.1. Create the data folder
mkdir data
2.2. Populate the data folder
In the root of the project, create a notebook (e.g. 01_demo.ipynb).
Open the notebook in your favorite IDE (e.g. VS Code) and select the Jupyter
kernel from the environment created above. You are now ready to generate
some synthetic data to test the community detection algorithms. Enter and
run the following cells in the notebook:
from BioFusion.utils import generate_and_save_graphs
# Each layer/graph is described by a tuple of parameters: the first element is
# the number of unique nodes, the second is the probability of an edge between
# two random nodes, and the third is a label string.
graph_params = [(300, 0.2, ""), (500, 0.2, ""), (400, 0.2, ""), (300, 0.4, "")]
# All generated graphs are stored in the directory below as `1.csv`, ..., `<N>.csv`,
# where <N> is the number of tuples in `graph_params`.
path_dir_to = "./data/"
generate_and_save_graphs(graph_params, path_dir_to)
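To make the meaning of each `(n, p, label)` tuple concrete, here is a hypothetical, standard-library-only sketch of a random layer of that kind: an Erdős–Rényi G(n, p) graph written as a two-column CSV edge list. The names `er_edges` and `save_layer` and the CSV layout (a `source,target` header) are assumptions for illustration; they are not the actual implementation or file format of `generate_and_save_graphs`.

```python
import csv
import itertools
import random


def er_edges(n: int, p: float, seed=None):
    """Hypothetical G(n, p) generator: each of the n*(n-1)/2 node pairs
    is connected independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]


def save_layer(edges, path: str) -> None:
    # Assumed layout: a header line, then one edge per row.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])
        writer.writerows(edges)


if __name__ == "__main__":
    edges = er_edges(300, 0.2, seed=0)
    # The expected edge count is p * n * (n - 1) / 2 = 0.2 * 300 * 299 / 2 = 8970.
    print(len(edges))
```

With these parameters each layer is fairly dense, which is worth keeping in mind when choosing `p` for larger `n`.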
2.3. Create the output folder
Folder to store the results of the analysis:
mkdir out
After running the commands in this section, the project contains the following files:
biofusion-demo$ tree
.
├── 01_demo.ipynb
├── data
│ ├── 1.csv
│ ├── 2.csv
│ ├── 3.csv
│ └── 4.csv
└── out
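If you prefer to check this layout from the notebook, a small standard-library helper can confirm that `data` holds one CSV per tuple in `graph_params` and that `out` exists. The helper name `check_layout` is hypothetical; it assumes the `1.csv`, ..., `<N>.csv` naming described in the data-generation cell.

```python
from pathlib import Path


def check_layout(root=".", n_layers: int = 4) -> bool:
    """Return True if <root>/data contains 1.csv ... <n_layers>.csv and <root>/out is a directory."""
    root = Path(root)
    expected = {f"{i}.csv" for i in range(1, n_layers + 1)}
    # glob() on a missing directory simply yields nothing.
    found = {p.name for p in (root / "data").glob("*.csv")}
    return expected <= found and (root / "out").is_dir()


if __name__ == "__main__":
    print(check_layout(".", n_layers=4))
```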
3. Run community detection
Import required dependencies:
import os
from BioFusion.cmmd import cmmd
Define the layers of the multilayer network:
prefix = "./data/"
input_layers = [prefix + x for x in os.listdir(prefix) if x.endswith(".csv")]
# sort the input layers: os.listdir returns the files in arbitrary order
input_layers.sort()
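`sort()` orders the paths lexicographically, which is correct for `1.csv` to `4.csv` but would place `10.csv` before `2.csv` once there are ten or more layers. If the layer order matters for your analysis (an assumption; the steps above do not say), a natural sort on the file name avoids this. `natural_key` is a hypothetical helper:

```python
import os
import re


def natural_key(path: str):
    # Split the file name into alternating text and digit runs, converting the
    # digit runs to int, so "10.csv" sorts after "2.csv".
    name = os.path.basename(path)
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


layers = ["./data/10.csv", "./data/2.csv", "./data/1.csv"]
layers.sort(key=natural_key)
print(layers)  # ['./data/1.csv', './data/2.csv', './data/10.csv']
```

Because `re.split` with a capturing group always alternates text and digit runs, keys are compared position by position as str with str and int with int.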
Define parameters of the community detection algorithm:
gamma_min = 0
gamma_max = 10
gamma_step = 0.5
path_to_communities = "./out/"
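The three `gamma_*` values define a sweep over the resolution parameter. The exact grid that `cmmd` builds is not documented here; assuming it runs from `gamma_min` to `gamma_max` inclusive in steps of `gamma_step`, the values can be listed with the hypothetical helper below. Computing each value as `gamma_min + i * gamma_step`, rather than repeatedly adding the step, avoids accumulating floating-point error.

```python
import math


def gamma_grid(gamma_min: float, gamma_max: float, gamma_step: float) -> list:
    """Hypothetical inclusive sweep: gamma_min, gamma_min + step, ..., up to gamma_max."""
    # Number of whole steps that fit in [gamma_min, gamma_max]; the small epsilon
    # guards against results like 19.999999999 from floating-point division.
    n_steps = math.floor((gamma_max - gamma_min) / gamma_step + 1e-9)
    # Multiply rather than accumulate, so rounding error does not build up.
    return [gamma_min + i * gamma_step for i in range(n_steps + 1)]


grid = gamma_grid(0, 10, 0.5)
print(len(grid), grid[:3], grid[-1])  # 21 [0.0, 0.5, 1.0] 10.0
```

Under this assumption the settings above correspond to 21 resolution values, so halving `gamma_step` roughly doubles the number of community detection runs.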
Run the community detection algorithm:
cmmd_output = cmmd(
nodelist = None,
input_layers = input_layers,
gamma_min = gamma_min,
gamma_max = gamma_max,
gamma_step = gamma_step,
path_to_communities = path_to_communities,
distmethod = "hamming")
The output of the algorithm is stored in the ./out folder.
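`distmethod = "hamming"` selects a Hamming distance. One common reading, assumed here rather than taken from the BioFusion source, is that each node has a community trajectory (its community label at every gamma value in the sweep), and two nodes are compared by the fraction of gamma values at which they fall in different communities. The sketch below shows that computation on made-up labels; it is not the `cmmd` implementation.

```python
from itertools import combinations


def hamming(traj_a, traj_b) -> float:
    """Fraction of gamma values at which two nodes fall in different communities."""
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must have the same length")
    return sum(a != b for a, b in zip(traj_a, traj_b)) / len(traj_a)


def distance_matrix(trajectories: dict) -> dict:
    """Pairwise Hamming distances for a {node: [label at each gamma]} mapping."""
    nodes = sorted(trajectories)
    dist = {(n, n): 0.0 for n in nodes}
    for u, v in combinations(nodes, 2):
        d = hamming(trajectories[u], trajectories[v])
        dist[(u, v)] = dist[(v, u)] = d
    return dist


# Made-up trajectories over four gamma values (not real output).
toy = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 2], "C": [0, 1, 2, 3]}
D = distance_matrix(toy)
print(D[("A", "B")], D[("A", "C")])  # 0.25 0.75
```

Only whether two labels match at the same gamma matters, so arbitrary community numbering between runs does not affect the distance.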
The whole script:
import os
from BioFusion.utils import generate_and_save_graphs
from BioFusion.cmmd import cmmd
graph_params = [(300, 0.2, ""), (500, 0.2, ""), (400, 0.2, ""), (300, 0.4, "")]
path_dir_to = "./data/"
generate_and_save_graphs(graph_params, path_dir_to)
prefix = "./data/"
input_layers = [prefix + x for x in os.listdir(prefix) if x.endswith(".csv")]
input_layers.sort()
gamma_min = 0
gamma_max = 10
gamma_step = 0.5
path_to_communities = "./out/"
cmmd_output = cmmd(
nodelist = None,
input_layers = input_layers,
gamma_min = gamma_min,
gamma_max = gamma_max,
gamma_step = gamma_step,
path_to_communities = path_to_communities,
distmethod = "hamming")
Organisation
The directory structure is as follows:
.
|-- data
| |-- GeneCelltypes
| | |-- gene_celltypes_all_common.txt
| | |-- gene_celltypes_all_common_cnv.txt
| | |-- gene_celltypes_all_common_rna.txt
| | |-- gene_celltypes_all_unique.txt
| | |-- gene_celltypes_all_unique_cnv.txt
| | `-- gene_celltypes_all_unique_rna.txt
| |-- MultilayerCommunities
| | |-- <BSC-community-trajectories.tsv>
| | `-- <BSC-distance-matrix.tsv>
| |-- MultilayerGraphs
| | |-- <BSC-MLN-layer-1.json>
| | |-- :
| | `-- <BSC-MLN-layer-5.json>
| |-- TCGA_BRCA_Dic_Hover_files
| | `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d.pt
| |-- TopGenesWSI
| | |-- common_genes
| | | |-- box_level
| | | | `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d
| | | | `-- stats.csv
| | | `-- wsi_level
| | `-- unique_genes
| | |-- box_level
| | `-- wsi_level
| |-- cnv.csv
| `-- rna.csv
|-- outputs
| |-- TCGA_BRCA_spatial
| |-- TCGA_Gene_Graphs
| `-- TopGenesMLN
|-- scripts
| |-- create_gene_graph.py
| |-- create_gene_list.py
| |-- get_WSI_celltype_weights.py
| `-- get_WSI_gene_info.py
|-- README.md
`-- requirements.txt
Usage
The Python scripts can be run from the /scripts directory after
installing all necessary Python modules as listed in requirements.txt.
The following scripts are provided:
create_gene_list.py
- Description: This script finds the set of genes that are common between the MLN and the genomic data (CNV or RNA). The files in /data/GeneCelltypes with the suffixes “_cnv” and “_rna” are generated by this script.
- Input: /data/GeneCelltypes, /data/cnv.csv
- Output: /data/GeneCelltypes

get_WSI_gene_info.py
- Description: This script/module reads the top genes from WSI patches and retrieves gene associations and significant neighbourhood communities from the multilayer network.
- Input: /data/TopGenesWSI
- Output: /outputs/TopGenesMLN

get_WSI_celltype_weights.py
- Description: This script takes WSI graphs (where patches correspond to groups of nodes), gene-celltype associations, and bulk RNA data, and produces heatmaps of approximated spatial gene expression.
- Input: /data/TCGA_BRCA_Dic_Hover_files, /data/GeneCelltypes, /data/rna.csv
- Output: /outputs/TCGA_BRCA_spatial

create_gene_graph.py
- Description: This script takes the genomic data (CNV or RNA) and the MLN graphs (along with the precomputed Louvain-community-based Hamming distance matrix) and generates a hierarchical-clustering-based similarity matrix for the genes, plus a gene graph whose edge attributes reflect gene-gene similarities.
- Input: /data/cnv.csv, /data/MultilayerGraphs, /data/MultilayerCommunities
- Output: /outputs/TCGA_Gene_Graphs
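create_gene_graph.py turns gene-gene (dis)similarities into a graph with similarity-valued edge attributes. The snippet below is a simplified, hypothetical stand-in for that last step only: it converts distances in [0, 1] to similarities with `1 - d` and keeps pairs whose similarity reaches a threshold. The conversion, the threshold, and the toy gene names and values are assumptions; the hierarchical clustering the actual script performs is not reproduced here.

```python
def similarity_edges(genes, dist, threshold=0.5):
    """Hypothetical: return (gene_a, gene_b, {"similarity": s}) for pairs with s >= threshold.

    genes: list of gene names; dist: square list-of-lists of distances in [0, 1].
    """
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):  # upper triangle only; dist is symmetric
            s = 1.0 - dist[i][j]
            if s >= threshold:
                edges.append((genes[i], genes[j], {"similarity": s}))
    return edges


# Toy values for illustration only.
genes = ["G1", "G2", "G3"]
dist = [[0.0, 0.2, 0.9],
        [0.2, 0.0, 0.6],
        [0.9, 0.6, 0.0]]
print(similarity_edges(genes, dist))
```

The `(u, v, attr_dict)` tuples are the form accepted by networkx's `Graph.add_edges_from`, so the result can be loaded into a graph object directly if networkx is available.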
File details
Details for the file biofusion-0.0.6.tar.gz.
File metadata
- Download URL: biofusion-0.0.6.tar.gz
- Upload date:
- Size: 113.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | abae6e55901b0ff8be624a3cf982f178d185705fe0771c1709ae1d581c495b1c |
| MD5 | 1edbbacda2d6dc49120021c7ffe5dff7 |
| BLAKE2b-256 | 62746a368b99ded0b779b7745f088407d9c49a0f53c14c2b98f151b0da19685c |
File details
Details for the file biofusion-0.0.6-py3-none-any.whl.
File metadata
- Download URL: biofusion-0.0.6-py3-none-any.whl
- Upload date:
- Size: 113.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b9141d35b04626e6a95e4337ee2ea030c2674ae082115ac8a937f2585c00f54 |
| MD5 | 172f850ace219884687e8e068b667035 |
| BLAKE2b-256 | 316382e052383373f5b1f6d5c3b9e65bb044a9e961125e2f4ef4d16fd6c87a90 |
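To check a downloaded distribution against the digests above, you can compute its SHA256 with Python's standard `hashlib` module and compare. `sha256_of` is a hypothetical helper name; the file path in the example assumes the source distribution was downloaded into the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    # Hash the file in chunks so large archives need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digest listed above for biofusion-0.0.6.tar.gz.
EXPECTED_SDIST = "abae6e55901b0ff8be624a3cf982f178d185705fe0771c1709ae1d581c495b1c"

if __name__ == "__main__":
    # Assumes the archive is in the current directory.
    print(sha256_of("biofusion-0.0.6.tar.gz") == EXPECTED_SDIST)
```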