A deep-learning based multi-omics bulk sequencing data integration suite with a focus on (pre-)clinical endpoint prediction.
flexynesis
A deep-learning based multi-omics bulk sequencing data integration suite with a focus on (pre-)clinical endpoint prediction. The package includes multiple types of deep learning architectures, such as simple fully connected networks and supervised variational autoencoders, offers different options for data layer fusion, and automates feature selection and hyperparameter optimisation. The tools are continuously benchmarked on publicly available datasets, mostly related to the study of cancer. Applications of the methods we develop include drug response modeling in cancer patients or preclinical models (such as cell lines and patient-derived xenografts), cancer subtype prediction, and any other clinically relevant outcome prediction that can be formulated as a regression or classification problem.
Documentation
Detailed documentation of the classes and functions in this repository can be found here.
Benchmarks
For the latest benchmark results see: https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis-benchmark-datasets/dashboard.html
The code for the benchmarking pipeline is at: https://github.com/BIMSBbioinfo/flexynesis-benchmarks
Quick Start
Install
# create an environment with python 3.11
conda create --name flexynesisenv python==3.11
conda activate flexynesisenv
# install latest version from pypi (https://pypi.org/project/flexynesis)
# make sure to use python3.11*
python -m pip install flexynesis --upgrade
Options
For a full set of command-line options:
flexynesis -h
Test the installation
Download a dataset and test the flexynesis installation on a test run.
curl -L -o dataset1.tgz \
https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis-benchmark-datasets/dataset1.tgz
tar -xzvf dataset1.tgz
conda activate flexynesisenv
flexynesis --data_path dataset1 \
--model_class DirectPred \
--target_variables Erlotinib \
--fusion_type early \
--hpo_iter 1 \
--features_min 50 \
--features_top_percentile 5 \
--log_transform False \
--data_types gex,cnv \
--outdir . \
--prefix erlotinib_direct \
--early_stop_patience 3 \
--use_loss_weighting False \
--evaluate_baseline_performance
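The same test run can also be scripted from Python, which may be convenient when sweeping over targets or model classes. The sketch below only assembles the argument list from the flags shown above and prints it. The helper name `build_flexynesis_command` is hypothetical and not part of flexynesis, and executing the result assumes the `flexynesis` executable is on the PATH.

```python
import shlex


def build_flexynesis_command(options, flags=()):
    """Assemble a flexynesis CLI call as an argument list.

    `options` maps option names (without leading dashes) to values;
    `flags` lists switches that take no value. Illustrative helper only.
    """
    cmd = ["flexynesis"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += [f"--{flag}" for flag in flags]
    return cmd


# Options copied from the test run above.
test_run = build_flexynesis_command(
    {
        "data_path": "dataset1",
        "model_class": "DirectPred",
        "target_variables": "Erlotinib",
        "fusion_type": "early",
        "hpo_iter": 1,
        "features_min": 50,
        "features_top_percentile": 5,
        "log_transform": False,
        "data_types": "gex,cnv",
        "outdir": ".",
        "prefix": "erlotinib_direct",
        "early_stop_patience": 3,
        "use_loss_weighting": False,
    },
    flags=["evaluate_baseline_performance"],
)

# Print the command for inspection instead of running it.
print(shlex.join(test_run))
# To execute it: import subprocess; subprocess.run(test_run, check=True)
```

Keeping the options in a dictionary makes it easy to change a single value, for example the target variable, between runs.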
Accelerating with GPUs
If you have access to GPUs on your system, they can be used to accelerate the training of models.
However, making GPUs accessible to torch is system-specific. Please contact your system administrator to make sure you have accessible GPUs and methods to access them.
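Before launching a long run, it can help to check what torch itself reports. The following is a minimal sketch using `torch.cuda.is_available()` and `torch.cuda.device_count()`. The function name `describe_gpu_access` is made up for this example, torch is assumed to be installed alongside flexynesis, and the check says nothing about how flexynesis selects devices internally.

```python
def describe_gpu_access():
    """Return a short message describing whether torch can see a CUDA GPU."""
    try:
        import torch  # assumed to be installed alongside flexynesis
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        return f"torch sees {torch.cuda.device_count()} CUDA device(s)"
    return "torch is installed but no CUDA device is visible"


print(describe_gpu_access())
```

Run this inside the same environment and, on a cluster, inside the same job allocation that will run flexynesis, since GPU visibility can differ between login and compute nodes.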
With Slurm
If your system runs the Slurm Workload Manager, you can call flexynesis as follows:
conda activate flexynesisenv
srun --gpus=1 --pty flexynesis --use_gpu ...otherarguments
With GridEngine
If you have an HPC system running GridEngine with GPU nodes, you may be allowed to request a node with GPUs. The important thing here is to request a GPU node with the proper CUDA version installed on it.
# request 1 GPU device node with CUDA version 12
qrsh -l gpu=1,cuda12
# activate your environment
conda activate flexynesisenv
flexynesis --use_gpu ...otherarguments
Input Dataset Structure
InputFolder/
|-- train
|   |-- omics1.csv
|   |-- omics2.csv
|   |-- ...
|   |-- clin.csv
|-- test
|   |-- omics1.csv
|   |-- omics2.csv
|   |-- ...
|   |-- clin.csv
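A quick way to catch a misplaced file before starting a run is to check the folder against the layout above. The sketch below is an assumption-laden illustration, not part of flexynesis: `check_layout` is a hypothetical helper, and it treats every `.csv` file other than `clin.csv` as an omics table.

```python
from pathlib import Path
import tempfile


def check_layout(root):
    """Return a list of problems found in a flexynesis-style input folder."""
    problems = []
    root = Path(root)
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}")
            continue
        if not (split_dir / "clin.csv").is_file():
            problems.append(f"missing file: {split}/clin.csv")
        # Every CSV other than clin.csv is treated as an omics table.
        omics = [p for p in split_dir.glob("*.csv") if p.name != "clin.csv"]
        if not omics:
            problems.append(f"no omics tables in: {split}")
    return problems


# Demo on a throwaway folder that lacks test/clin.csv.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("train/clin.csv", "train/omics1.csv", "test/omics1.csv"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    demo_problems = check_layout(tmp)

print(demo_problems)  # ['missing file: test/clin.csv']
```

An empty list means the folder has both splits, each with a `clin.csv` and at least one omics table.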
File contents
clin.csv
clin.csv contains the sample metadata. The first column contains unique sample identifiers.
The other columns contain sample-associated clinical variables.
NA values are allowed in the clinical variables.
v1,v2
s1,a,b
s2,c,d
s3,e,f
omics.csv
The first column of the feature tables must be unique feature identifiers (e.g. gene names).
The column names must be sample identifiers that should overlap with those in the clin.csv.
They don't have to be completely identical or in the same order. Samples from the clin.csv that are not represented in the omics table will be dropped.
s1,s2,s3
g1,0,1,2
g2,3,3,5
g3,2,3,4
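To see which samples would survive the matching rule described above, the sample identifiers of both files can be compared directly. This is a minimal sketch using only the standard library and is not how flexynesis itself reads the files. The toy tables mirror the examples above, except that an extra sample `s4` has been added to the clinical table for illustration so that one sample gets dropped.

```python
import csv
import io

# Toy tables in the format shown above; s4 is added for illustration only.
CLIN = "v1,v2\ns1,a,b\ns2,c,d\ns3,e,f\ns4,g,h\n"
OMICS = "s1,s2,s3\ng1,0,1,2\ng2,3,3,5\ng3,2,3,4\n"


def clin_samples(text):
    """Sample IDs are the first field of every data row in clin.csv."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row[0] for row in rows[1:] if row]


def omics_samples(text):
    """Sample IDs are the column names of an omics table."""
    rows = list(csv.reader(io.StringIO(text)))
    header, first = rows[0], rows[1]
    # If the header also labels the feature-ID column, skip that label.
    return header[1:] if len(header) == len(first) else header


clin_ids = clin_samples(CLIN)
omics_ids = set(omics_samples(OMICS))
shared = [s for s in clin_ids if s in omics_ids]
dropped = [s for s in clin_ids if s not in omics_ids]

print("shared:", shared)    # shared: ['s1', 's2', 's3']
print("dropped:", dropped)  # dropped: ['s4']
```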
Concordance between train/test splits
The corresponding omics files in train/test splits must contain overlapping feature names (they don't have to be identical or in the same order).
The clin.csv files in train/test must contain matching clinical variables.
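Both requirements can be checked up front with a few set operations. The sketch below works on in-memory toy tables whose contents are invented for illustration. The helper names are hypothetical, and the check only approximates the stated rules rather than reproducing any flexynesis internals.

```python
import csv
import io


def feature_ids(omics_text):
    """Feature IDs are the first field of every data row in an omics table."""
    rows = list(csv.reader(io.StringIO(omics_text)))
    return {row[0] for row in rows[1:] if row}


def clinical_variables(clin_text):
    """Clinical variable names are the column names of clin.csv."""
    rows = list(csv.reader(io.StringIO(clin_text)))
    header, first = rows[0], rows[1]
    # If the header also labels the sample-ID column, skip that label.
    names = header[1:] if len(header) == len(first) else header
    return set(names)


# Invented toy data: the splits share features g2 and g3 only.
TRAIN_OMICS = "s1,s2\ng1,0,1\ng2,3,3\ng3,2,3\n"
TEST_OMICS = "s7,s8\ng3,1,1\ng2,0,4\ng9,5,5\n"
TRAIN_CLIN = "v1,v2\ns1,a,b\ns2,c,d\n"
TEST_CLIN = "v1,v2\ns7,e,f\ns8,g,h\n"

shared_features = feature_ids(TRAIN_OMICS) & feature_ids(TEST_OMICS)
variables_match = clinical_variables(TRAIN_CLIN) == clinical_variables(TEST_CLIN)

print(sorted(shared_features))  # ['g2', 'g3']
print(variables_match)          # True
```

An empty `shared_features` set or a `False` value for `variables_match` would indicate that the two splits cannot be used together as described.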
Guix
You can also create a reproducible development environment or build a reproducible package of Flexynesis with GNU Guix. You will need at least the Guix channels listed in channels.scm. It also helps to have authorized the Inria substitute server to get binaries for CUDA-enabled packages. See this page for instructions on how to configure fetching binary substitutes from the build servers.
You can build a Guix package from the current committed state of your git checkout, using the Guix state pinned in channels.scm, like this:
guix time-machine -C channels.scm -- \
build --no-grafts -f guix.scm
To enter an environment containing just Flexynesis:
guix time-machine -C channels.scm -- \
shell --no-grafts -f guix.scm
To enter a development environment to hack on Flexynesis:
guix time-machine -C channels.scm -- \
shell --no-grafts -Df guix.scm
Do this to build a Docker image containing this package together with a matching Python installation:
guix time-machine -C channels.scm -- \
pack -C none \
-e '(load "guix.scm")' \
-f docker \
-S /bin=bin -S /lib=lib -S /share=share \
glibc-locales coreutils bash python
Defining Kernel for Jupyter Notebook
To use flexynesis interactively in Jupyter notebooks, define a kernel that makes flexynesis and its dependencies available in the Jupyter session.
Assuming you have already defined an environment and installed the package:
conda activate flexynesisenv
python -m ipykernel install --user --name "flexynesisenv" --display-name "flexynesisenv"
Compiling Notebooks
papermill can be used to compile the tutorials under examples/tutorials.
To quickly check whether a notebook runs, set HPO_ITER to 1, which limits hyperparameter optimisation to a single step. For longer training runs that give more meaningful results, increase this number to e.g. 50.
Example:
papermill examples/tutorials/brca_subtypes.ipynb brca_subtypes.ipynb -p HPO_ITER 1
The output from papermill can be converted to an html file as follows:
jupyter nbconvert --to html brca_subtypes.ipynb
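The papermill step can also be driven from Python through `papermill.execute_notebook`, which accepts the notebook parameters as a dictionary. The sketch below is one possible wrapper. The function names `hpo_parameters` and `compile_notebook` are hypothetical, and papermill is imported lazily so the parameter logic can be inspected without it installed.

```python
def hpo_parameters(quick_check=True):
    """Notebook parameters: 1 HPO step for a smoke test, 50 for a longer run."""
    return {"HPO_ITER": 1 if quick_check else 50}


def compile_notebook(source, output, quick_check=True):
    """Execute a tutorial notebook with papermill's Python API."""
    import papermill as pm  # lazy import; requires papermill to be installed

    return pm.execute_notebook(
        source, output, parameters=hpo_parameters(quick_check)
    )


print(hpo_parameters(quick_check=True))  # {'HPO_ITER': 1}
# Example call (runs the notebook, so it is left commented out here):
# compile_notebook("examples/tutorials/brca_subtypes.ipynb", "brca_subtypes.ipynb")
```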
Documentation
The documentation is generated using mkdocs:
pip install "mkdocstrings[python]"
mkdocs build --clean