Multi-omics variational autoencoder

MOVE (Multi-Omics Variational autoEncoder)

The code in this repository can be used to run our Multi-Omics Variational autoEncoder (MOVE) framework for integration of omics and clinical variables spanning both categorical and continuous data. Our approach includes training ensemble VAE models and using in silico perturbation experiments to identify cross-omics associations. The manuscript has been published in Nature Biotechnology:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of drug–omics associations in type 2 diabetes with generative deep-learning models. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-022-01520-x

We developed the method based on a type 2 diabetes (T2D) cohort from the IMI DIRECT project containing 789 newly diagnosed T2D patients. The cohort and data creation are described in Koivula et al. and Wesolowska-Andersen et al. For the analysis we included the following data:

Multi-omics data sets:

Genomics
Transcriptomics
Proteomics
Metabolomics
Metagenomics

Other data sets:

Clinical data (blood measurements, imaging data, ...)
Questionnaire data (diet, etc.)
Accelerometer data
Medication data

Installation

Installing MOVE package

MOVE is written in Python and can be installed using pip:

$ pip install move-dl
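
To confirm that the installation succeeded, one option is to ask Python for the version of the installed distribution. The sketch below uses only the standard library's importlib.metadata and the distribution name move-dl from the pip command above; the helper name installed_version is hypothetical and not part of MOVE.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "move-dl") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"move-dl version: {found}" if found else "move-dl is not installed")
```

Returning None instead of raising keeps the check usable in setup scripts that need to decide whether to run pip.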

Requirements

MOVE should run on any environment where Python is available. The variational autoencoder architecture is implemented in PyTorch.

The VAEs can be trained on CPUs alone or with GPU acceleration, so a powerful GPU is not required. For instance, the tutorial data set, which consists of simulated drug, metabolomics and proteomics data for 500 individuals, runs fine on a standard MacBook.

Note: The pip installation of move-dl does not set up your local GPU automatically.
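
Because GPU setup is left to the user, it can help to check whether PyTorch detects a CUDA device before starting a long training run. The following is a minimal sketch built on torch.cuda.is_available(); pick_device is a hypothetical helper and not part of MOVE, and how MOVE itself is configured to use a GPU is described in its documentation, not here.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # PyTorch is the framework the VAE is implemented in
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training would run on: {pick_device()}")
```

If this prints cpu on a machine with a GPU, the likely cause is a CPU-only PyTorch build or a missing driver, which has to be fixed outside of pip install move-dl.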

The MOVE pipeline

MOVE has five to six steps:

01. Encode the data into a format that can be read by MOVE
02. Find the right architecture of the network, focusing on reconstruction accuracy
03. Find the right architecture of the network, focusing on stability of the model
04. Use the model determined in steps 02-03 to create and analyze the latent space
05. Identify associations between categorical and continuous datasets
05a. Using an ensemble of VAEs with the t-test approach
05b. Using an ensemble of VAEs with the Bayesian decision theory approach
06. If both 05a and 05b were run, select the overlap between them
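
The final step amounts to a set intersection over the associations reported by the two approaches. The sketch below illustrates the idea only; the pair representation, the function name overlap, and the example feature names are assumptions made for illustration and do not reflect MOVE's actual output format.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: (perturbed categorical feature, continuous feature)
Association = Tuple[str, str]


def overlap(
    t_test_hits: Iterable[Association],
    bayes_hits: Iterable[Association],
) -> List[Association]:
    """Keep only associations reported by both approaches, sorted for stable output."""
    shared: Set[Association] = set(t_test_hits) & set(bayes_hits)
    return sorted(shared)


if __name__ == "__main__":
    # Made-up pairs for illustration, not results from MOVE or the paper
    t_test_hits = [("drug_1", "protein_A"), ("drug_1", "metabolite_B"), ("drug_2", "protein_C")]
    bayes_hits = [("drug_1", "protein_A"), ("drug_2", "protein_C"), ("drug_3", "protein_A")]
    print(overlap(t_test_hits, bayes_hits))
```

Taking the intersection trades sensitivity for confidence: an association survives only if both the t-test and the Bayesian approach flag it.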

How to run MOVE

Please refer to our documentation for examples and tutorials on how to run MOVE.

Additionally, you can copy this notebook and follow its instructions to get familiar with our pipeline.

Data sets

DIRECT data set

The data used in the notebooks are not available for testing because of the informed consent given by study participants, the various national ethical approvals for the study, and the European General Data Protection Regulation (GDPR). Therefore, individual-level clinical and omics data cannot be transferred from the centralized IMI-DIRECT repository. Requests for access to summary statistics of IMI-DIRECT data, including those presented here, can be made to DIRECTdataaccess@Dundee.ac.uk. Requesters will be informed of how summary-level data can be accessed via the DIRECT secure analysis platform following submission of an appropriate application. The IMI-DIRECT data access policy is available here.

Simulated and publicly available data sets

We have therefore provided two data sets to test the workflow: a simulated data set and a publicly available maize rhizosphere microbiome data set.

Citation

To cite MOVE, use the following information:

Allesøe, R.L., Lundgaard, A.T., Hernández Medina, R. et al. Discovery of drug–omics associations in type 2 diabetes with generative deep-learning models. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-022-01520-x
