
Automated longitudinal data analysis using Gaussian processes.


Overview

This repository houses code for the waveome package, an easy-to-use and powerful Python library that analyzes longitudinal data using Gaussian processes. It is particularly well suited to characterizing the temporal dynamics of omics measurements and associated variables of interest. The Gaussian process serves as a prior, which allows flexible, nonparametric estimation of the potential relationships between variables of interest. Furthermore, waveome supports automated variable selection through a variety of methods. The software is open source and is built on top of GPflow (and TensorFlow).
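To make the idea of a Gaussian process prior concrete, here is a minimal, dependency-free sketch of GP regression with a squared-exponential kernel and Gaussian noise. It is not waveome's implementation (waveome builds on GPflow). The kernel choice, hyperparameter values, and function names are illustrative assumptions. The sketch returns posterior means at new inputs and the log marginal likelihood, which is the quantity that hyperparameter optimization typically maximizes.

```python
import math

def se_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x

def gp_fit_predict(x_train, y_train, x_test, noise_var=0.1, **kern):
    """Posterior means at x_test and the log marginal likelihood log p(y)."""
    n = len(x_train)
    # K + sigma^2 I: prior covariance of the noisy observations
    K = [[se_kernel(a, b, **kern) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))  # (K + sigma^2 I)^{-1} y
    means = [sum(se_kernel(xs, xt, **kern) * a for xt, a in zip(x_train, alpha))
             for xs in x_test]
    # log p(y) = -1/2 y^T alpha - sum(log diag L) - n/2 log(2 pi)
    log_ml = (-0.5 * sum(y * a for y, a in zip(y_train, alpha))
              - sum(math.log(L[i][i]) for i in range(n))
              - 0.5 * n * math.log(2 * math.pi))
    return means, log_ml

# Example: noisy-free samples of sin(x), predicted at a training and a new point
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [math.sin(v) for v in x_train]
means, log_ml = gp_fit_predict(x_train, y_train, [1.0, 2.5], noise_var=1e-6)
```

For two models evaluated at fixed hyperparameters, the difference in `log_ml` is a log Bayes factor; when hyperparameters are optimized first, that difference is only an approximation.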

Key features

  • General Purpose: Focused on longitudinal data analysis, but also useful for cross-sectional hypotheses
  • Flexible Modeling: A variety of kernels (including kernels for categorical variables) and non-Gaussian likelihoods
  • Variable Selection: Search-based selection as well as global penalization with horseshoe priors to automatically identify relevant covariates and kernel structure
  • Metrics & Visualizations: Generalized deviance explained, Bayes factors, and a variety of plotting features
  • Parallelization: Independent model hyperparameter optimizations run in parallel through Ray, allowing scaling from a local machine to clusters
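One common way to let a GP handle a categorical covariate is a kernel that assigns higher covariance to observations in the same category; multiplying it by a kernel on time yields group-specific temporal trajectories. The sketch below is an illustrative assumption and not necessarily the categorical kernel waveome uses; `rho` is a hypothetical between-category correlation.

```python
import math

def se_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance on time."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def categorical_kernel(c1, c2, rho=0.3):
    """1 within a category, rho (0 <= rho <= 1) across categories; positive semidefinite."""
    return 1.0 if c1 == c2 else rho

def time_by_group_kernel(p1, p2, rho=0.3, **se_params):
    """Product kernel on (time, category) pairs; a product of valid kernels is valid."""
    (t1, c1), (t2, c2) = p1, p2
    return se_kernel(t1, t2, **se_params) * categorical_kernel(c1, c2, rho)

points = [(0.0, "setosa"), (1.0, "setosa"), (0.0, "virginica")]
K = [[time_by_group_kernel(a, b) for b in points] for a in points]
```

With `rho = 0` the categories get independent curves; as `rho` approaches 1 the category is effectively ignored.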

Installation

We recommend a fresh conda environment (Python 3.9–3.11):

conda create -n waveome_env python=3.11
conda activate waveome_env
pip install waveome
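After installing, an optional sanity check (a small script of our own, not part of waveome) can confirm that the interpreter is in the recommended 3.9 to 3.11 range and report the installed waveome version via the standard library's `importlib.metadata`.

```python
import sys
from importlib import metadata

SUPPORTED = ((3, 9), (3, 11))  # inclusive range recommended above

def python_supported(version_info=sys.version_info):
    """True if the interpreter's major.minor lies within the recommended range."""
    return SUPPORTED[0] <= tuple(version_info[:2]) <= SUPPORTED[1]

def installed_version(package="waveome"):
    """Installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_supported())
    print("waveome version:", installed_version())
```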

Recommended for Jupyter notebooks:

conda install jupyter ipykernel
python -m ipykernel install --user --name=waveome_env

For platform-specific tips, see docs/INSTALL.md (optional).

Quick Start

import seaborn as sns
from waveome.model_search import GPSearch

# Load example dataset
iris = sns.load_dataset("iris")

# Create a GPSearch object
# Treat sepal_length and sepal_width as outcomes; petal measurements and species as covariates
gps = GPSearch(
  X=iris[["petal_length", "petal_width", "species"]],
  Y=iris[["sepal_length", "sepal_width"]],
  categorical_vars=["species"]
)

# Optimize GP models via penalization
gps.penalized_optimization()

# Visualize results
gps.plot_heatmap(var_cutoff=0, cluster=False)

See the tutorial notebook waveome_overview.ipynb for generating synthetic longitudinal data and for more post-fitting visualization options.
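Longitudinal data are usually arranged in long format, with one row per subject per time point. The sketch below generates such records with a subject-specific random intercept and a group-specific trend. It is our own illustration, not the generator used in waveome_overview.ipynb; the column names and the sine-shaped treatment effect are hypothetical. How subject identifiers are passed to `GPSearch` is covered in the notebook.

```python
import math
import random

def simulate_longitudinal(n_subjects=10, times=(0, 1, 2, 3, 4), noise_sd=0.2, seed=0):
    """Long-format records: one dict per subject per time point."""
    rng = random.Random(seed)  # seeded for reproducibility
    rows = []
    for sid in range(n_subjects):
        group = "treated" if sid % 2 else "control"
        intercept = rng.gauss(0.0, 0.5)  # subject-specific random intercept
        for t in times:
            # hypothetical trend: treated subjects follow sin(t), controls stay flat
            trend = math.sin(t) if group == "treated" else 0.0
            rows.append({
                "id": sid,
                "time": float(t),
                "group": group,
                "y": intercept + trend + rng.gauss(0.0, noise_sd),
            })
    return rows

data = simulate_longitudinal()
```

If pandas is available, `pd.DataFrame(data)` turns these records into a table whose covariate and outcome columns can be split into `X` and `Y` as in the Quick Start.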

Applications

Simulations:

Path: examples/simulations/
Summary: We evaluated our methods on simulated data, both for holdout distributional fit and for our automated variable selection strategies. These simulations were run on the GW HPC; readers interested in the modeling components and methods in waveome can find them in the notebook simple_regression_different_models.ipynb.
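One common way to score holdout distributional fit is the mean log predictive density of held-out observations under the model's predictive distribution. The sketch below assumes Gaussian predictive distributions with given means and variances; it is an illustration, and the exact metric used in the simulation study may differ.

```python
import math

def gaussian_log_density(y, mean, var):
    """log N(y | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def mean_log_predictive_density(y_test, pred_means, pred_vars):
    """Average log p(y* | training data) over held-out points; higher is better."""
    scores = [gaussian_log_density(y, m, v)
              for y, m, v in zip(y_test, pred_means, pred_vars)]
    return sum(scores) / len(scores)
```

Unlike mean squared error, this score also rewards well-calibrated predictive variances, not only accurate means.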

iHMP longitudinal metabolome:

Path: examples/iHMP/
Summary: We used metabolomics data from the inflammatory bowel disease (IBD) study of the Integrative Human Microbiome Project (iHMP), Lloyd-Price et al. (2017). Our goal was to characterize the temporal dynamics of metabolites associated with IBD severity while controlling for other patient and sample characteristics. The notebook ihmp_waveome.ipynb shows the analysis.

Marine microbiome (In progress):

Path: examples/Marine_microbiome/
Summary: We analyzed 28 observations of repeated microbiome samples taken from a marine environment before and after treatment shocks. The analysis focused on the relationship between the abundance of sequence variants and the administered treatment, while controlling for other environmental factors. Preliminary results are in 16S_environment_microbiome_antibiotic_treatments.ipynb.

Breastmilk RNA and infant microbiome & metabolome (In progress):

Path: examples/Breastmilk/
Summary: GWDBB is a reference data library for clinical trials and omics data. One of its studies contains longitudinal gut microbiome and metabolomics data from infants, together with their mothers' breast milk RNA, collected at multiple time points. Two longitudinal analyses have been performed; they can be found in the notebooks breastmilk_infant_metabolites_Poisson.ipynb and Breastmilk_infant_Microbiome.ipynb.
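Count outcomes such as metabolite or read counts call for a non-Gaussian likelihood. A common choice is a Poisson likelihood whose rate is the exponential of the latent GP value f (a log link). The sketch below only evaluates that log-likelihood; it is an illustration and not taken from the breastmilk notebook.

```python
import math

def poisson_log_lik(y, f):
    """log p(y | f) for a count y with rate exp(f): y*f - exp(f) - log(y!)."""
    return y * f - math.exp(f) - math.lgamma(y + 1)

def total_poisson_log_lik(counts, latent):
    """Sum of per-observation log-likelihoods, assuming conditional independence given f."""
    return sum(poisson_log_lik(y, f) for y, f in zip(counts, latent))
```

Because the likelihood is not Gaussian, the GP posterior is no longer available in closed form and has to be approximated, for example variationally.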

HIV CD4 counts (In progress):

Path: examples/CD4/
Summary: This example considers bivariate responses of HIV-1 RNA (counts/ml) in semen and blood of patients from HIV-RNA AIDS studies in the Seattle, Swiss, and UNCCH cohorts. The data were collected from N = 149 subjects divided into two groups: patients receiving therapy (106 patients) and those with no therapy or an unknown therapy method (43 patients). The covariates are scaled time, baseline age, baseline CD4, and two factors: group and cohort. The data are also available through Wang (2013). The waveome analysis is provided in CD4.ipynb.


Citation

If you use waveome, please cite:

Allen Ross, Ali Reza Taheriouyn, Jason Lloyd-Price, Ali Rahnavard (2024). waveome: characterizing temporal dynamics of metabolites in longitudinal studies, https://github.com/omicsEye/waveome/.



Download files


Source Distribution

waveome-0.2.0.tar.gz (77.8 kB)

Uploaded Source

Built Distribution


waveome-0.2.0-py3-none-any.whl (77.4 kB)

Uploaded Python 3

File details

Details for the file waveome-0.2.0.tar.gz.

File metadata

  • Download URL: waveome-0.2.0.tar.gz
  • Upload date:
  • Size: 77.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for waveome-0.2.0.tar.gz
Algorithm Hash digest
SHA256 4098e59ae947a3ed89f9628a804b1a0193c9bc8c8d844df9de1d2d5d7a6e9342
MD5 f8bf2b3274022cfd435fb0391192638d
BLAKE2b-256 7922b101cf990c0450c13aa0eaff9f30a2b865ce6d9340dbc43f3e53bcb1fa37


File details

Details for the file waveome-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: waveome-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 77.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for waveome-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8524535cb7f77cc168a1897b7d06f28ad664679696529f1b665ab8ea7d4583ff
MD5 5a700adda7f09ab5f2e60eecbf0bc1f0
BLAKE2b-256 f275ed59f5f9d2856436c145db0e7500983b832c15241a4064f643afaf45ea79

