A probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs
Project description
LuxGLM: a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs
Overview
LuxGLM is a method for quantifying oxi-mC species with arbitrary covariate structures from bisulphite based sequencing data. LuxGLM's probabilistic modeling framework combines a previously proposed hierarchical generative model of Lux for oxi-mC-seq data and a general linear model component to account for confounding effects.
Features
- Model-based integration and analysis of BS-seq/oxBS-seq/TAB-seq/fCAB-seq/CAB-seq/redBS-seq/MAB-seq/etc. data from whole genome, reduced representation or targeted experiments
- Accounts for confounding effects through general linear model component
- Considers nonideal experimental parameters through modeling, including e.g. bisulphite conversion and oxidation efficiencies, various chemical labeling and protection steps etc.
- Model-based integration of biological replicates
- Detects differential methylation using Bayes factors (DMRs)
- Full Bayesian inference using NumPyro
Quick introduction
An usual LuxGLM pipeline has the following steps
-
Alignment of BS-seq and oxBS-seq data (e.g., Bismark or BSmooth)
-
Extraction of converted and unconverted counts (e.g., Bismark or BSmooth)
-
Integrative methylation analysis
-
Analysis of obtained methylation estimates, e.g., using Bayes factors
This documentation focuses on the third and fourth points.
Installation
PyPI
$ pip install luxglm
GitHub
Install the version from the main branch as follows
$ pip install git+https://github.com/tare/LuxGLM.git
Usage
Metadata
Count data and covariates are defined in the metadata file.
name | basal/tgf-beta | vitc | ra | timepoint | count_file | control_count_file | control_definition_file |
---|---|---|---|---|---|---|---|
TGFb_1_24h | 1 | 0 | 0 | 24 | wildtype/TGFb_1_24h.tsv | control/TGFb_1_24h_control.tsv | control_definitions.tsv |
TGFb_1_38h | 1 | 0 | 0 | 38 | wildtype/TGFb_1_38h.tsv | control/TGFb_1_38h_control.tsv | control_definitions.tsv |
TGFb_1_48h | 1 | 0 | 0 | 48 | wildtype/TGFb_1_48h.tsv | control/TGFb_1_48h_control.tsv | control_definitions.tsv |
The following columns are mandatory: name
, count_file
, control_count_file
, and control_definition
. Additionally, there has to be at least one covariate. In the above example, we have four covariates: basal/tgf-beta
, vitc
, ra
, and timepoint
.
Control cytosines
The control cytosine data are supplied in the control count files. Each experiment will have its own file. The files contain location information (chromosome
and position
) and control type information (control_cytosine
) for the control cytosines. Additionally, we have the number of Cs and and total number of read-outs from BS-seq and oxBS-seq experiments (bs_c
, bs_total
, oxbs_c
, and oxbs_total
).
chromosome | position | control_type | bs_c | bs_total | oxbs_c | oxbs_total |
---|---|---|---|---|---|---|
Lambda_ctrl | 22924 | C | 2 | 343 | 1 | 562 |
Lambda_ctrl | 22928 | C | 2 | 341 | 1 | 561 |
Lambda_ctrl | 47359 | 5mC | 3770 | 3857 | 4767 | 4877 |
Lambda_ctrl | 47367 | 5mC | 3895 | 3962 | 4855 | 4979 |
Lambda_ctrl | 23789 | 5hmC | 3792 | 3964 | 79 | 865 |
Lambda_ctrl | 23794 | 5hmC | 3901 | 4115 | 62 | 934 |
The prior knowledge on the control cytosines is supplied in the control definition file. Note that control_type
is used to link the control count data and control definitions.
control_type | C_pseudocount | 5mC_pseudocount | 5hmC_pseudocount |
---|---|---|---|
C | 998 | 1 | 1 |
5mC | 1 | 998 | 1 |
5hmC | 6 | 2 | 72 |
Noncontrol cytosines
The non-control cytosine data are supplied in the count files. Each experiment will have its own file. The files contain location information (chromosome
and position
) for the non-control cytosines. Additionally, we have the number of Cs and and total number of read-outs from BS-seq and oxBS-seq experiments (bs_c
, bs_total
, oxbs_c
, and oxbs_total
).
chromosome | position | bs_c | bs_total | oxbs_c | oxbs_total |
---|---|---|---|---|---|
chrX | 7159069 | 1083 | 1563 | 850 | 2736 |
chrX | 7159186 | 1341 | 1534 | 2119 | 2719 |
chrX | 7159222 | 4949 | 5575 | 3886 | 4639 |
chrX | 7159235 | 4831 | 5588 | 4354 | 4641 |
LuxGLM analysis
The following lines are sufficient to run LuxGLM
import numpyro
from jax import random
from luxglm.inference import run_nuts
from luxglm.utils import get_input_data
numpyro.enable_x64()
# read input data
lux_input_data = get_input_data("metadata.tsv")
key = random.PRNGKey(0)
key, key_ = random.split(key)
# run LuxGLM
lux_result = run_nuts(
key,
lux_input_data,
["basal/tgf-beta"],
num_warmup=1_000,
num_samples=1_000,
num_chains=4,
)
# ensure convergence
lux_result.inference_metrics["summary"].query("r_hat > 1.05")
To get the posterior samples of methylation levels of control and non-control cytosines one can call lux_result.methylation_controls()
and lux_result.methylation()
, respectively.
The posterior samples of experimental parameters can be obtained by calling lux_result.experimental_parameters()
.
To study the effects of the covariates, one can get the posterior samples of coefficients of covariates using lux_result.coefficients()
.
Examples
Please see the examples directory for the tutorial notebooks.
References
[1] T. Äijö, X. Yue, A. Rao and H. Lähdesmäki, “LuxGLM: a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs.,” Bioinformatics, 32.17:i511-i519, Sep 2016.
[2] T. Äijö, Y. Huang, H. Mannerström, L. Chavez, A. Tsagaratou, A. Rao and H. Lähdesmäki, “A probabilistic generative model for quantification of DNA modifications enables analysis of demethylation pathways.,” Genome Biol, 17.1:1, Mar 2016.
[3] F. Krueger and S. R. Andrews, “Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications.,” Bioinformatics, 27.11:1571-1572, Jun 2011.
[4] K. D. Hansen, B. Langmead and R. A. Irizarry, “BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions.,” Genome Biol, 13.10:1, Oct 2012.
[5] D. Phan, N. Pradhan and M. Jankowiak, “Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro.,” arXiv preprint 1912.11554, Dec 2019.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file luxglm-1.0.0.tar.gz
.
File metadata
- Download URL: luxglm-1.0.0.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.12.6 Darwin/24.0.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3c94e22b5a63bd6e1675c310d597436bfd3906c2dcc01f09cbe7653c348c2c0b |
|
MD5 | d07ce325b1bebe5821c1bd6deb208e5a |
|
BLAKE2b-256 | 1d445e96215266cb090c44c99a55735123a9e0e8e0ee687c38705c7baa0d9796 |
File details
Details for the file luxglm-1.0.0-py3-none-any.whl
.
File metadata
- Download URL: luxglm-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.12.6 Darwin/24.0.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8ff1c8f97537a9cb46182253ce94edda6c9b108d53b56237f625b714d0c8afd8 |
|
MD5 | c4a554040b6fbed27313086faa2630f1 |
|
BLAKE2b-256 | 76a6602096a14a406322b22d50b399e5517c3d405a493785864356a678432a3c |