EWAS Analysis software for Illumina methylation arrays
methylize is a Python package for analyzing output from Illumina methylation arrays. It complements methylprep and methylcheck. View on ReadTheDocs.
 Overview
 Demonstrating differentially methylated probe (DMP) detection (volcano plot) and mapping to chromosomes (manhattan plot)
 About BumpHunter
Methylize Package
The methylize package contains both high-level APIs for processing data from local files and low-level functionality allowing you to analyze your data AFTER running methylprep and methylcheck. For greatest usability, import methylize into a Jupyter Notebook along with your processed sample data (a DataFrame of beta values or M-values and a separate DataFrame containing metadata about the samples).
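For readers unfamiliar with the two scales, the sketch below shows the standard logit relationship between a beta value and an M-value, M = log2(beta / (1 - beta)). The function names and the clipping constant `eps` are illustrative assumptions and are not part of the methylize API.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (0..1) to an M-value: log2(beta / (1 - beta))."""
    # Clip away from 0 and 1 so the logarithm stays finite.
    # eps is an arbitrary illustrative choice, not a methylize setting.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

# A half-methylated site sits at M = 0; higher beta gives a positive M-value.
print(beta_to_m(0.5))
print(beta_to_m(0.8))
```

M-values are unbounded and roughly symmetric around zero, which is why they are the usual input for regression, while beta values are easier to read as a methylation fraction.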
Methylize allows you to run linear or logistic regression on all probes and identify points of interest in the methylome where DNA is differentially modified. Then you can use these regression results to create volcano plots and manhattan plots.
Sample Manhattan Plot
Sample Volcano Plot
Customizable: Plot size, color palette, and p-value cutoff lines can be set by the user.
Exporting: You can export all probe statistics, or just the significant probes, as a CSV file or a Python pickled DataFrame.
Installation
pip install methylize
Differentially methylated position/probe (DMP) detection
The diff_meth_pos() function searches for individual differentially methylated positions/probes (DMPs) by regressing the methylation M-value for each sample at a given genomic location against the phenotype data for those samples.
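The idea of a per-probe regression can be sketched with plain NumPy. This is a minimal illustration of ordinary least squares fitted separately for each probe, not the code diff_meth_pos() actually runs; the function name `per_probe_linear_fit` and the array layout are assumptions made for the example.

```python
import numpy as np

def per_probe_linear_fit(m_values, phenotype):
    """Fit M-value ~ intercept + slope * phenotype separately for every probe.

    m_values:  2-D array, rows = samples, columns = probes.
    phenotype: 1-D numeric array with one entry per sample.
    Returns (slopes, standard_errors), each with one entry per probe.
    """
    y = np.asarray(m_values, dtype=float)
    x = np.asarray(phenotype, dtype=float)
    n = x.shape[0]

    # Design matrix with an intercept column and the phenotype column.
    design = np.column_stack([np.ones(n), x])

    # lstsq solves all probes at once because y has one column per probe.
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)

    # Residual variance per probe, with n - 2 degrees of freedom.
    residuals = y - design @ coef
    sigma2 = (residuals ** 2).sum(axis=0) / (n - 2)

    # Standard error of the slope for simple linear regression.
    sxx = ((x - x.mean()) ** 2).sum()
    return coef[1], np.sqrt(sigma2 / sxx)

# Six samples (three per group) and two hypothetical probes.
phenotype = [0, 0, 0, 1, 1, 1]
m_values = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
slopes, errors = per_probe_linear_fit(m_values, phenotype)
print(slopes, errors)
```

With a 0/1 phenotype, each slope equals the difference between the two group means for that probe. A p-value would follow from the ratio slope / standard error and a t distribution with n - 2 degrees of freedom.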
Phenotypes can be provided as
 a list of string-based labels,
 integer binary data,
 numeric continuous data
 (TODO: use the methylprep-generated metadata DataFrame as input)
The function will coerce string labels for phenotype into 0s and 1s when running logistic regression. Only 2 phenotypes are allowed with logistic regression. Linear regression can take more than two phenotypes.
Inputs and Parameters
meth_data:
A pandas DataFrame of methylation M-values, where each column corresponds to a CpG site probe and each row corresponds to a sample.
pheno_data:
A list or one-dimensional numpy array of phenotypes for each sample row in meth_data.
 Binary phenotypes can be presented as a list/array of zeroes and ones or as a list/array of strings made up of two unique words (e.g. "control" and "cancer"). The first string in pheno_data will be converted to zeroes, and the second string encountered will be converted to ones for the logistic regression analysis.
 Use numbers for phenotypes if running linear regression.
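The label conversion described for binary phenotypes can be sketched in a few lines of plain Python. The helper `encode_binary_labels` is hypothetical and only mirrors the rule stated above (first label seen becomes 0, second becomes 1); it is not the function methylize uses internally.

```python
def encode_binary_labels(labels):
    """Map two distinct labels to 0/1 in order of first appearance."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            # The first new label gets 0, the second gets 1.
            mapping[label] = len(mapping)
    if len(mapping) != 2:
        # Logistic regression is only defined here for exactly two phenotypes.
        raise ValueError(f"expected exactly 2 phenotypes, found {len(mapping)}")
    return [mapping[label] for label in labels], mapping

codes, mapping = encode_binary_labels(["control", "cancer", "cancer", "control"])
print(codes)
print(mapping)
```

Because the encoding depends on which label appears first, reordering the samples can flip the sign of the resulting coefficients, so it is worth checking the mapping before interpreting results.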
regression_method: (logistic | linear)
 Either the string "logistic" or the string "linear", depending on the phenotype data available.
 Default: "linear"
 Phenotypes with only two options (e.g. "control" and "cancer") can be analyzed with a logistic regression
 Continuous numeric phenotypes (e.g. age) are required to run a linear regression analysis.
q_cutoff:
 Select a cutoff value to return only those DMPs that meet a particular significance threshold. Reported q-values are p-values corrected according to the model's false discovery rate (FDR).
 Default: 1, which returns all DMPs regardless of significance.
export:
 Default: False
 If True or 'csv', saves a CSV file with the data.
 If 'pkl', saves a pickle file of the results as a DataFrame.
 Use q_cutoff to limit what gets saved to only significant results. By default, q_cutoff == 1, which means everything is saved/reported/exported.
filename:
 Specify a filename for the exported file. By default, if not specified, the filename will be `DMP_<number of probes in file>_<number of samples processed>_<current_date>.<pkl|csv>`
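As an illustration of the default naming pattern, the sketch below assembles a filename from the probe count, sample count, date, and export type. The helper name and the ISO date format are assumptions; the source does not say how methylize formats `<current_date>`.

```python
from datetime import date

def default_dmp_filename(n_probes, n_samples, export="csv", today=None):
    """Build a name following DMP_<probes>_<samples>_<date>.<pkl|csv>."""
    # export may be True or 'csv' (CSV file) or 'pkl' (pickle file).
    extension = "pkl" if export == "pkl" else "csv"
    # Assumption: ISO format (YYYY-MM-DD); the real date format may differ.
    today = today or date.today()
    return f"DMP_{n_probes}_{n_samples}_{today.isoformat()}.{extension}"

print(default_dmp_filename(485512, 14, export="pkl"))
```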
shrink_var:
 If True, variance shrinkage is applied, squeezing variances using Bayes posterior means. Variance shrinkage is recommended when analyzing small datasets (n < 10). (NOT IMPLEMENTED YET)
Returns
A pandas DataFrame of regression statistics with a row for each probe analyzed and columns listing the individual probe's regression statistics:
 regression coefficient
 lower limit of the coefficient's 95% confidence interval
 upper limit of the coefficient's 95% confidence interval
 standard error
 p-value (phenotype group A vs B: likelihood that the difference is significant for this probe/location)
 q-value (p-values corrected for multiple testing using the Benjamini-Hochberg FDR method)
 FDR_QValue: p-value, adjusted for multiple comparisons
The rows are sorted by q-value in ascending order to list the most significant probes first. If q_cutoff is specified, only probes with q-values less than the cutoff are returned in the DataFrame.
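To make the q-value column concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment followed by a cutoff filter and an ascending sort. Both function names are hypothetical, and the use of `<=` in the filter is an assumption chosen so that the default cutoff of 1 keeps every probe; methylize's own comparison may differ.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, scaling by n / rank and
    # keeping a running minimum so q-values never decrease with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values

def significant_probes(p_by_probe, q_cutoff=1.0):
    """Return (probe, q-value) pairs passing the cutoff, most significant first."""
    probes = list(p_by_probe)
    q_values = benjamini_hochberg([p_by_probe[name] for name in probes])
    # Assumption: <= is used so that q_cutoff=1 returns every probe.
    kept = [(name, q) for name, q in zip(probes, q_values) if q <= q_cutoff]
    return sorted(kept, key=lambda item: item[1])

# Hypothetical probe names and p-values, for illustration only.
p_by_probe = {"cg1": 0.01, "cg2": 0.04, "cg3": 0.03, "cg4": 0.005}
print(significant_probes(p_by_probe, q_cutoff=0.03))
```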
If Progress Bar Missing: if you don't see a progress bar in your JupyterLab notebook, try this:
 conda install -c conda-forge nodejs
 jupyter labextension install @jupyter-widgets/jupyterlab-manager
Loading processed data
Assuming you previously used methylprep to process a data set like this:
python -m methylprep -v process -d GSE130030 --betas
This creates two files, beta_values.pkl and sample_sheet_meta_data.pkl. You can load both in methylize like this:
Navigate to the GSE130030 folder created by methylprep, and start a Python interpreter:
>>> import methylize
>>> data, meta = methylize.load_both()
INFO:methylize.helpers:loaded data (485512, 14) from 1 pickled files (0.159s)
INFO:methylize.helpers:meta.Sample_IDs match data.index (OK)
Or if you are running in a notebook, specify the full path:
import methylize
data, meta = methylize.load_both('<path_to...>/GSE105018')
This also validates both files, and ensures that the Sample_ID column in the meta DataFrame aligns with the column names in the data DataFrame.
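If you assemble the two DataFrames yourself, a similar alignment check is easy to run by hand. The sketch below is an assumption about what such a check could look like, not the validation load_both() performs; the `axis` parameter exists because the samples may sit on either the columns or the index of the data DataFrame.

```python
import pandas as pd

def sample_ids_align(data, meta, id_column="Sample_ID", axis="columns"):
    """Return True if meta[id_column] lists the samples in the same order as data."""
    # Samples may be stored as column names or as the row index.
    sample_labels = data.columns if axis == "columns" else data.index
    return list(meta[id_column]) == list(sample_labels)

# Tiny hypothetical example: two probes measured in two samples.
data = pd.DataFrame({"s1": [0.1, 0.2], "s2": [0.3, 0.4]}, index=["cg1", "cg2"])
meta = pd.DataFrame({"Sample_ID": ["s1", "s2"]})
print(sample_ids_align(data, meta))
```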