Skip to main content

EpiToolkit is a set of tools useful in the analysis of data from EPIC / 450K microarrays.

Project description

EpiGenToolKit

Is a small library created to deal with data from EPIC / 450K microarrays. The tool allows to:

a) Simply visualize methylation levels of specific CpG or genomic region.

b) Perform enrichment analysis of a selected subset of CpG against the whole array. In this type of analysis expected frequency [%] (based on mynorm) of genomic regions is compared to observed (based on provided cpgs set), results are comapred using chi-square test.

How to start?

a) using env

python -m venv env
source env/bin/activate # Windows: env\Scripts\activate
pip install epitoolkit

b) using poetry

poetry new .
poetry add epitoolkit

c) or just clone the repository:

git clone https://github.com/ClinicalEpigeneticsLaboratory/EpiGenToolKit.git
cd EpiGenToolKit && poetry install

How to use?

Visualization

To visualize single CpG site or specific genomic region initialize Visualise object:

from epitoolkit.tools import Visualize

viz = Visualize(manifest=<path_to_array_manifest>, # path to manifest file
                mynorm=<path_to_mynorm_file>, # path to mynorm file
                poi=<path_to_poi_file>, # path to poi file
                poi_col=<column_name> # name of column containing sample phenotype
                skiprows=0) # many manifest contains headers, set skiprows argument to ignore them.

all files must have *.csv extension, mynorm must contain sample names as columns and cpgs as rows, the proper EPIC manifest may be downloaded from here, poi file must contain sample names rows (only samples overlapped between poi and mynorm will be used) and POI (phenotype of interest) column containing names of phenotype e.g. Control and Case.

To visualize single CpG:

viz.plot_CpG("cg07881041", # cpg ID
    static=False, # plot type static / interactive [default]
    height=400, # plot size [default]
    width=700, # plot size [default]
    title="", # plot title [default]
    legend_title="", # legend title [default]
    font_size=22, # font size [default]
    show_legend=True, # False to hide legedn [default]
    x_axis_label="CpG", # x axsis label [default]
    category_order=["Cohort 1", "Cohort 2], # box order [default]
    y_axis_label="beta-values") # y axis label [default]

NOTE: most of those arguments are default! So you don't need to specify most of them!

CpGPlot

To visualize specific genomic region:

vis.plot_Range(chr=17, start=5999, end=7000)

NOTE: please note that all arguments available in viz.plot_CpG are also in plot_Range

CpGPlot

To visualize specific CpGs in genomic order, instead of whole region, just pass collection of CpGs:

viz.plot_Range(cpgs=["cg04594855", "cg19812938", "cg05451842"]

CpGPlot

To save plots use export argument, for instance:

viz.plot_Range(chr=17, start=5999, end=6770, export="plot.html") # if static = False only html format is supported if static = True, use png extension.

Enrichment analysis

To perform enrichment analysis against any type of genomic region specified in the manifest file, the user needs to initialize EnrichemntAnalysis object.

from src.epitoolkit.tools import EnrichmentAnalysis

ea = EnrichmentAnalysis(manifest=<path_to_array_manifest>,
        mynorm=<path_to_mynorm_file>)

or if Visualize object already exists use load method (this approach makes you not have to load the data again):

ea = EnrichmentAnalysis.load(<Visualize_object_name>)

To start analysis:

ea.enrichmentAnalysis(categories_to_analyse=["UCSC_RefGene_Group", "Relation_to_UCSC_CpG_Island"],  # list of categories to analyse
                cpgs=cpgs) # list of cpgs to analyse against background

examplePlot

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

epitoolkit-0.2.6.tar.gz (7.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

epitoolkit-0.2.6-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file epitoolkit-0.2.6.tar.gz.

File metadata

  • Download URL: epitoolkit-0.2.6.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.7 CPython/3.8.10 Linux/5.4.0-89-generic

File hashes

Hashes for epitoolkit-0.2.6.tar.gz
Algorithm Hash digest
SHA256 18dc221fa39a4b769d6de8124eecf982069b3590f43809401c2ecc62b73864d7
MD5 5c09ab9210117771b15cc722d46f9661
BLAKE2b-256 a09ed65d7b009fe2454a0db42cd416c3d199511450af9c58745d01c3c679507b

See more details on using hashes here.

File details

Details for the file epitoolkit-0.2.6-py3-none-any.whl.

File metadata

  • Download URL: epitoolkit-0.2.6-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.7 CPython/3.8.10 Linux/5.4.0-89-generic

File hashes

Hashes for epitoolkit-0.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 0726e7f620b51a61a83d6a4231091a5abd4cd0fc92eda3b08ff3dcdfc4689823
MD5 c44974d838797ef20d8f3d9bc22ed0cc
BLAKE2b-256 89ad12e1fa040a3d32270df4f782819cbf7b565a943402041b66e3bbe2ded651

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page