Comprehensive tool for visualizing genome-wide cytosine data.
Project description
BismarkPlot
Comprehensive tool for visualizing genome-wide cytosine data.
See the docs: https://shitohana.github.io/BismarkPlot
Right now only coverage2cytosine
input is supported, but support for other input types will be added soon.
Installation
pip install bismarkplot
Console usage
You can use bismarkplot
either as python library or directly from console after installing it.
Console options:
- bismarkplot-metagene - methylation density visualizing tool.
- bismarkplot-chrs - chromosome methylation levels visualizing tool.
bismarkplot-metagene
usage: BismarkPlot. [-h] [-o OUT] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH]
[-w GWINDOWS] [--line] [--heatmap] [--box] [--violin] [-S SMOOTH] [-L LABELS [LABELS ...]] [-C CONFIDENCE] [-H H] [-V V] [--dpi DPI]
[-F {png,pdf,svg}]
filename [filename ...]
Metagene visualizing tool.
positional arguments:
filename path to bismark methylation_extractor files
optional arguments:
-h, --help show this help message and exit
-o OUT, --out OUT output base name (default: /Users/shitohana/Desktop/PycharmProjects/BismarkPlot)
-g GENOME, --genome GENOME
path to GFF genome file (default: None)
-r {gene,exon,tss,tes}, --region {gene,exon,tss,tes}
path to GFF genome file (default: gene)
-b BATCH, --batch BATCH
number of rows to be read from bismark file by batch (default: 1000000)
-c CORES, --cores CORES
number of cores to use (default: None)
-f FLENGTH, --flength FLENGTH
length in bp of flank regions (default: 2000)
-u UWINDOWS, --uwindows UWINDOWS
number of windows for upstream (default: 50)
-d DWINDOWS, --dwindows DWINDOWS
number of windows for downstream (default: 50)
-m MLENGTH, --mlength MLENGTH
minimal length in bp of gene (default: 4000)
-w GWINDOWS, --gwindows GWINDOWS
number of windows for genes (default: 100)
--line line-plot enabled (default: False)
--heatmap heat-map enabled (default: False)
--box box-plot enabled (default: False)
--violin violin-plot enabled (default: False)
-S SMOOTH, --smooth SMOOTH
windows for smoothing (default: 10)
-L LABELS [LABELS ...], --labels LABELS [LABELS ...]
labels for plots (default: None)
-C CONFIDENCE, --confidence CONFIDENCE
probability for confidence bands for line-plot. 0 if disabled (default: 0)
-H H vertical resolution for heat-map (default: 100)
-V V vertical resolution for heat-map (default: 100)
--dpi DPI dpi of output plot (default: 200)
-F {png,pdf,svg}, --format {png,pdf,svg}
format of output plots (default: pdf)
Example:
bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000 -u 500 -d 500 -w 1000 -b 1000000 --line --heatmap --box --violin --dpi 200 -f pdf -S 50 report1.txt report2.txt report3.txt report4.txt
bismarkplot-chrs
usage: BismarkPlot [-h] [-o DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] path/to/txt [path/to/txt ...]
Chromosome methylation levels visualization.
positional arguments:
path/to/txt path to bismark methylation_extractor file
options:
-h, --help show this help message and exit
-o DIR, --out DIR output base name (default: current/path)
-b N, --batch N number of rows to be read from bismark file by batch (default: 1000000)
-c CORES, --cores CORES
number of cores to use (default: None)
-w N, --wlength N number of windows for genes (default: 100000)
-m N, --mlength N minimum chromosome length (default: 1000000)
-S FLOAT, --smooth FLOAT
windows for smoothing (0 - no smoothing, 1 - straight line (default: 50)
-F {png,pdf,svg}, --format {png,pdf,svg}
format of output plots (default: pdf)
Example:
bismarkplot-chrs -b 10000000 -w 10000 -m 1000000 -s 10 -f pdf path/to/CX_report.txt
Python
BismarkPlot provides a large variety of function for manipulating with cytosine methylation data.
Metagene
Below we will show the basic BismarkPlot workflow.
Single sample
import bismarkplot
# Firstly, we need to read the regions annotation (e.g. reference genome .gff)
genome = bismarkplot.Genome.from_gff("path/to/genome.gff")
# Next we need to filter regions of interest from the genome
genes = genome.gene_body(min_length=4000, flank_length=2000)
# Now we need to calculate metagene data
metagene = bismarkplot.Metagene.from_file(
file = "path/to/CX_report.txt",
genome=genes, # filtered regions
upstream_windows = 500,
gene_windows = 1000,
downstream_windows = 500,
batch_size= 10**7 # number of lines to be read simultaneously
)
# Our metagene contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagene.filter(context = "CG", strand = "+")
# We are ready to plot
lp = filtered.line_plot() # line plot data
lp.draw().savefig("path/to/lp.pdf") # matplotlib.Figure
hm = filtered.heat_map(ncol=200, nrow=200)
hm.draw().savefig("path/to/hm.pdf") # matplotlib.Figure
Output:
Smoothing the line plot
Smoothing is very useful, when input signal is very weak (e.g. mammalian non-CpG contexts)
# mouse CHG methylation example
filtered = metagene.filter(context = "CHG", strand = "+")
lp.draw(smooth = 0).savefig("path/to/lp.pdf") # no smooth
lp.draw(smooth = 50).savefig("path/to/lp.pdf") # smoothed with window length = 50
Output:
Multiple samples, same specie
# We can initialize genome like in previous example
filenames = ["report1.txt", "report2.txt", "report3.txt", "report4.txt"]
metagenes = bismarkplot.MetageneFiles.from_list(filenames, labels = ["1", "2", "3", "4"], ...) # rest of params from previous example
# Our metagenes contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagenes.filter(context = "CG", strand = "+")
# Now we can draw line-plot or heatmap like in previous example, or plot distribution statistics as shown below
trimmed = filtered.trim_flank() # we want to analyze only gene bodies
trimmed.box_plot(showfliers=False).savefig(...)
trimmed.violin_plot().savefig(...)
# If data is technical replicates we can merge them into single DataFrame and analyze as one
merged = filtered.merge()
Output:
Multiple samples, multiple species
# For analyzing samples with different reference genomes, we need to initialize several genomes instances
genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"]
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"]
genomes = [
bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames
]
# Now we read reports
metagenes = []
for report, genome in zip(reports_filenames, genomes):
metagene = bismarkplot.Metagene(report, genome = genome, ...)
metagenes.append(metagene)
# Initialize MetageneFiles
labels = ["A. thaliana", "B. distachyon", "C. sativus", "M. musculus"]
metagenes = Bismarkplot.MetageneFiles(metagenes, labels)
# Now we can plot them like in previous example
Output:
Different regions
Other genomic regions from .gff can be analyzed too with .exon
or .near_tss/.near_tes
option for bismarkplot.Genome
exons = [
bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
]
metagenes = []
for report, exon in zip(reports_filenames, exons):
metagene = bismarkplot.Metagene(report, genome = exon,
upstream_windows = 0, # !!!
downstream_windows = 0, # !!!
...)
metagenes.append(metagene)
# OR
tss = [
bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames
]
metagenes = []
for report, t in zip(reports_filenames, tss):
metagene = bismarkplot.Metagene(report, genome = t,
upstream_windows = 1000,# same number of windows
gene_windows = 1000, # same number of windows
downstream_windows = 0, # !!!
...)
metagenes.append(metagene)
Exon output:
TSS output:
Chromosome levels
BismarkPlot allows user to visualize chromosome methylation levels across full genome
import bismarkplot
chr = bismarkplot.ChrLevels.from_file(
"path/to/CX_report.txt",
window_length=10**5, # window length in bp
batch_size=10**7,
chr_min_length = 10**6, # minimum chr length in bp
)
fig, axes = plt.subplots()
for context in ["CG", "CHG", "CHH"]:
chr.filter(strand="+", context=context).draw(
(fig, axes), # to plot contexts on same axes
smooth=10, # window number for smoothing
label=context # labels for lines
)
fig.savefig(f"chrom.pdf", dpi = 200)
Output for Arabidopsis t.:
Output for Brachypodium d.:
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file bismarkplot-1.3.0b0.tar.gz
.
File metadata
- Download URL: bismarkplot-1.3.0b0.tar.gz
- Upload date:
- Size: 47.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b16af12e7952baa5e24c37ae94c977232003081d31f44f9bfcda082d0b41e8c0 |
|
MD5 | 6b11bf0407ba4c3b12963800c4adef26 |
|
BLAKE2b-256 | b3d173371030cd180387f4bed62a366a05eaeef7c53270410330b0bad7794f13 |
File details
Details for the file bismarkplot-1.3.0b0-py3-none-any.whl
.
File metadata
- Download URL: bismarkplot-1.3.0b0-py3-none-any.whl
- Upload date:
- Size: 44.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a66223b60c481e7e265d42a6a7e8f38ce6e0ae583a62c573b8441a2beebb53b6 |
|
MD5 | a2ef715199a696044219e33922582770 |
|
BLAKE2b-256 | e1d1cc702b24d6b62a06abb92630a94e725ec1faac92e8ea2b43df64f516fe68 |