BismarkPlot

Comprehensive tool for visualizing genome-wide cytosine data.

See the docs: https://shitohana.github.io/BismarkPlot

Right now only coverage2cytosine input is supported, but support for other input types will be added soon.
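For orientation, a coverage2cytosine report (the `CX_report.txt` files used throughout this page) is a tab-separated text file with seven columns: chromosome, position, strand, methylated read count, unmethylated read count, cytosine context and trinucleotide context. The sketch below parses one such line with the standard library only. The names `CytosineRecord`, `parse_cx_line` and `methylation_level` are hypothetical helpers for illustration and are not part of the BismarkPlot API.

```python
from typing import NamedTuple, Optional


class CytosineRecord(NamedTuple):
    chrom: str
    position: int        # 1-based coordinate
    strand: str          # "+" or "-"
    methylated: int      # reads supporting methylation
    unmethylated: int    # reads supporting no methylation
    context: str         # "CG", "CHG" or "CHH"
    trinucleotide: str


def parse_cx_line(line: str) -> CytosineRecord:
    """Parse one tab-separated line of a coverage2cytosine report."""
    chrom, pos, strand, meth, unmeth, context, tri = line.rstrip("\n").split("\t")
    return CytosineRecord(chrom, int(pos), strand, int(meth), int(unmeth), context, tri)


def methylation_level(record: CytosineRecord) -> Optional[float]:
    """Fraction of methylated reads, or None when the cytosine has no coverage."""
    total = record.methylated + record.unmethylated
    return record.methylated / total if total else None


record = parse_cx_line("chr1\t1008\t+\t3\t1\tCG\tCGA\n")
print(record.context, methylation_level(record))
```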

Installation

pip install bismarkplot

Console usage

After installing it, you can use bismarkplot either as a Python library or directly from the console.

Console options:

  • bismarkplot-metagene - methylation density visualization tool.
  • bismarkplot-chrs - chromosome methylation level visualization tool.

bismarkplot-metagene

usage: BismarkPlot. [-h] [-o OUT] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH]
                    [-w GWINDOWS] [--line] [--heatmap] [--box] [--violin] [-S SMOOTH] [-L LABELS [LABELS ...]] [-C CONFIDENCE] [-H H] [-V V] [--dpi DPI]
                    [-F {png,pdf,svg}]
                    filename [filename ...]

Metagene visualizing tool.

positional arguments:
  filename              path to bismark methylation_extractor files

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     output base name (default: /Users/shitohana/Desktop/PycharmProjects/BismarkPlot)
  -g GENOME, --genome GENOME
                        path to GFF genome file (default: None)
  -r {gene,exon,tss,tes}, --region {gene,exon,tss,tes}
                        genomic region type to analyze (default: gene)
  -b BATCH, --batch BATCH
                        number of rows to be read from bismark file by batch (default: 1000000)
  -c CORES, --cores CORES
                        number of cores to use (default: None)
  -f FLENGTH, --flength FLENGTH
                        length in bp of flank regions (default: 2000)
  -u UWINDOWS, --uwindows UWINDOWS
                        number of windows for upstream (default: 50)
  -d DWINDOWS, --dwindows DWINDOWS
                        number of windows for downstream (default: 50)
  -m MLENGTH, --mlength MLENGTH
                        minimal length in bp of gene (default: 4000)
  -w GWINDOWS, --gwindows GWINDOWS
                        number of windows for genes (default: 100)
  --line                line-plot enabled (default: False)
  --heatmap             heat-map enabled (default: False)
  --box                 box-plot enabled (default: False)
  --violin              violin-plot enabled (default: False)
  -S SMOOTH, --smooth SMOOTH
                        windows for smoothing (default: 10)
  -L LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        labels for plots (default: None)
  -C CONFIDENCE, --confidence CONFIDENCE
                        probability for confidence bands for line-plot. 0 if disabled (default: 0)
  -H H                  horizontal resolution for heat-map (default: 100)
  -V V                  vertical resolution for heat-map (default: 100)
  --dpi DPI             dpi of output plot (default: 200)
  -F {png,pdf,svg}, --format {png,pdf,svg}
                        format of output plots (default: pdf)
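The window options above (`-f`, `-u`, `-w`, `-d`) can be pictured as splitting each region into an upstream flank, a body and a downstream flank, each divided into a fixed number of windows. The sketch below shows one way to map a genomic position to such a window in transcription order. It is an illustration under stated assumptions (half-open `[start, end)` coordinates, equal-width windows), and `window_index` is a hypothetical helper, not BismarkPlot's own implementation.

```python
from typing import Optional


def window_index(pos: int, start: int, end: int, strand: str,
                 flank: int = 2000, uwindows: int = 50,
                 gwindows: int = 100, dwindows: int = 50) -> Optional[int]:
    """Map a position to a metagene window, or None if it lies outside the region.

    The gene occupies the half-open interval [start, end). Windows are numbered
    0 .. uwindows + gwindows + dwindows - 1 in transcription order.
    """
    length = end - start
    # Distance from the 5' end of the gene, measured along the gene's strand.
    offset = pos - start if strand == "+" else end - 1 - pos
    if -flank <= offset < 0:                    # upstream flank
        return (offset + flank) * uwindows // flank
    if 0 <= offset < length:                    # gene body
        return uwindows + offset * gwindows // length
    if length <= offset < length + flank:       # downstream flank
        return uwindows + gwindows + (offset - length) * dwindows // flank
    return None


# A 4000 bp gene on the forward strand with the default 2000 bp flanks.
print(window_index(8000, 10000, 14000, "+"))
print(window_index(13999, 10000, 14000, "+"))
```

Measuring the offset along the strand means that genes on the reverse strand are flipped before averaging, so the upstream flank always ends up on the left of the plot.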

Example:

bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000 -u 500 -d 500 -w 1000 -b 1000000 --line --heatmap --box --violin --dpi 200 -F pdf -S 50 report1.txt report2.txt report3.txt report4.txt

Result

bismarkplot-chrs

usage: BismarkPlot [-h] [-o DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] path/to/txt [path/to/txt ...]

Chromosome methylation levels visualization.

positional arguments:
  path/to/txt           path to bismark methylation_extractor file

options:
  -h, --help            show this help message and exit
  -o DIR, --out DIR     output base name (default: current/path)
  -b N, --batch N       number of rows to be read from bismark file by batch (default: 1000000)
  -c CORES, --cores CORES
                        number of cores to use (default: None)
  -w N, --wlength N     window length in bp (default: 100000)
  -m N, --mlength N     minimum chromosome length (default: 1000000)
  -S FLOAT, --smooth FLOAT
                        windows for smoothing (0 - no smoothing, 1 - straight line) (default: 50)
  -F {png,pdf,svg}, --format {png,pdf,svg}
                        format of output plots (default: pdf)
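To illustrate what a chromosome-level profile computes, the sketch below bins cytosines into fixed-length windows and reports the fraction of methylated reads per window. It is a simplified illustration, not the tool's implementation: `chromosome_levels` is a hypothetical helper, and pooling read counts within a window (rather than averaging per-cytosine levels) is an assumption made here.

```python
from collections import defaultdict


def chromosome_levels(records, window_length=100_000):
    """Average methylation per fixed-length window.

    `records` yields (chrom, position, methylated, unmethylated) tuples.
    Returns {chrom: {window_start: methylated reads / total reads}}.
    """
    counts = defaultdict(lambda: [0, 0])  # (chrom, window) -> [methylated, total]
    for chrom, pos, meth, unmeth in records:
        key = (chrom, (pos - 1) // window_length)  # positions are 1-based
        counts[key][0] += meth
        counts[key][1] += meth + unmeth

    levels = defaultdict(dict)
    for (chrom, window), (meth, total) in sorted(counts.items()):
        if total:  # skip windows without coverage
            levels[chrom][window * window_length] = meth / total
    return dict(levels)


demo = [("chr1", 10, 3, 1), ("chr1", 90_000, 1, 3), ("chr1", 150_000, 0, 4)]
print(chromosome_levels(demo))
```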

Example:

bismarkplot-chrs -b 10000000 -w 10000 -m 1000000 -S 10 -F pdf path/to/CX_report.txt

Result

Python

BismarkPlot provides a wide variety of functions for manipulating cytosine methylation data.

Metagene

Below we will show the basic BismarkPlot workflow.

Single sample

import bismarkplot
# First, we need to read the region annotation (e.g. a reference genome .gff)
genome = bismarkplot.Genome.from_gff("path/to/genome.gff")
# Next, we need to select the regions of interest from the genome
genes = genome.gene_body(min_length=4000, flank_length=2000)

# Now we need to calculate metagene data
metagene = bismarkplot.Metagene.from_file(
    file = "path/to/CX_report.txt",
    genome=genes,                         # filtered regions
    upstream_windows = 500,               
    gene_windows = 1000,
    downstream_windows = 500,
    batch_size= 10**7                     # number of lines to be read simultaneously
)

# Our metagene contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagene.filter(context = "CG", strand = "+")
# We are ready to plot
lp = filtered.line_plot()                 # line plot data
lp.draw().savefig("path/to/lp.pdf")       # matplotlib.Figure

hm = filtered.heat_map(ncol=200, nrow=200)
hm.draw().savefig("path/to/hm.pdf")       # matplotlib.Figure
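The annotation step in the workflow above selects gene records from a GFF file, drops short genes and adds flanking regions. A rough standard-library sketch of that idea follows. GFF files have nine tab-separated columns with 1-based inclusive coordinates; the function `gene_bodies` and its return layout are hypothetical and only mirror the `min_length` and `flank_length` parameters shown above, not the internals of `Genome.gene_body`.

```python
def gene_bodies(gff_lines, min_length=4000, flank_length=2000):
    """Yield (chrom, strand, flank_start, start, end, flank_end) for long-enough genes."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep only well-formed gene records
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if end - start + 1 < min_length:  # GFF coordinates are 1-based, inclusive
            continue
        # Clamp the left flank so it never runs past the chromosome start.
        yield chrom, strand, max(1, start - flank_length), start, end, end + flank_length


gff = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t5000\t9999\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tgene\t20000\t21000\t.\t-\t.\tID=g2\n",   # too short, filtered out
    "chr1\tsrc\texon\t5000\t9999\t.\t+\t.\tParent=g1\n",  # not a gene record
]
print(list(gene_bodies(gff)))
```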

Output:

Smoothing the line plot

Smoothing is useful when the input signal is weak (e.g. mammalian non-CpG contexts).

# mouse CHG methylation example
filtered = metagene.filter(context = "CHG", strand = "+")
lp = filtered.line_plot()                                  # recompute line plot data for the CHG subset
lp.draw(smooth = 0).savefig("path/to/lp_raw.pdf")          # no smoothing
lp.draw(smooth = 50).savefig("path/to/lp_smooth.pdf")      # smoothed with window length = 50
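To show why smoothing helps with a weak signal, here is a minimal sketch of a centered moving average over per-window methylation values. This is only an illustration of the general idea: the filter that `draw(smooth=...)` actually applies may be different, and `moving_average` is a hypothetical helper.

```python
def moving_average(values, window=50):
    """Centered moving average; the averaging span shrinks near the edges."""
    if window <= 1:
        return list(values)  # nothing to smooth
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


noisy = [0, 0, 3, 0, 0]
print(moving_average(noisy, window=3))
```

A single spike in the noisy input is spread over its neighbours, which is the effect that makes sparse non-CpG profiles easier to read.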

Output:

Multiple samples, same species

# We can initialize the genome as in the previous example

filenames = ["report1.txt", "report2.txt", "report3.txt", "report4.txt"]
metagenes = bismarkplot.MetageneFiles.from_list(filenames, labels = ["1", "2", "3", "4"], ...)  # rest of params from previous example

# Our metagenes contain all methylation contexts and both strands, so we need to filter them (as in dplyr)
filtered = metagenes.filter(context = "CG", strand = "+")

# Now we can draw a line plot or heat map as in the previous example, or plot distribution statistics as shown below
trimmed = filtered.trim_flank()           # we want to analyze only gene bodies
trimmed.box_plot(showfliers=False).savefig(...)
trimmed.violin_plot().savefig(...)

# If the samples are technical replicates, we can merge them into a single DataFrame and analyze them as one
merged = filtered.merge()
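One plausible way to think about merging technical replicates is to pool their read counts per window before computing methylation levels. The sketch below does exactly that on plain dictionaries. Whether `merge()` pools counts this way is an assumption; `pool_replicates` is a hypothetical helper for illustration only.

```python
def pool_replicates(replicates):
    """Pool replicates given as {window: (methylated_reads, total_reads)} dicts.

    Returns {window: pooled methylation level}, skipping uncovered windows.
    """
    pooled = {}
    for sample in replicates:
        for window, (meth, total) in sample.items():
            m, t = pooled.get(window, (0, 0))
            pooled[window] = (m + meth, t + total)
    return {w: m / t for w, (m, t) in sorted(pooled.items()) if t}


rep1 = {0: (2, 10), 1: (5, 10)}
rep2 = {0: (4, 10), 1: (0, 0)}   # window 1 has no coverage in this replicate
print(pool_replicates([rep1, rep2]))
```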

Output:

Multiple samples, multiple species

# To analyze samples with different reference genomes, we need to initialize several Genome instances
genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"]
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"]

genomes = [
    bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames
]

# Now we read reports
metagenes = []
for report, genome in zip(reports_filenames, genomes):
    metagene = bismarkplot.Metagene.from_file(report, genome = genome, ...)
    metagenes.append(metagene)

# Initialize MetageneFiles
labels = ["A. thaliana", "B. distachyon", "C. sativus", "M. musculus"]
metagenes = bismarkplot.MetageneFiles(metagenes, labels)
# Now we can plot them as in the previous example

Output:

Different regions

Other genomic regions from the .gff file can also be analyzed using the .exon, .near_tss or .near_tes methods of bismarkplot.Genome.

exons = [
    bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
]
metagenes = []
for report, exon in zip(reports_filenames, exons):
    metagene = bismarkplot.Metagene.from_file(report, genome = exon,
                                              upstream_windows = 0,   # !!!
                                              downstream_windows = 0, # !!!
                                              ...)
    metagenes.append(metagene)
# OR
tss = [
    bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames
]
metagenes = []
for report, t in zip(reports_filenames, tss):
    metagene = bismarkplot.Metagene.from_file(report, genome = t,
                                              upstream_windows = 1000,# same number of windows
                                              gene_windows = 1000,    # same number of windows
                                              downstream_windows = 0, # !!!
                                              ...)
    metagenes.append(metagene)
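The TSS example above uses the same number of windows on both sides, which suggests a region of `flank_length` bp on each side of the transcription start site. A minimal sketch of how such a region could be derived from gene coordinates is given below. It assumes the TSS is the gene start on the forward strand and the gene end on the reverse strand; `tss_region` is a hypothetical helper, and the exact coordinates produced by `near_tss` may differ.

```python
def tss_region(start, end, strand, flank_length=2000):
    """Return (region_start, tss, region_end) spanning flank_length bp on each side of the TSS."""
    # The TSS is the 5' end of the gene, which depends on the strand.
    tss = start if strand == "+" else end
    return max(1, tss - flank_length), tss, tss + flank_length


print(tss_region(10000, 15000, "+"))
print(tss_region(10000, 15000, "-"))
```

Note that on the reverse strand the upstream side is the one with higher coordinates, so the region has to be flipped before averaging across genes.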

Exon output:

TSS output:

Chromosome levels

BismarkPlot allows the user to visualize chromosome methylation levels across the full genome.

import bismarkplot
import matplotlib.pyplot as plt

levels = bismarkplot.ChrLevels.from_file(
    "path/to/CX_report.txt",
    window_length=10**5,                  # window length in bp
    batch_size=10**7,                     # number of lines read per batch
    chr_min_length=10**6,                 # minimum chr length in bp
)
fig, axes = plt.subplots()

for context in ["CG", "CHG", "CHH"]:
    levels.filter(strand="+", context=context).draw(
        (fig, axes),                      # to plot contexts on same axes
        smooth=10,                        # window number for smoothing
        label=context                     # labels for lines
    )

fig.savefig("chrom.pdf", dpi=200)

Output for Arabidopsis t.:

Output for Brachypodium d.:
