Skip to main content

Analytical framework for BS-seq data comparison and visualization

Project description

BismarkPlot

Analytical framework for BS-seq data comparison and visualization.

See the docs: https://shitohana.github.io/BismarkPlot

Right now only coverage2cytosine input is supported, but support for other input types will be added soon.

Installation

pip install bismarkplot

Console usage

You can use bismarkplot either as python library or directly from console after installing it.

Console options:

  • bismarkplot-metagene - methylation density visualizing tool.
  • bismarkplot-chrs - chromosome methylation levels visualizing tool.

bismarkplot-metagene

usage: BismarkPlot. [-h] [-o NAME] [--dir DIR] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH] [-w GWINDOWS] [--line] [--heatmap]
                    [--box] [--violin] [-S SMOOTH] [-L LABELS [LABELS ...]] [-C CONFIDENCE] [-H VRESOLUTION] [-V HRESOLUTION] [--dpi DPI] [-F {png,pdf,svg}]
                    filename [filename ...]

Metagene visualizing tool.

positional arguments:
  filename              path to bismark methylation_extractor files

optional arguments:
  -h, --help            show this help message and exit
  -o NAME, --out NAME   output base name (default: plot)
  --dir DIR             output dir (default: /Users/shitohana/PycharmProjects/BismarkPlot_test)
  -g GENOME, --genome GENOME
                        path to GFF genome file (default: None)
  -r {gene,exon,tss,tes}, --region {gene,exon,tss,tes}
                        path to GFF genome file (default: gene)
  -b BATCH, --batch BATCH
                        number of rows to be read from bismark file by batch (default: 1000000)
  -c CORES, --cores CORES
                        number of cores to use (default: None)
  -f FLENGTH, --flength FLENGTH
                        length in bp of flank regions (default: 2000)
  -u UWINDOWS, --uwindows UWINDOWS
                        number of windows for upstream (default: 50)
  -d DWINDOWS, --dwindows DWINDOWS
                        number of windows for downstream (default: 50)
  -m MLENGTH, --mlength MLENGTH
                        minimal length in bp of gene (default: 4000)
  -w GWINDOWS, --gwindows GWINDOWS
                        number of windows for genes (default: 100)
  --line                line-plot enabled (default: False)
  --heatmap             heat-map enabled (default: False)
  --box                 box-plot enabled (default: False)
  --violin              violin-plot enabled (default: False)
  -S SMOOTH, --smooth SMOOTH
                        windows for smoothing (default: 10)
  -L LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        labels for plots (default: None)
  -C CONFIDENCE, --confidence CONFIDENCE
                        probability for confidence bands for line-plot. 0 if disabled (default: 0)
  -H VRESOLUTION        vertical resolution for heat-map (default: 100)
  -V HRESOLUTION        vertical resolution for heat-map (default: 100)
  --dpi DPI             dpi of output plot (default: 200)
  -F {png,pdf,svg}, --format {png,pdf,svg}
                        format of output plots (default: pdf)

Example:

bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000  -u 500 -d 500 -w 1000 -b 1000000 --line --heatmap --box --violin --dpi 200 -f pdf -S 50 report1.txt report2.txt report3.txt report4.txt 

Result

bismarkplot-chrs

usage: BismarkPlot [-h] [-o NAME] [-d DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] [--dpi DPI] path/to/txt

Chromosome methylation levels visualization.

positional arguments:
  path/to/txt           path to bismark methylation_extractor file

optional arguments:
  -h, --help            show this help message and exit
  -o NAME, --out NAME   output base name (default: plot)
  -d DIR, --dir DIR     output dir (default: /Users/shitohana/PycharmProjects/BismarkPlot_test)
  -b N, --batch N       number of rows to be read from bismark file by batch (default: 1000000)
  -c CORES, --cores CORES
                        number of cores to use (default: None)
  -w N, --wlength N     number of windows for chromosome (default: 100000)
  -m N, --mlength N     minimum chromosome length (default: 1000000)
  -S FLOAT, --smooth FLOAT
                        windows for smoothing (0 - no smoothing, 1 - straight line (default: 50)
  -F {png,pdf,svg}, --fmt {png,pdf,svg}
                        format of output plots (default: pdf)
  --dpi DPI             dpi of output plot (default: 200)

Example:

bismarkplot-chrs -b 10000000 -w 10000 -m 1000000 -s 10 -f pdf path/to/CX_report.txt

Result

Python

BismarkPlot provides a large variety of function for manipulating with cytosine methylation data.

Metagene

Below we will show the basic BismarkPlot workflow.

Single sample

import bismarkplot
# Firstly, we need to read the regions annotation (e.g. reference genome .gff)
genome = bismarkplot.Genome.from_gff("path/to/genome.gff")  
# Next we need to filter regions of interest from the genome
genes = genome.gene_body(min_length=4000, flank_length=2000)

# Now we need to calculate metagene data
metagene = bismarkplot.Metagene.from_file(
    file = "path/to/CX_report.txt",
    genome=genes,                         # filtered regions
    upstream_windows = 500,               
    gene_windows = 1000,
    downstream_windows = 500,
    batch_size= 10**7                     # number of lines to be read simultaneously
)

# Our metagene contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagene.filter(context = "CG", strand = "+")
# We are ready to plot
lp = filtered.line_plot()                 # line plot data
lp.draw().savefig("path/to/lp.pdf")       # matplotlib.Figure

hm = filtered.heat_map(ncol=200, nrow=200)
hm.draw().savefig("path/to/hm.pdf")       # matplotlib.Figure

Output for Brachypodium distachyon:

If metagene is not filtered by context, all available contexts will be plotted:

filtered_by_strand = metagene.filter(strand == "+")
lp = filtered_by_strand.line_plot()
lp.draw()

Output for Brachypodium distachyon:

Confidence bands can be visualized via setting the confidence parameter in LinePlot.draw()

lp.draw(confidence=.95)

Output for Brachypodium distachyon:

Heat-map clusterisation

Genes can be clustered to minimize distances between them before plotting heat-map. This can be useful for capturing overall methylation patterns in sample. This operation is very time consuming. It is advised to set small number of windows (< 50).

metagene = bismarkplot.Metagene.from_file(
    file = "path/to/CX_report.txt",
    genome=genes,                         # filtered regions
    upstream_windows = 5, gene_windows = 10, downstream_windows = 5,
)
clustered = metagene.clustering(
    count_threshold=5,                    # Minimum counts per window
    dist_method="euclidean",              # See scipy.spatial.distance.pdist
    clust_method="average"                # See scipy.cluster.hierarchy.linkage
)

# Heatmap with optimized distances between genes will be drawn
clustered.draw().savefig("path/to/clustered_hm.pdf")

Output for Brachypodium distachyon - CHG

Genes dynamicTreeCut

To shrink clustered heat-map and capture main patterns genes can be split into modules using dynamicTreeCut algorithm by Peter Langfelder and Bin Zhang. Then genes can be plotted as heat-map as previous example:

# Parameters are the same as for cutreeHybrid (see dynamicTreeCut)
modules = clustered.modules(deepSplit = 1)

modules.draw().savefig("path/to/modules_hm.pdf")

Output for Brachypodium distachyon - CHG

Smoothing the line plot

Smoothing is very useful, when input signal is very weak (e.g. mammalian non-CpG contexts)

# mouse CHG methylation example
filtered = metagene.filter(context = "CHG", strand = "+")
lp.draw(smooth = 0).savefig("path/to/lp.pdf")       # no smooth
lp.draw(smooth = 50).savefig("path/to/lp.pdf")      # smoothed with window length = 50

Output for Mus musculus:

Multiple samples, same specie

# We can initialize genome like in previous example

filenames = ["report1.txt", "report2.txt", "report3.txt", "report4.txt"]
metagenes = bismarkplot.MetageneFiles.from_list(filenames, labels = ["1", "2", "3", "4"], ...)  # rest of params from previous example

# Our metagenes contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagenes.filter(context = "CG", strand = "+")

# Now we can draw line-plot or heatmap like in previous example, or plot distribution statistics as shown below
trimmed = filtered.trim_flank()           # we want to analyze only gene bodies
trimmed.box_plot(showfliers=False).savefig(...)
trimmed.violin_plot().savefig(...)

# If data is technical replicates we can merge them into single DataFrame and analyze as one
merged = filtered.merge()

Output for Brachypodium distachyon:

Multiple samples, multiple species

# For analyzing samples with different reference genomes, we need to initialize several genomes instances
genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"]
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"]

genomes = [
    bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames
]

# Now we read reports
metagenes = []
for report, genome in zip(reports_filenames, genomes):
    metagene = bismarkplot.Metagene(report, genome = genome, ...)
    metagenes.append(metagene)

# Initialize MetageneFiles
labels = ["A. thaliana", "B. distachyon", "C. sativus", "M. musculus"]
metagenes = Bismarkplot.MetageneFiles(metagenes, labels)
# Now we can plot them like in previous example

Output:

Different regions

Other genomic regions from .gff can be analyzed too with .exon or .near_tss/.near_tes option for bismarkplot.Genome

exons = [
    bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
]
metagenes = []
for report, exon in zip(reports_filenames, exons):
    metagene = bismarkplot.Metagene(report, genome = exon, 
                                    upstream_windows = 0,   # !!!
                                    downstream_windows = 0, # !!!
                                    ...)
    metagenes.append(metagene)
# OR
tss = [
    bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames
]
metagenes = []
for report, t in zip(reports_filenames, tss):
    metagene = bismarkplot.Metagene(report, genome = t, 
                                    upstream_windows = 1000,# same number of windows
                                    gene_windows = 1000,    # same number of windows
                                    downstream_windows = 0, # !!!
                                    ...)
    metagenes.append(metagene)

Exon output:

TSS output:

Chromosome levels

BismarkPlot allows user to visualize chromosome methylation levels across full genome

import bismarkplot
chr = bismarkplot.ChrLevels.from_file(
    "path/to/CX_report.txt",
    window_length=10**5,                  # window length in bp
    batch_size=10**7,                     
    chr_min_length = 10**6,               # minimum chr length in bp
)
fig, axes = plt.subplots()

for context in ["CG", "CHG", "CHH"]:
     chr.filter(strand="+", context=context).draw(
         (fig, axes),                     # to plot contexts on same axes
         smooth=10,                       # window number for smoothing
         label=context                    # labels for lines
     )

fig.savefig(f"chrom.pdf", dpi = 200)

Output for Arabidopsis thaliana:

Output for Brachypodium distachyon:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bismarkplot-1.3.2.post3.tar.gz (50.3 kB view details)

Uploaded Source

Built Distribution

bismarkplot-1.3.2.post3-py3-none-any.whl (46.2 kB view details)

Uploaded Python 3

File details

Details for the file bismarkplot-1.3.2.post3.tar.gz.

File metadata

  • Download URL: bismarkplot-1.3.2.post3.tar.gz
  • Upload date:
  • Size: 50.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for bismarkplot-1.3.2.post3.tar.gz
Algorithm Hash digest
SHA256 00219124d8e18602ead931571c2416f7ab1d78bf4c833a88bd9d9e2e46253fd0
MD5 b04552292b613bf3ae24ea2195367348
BLAKE2b-256 d535fe6a05c530785702a5f7f7774fd2fe9c9e5f678c1c5d3edb8f6d67d1daba

See more details on using hashes here.

File details

Details for the file bismarkplot-1.3.2.post3-py3-none-any.whl.

File metadata

File hashes

Hashes for bismarkplot-1.3.2.post3-py3-none-any.whl
Algorithm Hash digest
SHA256 56855b397cf44a2e0428a57f35c4d73877a1862ffc9886357bcbc001e98bdb9f
MD5 c08c4783bc86cf12519e36bbc7b95f59
BLAKE2b-256 89a9c8259ac80d19d0cace722963736f4e444f9fff681411201d44f97f22d734

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page