Reason this release was yanked:

unstable

BismarkPlot

Comprehensive tool for visualizing genome-wide cytosine data.

See the docs: https://shitohana.github.io/BismarkPlot

Right now only coverage2cytosine input is supported, but support for other input types will be added soon.
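For orientation, here is a minimal sketch of reading one line of a Bismark coverage2cytosine (CX) report, which is tab-separated: chromosome, position, strand, methylated count, unmethylated count, context, and trinucleotide context. The names `CytosineRecord` and `parse_cx_line` are hypothetical helpers for illustration only and are not part of bismarkplot.

```python
from typing import NamedTuple, Optional


class CytosineRecord(NamedTuple):
    """One row of a coverage2cytosine report (hypothetical helper type)."""
    chrom: str
    position: int
    strand: str
    count_methylated: int
    count_unmethylated: int
    context: str
    trinucleotide: str

    @property
    def methylation_level(self) -> Optional[float]:
        # Fraction of reads supporting methylation; None if the site is uncovered.
        total = self.count_methylated + self.count_unmethylated
        return self.count_methylated / total if total else None


def parse_cx_line(line: str) -> CytosineRecord:
    """Split one tab-separated report line into typed fields."""
    chrom, pos, strand, meth, unmeth, context, tri = line.rstrip("\n").split("\t")
    return CytosineRecord(chrom, int(pos), strand, int(meth), int(unmeth), context, tri)


record = parse_cx_line("chr1\t1024\t+\t3\t1\tCG\tCGA\n")
print(record.context, record.methylation_level)
```

Sites with zero coverage return `None` here rather than 0, so that uncovered cytosines are not mistaken for unmethylated ones.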

Installation

pip install bismarkplot

Console usage

After installation, bismarkplot can be used either as a Python library or directly from the console.

Console options:

  • bismarkplot-metagene - methylation density visualization tool.
  • bismarkplot-chrs - chromosome methylation level visualization tool.

bismarkplot-metagene

usage: BismarkPlot. [-h] [-o OUT] [-g GENOME] [-r {gene,exon,tss,tes}] [-b BATCH] [-c CORES] [-f FLENGTH] [-u UWINDOWS] [-d DWINDOWS] [-m MLENGTH]
                    [-w GWINDOWS] [--line] [--heatmap] [--box] [--violin] [-S SMOOTH] [-L LABELS [LABELS ...]] [-C CONFIDENCE] [-H H] [-V V] [--dpi DPI]
                    [-F {png,pdf,svg}]
                    filename [filename ...]

Metagene visualizing tool.

positional arguments:
  filename              path to bismark methylation_extractor files

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --out OUT     output base name (default: /Users/shitohana/Desktop/PycharmProjects/BismarkPlot)
  -g GENOME, --genome GENOME
                        path to GFF genome file (default: None)
  -r {gene,exon,tss,tes}, --region {gene,exon,tss,tes}
                        path to GFF genome file (default: gene)
  -b BATCH, --batch BATCH
                        number of rows to be read from bismark file by batch (default: 1000000)
  -c CORES, --cores CORES
                        number of cores to use (default: None)
  -f FLENGTH, --flength FLENGTH
                        length in bp of flank regions (default: 2000)
  -u UWINDOWS, --uwindows UWINDOWS
                        number of windows for upstream (default: 50)
  -d DWINDOWS, --dwindows DWINDOWS
                        number of windows for downstream (default: 50)
  -m MLENGTH, --mlength MLENGTH
                        minimal length in bp of gene (default: 4000)
  -w GWINDOWS, --gwindows GWINDOWS
                        number of windows for genes (default: 100)
  --line                line-plot enabled (default: False)
  --heatmap             heat-map enabled (default: False)
  --box                 box-plot enabled (default: False)
  --violin              violin-plot enabled (default: False)
  -S SMOOTH, --smooth SMOOTH
                        windows for smoothing (default: 10)
  -L LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        labels for plots (default: None)
  -C CONFIDENCE, --confidence CONFIDENCE
                        probability for confidence bands for line-plot. 0 if disabled (default: 0)
  -H H                  vertical resolution for heat-map (default: 100)
  -V V                  vertical resolution for heat-map (default: 100)
  --dpi DPI             dpi of output plot (default: 200)
  -F {png,pdf,svg}, --format {png,pdf,svg}
                        format of output plots (default: pdf)

Example:

bismarkplot-metagene -g path/to/genome.gff -r gene -f 2000 -m 4000 -u 500 -d 500 -w 1000 -b 1000000 --line --heatmap --box --violin --dpi 200 -F pdf -S 50 report1.txt report2.txt report3.txt report4.txt
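To make the window options (-f, -u, -w, -d) more concrete, the sketch below shows one way a cytosine position could be assigned to a window across the upstream flank, gene body, and downstream flank. This is an illustrative assumption about the binning scheme, not bismarkplot's actual implementation; `window_index` is a hypothetical function, and only the plus strand is handled.

```python
def window_index(position, gene_start, gene_end, flank_length=2000,
                 upstream_windows=50, gene_windows=100, downstream_windows=50):
    """Return a 0-based window index for a plus-strand position.

    Intervals are half-open: [gene_start - flank, gene_start) is upstream,
    [gene_start, gene_end) is the gene body, [gene_end, gene_end + flank)
    is downstream. Positions outside all three return None.
    """
    up_start = gene_start - flank_length
    down_end = gene_end + flank_length
    if position < up_start or position >= down_end:
        return None
    if position < gene_start:
        # Upstream flank is split into equal-width windows.
        return (position - up_start) * upstream_windows // flank_length
    if position < gene_end:
        # Gene body is scaled to a fixed number of windows regardless of length.
        offset = (position - gene_start) * gene_windows // (gene_end - gene_start)
        return upstream_windows + offset
    # Downstream flank follows the gene-body windows.
    offset = (position - gene_end) * downstream_windows // flank_length
    return upstream_windows + gene_windows + offset


# A 4000 bp gene with 2000 bp flanks and the CLI default window counts.
print(window_index(12000, gene_start=10000, gene_end=14000))
```

Scaling every gene body to the same number of windows is what lets genes of different lengths be averaged into a single metagene profile.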

Result

bismarkplot-chrs

usage: BismarkPlot [-h] [-o DIR] [-b N] [-c CORES] [-w N] [-m N] [-S FLOAT] [-F {png,pdf,svg}] path/to/txt [path/to/txt ...]

Chromosome methylation levels visualization.

positional arguments:
  path/to/txt           path to bismark methylation_extractor file

options:
  -h, --help            show this help message and exit
  -o DIR, --out DIR     output base name (default: current/path)
  -b N, --batch N       number of rows to be read from bismark file by batch (default: 1000000)
  -c CORES, --cores CORES
                        number of cores to use (default: None)
  -w N, --wlength N     number of windows for genes (default: 100000)
  -m N, --mlength N     minimum chromosome length (default: 1000000)
  -S FLOAT, --smooth FLOAT
                        windows for smoothing (0 - no smoothing, 1 - straight line (default: 50)
  -F {png,pdf,svg}, --format {png,pdf,svg}
                        format of output plots (default: pdf)

Example:

bismarkplot-chrs -b 10000000 -w 10000 -m 1000000 -S 10 -F pdf path/to/CX_report.txt

Result

Python

BismarkPlot provides a wide variety of functions for manipulating cytosine methylation data.

Metagene

Below we will show the basic BismarkPlot workflow.

Single sample

import bismarkplot
# Firstly, we need to read the regions annotation (e.g. reference genome .gff)
genome = bismarkplot.Genome.from_gff("path/to/genome.gff")  
# Next we need to filter regions of interest from the genome
genes = genome.gene_body(min_length=4000, flank_length=2000)

# Now we need to calculate metagene data
metagene = bismarkplot.Metagene.from_file(
    file = "path/to/CX_report.txt",
    genome=genes,                         # filtered regions
    upstream_windows = 500,               
    gene_windows = 1000,
    downstream_windows = 500,
    batch_size= 10**7                     # number of lines to be read simultaneously
)

# Our metagene contains all methylation contexts and both strands, so we need to filter it (as in dplyr)
filtered = metagene.filter(context = "CG", strand = "+")
# We are ready to plot
lp = filtered.line_plot()                 # line plot data
lp.draw().savefig("path/to/lp.pdf")       # matplotlib.Figure

hm = filtered.heat_map(ncol=200, nrow=200)
hm.draw().savefig("path/to/hm.pdf")       # matplotlib.Figure

Output:

Smoothing the line plot

Smoothing is very useful when the input signal is very weak (e.g. mammalian non-CpG contexts).

# mouse CHG methylation example
filtered = metagene.filter(context = "CHG", strand = "+")
lp = filtered.line_plot()                                 # recompute line plot data for the new filter
lp.draw(smooth = 0).savefig("path/to/lp_raw.pdf")         # no smoothing
lp.draw(smooth = 50).savefig("path/to/lp_smooth.pdf")     # smoothed with window length = 50
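As a rough illustration of what smoothing does to a noisy per-window signal, here is a centered moving average in plain Python. This is a generic technique shown under the assumption that any windowed smoother behaves similarly; bismarkplot's own `smooth` parameter may use a different filter internally, and `moving_average` is a hypothetical helper.

```python
def moving_average(values, window):
    """Centered moving average; the span shrinks at the edges.

    Each output point averages up to 2 * (window // 2) + 1 neighbours.
    A window of 0 or 1 returns the input unchanged (no smoothing).
    """
    if window <= 1:
        return list(values)
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# A single spike is spread over its neighbours.
print(moving_average([0, 0, 9, 0, 0], window=3))
```

Larger windows suppress more noise but also flatten real features such as the dip near the transcription start site, so the window should stay small relative to the total number of windows.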

Output:

Multiple samples, same species

# We can initialize genome like in previous example

filenames = ["report1.txt", "report2.txt", "report3.txt", "report4.txt"]
metagenes = bismarkplot.MetageneFiles.from_list(filenames, labels = ["1", "2", "3", "4"], ...)  # rest of params from previous example

# Our metagenes contain all methylation contexts and both strands, so we need to filter them (as in dplyr)
filtered = metagenes.filter(context = "CG", strand = "+")

# Now we can draw line-plot or heatmap like in previous example, or plot distribution statistics as shown below
trimmed = filtered.trim_flank()           # we want to analyze only gene bodies
trimmed.box_plot(showfliers=False).savefig(...)
trimmed.violin_plot().savefig(...)

# If the samples are technical replicates, we can merge them into a single DataFrame and analyze them as one
merged = filtered.merge()

Output:

Multiple samples, multiple species

# To analyze samples with different reference genomes, we need to initialize several Genome instances
genome_filenames = ["arabidopsis.gff", "brachypodium.gff", "cucumis.gff", "mus.gff"]
reports_filenames = ["arabidopsis.txt", "brachypodium.txt", "cucumis.txt", "mus.txt"]

genomes = [
    bismarkplot.Genome.from_gff(file).gene_body(...) for file in genome_filenames
]

# Now we read reports
metagenes = []
for report, genome in zip(reports_filenames, genomes):
    metagene = bismarkplot.Metagene.from_file(report, genome = genome, ...)
    metagenes.append(metagene)

# Initialize MetageneFiles
labels = ["A. thaliana", "B. distachyon", "C. sativus", "M. musculus"]
metagenes = bismarkplot.MetageneFiles(metagenes, labels)
# Now we can plot them like in previous example

Output:

Different regions

Other genomic regions from the .gff file can be analyzed as well, using the .exon, .near_tss, or .near_tes methods of bismarkplot.Genome.

exons = [
    bismarkplot.Genome.from_gff(file).exon(min_length=100) for file in genome_filenames
]
metagenes = []
for report, exon in zip(reports_filenames, exons):
    metagene = bismarkplot.Metagene.from_file(
        report, genome = exon,
        upstream_windows = 0,       # !!!
        downstream_windows = 0,     # !!!
        ...)
    metagenes.append(metagene)
# OR
tss = [
    bismarkplot.Genome.from_gff(file).near_tss(min_length = 2000, flank_length = 2000) for file in genome_filenames
]
metagenes = []
for report, t in zip(reports_filenames, tss):
    metagene = bismarkplot.Metagene.from_file(
        report, genome = t,
        upstream_windows = 1000,    # same number of windows
        gene_windows = 1000,        # same number of windows
        downstream_windows = 0,     # !!!
        ...)
    metagenes.append(metagene)

Exon output:

TSS output:

Chromosome levels

BismarkPlot also lets you visualize methylation levels along each chromosome across the whole genome.

import bismarkplot
import matplotlib.pyplot as plt

chr_levels = bismarkplot.ChrLevels.from_file(
    "path/to/CX_report.txt",
    window_length=10**5,                  # window length in bp
    batch_size=10**7,                     # number of lines to be read simultaneously
    chr_min_length=10**6,                 # minimum chr length in bp
)
fig, axes = plt.subplots()

for context in ["CG", "CHG", "CHH"]:
    chr_levels.filter(strand="+", context=context).draw(
        (fig, axes),                      # to plot contexts on same axes
        smooth=10,                        # window number for smoothing
        label=context                     # labels for lines
    )

fig.savefig("chrom.pdf", dpi=200)
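The idea behind window_length can be sketched as follows: cytosines are grouped into fixed-width bins along each chromosome, and a methylation level is computed per bin. The code below is a simplified assumption (read-weighted level, no strand or context handling), not the library's implementation; `window_methylation` is a hypothetical helper.

```python
from collections import defaultdict


def window_methylation(records, window_length):
    """Aggregate methylation per fixed-width window.

    records: iterable of (chrom, position, count_methylated, count_unmethylated).
    Returns {(chrom, window_number): methylated_reads / total_reads},
    skipping windows with no coverage.
    """
    totals = defaultdict(lambda: [0, 0])  # [methylated reads, all reads]
    for chrom, position, meth, unmeth in records:
        key = (chrom, position // window_length)
        totals[key][0] += meth
        totals[key][1] += meth + unmeth
    return {key: m / t for key, (m, t) in sorted(totals.items()) if t > 0}


records = [
    ("chr1", 10, 3, 1),
    ("chr1", 90, 1, 3),
    ("chr1", 150, 2, 0),
    ("chr2", 5, 0, 0),   # uncovered site, dropped from the result
]
print(window_methylation(records, window_length=100))
```

Weighting by read count, as done here, favours well-covered sites; averaging per-cytosine ratios instead would weight every site equally, and which of the two bismarkplot uses is not stated in this document.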

Output for Arabidopsis t.:

Output for Brachypodium d.:
