
BAM coverage plot


pycoverplot


change notes:

v0.3:

MAJOR (from a user request): pycoverplot now computes the total number of reads used for normalisation differently. By default it uses the information in the .bai file, which corresponds to the total number of mapped reads. Note that this counts multi-mapping reads multiple times. The previous option, parsing the STAR log output, is still available with the --starlog flag. The older option to pass read numbers directly has been removed.

MINOR: pycoverplot no longer hangs when the Rust backend raises an error (communication between Python and Rust has been fixed). These errors are mostly caused by wrong coordinates being passed, for example a chromosome name that is not found in the BAM.

Fast read-coverage plots from BAM files, straight to publication-ready figures.

12 BAM files (GEO GSE216294), a 2.24 Mb gene compressed to a readable 14 kb view, replicate-averaged across 4 groups: plotted in ~4 seconds (6 CPUs, HPC).

pycoverplot reads BAM files directly through a Rust backend. No bigWig intermediate, no separate normalization step, no shell pipeline. Built for RNA-seq but works on any aligned data.

Why pycoverplot

  • Direct BAM → plot. Skip the bamCoverage → bigWig → pyGenomeTracks pipeline entirely.
  • Replicate-aware. Group BAMs by condition, average automatically, render variance as a confidence band.
  • Intron compression. Rescale long introns to a fixed fraction of the plot width so short 5′ exons stay readable in megabase-scale genes.
  • Fast by design. Parallel BAM reading via Rust, optional GTF index caching for repeated runs.
  • Sensible defaults. RPM-normalized out of the box (from the .bai index by default, or from STAR logs with --starlog). Strand-aware. CLI and Python API.
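
To make the intron-compression idea concrete, here is a minimal sketch of one way to cap introns at a fixed fraction of the plot width. It is not pycoverplot's actual implementation: the function name `compress_introns` and the `(kind, length)` segment representation are assumptions made for illustration.

```python
def compress_introns(segments, intron_prop=0.3):
    """Return the fraction of the x-axis given to each segment.

    segments: list of (kind, length_bp) tuples, where kind is "exon" or "intron".
    Introns are shrunk only when they would take more than intron_prop of the axis.
    """
    exon_bp = sum(n for kind, n in segments if kind == "exon")
    intron_bp = sum(n for kind, n in segments if kind != "exon")
    total_bp = exon_bp + intron_bp
    if total_bp == 0:
        return [0.0 for _ in segments]

    # Introns keep their natural share unless it exceeds the cap.
    if exon_bp == 0:
        intron_share = 1.0
    else:
        intron_share = min(intron_prop, intron_bp / total_bp)
    exon_share = 1.0 - intron_share

    widths = []
    for kind, n in segments:
        if kind == "exon":
            share, pool_bp = exon_share, exon_bp
        else:
            share, pool_bp = intron_share, intron_bp
        # Within each pool, segments stay proportional to their length.
        widths.append(share * n / pool_bp if pool_bp else 0.0)
    return widths


# Two 200 bp exons around a 9.6 kb intron: the intron is capped at 30 %.
widths = compress_introns([("exon", 200), ("intron", 9600), ("exon", 200)])
```

With the default cap, each exon in this example gets 35 % of the axis instead of 2 %, which is why short exons stay readable.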

Early-stage software — the API may change between versions. Pin to a commit hash if you need reproducibility.

A note on GTF parsing

Parsing a full Ensembl or GENCODE GTF is the slowest step in most coverage workflows. pycoverplot can build a sidecar index (mygtf.gtf.pbi + mygtf.gtf.pbi.bi) and use it on every subsequent run, so repeated plots against the same annotation are effectively free at the GTF stage. The index is auto-detected; no extra flag is required.

Security note: GTF index files are pickled Python objects. Only use index files you generated yourself or trust the source of — pickle files can execute arbitrary code when loaded.
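
The general pattern behind a pickled sidecar index can be sketched as follows. This is an illustration only, not pycoverplot's index format: the helper names, the `.idx.pkl` suffix, and the choice to store only gene spans are all assumptions.

```python
import pickle
from pathlib import Path


def parse_gene_spans(gtf_path):
    """Collect (chrom, strand, start, end) for every 'gene' line, keyed by gene_id."""
    spans = {}
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "gene":
                continue
            # Attributes look like: gene_id "X"; gene_name "Y";
            attrs = dict(
                item.strip().split(" ", 1)
                for item in fields[8].split(";")
                if " " in item.strip()
            )
            gene_id = attrs["gene_id"].strip('"')
            spans[gene_id] = (fields[0], fields[6], int(fields[3]), int(fields[4]))
    return spans


def load_gene_spans(gtf_path, index_suffix=".idx.pkl"):
    """Load the pickled sidecar if it is at least as recent as the GTF, else rebuild it."""
    gtf_path = Path(gtf_path)
    index_path = gtf_path.with_name(gtf_path.name + index_suffix)
    if index_path.exists() and index_path.stat().st_mtime >= gtf_path.stat().st_mtime:
        # Only unpickle files you created yourself: pickle can run arbitrary code.
        with open(index_path, "rb") as handle:
            return pickle.load(handle)
    spans = parse_gene_spans(gtf_path)
    with open(index_path, "wb") as handle:
        pickle.dump(spans, handle)
    return spans
```

The first call pays the full parsing cost and writes the sidecar; later calls only unpickle it, which is where the speed-up comes from.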


Installation

Recommended

pip install pycoverplot

local build

Rust

The Rust backend is already included.

# build a wheel 
git clone https://github.com/rLannes/pycoverplot
cd pycoverplot
python -m build --wheel # does the heavy lifting
# Successfully built <wheel>
pip install <wheel>

Requirements

  • Python ≥ 3.10
  • Sorted and indexed BAM files (.bai index required alongside each .bam)
  • A GTF annotation file or explicit genomic coordinates

Quick Start

Command line

Plot exon coverage for two groups over an annotated gene (see the --exon argument to include introns):

# Pre-build a GTF index (optional, run once — speeds up all subsequent runs):
pycoverplot_gtf --file annotation.gtf


pycoverplot \
    --bam ctrl_rep1.bam ctrl_rep2.bam --color PALETTE_BLUE \
    --bam treat_rep1.bam treat_rep2.bam --color PALETTE_RED \
    --group_name ctrl treatment \
    --bam_dir /path/to/bam/files/ \
    --gtf annotation.gtf --gene_id ENSMUSG00000028494 \
    --out figure.pdf

Each --bam flag defines a BAM group; repeat it to define multiple groups. The --color argument sets the color of a given BAM group: give either a single color or one color per BAM file in the group.
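
The "one color, or one per BAM file" rule for explicit colors can be sketched like this. The helper `expand_colors` is hypothetical and is not part of the pycoverplot API; palette and colormap names are resolved differently and are not covered here.

```python
def expand_colors(colors, n_files):
    """Return one color per BAM file, or raise if the counts cannot be reconciled."""
    if len(colors) == 1:
        # A single color is shared by every file in the group.
        return list(colors) * n_files
    if len(colors) == n_files:
        return list(colors)
    raise ValueError(
        f"expected 1 or {n_files} colors for {n_files} BAM files, got {len(colors)}"
    )


shared = expand_colors(["#ff0000"], 3)
per_file = expand_colors(["#ff0000", "#00ff00"], 2)
```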

Some options worth knowing

  • --exon [exon|intron|intron_partial]: plot only the exons, plot exons and introns, or plot exons with compressed introns (useful for very large introns).
  • --average: plot the group average with an envelope (two times the standard deviation).
  • --smooth: sliding-window average smoothing.
  • --thread: number of threads (multi-CPU).
  • --gene_id: plot a specific transcript using the GENE_ID:TRANSCRIPT_ID syntax.
  • --color_odd / --color_even: draw every odd or every even feature (exon/intron) in a different color.
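
As an illustration of what the --average envelope represents, the sketch below computes a per-position mean and a band of two standard deviations across replicates. It is an assumption-laden example, not the library's code: the function name is hypothetical, the sample standard deviation is assumed, and clamping the lower bound at zero is a choice made here.

```python
from statistics import mean, stdev


def average_with_envelope(replicates, n_sd=2.0):
    """Per-position mean and a mean +/- n_sd * SD band across replicates.

    replicates: list of equal-length coverage lists, one per BAM file.
    """
    means, lower, upper = [], [], []
    for values in zip(*replicates):  # one tuple per genomic position
        m = mean(values)
        # Sample SD needs at least two replicates; fall back to no band otherwise.
        sd = stdev(values) if len(values) > 1 else 0.0
        means.append(m)
        lower.append(max(0.0, m - n_sd * sd))  # coverage cannot be negative
        upper.append(m + n_sd * sd)
    return means, lower, upper


means, lower, upper = average_with_envelope([[1, 2], [3, 2]])
```

With matplotlib, `means` would typically be drawn as a line and `lower`/`upper` passed to `fill_between` to render the band.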

Plot coverage over a custom genomic interval instead of an annotated gene:

pycoverplot \
    --bam ctrl.bam --bam treat.bam \
    --inter chr1,+,1000000,1050000 \
    --out figure.pdf

Python API

The scripting API follows three steps: build your groups, fetch coverage, then plot.

from pathlib import Path
from pycoverplot import Groups, get_intervall, color_list, get_file_path, update_group_coverage, plot, get_reads_fromstar

# --- 1. Define groups ---

ctrl_bams  = get_file_path(["ctrl_rep1.bam", "ctrl_rep2.bam"], bam_dir="/data/bam/")
treat_bams = get_file_path(["treat_rep1.bam", "treat_rep2.bam"], bam_dir="/data/bam/")

ctrl_colors  = color_list(["PALETTE_BLUE"], size=len(ctrl_bams))
treat_colors = color_list(["PALETTE_RED"],  size=len(treat_bams))

ctrl_group  = Groups(colors=ctrl_colors,  bam_files=ctrl_bams)
treat_group = Groups(colors=treat_colors, bam_files=treat_bams)
ctrl_group.group_name  = "ctrl"
treat_group.group_name = "treatment"


groups = [ctrl_group, treat_group]

# Populate read counts from STAR logs (skip if using --NoNormalize)
get_reads_fromstar(groups)

# Optionally set read counts for normalisation (if STAR logs are not available)
# ctrl_group.total_reads  = [12_000_000, 11_500_000]
# treat_group.total_reads = [13_000_000, 12_800_000]


# --- 2. Fetch coverage ---

# Retrieve intervals from a GTF file
target_intervals = get_intervall(
    gtf="flybase.gtf",
    gene_id=["FBgn0267432"],
    inter=None
)

for target_name, target_interval in target_intervals.items():

    for g in groups: # reinitialise the coverage value
        g.cover = []

    update_group_coverage(
        groups,
        target_interval,
        lib_scheme="frFirstStrand",
        n_thread=4,
    )

    # --- 3. Plot ---

    plot(
        groups,
        exon="intron_partial",
        intron_prop=0.3,
        normalize=True,
        norm_factor=1_000_000,
        title="Coverage — " + target_name,
        out="figure.pdf",
        color_even="gainsboro",  # highlight the exons
    )

Input

BAM files

BAM files must be sorted and indexed. The .bai index file must be present in the same directory as the .bam file.
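
A quick pre-flight check for missing indexes can save a failed run. The sketch below uses only the standard library; the helper name is hypothetical, and accepting both `sample.bam.bai` and `sample.bai` is an assumption about naming, not a statement of what pycoverplot looks for.

```python
from pathlib import Path


def missing_bam_indexes(bam_paths):
    """Return the BAM paths that have no .bai index next to them."""
    missing = []
    for bam in map(Path, bam_paths):
        # samtools writes sample.bam.bai; some tools write sample.bai instead.
        candidates = (bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai"))
        if not any(candidate.exists() for candidate in candidates):
            missing.append(bam)
    return missing
```

Any path returned here can be indexed with `samtools index` before plotting.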

Genomic region

Two options are available and are mutually exclusive:

GTF + gene ID: plot all transcripts of a gene, or restrict to a specific transcript using the GENE_ID:TRANSCRIPT_ID syntax. The gene_id must match the value in your GTF file exactly (it is database-dependent and differs from the gene symbol).

Custom interval: plot any arbitrary genomic region using --inter CHROM,STRAND,START,END. Multiple intervals can be provided and will be concatenated in the plot; they must all be on the same chromosome and the same strand.
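
The CHROM,STRAND,START,END format and the same-chromosome, same-strand constraint can be validated with a few lines of Python. This is a sketch with hypothetical names (`Interval`, `parse_intervals`), not the parser pycoverplot uses internally.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int


def parse_intervals(specs):
    """Parse 'CHROM,STRAND,START,END' strings and check they can be concatenated."""
    intervals = []
    for spec in specs:
        chrom, strand, start, end = spec.split(",")
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r} in {spec!r}")
        interval = Interval(chrom, strand, int(start), int(end))
        if interval.start >= interval.end:
            raise ValueError(f"start must be smaller than end in {spec!r}")
        intervals.append(interval)
    # All intervals must share one chromosome and one strand.
    if len({(i.chrom, i.strand) for i in intervals}) > 1:
        raise ValueError("all intervals must be on the same chromosome and strand")
    return intervals


region = parse_intervals(["chr1,+,1000000,1050000"])
```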


Normalisation

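By default, coverage is normalised to reads per million (RPM). Since v0.3, the total read count is taken from the .bai index, which gives the total number of mapped reads (multi-mapping reads are counted multiple times). To use the uniquely mapped read count from the STAR Log.final.out file expected alongside each BAM file instead, pass --starlog. Normalisation can be disabled with --NoNormalize.

The CLI option for passing read counts manually was removed in v0.3. In the API, read counts can be provided by setting group.total_reads directly.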
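
RPM normalisation itself is simple arithmetic: each depth value is multiplied by norm_factor / total_reads. A minimal sketch, assuming a hypothetical helper `to_rpm` (the `norm_factor` name mirrors the API example above):

```python
def to_rpm(depth, total_reads, norm_factor=1_000_000):
    """Scale raw per-base depth to reads per norm_factor reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = norm_factor / total_reads
    return [d * scale for d in depth]


# A library of 2 million reads: a raw depth of 10 becomes 5 RPM.
rpm = to_rpm([10, 20], total_reads=2_000_000)
```

Because the scale depends on which reads are counted, totals that include multi-mapping reads will give slightly lower RPM values than totals based on uniquely mapped reads.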


Color options

Colors can be specified per group in three ways:

| Format | Example |
|---|---|
| Built-in palette name | PALETTE_BLUE, PALETTE_RED, PALETTE_GREEN, PALETTE_ORANGE, PALETTE_GUGN, PALETTE_BUPL, PALETTE_GREY |
| Matplotlib colormap name | viridis, plasma, Blues |
| Explicit hex colors | #ff0000 #00ff00 (one per file in the group) |

Each built-in palette provides 5 colors. For groups with more than 5 files, use a colormap or explicit hex colors.


CLI Reference

| Argument | Description |
|---|---|
| --bam | One or more BAM files per group. Repeat the flag for additional groups. |
| --bam_dir | Base directory for BAM files. One shared directory or one per group. |
| --group_name | Legend label for each group, in the same order as --bam. |
| --gtf | GTF annotation file. Required with --gene_id. |
| --gene_id | Gene ID(s) to plot. Supports GENE_ID:TRANSCRIPT_ID syntax. |
| --inter | Explicit interval(s) as CHROM,STRAND,START,END. Overrides --gtf. |
| --LibLayout | Library strandedness. Default: frFirstStrand. |
| --exon | Intron display mode: exon, intron, or intron_partial. Default: exon. |
| --intron_prop | Max fraction of plot width for introns (with intron_partial). Default: 0.3. |
| --smooth | Sliding window size in bp for coverage smoothing. |
| --alpha | Coverage line opacity, 0–1. Default: 1. |
| --color | Color specification per group. |
| --NoNormalize | Disable RPM normalisation. |
| --mapq | Minimum mapping quality. Default: 13. |
| --flag_in | SAM flag filter: reads to include. Default: 0. |
| --flag_out | SAM flag filter: reads to exclude. Default: 256. |
| --thread | Number of parallel threads. Default: 1. |
| --width | Figure width in inches. Default: 8. |
| --height | Figure height in inches. Default: 5. |
| --average | Plot the average for each BAM group with an envelope. |
| --rasterize | Rasterize the figure. |
| --out_file | Output file path. Format inferred from extension (.pdf, .png, .svg). |
| --title | Plot title. |
| --color_even | Color every even feature. |
| --color_odd | Color every odd feature. |
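
For readers unfamiliar with sliding-window smoothing, here is a minimal sketch of a centered moving average. It only illustrates the idea behind --smooth; the function name and the edge handling (the window shrinks near the ends) are assumptions and may differ from what pycoverplot does.

```python
def smooth(values, window):
    """Centered moving average over roughly `window` positions.

    The effective window is 2 * (window // 2) + 1 and shrinks at the edges.
    """
    if window <= 1:
        return [float(v) for v in values]
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


profile = smooth([0, 3, 6, 3, 0], window=3)
```

Larger windows flatten sharp exon boundaries, so a window much smaller than the shortest exon of interest is usually a safer starting point.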

Troubleshooting:

The plot is empty or shows very few reads, and I am sure that should not happen!

Check the --LibLayout, --flag_in, and --flag_out parameters.

How do I include all reads, not just primary alignments?

Use the "--flag_out 0 --flag_in 0 --mapq 0" options.
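
To see why the defaults drop some reads, the sketch below applies the usual samtools-style interpretation of flag filters: every bit in flag_in must be set, no bit in flag_out may be set, and the mapping quality must reach the threshold. That pycoverplot follows exactly these semantics is an assumption; the function name is hypothetical and the defaults are taken from the CLI reference.

```python
SECONDARY = 256  # SAM flag bit 0x100: secondary alignment


def keep_read(flag, mapq, flag_in=0, flag_out=256, min_mapq=13):
    """Return True if a read passes the include/exclude flag masks and MAPQ cutoff."""
    has_required = (flag & flag_in) == flag_in  # all required bits are set
    has_excluded = (flag & flag_out) != 0       # any excluded bit is set
    return has_required and not has_excluded and mapq >= min_mapq


# A secondary alignment is dropped by default but kept with permissive settings.
dropped = keep_read(SECONDARY, mapq=60)
kept = keep_read(SECONDARY, mapq=0, flag_out=0, min_mapq=0)
```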

The plot takes a long time to open.

Use the --rasterize option.
