BAM coverage plot
pycoverplot
change notes:
v0.3:
- MAJOR (user request): changed how pycoverplot computes the total number of reads used for normalisation. By default, pycoverplot now uses the information in the .bai file, which corresponds to the total number of mapped reads. This counts multi-mapping reads multiple times. The previous option, parsing the STAR log output, is still available with the --starlog flag. The older option to pass the read number directly has been removed.
- MINOR: pycoverplot no longer hangs when the Rust backend raises an error (communication between Python and Rust was fixed). These errors are mostly caused by wrong coordinates being passed (e.g. a chromosome name not found in the BAM).
Fast read-coverage plots from BAM files, straight to publication-ready figures.
12 BAM files (GEO GSE216294), a 2.24 Mb gene compressed to a readable 14 kb view, replicate-averaged across 4 groups, plotted in ~4 seconds (6 CPUs on an HPC node).
pycoverplot reads BAM files directly through a Rust backend. No bigWig intermediate, no separate normalization step, no shell pipeline. Built for RNA-seq but works on any aligned data.
Why pycoverplot
- Direct BAM → plot. Skip the bamCoverage → bigWig → pyGenomeTracks pipeline entirely.
- Replicate-aware. Group BAMs by condition, average automatically, render variance as a confidence band.
- Intron compression. Rescale long introns to a fixed fraction of the plot width so short 5′ exons stay readable in megabase-scale genes.
- Fast by design. Parallel BAM reading via Rust, optional GTF index caching for repeated runs.
- Sensible defaults. RPM-normalized out of the box. Strand-aware. CLI and Python API.
Early-stage software — the API may change between versions. Pin to a commit hash if you need reproducibility.
A note on GTF parsing
Parsing a full Ensembl or GENCODE GTF is the slowest step in most coverage workflows. pycoverplot can build a sidecar index (mygtf.gtf.pbi + mygtf.gtf.pbi.bi) and use it on every subsequent run, so repeated plots against the same annotation are effectively free at the GTF stage. The index is auto-detected; no extra flag is required.
Security note: GTF index files are pickled Python objects. Only use index files you generated yourself or trust the source of — pickle files can execute arbitrary code when loaded.
Installation
Recommended
pip install pycoverplot
Local build
The Rust backend is already included.
# build a wheel
git clone https://github.com/rLannes/pycoverplot
cd pycoverplot
python -m build --wheel # does the heavy lifting
# Successfully built <wheel>
pip install <wheel>
Requirements
- Python ≥ 3.10
- Sorted and indexed BAM files (.bai index required alongside each .bam)
- A GTF annotation file or explicit genomic coordinates
Quick Start
Command line
Plot exon coverage for two groups over an annotated gene (see the --exon argument to include introns):
# Pre-build a GTF index (optional, run once — speeds up all subsequent runs):
pycoverplot_gtf --file annotation.gtf
pycoverplot \
--bam ctrl_rep1.bam ctrl_rep2.bam --color PALETTE_BLUE \
--bam treat_rep1.bam treat_rep2.bam --color PALETTE_RED \
--group_name ctrl treatment \
--bam_dir /path/to/bam/files/ \
--gtf annotation.gtf --gene_id ENSMUSG00000028494 \
--out figure.pdf
Each --bam flag defines a BAM group; repeat it to define multiple groups. Each --color argument sets the colors of the corresponding BAM group: give either a single color or one color per BAM file in the group.
### Some options worth knowing:
- --exon [exon|intron|intron_partial]: plot only the exons, plot exons and introns, or plot exons with compressed introns (useful for very large introns)
- --average: plot the group average with an envelope (two times the standard deviation)
- --smooth: smooth the coverage with a sliding-window average
- --thread: number of threads (multi-CPU)
- --gene_id: plot a specific transcript using the gene_id:transcript_id syntax
- --color_odd, --color_even: draw every odd or every even feature (exon or intron) in a different color
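To make the --average and --smooth options more concrete, here is a minimal standard-library sketch of the two ideas: a per-position mean with a band of two standard deviations across replicates, and a sliding-window average. The function names are hypothetical and this is not pycoverplot's implementation; in particular, the use of the sample standard deviation and of a centered window that shrinks at the edges are assumptions.

```python
from statistics import mean, stdev


def mean_and_envelope(replicates, n_sd=2.0):
    """Per-position mean and (lower, upper) band across replicate tracks.

    `replicates` is a list of equal-length coverage lists, one per BAM file.
    Assumption: sample standard deviation; a single replicate gets a zero-width band.
    """
    means, lower, upper = [], [], []
    for values in zip(*replicates):
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        means.append(m)
        lower.append(m - n_sd * sd)
        upper.append(m + n_sd * sd)
    return means, lower, upper


def moving_average(track, window):
    """Centered moving average; the window shrinks near both ends of the track."""
    half = window // 2
    smoothed = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        smoothed.append(sum(track[lo:hi]) / (hi - lo))
    return smoothed


# Two replicates of a three-position track
means, lower, upper = mean_and_envelope([[1, 2, 3], [3, 2, 1]])
smoothed = moving_average([0, 3, 0], window=3)
```

The band is widest where replicates disagree and collapses onto the mean where they are identical.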
Plot coverage over a custom genomic interval instead of an annotated gene:
pycoverplot \
--bam ctrl.bam --bam treat.bam \
--inter chr1,+,1000000,1050000 \
--out figure.pdf
Python API
The scripting API follows three steps: build your groups, fetch coverage, then plot.
from pathlib import Path
from pycoverplot import Groups, get_intervall, color_list, get_file_path, update_group_coverage, plot, get_reads_fromstar
# --- 1. Define groups ---
ctrl_bams = get_file_path(["ctrl_rep1.bam", "ctrl_rep2.bam"], bam_dir="/data/bam/")
treat_bams = get_file_path(["treat_rep1.bam", "treat_rep2.bam"], bam_dir="/data/bam/")
ctrl_colors = color_list(["PALETTE_BLUE"], size=len(ctrl_bams))
treat_colors = color_list(["PALETTE_RED"], size=len(treat_bams))
ctrl_group = Groups(colors=ctrl_colors, bam_files=ctrl_bams)
treat_group = Groups(colors=treat_colors, bam_files=treat_bams)
ctrl_group.group_name = "ctrl"
treat_group.group_name = "treatment"
groups = [ctrl_group, treat_group]
# Populate read counts from STAR logs (skip if using --NoNormalize)
get_reads_fromstar(groups)
# Optionally set read counts for normalisation (if STAR logs are not available)
# ctrl_group.total_reads = [12_000_000, 11_500_000]
# treat_group.total_reads = [13_000_000, 12_800_000]
# --- 2. Fetch coverage ---
# Retrieve intervals from a GTF file
target_intervals = get_intervall(
    gtf="flybase.gtf",
    gene_id=["FBgn0267432"],
    inter=None,
)
for target_name, target_interval in target_intervals.items():
    for g in groups:  # reinitialise the coverage values
        g.cover = []
    update_group_coverage(
        groups,
        target_interval,
        lib_scheme="frFirstStrand",
        n_thread=4,
    )
    # --- 3. Plot ---
    plot(
        groups,
        exon="intron_partial",
        intron_prop=0.3,
        normalize=True,
        norm_factor=1_000_000,
        title="Coverage - " + target_name,
        out="figure.pdf",
        color_even="gainsboro",  # highlight the exons
    )
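The exon="intron_partial" and intron_prop=0.3 arguments used above ask for introns to take at most a fixed fraction of the plot width. The sketch below shows one way such a rescaling could work; it is not pycoverplot's code. The function name, the (kind, length) feature representation, and the choice to scale all introns by the same factor are assumptions made for illustration.

```python
def compress_introns(features, intron_prop=0.3):
    """Return each feature's share of the plot width.

    `features` is a list of (kind, length_bp) tuples, kind being "exon" or "intron".
    Exons keep their base-pair scale. If introns would take more than
    `intron_prop` of the width, all introns are shrunk by one common factor
    so that together they occupy exactly `intron_prop`.
    """
    exon_bp = sum(length for kind, length in features if kind == "exon")
    intron_bp = sum(length for kind, length in features if kind == "intron")

    scale = 1.0
    if exon_bp and intron_bp and intron_bp / (exon_bp + intron_bp) > intron_prop:
        # Solve scale * intron_bp / (exon_bp + scale * intron_bp) == intron_prop
        scale = intron_prop * exon_bp / ((1.0 - intron_prop) * intron_bp)

    widths = [length * scale if kind == "intron" else float(length)
              for kind, length in features]
    total = sum(widths)
    if total == 0:
        return []
    return [w / total for w in widths]


# A 100 kb intron between two short exons
shares = compress_introns([("exon", 200), ("intron", 100_000), ("exon", 500)])
```

In this example the intron shrinks from over 99% of the width to 30%, leaving 20% and 50% for the two exons.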
Input
BAM files
BAM files must be sorted and indexed. The .bai index file must be present in the same directory as the .bam file.
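A quick pre-flight check for the index requirement can save a failed run. This sketch uses only the standard library; the helper names are hypothetical, and checking both sample.bam.bai and sample.bai is an assumption based on the two common naming conventions, not a statement about which one pycoverplot looks for.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the index file found next to a BAM file, or None if there is none."""
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),           # sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def missing_indexes(bam_paths):
    """List the BAM paths that have no index alongside them."""
    return [str(p) for p in bam_paths if find_bam_index(p) is None]


# Report BAM files that still need `samtools index`
for path in missing_indexes(["ctrl_rep1.bam", "ctrl_rep2.bam"]):
    print("no .bai index found for", path)
```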
Genomic region
Two options are available and are mutually exclusive:
GTF + gene ID: plot all transcripts of a gene, or restrict to a specific transcript using the GENE_ID:TRANSCRIPT_ID syntax. The gene_id must match the value in your GTF file exactly (it is database-dependent and differs from the gene symbol).
Custom interval: plot any arbitrary genomic region using --inter CHROM,STRAND,START,END. Multiple intervals can be provided and will be concatenated in the plot; they must all be on the same chromosome and the same strand.
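The CHROM,STRAND,START,END format and the same-chromosome, same-strand rule can be illustrated with a small parser. This is a sketch and not the parser pycoverplot uses: the Interval type and both function names are hypothetical, and accepting only "+" and "-" as strand values is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int


def parse_interval(text):
    """Parse a 'CHROM,STRAND,START,END' string into an Interval."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected CHROM,STRAND,START,END, got {text!r}")
    chrom, strand, start, end = parts
    if strand not in ("+", "-"):  # assumption: only these two values
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    start_i, end_i = int(start), int(end)
    if start_i >= end_i:
        raise ValueError(f"start must be smaller than end in {text!r}")
    return Interval(chrom, strand, start_i, end_i)


def check_compatible(intervals):
    """Intervals that are concatenated must share chromosome and strand."""
    if len({(iv.chrom, iv.strand) for iv in intervals}) > 1:
        raise ValueError("all intervals must be on the same chromosome and strand")
    return intervals


region = parse_interval("chr1,+,1000000,1050000")
```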
Normalisation
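By default, coverage is normalised to reads per million (RPM). According to the v0.3 change notes, the default total is the number of mapped reads recorded in the .bai index, which counts multi-mapping reads multiple times. Pass --starlog to use instead the uniquely mapped read count from the STAR Log.final.out file expected alongside each BAM file. Normalisation can be disabled with --NoNormalize.
The same change notes state that the option to pass read counts directly (previously --read_count on the command line) has been removed. The Python API example above sets group.total_reads directly when STAR logs are not available.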
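As an illustration of the RPM scaling, the sketch below extracts the "Uniquely mapped reads number" field from the text of a STAR Log.final.out and rescales raw depth values. The helper names are hypothetical and the sample log is a shortened, made-up excerpt in STAR's key | value layout; pycoverplot's own parsing may differ.

```python
def uniquely_mapped_reads(log_text):
    """Read 'Uniquely mapped reads number' from the text of a STAR Log.final.out."""
    for line in log_text.splitlines():
        key, sep, value = line.partition("|")
        if sep and key.strip() == "Uniquely mapped reads number":
            return int(value.strip())
    raise ValueError("'Uniquely mapped reads number' not found in log")


def to_rpm(coverage, total_reads, norm_factor=1_000_000):
    """Scale raw per-base depth to reads per million."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [depth * norm_factor / total_reads for depth in coverage]


# Made-up excerpt for illustration only
sample_log = (
    "                          Number of input reads |\t12500000\n"
    "                   Uniquely mapped reads number |\t10000000\n"
    "                        Uniquely mapped reads % |\t80.00%\n"
)

total = uniquely_mapped_reads(sample_log)
rpm = to_rpm([0, 5, 20], total)
```

With 10 million reads, a raw depth of 5 becomes 0.5 RPM, which makes libraries of different sizes directly comparable.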
Color options
Colors can be specified per group in three ways:
| Format | Example |
|---|---|
| Built-in palette name | PALETTE_BLUE, PALETTE_RED, PALETTE_GREEN, PALETTE_ORANGE, PALETTE_GUGN, PALETTE_BUPL, PALETTE_GREY |
| Matplotlib colormap name | viridis, plasma, Blues |
| Explicit hex colors | #ff0000 #00ff00 (one per file in the group) |
Each built-in palette provides 5 colors. For groups with more than 5 files, use a colormap or explicit hex colors.
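For explicit hex colors, the rule "one color, or one per file" can be checked before plotting. The following is a hypothetical helper written for illustration; it is not pycoverplot's color_list, and it handles hex strings only, not palette or colormap names.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def colors_for_group(colors, n_files):
    """Return one hex color per BAM file in a group.

    A single color is repeated for every file; otherwise the number of
    colors must match the number of files.
    """
    if not colors:
        raise ValueError("at least one color is required")
    for color in colors:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a #rrggbb hex color: {color!r}")
    if len(colors) == 1:
        return list(colors) * n_files
    if len(colors) != n_files:
        raise ValueError(f"got {len(colors)} colors for {n_files} files")
    return list(colors)


group_colors = colors_for_group(["#ff0000", "#00ff00"], n_files=2)
```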
CLI Reference
| Argument | Description |
|---|---|
| --bam | One or more BAM files per group. Repeat the flag for additional groups. |
| --bam_dir | Base directory for BAM files. One shared directory or one per group. |
| --group_name | Legend label for each group, in the same order as --bam. |
| --gtf | GTF annotation file. Required with --gene_id. |
| --gene_id | Gene ID(s) to plot. Supports GENE_ID:TRANSCRIPT_ID syntax. |
| --inter | Explicit interval(s) as CHROM,STRAND,START,END. Overrides --gtf. |
| --LibLayout | Library strandedness. Default: frFirstStrand. |
| --exon | Intron display mode: exon, intron, or intron_partial. Default: exon. |
| --intron_prop | Max fraction of plot width for introns (with intron_partial). Default: 0.3. |
| --smooth | Sliding window size in bp for coverage smoothing. |
| --alpha | Coverage line opacity, 0–1. Default: 1. |
| --color | Color specification per group. |
| --NoNormalize | Disable RPM normalisation. |
| --mapq | Minimum mapping quality. Default: 13. |
| --flag_in | SAM flag filter: reads to include. Default: 0. |
| --flag_out | SAM flag filter: reads to exclude. Default: 256. |
| --thread | Number of parallel threads. Default: 1. |
| --width | Figure width in inches. Default: 8. |
| --height | Figure height in inches. Default: 5. |
| --average | Plot the average for each BAM group with an envelope. |
| --rasterize | Rasterize the figure. |
| --out_file | Output file path. Format inferred from extension (.pdf, .png, .svg). |
| --title | Plot title. |
| --color_even | Color every even feature. |
| --color_odd | Color every odd feature. |
Troubleshooting:
The plot is empty or shows very few reads, and I am sure that should not happen!
Check the --LibLayout, --flag_in and --flag_out parameters.
How do I include all reads, not just primary alignments?
Use the "--flag_out 0 --flag_in 0 --mapq 0" options.
The plot takes a long time to open.
Use the --rasterize option.
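To see why the flag and MAPQ settings above change which reads are counted, here is a sketch of a samtools-style filter: every bit of flag_in must be set, no bit of flag_out may be set, and MAPQ must reach the minimum. That pycoverplot applies exactly these semantics is an assumption; the function name is hypothetical. In the SAM specification, bit 256 (0x100) marks a secondary alignment, so the default flag_out of 256 drops those.

```python
def keep_read(flag, mapq, flag_in=0, flag_out=256, min_mapq=13):
    """Decide whether a read passes an include/exclude flag filter and a MAPQ cutoff."""
    has_required_bits = (flag & flag_in) == flag_in
    has_excluded_bits = (flag & flag_out) != 0
    return has_required_bits and not has_excluded_bits and mapq >= min_mapq


# Primary alignment with high MAPQ: kept with the defaults
primary_kept = keep_read(flag=0, mapq=255)

# Secondary alignment (bit 256): dropped by default,
# kept once flag_out and the MAPQ cutoff are both set to 0
secondary_default = keep_read(flag=256, mapq=0)
secondary_relaxed = keep_read(flag=256, mapq=0, flag_out=0, min_mapq=0)
```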