Coverage inspector for targeted sequencing QC (hg38)

covsnap

covsnap computes per-target (and optionally per-exon) depth-of-coverage metrics from BAM/CRAM files aligned to the human reference genome hg38. It produces a self-contained interactive HTML report with automated PASS/FAIL classification heuristics — designed for clinical and research sequencing QC workflows.


Key Features

  • Graphical interface — Run covsnap with no arguments to launch a Tkinter GUI with file pickers, mode selection, and progress feedback. Works on Linux, macOS, and Windows.
  • Gene-aware analysis — Look up genes by symbol (e.g. BRCA1) or analyze multiple genes at once with a comma-separated list (e.g. BRCA1,TP53,ETFDH). Ships with a built-in dictionary of ~60 clinically relevant genes and an optional full GENCODE v44 tabix index covering 62,700+ genes.
  • Exon-level resolution — Per-exon depth metrics via the --exons flag using MANE Select transcripts from GENCODE v44.
  • Region and BED modes — Accepts genomic coordinates (chr17:43044295-43125482) or a BED file of arbitrary target intervals. Region mode auto-discovers overlapping genes and exons.
  • Interactive HTML report — Single self-contained HTML file with summary cards, exon bar charts with smooth color gradients, accordion details, glossary, and PASS/FAIL classifications.
  • Streaming architecture — O(1) memory per target using Welford's online algorithm for mean/variance and histogram-based exact median. No per-base depth arrays are ever held in memory.
  • Parallel execution — Concurrent samtools and region/exon analysis for faster results.
  • Dual engine support — Prefers mosdepth when available; falls back to samtools depth.
  • Contig auto-detection — Transparently handles both chr-prefixed (UCSC) and non-prefixed (Ensembl/1000G) BAM contig naming.
  • Gene alias resolution — Common aliases like HER2 -> ERBB2 and P53 -> TP53 are resolved automatically, with fuzzy suggestions for typos.
  • BED guardrails — Configurable limits on target count, total bases, and file size to prevent accidental whole-exome/whole-genome runs.
  • Classification heuristics — Automated PASS / DROP_OUT / UNEVEN / LOW_EXON / LOW_COVERAGE calls with tunable thresholds.
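As a rough illustration of the streaming approach mentioned in the feature list, the sketch below combines Welford's online update for mean and variance with a depth histogram for the median. It is not covsnap's actual `TargetAccumulator`; the class name `StreamingDepthStats`, the `max_depth` cap, and the method names are assumptions made for this example.

```python
import math


class StreamingDepthStats:
    """Per-target depth statistics without storing per-base depths.

    Hypothetical sketch: names and the max_depth cap are assumptions.
    """

    def __init__(self, max_depth=1000):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        # Depths above max_depth share the last bin, so the median is exact
        # only while it stays below max_depth (a limitation of this sketch).
        self.max_depth = max_depth
        self.hist = [0] * (max_depth + 1)

    def add(self, depth):
        # Welford's online update for mean and variance
        self.n += 1
        delta = depth - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (depth - self.mean)
        self.hist[min(depth, self.max_depth)] += 1

    def variance(self):
        # Population variance over all positions seen so far
        return self.m2 / self.n if self.n else 0.0

    def cv(self):
        # Coefficient of variation: standard deviation divided by mean
        return math.sqrt(self.variance()) / self.mean if self.mean else 0.0

    def median(self):
        if self.n == 0:
            return 0.0
        lo_rank, hi_rank = (self.n - 1) // 2, self.n // 2
        lo = None
        seen = 0
        for depth, count in enumerate(self.hist):
            seen += count
            if lo is None and seen > lo_rank:
                lo = depth
            if seen > hi_rank:
                return (lo + depth) / 2
        return float(self.max_depth)  # defensive fallback; not reached when n > 0


stats = StreamingDepthStats()
for d in [0, 10, 20, 30, 40]:
    stats.add(d)
```

Memory here depends on the histogram size rather than on the target length, which is the property the feature list describes.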

Installation

From Bioconda (recommended)

conda install -c bioconda covsnap

From PyPI

pip install covsnap

From source

git clone https://github.com/enes-ak/covsnap.git
cd covsnap
pip install .

With development/test dependencies

pip install ".[dev]"

Runtime requirements

Dependency  Version     Required?
Python      >= 3.9      Yes
pysam       >= 0.22     Yes
numpy       >= 1.24     Yes
samtools    any recent  Yes (engine)
mosdepth    >= 0.3      Optional (preferred engine)

Note: At least one of samtools or mosdepth must be on your $PATH. With --engine auto (the default), covsnap prefers mosdepth and falls back to samtools.
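The preference order in the note can be pictured with a small sketch. This is not covsnap's own code; the function name `pick_engine` and its `which` parameter are hypothetical, and only the standard-library `shutil.which` lookup is assumed.

```python
import shutil


def pick_engine(requested="auto", which=shutil.which):
    """Return 'mosdepth' or 'samtools' following the documented preference.

    Hypothetical helper: `which` is injectable so the logic can be tried
    without either tool installed.
    """
    if requested == "auto":
        # Prefer mosdepth, fall back to samtools
        for name in ("mosdepth", "samtools"):
            if which(name):
                return name
        raise RuntimeError("neither mosdepth nor samtools found on PATH")
    if requested not in ("mosdepth", "samtools"):
        raise ValueError(f"unknown engine: {requested}")
    if not which(requested):
        raise RuntimeError(f"{requested} not found on PATH")
    return requested
```

Calling `pick_engine()` with no arguments checks the real `$PATH`; passing a custom `which` makes the fallback easy to exercise in isolation.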


Quick Start

Graphical interface

Run covsnap with no arguments to launch the GUI:

covsnap

A window opens where you can select your BAM file, choose analysis mode, configure options, and run the analysis — all without typing commands.

Gene mode

Analyze coverage for a gene by name:

covsnap sample.bam BRCA1

This produces covsnap.report.html — an interactive HTML report with coverage metrics and PASS/FAIL classification.

Multiple genes

Analyze several genes in a single run with a comma-separated list:

covsnap sample.bam BRCA1,TP53,ETFDH --exons

With exon-level detail

covsnap sample.bam BRCA1 --exons

Region mode

Specify an explicit genomic region (1-based inclusive coordinates). Overlapping genes and exons are auto-discovered:

covsnap sample.bam chr17:43044295-43125482

BED mode

Use a BED file of target intervals:

covsnap sample.bam --bed targets.bed

Custom output path

covsnap sample.bam BRCA1 -o my_report.html

CRAM files

covsnap sample.cram BRCA1 --reference hg38.fa

HTML Report

covsnap produces a single self-contained HTML file (no external dependencies) containing:

  • Summary cards — key metrics at a glance (mean depth, coverage breadth, classification)
  • Exon bar chart — per-exon coverage with smooth HSL color gradient (red → amber → teal)
  • Accordion details — expandable per-target and per-exon metrics
  • Low-coverage blocks — contiguous regions below threshold (when --emit-lowcov is used)
  • Glossary — definitions of all metrics and classification terms
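To make the low-coverage blocks item concrete, here is a minimal sketch of how contiguous runs below a depth threshold could be found in a single streaming pass. The function name `lowcov_blocks`, the strict `<` comparison, and the default values are assumptions for illustration; the README does not state the defaults of `--lowcov-threshold` or `--lowcov-min-len`.

```python
def lowcov_blocks(depths, start, threshold=20, min_len=1):
    """Yield (block_start, block_end) as 0-based half-open intervals.

    depths: iterable of per-base depths, the first one at position `start`.
    A block is a maximal run of positions with depth < threshold that is
    at least min_len bases long (both defaults are assumed values).
    """
    block_start = None
    pos = start
    for pos, depth in enumerate(depths, start):
        if depth < threshold:
            if block_start is None:
                block_start = pos  # a new low-coverage run begins here
        elif block_start is not None:
            if pos - block_start >= min_len:
                yield (block_start, pos)
            block_start = None
    # Close a run that extends to the end of the target
    if block_start is not None and pos + 1 - block_start >= min_len:
        yield (block_start, pos + 1)
```

Because only the current run's start position is tracked, the sketch keeps the same no-per-base-array property as the rest of the pipeline.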

Classification Heuristics

Each target is classified using ordered heuristics (first match wins):

Status        Condition
DROP_OUT      pct_zero > 5% OR any zero-coverage block >= 500 bp
UNEVEN        mean_depth > 20 AND coefficient of variation > 1.0
LOW_EXON      Any exon with pct_ge_20 < 90% OR pct_zero > 5% (exon mode only)
LOW_COVERAGE  pct_ge_20 < 95%
PASS          pct_ge_20 >= 95% AND pct_zero <= 1%

All thresholds are tunable via CLI flags:

covsnap sample.bam BRCA1 \
    --pass-pct-ge-20 98.0 \
    --pass-max-pct-zero 0.5 \
    --dropout-pct-zero 3.0 \
    --uneven-cv 0.8
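The ordered, first-match-wins evaluation can be sketched as below. This is an illustration and not the code in `report.py`: the function name `classify`, the parameter names, and the mapping of each CLI flag to a table row are assumptions inferred from the flag names. The table does not say what happens when no row matches (for example pct_ge_20 >= 95% with pct_zero between 1% and 5%), so the sketch returns a placeholder string in that case.

```python
def classify(pct_zero, pct_ge_20, mean_depth, cv, max_zero_block_bp=0,
             exons=None, pass_pct_ge_20=95.0, pass_max_pct_zero=1.0,
             dropout_pct_zero=5.0, uneven_cv=1.0,
             exon_pct_ge_20=90.0, exon_max_pct_zero=5.0):
    """Return the first matching status, checked in the documented order.

    exons: optional list of (exon_pct_ge_20, exon_pct_zero) tuples,
    only supplied in exon mode. All names here are hypothetical.
    """
    if pct_zero > dropout_pct_zero or max_zero_block_bp >= 500:
        return "DROP_OUT"
    if mean_depth > 20 and cv > uneven_cv:
        return "UNEVEN"
    for exon_ge_20, exon_zero in exons or ():
        if exon_ge_20 < exon_pct_ge_20 or exon_zero > exon_max_pct_zero:
            return "LOW_EXON"
    if pct_ge_20 < pass_pct_ge_20:
        return "LOW_COVERAGE"
    if pct_zero <= pass_max_pct_zero:
        return "PASS"
    # Not covered by the documented table; placeholder for this sketch only
    return "UNCLASSIFIED"
```

Putting DROP_OUT first means a target with a large zero-coverage gap is reported as a drop-out even if its overall breadth would otherwise look acceptable.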

BED Guardrails

When using --bed, covsnap enforces limits to prevent accidental whole-exome/whole-genome processing:

Parameter             Default  Flag
Max target intervals  2,000    --max-targets
Max total base pairs  50 Mb    --max-total-bp
Max BED file size     50 MB    --max-bed-bytes

When limits are exceeded, the behavior is controlled by --on-large-bed:

Mode                     Behavior
error                    Exit with code 4
warn_and_clip (default)  Keep the first N targets that fit within limits
warn_and_sample          Reservoir-sample N targets (deterministic with --large-bed-seed)
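The two non-fatal modes can be illustrated with a short sketch. It is not taken from `bed.py`; the function names `clip_targets` and `reservoir_sample` are hypothetical, 50 Mb is assumed to mean 50,000,000 bp, and stopping at the first target that would exceed a limit is one possible reading of "first N targets that fit".

```python
import random


def clip_targets(targets, max_targets=2000, max_total_bp=50_000_000):
    """Keep leading (contig, start, end) targets until a limit would be exceeded."""
    kept, total_bp = [], 0
    for contig, start, end in targets:
        length = end - start  # BED intervals are 0-based half-open
        if len(kept) >= max_targets or total_bp + length > max_total_bp:
            break
        kept.append((contig, start, end))
        total_bp += length
    return kept


def reservoir_sample(targets, n, seed=0):
    """Uniformly sample n targets in one pass; the seed makes it repeatable."""
    rng = random.Random(seed)
    kept = []
    for i, target in enumerate(targets):
        if i < n:
            kept.append(target)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                kept[j] = target  # replace a random slot with probability n/(i+1)
    return kept


targets = [("chr1", i * 100, i * 100 + 50) for i in range(10)]
sample = reservoir_sample(targets, 4, seed=1)
```

Reservoir sampling fits a streaming BED parser because it never needs to know the total number of intervals in advance.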

Building the Full Gene Index

The package ships with a built-in dictionary of ~60 clinically relevant genes. For access to the full GENCODE v44 catalog (62,700+ genes, 201,000+ MANE Select exons), build the tabix index:

# Download GENCODE v44 GTF (requires ~1.5 GB)
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.annotation.gtf.gz

# Build the index
python scripts/build_gene_index.py gencode.v44.annotation.gtf.gz

# Files are written to src/covsnap/data/

This creates:

  • hg38_genes.tsv.gz + .tbi — Gene-level tabix index
  • hg38_exons.bed.gz + .tbi — Exon-level tabix index (MANE Select only)
  • hg38_gene_aliases.json.gz — Gene alias mapping

After building, reinstall the package to include the index files:

pip install .

Full CLI Reference

covsnap [-h] [--version] [--bed BED] [--exons] [--reference FASTA]
             [--no-index] [--engine {auto,mosdepth,samtools}]
             [--threads N] [-o FILE] [--emit-lowcov]
             [--lowcov-threshold N] [--lowcov-min-len N]
             [--max-targets N] [--max-total-bp N] [--max-bed-bytes BYTES]
             [--on-large-bed {error,warn_and_clip,warn_and_sample}]
             [--large-bed-seed N] [--pct-thresholds LIST]
             [--pass-pct-ge-20 F] [--pass-max-pct-zero F]
             [--dropout-pct-zero F] [--uneven-cv F]
             [--exon-pct-ge-20 F] [--exon-max-pct-zero F]
             [-v] [--quiet]
             alignment [target]

Positional arguments

Argument   Description
alignment  Path to BAM or CRAM file
target     Gene symbol, comma-separated gene list, or genomic region. Mutually exclusive with --bed
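Since the target argument accepts gene symbols and the feature list mentions alias resolution with fuzzy suggestions, a minimal sketch of that lookup may help. It uses the standard-library `difflib` and is an assumption about how such a lookup could work, not covsnap's implementation. The two aliases come from the README; the `KNOWN_GENES` set is a tiny hypothetical stand-in for the real gene dictionary.

```python
import difflib

ALIASES = {"HER2": "ERBB2", "P53": "TP53"}  # alias examples from the README
KNOWN_GENES = {"BRCA1", "TP53", "ERBB2", "ETFDH"}  # hypothetical subset


def resolve_gene(symbol):
    """Return (canonical_symbol, suggestions).

    canonical_symbol is None when the gene is unknown; suggestions then
    holds close matches that could be printed for the user.
    """
    name = symbol.strip()
    name = ALIASES.get(name, name)
    if name in KNOWN_GENES:
        return name, []
    candidates = sorted(KNOWN_GENES | set(ALIASES))
    return None, difflib.get_close_matches(name, candidates, n=3, cutoff=0.6)


def resolve_targets(target):
    """Resolve a comma-separated gene list such as 'BRCA1,HER2'."""
    return [resolve_gene(part) for part in target.split(",")]
```

The `cutoff` of 0.6 is `difflib`'s default similarity threshold; a real tool might tune it to trade off helpful hints against noisy ones.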

Commonly used options

Flag                     Description                                          Default
--bed BED                BED file of target intervals
--exons                  Enable exon-level statistics (gene mode only)        off
--reference FASTA        Reference FASTA for CRAM decoding
--engine                 Depth engine: auto, mosdepth, samtools               auto
--threads N              Parallel workers for samtools / threads for mosdepth 4
-o FILE / --output FILE  HTML report output path                              covsnap.report.html
--emit-lowcov            Include low-coverage blocks in the report            off
-v / --verbose           Increase verbosity (repeatable)
--quiet                  Suppress non-error output                            off

Coordinate Convention

All output coordinates use 0-based half-open intervals, consistent with BED format:

# A 100 bp region starting at position 1000
contig    start    end      length_bp
chr17     999      1099     100

User-facing region input accepts 1-based inclusive coordinates (e.g. chr17:1000-1099), which are internally converted.
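The conversion described above amounts to subtracting 1 from the start and leaving the end unchanged. A minimal sketch, assuming a hypothetical `parse_region` helper and a simple `contig:start-end` pattern (the parser in covsnap itself may accept more forms):

```python
import re

_REGION_RE = re.compile(r"^(?P<contig>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Convert '1-based inclusive' text into a 0-based half-open tuple."""
    m = _REGION_RE.match(region)
    if m is None:
        raise ValueError(f"not a region: {region!r}")
    start_1based = int(m["start"])
    end_1based = int(m["end"])
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid coordinates: {region!r}")
    # 1-based inclusive [start, end] equals 0-based half-open [start - 1, end)
    return m["contig"], start_1based - 1, end_1based


contig, start, end = parse_region("chr17:1000-1099")
```

With this convention the length is simply `end - start`, which is 100 for the example region.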


Examples

Gene mode with custom output

covsnap sample.bam BRCA1 -o results/brca1.html

Multiple genes with exon breakdown

covsnap sample.bam BRCA1,TP53,ETFDH --exons -o panel_report.html

Multi-gene panel via BED

covsnap sample.bam --bed panel_targets.bed -o panel_report.html

Exon-level analysis with low-coverage output

covsnap sample.bam BRCA1 --exons --emit-lowcov --lowcov-threshold 20

Strict BED guardrails

covsnap sample.bam --bed wes_targets.bed \
    --on-large-bed error \
    --max-targets 500 \
    --max-total-bp 10000000

Using samtools explicitly with more threads

covsnap sample.bam TP53 --engine samtools --threads 8

Exit Codes

Code  Meaning
0     Success
1     Invalid arguments or input validation failure
2     Engine error (samtools/mosdepth failure)
3     Unknown gene name (with fuzzy suggestions printed to stderr)
4     BED guardrail limits exceeded (when --on-large-bed error)
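When covsnap is called from a pipeline script, the exit codes above can be mapped back to readable messages. The wrapper below is a sketch that assumes `covsnap` is installed and on `$PATH`; the helper names `describe_exit` and `run_covsnap` are hypothetical and not part of the package.

```python
import subprocess

# Meanings copied from the exit code table
EXIT_MEANINGS = {
    0: "Success",
    1: "Invalid arguments or input validation failure",
    2: "Engine error (samtools/mosdepth failure)",
    3: "Unknown gene name",
    4: "BED guardrail limits exceeded",
}


def describe_exit(code):
    """Translate a covsnap exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def run_covsnap(*args):
    """Run the covsnap CLI and return (code, meaning, stderr)."""
    proc = subprocess.run(["covsnap", *args], capture_output=True, text=True)
    return proc.returncode, describe_exit(proc.returncode), proc.stderr
```

For example, `run_covsnap("sample.bam", "BRCA1", "-o", "brca1.html")` would run the gene-mode command shown earlier, and the captured stderr would carry any fuzzy suggestions when the code is 3.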

Running Tests

pip install ".[test]"
pytest

The test suite uses synthetic BAM files generated on the fly (no real sequencing data needed). Tests requiring the full GENCODE index or mosdepth are automatically skipped if unavailable.


Project Structure

covsnap/
├── src/covsnap/
│   ├── __init__.py          # Version, build, annotation constants
│   ├── cli.py               # CLI entry point and orchestration
│   ├── annotation.py        # Gene lookup, contig detection, region parsing
│   ├── bed.py               # Streaming BED parser with guardrails
│   ├── metrics.py           # TargetAccumulator (Welford + histogram)
│   ├── engines.py           # samtools / mosdepth depth computation
│   ├── gui.py               # Tkinter graphical interface
│   ├── html_report.py       # Self-contained interactive HTML report
│   ├── report.py            # Classification heuristics
│   └── data/                # Gene/exon tabix indexes (GENCODE v44)
├── tests/                   # Comprehensive test suite
├── scripts/
│   └── build_gene_index.py  # GENCODE GTF → tabix index builder
├── recipes/conda/           # Bioconda-compatible recipe
└── pyproject.toml

License

MIT License. See LICENSE for details.
