covsnap
Coverage inspector for targeted sequencing QC (hg38)
covsnap computes per-target (and optionally per-exon) depth-of-coverage metrics from BAM/CRAM files aligned to the human reference genome hg38. It produces a self-contained interactive HTML report with automated PASS/FAIL classification heuristics — designed for clinical and research sequencing QC workflows.
Key Features
- Graphical interface: Run `covsnap` with no arguments to launch a Tkinter GUI with file pickers, mode selection, and progress feedback. Works on Linux, macOS, and Windows.
- Gene-aware analysis: Look up genes by symbol (e.g. `BRCA1`) or analyze multiple genes at once with a comma-separated list (e.g. `BRCA1,TP53,ETFDH`). Ships with a built-in dictionary of ~60 clinically relevant genes and an optional full GENCODE v44 tabix index covering 62,700+ genes.
- Exon-level resolution: Per-exon depth metrics via the `--exons` flag using MANE Select transcripts from GENCODE v44.
- Region and BED modes: Accepts genomic coordinates (`chr17:43044295-43125482`) or a BED file of arbitrary target intervals. Region mode auto-discovers overlapping genes and exons.
- Interactive HTML report: Single self-contained HTML file with summary cards, exon bar charts with smooth color gradients, accordion details, glossary, and PASS/FAIL classifications.
- Streaming architecture: O(1) memory per target using Welford's online algorithm for mean/variance and histogram-based exact median. No per-base depth arrays are ever held in memory.
- Parallel execution: Concurrent samtools and region/exon analysis for faster results.
- Dual engine support: Prefers mosdepth when available; falls back to `samtools depth`.
- Contig auto-detection: Transparently handles both `chr`-prefixed (UCSC) and non-prefixed (Ensembl/1000G) BAM contig naming.
- Gene alias resolution: Common aliases like `HER2 -> ERBB2` and `P53 -> TP53` are resolved automatically, with fuzzy suggestions for typos.
- BED guardrails: Configurable limits on target count, total bases, and file size to prevent accidental whole-exome/whole-genome runs.
- Classification heuristics: Automated PASS / DROP_OUT / UNEVEN / LOW_EXON / LOW_COVERAGE calls with tunable thresholds.
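To make the streaming architecture bullet concrete, here is a minimal sketch of how Welford's online update and a depth histogram can yield mean, standard deviation, and an exact median without storing per-base depths. The class name `DepthAccumulator` is hypothetical and is not covsnap's `TargetAccumulator`; the use of population variance (dividing by n) is also an assumption.

```python
import math
from collections import Counter


class DepthAccumulator:
    """Streaming depth statistics (illustrative sketch, not covsnap's own class)."""

    def __init__(self):
        self.n = 0            # number of positions seen
        self.mean = 0.0       # running mean (Welford)
        self.m2 = 0.0         # running sum of squared deviations (Welford)
        self.hist = Counter() # depth value -> number of positions at that depth

    def add(self, depth):
        """Fold one per-base depth value into the running statistics."""
        self.n += 1
        delta = depth - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (depth - self.mean)
        self.hist[depth] += 1

    def stdev(self):
        """Population standard deviation (assumption: divide by n, not n - 1)."""
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

    def _value_at_rank(self, rank):
        """Return the depth at a 0-based rank in sorted order, using the histogram."""
        seen = 0
        for depth in sorted(self.hist):
            seen += self.hist[depth]
            if seen > rank:
                return depth
        raise IndexError("rank out of range")

    def median(self):
        """Exact median: average of the two middle ranks (identical when n is odd)."""
        if self.n == 0:
            return 0.0
        lower = self._value_at_rank((self.n - 1) // 2)
        upper = self._value_at_rank(self.n // 2)
        return (lower + upper) / 2


acc = DepthAccumulator()
for d in [0, 10, 20, 30, 40]:
    acc.add(d)
```

In this sketch, memory grows with the number of distinct depth values rather than with the length of the target, which is what lets the median stay exact without a per-base array.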
Installation
From Bioconda (recommended)
conda install -c bioconda covsnap
From PyPI
pip install covsnap
From source
git clone https://github.com/enes-ak/covsnap.git
cd covsnap
pip install .
With development/test dependencies
pip install ".[dev]"
Runtime requirements
| Dependency | Version | Required? |
|---|---|---|
| Python | >= 3.9 | Yes |
| pysam | >= 0.22 | Yes |
| numpy | >= 1.24 | Yes |
| samtools | any recent | Yes (engine) |
| mosdepth | >= 0.3 | Optional (preferred engine) |
Note: At least one of `samtools` or `mosdepth` must be on your `$PATH`. With `--engine auto` (the default), covsnap prefers mosdepth and falls back to samtools.
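The preference order in the note can be sketched with the standard library's `shutil.which`. The function `pick_engine` is a hypothetical illustration of the described behavior, not covsnap's actual code; the `which` parameter exists only so the lookup can be swapped out when experimenting.

```python
import shutil


def pick_engine(requested="auto", which=shutil.which):
    """Return 'mosdepth' or 'samtools', preferring mosdepth when requested is 'auto'."""
    if requested != "auto":
        # An explicit engine must be present on PATH.
        if which(requested) is None:
            raise RuntimeError(f"{requested} not found on PATH")
        return requested
    # Auto mode: try mosdepth first, then fall back to samtools.
    for candidate in ("mosdepth", "samtools"):
        if which(candidate) is not None:
            return candidate
    raise RuntimeError("neither mosdepth nor samtools found on PATH")
```

Calling `pick_engine()` on a machine with only samtools installed would return `"samtools"` under these assumptions.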
Quick Start
Graphical interface
Run covsnap with no arguments to launch the GUI:
covsnap
A window opens where you can select your BAM file, choose analysis mode, configure options, and run the analysis — all without typing commands.
Gene mode
Analyze coverage for a gene by name:
covsnap sample.bam BRCA1
This produces covsnap.report.html — an interactive HTML report with coverage metrics and PASS/FAIL classification.
Multiple genes
Analyze several genes in a single run with a comma-separated list:
covsnap sample.bam BRCA1,TP53,ETFDH --exons
With exon-level detail
covsnap sample.bam BRCA1 --exons
Region mode
Specify an explicit genomic region (1-based inclusive coordinates). Overlapping genes and exons are auto-discovered:
covsnap sample.bam chr17:43044295-43125482
BED mode
Use a BED file of target intervals:
covsnap sample.bam --bed targets.bed
Custom output path
covsnap sample.bam BRCA1 -o my_report.html
CRAM files
covsnap sample.cram BRCA1 --reference hg38.fa
HTML Report
covsnap produces a single self-contained HTML file (no external dependencies) containing:
- Summary cards: key metrics at a glance (mean depth, coverage breadth, classification)
- Exon bar chart: per-exon coverage with smooth HSL color gradient (red → amber → teal)
- Accordion details: expandable per-target and per-exon metrics
- Low-coverage blocks: contiguous regions below threshold (when `--emit-lowcov` is used)
- Glossary: definitions of all metrics and classification terms
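As an illustration of what a low-coverage block is, the sketch below scans a stream of per-base depths and reports contiguous runs below a threshold. The function `lowcov_blocks` and its default values are assumptions made for this example; covsnap's real defaults for `--lowcov-threshold` and `--lowcov-min-len` are not stated here.

```python
def lowcov_blocks(depths, start, threshold=20, min_len=1):
    """Yield (block_start, block_end) as 0-based half-open intervals.

    depths    -- iterable of per-base depth values, consumed one at a time
    start     -- 0-based coordinate of the first depth value
    threshold -- positions with depth < threshold count as low coverage
    min_len   -- runs shorter than this many bases are dropped
    """
    block_start = None
    pos = start
    for depth in depths:
        if depth < threshold:
            if block_start is None:
                block_start = pos  # a new low-coverage run begins here
        elif block_start is not None:
            if pos - block_start >= min_len:
                yield (block_start, pos)
            block_start = None
        pos += 1
    # Close a run that extends to the end of the target.
    if block_start is not None and pos - block_start >= min_len:
        yield (block_start, pos)


blocks = list(lowcov_blocks([30, 5, 5, 30, 0, 0, 0, 30, 10], start=100, min_len=2))
```

Because the generator keeps only the current run's start position, it fits the same streaming model as the depth metrics.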
Classification Heuristics
Each target is classified using ordered heuristics (first match wins):
| Status | Condition |
|---|---|
| DROP_OUT | pct_zero > 5% OR any zero-coverage block >= 500 bp |
| UNEVEN | mean_depth > 20 AND coefficient of variation > 1.0 |
| LOW_EXON | Any exon with pct_ge_20 < 90% or pct_zero > 5% (exon mode only) |
| LOW_COVERAGE | pct_ge_20 < 95% |
| PASS | pct_ge_20 >= 95% AND pct_zero <= 1% |
All thresholds are tunable via CLI flags:
covsnap sample.bam BRCA1 \
--pass-pct-ge-20 98.0 \
--pass-max-pct-zero 0.5 \
--dropout-pct-zero 3.0 \
--uneven-cv 0.8
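The ordered, first-match-wins logic of the table can be sketched as follows. The names `Thresholds`, `classify`, and the metric key `max_zero_block_bp` are hypothetical; the default values are taken from the table above. The table does not say what happens when `pct_ge_20 >= 95%` but `pct_zero` exceeds the PASS limit, so this sketch returns `None` for that case rather than guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Defaults copied from the classification table; field names are illustrative."""
    pass_pct_ge_20: float = 95.0
    pass_max_pct_zero: float = 1.0
    dropout_pct_zero: float = 5.0
    dropout_zero_block_bp: int = 500
    uneven_min_mean: float = 20.0
    uneven_cv: float = 1.0
    exon_pct_ge_20: float = 90.0
    exon_max_pct_zero: float = 5.0


def classify(target, exons=(), t=Thresholds()):
    """Apply the heuristics in table order; the first matching rule wins."""
    if (target["pct_zero"] > t.dropout_pct_zero
            or target["max_zero_block_bp"] >= t.dropout_zero_block_bp):
        return "DROP_OUT"
    if target["mean_depth"] > t.uneven_min_mean and target["cv"] > t.uneven_cv:
        return "UNEVEN"
    # Exon rule only applies when per-exon metrics are supplied (exon mode).
    if any(e["pct_ge_20"] < t.exon_pct_ge_20 or e["pct_zero"] > t.exon_max_pct_zero
           for e in exons):
        return "LOW_EXON"
    if target["pct_ge_20"] < t.pass_pct_ge_20:
        return "LOW_COVERAGE"
    if target["pct_zero"] <= t.pass_max_pct_zero:
        return "PASS"
    return None  # case not covered by the table


good = {"pct_zero": 0.0, "max_zero_block_bp": 0, "mean_depth": 100.0,
        "cv": 0.3, "pct_ge_20": 99.0}
status = classify(good)
```

Stricter settings, such as those in the command above, would correspond to passing `Thresholds(pass_pct_ge_20=98.0, pass_max_pct_zero=0.5, dropout_pct_zero=3.0, uneven_cv=0.8)` in this sketch.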
BED Guardrails
When using --bed, covsnap enforces limits to prevent accidental whole-exome/whole-genome processing:
| Parameter | Default | Flag |
|---|---|---|
| Max target intervals | 2,000 | --max-targets |
| Max total base pairs | 50 Mb | --max-total-bp |
| Max BED file size | 50 MB | --max-bed-bytes |
When limits are exceeded, the behavior is controlled by --on-large-bed:
| Mode | Behavior |
|---|---|
| `error` | Exit with code 4 |
| `warn_and_clip` (default) | Keep the first N targets that fit within limits |
| `warn_and_sample` | Reservoir sample N targets (deterministic with `--large-bed-seed`) |
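For readers unfamiliar with reservoir sampling, the sketch below shows the classic single-pass variant (Algorithm R) with a fixed seed, which is one way a sample can be both streaming and reproducible. Whether covsnap uses this exact variant is an assumption; `reservoir_sample` is a hypothetical name.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Pick k items uniformly at random from an iterable in a single pass."""
    rng = random.Random(seed)  # private generator, so a given seed reproduces the sample
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace a slot with probability k / (i + 1)
    return reservoir


targets = [("chr1", n * 1000, n * 1000 + 200) for n in range(100)]
sample = reservoir_sample(targets, k=10, seed=42)
```

The sampler never needs to know the total number of targets up front, so it works on a BED file that is read line by line.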
Building the Full Gene Index
The package ships with a built-in dictionary of ~60 clinically relevant genes. For access to the full GENCODE v44 catalog (62,700+ genes, 201,000+ MANE Select exons), build the tabix index:
# Download GENCODE v44 GTF (requires ~1.5 GB)
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.annotation.gtf.gz
# Build the index
python scripts/build_gene_index.py gencode.v44.annotation.gtf.gz
# Files are written to src/covsnap/data/
This creates:
- `hg38_genes.tsv.gz` + `.tbi`: Gene-level tabix index
- `hg38_exons.bed.gz` + `.tbi`: Exon-level tabix index (MANE Select only)
- `hg38_gene_aliases.json.gz`: Gene alias mapping
After building, reinstall the package to include the index files:
pip install .
Full CLI Reference
covsnap [-h] [--version] [--bed BED] [--exons] [--reference FASTA]
[--no-index] [--engine {auto,mosdepth,samtools}]
[--threads N] [-o FILE] [--emit-lowcov]
[--lowcov-threshold N] [--lowcov-min-len N]
[--max-targets N] [--max-total-bp N] [--max-bed-bytes BYTES]
[--on-large-bed {error,warn_and_clip,warn_and_sample}]
[--large-bed-seed N] [--pct-thresholds LIST]
[--pass-pct-ge-20 F] [--pass-max-pct-zero F]
[--dropout-pct-zero F] [--uneven-cv F]
[--exon-pct-ge-20 F] [--exon-max-pct-zero F]
[-v] [--quiet]
alignment [target]
Positional arguments
| Argument | Description |
|---|---|
| `alignment` | Path to BAM or CRAM file |
| `target` | Gene symbol, comma-separated gene list, or genomic region. Mutually exclusive with `--bed` |
Commonly used options
| Flag | Description | Default |
|---|---|---|
| `--bed BED` | BED file of target intervals | - |
| `--exons` | Enable exon-level statistics (gene mode only) | off |
| `--reference FASTA` | Reference FASTA for CRAM decoding | - |
| `--engine` | Depth engine: `auto`, `mosdepth`, `samtools` | `auto` |
| `--threads N` | Parallel workers for samtools / threads for mosdepth | 4 |
| `-o FILE` / `--output FILE` | HTML report output path | `covsnap.report.html` |
| `--emit-lowcov` | Include low-coverage blocks in the report | off |
| `-v` / `--verbose` | Increase verbosity (repeatable) | - |
| `--quiet` | Suppress non-error output | off |
Coordinate Convention
All output coordinates use 0-based half-open intervals, consistent with BED format:
# A 100 bp region starting at position 1000
contig start end length_bp
chr17 999 1099 100
User-facing region input accepts 1-based inclusive coordinates (e.g. chr17:1000-1099), which are internally converted.
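The conversion described above amounts to subtracting one from the start and leaving the end unchanged. A minimal sketch, assuming a plain `contig:start-end` string with no thousands separators; `parse_region` is a hypothetical helper, not covsnap's own parser:

```python
import re

# contig name, then 1-based inclusive start and end
_REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(region):
    """Convert 'contig:start-end' (1-based inclusive) to (contig, start, end) 0-based half-open."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"cannot parse region: {region!r}")
    contig = match.group(1)
    start_1based = int(match.group(2))
    end_1based = int(match.group(3))
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    # The 1-based inclusive end equals the 0-based exclusive end, so only the start shifts.
    return contig, start_1based - 1, end_1based


contig, start, end = parse_region("chr17:1000-1099")
length_bp = end - start
```

With half-open output, the length is simply `end - start`, which matches the `length_bp` column in the example above.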
Examples
Gene mode with custom output
covsnap sample.bam BRCA1 -o results/brca1.html
Multiple genes with exon breakdown
covsnap sample.bam BRCA1,TP53,ETFDH --exons -o panel_report.html
Multi-gene panel via BED
covsnap sample.bam --bed panel_targets.bed -o panel_report.html
Exon-level analysis with low-coverage output
covsnap sample.bam BRCA1 --exons --emit-lowcov --lowcov-threshold 20
Strict BED guardrails
covsnap sample.bam --bed wes_targets.bed \
--on-large-bed error \
--max-targets 500 \
--max-total-bp 10000000
Using samtools explicitly with more threads
covsnap sample.bam TP53 --engine samtools --threads 8
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Invalid arguments or input validation failure |
| 2 | Engine error (samtools/mosdepth failure) |
| 3 | Unknown gene name (with fuzzy suggestions printed to stderr) |
| 4 | BED guardrail limits exceeded (when --on-large-bed error) |
Running Tests
pip install ".[test]"
pytest
The test suite uses synthetic BAM files generated on the fly (no real sequencing data needed). Tests requiring the full GENCODE index or mosdepth are automatically skipped if unavailable.
Project Structure
covsnap/
├── src/covsnap/
│ ├── __init__.py # Version, build, annotation constants
│ ├── cli.py # CLI entry point and orchestration
│ ├── annotation.py # Gene lookup, contig detection, region parsing
│ ├── bed.py # Streaming BED parser with guardrails
│ ├── metrics.py # TargetAccumulator (Welford + histogram)
│ ├── engines.py # samtools / mosdepth depth computation
│ ├── gui.py # Tkinter graphical interface
│ ├── html_report.py # Self-contained interactive HTML report
│ ├── report.py # Classification heuristics
│ └── data/ # Gene/exon tabix indexes (GENCODE v44)
├── tests/ # Comprehensive test suite
├── scripts/
│ └── build_gene_index.py # GENCODE GTF → tabix index builder
├── recipes/conda/ # Bioconda-compatible recipe
└── pyproject.toml
License
MIT License. See LICENSE for details.