Structural-variant phasing from HP-tagged long-read BAMs
SvPhaser
Haplotype-aware structural-variant (SV) phasing and genotyping from long-read data
SvPhaser assigns haplotype-aware genotypes to pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, etc.).
It fills a critical gap in long-read SV analysis:
- SV callers (e.g. Sniffles2) discover variants
- SvPhaser phases and genotypes them (1|0, 0|1, 1|1, or ./.) with explicit read-level evidence and a quantitative genotype quality (GQ)
SvPhaser is caller-agnostic, deterministic, and designed for large-scale benchmarking and biological interpretation.
Key features
- Post-hoc SV phasing from HP-tagged BAM/CRAM (no re-calling required)
- Per-chromosome parallelization (efficient on HPC and multi-core systems)
- SV-type-aware evidence detection (DEL / INS / INV / BND / DUP)
- Deterministic Δ-based decision logic (no HMMs, no sampling)
- Explicit confidence modeling via GQ and reason codes
- CSV-first design for transparent benchmarking and debugging
- VCF-compliant output with rich SVP_* INFO annotations
Installation
From PyPI (recommended)
# Requires Python >= 3.9
pip install svphaser
Optional extras:
pip install "svphaser[plots]" # plotting utilities
pip install "svphaser[bench]" # benchmarking helpers
pip install "svphaser[dev]" # development + linting
From source
git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .
Inputs & requirements
SvPhaser requires two inputs only:
- Unphased SV VCF (.vcf / .vcf.gz)
  - Produced by an SV caller (e.g. Sniffles2)
  - May optionally contain RNAMES INFO for precise read support
- HP-tagged BAM/CRAM
  - Long-read alignments with haplotype tags (HP=1/2)
  - Generated by an upstream phasing pipeline (e.g. WhatsHap)
⚠️ If the BAM does not contain HP tags, SvPhaser cannot assign haplotypes.
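Before running SvPhaser, it can be worth checking how many reads actually carry an HP tag. The sketch below is not part of SvPhaser. It counts HP values in SAM-formatted text lines (for example, the output of samtools view) and relies only on the SAM convention that optional fields start at column 12 and are written as TAG:TYPE:VALUE. The function name count_hp_tags is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_hp_tags(sam_lines: Iterable[str]) -> Counter:
    """Count reads per HP value in SAM text; untagged reads go under 'none'."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        hp = "none"
        # The 11 mandatory SAM columns come first; optional tags follow.
        for field in fields[11:]:
            if field.startswith("HP:i:"):
                hp = field[len("HP:i:"):]
                break
        counts[hp] += 1
    return counts
```

One way to use it is to pipe a sample of alignments into a small script that calls count_hp_tags(sys.stdin). If nearly every read lands under 'none', the BAM has probably not been haplotagged.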
Quick start (CLI)
svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --min-tagged-support 3 \
    --major-delta 0.60 \
    --equal-delta 0.10 \
    --support-mode hybrid \
    --dynamic-window \
    --tie-to-hom-alt \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
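
The --gq-bins value reads as a comma-separated list of threshold:label pairs. The sketch below shows one plausible interpretation, in which a GQ receives the label of the highest threshold it meets. These semantics, the fallback label "Low", and the function names parse_gq_bins and label_gq are assumptions for illustration; they are not taken from SvPhaser's code.

```python
from typing import List, Tuple


def parse_gq_bins(spec: str) -> List[Tuple[int, str]]:
    """Parse 'threshold:label' pairs and sort them by descending threshold."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((int(threshold), label.strip()))
    return sorted(bins, key=lambda b: b[0], reverse=True)


def label_gq(gq: int, bins: List[Tuple[int, str]], fallback: str = "Low") -> str:
    """Return the label of the first bin whose threshold the GQ meets."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return fallback  # assumed default for values below every threshold


bins = parse_gq_bins("30:High,10:Moderate")
print(label_gq(42, bins), label_gq(15, bins), label_gq(3, bins))
```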
Outputs
For an input sample.vcf.gz, SvPhaser produces:
- sample_phased.csv: primary analysis artifact
  - Per-SV read support (hp1, hp2, nohp)
  - Derived metrics (tagged_total, support_total, Δ)
  - Final decisions (gt, gq, reason)
- sample_phased.vcf(.gz): interoperability output
  - FORMAT/GT, FORMAT/GQ
  - Optional SVP_* INFO annotations when --svp-info is enabled
The CSV is intended for benchmarking, visualization, and interpretation; the VCF is a downstream-consumable representation.
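Because the CSV is the primary artifact, a few lines of standard-library Python are enough for a first look at the calls. The sketch below tallies genotypes above a GQ cutoff. It assumes the gt and gq columns are named as listed above and that gq is numeric in every row. The function name summarize_genotypes is hypothetical, and the demo rows are synthetic.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def summarize_genotypes(handle: TextIO, min_gq: float = 0) -> Counter:
    """Tally the gt column of a phased CSV, keeping rows with gq >= min_gq."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        if float(row["gq"]) >= min_gq:
            counts[row["gt"]] += 1
    return counts


# Synthetic rows for illustration only; real files contain more columns.
demo = io.StringIO(
    "hp1,hp2,nohp,gt,gq\n"
    "12,1,2,1|0,45\n"
    "0,9,3,0|1,38\n"
    "6,7,1,1|1,12\n"
    "1,1,0,./.,0\n"
)
print(summarize_genotypes(demo, min_gq=10))
```

For a real run, pass a file handle instead, for example one opened with open("results/sample_phased.csv", newline="").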
Algorithm & methodology
A full, implementation-faithful description of the algorithm is provided in docs/Methodology.md. It covers:
- evidence collection
- haplotype decision logic
- pseudoalgorithm
- workflow diagram
This document is the authoritative reference for reviewers and users seeking algorithmic clarity.
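For orientation only, here is a minimal sketch of what a deterministic Δ-based decision could look like. It is not SvPhaser's implementation. It assumes tagged_total = hp1 + hp2, support_total = tagged_total + nohp, and Δ = |hp1 - hp2| / tagged_total. The order of the checks, the treatment of the zone between the two thresholds, and the meaning given to tie_to_hom_alt are likewise assumptions. Parameter names mirror the CLI options; the function name decide_genotype is hypothetical.

```python
def decide_genotype(
    hp1: int,
    hp2: int,
    nohp: int,
    min_support: int = 10,
    min_tagged_support: int = 3,
    major_delta: float = 0.60,
    equal_delta: float = 0.10,
    tie_to_hom_alt: bool = True,
) -> str:
    """Illustrative delta-based call; NOT SvPhaser's actual implementation."""
    tagged_total = hp1 + hp2             # assumed definition
    support_total = tagged_total + nohp  # assumed definition
    if (
        tagged_total == 0
        or support_total < min_support
        or tagged_total < min_tagged_support
    ):
        return "./."  # too little evidence to call
    delta = abs(hp1 - hp2) / tagged_total  # assumed haplotype imbalance
    if delta >= major_delta:
        # One haplotype clearly dominates: heterozygous, phased to that side.
        return "1|0" if hp1 > hp2 else "0|1"
    if delta <= equal_delta:
        # Near-equal support: homozygous alt if ties are resolved that way.
        return "1|1" if tie_to_hom_alt else "./."
    return "./."  # ambiguous zone between the two thresholds
```

Every branch is a plain threshold comparison, so the same counts always yield the same call.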
Python API
from pathlib import Path

from svphaser.phasing.io import phase_vcf

# Same options as the CLI quick start, passed as keyword arguments.
phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.sorted_phased.bam"),
    out_dir=Path("results"),
    min_support=10,
    min_tagged_support=3,
    major_delta=0.60,
    equal_delta=0.10,
    support_mode="hybrid",
    dynamic_window=True,
    tie_to_hom_alt=True,
    gq_bins="30:High,10:Moderate",
    threads=8,
)
Repository structure
SvPhaser/
├─ src/svphaser/ # core package
├─ docs/ # methodology & design notes
├─ tests/ # unit + regression tests
├─ notebooks/ # benchmarking & analysis
├─ pyproject.toml
├─ README.md
└─ CHANGELOG.md
Citing SvPhaser
If SvPhaser contributes to your research, please cite:
@software{svphaser2026,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware phasing of structural variants from long-read data},
  version = {2.1.x},
  year    = {2026},
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}
For maximum reproducibility, include the exact git commit hash used.
License
SvPhaser is released under the MIT License — see LICENSE.
Contact
Developed at SFG Lab (BioAI).
- Pranjul Mishra — pranjul.mishra@proton.me
- Sachin Gadakh — s.gadakh@cent.uw.edu.pl
Bug reports and feature requests: please open a GitHub issue.
File details
Details for the file svphaser-2.2.0.tar.gz.
File metadata
- Download URL: svphaser-2.2.0.tar.gz
- Upload date:
- Size: 21.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6cbcab3d704af7948b8a529648bb03f591b27fd2a629c856b43e474bbf62844 |
| MD5 | d6b1f7fc7e8b4f7de05404d2bf159c52 |
| BLAKE2b-256 | c299f05e791f7b5c12bf8a7685f8685eb5e85e48dc9c049246886ffce340bffe |
File details
Details for the file svphaser-2.2.0-py3-none-any.whl.
File metadata
- Download URL: svphaser-2.2.0-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 081125191e6d90d891dc5dfbd6c8b7750bffc3ce85412b30df92f835b148ca3a |
| MD5 | 94e978ece7d1943c6c9361c40ad1da99 |
| BLAKE2b-256 | 4ad3524b45ad1e6d5cc351d5840a2e5773b9426312aa303785a886ff2d80c0e5 |