Structural-variant phasing from HP-tagged long-read BAMs
SvPhaser
Haplotype-aware structural-variant (SV) genotyper for long-read data
SvPhaser phases pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, …).
Think of it as WhatsHap for insertions/deletions/duplications:
- we do not discover SVs
- we assign haplotype genotypes (0|1, 1|0, 1|1, or ./.)
- and compute a Genotype Quality (GQ) score
All in a single, embarrassingly-parallel pass over the genome.
Highlights
- Fast per-chromosome multiprocessing (scale-out on multi-core CPUs).
- Deterministic Δ-based decision logic (no MCMC / HMM).
- CLI + Python API.
- Non-destructive VCF augmentation: injects phasing fields while preserving the original header and records.
- Configurable confidence bins + optional plots.
Installation
From PyPI (recommended)
# Requires Python >= 3.9
pip install svphaser
Optional extras (if you use them):
pip install "svphaser[plots]"
pip install "svphaser[bench]"
pip install "svphaser[dev]"
From source
git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .
Inputs & requirements
SvPhaser expects:
- Unphased SV VCF (.vcf / .vcf.gz)
  - SVs should already be called by your preferred SV caller.
- HP-tagged BAM (long-read alignments)
  - Reads must contain haplotype tags (e.g., HP) produced by an upstream phasing pipeline.
If your BAM is not HP-tagged, SvPhaser cannot assign haplotypes.
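Before running SvPhaser it can be worth confirming that the BAM really carries HP tags. The following is a minimal sketch, not part of SvPhaser: `hp_tagged_fraction` and `check_bam` are hypothetical helper names, and `check_bam` assumes the third-party pysam package is installed.

```python
from itertools import islice
from typing import Iterable


def hp_tagged_fraction(reads: Iterable, limit: int = 10_000) -> float:
    """Return the fraction of the first `limit` reads that carry an HP tag.

    `reads` is any iterable of objects exposing `has_tag(name)`, such as
    pysam.AlignedSegment. Returns 0.0 when no reads are seen.
    """
    seen = 0
    tagged = 0
    for read in islice(reads, limit):
        seen += 1
        if read.has_tag("HP"):
            tagged += 1
    return tagged / seen if seen else 0.0


def check_bam(path: str, limit: int = 10_000) -> float:
    """Sample the start of a BAM file and report its HP-tagged fraction."""
    import pysam  # third-party dependency, imported lazily (assumption: installed)

    with pysam.AlignmentFile(path, "rb") as bam:
        # until_eof=True streams reads in file order without needing an index.
        return hp_tagged_fraction(bam.fetch(until_eof=True), limit)
```

A fraction of 0.0 means the sampled reads have no haplotype tags. Unmapped or unphaseable reads are normally untagged, so a value below 1.0 is expected even for a correctly phased BAM.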
Quick start (CLI)
svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --major-delta 0.70 \
    --equal-delta 0.25 \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
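To give a feel for how thresholds like these could interact, here is a minimal sketch of one possible delta-style decision rule. It is an illustration under stated assumptions, not SvPhaser's actual implementation: the function name `classify_gt` is hypothetical, and the exact meaning of each threshold (support counted over all tagged reads, majority measured as a read fraction, near-tie measured as a normalized difference) is assumed here. The authoritative rules are in docs/methodology.md.

```python
def classify_gt(
    n1: int,
    n2: int,
    min_support: int = 10,
    major_delta: float = 0.70,
    equal_delta: float = 0.25,
) -> str:
    """Assign a phased genotype from HP1/HP2 supporting-read counts.

    Assumed semantics (illustrative only):
    - fewer than `min_support` tagged reads in total: no call
    - one haplotype holds at least `major_delta` of the reads: heterozygous
    - normalized difference at most `equal_delta`: homozygous
    - anything in between: no call
    """
    total = n1 + n2
    if total < min_support:
        return "./."
    if n1 / total >= major_delta:
        return "1|0"
    if n2 / total >= major_delta:
        return "0|1"
    if abs(n1 - n2) / total <= equal_delta:
        return "1|1"
    return "./."


if __name__ == "__main__":
    for counts in [(18, 2), (2, 18), (10, 10), (13, 7), (3, 2)]:
        print(counts, classify_gt(*counts))
```

Because the rule is a fixed sequence of comparisons, the same read counts always yield the same genotype, which is what makes this style of logic deterministic.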
Outputs
Inside results/:
- *_phased.vcf: your original VCF with additional INFO fields:
  - HP_GT: phased genotype
  - HP_GQ: genotype quality score
  - HP_GQBIN: confidence bin label (based on your --gq-bins)
- *_phased.csv: tidy table for plotting / downstream analysis
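The added INFO fields can be read back with any VCF parser. As a dependency-free sketch, the helpers below (hypothetical names `parse_info` and `phasing_fields`) split the INFO column of one tab-separated record and return the three phasing fields. The record in the usage line is made up for illustration.

```python
from typing import Dict, Optional, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields: Dict[str, InfoValue] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def phasing_fields(vcf_line: str) -> Dict[str, Optional[InfoValue]]:
    """Extract HP_GT, HP_GQ and HP_GQBIN from one VCF data line.

    Missing fields are returned as None. Values are left as strings.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the eighth VCF column
    return {key: info.get(key) for key in ("HP_GT", "HP_GQ", "HP_GQBIN")}


if __name__ == "__main__":
    # Illustrative record, not real output.
    line = "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500;HP_GT=1|0;HP_GQ=42;HP_GQBIN=High"
    print(phasing_fields(line))
```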
For algorithmic details, see: docs/methodology.md.
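As an illustration of how a bin specification such as "30:High,10:Moderate" might map a GQ value to a label, here is a minimal sketch. It assumes each entry is a lower-bound threshold and that scores below every threshold receive a fallback label. The function names and the "Low" fallback are hypothetical, not taken from SvPhaser.

```python
from typing import List, Tuple


def parse_gq_bins(spec: str) -> List[Tuple[int, str]]:
    """Parse 'threshold:label' pairs into a list sorted by descending threshold."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.strip().split(":", 1)
        bins.append((int(threshold), label.strip()))
    return sorted(bins, reverse=True)


def gq_label(gq: int, bins: List[Tuple[int, str]], fallback: str = "Low") -> str:
    """Return the label of the highest threshold that `gq` meets.

    The fallback label for scores below every threshold is an assumption.
    """
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return fallback


if __name__ == "__main__":
    bins = parse_gq_bins("30:High,10:Moderate")
    for gq in (45, 30, 12, 3):
        print(gq, gq_label(gq, bins))
```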
Python API
from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.bam"),
    out_dir=Path("results"),
    min_support=10,
    major_delta=0.70,
    equal_delta=0.25,
    gq_bins="30:High,10:Moderate",
    threads=8,
)
The phased table can also be loaded from the generated CSV for custom analytics.
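A small standard-library sketch of such an analysis is shown below: it tallies the values of one column. The CSV column names are not documented here, so the column is passed in by the caller. The helper name `tally_column` and the `gt` column in the usage lines are assumptions for illustration.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_column(lines: Iterable[str], column: str) -> Counter:
    """Count the distinct values in one column of a CSV with a header row.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(
            f"column {column!r} not found; available: {reader.fieldnames}"
        )
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Illustrative rows; the real file is results/*_phased.csv and its
    # header may differ, so check it before choosing a column.
    demo = ["id,gt", "sv1,1|0", "sv2,0|1", "sv3,1|0"]
    print(tally_column(demo, "gt"))
```

For a real run, open the generated file with `open(path, newline="")` and pass the handle as `lines`.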
Repository structure (high level)
SvPhaser/
├─ src/svphaser/ # importable package
├─ tests/ # test suite + small fixtures (if present)
├─ docs/ # methodology + notes
├─ notebooks/ # experiments / analysis (if present)
├─ figures/ # plots & diagrams (if present)
├─ pyproject.toml
└─ CHANGELOG.md
Development
git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest -q
mypy src/svphaser
See CONTRIBUTING.md for contribution guidelines.
Citing SvPhaser
If SvPhaser contributed to your research, please cite:
@software{svphaser2025,
author = {Pranjul Mishra and Sachin Gadakh},
title = {SvPhaser: Haplotype-aware structural-variant genotyping from HP-tagged long-read BAMs},
version = {2.0.6},
year = {2025},
month = nov,
url = {https://github.com/SFGLab/SvPhaser},
note = {PyPI: https://pypi.org/project/svphaser/}
}
(If you need maximum rigor for a paper, cite a specific git commit hash too.)
License
SvPhaser is released under the MIT License — see LICENSE.
Contact
Developed by Team 5 (BioAI Hackathon).
- Pranjul Mishra — pranjul.mishra@proton.me
- Sachin Gadakh — s.gadakh@cent.uw.edu.pl
Issues and feature requests: please open a GitHub issue.