Structural-variant phasing from HP-tagged long-read BAMs

SvPhaser

Haplotype-aware structural-variant (SV) genotyper for long-read data


SvPhaser phases pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, …).

Think of it as WhatsHap for insertions/deletions/duplications:

  • we do not discover SVs
  • we assign haplotype genotypes (0|1, 1|0, 1|1, or ./.)
  • and compute a Genotype Quality (GQ) score

All in a single, embarrassingly-parallel pass over the genome.

Highlights

  • Fast per-chromosome multiprocessing (scales across the cores of a multi-core CPU).
  • Deterministic Δ-based decision logic (no MCMC / HMM).
  • CLI + Python API.
  • Non-destructive VCF augmentation: injects phasing fields while preserving the original header and records.
  • Configurable confidence bins + optional plots.
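
To make the idea of a deterministic Δ-based decision concrete, here is a minimal sketch. It is an assumption for illustration, not SvPhaser's documented rule (see docs/methodology.md for that): it takes Δ to be the normalised difference between supporting-read counts on the two haplotypes, and the function name call_genotype, the mapping of haplotype 1 to 1|0, and the handling of ambiguous cases are all hypothetical. The parameter names mirror the CLI options --min-support, --major-delta and --equal-delta.

```python
from typing import Tuple


def call_genotype(
    n_hp1: int,
    n_hp2: int,
    min_support: int = 10,
    major_delta: float = 0.70,
    equal_delta: float = 0.25,
) -> Tuple[str, float]:
    """Return (genotype, delta) from per-haplotype supporting-read counts.

    Illustrative rule only (an assumption, not SvPhaser's documented logic):
    delta = |n_hp1 - n_hp2| / (n_hp1 + n_hp2).
    """
    total = n_hp1 + n_hp2
    # Too little evidence: leave the genotype uncalled.
    if total == 0 or total < min_support:
        return "./.", 0.0
    delta = abs(n_hp1 - n_hp2) / total
    # One haplotype clearly dominates: heterozygous, phased to that haplotype.
    if delta >= major_delta:
        return ("1|0" if n_hp1 > n_hp2 else "0|1"), delta
    # Support is roughly balanced: treat as present on both haplotypes.
    if delta <= equal_delta:
        return "1|1", delta
    # In between the two thresholds: ambiguous, leave uncalled.
    return "./.", delta


for counts in [(18, 2), (2, 18), (10, 10), (14, 6), (3, 2)]:
    print(counts, call_genotype(*counts))
```

Because the rule is a pure function of the counts and thresholds, the same input always yields the same call, which is the practical meaning of "deterministic" here.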

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras (if you use them):

pip install "svphaser[plots]"
pip install "svphaser[bench]"
pip install "svphaser[dev]"

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser expects:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • SVs should already be called by your preferred SV caller.
  2. HP-tagged BAM (long-read alignments)

    • Reads must contain haplotype tags (e.g., HP) produced by an upstream phasing pipeline.

If your BAM is not HP-tagged, SvPhaser cannot assign haplotypes.
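
Before running SvPhaser, it can be worth checking that the BAM really carries HP tags. The sketch below is not part of SvPhaser; the helper names are hypothetical. The core function works on any iterable of objects with a has_tag method, and the second helper assumes pysam is installed and uses its AlignmentFile reader.

```python
from itertools import islice
from typing import Iterable


def hp_tagged_fraction(reads: Iterable, limit: int = 10_000) -> float:
    """Fraction of the first `limit` reads that carry an HP tag.

    `reads` is any iterable of objects exposing has_tag(name),
    such as pysam.AlignedSegment.
    """
    seen = tagged = 0
    for read in islice(reads, limit):
        seen += 1
        if read.has_tag("HP"):
            tagged += 1
    return tagged / seen if seen else 0.0


def hp_tagged_fraction_in_bam(bam_path: str, limit: int = 10_000) -> float:
    """Same check on a BAM file; requires pysam (imported lazily)."""
    import pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True streams reads in file order and needs no index.
        return hp_tagged_fraction(bam.fetch(until_eof=True), limit)
```

A call such as hp_tagged_fraction_in_bam("sample.sorted_phased.bam") returning 0.0 suggests the BAM was never haplotagged. A value below 1.0 is not necessarily a problem, since upstream tools typically leave reads outside phased blocks untagged.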

Quick start (CLI)

svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --major-delta 0.70 \
    --equal-delta 0.25 \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
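
The --gq-bins value is a comma-separated list of threshold:label pairs. As a rough sketch of how such a specification could be interpreted, the code below assumes each threshold is an inclusive lower bound and that the highest matching threshold wins. The function names and the "Low" fallback label are hypothetical, not taken from SvPhaser.

```python
from typing import List, Tuple


def parse_gq_bins(spec: str) -> List[Tuple[float, str]]:
    """Parse 'threshold:label' pairs into a list sorted from high to low."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((float(threshold), label.strip()))
    return sorted(bins, key=lambda b: b[0], reverse=True)


def gq_label(gq: float, bins: List[Tuple[float, str]], fallback: str = "Low") -> str:
    """Return the label of the first (highest) bin whose threshold gq meets."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    # Below every threshold; the fallback name is an assumption.
    return fallback


bins = parse_gq_bins("30:High,10:Moderate")
for gq in (45, 30, 12, 3):
    print(gq, gq_label(gq, bins))
```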

Outputs

Inside results/:

  • *_phased.vcf — your original VCF with additional INFO fields:

    • HP_GT — phased genotype
    • HP_GQ — genotype quality score
    • HP_GQBIN — confidence bin label (based on your --gq-bins)
  • *_phased.csv — tidy table for plotting / downstream analysis
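
Since the phasing results live in the INFO column, they can be pulled out with a few lines of plain Python. The sketch below relies only on the standard VCF layout (tab-separated columns, INFO in the eighth column, key=value entries separated by semicolons). The helper names are hypothetical and the example record is made up for illustration; it is not real SvPhaser output.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO column into a dict; flag-style keys map to None."""
    fields: Dict[str, Optional[str]] = {}
    if info == ".":
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else None
    return fields


def phasing_fields(vcf_line: str) -> Dict[str, Optional[str]]:
    """Extract HP_GT, HP_GQ and HP_GQBIN from one VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the eighth VCF column
    return {key: info.get(key) for key in ("HP_GT", "HP_GQ", "HP_GQBIN")}


# Made-up record for illustration only.
example = (
    "chr1\t10500\tsv1\tN\t<DEL>\t.\tPASS\t"
    "SVTYPE=DEL;SVLEN=-300;PRECISE;HP_GT=1|0;HP_GQ=42;HP_GQBIN=High"
)
print(phasing_fields(example))
```

Values come back as strings; convert HP_GQ with int() or float() as needed once you have confirmed its type in the VCF header.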

For algorithmic details, see: docs/methodology.md.

Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.bam"),
    out_dir=Path("results"),
    min_support=10,
    major_delta=0.70,
    equal_delta=0.25,
    gq_bins="30:High,10:Moderate",
    threads=8,
)

The phased table can also be loaded from the generated CSV for custom analytics.
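
As one example of such custom analytics, the sketch below tallies the values in a single column of the CSV using only the standard library. The helper names are hypothetical, and the column name you pass in is an assumption: the CSV schema is not described here, so check the header row of your own *_phased.csv first.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def count_column(rows: Iterable[Dict[str, str]], column: str) -> Counter:
    """Count how often each value occurs in one column of the table."""
    return Counter(row[column] for row in rows)


def count_column_in_csv(csv_path: str, column: str) -> Counter:
    """Run count_column over a CSV file on disk."""
    with open(csv_path, newline="") as handle:
        return count_column(csv.DictReader(handle), column)
```

For instance, if the table has a genotype column, count_column_in_csv(path, that_column_name) gives a quick breakdown of how many SVs were phased to each haplotype and how many were left uncalled.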

Repository structure (high level)

SvPhaser/
├─ src/svphaser/         # importable package
├─ tests/                # test suite + small fixtures (if present)
├─ docs/                 # methodology + notes
├─ notebooks/            # experiments / analysis (if present)
├─ figures/              # plots & diagrams (if present)
├─ pyproject.toml
└─ CHANGELOG.md

Development

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser

python -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pytest -q
mypy src/svphaser

See CONTRIBUTING.md for contribution guidelines.

Citing SvPhaser

If SvPhaser contributed to your research, please cite:

@software{svphaser2025,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware structural-variant genotyping from HP-tagged long-read BAMs},
  version = {2.0.6},
  year    = {2025},
  month   = nov,
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

(For exact reproducibility in a paper, also cite the specific git commit hash you used.)

License

SvPhaser is released under the MIT License — see LICENSE.

Contact

Developed by Team 5 (BioAI Hackathon).

Issues and feature requests: please open a GitHub issue.


