Structural-variant phasing from HP-tagged long-read BAMs

SvPhaser

Haplotype-aware structural-variant (SV) phasing and genotyping from long-read data


SvPhaser assigns haplotype-aware genotypes to pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, etc.).

It fills a critical gap in long-read SV analysis:

  • SV callers (e.g. Sniffles2) discover variants
  • SvPhaser phases and genotypes them (1|0, 0|1, 1|1, or ./.)
  • with explicit read-level evidence and a quantitative genotype quality (GQ)

SvPhaser is caller-agnostic, deterministic, and designed for large-scale benchmarking and biological interpretation.


Key features

  • Post-hoc SV phasing from HP-tagged BAM/CRAM (no re-calling required)
  • Per-chromosome parallelization (efficient on HPC and multi-core systems)
  • SV-type-aware evidence detection (DEL / INS / INV / BND / DUP)
  • Deterministic Δ-based decision logic (no HMMs, no sampling)
  • Explicit confidence modeling via GQ and reason codes
  • CSV-first design for transparent benchmarking and debugging
  • VCF-compliant output with rich SVP_* INFO annotations

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras:

pip install "svphaser[plots]"   # plotting utilities
pip install "svphaser[bench]"   # benchmarking helpers
pip install "svphaser[dev]"     # development + linting

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser requires only two inputs:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • Produced by an SV caller (e.g. Sniffles2)
    • May contain an RNAMES INFO field for precise read support
  2. HP-tagged BAM/CRAM

    • Long-read alignments with haplotype tags (HP=1/2)
    • Generated by an upstream phasing pipeline (e.g. WhatsHap)

⚠️ If the BAM does not contain HP tags, SvPhaser cannot assign haplotypes.
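
A quick way to confirm that HP tags are present is to count them in a sample of reads before running SvPhaser. The sketch below is not part of SvPhaser; it assumes the reads are supplied as SAM-format text lines (for example, the output of `samtools view`) and that haplotypes are stored in the standard optional field `HP:i:<n>`. The function name `count_hp_tags` is hypothetical.

```python
import re
from collections import Counter

# Optional SAM field carrying the haplotype, e.g. "HP:i:1".
HP_TAG = re.compile(r"^HP:i:(\d+)$")


def count_hp_tags(sam_lines):
    """Count reads per HP value; reads without an HP tag are counted under None."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        hp = None
        for field in fields[11:]:  # optional fields start at column 12
            match = HP_TAG.match(field)
            if match:
                hp = int(match.group(1))
                break
        counts[hp] += 1
    return counts


# Illustrative records only (made-up reads, not real data).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tHP:i:1\tPS:i:100",
    "r2\t0\tchr1\t120\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tHP:i:2\tPS:i:100",
    "r3\t0\tchr1\t140\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
counts = count_hp_tags(example)
print(counts[1], counts[2], counts[None])
```

If every read lands in the `None` bucket, the alignments have not been haplotagged and need to go through the upstream phasing step first.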


Quick start (CLI)

svphaser phase \
  sample_unphased.vcf.gz \
  sample.sorted_phased.bam \
  --out-dir results/ \
  --min-support 10 \
  --min-tagged-support 3 \
  --major-delta 0.60 \
  --equal-delta 0.10 \
  --support-mode hybrid \
  --dynamic-window \
  --tie-to-hom-alt \
  --gq-bins "30:High,10:Moderate" \
  --threads 32
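
To build intuition for how the thresholds above might interact, here is a minimal sketch of a Δ-threshold decision rule. It is an assumption-laden illustration, not SvPhaser's implementation: the definition Δ = |hp1 - hp2| / (hp1 + hp2), the definition of total support as hp1 + hp2 + nohp, the order of the checks, and the reason strings are all hypothetical. The authoritative logic is described in docs/Methodology.md.

```python
def call_genotype(hp1, hp2, nohp, *, min_support=10, min_tagged_support=3,
                  major_delta=0.60, equal_delta=0.10, tie_to_hom_alt=True):
    """Hypothetical Δ-based call; returns (genotype, reason). Not SvPhaser's code."""
    tagged_total = hp1 + hp2               # reads carrying an HP tag
    support_total = tagged_total + nohp    # assumed: tagged plus untagged reads

    # Too little evidence overall, or too few haplotagged reads: no call.
    if (support_total < min_support
            or tagged_total < min_tagged_support
            or tagged_total == 0):
        return "./.", "low_support"

    # Assumed definition of Δ: normalized imbalance between the two haplotypes.
    delta = abs(hp1 - hp2) / tagged_total

    if delta >= major_delta:
        # One haplotype clearly dominates: heterozygous, phased to that side.
        return ("1|0" if hp1 > hp2 else "0|1"), "major_haplotype"
    if delta <= equal_delta:
        # Near-equal support on both haplotypes.
        if tie_to_hom_alt:
            return "1|1", "balanced"
        return "./.", "tie"
    return "./.", "ambiguous"


print(call_genotype(9, 1, 2))
print(call_genotype(5, 5, 2))
print(call_genotype(7, 3, 2))
```

Under these assumptions, a 9:1 split is phased to haplotype 1, a 5:5 split becomes homozygous alternate when `tie_to_hom_alt` is set, and a 7:3 split falls between the two thresholds and stays uncalled.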

Outputs

For an input sample.vcf.gz, SvPhaser produces:

  • sample_phased.csv: primary analysis artifact

    • Per-SV read support (hp1, hp2, nohp)
    • Derived metrics (tagged_total, support_total, Δ)
    • Final decisions (gt, gq, reason)
  • sample_phased.vcf(.gz): interoperability output

    • FORMAT/GT, FORMAT/GQ
    • Optional SVP_* INFO annotations when --svp-info is enabled

The CSV is intended for benchmarking, visualization, and interpretation; the VCF is a downstream-consumable representation.
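
Because the CSV is plain text, a first-pass summary needs only the standard library. The sketch below assumes the file has header columns named `gt` and `gq`, matching the field names listed above; the exact header spelling in a real output file should be checked. The rows in the demo are made up for illustration, and `summarize_calls` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def summarize_calls(csv_file):
    """Tally genotypes and compute the mean GQ over called (non-./.) SVs."""
    gt_counts = Counter()
    gqs = []
    for row in csv.DictReader(csv_file):
        gt_counts[row["gt"]] += 1
        if row["gt"] != "./.":
            gqs.append(float(row["gq"]))
    mean_gq = sum(gqs) / len(gqs) if gqs else None
    return gt_counts, mean_gq


# Illustrative rows only; a real run would open the sample_phased.csv file instead.
demo = io.StringIO(
    "hp1,hp2,nohp,gt,gq,reason\n"
    "9,1,2,1|0,45,example\n"
    "5,5,2,1|1,20,example\n"
    "2,0,1,./.,0,example\n"
)
gt_counts, mean_gq = summarize_calls(demo)
print(dict(gt_counts), mean_gq)
```

For a real file, pass an open handle such as `open("results/sample_phased.csv", newline="")` in place of the in-memory demo.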


Algorithm & methodology

A full, implementation-faithful description of the algorithm is provided in docs/Methodology.md. It covers:

  • evidence collection
  • haplotype decision logic
  • pseudocode
  • a workflow diagram

➡️ docs/Methodology.md

This document is the authoritative reference for reviewers and users seeking algorithmic clarity.


Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.sorted_phased.bam"),
    out_dir=Path("results"),
    min_support=10,
    min_tagged_support=3,
    major_delta=0.60,
    equal_delta=0.10,
    support_mode="hybrid",
    dynamic_window=True,
    tie_to_hom_alt=True,
    gq_bins="30:High,10:Moderate",
    threads=8,
)
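
The `gq_bins` string pairs GQ thresholds with labels. One plausible reading, sketched below, treats each threshold as an inclusive lower bound and checks the highest threshold first. This interpretation, the fallback label `"Low"`, and the helper names `parse_gq_bins` and `gq_label` are assumptions for illustration, not SvPhaser's documented behavior.

```python
def parse_gq_bins(spec):
    """Parse 'threshold:label' pairs into a list sorted by descending threshold."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((int(threshold), label.strip()))
    return sorted(bins, reverse=True)


def gq_label(gq, bins, default="Low"):
    """Return the label of the first bin whose threshold the GQ meets (assumed >=)."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default  # hypothetical fallback for values below every threshold


bins = parse_gq_bins("30:High,10:Moderate")
for gq in (35, 30, 12, 5):
    print(gq, gq_label(gq, bins))
```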

Repository structure

SvPhaser/
├─ src/svphaser/        # core package
├─ docs/                # methodology & design notes
├─ tests/               # unit + regression tests
├─ notebooks/           # benchmarking & analysis
├─ pyproject.toml
├─ README.md
└─ CHANGELOG.md

Citing SvPhaser

If SvPhaser contributes to your research, please cite:

@software{svphaser2026,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware phasing of structural variants from long-read data},
  version = {2.1.x},
  year    = {2026},
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

For maximum reproducibility, include the exact git commit hash used.


License

SvPhaser is released under the MIT License — see LICENSE.


Contact

Developed at SFG Lab (BioAI).

Bug reports and feature requests: please open a GitHub issue.
