
Structural-variant phasing from HP-tagged long-read BAMs


SvPhaser

Haplotype-aware structural-variant (SV) phasing and genotyping from long-read data



SvPhaser assigns haplotype-aware genotypes to pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, etc.).

It fills a gap in long-read SV analysis:

  • SV callers (e.g. Sniffles2) discover variants
  • SvPhaser phases and genotypes them (1|0, 0|1, 1|1, or ./.), reporting explicit read-level evidence and a quantitative genotype quality (GQ) for each call

SvPhaser is caller-agnostic, deterministic, and designed for large-scale benchmarking and biological interpretation.


Key features

  • Post-hoc SV phasing from HP-tagged BAM/CRAM (no re-calling required)
  • Per-chromosome parallelization (efficient on HPC and multi-core systems)
  • SV-type-aware evidence detection (DEL / INS / INV / BND / DUP)
  • Deterministic Δ-based decision logic (no HMMs, no sampling)
  • Explicit confidence modeling via GQ and reason codes
  • CSV-first design for transparent benchmarking and debugging
  • VCF-compliant output with rich SVP_* INFO annotations

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras:

pip install "svphaser[plots]"   # plotting utilities
pip install "svphaser[bench]"   # benchmarking helpers
pip install "svphaser[dev]"     # development + linting

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser requires only two inputs:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • Produced by an SV caller (e.g. Sniffles2)
    • May contain an RNAMES INFO field for precise read support
  2. HP-tagged BAM/CRAM

    • Long-read alignments with haplotype tags (HP=1/2)
    • Generated by an upstream phasing pipeline (e.g. WhatsHap)

⚠️ If the BAM/CRAM does not contain HP tags, SvPhaser cannot assign haplotypes.
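
Before running SvPhaser it can help to confirm that the alignments actually carry HP tags. The following is a minimal sketch, not part of SvPhaser: it assumes the third-party pysam library is installed, and the helper names `count_hp_tags` and `bam_has_hp_tags` are hypothetical. It samples the first reads of the file and tallies their HP values.

```python
from collections import Counter
from itertools import islice


def count_hp_tags(reads, limit=10_000):
    """Tally HP tag values over the first `limit` reads.

    `reads` is any iterable of objects exposing pysam-style
    has_tag()/get_tag() methods. Untagged reads are counted under None.
    """
    counts = Counter()
    for read in islice(reads, limit):
        hp = read.get_tag("HP") if read.has_tag("HP") else None
        counts[hp] += 1
    return counts


def bam_has_hp_tags(bam_path, limit=10_000):
    """Return True if any of the first `limit` reads carries an HP tag."""
    import pysam  # imported lazily so count_hp_tags works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        counts = count_hp_tags(bam, limit)
    return any(hp is not None for hp in counts)
```

Calling `bam_has_hp_tags("sample.sorted_phased.bam")` only inspects the start of the file, so a False result on a small sample is a hint to look closer, not proof that tags are absent everywhere.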


Quick start (CLI)

svphaser phase \
  sample_unphased.vcf.gz \
  sample.sorted_phased.bam \
  --out-dir results/ \
  --min-support 10 \
  --min-tagged-support 3 \
  --major-delta 0.60 \
  --equal-delta 0.10 \
  --support-mode hybrid \
  --dynamic-window \
  --tie-to-hom-alt \
  --gq-bins "30:High,10:Moderate" \
  --threads 32
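
To give a feel for how the thresholds in the command above could interact, here is a toy decision rule in Python. It is an illustrative assumption, not SvPhaser's actual logic: the definition of Δ as |hp1 − hp2| / (hp1 + hp2), the way `support_total` is computed, and the function name `sketch_genotype` are all hypothetical. docs/Methodology.md remains the authoritative description.

```python
def sketch_genotype(hp1, hp2, nohp=0, *, min_support=10, min_tagged_support=3,
                    major_delta=0.60, equal_delta=0.10, tie_to_hom_alt=True):
    """Toy decision rule (an assumption, not SvPhaser's real implementation)."""
    tagged_total = hp1 + hp2               # reads with an HP tag
    support_total = tagged_total + nohp    # assumed: tagged + untagged support

    # Too little evidence overall, or too few haplotype-tagged reads.
    if support_total < min_support or tagged_total < min_tagged_support:
        return "./."
    if tagged_total == 0:
        return "./."

    # Assumed definition of the imbalance between the two haplotypes.
    delta = abs(hp1 - hp2) / tagged_total

    if delta >= major_delta:
        # Clear majority: heterozygous on the dominant haplotype.
        return "1|0" if hp1 > hp2 else "0|1"
    if delta <= equal_delta and tie_to_hom_alt:
        # Near-tie: treat as present on both haplotypes.
        return "1|1"
    return "./."  # ambiguous middle ground
```

Under these assumptions, 12 HP1 reads against 1 HP2 read gives Δ ≈ 0.85 and a 1|0 call, while 8 against 4 gives Δ ≈ 0.33, which falls between the two thresholds and stays unresolved.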

Outputs

For an input sample.vcf.gz, SvPhaser produces:

  • sample_phased.csv: primary analysis artifact

    • Per-SV read support (hp1, hp2, nohp)
    • Derived metrics (tagged_total, support_total, Δ)
    • Final decisions (gt, gq, reason)
  • sample_phased.vcf(.gz): interoperability output

    • FORMAT/GT, FORMAT/GQ
    • Optional SVP_* INFO annotations when --svp-info is enabled

The CSV is intended for benchmarking, visualization, and interpretation; the VCF is a downstream-consumable representation.
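
Because the CSV is the primary artifact, a quick summary needs only the standard library. The sketch below assumes the column headers are literally `gt` and `gq`, as the field names above suggest; the helper name `summarize_phased_csv` is hypothetical, and the real header spelling should be checked against an actual output file.

```python
import csv
from collections import Counter


def summarize_phased_csv(lines):
    """Count SVs per genotype and compute the mean GQ per genotype.

    `lines` is an open text file or any iterable of CSV lines.
    Rows with an empty gq field are counted but excluded from the mean.
    """
    gt_counts = Counter()
    gq_values = {}
    for row in csv.DictReader(lines):
        gt = row["gt"]
        gt_counts[gt] += 1
        if row.get("gq"):
            gq_values.setdefault(gt, []).append(float(row["gq"]))
    mean_gq = {gt: sum(vals) / len(vals) for gt, vals in gq_values.items()}
    return gt_counts, mean_gq
```

In practice this would be called on a file opened with `open("results/sample_phased.csv", newline="")`, where the path is whatever `--out-dir` produced.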


Algorithm & methodology

A full, implementation-faithful description of the algorithm is provided in docs/Methodology.md. It covers:

  • evidence collection
  • haplotype decision logic
  • pseudocode
  • workflow diagram

This document is the authoritative reference for reviewers and users seeking algorithmic clarity.


Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.sorted_phased.bam"),
    out_dir=Path("results"),
    min_support=10,
    min_tagged_support=3,
    major_delta=0.60,
    equal_delta=0.10,
    support_mode="hybrid",
    dynamic_window=True,
    tie_to_hom_alt=True,
    gq_bins="30:High,10:Moderate",
    threads=8,
)
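
The `gq_bins` string in the call above pairs GQ thresholds with labels. As a rough illustration of how such a spec could be interpreted, the sketch below parses it and labels a GQ value. The "GQ at or above the threshold" semantics, the fallback label "Low", and both function names are assumptions for illustration, not SvPhaser's documented behavior.

```python
def parse_gq_bins(spec):
    """Parse "30:High,10:Moderate" into [(30, "High"), (10, "Moderate")].

    The result is sorted with the highest threshold first.
    """
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((int(threshold), label.strip()))
    return sorted(bins, reverse=True)


def gq_label(gq, bins, default="Low"):
    """Return the label of the first bin whose threshold `gq` reaches."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default  # hypothetical fallback for values below every bin
```

Sorting highest-first means the most stringent bin wins, so the order in which bins are written in the spec does not matter.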

Repository structure

SvPhaser/
├─ src/svphaser/        # core package
├─ docs/                # methodology & design notes
├─ tests/               # unit + regression tests
├─ notebooks/           # benchmarking & analysis
├─ pyproject.toml
├─ README.md
└─ CHANGELOG.md

Citing SvPhaser

If SvPhaser contributes to your research, please cite:

@software{svphaser2026,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware phasing of structural variants from long-read data},
  version = {2.1.x},
  year    = {2026},
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

For maximum reproducibility, include the exact git commit hash used.


License

SvPhaser is released under the MIT License — see LICENSE.


Contact

Developed at SFG Lab (BioAI).

Bug reports and feature requests: please open a GitHub issue.
