Structural-variant phasing from HP-tagged long-read BAMs

SvPhaser

Haplotype-aware structural-variant (SV) phasing and genotyping from long-read data


SvPhaser assigns haplotype-aware genotypes to pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, etc.).

It fills a critical gap in long-read SV analysis:

  • SV callers (e.g. Sniffles2) discover variants
  • SvPhaser phases and genotypes them (1|0, 0|1, 1|1, or ./.)
  • with explicit read-level evidence and a quantitative genotype quality (GQ)

SvPhaser is caller-agnostic, deterministic, and designed for large-scale benchmarking and biological interpretation.


Key features

  • Post-hoc SV phasing from HP-tagged BAM/CRAM (no re-calling required)
  • Per-chromosome parallelization (efficient on HPC and multi-core systems)
  • SV-type-aware evidence detection (DEL / INS / INV / BND / DUP)
  • Deterministic Δ-based decision logic (no HMMs, no sampling)
  • Explicit confidence modeling via GQ and reason codes
  • CSV-first design for transparent benchmarking and debugging
  • VCF-compliant output with rich SVP_* INFO annotations
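The per-chromosome parallelization listed above can be pictured as grouping SV records by contig and handing each group to a worker. The sketch below is not SvPhaser's implementation: `phase_chromosome` is a hypothetical placeholder, the `(chrom, pos, sv_type)` tuple layout is an assumption made for illustration, and a thread pool stands in for whatever worker model the tool actually uses.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_chromosome(records):
    """Group (chrom, pos, sv_type) tuples by chromosome, preserving input order."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return dict(groups)


def phase_chromosome(chrom, records):
    """Hypothetical placeholder for per-chromosome work; here it only counts SVs."""
    return chrom, len(records)


def run_per_chromosome(records, threads=4):
    """Process each chromosome independently and return results in a fixed order."""
    groups = group_by_chromosome(records)
    chroms = sorted(groups)  # fixed order keeps the output deterministic
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as `chroms`
        results = pool.map(lambda c: phase_chromosome(c, groups[c]), chroms)
        return list(results)
```

Sorting the chromosome names before dispatch means the result order does not depend on which worker finishes first, which is one simple way to keep a parallel run reproducible.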

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras:

pip install "svphaser[plots]"   # plotting utilities
pip install "svphaser[bench]"   # benchmarking helpers
pip install "svphaser[dev]"     # development + linting

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser requires only two inputs:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • Produced by an SV caller (e.g. Sniffles2)
    • May include an RNAMES INFO field for precise read support
  2. HP-tagged BAM/CRAM

    • Long-read alignments with haplotype tags (HP=1/2)
    • Generated by an upstream phasing pipeline (e.g. WhatsHap)

⚠️ If the BAM does not contain HP tags, SvPhaser cannot assign haplotypes.
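Before running SvPhaser it can be worth confirming that the alignments actually carry HP tags. The following is a minimal sketch, not part of SvPhaser: the helper names `count_hp_tags` and `check_bam` are hypothetical, and `check_bam` assumes the third-party `pysam` package is installed and that the input is a BAM file.

```python
def count_hp_tags(reads, limit=10000):
    """Tally HP tag values over up to `limit` reads.

    `reads` is any iterable of objects exposing pysam's AlignedSegment
    methods `has_tag` and `get_tag`.
    """
    counts = {"hp1": 0, "hp2": 0, "nohp": 0}
    for i, read in enumerate(reads):
        if i >= limit:
            break
        if read.has_tag("HP"):
            hp = read.get_tag("HP")
            if hp == 1:
                counts["hp1"] += 1
            elif hp == 2:
                counts["hp2"] += 1
            else:
                counts["nohp"] += 1  # unexpected HP value, treated as untagged here
        else:
            counts["nohp"] += 1
    return counts


def check_bam(path, limit=10000):
    """Open a BAM with pysam and report whether any HP-tagged reads were seen."""
    import pysam  # third-party; imported lazily so count_hp_tags works without it

    with pysam.AlignmentFile(path, "rb") as bam:
        counts = count_hp_tags(bam.fetch(until_eof=True), limit=limit)
    return counts, (counts["hp1"] + counts["hp2"]) > 0
```

For example, `check_bam("sample.sorted_phased.bam")` returns the tallies and a boolean; if the boolean is false for a reasonably large `limit`, the file most likely was not haplotagged upstream.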


Quick start (CLI)

svphaser phase \
  sample_unphased.vcf.gz \
  sample.sorted_phased.bam \
  --out-dir results/ \
  --min-support 10 \
  --min-tagged-support 3 \
  --major-delta 0.60 \
  --equal-delta 0.10 \
  --support-mode hybrid \
  --dynamic-window \
  --tie-to-hom-alt \
  --gq-bins "30:High,10:Moderate" \
  --threads 32
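To build intuition for how the threshold flags above might interact, here is a rough sketch of a Δ-based decision. It is an illustration under stated assumptions, not SvPhaser's actual logic (see docs/Methodology.md for that): it assumes Δ = |hp1 − hp2| / (hp1 + hp2), that `tagged_total` is hp1 + hp2, that `support_total` also counts untagged reads, and the reason strings are hypothetical.

```python
def decide_genotype(hp1, hp2, nohp=0, *, min_support=10, min_tagged_support=3,
                    major_delta=0.60, equal_delta=0.10, tie_to_hom_alt=True):
    """Illustrative threshold logic only; every definition below is an assumption."""
    tagged_total = hp1 + hp2               # assumed definition
    support_total = tagged_total + nohp    # assumed definition
    if support_total < min_support:
        return "./.", "low_support"        # reason strings are hypothetical
    if tagged_total < max(min_tagged_support, 1):  # also guards division by zero
        return "./.", "low_tagged_support"
    delta = abs(hp1 - hp2) / tagged_total  # assumed definition of Δ
    if delta >= major_delta:
        # one haplotype clearly dominates: heterozygous, phased to that haplotype
        return ("1|0" if hp1 > hp2 else "0|1"), "major_haplotype"
    if delta <= equal_delta and tie_to_hom_alt:
        # near-equal support on both haplotypes: treated as homozygous alternate
        return "1|1", "balanced"
    return "./.", "ambiguous"
```

Under these assumptions, 12 HP1 reads against 1 HP2 read gives Δ ≈ 0.85 and a `1|0` call, while 8 against 4 gives Δ ≈ 0.33, which falls between the two thresholds and is left uncalled.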

Outputs

For an input sample.vcf.gz, SvPhaser produces:

  • sample_phased.csv: primary analysis artifact

    • Per-SV read support (hp1, hp2, nohp)
    • Derived metrics (tagged_total, support_total, Δ)
    • Final decisions (gt, gq, reason)
  • sample_phased.vcf(.gz): interoperability output

    • FORMAT/GT, FORMAT/GQ
    • Optional SVP_* INFO annotations when --svp-info is enabled

The CSV is intended for benchmarking, visualization, and interpretation; the VCF is a downstream-consumable representation.
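As a quick way to inspect the CSV, the sketch below tallies genotypes and averages GQ using only the standard library. It assumes the CSV header uses exactly the column names `gt` and `gq` listed above; `summarize_phased_csv` is a hypothetical helper, not part of the SvPhaser API.

```python
import csv
from collections import Counter


def summarize_phased_csv(handle):
    """Count genotypes and average GQ from an open CSV file handle."""
    gt_counts = Counter()
    gqs = []
    for row in csv.DictReader(handle):
        gt_counts[row["gt"]] += 1
        if row.get("gq"):  # skip rows with an empty or missing GQ
            gqs.append(float(row["gq"]))
    mean_gq = sum(gqs) / len(gqs) if gqs else None
    return dict(gt_counts), mean_gq


# Usage (path is an example; adjust to your --out-dir):
# with open("results/sample_phased.csv", newline="") as fh:
#     counts, mean_gq = summarize_phased_csv(fh)
```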


Algorithm & methodology

docs/Methodology.md gives a full, implementation-faithful description of the algorithm, covering:

  • evidence collection
  • haplotype decision logic
  • pseudocode
  • workflow diagram

This document is the authoritative reference for reviewers and users seeking algorithmic clarity.


Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.sorted_phased.bam"),
    out_dir=Path("results"),
    min_support=10,
    min_tagged_support=3,
    major_delta=0.60,
    equal_delta=0.10,
    support_mode="hybrid",
    dynamic_window=True,
    tie_to_hom_alt=True,
    gq_bins="30:High,10:Moderate",
    threads=8,
)
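The `gq_bins` string pairs GQ thresholds with labels. The sketch below shows one plausible reading of that format; it is an assumption, not SvPhaser's parser. It assumes a GQ receives the label of the highest threshold it meets or exceeds, and the `"Low"` fallback label and both function names are hypothetical.

```python
def parse_gq_bins(spec):
    """Parse 'threshold:label' pairs into a list sorted by descending threshold."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.strip().split(":", 1)
        bins.append((float(threshold), label.strip()))
    return sorted(bins, key=lambda b: b[0], reverse=True)


def label_gq(gq, bins, default="Low"):
    """Return the label of the highest threshold that gq meets (assumed >= semantics)."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default  # the fallback label name is hypothetical
```

With `bins = parse_gq_bins("30:High,10:Moderate")`, a GQ of 45 maps to `High`, 12 to `Moderate`, and 3 to the fallback label.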

Repository structure

SvPhaser/
├─ src/svphaser/        # core package
├─ docs/                # methodology & design notes
├─ tests/               # unit + regression tests
├─ notebooks/           # benchmarking & analysis
├─ pyproject.toml
├─ README.md
└─ CHANGELOG.md

Citing SvPhaser

If SvPhaser contributes to your research, please cite:

@software{svphaser2026,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware phasing of structural variants from long-read data},
  version = {2.1.x},
  year    = {2026},
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

For maximum reproducibility, include the exact git commit hash used.


License

SvPhaser is released under the MIT License — see LICENSE.


Contact

Developed at SFG Lab (BioAI).

Bug reports and feature requests: please open a GitHub issue.
