Structural-variant phasing from HP-tagged long-read BAMs

SvPhaser

Haplotype-aware structural-variant (SV) genotyper for long-read data


SvPhaser phases pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, …).

Think of it as WhatsHap for insertions/deletions/duplications:

  • it does not discover SVs
  • it assigns each SV a haplotype genotype (0|1, 1|0, 1|1, or ./.)
  • it computes a Genotype Quality (GQ) score for each call

All of this runs in a single, embarrassingly parallel pass over the genome.

Highlights

  • Fast per-chromosome multiprocessing (scales across the cores of a multi-core CPU).
  • Deterministic Δ-based decision logic (no MCMC / HMM).
  • CLI + Python API.
  • Non-destructive VCF augmentation: injects phasing fields while preserving the original header and records.
  • Configurable confidence bins + optional plots.
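
To give a feel for what a deterministic, Δ-based rule can look like, here is a minimal sketch. It is not SvPhaser's implementation: the function name classify_sv, the definition Δ = |n1 - n2| / (n1 + n2), the mapping of an HP1 majority to 1|0, and the way the three thresholds are combined are all assumptions made for illustration. Only the parameter names come from the CLI options. The authoritative rules are in docs/methodology.md.

```python
def classify_sv(
    n_hp1: int,
    n_hp2: int,
    min_support: int = 10,
    major_delta: float = 0.70,
    equal_delta: float = 0.25,
) -> str:
    """Hypothetical delta rule; NOT taken from the SvPhaser source code.

    n_hp1 / n_hp2 are the numbers of SV-supporting reads tagged HP=1 / HP=2.
    """
    total = n_hp1 + n_hp2
    # Too little evidence: leave the genotype unresolved.
    if total == 0 or total < min_support:
        return "./."

    # Assumed definition of delta: normalised imbalance between haplotypes.
    delta = abs(n_hp1 - n_hp2) / total

    if delta >= major_delta:
        # Strong majority on one haplotype (assumed: HP1 majority maps to 1|0).
        return "1|0" if n_hp1 > n_hp2 else "0|1"
    if delta <= equal_delta:
        # Support is roughly balanced: treat as present on both haplotypes.
        return "1|1"
    # Neither clearly one-sided nor clearly balanced.
    return "./."


for counts in [(9, 1), (1, 9), (5, 5), (7, 3), (3, 2)]:
    print(counts, classify_sv(*counts))
```

Because the rule is a pure function of two read counts, the same input always yields the same genotype, which is the practical meaning of "deterministic" here.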

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras (if you use them):

pip install "svphaser[plots]"
pip install "svphaser[bench]"
pip install "svphaser[dev]"

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser expects:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • SVs should already be called by your preferred SV caller.
  2. HP-tagged BAM (long-read alignments)

    • Reads must contain haplotype tags (e.g., HP) produced by an upstream phasing pipeline.

If your BAM is not HP-tagged, SvPhaser cannot assign haplotypes.
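
Before running SvPhaser it can be worth confirming that the BAM really carries HP tags (for example, as written by whatshap haplotag). The sketch below is not part of SvPhaser; hp_tag_fraction and check_bam are hypothetical helper names. It relies on pysam (a separate install) only inside check_bam, and works on any iterable of objects that expose has_tag, which pysam's aligned reads do.

```python
from itertools import islice
from typing import Iterable, Protocol


class TaggedRead(Protocol):
    """Anything with a pysam-style has_tag method."""

    def has_tag(self, tag: str) -> bool: ...


def hp_tag_fraction(reads: Iterable[TaggedRead], limit: int = 10_000) -> float:
    """Return the fraction of the first `limit` reads that carry an HP tag."""
    total = 0
    tagged = 0
    for read in islice(reads, limit):
        total += 1
        if read.has_tag("HP"):
            tagged += 1
    return tagged / total if total else 0.0


def check_bam(path: str, limit: int = 10_000) -> float:
    """Sample the start of a BAM with pysam and report the HP-tagged fraction."""
    import pysam  # third-party; imported lazily so the helper above has no dependencies

    with pysam.AlignmentFile(path, "rb") as bam:
        # until_eof=True streams reads in file order without needing an index.
        return hp_tag_fraction(bam.fetch(until_eof=True), limit)
```

A fraction below 1.0 is normal, because reads that overlap no phased variant stay untagged. A fraction of 0.0 means the BAM has not been haplotagged.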

Quick start (CLI)

svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --major-delta 0.70 \
    --equal-delta 0.25 \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
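
The --gq-bins value is a comma-separated list of threshold:label pairs. The sketch below shows one plausible way to read such a spec. It assumes that a call receives the label of the highest threshold its GQ meets, and the fallback label "Low" for calls below every threshold is made up for this example. The helper names parse_gq_bins and label_for are hypothetical and not part of the SvPhaser API.

```python
def parse_gq_bins(spec: str) -> list[tuple[int, str]]:
    """Turn '30:High,10:Moderate' into [(30, 'High'), (10, 'Moderate')]."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((int(threshold), label.strip()))
    # Highest threshold first, so the first match is the best bin.
    return sorted(bins, reverse=True)


def label_for(gq: int, bins: list[tuple[int, str]], fallback: str = "Low") -> str:
    """Assumed semantics: first bin whose threshold is <= gq wins."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return fallback  # hypothetical label for calls below every threshold


bins = parse_gq_bins("30:High,10:Moderate")
for gq in (45, 30, 12, 3):
    print(gq, label_for(gq, bins))
```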

Outputs

Inside results/:

  • *_phased.vcf — your original VCF with additional INFO fields:

    • HP_GT — phased genotype
    • HP_GQ — genotype quality score
    • HP_GQBIN — confidence bin label (based on your --gq-bins)
  • *_phased.csv — tidy table for plotting / downstream analysis

For algorithmic details, see: docs/methodology.md.
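
The added INFO fields can be pulled out of the phased VCF with nothing but the standard library. This is a minimal sketch: the helper names are hypothetical, values are kept as strings because the declared field types are not stated here, and the record in the example is invented purely to show the shape of the output.

```python
PHASING_KEYS = ("HP_GT", "HP_GQ", "HP_GQBIN")


def info_to_dict(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key/value pairs (flags map to '')."""
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def phasing_fields(vcf_line: str) -> dict[str, str]:
    """Return the HP_* INFO fields of one tab-separated VCF record."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = info_to_dict(columns[7])  # INFO is the 8th VCF column
    return {key: info[key] for key in PHASING_KEYS if key in info}


# Invented record, for illustration only.
record = "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=10500;HP_GT=0|1;HP_GQ=42;HP_GQBIN=High"
print(phasing_fields(record))
```

For a whole file, skip lines starting with "#" and open .vcf.gz inputs with gzip.open(path, "rt").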

Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.bam"),
    out_dir=Path("results"),
    min_support=10,
    major_delta=0.70,
    equal_delta=0.25,
    gq_bins="30:High,10:Moderate",
    threads=8,
)

The phased table can also be loaded from the generated CSV for custom analytics.
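
As a starting point for such analytics, the sketch below tallies the values of one column of the CSV with the standard library. The column names are not documented in this README, so the helper takes the column as an argument and fails loudly if it is missing. The name tally_column and the "gt" column in the usage line are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_column(lines: Iterable[str], column: str) -> Counter:
    """Count how often each value occurs in `column` of a CSV with a header row."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Usage with an open file (the column name is an assumption; check your CSV header):
# with open("results/sample_phased.csv", newline="") as handle:
#     print(tally_column(handle, "gt"))
```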

Repository structure (high level)

SvPhaser/
├─ src/svphaser/         # importable package
├─ tests/                # test suite + small fixtures (if present)
├─ docs/                 # methodology + notes
├─ notebooks/            # experiments / analysis (if present)
├─ figures/              # plots & diagrams (if present)
├─ pyproject.toml
└─ CHANGELOG.md

Development

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser

python -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pytest -q
mypy src/svphaser

See CONTRIBUTING.md for contribution guidelines.

Citing SvPhaser

If SvPhaser contributed to your research, please cite:

@software{svphaser2025,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware structural-variant genotyping from HP-tagged long-read BAMs},
  version = {2.0.6},
  year    = {2025},
  month   = nov,
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

(If you need maximum rigor for a paper, cite a specific git commit hash too.)

License

SvPhaser is released under the MIT License — see LICENSE.

Contact

Developed by Team 5 (BioAI Hackathon).

Issues and feature requests: please open a GitHub issue.

