
Structural-variant phasing from HP-tagged long-read BAMs

SvPhaser

Haplotype-aware structural-variant (SV) genotyper for long-read data


SvPhaser phases pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, …).

Think of it as WhatsHap for insertions, deletions, and duplications:

  • It does not discover SVs.
  • It assigns haplotype genotypes (0|1, 1|0, 1|1, or ./.).
  • It computes a Genotype Quality (GQ) score for each call.

All of this happens in a single, embarrassingly parallel pass over the genome.

Highlights

  • Fast per-chromosome multiprocessing (scales across the cores of a multi-core CPU).
  • Deterministic Δ-based decision logic (no MCMC / HMM).
  • CLI + Python API.
  • Non-destructive VCF augmentation: injects phasing fields while preserving the original header and records.
  • Configurable confidence bins + optional plots.
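
To make the idea of a deterministic Δ-based rule concrete, here is a minimal sketch. It is not SvPhaser's actual implementation (see docs/methodology.md for that). The function name `classify_sv`, the definition of Δ as the normalized difference between per-haplotype read counts, and the mapping of each outcome to a genotype string are all assumptions made for illustration; only the threshold names mirror the CLI options shown in the quick start.

```python
def classify_sv(n_hp1, n_hp2, min_support=10, major_delta=0.70, equal_delta=0.25):
    """Hypothetical threshold rule, NOT SvPhaser's real logic.

    n_hp1 / n_hp2: SV-supporting reads tagged HP=1 / HP=2 (assumed inputs).
    """
    total = n_hp1 + n_hp2
    # Assumption: too few supporting reads means no call.
    if total == 0 or total < min_support:
        return "./."
    # Assumption: delta is the normalized imbalance between the haplotypes.
    delta = abs(n_hp1 - n_hp2) / total
    if delta >= major_delta:
        # Strong imbalance: the SV sits on one haplotype only.
        return "1|0" if n_hp1 > n_hp2 else "0|1"
    if delta <= equal_delta:
        # Near-equal support on both haplotypes: homozygous.
        return "1|1"
    # In between: ambiguous, left unphased.
    return "./."


for counts in [(18, 2), (2, 18), (10, 10), (14, 6), (3, 2)]:
    print(counts, classify_sv(*counts))
```

Because the rule is a pure function of the read counts and thresholds, the same inputs always give the same genotype, which is the property the "no MCMC / HMM" highlight refers to.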

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras (if you use them):

pip install "svphaser[plots]"
pip install "svphaser[bench]"
pip install "svphaser[dev]"

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser expects:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • SVs should already be called by your preferred SV caller.

  2. HP-tagged BAM (long-read alignments)

    • Reads must contain haplotype tags (e.g., HP) produced by an upstream phasing pipeline.

If your BAM is not HP-tagged, SvPhaser cannot assign haplotypes.
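
A quick sanity check before running SvPhaser is to measure how many reads actually carry an HP tag. The sketch below is one way to do that using only the standard library; it works on SAM text lines (tab-separated, optional tags from the 12th column onward, as defined by the SAM format). The helper name `hp_tag_fraction` is hypothetical and not part of SvPhaser.

```python
from typing import Iterable


def hp_tag_fraction(sam_lines: Iterable[str]) -> float:
    """Return the fraction of SAM alignment lines that carry an HP:i: tag."""
    tagged = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Optional tags start after the 11 mandatory SAM columns.
        if any(field.startswith("HP:i:") for field in fields[11:]):
            tagged += 1
    return tagged / total if total else 0.0


# Two made-up records: one haplotagged, one not.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tHP:i:1\tPS:i:100",
    "r2\t0\tchr1\t150\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0",
]
print(hp_tag_fraction(example))
```

In practice the lines could come from piping `samtools view sample.sorted_phased.bam` into a script that passes `sys.stdin` to this function. A fraction near zero suggests the BAM was never haplotagged.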

Quick start (CLI)

svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --major-delta 0.70 \
    --equal-delta 0.25 \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
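
The `--gq-bins` value is a comma-separated list of `threshold:label` pairs. As a rough illustration of how such a string could map a GQ score to a label, here is a small sketch. The function names, the "greater than or equal" comparison, and the fallback label `"Low"` are assumptions for this example, not SvPhaser's documented behavior.

```python
def parse_gq_bins(spec: str) -> list[tuple[int, str]]:
    """Parse '30:High,10:Moderate' into [(30, 'High'), (10, 'Moderate')]."""
    bins = []
    for item in spec.split(","):
        threshold, _, label = item.strip().partition(":")
        bins.append((int(threshold), label))
    # Highest threshold first so the first match is the best bin.
    return sorted(bins, key=lambda b: b[0], reverse=True)


def gq_label(gq: int, bins: list[tuple[int, str]], default: str = "Low") -> str:
    """Return the label of the first bin whose threshold the score reaches."""
    for threshold, label in bins:
        if gq >= threshold:  # assumption: thresholds are inclusive lower bounds
            return label
    return default  # hypothetical fallback label for scores below every bin


bins = parse_gq_bins("30:High,10:Moderate")
for score in (42, 30, 12, 5):
    print(score, gq_label(score, bins))
```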

Outputs

Inside results/:

  • *_phased.vcf — your original VCF with additional INFO fields:

    • HP_GT — phased genotype
    • HP_GQ — genotype quality score
    • HP_GQBIN — confidence bin label (based on your --gq-bins)
  • *_phased.csv — tidy table for plotting / downstream analysis

For algorithmic details, see: docs/methodology.md.
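
Since the phasing results are stored as INFO fields, they can be pulled out of the phased VCF with a few lines of standard-library Python. The sketch below splits the INFO column (the 8th tab-separated column of a VCF record) and picks out `HP_GT`, `HP_GQ`, and `HP_GQBIN`. The example record and its values are made up, and the helper names are hypothetical.

```python
from typing import Optional


def parse_info(info: str) -> dict[str, str]:
    """Turn a VCF INFO string like 'A=1;B=x;FLAG' into a dict."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        # Flag-style entries without '=' are stored with an empty value.
        out[key] = value if sep else ""
    return out


def phasing_fields(vcf_line: str) -> dict[str, Optional[str]]:
    """Extract the three SvPhaser INFO fields from one VCF record line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th column
    return {key: info.get(key) for key in ("HP_GT", "HP_GQ", "HP_GQBIN")}


# Made-up record for illustration only.
record = "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-500;HP_GT=1|0;HP_GQ=42;HP_GQBIN=High"
print(phasing_fields(record))
```

For anything beyond quick inspection, a dedicated VCF parser is the safer choice, since it also handles headers and compressed input.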

Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.bam"),
    out_dir=Path("results"),
    min_support=10,
    major_delta=0.70,
    equal_delta=0.25,
    gq_bins="30:High,10:Moderate",
    threads=8,
)

The phased table can also be loaded from the generated CSV for custom analytics.
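
As one example of such custom analytics, the sketch below tallies rows of a CSV by the values in a chosen column using only the standard library. The column names in the demo (`gt`, `gq_label`) are placeholders, not the actual header of `*_phased.csv`; check the generated file and substitute the real names.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Count how often each value appears in one column of a CSV file."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# In-memory stand-in for a phased CSV; the header names are hypothetical.
demo = io.StringIO(
    "chrom,pos,gt,gq_label\n"
    "chr1,10000,1|0,High\n"
    "chr1,25000,0|1,High\n"
    "chr2,5000,1|1,Moderate\n"
    "chr2,9000,1|0,Moderate\n"
)
counts = count_by_column(demo, "gt")
print(counts)
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")` pointing at the CSV in the output directory.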

Repository structure (high level)

SvPhaser/
├─ src/svphaser/         # importable package
├─ tests/                # test suite + small fixtures (if present)
├─ docs/                 # methodology + notes
├─ notebooks/            # experiments / analysis (if present)
├─ figures/              # plots & diagrams (if present)
├─ pyproject.toml
└─ CHANGELOG.md

Development

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser

python -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"
pytest -q
mypy src/svphaser

See CONTRIBUTING.md for contribution guidelines.

Citing SvPhaser

If SvPhaser contributed to your research, please cite:

@software{svphaser2025,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware structural-variant genotyping from HP-tagged long-read BAMs},
  version = {2.0.6},
  year    = {2025},
  month   = nov,
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

(For exact reproducibility in a paper, also cite the specific git commit hash you used.)

License

SvPhaser is released under the MIT License — see LICENSE.

Contact

Developed by Team 5 (BioAI Hackathon).

Issues and feature requests: please open a GitHub issue.


