
Structural-variant phasing from HP-tagged long-read BAMs


SvPhaser

Haplotype‑aware structural‑variant genotyper for long‑read data



SvPhaser phases pre‑called structural variants (SVs) using HP‑tagged long‑read alignments (PacBio HiFi, ONT Q20+, …). Think of it as WhatsHap for insertions/deletions/duplications: we do not discover SVs; we assign each variant a haplotype genotype (0|1, 1|0, 1|1, or ./.) together with a Genotype Quality (GQ) score – all in a single, embarrassingly‑parallel pass over the genome.

Key highlights

  • Fast, per‑chromosome multiprocessing – near‑linear scale‑out on multi‑core machines (e.g. 32‑core workstations).
  • Deterministic Δ‑based decision tree – no MCMC or hidden state machines.
  • Friendly CLI (svphaser phase …) and importable Python API.
  • Seamless VCF injection – adds HP_GT, HP_GQ, HP_GQBIN INFO tags while copying the original header verbatim.
  • Configurable confidence bins and publication‑ready plots (see result_images/).

Installation

# Requires Python ≥3.9
pip install svphaser            # PyPI (coming soon)
# or
pip install git+https://github.com/your-org/SvPhaser.git@v0.2.0

cyvcf2, pysam, typer[all], and pandas are pulled in automatically.

Quick‑start

svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --major-delta 0.70 \
    --equal-delta 0.25 \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
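
The `--gq-bins` string pairs a GQ threshold with a label. How SvPhaser parses it internally is not documented here; the sketch below shows one plausible reading, assuming each threshold is an inclusive lower bound and that variants below every threshold fall into a catch-all bin. The function names and the "Low" default label are hypothetical, not part of the SvPhaser API.

```python
def parse_gq_bins(spec: str) -> list[tuple[int, str]]:
    """Parse a spec such as "30:High,10:Moderate" into (threshold, label) pairs.

    Pairs are returned with the highest threshold first so that the first
    match wins when labelling a GQ value.
    """
    bins = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        threshold, sep, label = item.partition(":")
        if not sep or not label.strip():
            raise ValueError(f"expected THRESHOLD:LABEL, got {item!r}")
        bins.append((int(threshold), label.strip()))
    return sorted(bins, key=lambda pair: pair[0], reverse=True)


def label_for_gq(gq: int, bins: list[tuple[int, str]], default: str = "Low") -> str:
    """Return the label of the first bin whose threshold is <= gq.

    ASSUMPTION: thresholds are inclusive lower bounds; "Low" is a made-up
    fallback label for values below every threshold.
    """
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default


bins = parse_gq_bins("30:High,10:Moderate")
print(bins)
print(label_for_gq(42, bins), label_for_gq(10, bins), label_for_gq(3, bins))
```

Sorting by descending threshold means the order in which bins are written on the command line does not matter in this sketch.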

Outputs (written inside results/)

sample_unphased_phased.vcf   # original VCF + HP_* INFO fields
sample_unphased_phased.csv   # tidy table for plotting / downstream R

See docs/methodology.md and the flow‑chart below for algorithmic details.

[Figure: SvPhaser methodology flow‑chart]
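
The flow-chart is not reproduced in text, so the following is only a minimal sketch of how a Δ‑based decision could be wired up from the three CLI thresholds. Everything in it is an assumption rather than SvPhaser's confirmed logic: that Δ is the absolute difference in supporting reads between haplotype 1 and haplotype 2 divided by their sum, that `--min-support` applies to the combined read count, that the haplotype with more support maps to the left allele, and that intermediate Δ values are left uncalled. The authoritative rules are in docs/methodology.md.

```python
def call_genotype(
    n1: int,
    n2: int,
    min_support: int = 10,
    major_delta: float = 0.70,
    equal_delta: float = 0.25,
) -> str:
    """Hypothetical Δ-based genotype call from per-haplotype read support.

    n1, n2: number of SV-supporting reads tagged HP=1 and HP=2.
    """
    total = n1 + n2
    # Too little evidence (or none at all): leave the variant uncalled.
    if total == 0 or total < min_support:
        return "./."

    delta = abs(n1 - n2) / total  # 0 = perfectly balanced, 1 = one-sided

    if delta >= major_delta:
        # Support is concentrated on one haplotype: heterozygous, phased.
        return "1|0" if n1 > n2 else "0|1"
    if delta <= equal_delta:
        # Support is roughly balanced: variant present on both haplotypes.
        return "1|1"
    # Neither clearly one-sided nor clearly balanced.
    return "./."


for counts in [(18, 2), (2, 18), (10, 10), (14, 6), (3, 2)]:
    print(counts, call_genotype(*counts))
```

Because every branch is a plain threshold comparison, the same inputs always give the same call, which is the property the "deterministic" highlight refers to.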

Folder layout

SvPhaser/
├─ src/svphaser/        # importable package
│  ├─ cli.py            # Typer entry‑point
│  ├─ logging.py        # unified log setup
│  └─ phasing/
│     ├─ algorithms.py  # core maths
│     ├─ io.py          # driver & I/O
│     ├─ _workers.py    # per‑chrom processes
│     └─ types.py       # thin dataclasses
├─ tests/               # pytest suite + mini data
├─ docs/                # extra documentation
├─ result_images/       # generated plots & diagrams
└─ CHANGELOG.md

Python usage

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.bam"),
    out_dir=Path("results"),
    min_support=10,
    major_delta=0.70,
    equal_delta=0.25,
    gq_bins="30:High,10:Moderate",
    threads=8,
)

The resulting CSV can be loaded into a pandas DataFrame for custom analytics.
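
As a rough illustration of such analytics, the sketch below tallies genotypes and confidence bins from the CSV using only the standard library. The column names (`hp_gt`, `hp_gqbin`) and the inline sample rows are hypothetical; inspect the header of your own output file and adjust accordingly.

```python
import csv
import io
from collections import Counter

# Stand-in for results/sample_unphased_phased.csv; columns and rows are invented.
csv_text = """chrom,pos,id,svtype,hp_gt,hp_gq,hp_gqbin
chr1,10500,sv1,DEL,1|0,42,High
chr1,88211,sv2,INS,1|1,17,Moderate
chr2,4012,sv3,DEL,./.,0,Low
"""


def summarise(handle) -> tuple[Counter, Counter]:
    """Count rows per genotype and per GQ bin from an open CSV handle."""
    gt_counts: Counter = Counter()
    bin_counts: Counter = Counter()
    for row in csv.DictReader(handle):
        gt_counts[row["hp_gt"]] += 1
        bin_counts[row["hp_gqbin"]] += 1
    return gt_counts, bin_counts


gt_counts, bin_counts = summarise(io.StringIO(csv_text))
print(dict(gt_counts))
print(dict(bin_counts))
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`, or use `pandas.read_csv(path)` followed by `value_counts()` on the relevant column.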

Development & contributing

  1. Clone and create a virtual env:

    git clone https://github.com/your-org/SvPhaser.git && cd SvPhaser
    python -m venv .venv && source .venv/bin/activate
    pip install -e .[dev]
    
  2. Run the test suite, type checks and formatting check:

    pytest -q
    mypy src/svphaser
    black --check src tests
    
  3. Send a PR targeting the dev branch; one topic per PR.

Please read CONTRIBUTING.md (to come) for style‑guides and the DCO sign‑off.

Citing SvPhaser

If SvPhaser contributed to your research, please cite:

@software{svphaser2024,
  author       = {Pranjul Mishra and Sachin Gadakh and {CeNT Lab}},
  title        = {SvPhaser: haplotype‑aware SV genotyping},
  version      = {0.2.0},
  date         = {2024-06-18},
  url          = {https://github.com/your-org/SvPhaser}
}

License

SvPhaser is released under the MIT License – see LICENSE.

📬 Contact

Developed by Team5 (BioAI Hackathon) – Sachin Gadakh & Pranjul Mishra.

Lead contacts: pranjul.mishra@proton.me • s.gadakh@cent.uw.edu.pl

Feedback, feature requests and bug reports are all appreciated — feel free to open a GitHub issue or reach out by e‑mail.


Happy phasing!
