
Structural-variant phasing from HP-tagged long-read BAMs

Project description

SvPhaser

Haplotype‑aware structural‑variant genotyper for long‑read data



SvPhaser phases pre‑called structural variants (SVs) using HP‑tagged long‑read alignments (PacBio HiFi, ONT Q20+, …). Think of it as WhatsHap for insertions/deletions/duplications: we do not discover SVs; we assign each variant a haplotype genotype (0|1, 1|0, 1|1, or ./.) together with a Genotype Quality (GQ) score – all in a single, embarrassingly‑parallel pass over the genome.
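SvPhaser's actual GQ formula is described in docs/methodology.md, not here. Purely to illustrate what a Phred-scaled GQ expresses, the sketch below swaps in a simple binomial model. This is an assumption, not SvPhaser's method. The error probability is the chance of seeing at least `major` of `total` HP-tagged supporting reads on one haplotype if reads were split 50/50 at random. The function name `phred_gq` and the cap of 99 are hypothetical.

```python
import math


def phred_gq(major: int, total: int, cap: int = 99) -> float:
    """Hypothetical Phred-scaled GQ (illustrative model, not SvPhaser's formula).

    major: supporting reads on the dominant haplotype (HP:1 or HP:2)
    total: all HP-tagged supporting reads for this SV
    """
    if total <= 0 or not 0 <= major <= total:
        raise ValueError("need total > 0 and 0 <= major <= total")
    # One-sided binomial tail: P(X >= major) for X ~ Binomial(total, 0.5)
    tail = sum(math.comb(total, k) for k in range(major, total + 1)) / 2**total
    if tail <= 0.0:  # underflow for very deep coverage
        return float(cap)
    # GQ = -10 * log10(P(error)), capped like most VCF producers do
    return min(float(cap), -10.0 * math.log10(tail))
```

For example, 10 of 10 reads on one haplotype gives a tail of 1/1024 and a GQ of about 30, while a 5/10 split gives a GQ near 2.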

Key highlights

  • Fast, per‑chromosome multiprocessing – linear scale‑out on 32‑core workstations.
  • Deterministic Δ‑based decision tree – no MCMC or hidden state machines.
  • Friendly CLI (svphaser phase …) and importable Python API.
  • Seamless VCF injection – adds HP_GT, HP_GQ, HP_GQBIN INFO tags while copying the original header verbatim.
  • Configurable confidence bins and publication‑ready plots (see result_images/).
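The highlights mention a deterministic Δ-based decision tree, and the CLI exposes `--min-support`, `--major-delta` and `--equal-delta`. The sketch below shows one hypothetical way those three thresholds could combine. The imbalance definition Δ = |HP1 − HP2| / (HP1 + HP2), the branch order and the names `Thresholds`/`call_genotype` are all assumptions; consult docs/methodology.md for the real rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Field names mirror the CLI flags; their exact semantics are assumed here.
    min_support: int = 10
    major_delta: float = 0.70
    equal_delta: float = 0.25


def call_genotype(hp1: int, hp2: int, t: Thresholds = Thresholds()) -> str:
    """Hypothetical Δ-based genotype call.

    hp1 / hp2: SV-supporting reads tagged HP:1 / HP:2.
    """
    total = hp1 + hp2
    if total < t.min_support:
        return "./."  # too little HP-tagged evidence to make a call
    delta = abs(hp1 - hp2) / total  # normalised haplotype imbalance in [0, 1]
    if delta >= t.major_delta:
        # One haplotype clearly dominates: place the ALT allele on it.
        return "1|0" if hp1 > hp2 else "0|1"
    if delta <= t.equal_delta:
        return "1|1"  # support is balanced across both haplotypes
    return "./."  # ambiguous zone between the two thresholds
```

Because every branch is a plain comparison, the same counts always produce the same genotype, which is the property the "deterministic" highlight refers to.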

Installation

# Requires Python ≥3.9
pip install svphaser            # from PyPI
# or
pip install git+https://github.com/your-org/SvPhaser.git@v0.2.0

cyvcf2, pysam, typer[all], and pandas are pulled in automatically.

Quick‑start

svphaser phase \
    sample_unphased.vcf.gz \
    sample.sorted_phased.bam \
    --out-dir results/ \
    --min-support 10 \
    --major-delta 0.70 \
    --equal-delta 0.25 \
    --gq-bins "30:High,10:Moderate" \
    --threads 32
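The `--gq-bins "30:High,10:Moderate"` option maps GQ values to confidence labels. The sketch below assumes a few things that are not stated in this README: that each entry is `THRESHOLD:LABEL`, that a variant receives the label of the highest threshold its GQ meets or exceeds, and that anything below every threshold falls back to a hypothetical default label `"Low"`. The helper names are also hypothetical.

```python
def parse_gq_bins(spec: str) -> list[tuple[int, str]]:
    """Parse e.g. "30:High,10:Moderate" into (threshold, label) pairs,
    sorted from the highest threshold down."""
    bins = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        threshold, sep, label = item.partition(":")
        if not sep or not label.strip():
            raise ValueError(f"malformed bin {item!r}; expected THRESHOLD:LABEL")
        bins.append((int(threshold), label.strip()))
    return sorted(bins, key=lambda b: b[0], reverse=True)


def gq_bin(gq: float, bins: list[tuple[int, str]], default: str = "Low") -> str:
    """Return the label of the first (highest) bin whose threshold gq reaches.
    The fallback label "Low" is an assumption, not a documented SvPhaser value."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default
```

Sorting the bins once makes the lookup order independent of how the user typed the spec.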

Outputs (written inside results/)

sample_unphased_phased.vcf   # original VCF + HP_* INFO fields
sample_unphased_phased.csv   # tidy table for plotting / downstream R
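The phased VCF carries the new HP_* tags in its INFO column. A dependency-free way to pull them out of one data line is sketched below. It assumes the tags are written as ordinary `KEY=VALUE` INFO entries, for example `HP_GT=0|1`. With cyvcf2 installed, iterating `cyvcf2.VCF(path)` and calling `variant.INFO.get("HP_GT")` is the more robust route. The helper names are hypothetical.

```python
def parse_info(info: str) -> dict[str, object]:
    """Split a VCF INFO column into a dict; flag entries (no '=') map to True."""
    fields: dict[str, object] = {}
    if info == ".":  # '.' means the INFO column is empty
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def hp_tags(vcf_line: str) -> dict[str, object]:
    """Extract only the HP_* tags from one data line of the phased VCF."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th fixed VCF column
    return {k: v for k, v in info.items() if k.startswith("HP_")}
```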

See docs/methodology.md and the flow‑chart below for algorithmic details.

[Figure: SvPhaser methodology flow-chart]

Folder layout

SvPhaser/
├─ src/svphaser/        # importable package
│  ├─ cli.py            # Typer entry‑point
│  ├─ logging.py        # unified log setup
│  └─ phasing/
│     ├─ algorithms.py  # core maths
│     ├─ io.py          # driver & I/O
│     ├─ _workers.py    # per‑chrom processes
│     └─ types.py       # thin dataclasses
├─ tests/               # pytest suite + mini data
├─ docs/                # extra documentation
├─ result_images/       # generated plots & diagrams
└─ CHANGELOG.md
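The layout places per-chromosome processes in `_workers.py`, and the highlights describe per-chromosome multiprocessing. The general fan-out/gather pattern can be sketched with the standard library as shown below. The record shape `(pos, hp1, hp2)`, the simple majority rule standing in for the real decision tree and the function names are all hypothetical placeholders. The real workers read pysam/cyvcf2 records.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical per-SV record: (position, HP1 supporting reads, HP2 supporting reads)
Record = tuple[int, int, int]


def phase_chromosome(item: tuple[str, list[Record]]) -> tuple[str, list[tuple[int, str]]]:
    """Worker: phase every SV on one chromosome, independently of the others.
    The majority rule below is a placeholder for the real decision tree."""
    chrom, records = item
    calls = []
    for pos, hp1, hp2 in records:
        if hp1 + hp2 == 0:
            gt = "./."  # no HP-tagged support at all
        elif hp1 > hp2:
            gt = "1|0"
        elif hp2 > hp1:
            gt = "0|1"
        else:
            gt = "1|1"
        calls.append((pos, gt))
    return chrom, calls


def phase_genome(
    by_chrom: dict[str, list[Record]], threads: int = 4, use_processes: bool = True
) -> dict[str, list[tuple[int, str]]]:
    """Fan chromosomes out to a worker pool and gather results in input order."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=threads) as pool:
        # Executor.map yields results in submission order, so chromosome order is kept.
        return dict(pool.map(phase_chromosome, by_chrom.items()))
```

Chromosomes share no state, so no locking is needed and the worker count bounds parallelism. When using processes, call `phase_genome` from under an `if __name__ == "__main__":` guard so that spawn-based platforms do not re-run the script in every worker.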

Python usage

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.bam"),
    out_dir=Path("results"),
    min_support=10,
    major_delta=0.70,
    equal_delta=0.25,
    gq_bins="30:High,10:Moderate",
    threads=8,
)

phase_vcf writes its outputs to out_dir; load the per-variant CSV into a pandas DataFrame for custom analytics.
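As a small example of such downstream analysis, the sketch below tallies genotypes in the CSV using only the standard library. The column name `HP_GT` is an assumption (it mirrors the INFO tag), so check your CSV header and pass the right name. With pandas, `pd.read_csv(path)["HP_GT"].value_counts()` gives the same tally.

```python
import csv
from collections import Counter
from typing import Iterable


def genotype_counts(lines: Iterable[str], gt_column: str = "HP_GT") -> Counter:
    """Count genotypes in SvPhaser CSV output.

    lines: an open file handle (or any iterable of CSV lines).
    gt_column: assumed name of the genotype column; adjust to your header.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or gt_column not in reader.fieldnames:
        raise KeyError(f"column {gt_column!r} not found in CSV header")
    return Counter(row[gt_column] for row in reader)
```

Usage: `with open("results/sample_unphased_phased.csv", newline="") as fh: print(genotype_counts(fh))`.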

Development & contributing

  1. Clone and create a virtual env:

    git clone https://github.com/your-org/SvPhaser.git && cd SvPhaser
    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"
    
  2. Run the test suite, type checks and formatting check:

    pytest -q
    mypy src/svphaser
    black --check src tests
    
  3. Send a PR targeting the dev branch; one topic per PR.

Please read CONTRIBUTING.md (forthcoming) for the style guide and the DCO sign-off requirements.

Citing SvPhaser

If SvPhaser contributed to your research, please cite:

@software{svphaser2024,
  author       = {Pranjul Mishra and Sachin Gadakh and {CeNT Lab}},
  title        = {SvPhaser: haplotype-aware SV genotyping},
  version      = {0.2.0},
  date         = {2024-06-18},
  url          = {https://github.com/your-org/SvPhaser}
}

License

SvPhaser is released under the MIT License – see LICENSE.

📬 Contact

Developed by Team5 (BioAI Hackathon) – Sachin Gadakh & Pranjul Mishra.

Lead contacts:
  • pranjul.mishra@proton.me
  • s.gadakh@cent.uw.edu.pl

Feedback, feature requests and bug reports are all appreciated — feel free to open a GitHub issue or reach out by e‑mail.


Happy phasing!



Download files


Source Distribution

svphaser-2.0.4.tar.gz (13.0 kB)

Uploaded Source

Built Distribution


svphaser-2.0.4-py3-none-any.whl (16.3 kB)

Uploaded Python 3

File details

Details for the file svphaser-2.0.4.tar.gz.

File metadata

  • Download URL: svphaser-2.0.4.tar.gz
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for svphaser-2.0.4.tar.gz
Algorithm Hash digest
SHA256 628e8ecb13c419773e0d47bc1ba3f4004b8669aaf1a48b89fc631b769f1eb0a7
MD5 b6a14085d8d245aee88bc07dece2de64
BLAKE2b-256 80c952662bc9fc10c28684bd25328e93949c7a5a23c7dab67cb6c25aa8c318e3


File details

Details for the file svphaser-2.0.4-py3-none-any.whl.

File metadata

  • Download URL: svphaser-2.0.4-py3-none-any.whl
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for svphaser-2.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e7042a6d4ace709d67f2836c6b70f4367a1c1309cc2d6701f478d14889a02bbd
MD5 421296b8e4fa2aff6f5d030385548589
BLAKE2b-256 2f84caab15f6b20288d68a59c782f32006c060dc76073f3cc592b1ed377e3791

