Structural-variant phasing from HP-tagged long-read BAMs
SvPhaser
Haplotype‑aware structural‑variant genotyper for long‑read data
SvPhaser phases pre‑called structural variants (SVs) using HP‑tagged long‑read alignments (PacBio HiFi, ONT Q20+, …). Think of it as WhatsHap for insertions/deletions/duplications: we do not discover SVs; we assign each variant a haplotype genotype (0|1, 1|0, 1|1, or ./.) together with a Genotype Quality (GQ) score – all in a single, embarrassingly‑parallel pass over the genome.
Key highlights
- Fast, per‑chromosome multiprocessing – linear scale‑out on 32‑core workstations.
- Deterministic Δ‑based decision tree – no MCMC or hidden state machines.
- Friendly CLI (svphaser phase …) and importable Python API.
- Seamless VCF injection – adds HP_GT, HP_GQ, HP_GQBIN INFO tags while copying the original header verbatim.
- Configurable confidence bins and publication‑ready plots (see result_images/).
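To make the confidence bins concrete, here is a minimal sketch of how a bin string such as 30:High,10:Moderate could be turned into labels. It assumes each entry is THRESHOLD:LABEL and that a GQ gets the label of the highest threshold it meets; the helper names and the "Low" fallback are hypothetical and not taken from the SvPhaser source.

```python
def parse_gq_bins(spec: str) -> list[tuple[int, str]]:
    """Parse 'THRESHOLD:LABEL,...' into (threshold, label) pairs, highest first."""
    bins = []
    for item in spec.split(","):
        threshold, _, label = item.strip().partition(":")
        if not label:
            raise ValueError(f"expected THRESHOLD:LABEL, got {item!r}")
        bins.append((int(threshold), label.strip()))
    return sorted(bins, reverse=True)


def gq_label(gq: int, bins: list[tuple[int, str]], default: str = "Low") -> str:
    """Return the label of the first (highest) bin whose threshold the GQ meets.

    The 'Low' fallback is an assumption made for this sketch.
    """
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default


bins = parse_gq_bins("30:High,10:Moderate")
print(gq_label(42, bins), gq_label(12, bins), gq_label(3, bins))
```

Sorting the thresholds in descending order means the order in which bins are written on the command line does not matter.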
Installation
# Requires Python ≥3.9
pip install svphaser # PyPI (coming soon)
# or
pip install git+https://github.com/your‑org/SvPhaser.git@v0.2.0
cyvcf2, pysam, typer[all], and pandas are pulled in automatically.
Quick‑start
svphaser phase \
sample_unphased.vcf.gz \
sample.sorted_phased.bam \
--out-dir results/ \
--min-support 10 \
--major-delta 0.70 \
--equal-delta 0.25 \
--gq-bins "30:High,10:Moderate" \
--threads 32
Outputs (written inside results/)
sample_unphased_phased.vcf # original VCF + HP_* INFO fields
sample_unphased_phased.csv # tidy table for plotting / downstream R
See docs/methodology.md and the flow‑chart below for algorithmic details.
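As a rough illustration only, the three thresholds from the quick-start could combine into a deterministic decision like the sketch below. This is one plausible reading of the option names, not the documented algorithm: the function name, the definition of Δ as the normalised difference in supporting reads between haplotypes, and the mapping of HP1 to the left-hand allele are all assumptions. docs/methodology.md remains the authoritative description.

```python
def call_genotype(
    n_hp1: int,
    n_hp2: int,
    *,
    min_support: int = 10,
    major_delta: float = 0.70,
    equal_delta: float = 0.25,
) -> str:
    """Assign a phased genotype from per-haplotype supporting-read counts.

    Hypothetical sketch: delta = |n_hp1 - n_hp2| / (n_hp1 + n_hp2).
    """
    total = n_hp1 + n_hp2
    # Too little evidence (or none at all): leave the variant unphased.
    if total == 0 or total < min_support:
        return "./."
    delta = abs(n_hp1 - n_hp2) / total
    # Strong imbalance: the SV sits on one haplotype only.
    if delta >= major_delta:
        return "1|0" if n_hp1 > n_hp2 else "0|1"
    # Near-equal support on both haplotypes: homozygous alternate.
    if delta <= equal_delta:
        return "1|1"
    # Anything in between is treated as ambiguous in this sketch.
    return "./."


for counts in [(18, 2), (2, 18), (11, 9), (14, 6), (3, 2)]:
    print(counts, call_genotype(*counts))
```

Because every branch is a plain comparison, the same read counts always produce the same call, which is the property the "deterministic" highlight refers to.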
Folder layout
SvPhaser/
├─ src/svphaser/ # importable package
│ ├─ cli.py # Typer entry‑point
│ ├─ logging.py # unified log setup
│ └─ phasing/
│ ├─ algorithms.py # core maths
│ ├─ io.py # driver & I/O
│ ├─ _workers.py # per‑chrom processes
│ └─ types.py # thin dataclasses
├─ tests/ # pytest suite + mini data
├─ docs/ # extra documentation
├─ result_images/ # generated plots & diagrams
└─ CHANGELOG.md
Python usage
from pathlib import Path
from svphaser.phasing.io import phase_vcf
phase_vcf(
Path("sample.vcf.gz"),
Path("sample.bam"),
out_dir=Path("results"),
min_support=10,
major_delta=0.70,
equal_delta=0.25,
gq_bins="30:High,10:Moderate",
threads=8,
)
The resulting CSV can be loaded into a pandas DataFrame for custom analytics.
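A small sketch of such downstream analysis follows. It tallies genotype calls from the CSV using only the standard library, so it runs without extra dependencies; pandas.read_csv on the same file would serve equally well. The column name HP_GT is an assumption based on the INFO tag names, so check the header of your own output first.

```python
import csv
from collections import Counter
from typing import Iterable


def count_genotypes(lines: Iterable[str], column: str = "HP_GT") -> Counter:
    """Count how often each genotype appears in the given CSV lines.

    `lines` can be an open file handle or any iterable of CSV rows.
    The default column name is an assumption, not a documented schema.
    """
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


# Typical use with the quick-start output (path taken from the example above):
#
#     with open("results/sample_unphased_phased.csv", newline="") as fh:
#         print(count_genotypes(fh))
```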
Development & contributing
- Clone and create a virtual env:
  git clone https://github.com/your‑org/SvPhaser.git && cd SvPhaser
  python -m venv .venv && source .venv/bin/activate
  pip install -e .[dev]
- Run the test‑suite & type checks:
  pytest -q
  mypy src/svphaser
  black --check src tests
- Send a PR targeting the dev branch; one topic per PR.
Please read CONTRIBUTING.md (to come) for style‑guides and the DCO sign‑off.
Citing SvPhaser
If SvPhaser contributed to your research, please cite:
@software{svphaser2024,
author = {Pranjul Mishra and Sachin Gadakh and {CeNT Lab}},
title = {SvPhaser: haplotype‑aware SV genotyping},
version = {0.2.0},
date = {2024-06-18},
url = {https://github.com/your‑org/SvPhaser}
}
License
SvPhaser is released under the MIT License – see LICENSE.
📬 Contact
Developed by Team5 (BioAI Hackathon) – Sachin Gadakh & Pranjul Mishra.
Lead contacts: • pranjul.mishra@proton.me • s.gadakh@cent.uw.edu.pl
Feedback, feature requests and bug reports are all appreciated — feel free to open a GitHub issue or reach out by e‑mail.
Happy phasing!