
Variant-aware flanking-sequence extraction and masking for ddPCR assay design


vflank

License: Apache 2.0


vflank is the front-end of a ddPCR assay-design pipeline. It takes genomic variants, both small variants (SNPs/indels) and structural variants (fusions), and emits the sequence an assay is designed around: the masked flanks of each small variant, or the chimeric junction of each fusion. Primer/probe design itself is delegated to established downstream tools.

📖 Documentation: https://rhshah.github.io/vFlank/

Features

  • Small variants (vflank small) — ±N bp flanks from a MAF, raw + masked FASTA, deduplicated per unique variant (CHR_POS_REF_ALT).
  • Fusions / SVs (vflank fusion) — reverse-complement-aware junction sequences from an iCallSV / iAnnotateSV breakpoint table (columns by name).
  • SNP masking, two backends — local gnomAD VCFs or the gnomAD GraphQL API (no download), each with --pop-data {genome,exome,both}.
  • Patient consensus from a BAM (--bam/--bam-map) — build the flank/junction from the patient's own reads (hom-ALT corrected, het/low-cov handled) so primers match the real template; for both small variants and fusions.
  • No silent failures — genome-build guard, flank-truncation detection, and a categorised skip summary + optional TSV report.

Planned: VCF input (small + BND SV) and downstream emit formats. See docs/ARCHITECTURE.md.
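Reverse-complement-aware junction construction can be sketched as follows. Each breakpoint end is described by a chromosome, a 1-based position and which side of that position is retained in the fusion ("left" keeps bases up to the position, "right" keeps bases from it onward). This orientation encoding, the function names and the in-memory reference dict are assumptions for illustration; they are not iCallSV's column conventions or vflank's API. When partner A keeps its right side, or partner B keeps its left side, that segment reads into or out of the junction on the opposite strand, so it is reverse-complemented.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch(ref: dict[str, str], chrom: str, start: int, end: int) -> str:
    """1-based inclusive slice of a contig; refuses to run off either end."""
    contig = ref[chrom]
    if start < 1 or end > len(contig):
        raise ValueError(f"{chrom}:{start}-{end} runs off the contig")
    return contig[start - 1:end]


def side_a(ref: dict[str, str], chrom: str, pos: int, keep: str, n: int) -> str:
    """n bp of partner A, oriented so that it ends at the junction."""
    if keep == "left":   # bases <= pos retained: read forward up to pos
        return fetch(ref, chrom, pos - n + 1, pos)
    if keep == "right":  # bases >= pos retained: read the reverse strand into pos
        return revcomp(fetch(ref, chrom, pos, pos + n - 1))
    raise ValueError(f"keep must be 'left' or 'right', got {keep!r}")


def side_b(ref: dict[str, str], chrom: str, pos: int, keep: str, n: int) -> str:
    """n bp of partner B, oriented so that it starts at the junction."""
    if keep == "right":  # bases >= pos retained: read forward from pos
        return fetch(ref, chrom, pos, pos + n - 1)
    if keep == "left":   # bases <= pos retained: read the reverse strand away from pos
        return revcomp(fetch(ref, chrom, pos - n + 1, pos))
    raise ValueError(f"keep must be 'left' or 'right', got {keep!r}")


def junction(ref: dict[str, str], a: tuple[str, int, str],
             b: tuple[str, int, str], n: int) -> str:
    """Chimeric sequence: n bp of partner A followed by n bp of partner B."""
    return side_a(ref, *a, n) + side_b(ref, *b, n)
```

With ref = {"chrA": "ATGCATGCAT", "chrB": "GGGAAATTTCCC"}, junction(ref, ("chrA", 4, "left"), ("chrB", 4, "right"), 3) gives "TGCAAA", while junction(ref, ("chrA", 4, "right"), ("chrB", 6, "left"), 3) reverse-complements both partners and gives "ATGTTT". A breakpoint too close to a contig end raises instead of returning a short junction.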

Install

pip install vflank                                   # from PyPI (released versions)
pip install git+https://github.com/rhshah/vFlank.git # latest from GitHub
# development:
git clone https://github.com/rhshah/vFlank.git && cd vFlank
pip install -e ".[dev]"

Requires Python ≥ 3.10 (Linux/macOS) and pysam, pandas, typer, rich.

Docker

Images are published to GHCR on each release:

docker run --rm -v "$PWD:/data" ghcr.io/rhshah/vflank \
    small run /data/variants.maf -r /data/GRCh37.fasta -g hg19 -o /data/out.fasta

Quick start

vflank small run variants.maf \
    --ref-genome /path/to/GRCh37.fasta \
    --pop-vcf-dir /path/to/gnomad_v2.1.1/ \
    --genome-build hg19 \
    --flank 200 \
    --output flanking_sequences.fasta

--genome-build defaults to hg19 (GRCh37 / gnomAD v2.1.1); pass -g hg38 for GRCh38 / gnomAD v4. gnomAD v4 has no GRCh37 build.
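The ±N bp flank extraction behind --flank, including the flank-truncation check listed under Features, can be sketched as below. A plain string stands in for a reference contig (in practice it would be fetched from the FASTA, for example with pysam); the Flanks class and extract_flanks name are hypothetical. The sketch handles SNVs and deletions whose REF is spelled out in bases; MAF insertions, which use "-" as REF, need separate coordinate handling and are not covered.

```python
from dataclasses import dataclass


@dataclass
class Flanks:
    left: str
    right: str
    truncated: bool  # True if either flank hit a contig end and is shorter than n


def extract_flanks(contig: str, pos: int, ref: str, n: int) -> Flanks:
    """±n bp flanks around a variant whose REF allele starts at 1-based `pos`."""
    start0 = pos - 1              # 0-based start of the REF allele
    end0 = start0 + len(ref)      # 0-based exclusive end of the REF allele
    if contig[start0:end0].upper() != ref.upper():
        # Guard against wrong builds or coordinate conventions.
        raise ValueError(f"REF {ref!r} does not match the reference at {pos}")
    left = contig[max(0, start0 - n):start0]   # clamp at the contig start
    right = contig[end0:end0 + n]              # slicing clamps at the contig end
    truncated = len(left) < n or len(right) < n
    return Flanks(left, right, truncated)
```

For a contig "AAAAACGTTTTT", extract_flanks(contig, 6, "C", 3) returns left "AAA", right "GTT" and truncated False; moving the variant to position 2 leaves only one base on the left and sets truncated to True. Flagging the short flank rather than returning it silently matches the no-silent-failures behaviour described above.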

Masking sources

Common-SNP masking can come from local gnomAD VCFs or the gnomAD API:

  • --pop-source vcf (default) — local per-chromosome gnomAD VCFs in --pop-vcf-dir. Reproducible, offline, unlimited scale.
  • --pop-source api — the public gnomAD GraphQL API, no download. Best for small cohorts (rate-limited to ~10 requests/min).

# No-download masking via the API (small cohorts):
vflank small run variants.maf -r GRCh37.fasta -g hg19 --pop-source api
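Because the API route is rate-limited, a client has to space its requests. The sketch below shows one simple way to do that, enforcing a minimum interval between calls; it is not vflank's implementation, and the Throttle class with its injectable clock and sleep parameters is hypothetical (they are injected so the timing can be tested without waiting).

```python
import time
from collections.abc import Callable


class Throttle:
    """Space successive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self._last: float | None = None  # time of the previous permitted call

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self.clock()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                self.sleep(delay)
                now += delay
        self._last = now
```

Calling wait() before each request keeps a client under the limit. At ~10 requests/min the interval is 6 s, so if each variant needed one request (an assumption), 600 variants would take at least an hour, which is why the local-VCF source suits larger cohorts.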

Either source honours --pop-data {genome,exome,both} (default genome). both masks a position if it is a common SNP in either the genome or exome cohort. Flanks often fall in non-coding regions where only genomes have data, so genome is the default.
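The --pop-data choices map onto simple set operations over common-SNP positions. In this sketch the allele-frequency dictionaries, the 1% "common" cutoff and the "N" mask character are assumptions for illustration; vflank's actual threshold and masking character are not stated here.

```python
from typing import Literal

PopData = Literal["genome", "exome", "both"]


def common_positions(genome_af: dict[int, float], exome_af: dict[int, float],
                     pop_data: PopData = "genome", min_af: float = 0.01) -> set[int]:
    """Positions to mask for a given --pop-data choice (min_af is hypothetical)."""
    genome = {pos for pos, af in genome_af.items() if af >= min_af}
    exome = {pos for pos, af in exome_af.items() if af >= min_af}
    if pop_data == "genome":
        return genome
    if pop_data == "exome":
        return exome
    if pop_data == "both":
        return genome | exome  # common in either cohort
    raise ValueError(f"unknown pop_data {pop_data!r}")


def mask(seq: str, start: int, positions: set[int], char: str = "N") -> str:
    """Replace bases at masked positions; `start` is the 1-based coordinate of seq[0]."""
    return "".join(char if start + i in positions else base
                   for i, base in enumerate(seq))
```

With a common genome SNP at position 101 and a common exome SNP at 102, "both" masks "ACGT" (starting at 100) to "ANNT", whereas the default "genome" gives "ANGT".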

Each variant yields two FASTA records, one raw and one masked. The trailing __{CHROM}_{POS}_{REF}_{ALT} suffix is the deduplication key, and the {SAMPLE}__ prefix appears only with --bam:

>[{SAMPLE}__]{GENE}__{HGVSp}__{HGVSc}__{CHROM}_{POS}_{REF}_{ALT}
{left_flank}[REF/ALT]{right_flank}
>Masked__[{SAMPLE}__]{GENE}__{HGVSp}__{HGVSc}__{CHROM}_{POS}_{REF}_{ALT}
{left_flank_masked}[REF/ALT]{right_flank_masked}

Chromosome notation (chr1 vs 1) is auto-detected from the FASTA and VCFs. The genome build is sanity-checked against the FASTA's chr1 length.
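Both checks can be sketched in a few lines. The chr1 lengths are those of the reference assemblies (GRCh37/hg19: 249,250,621 bp; GRCh38/hg38: 248,956,422 bp). The function names are hypothetical, and contig names and lengths would in practice come from the FASTA index (pysam.FastaFile exposes them as its references and lengths attributes). Mitochondrial naming (chrM vs MT) differs by more than a prefix and is not handled here.

```python
# chr1 lengths of the reference assemblies, used as a build fingerprint.
CHR1_LENGTH = {"hg19": 249_250_621, "hg38": 248_956_422}


def uses_chr_prefix(contig_names: list[str]) -> bool:
    """True if contigs are named like 'chr1', False if like '1'."""
    return any(name.startswith("chr") for name in contig_names)


def normalise_chrom(chrom: str, want_prefix: bool) -> str:
    """Convert between '7' and 'chr7' styles (chrM/MT not handled)."""
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    return f"chr{bare}" if want_prefix else bare


def check_build(contig_lengths: dict[str, int], build: str) -> None:
    """Raise if the FASTA's chr1 length does not match the requested build."""
    chr1 = contig_lengths.get("chr1", contig_lengths.get("1"))
    if chr1 is None:
        raise ValueError("reference has no chr1/1 contig; cannot verify the build")
    expected = CHR1_LENGTH[build]
    if chr1 != expected:
        raise ValueError(
            f"chr1 is {chr1:,} bp but {build} expects {expected:,} bp; "
            "check --genome-build against the FASTA"
        )
```

Comparing a single contig length is cheap and catches the common mistake of pairing a GRCh38 FASTA with -g hg19 (or the reverse) before any flanks are extracted.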

Project layout

src/vflank/
├── core/   chrom · variant · flanks · popfreq   (pure, testable domain logic)
├── io/     maf · reference · fasta              (file access)
└── cli/    app · small                          (Typer commands)

Documentation

  • docs/DEVELOPER.md — setup, running, testing, using vflank as a library, and extending it (new flank sources, CLI commands).
  • docs/ARCHITECTURE.md — design, scope boundary, and the milestone roadmap.
  • CLAUDE.md — repository conventions and the quality gate.

Download files


Source Distribution

vflank-0.3.0.tar.gz (99.9 kB)

Uploaded Source

Built Distribution


vflank-0.3.0-py3-none-any.whl (52.5 kB)

Uploaded Python 3

File details

Details for the file vflank-0.3.0.tar.gz.

File metadata

  • Download URL: vflank-0.3.0.tar.gz
  • Size: 99.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vflank-0.3.0.tar.gz
Algorithm Hash digest
SHA256 d4eab0e1ff43559ac53693459d3de4ac43bd03a8664614ae10dfaa50ed872028
MD5 857831aa1106a6a0f0f3f948eed6cc1b
BLAKE2b-256 615f6c9add28dbdcc2486988a041bd6ef085ed5a7da648efb1df229226b507a4


Provenance

The following attestation bundles were made for vflank-0.3.0.tar.gz:

Publisher: release.yml on rhshah/vFlank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vflank-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: vflank-0.3.0-py3-none-any.whl
  • Size: 52.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vflank-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 92deb8e7443fd1258d4c23bf0a79804860ed4822e509ef57dadb06e762525162
MD5 d1a957007bfb7c8f34287b5a147af2d9
BLAKE2b-256 bdf4428ad0eb709a0734ddb6acbded98630e7af0538d33271e60faf37cfa2d91


Provenance

The following attestation bundles were made for vflank-0.3.0-py3-none-any.whl:

Publisher: release.yml on rhshah/vFlank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
