Variant-aware flanking-sequence extraction and masking for ddPCR assay design
vflank
vflank is the front-end of a ddPCR assay-design pipeline. It takes genomic
variants — small variants (SNPs/indels) and structural variants (fusions) — and
emits the sequence an assay is designed around: the masked flanks of each variant
or the chimeric junction of a fusion. Primer/probe design itself is delegated
downstream to established tools.
📖 Documentation: https://rhshah.github.io/vFlank/
Features
- Small variants (vflank small): ±N bp flanks from a MAF, raw + masked FASTA, deduplicated per unique variant (CHR_POS_REF_ALT).
- Fusions / SVs (vflank fusion): reverse-complement-aware junction sequences from an iCallSV / iAnnotateSV breakpoint table (columns by name).
- SNP masking, two backends: local gnomAD VCFs or the gnomAD GraphQL API (no download), each with --pop-data {genome,exome,both}.
- Reference, two backends: a local indexed FASTA or the UCSC API (--ref-source api, no download) for runs with no reference on disk.
- Patient consensus from a BAM (--bam/--bam-map): build the flank/junction from the patient's own reads (hom-ALT corrected, het/low-cov handled) so primers match the real template; for both small variants and fusions.
- No silent failures: genome-build guard, flank-truncation detection, and a categorised skip summary + optional TSV report.
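As an illustration of what "reverse-complement-aware" means for a fusion junction, here is a minimal Python sketch. It is not vflank's implementation: the function names and the strand convention (a segment on '-' is reverse-complemented before joining) are assumptions made for this example.

```python
# Hypothetical illustration; these helpers are not part of vflank's API.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_junction(left: str, left_strand: str, right: str, right_strand: str) -> str:
    """Join two reference segments into one chimeric sequence.

    Assumed convention: a segment on '-' is reverse-complemented before joining.
    """
    if left_strand == "-":
        left = reverse_complement(left)
    if right_strand == "-":
        right = reverse_complement(right)
    return left + right


print(build_junction("AACG", "+", "TTGA", "-"))  # AACGTCAA
```

The point of the sketch is that each side of the breakpoint is oriented before concatenation, so the emitted sequence reads 5'→3' across the junction.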
Planned: VCF input (small + BND SV) and downstream emit formats.
See docs/ARCHITECTURE.md.
Install
pip install vflank # from PyPI (released versions)
pip install git+https://github.com/rhshah/vFlank.git # latest from GitHub
# development:
git clone https://github.com/rhshah/vFlank.git && cd vFlank
pip install -e ".[dev]"
Requires Python ≥ 3.10 (Linux/macOS) and pysam, pandas, typer, rich.
Docker
Images are published to GHCR on each release:
docker run --rm -v "$PWD:/data" ghcr.io/rhshah/vflank \
small run /data/variants.maf -r /data/GRCh37.fasta -g hg19 -o /data/out.fasta
Quick start
vflank small run variants.maf \
--ref-genome /path/to/GRCh37.fasta \
--pop-vcf-dir /path/to/gnomad_v2.1.1/ \
--genome-build hg19 \
--flank 200 \
--output flanking_sequences.fasta
--genome-build defaults to hg19 (GRCh37 / gnomAD v2.1.1); pass -g hg38
for GRCh38 / gnomAD v4. gnomAD v4 has no GRCh37 build.
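To make the effect of --flank concrete, the following Python sketch slices ±N bases around a variant and flags a flank that is cut short by a contig end. It is a simplified illustration under stated assumptions, not vflank's code: the `Flanks` class and `extract_flanks` function are hypothetical, the contig is an in-memory string, and only variants with an explicit REF allele (SNPs, deletions) are covered.

```python
from dataclasses import dataclass


@dataclass
class Flanks:
    left: str
    right: str
    truncated: bool  # True if either flank is shorter than requested


def extract_flanks(contig: str, pos: int, ref_len: int, flank: int) -> Flanks:
    """Slice `flank` bases on each side of a variant.

    pos is the 1-based coordinate of the first REF base; ref_len is len(REF).
    """
    start = pos - 1          # 0-based index of the first REF base
    end = start + ref_len    # 0-based index just past the last REF base
    left = contig[max(0, start - flank):start]
    right = contig[end:end + flank]
    truncated = len(left) < flank or len(right) < flank
    return Flanks(left, right, truncated)


print(extract_flanks("ACGTACGTAC", pos=5, ref_len=1, flank=3))
print(extract_flanks("ACGTACGTAC", pos=2, ref_len=1, flank=3).truncated)  # True
```

Reporting truncation explicitly, rather than silently returning a short flank, is the behaviour the "flank-truncation detection" feature describes.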
Masking sources
Common-SNP masking can come from local gnomAD VCFs or the gnomAD API:
- --pop-source vcf (default): local per-chromosome gnomAD VCFs in --pop-vcf-dir. Reproducible, offline, unlimited scale.
- --pop-source api: the public gnomAD GraphQL API, no download. Best for small cohorts (rate-limited to ~10 requests/min).
# No-download masking via the API (small cohorts):
vflank small run variants.maf -r GRCh37.fasta -g hg19 --pop-source api
Either source honours --pop-data {genome,exome,both} (default genome).
both masks a position if it is a common SNP in either the genome or exome
cohort. Flanks often fall in non-coding regions where only genomes have data,
so genome is the default.
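The --pop-data semantics can be sketched in a few lines of Python. This is an illustration only: `mask_flank` is a hypothetical helper, the sets of common-SNP positions are assumed to have been collected already (the allele-frequency threshold that makes a SNP "common" is not stated here), and using `N` as the mask character is an assumption.

```python
def mask_flank(
    flank: str,
    flank_start: int,
    genome_snps: set[int],
    exome_snps: set[int],
    pop_data: str = "genome",
    mask_char: str = "N",  # assumed mask character
) -> str:
    """Mask common-SNP positions in a flank.

    flank_start is the genomic coordinate of flank[0]; the SNP sets hold
    genomic coordinates in the same coordinate system.
    """
    if pop_data == "genome":
        positions = genome_snps
    elif pop_data == "exome":
        positions = exome_snps
    elif pop_data == "both":
        positions = genome_snps | exome_snps  # common in either cohort
    else:
        raise ValueError(f"unknown pop_data: {pop_data!r}")
    return "".join(
        mask_char if flank_start + i in positions else base
        for i, base in enumerate(flank)
    )


print(mask_flank("ACGTAC", 100, {101}, {104}, "genome"))  # ANGTAC
print(mask_flank("ACGTAC", 100, {101}, {104}, "both"))    # ANGTNC
```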
Reference sources
The reference can likewise come from a local file or an API:
- --ref-source file (default): a local indexed FASTA via --ref-genome. Reproducible, offline, unlimited scale; build sanity-checked by chr1 length.
- --ref-source api: the UCSC API, no download (--ref-genome not needed). Best for one-off / hosted runs; throttled to ~1 request/second, so not for bulk.
# Fully no-download (reference + masking from APIs):
vflank small run variants.maf -g hg19 --ref-source api --pop-source api
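The throttling figures above (about 1 request/second for UCSC, about 10 requests/minute for gnomAD) explain why the API backends suit small runs. A client-side throttle can be sketched as below; the `Throttle` class is hypothetical and is not taken from vflank, and the injectable `clock` and `sleep` arguments exist only to make the example testable.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls (illustrative)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self) -> float:
        """Block until the interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay


ucsc = Throttle(min_interval=1.0)    # ~1 request/second
gnomad = Throttle(min_interval=6.0)  # ~10 requests/minute
```

At one request per 6 seconds, a few hundred variants already take tens of minutes, which is the practical reason to switch to local files for larger cohorts.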
Each variant yields two FASTA records (the __{CHROM}_{POS}_{REF}_{ALT} suffix
is what keys deduplication; the {SAMPLE}__ prefix appears only with --bam):
>[{SAMPLE}__]{GENE}__{HGVSp}__{HGVSc}__{CHROM}_{POS}_{REF}_{ALT}
{left_flank}[REF/ALT]{right_flank}
>Masked__[{SAMPLE}__]{GENE}__{HGVSp}__{HGVSc}__{CHROM}_{POS}_{REF}_{ALT}
{left_flank_masked}[REF/ALT]{right_flank_masked}
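The naming scheme above can be reproduced with a short Python sketch. The helper names (`variant_key`, `record_names`, `dedup_key`) are hypothetical and only mirror the documented header layout; they are not vflank's library API, and the example variant is purely illustrative.

```python
def variant_key(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build the CHROM_POS_REF_ALT suffix that keys deduplication."""
    return f"{chrom}_{pos}_{ref}_{alt}"


def record_names(gene, hgvsp, hgvsc, chrom, pos, ref, alt, sample=""):
    """Return the (raw, masked) FASTA record names for one variant."""
    prefix = f"{sample}__" if sample else ""  # sample prefix only with --bam
    name = f"{prefix}{gene}__{hgvsp}__{hgvsc}__{variant_key(chrom, pos, ref, alt)}"
    return name, f"Masked__{name}"


def dedup_key(record_name: str) -> str:
    """Recover the CHROM_POS_REF_ALT suffix from a record name."""
    return record_name.rsplit("__", 1)[-1]


raw, masked = record_names("KRAS", "p.G12D", "c.35G>A", "12", 25398284, "C", "T")
print(raw)             # KRAS__p.G12D__c.35G>A__12_25398284_C_T
print(masked)          # Masked__KRAS__p.G12D__c.35G>A__12_25398284_C_T
print(dedup_key(raw))  # 12_25398284_C_T
```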
Chromosome notation (chr1 vs 1) is auto-detected from the FASTA and VCFs.
With a local FASTA the genome build is sanity-checked against its chr1 length;
with --ref-source api the requested --genome-build is trusted (a wrong build
surfaces as a UCSC error, not silent wrong sequence).
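The two checks described above can be sketched in Python against a FASTA index (.fai). This is a minimal illustration, not vflank's code: the function names are hypothetical, and the index is passed as text rather than read through pysam. The chr1 lengths are the published values for GRCh37 (249,250,621 bp) and GRCh38 (248,956,422 bp).

```python
# Published chr1 lengths per build.
CHR1_LENGTH = {"hg19": 249_250_621, "hg38": 248_956_422}


def read_fai(fai_text: str) -> dict[str, int]:
    """Parse .fai index text into {contig_name: length}."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length, *_ = line.split("\t")
            lengths[name] = int(length)
    return lengths


def uses_chr_prefix(contigs) -> bool:
    """Detect 'chr1'-style (True) versus '1'-style (False) contig names."""
    return any(name.startswith("chr") for name in contigs)


def check_build(lengths: dict[str, int], genome_build: str) -> None:
    """Raise ValueError if chr1 length does not match the requested build."""
    chr1 = "chr1" if "chr1" in lengths else "1"
    observed = lengths.get(chr1)
    expected = CHR1_LENGTH[genome_build]
    if observed != expected:
        raise ValueError(
            f"{chr1} length {observed} does not match {genome_build} ({expected})"
        )


lengths = read_fai("1\t249250621\t52\t60\t61\n")
print(uses_chr_prefix(lengths))  # False
check_build(lengths, "hg19")     # passes silently
```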
Project layout
src/vflank/
├── core/ chrom · variant · flanks · popfreq (pure, testable domain logic)
├── io/ maf · reference · fasta (file access)
└── cli/ app · small (Typer commands)
Documentation
- docs/DEVELOPER.md: setup, running, testing, using vflank as a library, and extending it (new flank sources, CLI commands).
- docs/ARCHITECTURE.md: design, scope boundary, and the milestone roadmap.
- CLAUDE.md: repository conventions and the quality gate.