VariantFlow

A production-quality platform for downstream genomic variant interpretation and prioritization

VariantFlow accepts ANNOVAR multianno files and automatically performs variant filtering, ACMG classification, candidate gene prioritization, and pathway enrichment. It then generates publication-ready reports and interactive dashboards, with full reproducibility tracking.

Made with ❤️ in INDIA || Dr Prabudh Goel Lab, AIIMS New Delhi

Table of Contents

Overview
Key Features
Architecture
Installation
Quick Start
CLI Reference
Input Formats
Output Structure
Configuration
Variant Scoring
Pathway Enrichment
Dashboard
Cohort and Family Analysis
Reproducibility
Citation
License

Overview

VariantFlow is a modular, extensible Python platform for downstream analysis of ANNOVAR-annotated genomic variant files. It is built for clinical genomics research, and the platform itself is intended for publication in a journal such as BMC Genomics, Bioinformatics, or Briefings in Bioinformatics.

The platform takes a standard ANNOVAR multianno file as input and executes a complete, auditable analysis pipeline — from raw variant filtering through to HTML, Excel, and PDF reports — without requiring any manual column mapping or configuration.

Key Features

Feature	Description
Automatic column detection	ColumnMapper uses regex pattern matching across 50+ field types — never hardcodes ANNOVAR column names. Supports gnomAD v2/v3/v4.1.1, ClinVar date-stamped columns, SIFT4G, and more
Multi-tier filtering	Sequential quality → population frequency → functional consequence → exonic consequence → ACMG benign removal pipeline with full audit trail
ClinVar interpretation	Parses CLNSIG text; auto-detects presence-flag columns and falls back to InterVar classification
InterVar ACMG	Full evidence extraction — PVS1, PS1–4, PM1–6, PP1–5, BA1, BS1–4, BP1–7
Transparent scoring	Configurable multi-factor variant score with per-variant breakdown
Gene prioritization	Ranked candidate gene tables with natural-language score explanations
GO / KEGG / Reactome	Enrichment via gseapy with bubble plots and bar charts per database
Interactive dashboard	Multi-page Dash app with live filters, drill-down tables, and export
Multi-format reports	HTML, Excel (multi-sheet), PDF — all with lab branding
Cohort analysis	Shared/unique variants, gene burden, recurrent genes across samples
Family analysis	De novo, autosomal recessive, compound het, X-linked detection
Reproducibility	`project.json` manifest + auto-generated methods text for manuscripts
3D visualizations	3D variant landscape (Score × CADD × REVEL) and 3D pathway landscape
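
The multi-tier filtering row above describes a sequence of filters that records how many variants survive each step. A minimal sketch of that idea follows, using only the standard library. The field names (`DP`, `AF`, `Func`), the predicate functions, and the choice to keep variants with a missing allele frequency are assumptions made for illustration; they are not VariantFlow's actual internals.

```python
# Each variant is a plain dict here; the keys are hypothetical.

def passes_quality(v, min_dp=10):
    # Keep variants with at least min_dp reads (mirrors the --min-dp default).
    return v.get("DP", 0) >= min_dp

def passes_population_frequency(v, af_threshold=0.01):
    # Assumption: a variant with no recorded frequency is treated as rare.
    af = v.get("AF")
    return af is None or af < af_threshold

def passes_functional(v):
    # Assumption: only exonic and splicing variants are retained.
    return v.get("Func") in {"exonic", "splicing"}

FILTERS = [
    ("quality", passes_quality),
    ("population_frequency", passes_population_frequency),
    ("functional_consequence", passes_functional),
]

def run_filters(variants, filters=FILTERS):
    """Apply filters in order and return (kept variants, audit trail)."""
    audit = [{"step": "input", "remaining": len(variants)}]
    current = list(variants)
    for name, keep in filters:
        current = [v for v in current if keep(v)]
        audit.append({"step": name, "remaining": len(current)})
    return current, audit
```

The audit list is the kind of data a filtering funnel plot needs: one count per step, in the order the steps ran.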

Architecture

variantflow/
├── core/               # Data models (Pydantic), exceptions, logging, pipeline orchestrator
├── io/                 # ColumnMapper, MultiannoReader — auto-detect all ANNOVAR fields
├── filters/            # Quality, population frequency, functional, exonic, ACMG filters
├── annotations/        # ClinVar engine, InterVar ACMG evidence parser
├── scoring/            # Transparent multi-factor variant scorer
├── prioritization/     # Gene ranker with score explanation
├── enrichment/         # GO BP/MF/CC, KEGG, Reactome via gseapy
├── statistics/         # Summary statistics engine → statistics.json
├── visualization/      # Plotly 2D figures + 3D landscapes + enrichment plots
├── dashboard/          # Multi-page Dash app with live callbacks
├── reports/            # HTML (Jinja2), Excel (openpyxl), PDF (ReportLab)
├── cohort/             # Multi-sample shared/unique/burden analysis
├── family/             # Pedigree-based inheritance detection
├── config/             # Pydantic v2 settings — fully configurable, env-var overridable
└── cli/                # Typer CLI — analyze / dashboard / cohort / family

Installation

From source (recommended)

git clone https://github.com/imrobintomar/VariantFlow.git
cd VariantFlow
pip install -e . --no-build-isolation

Dependencies

pip install pandas numpy scipy plotly dash dash-bootstrap-components \
            gseapy openpyxl reportlab jinja2 pydantic pydantic-settings \
            typer rich loguru tqdm

Docker

docker build -t variantflow:1.0.0 .
docker run --rm -v "$(pwd)/data:/data" -v "$(pwd)/results:/results" \
  variantflow:1.0.0 analyze /data/sample.hg38_multianno.txt --output /results

Quick Start

# Single-sample analysis
python variantflow_run.py analyze sample.hg38_multianno.txt \
  --output results/ --sample-id SAMPLE01

# Launch interactive dashboard
python variantflow_run.py dashboard results/ --port 8050

# Cohort analysis (directory of multianno files)
python variantflow_run.py cohort cohort_dir/ --output cohort_results/

# Family / trio analysis
python variantflow_run.py family family_dir/ \
  --proband PROBAND01 --father DAD01 --mother MOM01

CLI Reference

`analyze`

python variantflow_run.py analyze <input_file> [OPTIONS]

Arguments:
  input_file          ANNOVAR multianno file (.txt or .txt.gz)

Options:
  -o, --output        Output directory           [default: variantflow_results]
  -s, --sample-id     Sample identifier          [default: sample]
  -g, --genome        Genome build: hg38 / hg19  [default: hg38]
  --af                AF threshold (rare variant) [default: 0.01]
  --min-dp            Minimum read depth          [default: 10]
  --nonframeshift     Include nonframeshift indels
  --no-enrichment     Skip pathway enrichment
  --no-pdf            Skip PDF report
  -c, --config        JSON configuration file
  -v, --verbose       Verbose logging

`dashboard`

python variantflow_run.py dashboard <results_dir> [OPTIONS]

Options:
  --host    Dashboard host  [default: 127.0.0.1]
  --port    Dashboard port  [default: 8050]
  --debug   Enable debug mode

`cohort`

python variantflow_run.py cohort <cohort_dir> [OPTIONS]

Options:
  -o, --output   Output directory  [default: cohort_results]
  --pattern      File glob pattern [default: *.txt]

`family`

python variantflow_run.py family <family_dir> [OPTIONS]

Options:
  -p, --proband  Proband sample ID  [required]
  -f, --father   Father sample ID
  -m, --mother   Mother sample ID
  -o, --output   Output directory  [default: family_results]

Input Formats

VariantFlow accepts standard ANNOVAR multianno files:

Format	Example
Plain text (hg38)	`sample.hg38_multianno.txt`
Plain text (hg19)	`sample.hg19_multianno.txt`
Gzip compressed	`sample.hg38_multianno.txt.gz`
Tab-separated	`sample.hg38_multianno.tsv`
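
Plain and gzip-compressed inputs can be read through one code path. The sketch below shows one way to do that with the standard library; the helper names `open_multianno` and `read_header` are hypothetical, and the sketch assumes UTF-8 text with a tab-separated header on the first line.

```python
import gzip
from pathlib import Path

def open_multianno(path):
    """Open a multianno file as text, decompressing when the name ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")

def read_header(path):
    """Return the column names from the first line of the file."""
    with open_multianno(path) as handle:
        return handle.readline().rstrip("\r\n").split("\t")
```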

Automatically detected fields include:

Genomic coordinates: Chr, Start, End, Ref, Alt
Gene annotations: Gene.refGene, Func.refGene, ExonicFunc.refGene, AAChange.refGene
Population frequencies: gnomad411_exome_AF, gnomAD_exome_ALL, ExAC_ALL, 1000g2015aug_all
ClinVar: CLNSIG, clinvar_20260503 (date-stamped), CLNDN
InterVar: InterVar_automated, InterVar_ACMG
Predictors: REVEL_score, CADD_phred, SIFT_score, SIFT4G_score, Polyphen2_HDIV_score
Other: GERP++_RS, phyloP100way_vertebrate, MutationTaster_pred, SpliceAI_DS_max
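
Pattern-based detection of the fields listed above could look roughly like the following. This is a sketch under stated assumptions: the logical field names and the regular expressions are invented here to match the example column names in this section, and the real ColumnMapper may use different patterns and cover many more fields.

```python
import re

# Hypothetical patterns keyed by a logical field name.
FIELD_PATTERNS = {
    "gene": re.compile(r"^Gene\.refGene$", re.IGNORECASE),
    "gnomad_af": re.compile(r"^gnomad\d*_(exome|genome)_(AF|ALL)$", re.IGNORECASE),
    "clinvar": re.compile(r"^(CLNSIG|clinvar_\d{8})$", re.IGNORECASE),
    "revel": re.compile(r"^REVEL(_score)?$", re.IGNORECASE),
    "cadd": re.compile(r"^CADD_phred$", re.IGNORECASE),
}

def map_columns(header):
    """Map each logical field to the first header column matching its pattern."""
    mapping = {}
    for field, pattern in FIELD_PATTERNS.items():
        for column in header:
            if pattern.match(column):
                mapping[field] = column
                break
    return mapping
```

Fields with no matching column are simply absent from the result, so downstream code can check for a key rather than assume a fixed ANNOVAR column name.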

Output Structure

results/
├── report.html                  # Self-contained interactive HTML report
├── VariantFlow_Report.xlsx       # Multi-sheet Excel workbook
│   ├── Summary                  # Key statistics and metadata
│   ├── CandidateVariants        # Top 500 variants ranked by score
│   ├── CandidateGenes           # Ranked candidate genes
│   ├── ClinVar_Pathogenic        # Pathogenic / Likely Pathogenic variants
│   ├── InterVar_Pathogenic       # ACMG Pathogenic / LP variants
│   └── go_* / kegg / reactome   # Enrichment results per database
├── report.pdf                   # PDF report with tables and methods
├── CandidateVariants.tsv        # Tab-separated candidate variants
├── CandidateGenes.tsv           # Tab-separated ranked genes
├── statistics.json              # Full summary statistics
├── project.json                 # Reproducibility manifest
├── methods.txt                  # Auto-generated methods section
└── figures/
    ├── filtering_funnel.html
    ├── clinvar_distribution.html
    ├── intervar_distribution.html
    ├── gene_ranking.html
    ├── chromosome_distribution.html
    ├── variant_score_histogram.html
    ├── af_distribution.html
    ├── acmg_evidence.html
    ├── variant_landscape_3d.html
    ├── enrichment_go_biological_process_dot.html
    ├── enrichment_go_biological_process_bar.html
    ├── enrichment_go_cellular_component_dot.html
    ├── enrichment_kegg_dot.html
    ├── enrichment_reactome_dot.html
    └── pathway_landscape_3d.html
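
After a run, a quick check that the top-level files in the tree above exist can catch a partial or failed analysis. The helper below is a hypothetical convenience, not part of VariantFlow; note that `report.pdf` would legitimately be absent when `--no-pdf` is used.

```python
from pathlib import Path

# Top-level files taken from the output tree in this section.
EXPECTED_OUTPUTS = [
    "report.html",
    "VariantFlow_Report.xlsx",
    "report.pdf",
    "CandidateVariants.tsv",
    "CandidateGenes.tsv",
    "statistics.json",
    "project.json",
    "methods.txt",
]

def missing_outputs(results_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in results_dir."""
    root = Path(results_dir)
    return [name for name in expected if not (root / name).is_file()]
```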

Configuration

VariantFlow uses a Pydantic v2 settings system. All parameters can be overridden via:

JSON config file (--config my_config.json)
Environment variables (prefix VF_)

Example `config.json`

{
  "project_name": "Rare Disease Study",
  "sample_id": "PATIENT_001",
  "genome_build": "hg38",
  "output_dir": "results/",
  "filters": {
    "active_af_threshold": 0.001,
    "min_dp": 20,
    "include_nonframeshift": true
  },
  "scoring": {
    "clinvar_pathogenic": 10.0,
    "revel_high": 3.0,
    "cadd_very_high": 3.0
  },
  "enrichment": {
    "organism": "human",
    "qvalue_cutoff": 0.05,
    "top_n_terms": 20
  }
}

Environment variable override

export VF_FILTERS__ACTIVE_AF_THRESHOLD=0.001
export VF_FILTERS__MIN_DP=20
export VF_LOG_LEVEL=DEBUG
python variantflow_run.py analyze sample.txt
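
In the variable names above, the `VF_` prefix is followed by a path into the settings, with a double underscore separating nesting levels. pydantic-settings supports this convention through its `env_prefix` and `env_nested_delimiter` options; whether VariantFlow configures it exactly that way is an assumption. The standard-library sketch below only illustrates how such names map onto the nested structure of `config.json`.

```python
import os

def collect_overrides(environ=None, prefix="VF_", delimiter="__"):
    """Turn PREFIX_A__B=value pairs into a nested dict {"a": {"b": "value"}}."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split(delimiter)
        node = overrides
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Values stay strings here; a settings library would coerce the types.
        node[parts[-1]] = value
    return overrides
```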

Variant Scoring

VariantFlow uses a transparent, configurable multi-factor scoring system. Every score contribution is stored in a score_breakdown column for full auditability.

Source	Criterion	Score
ClinVar	Pathogenic	+10.0
ClinVar	Likely Pathogenic	+8.0
ClinVar	VUS	+3.0
ClinVar	Likely Benign	-2.0
ClinVar	Benign	-5.0
InterVar	Pathogenic	+8.0
InterVar	Likely Pathogenic	+6.0
Consequence	Stop-gain / Stop-loss / Start-loss	+5.0
Consequence	Frameshift indel	+5.0
Consequence	Splicing	+3.0
Consequence	Nonsynonymous SNV	+2.0
Population AF	< 0.0001 (ultra-rare)	+4.0
Population AF	< 0.001 (very rare)	+3.0
Population AF	< 0.01 (rare)	+1.5
REVEL	≥ 0.75	+3.0
REVEL	0.50 – 0.75	+1.5
CADD	≥ 30	+3.0
CADD	20 – 30	+2.0
SIFT	Deleterious	+1.0
PolyPhen-2	Damaging	+1.0

All weights are configurable in config.json under the scoring key.

Note: Variants classified as Benign or Likely Benign by InterVar are automatically removed from the candidate set after annotation.
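
The scoring table can be read as a set of lookups whose contributions are summed and recorded per variant. The sketch below illustrates that reading with the weights listed above. It assumes that tiers within one source are mutually exclusive (only the most stringent matching tier counts), that range rows such as 0.50 – 0.75 include the lower bound and exclude the upper, and that missing values contribute nothing. The input field names are hypothetical and the code is not VariantFlow's scorer.

```python
CLINVAR = {"Pathogenic": 10.0, "Likely Pathogenic": 8.0, "VUS": 3.0,
           "Likely Benign": -2.0, "Benign": -5.0}
INTERVAR = {"Pathogenic": 8.0, "Likely Pathogenic": 6.0}
CONSEQUENCE = {"Stop-gain": 5.0, "Stop-loss": 5.0, "Start-loss": 5.0,
               "Frameshift indel": 5.0, "Splicing": 3.0, "Nonsynonymous SNV": 2.0}

def score_variant(v):
    """Return (total score, breakdown dict) for one variant dict."""
    breakdown = {}
    if v.get("clinvar") in CLINVAR:
        breakdown["clinvar"] = CLINVAR[v["clinvar"]]
    if v.get("intervar") in INTERVAR:
        breakdown["intervar"] = INTERVAR[v["intervar"]]
    if v.get("consequence") in CONSEQUENCE:
        breakdown["consequence"] = CONSEQUENCE[v["consequence"]]

    af = v.get("af")
    if af is not None:
        # Most stringent tier first; only the first match is counted.
        for limit, weight in ((0.0001, 4.0), (0.001, 3.0), (0.01, 1.5)):
            if af < limit:
                breakdown["population_af"] = weight
                break

    revel = v.get("revel")
    if revel is not None:
        if revel >= 0.75:
            breakdown["revel"] = 3.0
        elif revel >= 0.50:
            breakdown["revel"] = 1.5

    cadd = v.get("cadd")
    if cadd is not None:
        if cadd >= 30:
            breakdown["cadd"] = 3.0
        elif cadd >= 20:
            breakdown["cadd"] = 2.0

    if v.get("sift_deleterious"):
        breakdown["sift"] = 1.0
    if v.get("polyphen_damaging"):
        breakdown["polyphen2"] = 1.0

    return sum(breakdown.values()), breakdown
```

Returning the breakdown alongside the total is what makes the score auditable: each contribution can be traced to one row of the table.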

Pathway Enrichment

Enrichment analysis is performed using gseapy against:

Database	Gene Sets
Gene Ontology	GO Biological Process 2023
Gene Ontology	GO Molecular Function 2023
Gene Ontology	GO Cellular Component 2023
KEGG	KEGG 2021 Human
Reactome	Reactome 2022

Each database produces:

Bubble plot — x = -log₁₀(adj. p-value), size = gene count, color = odds ratio
Bar chart — ranked terms colored by gene count
Full results table with export

Significance threshold: adjusted p-value ≤ 0.2 (Benjamini-Hochberg). Requires ≥ 5 candidate genes.
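
For readers unfamiliar with the Benjamini-Hochberg adjustment named above, the sketch below shows the standard procedure in plain Python and applies the 0.2 cutoff. It is illustrative only: enrichment tools typically report the adjusted value directly, and the function names here are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # no larger than the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_terms(terms, cutoff=0.2):
    """terms is a list of (name, raw p-value); keep those with adjusted p <= cutoff."""
    adjusted = benjamini_hochberg([p for _, p in terms])
    return [(name, q) for (name, _), q in zip(terms, adjusted) if q <= cutoff]
```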

Dashboard

The interactive Dash dashboard (http://127.0.0.1:8050) provides eight analysis pages:

Page	Content
Overview	KPI cards, filtering funnel, ClinVar/InterVar/chromosome distribution
Variant Explorer	Live-filtered table with score slider, ClinVar and InterVar dropdowns, histogram
Genes	Ranked bar chart (color-coded by ClinVar), Top N slider, full gene table
ClinVar	Classification distribution pie chart, filtered variant table
InterVar	ACMG classification bar chart, evidence criterion heatmap
Enrichment	Bubble + bar plots per database (GO CC, GO BP, GO MF, KEGG, Reactome)
3D Landscape	3D variant landscape (Score × CADD × REVEL) and 3D pathway landscape
Chromosome	Variant density by chromosome

All tables support column filtering, sorting, and Excel export.

Cohort and Family Analysis

Cohort

python variantflow_run.py cohort cohort_dir/ --output cohort_results/

Outputs:

cohort_shared_variants.tsv — variants present in ≥ 2 samples
cohort_unique_variants.tsv — sample-private variants
cohort_gene_burden.tsv — per-gene variant counts per sample
cohort_recurrent_genes.tsv — genes affected in ≥ 2 samples
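
The shared and sample-private definitions above reduce to counting in how many samples each variant occurs. A minimal sketch, assuming each variant is identified by a hypothetical `(chrom, pos, ref, alt)` tuple and each sample is a set of such keys:

```python
from collections import Counter

def split_shared_unique(samples):
    """samples maps sample ID to a set of variant keys.

    Returns (shared, unique): shared is the set of variants seen in at least
    two samples; unique maps each sample ID to its private variants.
    """
    counts = Counter(key for variants in samples.values() for key in set(variants))
    shared = {key for key, n in counts.items() if n >= 2}
    unique = {
        sample_id: {key for key in variants if counts[key] == 1}
        for sample_id, variants in samples.items()
    }
    return shared, unique
```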

Family (Trio/Quad)

python variantflow_run.py family family_dir/ \
  --proband PROBAND --father FATHER --mother MOTHER

Detects and outputs:

family_de_novo.tsv
family_autosomal_recessive.tsv
family_compound_heterozygous.tsv
family_x_linked.tsv
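
The inheritance patterns listed above can be expressed as simple genotype rules for a trio. The sketch below covers de novo and autosomal recessive (homozygous) cases only, encoding each genotype as an alternate-allele count (0, 1, or 2, with `None` for missing). This encoding and the function names are assumptions for illustration; a real implementation would also weigh read depth and genotype quality before calling a variant de novo.

```python
def is_de_novo(proband, father, mother):
    """Alt allele present in the proband and absent from both parents."""
    if None in (proband, father, mother):
        return False
    return proband >= 1 and father == 0 and mother == 0

def is_autosomal_recessive(proband, father, mother):
    """Proband homozygous for the alt allele, both parents heterozygous carriers."""
    if None in (proband, father, mother):
        return False
    return proband == 2 and father == 1 and mother == 1

def classify_trio(proband, father, mother):
    """Return the first matching pattern name, or None when no rule applies."""
    if is_de_novo(proband, father, mother):
        return "de_novo"
    if is_autosomal_recessive(proband, father, mother):
        return "autosomal_recessive"
    return None
```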

Reproducibility

Every analysis generates a project.json manifest containing:

{
  "run_id": "16e51f54",
  "variantflow_version": "1.0.0",
  "created_at": "2026-06-03T10:36:51",
  "python_version": "3.13.11",
  "genome_build": "hg38",
  "input_files": ["sample.hg38_multianno.txt"],
  "filters_applied": ["quality", "population_frequency", "functional_consequence",
                       "exonic_consequence", "acmg_benign_removal"],
  "total_input_variants": 86299,
  "total_output_variants": 917,
  "total_candidate_genes": 100,
  "config": { "..." }
}

A methods.txt file is also generated, ready to paste into a manuscript Methods section.
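
Because the manifest is plain JSON, it can be summarised with a few lines of standard-library code. The sketch below uses only keys shown in the example manifest above; the function name and the derived `retained_fraction` field are hypothetical.

```python
import json

def summarize_manifest(text):
    """Summarise a project.json manifest passed in as a JSON string."""
    manifest = json.loads(text)
    total_in = manifest["total_input_variants"]
    total_out = manifest["total_output_variants"]
    return {
        "run_id": manifest["run_id"],
        "genome_build": manifest["genome_build"],
        "n_filters": len(manifest["filters_applied"]),
        # Fraction of input variants that survived all filters.
        "retained_fraction": total_out / total_in if total_in else 0.0,
    }
```

To summarise a real run, read `results/project.json` and pass its contents to `summarize_manifest`.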

Citation

If you use VariantFlow in your research, please cite:

Tomar R. (2024). VariantFlow: A production-quality platform for genomic variant interpretation and prioritization. Dr Prabudh Goel Lab, AIIMS New Delhi. GitHub. https://github.com/imrobintomar/VariantFlow

Contributing

Contributions are welcome. Please open an issue before submitting a pull request. All contributors must follow the existing code style (black, ruff) and include unit tests.

# Run tests
pytest tests/unit/ -v --cov=variantflow

# Lint and format
ruff check variantflow/
black variantflow/

License

See LICENSE for full terms.


Project details

Version 1.0.0, released Jun 3, 2026

Download files

Source Distribution

variantflow-1.0.0.tar.gz (63.7 kB)

Upload date: Jun 3, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

Hashes for variantflow-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ee1e40f42d2120a0ecd4add90f871ba925b9ab8cb8940d6d4f46f3b2cde08b74`
MD5	`1ffa1747cb93dca0b8eb4328c271ac84`
BLAKE2b-256	`b1fdc7b816f6d02a15037e6bb6bb1d30a9fd2cc79377b6601392ff181a469547`

Built Distribution

variantflow-1.0.0-py3-none-any.whl (69.8 kB)

Upload date: Jun 3, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

Hashes for variantflow-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`67ebd8d1698fdae38649ac630739b47ab1eaa347f9cb068c373f8e443e0184f8`
MD5	`9132e95ce4e8385ab1cb463ad2b24b96`
BLAKE2b-256	`a5aeb62ab5506fc321710c673f5b9bc98d1eb0a40b02fcfa9d19c1ca57542e5d`
