Skip to main content

Headless Venn diagram analysis and rendering — companion to the Venn Diagram Lab web tool.

Project description

venn-diagram-lab - Python package

CI PyPI version Python versions License: MIT DOI (Zenodo concept)

Headless Python companion to the Venn Diagram Lab web tool. Build, render, and statistically analyse Venn / UpSet diagrams from CSV / TSV / GMT / GMX inputs — same 44 SVG models, same intersection/Jaccard/hypergeometric statistics, same PDF report layout — but in a notebook, a Snakemake rule, or a CI job, with no browser involved. You can integrate it in your programatic workflow.

Working in R? The same analysis + rendering pipeline is available as a CRAN package: vennDiagramLab (install.packages("vennDiagramLab")). Source + docs: in the project's r/ folder · pkgdown site: https://zoliqua.github.io/Venn-Diagram-Lab/r/. The three implementations (web tool, Python, R) share the same SVG model library and produce byte-equivalent TSV outputs — see tests/test_parity_with_webapp.py.

1. Install

pip install venn-diagram-lab

That's it — all bundled SVG templates, sample datasets, and the vdl CLI ship with the wheel.

System deps (cairosvg): the PDF/PNG render path uses cairosvg, which needs the cairo native library. On Linux the wheel works out of the box once you have libcairo2. On macOS run brew install cairo pango. On Windows install the GTK3 runtime (cairo bundled). CI is currently Linux-only for this reason — see CHANGELOG.md "Known limitations".

For development (clone + editable):

git clone https://github.com/ZoliQua/Venn-Diagram-Lab.git
cd Venn-Diagram-Lab
python -m venv .venv && source .venv/bin/activate
python python/scripts/sync_data.py        # populates _data/ from the React side
pip install -e "python/[dev]"

1.1 Quickstart (30 seconds)

from venn_diagram_lab import load_sample, analyze

result = analyze(load_sample("dataset_real_cancer_drivers_4"))
print(result.set_sizes)
# {'Vogelstein': 138, 'COSMIC_CGC': 581, 'OncoKB': 1231, 'IntOGen': 633}

# Render the Venn diagram
result.render_venn().save("cancer_drivers.svg")

# Or a full multi-page PDF report
result.to_pdf_report("cancer_drivers_report.pdf")

1.2 Complete User Guide

You can reach the complete USER GUIDE here in markdown format or downlad it as a PDF File.

2. Loading your own data

from venn_diagram_lab import load_csv, load_tsv, load_gmt, load_gmx, Dataset, analyze

# Binary 0/1 columns
ds = load_csv("genes.csv", binary=True)

# Aggregated (each column = a set, cells = item names)
ds = load_csv("pathways.csv", binary=False)

# GMT (one set per line)
ds = load_gmt("hallmark.gmt")

# In-memory dict
ds = Dataset.from_dict({
    "Set A": ["x", "y", "z"],
    "Set B": ["y", "z", "w"],
})

result = analyze(ds)

3. Visualisations

Method Output Best for
result.render_venn() SVG (vector) Publication; up to 9 sets via 44 bundled templates
result.render_venn(model='proportional') SVG Area-proportional; 2-3 sets only
result.render_upset() matplotlib Figure 5+ sets where Venn is hard to read
result.render_network() matplotlib Figure Pairwise relationships at a glance
result.to_pdf_report(path) Multi-page PDF One-shot publication-ready report

All visualisation methods accept the same kwargs as the underlying render.* functions — see their docstrings for full reference.

render_venn_svg (and result.render_venn(...)) also accepts show_items=True to display item identifiers inside each region, item_options={...} for layout tuning, and highlight=[...] for a spotlight mode that desaturates non-matching set shapes. The Region accessors and Boolean DSL section below covers these in depth.

3.1. Matplotlib backend

Three SVG renderers have parallel matplotlib variants for users who want to compose plots into their own subplot grids (paper figures, multi-panel dashboards). The SVG variants stay byte-identical to the web tool's output; the matplotlib variants return an MplImage(fig, legend) wrapper.

Function Mirrors Use when
render_venn_mpl(result, ax=None) render_venn_svg You want a Venn in a plt.subplots grid. Supports 2/3/4-set classic + proportional; 5+ raises IncompatibleModelError.
render_share_distribution_mpl(dataset, ax=None) render_share_distribution_svg You want axis labels / grid / log scale on the share-distribution histogram.
render_cluster_heatmap_mpl(result, ax=None, linkage=...) render_cluster_heatmap_svg You want a colorbar + dendrograms on the cluster heatmap.

CLI counterparts: vdl render venn --backend mpl --out v.png, vdl render share-dist --backend mpl --out s.png, vdl render heatmap --backend mpl --cluster --out h.png. The default backend stays svg.

SvgImage exposes both _repr_svg_() and _repr_mimebundle_() Jupyter hooks, so calling an SvgImage object directly in a notebook cell renders inline without an explicit display(SVG(...)) import.

4. Statistics and Charts

stats = result.statistics       # lazy compute
print(stats.jaccard)            # square pandas DataFrame
print(stats.hypergeometric)     # long-form: pair, intersection, expected, p_value, p_adjusted, ...

compute_pairwise produces 5 metric tables:

  • Jaccard,
  • Sørensen-Dice,
  • Overlap Coefficient,
  • Fold Enrichment, and the
  • Hypergeometric long-f

4.1. Enrichment bar + lollipop charts

The webtool's Statistics panel shows the same pairwise data three ways (bar, lollipop, heatmap). Python exposes all three:

from venn_diagram_lab import analyze, load_sample
from venn_diagram_lab.render.svg import (
    render_cluster_heatmap_svg,
    render_enrichment_bar_svg,
    render_enrichment_lollipop_svg,
)

result = analyze(load_sample("dataset_real_cancer_drivers_4"))

render_enrichment_bar_svg(result).save("bar.svg")
render_enrichment_lollipop_svg(result, metric="foldEnrichment").save("lollipop.svg")
render_cluster_heatmap_svg(result, linkage="average").save("heatmap.svg")

Both bar and lollipop accept metric="neglog10fdr" (default — bar height ∝ -log10(BH-FDR)) or metric="foldEnrichment" (bar height ∝ fold-enrichment over the universe). Significant pairs (FDR<0.05) fill with #2e7d32; non-significant with #888888. Markers ***/**/* mark FDR<0.001/<0.01/<0.05 respectively.

CLI:

vdl render bar --sample
vdl render lollipop --sample --metric foldEnrichment

4.2. Item Share Distribution + Cluster Heatmap

Two additional statistics surfaces complement the pairwise tables:

Function Purpose
item_share_distribution(matrix) Histogram of how many sets each item belongs to (1, 2, ..., n)
cluster_set_order(D, method="average") UPGMA / complete / single hierarchical linkage on a distance matrix; returns leaf order + dendrogram merges
render_share_distribution_svg(dataset) SVG bar chart of the item-share distribution
render_cluster_heatmap_svg(result, linkage="average") Pairwise-Jaccard heatmap with UPGMA-reordered axes and side dendrograms
from venn_diagram_lab import load_sample, analyze
from venn_diagram_lab.share_distribution import item_share_distribution
from venn_diagram_lab.render.svg import (
    render_share_distribution_svg,
    render_cluster_heatmap_svg,
)

ds = load_sample("dataset_real_cancer_drivers_4")
dist = item_share_distribution(ds.matrix)   # {1: 722, 2: 275, 3: 277, 4: 120}

img = render_share_distribution_svg(ds)
print(img.svg[:200])

result = analyze(ds)
heatmap = render_cluster_heatmap_svg(result, linkage="average")
print(heatmap.svg[:200])

Both renderers return the same SvgImage dataclass as render_venn_svg, so .save("plot.svg" | "plot.png" | "plot.pdf") works uniformly.

4.3. PDF and ZIP reports

to_pdf_report produces a multi-page PDF with the dataset overview, Venn + UpSet on a single page, paginated statistics tables, an Item Share Distribution page, an optional Cluster Heatmap page, and a network panel with the significant pairwise edges. A Credits and Cite footer closes the report.

result.to_pdf_report("report.pdf", cluster_heatmap=True)
# or via CLI:
vdl report pdf --sample --cluster-heatmap

vdl report zip bundles everything in one archive: the PDF, four SVGs (Venn, UpSet, Network, Share Distribution), three TSVs (regions, matrix, statistics), a three-sheet enrichment_statistics_{n}-sets.xlsx workbook (Jaccard, Sørensen-Dice, Intersection Enrichment), and a README.txt with provenance + the About This Report methodology section.

5. Region accessors and Boolean DSL

Four composable helpers for selecting items and regions inside an analysed diagram. The accessors and the DSL chain naturally with the renderers in the previous section.

from venn_diagram_lab import (
    analyze, load_sample,
    intersection_items, exclusive_items, union_items,
    parse_region_expression, render_venn_svg,
)

res = analyze(load_sample("dataset_real_cancer_drivers_4"))

# Items in EVERY named set (regardless of other memberships).
intersection_items(res, ["Vogelstein", "COSMIC_CGC", "OncoKB"])

# Items in EXACTLY this combination (and no other set).
exclusive_items(res, ["Vogelstein", "COSMIC_CGC"])

# Items in ANY of the named sets (deduplicated).
union_items(res, ["Vogelstein", "COSMIC_CGC"])

# Boolean DSL -> sorted list of region bitmasks.
masks = parse_region_expression("A & B + B & C", n_sets=4)
# -> [3, 6, 7, 11, 14, 15]

# Spotlight: render the Venn with only the matching regions highlighted.
render_venn_svg(res, highlight=masks, show_items=True,
                item_options={"max_items_per_region": 8,
                              "truncate_long_names": 10}).save("spotlight.svg")

DSL grammar: atoms A..I, & intersection, | or + union, ~ or ! complement, parentheses for grouping. Precedence (highest first): unary > intersection > union; binary operators are left-associative.

render_venn_svg's show_items=True replaces the per-region count text with the actual item identifiers as multi-line tspans inside the existing Count_* text nodes. item_options keys (defaults): max_items_per_region=20, ncol_items=1, truncate_long_names=12 (0 disables), line_height=10, font_size=8, show_counts_with_items=False, ellipsis="...". Regions exceeding max_items_per_region show a trailing "+N more" line.

highlight accepts a sequence of region labels (["AB", "ABC"]) or of region bitmasks (e.g. the output of parse_region_expression). Sets that do not contribute to any highlighted region are desaturated to #cccccc at 25% opacity.

CLI surface:

# Items in this exact combination.
vdl data items dataset_real_cancer_drivers_4 \
    --mode exclusive --sets A,B --out -

# DSL debug / validator.
vdl data regions --expr "A & B + B & C" --n-sets 4
# -> 3,6,7,11,14,15

# Spotlight render with DSL highlight.
vdl render venn --sample --show-items \
    --highlight-expr "A & B & ~C & ~D" --out /tmp/spot.svg

6. Export to TSV

The TSV file matches the web tool's byte-for-byte.

result.to_region_summary_tsv("regions.tsv")     # depth-sorted region table
result.to_matrix_tsv("matrix.tsv")              # one row per item with set membership
result.to_statistics_tsv("statistics.tsv")      # pairwise stats with FDR

These match the React web tool's three Export buttons exactly — including float formatting and spreadsheet-formula escaping. Byte-equivalence is enforced by the cross-package parity tests (pytest python/tests/test_parity_with_webapp.py) for all 5 bundled samples.

7. Command-line interface

The wheel installs a vdl console script with a Typer-based subapp layout (commands are listed alphabetically in vdl --help and inside each subapp). Every subcommand has an extended --help page with a How to try it example block, and every dataset-consuming command accepts a --sample flag that runs the demo on the bundled dataset_real_cancer_drivers_4 fixture without a positional argument.

The full tree is discoverable in one shot via vdl tree. Below is the v2.2.2 catalog grouped by subapp:

7.1. Top-level shortcuts

Command Purpose
vdl version Print the package version (single line, script-friendly).
vdl list-models Rich table of the 44 bundled SVG models.
vdl list-samples Rich table of bundled sample datasets.
vdl share-dist <input> [--out F] Item Share Distribution histogram (shortcut for vdl render share-dist).
vdl cluster <input> [--linkage M] [--out F] Cluster-rendered heatmap (shortcut for vdl render heatmap --cluster).
vdl tree Print every command in the CLI as a tree.
vdl about Short overview of Venn diagrams (abridged from the web tool's About dialog).
vdl credits Authors, citation, and links (web tool / Zenodo / PyPI / CRAN).
vdl analyze ... Deprecated (removed in v2.3) — Swiss-army analyzer; see migration hints.
vdl render-sample ... Deprecated (removed in v2.3) — bundled-sample shortcut for the old analyze.

7.2. vdl render — visual outputs

Each render command accepts INPUT (file path or bundled sample name) or --sample for demo mode. Format is inferred from the --out extension (.svg / .png / .pdf); default output is <stem>__<kind>.svg in CWD.

Command Purpose
vdl render bar <input> [--metric M] Pairwise-enrichment bar chart (-log10 FDR or foldEnrichment).
vdl render heatmap <input> [--cluster --linkage M] Pairwise Jaccard / FDR heatmap, optionally cluster-reordered.
vdl render lollipop <input> [--metric M] Pairwise-enrichment lollipop chart (same metric set as bar).
vdl render network <input> Force-directed set-relationship network.
vdl render share-dist <input> Item Share Distribution histogram (item-membership counts per k).
vdl render upset <input> UpSet plot for the dataset's intersection structure.
vdl render venn <input> Venn diagram (44 SVG models + area-proportional 2/3-set).
vdl render all <input> --output-dir D One-shot bundle: all five SVGs into one directory.

7.3. vdl export — TSV table writers

Stdout pipe via --out - (text format only). All commands route through the same RegionResult writer methods as the Python API.

Command Purpose
vdl export region-summary <input> Per-region exclusive + inclusive counts + items.
vdl export matrix <input> Binary item × set membership matrix.
vdl export statistics <input> Pairwise Jaccard / Dice / OC / FE / hypergeometric / BH-FDR.
vdl export pairwise <input> Alias of statistics (common bioinformatics synonym).

7.4. vdl report — multi-page bundles

Command Purpose
vdl report pdf <input> [--cluster-heatmap] --out R.pdf Multi-page PDF report (mirrors the web tool's Report PDF). --cluster-heatmap appends the cluster-ordered Jaccard heatmap page.
vdl report zip <input> --out R.zip Full bundle ZIP (4 SVGs + 3 TSVs + 1 XLSX + 1 PDF + README.txt).

7.5. vdl data — data operations

Command Purpose
vdl data convert <in> <out> Format conversion (TSV ⇄ CSV).
vdl data describe <input> Quick summary (set count, item count, top regions).
vdl data fit-model <input> Recommend a catalog-resident model name for the input's set count.
vdl data lookup <input> <item> Find which Venn region(s) contain a given item (script-friendly Find Item).
vdl data samples List bundled sample datasets.
vdl data validate <input> [--text] [--strict] Schema check; JSON by default, --text for colourised output. Exit 1 on errors (--strict promotes warnings).

7.6. vdl model — model catalog

Command Purpose
vdl model list Names of the 44 bundled models.
vdl model info <name> Set count, geometry family, display name for one model.
vdl model svg <name> [--out F] Write the raw bundled SVG template (no substitution).

7.7. vdl workflow — project helpers

Command Purpose
vdl workflow init <dir> Scaffold data/ + output/ + sample analysis.yaml.
vdl workflow bench <input> Per-stage timing (load / analyze / 5 renderers / total).
vdl workflow run-from <cfg.yaml> Execute every step in a YAML config (outputs: [{kind, out, …}]).

7.8. Examples

# Demo mode (no input file needed)
vdl render venn --sample
vdl render heatmap --sample --cluster --linkage average
vdl share-dist --sample
vdl report pdf --sample --out demo.pdf

# Real workflow
vdl data validate data/my_genes.tsv --text
vdl render all data/my_genes.tsv --output-dir output/
vdl export pairwise data/my_genes.tsv --out output/stats.tsv

# Config-driven batch run
vdl workflow init my_project/
edit my_project/analysis.yaml
vdl workflow run-from my_project/analysis.yaml

# Discovery
vdl tree                       # all 33 commands in one tree
vdl render --help              # subapp help (alphabetical)
vdl render venn --help         # extended help with "How to try it" examples

8. Notebook gallery

Twelve executable notebooks live under python/examples/:

# Notebook Topic
01 01_quickstart.ipynb First analysis in 10 cells
02 02_real_cancer_drivers.ipynb Biological walkthrough (cancer driver catalogs); v2.2.3 adds Item Share Distribution + Cluster Heatmap on the 4-catalog overlap
03 03_proportional_diagrams.ipynb Area-proportional 2/3-set demos
04 04_upset_vs_venn_vs_network.ipynb Choosing the right visualisation; v2.2.3 adds Cluster Heatmap as a 4th option + 4-row decision summary
05 05_statistics_deep_dive.ipynb Jaccard / Dice / Hypergeometric / BH-FDR + v2.2.3 bar/lollipop/cluster-heatmap side-by-side
06 06_pipeline_integration.ipynb Snakemake + Nextflow templates + vdl workflow run-from YAML config runner
07 07_pdf_reports.ipynb Multi-page PDF reports, with cluster_heatmap=True opt-in
08 08_custom_styling_and_export.ipynb lxml SVG post-processing + multi-format export
09 09_cli_workflows.ipynb New (v2.2.3) — Drive the vdl CLI from Python notebooks via subprocess; demo of 6 representative commands + decision table
10 10_enrichment_plots_comparison.ipynb New (v2.2.3) — Bar / Lollipop / Heatmap / Cluster Heatmap / Item Share Distribution side-by-side comparison + metric switching
11 11_data_validation_and_lookup.ipynb New (v2.2.3) — Data-hygiene workflow: vdl data validate (JSON+text+strict), batch vdl data lookup over a gene list, describe / fit-model / convert
12 12_region_accessors_and_dsl.ipynb Region accessors + Boolean DSL + spotlight render; composability chain (DSL → mask → highlight) + a CLI sub-tutorial via subprocess.run (vdl data items / vdl data regions / vdl render venn --highlight-expr)

Each notebook is built from a python/scripts/notebooks/_build_NN_*.py script and executed nightly on CI to prevent bit-rot.

9. Bundled sample datasets

Name Sets Items Source
dataset_real_cancer_drivers_4 4 1394 Vogelstein / COSMIC CGC / OncoKB / IntOGen catalogs
dataset_real_msigdb_cancer_pathways 5 777 MSigDB Hallmark cancer pathways
dataset_real_msigdb_immune_pathways 4 521 MSigDB Hallmark immune pathways
dataset_mock_gene_sets 6 97 Synthetic for demos (trimmed in v2.1.1)
dataset_mock_streaming_platforms 8 800 TV/movie titles across 8 streaming services
from venn_diagram_lab import list_samples, load_sample
list_samples()
ds = load_sample("dataset_real_cancer_drivers_4")

10. Contributing

The repo monorepos the React web tool and this Python package. After cloning:

cd Venn-Diagram-Lab
python -m venv .venv && source .venv/bin/activate
python python/scripts/sync_data.py
pip install -e "python/[dev]"
pytest python/tests/ -q

Run the slow notebook suite (~3 min):

pytest python/tests/test_notebooks.py -m slow

Regenerate the parity-test fixtures (requires Node 20+):

npm install
npm run fixtures:parity

Conventional commit prefixes used: feat(python):, fix(python):, chore(python):, docs(python):, test(python):.

11. Versioning

Strict SemVer. Pre-1.0 minor bumps may include behavior changes; see CHANGELOG.md.

12. License, Citation and Credits

MIT — see LICENSE.

12.1. Citation

If you use this package in research, please cite the software using the Zenodo concept (all-versions) DOI — it always resolves to the latest archived release, so there is nothing to update per version:

Dul Z., Ölbei M., Thomas N. S. B., Si Ammour A., Csikász-Nagy A. (2026).
Venn Diagram Lab — interactive Venn / UpSet diagrams.
https://www.venndiagramlab.org/
DOI: 10.5281/zenodo.19510813 (concept, all versions)

DOI

The R companion package also has a CRAN-minted DOI: 10.32614/CRAN.package.vennDiagramLab.

See CITATION.cff for machine-readable citation metadata.

12.2. See also

Language Platform Package / URL Status
Typescript Web https://www.venndiagramlab.org/ live
Python PyPi venn-diagram-lab on PyPI live
R CRAN vennDiagramLab on CRAN live
R Bioconductor vennDiagramLab on Bioconductor submission pending

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

venn_diagram_lab-2.4.0.tar.gz (4.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

venn_diagram_lab-2.4.0-py3-none-any.whl (4.3 MB view details)

Uploaded Python 3

File details

Details for the file venn_diagram_lab-2.4.0.tar.gz.

File metadata

  • Download URL: venn_diagram_lab-2.4.0.tar.gz
  • Upload date:
  • Size: 4.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for venn_diagram_lab-2.4.0.tar.gz
Algorithm Hash digest
SHA256 21b5dc1f49e21c177266563118a9ba39ee787bfbc730bb628e46266f5ae629be
MD5 10b63cffb92d7073b3d69cf204ac234f
BLAKE2b-256 07b6907191c40c63ec93e269f9a4b144356f69f3623980ce2f574afa85a94110

See more details on using hashes here.

Provenance

The following attestation bundles were made for venn_diagram_lab-2.4.0.tar.gz:

Publisher: python-publish.yml on ZoliQua/Venn-Diagram-Lab

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file venn_diagram_lab-2.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for venn_diagram_lab-2.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bb2d654fc23fd4bf553a6082fc69e62e5575920c0cb279437d3042057bbc4217
MD5 e7869f81c57a404130ed2ab8d9233fe9
BLAKE2b-256 c40ee823d5fec60abdfbbb5ea8cdf5c0d47bc830447d6d384a2c10b4ad247220

See more details on using hashes here.

Provenance

The following attestation bundles were made for venn_diagram_lab-2.4.0-py3-none-any.whl:

Publisher: python-publish.yml on ZoliQua/Venn-Diagram-Lab

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page