
Publication-ready regional association plots with LD coloring, gene tracks, and recombination overlays


pyLocusZoom


Inspired by LocusZoom and locuszoomr.

Features

  1. Regional association plot:

    • Multi-species support: Built-in reference data for Canis lupus familiaris (CanFam3.1/CanFam4) and Felis catus (FelCat9), or optionally provide your own for any species
    • LD coloring: SNPs colored by linkage disequilibrium (r²) with the lead variant
    • Gene tracks: Annotated gene/exon positions below the association plot
    • Recombination rate: Overlay showing the recombination rate across the region (built-in maps for Canis lupus familiaris only; provide your own for other species)
    • SNP labels (matplotlib): Automatic labeling of the top SNPs with their rsIDs
    • Tooltips (Bokeh and Plotly): Mouseover for detailed SNP data

[Figure: example regional association plot]

  2. Stacked plots: Compare multiple GWAS/phenotypes vertically
  3. eQTL plot: Expression QTL data aligned with association plots and gene tracks
  4. Fine-mapping plots: Visualize SuSiE credible sets with posterior inclusion probabilities
  5. Multiple charting libraries: matplotlib (static), plotly (interactive), bokeh (dashboards)
  6. Pandas and PySpark support: Works with both Pandas and PySpark DataFrames for large-scale genomics data
  7. Convenience data file loaders: Load and validate common GWAS, eQTL and fine-mapping file formats
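LD coloring works by sorting each SNP into a color bin according to its r² with the lead variant. A minimal sketch of that binning, assuming the classic LocusZoom cut points at 0.2, 0.4, 0.6 and 0.8; the hex colors and the `ld_bin` helper are illustrative assumptions, not pyLocusZoom's internal palette or API:

```python
import math
from bisect import bisect_right

# Assumed LocusZoom-style bin edges and colors (illustrative only).
LD_EDGES = [0.2, 0.4, 0.6, 0.8]
LD_COLORS = ["#357EBD", "#46B8DA", "#5CB85C", "#EEA236", "#D43F3A"]
MISSING_COLOR = "#BEBEBE"  # SNPs with no LD estimate against the lead variant


def ld_bin(r2):
    """Return the color for an r² value, or grey when LD is missing."""
    if r2 is None or (isinstance(r2, float) and math.isnan(r2)):
        return MISSING_COLOR
    if not 0.0 <= r2 <= 1.0:
        raise ValueError(f"r² must lie in [0, 1], got {r2}")
    # bisect_right sends a value equal to an edge into the higher bin,
    # so r² = 0.8 gets the same color as r² = 0.95.
    return LD_COLORS[bisect_right(LD_EDGES, r2)]


for value in (0.05, 0.35, 0.8, None):
    print(value, ld_bin(value))
```

Using `bisect_right` keeps the edge rule explicit: a value sitting exactly on a cut point goes to the higher bin.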

Installation

pip install pylocuszoom

Or with uv:

uv add pylocuszoom

Or with conda (Bioconda):

conda install -c bioconda pylocuszoom

Quick Start

from pylocuszoom import LocusZoomPlotter

# Initialize plotter (loads reference data for canine)
plotter = LocusZoomPlotter(species="canine")

# Create regional plot
fig = plotter.plot(
    gwas_df,                    # DataFrame with ps, p_wald, rs columns
    chrom=1,
    start=1000000,
    end=2000000,
    lead_pos=1500000,           # Highlight lead SNP
)

fig.savefig("regional_plot.png", dpi=150)
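`lead_pos` is the variant that the LD coloring is computed against. If you don't know it in advance, a common choice is the SNP with the smallest p-value inside the plotted window. A plain-Python sketch, assuming rows shaped like the GWAS DataFrame (for example from `gwas_df.to_dict("records")`); `find_lead_pos` is a hypothetical helper, not part of pyLocusZoom:

```python
def find_lead_pos(rows, start, end, pos_key="ps", p_key="p_wald"):
    """Return the position of the smallest p-value within [start, end].

    rows: iterable of dicts with position and p-value keys.
    Raises ValueError if no SNP falls inside the window.
    """
    in_window = [r for r in rows if start <= r[pos_key] <= end]
    if not in_window:
        raise ValueError(f"no SNPs between {start} and {end}")
    lead = min(in_window, key=lambda r: r[p_key])
    return lead[pos_key]


rows = [
    {"ps": 1_200_000, "p_wald": 1e-4},
    {"ps": 1_500_000, "p_wald": 3e-9},
    {"ps": 2_500_000, "p_wald": 1e-12},  # outside the window, so ignored
]
print(find_lead_pos(rows, 1_000_000, 2_000_000))  # 1500000
```

With pandas the equivalent is roughly `window = gwas_df[gwas_df["ps"].between(start, end)]` followed by `window.loc[window["p_wald"].idxmin(), "ps"]`.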

Full Example

from pylocuszoom import LocusZoomPlotter

plotter = LocusZoomPlotter(
    species="canine",                   # or "feline", or None for custom
    plink_path="/path/to/plink",        # Optional, auto-detects if on PATH
)

fig = plotter.plot(
    gwas_df,
    chrom=1,
    start=1000000,
    end=2000000,
    lead_pos=1500000,
    ld_reference_file="genotypes.bed",  # For LD calculation
    genes_df=genes_df,                  # Gene annotations
    exons_df=exons_df,                  # Exon annotations
    show_recombination=True,            # Overlay recombination rate
    snp_labels=True,                    # Label top SNPs
    label_top_n=5,                      # How many to label
    pos_col="ps",                       # Column name for position
    p_col="p_wald",                     # Column name for p-value
    rs_col="rs",                        # Column name for SNP ID
    figsize=(12, 8),
)
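With `snp_labels=True`, `label_top_n` controls how many of the most significant SNPs get labels. Selecting them only needs a partial sort on p-value; the sketch below uses `heapq.nsmallest` and also returns the -log10(p) height each label would sit at. It illustrates the selection step only and is not pyLocusZoom's labeling code; `top_snps` is a hypothetical helper:

```python
import heapq
import math


def top_snps(rows, n=5, p_key="p_wald", id_key="rs"):
    """Return (SNP ID, -log10 p) for the n SNPs with the smallest p-values."""
    best = heapq.nsmallest(n, rows, key=lambda r: r[p_key])
    return [(r[id_key], -math.log10(r[p_key])) for r in best]


rows = [
    {"rs": "rs123", "p_wald": 1e-8},
    {"rs": "rs456", "p_wald": 1e-6},
    {"rs": "rs789", "p_wald": 0.05},
]
print(top_snps(rows, n=2))
```

`heapq.nsmallest` avoids sorting the whole region when only a handful of labels are needed.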

Genome Builds

The default genome build for canine is CanFam3.1. For CanFam4 data:

plotter = LocusZoomPlotter(species="canine", genome_build="canfam4")

Recombination maps are automatically lifted over from CanFam3.1 to CanFam4 coordinates using the UCSC liftOver chain file.
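Lifting a map over drops positions that have no counterpart in the new assembly. The sketch below shows that step with the converter passed in as a function. It assumes the return shape of pyliftover's `LiftOver.convert_coordinate` (a list of `(chrom, pos, strand, score)` tuples, empty or `None` when the position cannot be mapped) and UCSC-style `chr1` names; `lift_recomb_map` and `fake_convert` are hypothetical. Note that pyliftover works in 0-based coordinates, so 1-based positions may need a one-base adjustment.

```python
def lift_recomb_map(rows, convert, chrom):
    """Lift (pos, rate) rows onto a new assembly.

    convert(chrom, pos) -> list of (chrom, pos, strand, score) tuples,
    or None/[] when unmapped (pyliftover-style). Rows that are unmapped,
    land on another chromosome, or map ambiguously are dropped, and the
    result is re-sorted by the new position.
    """
    name = f"chr{chrom}"
    lifted = []
    for pos, rate in rows:
        hits = convert(name, pos) or []
        same_chrom = [h for h in hits if h[0] == name]
        if len(same_chrom) != 1:
            continue  # unmapped or ambiguous
        lifted.append((same_chrom[0][1], rate))
    return sorted(lifted)


def fake_convert(chrom, pos):
    """Stand-in converter: shift by +5 kb; positions >= 30 kb are unmappable."""
    if pos >= 30_000:
        return []
    return [(chrom, pos + 5_000, "+", 1.0)]


rows = [(10_000, 0.5), (20_000, 1.2), (30_000, 0.8)]
print(lift_recomb_map(rows, fake_convert, chrom=1))  # [(15000, 0.5), (25000, 1.2)]
```

Sorting after conversion matters because inversions between assemblies can reverse the order of lifted positions.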

Using with Other Species

# Feline (LD and gene tracks, user provides recombination data)
plotter = LocusZoomPlotter(species="feline")

# Custom species (provide all reference data)
plotter = LocusZoomPlotter(
    species=None,
    recomb_data_dir="/path/to/recomb_maps/",
)

# Or provide data per-plot
fig = plotter.plot(
    gwas_df,
    chrom=1, start=1000000, end=2000000,
    recomb_df=my_recomb_dataframe,
    genes_df=my_genes_df,
)
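If your recombination map is one genome-wide table, it can be split into the per-chromosome files that `recomb_data_dir` expects (named `chr{N}_recomb.tsv`, tab-separated with a header row, as described under Recombination Map Files). A minimal standard-library sketch; `write_recomb_dir` is a hypothetical helper, and it writes only the `chr`, `pos` and `rate` columns, leaving out the optional `cM` column:

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_recomb_dir(rows, out_dir):
    """Write (chr, pos, rate) rows as chr{N}_recomb.tsv files.

    Chromosome names are stored without a "chr" prefix inside the file,
    matching the documented layout. Returns the paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    by_chrom = defaultdict(list)
    for chrom, pos, rate in rows:
        name = str(chrom)
        if name.lower().startswith("chr"):
            name = name[3:]
        by_chrom[name].append((int(pos), float(rate)))

    paths = []
    for name, records in sorted(by_chrom.items()):
        path = out / f"chr{name}_recomb.tsv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(["chr", "pos", "rate"])
            for pos, rate in sorted(records):  # sorted by position
                writer.writerow([name, pos, rate])
        paths.append(path)
    return paths


# Demo with mixed "chr"/bare chromosome spellings, written to a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    written = write_recomb_dir(
        [("chr1", 20000, 1.2), (1, 10000, 0.5), ("X", 5000, 1.1)], tmp
    )
    names = sorted(p.name for p in written)
    chr1_head = (Path(tmp) / "chr1_recomb.tsv").read_text().splitlines()[:2]
print(names)
print(chr1_head)
```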

Backends

pyLocusZoom supports multiple rendering backends:

# Static publication-quality plot (default)
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000, backend="matplotlib")
fig.savefig("plot.png", dpi=150)

# Interactive Plotly (hover tooltips, pan/zoom)
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000, backend="plotly")
fig.write_html("plot.html")

# Interactive Bokeh (dashboard-ready)
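from bokeh.io import save
from bokeh.resources import CDN

fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000, backend="bokeh")
save(fig, filename="plot.html", resources=CDN, title="Regional plot")  # standalone HTML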

| Backend | Output | Best for | Features |
|---|---|---|---|
| matplotlib | Static PNG/PDF/SVG | Publications, presentations | Full feature set with SNP labels |
| plotly | Interactive HTML | Web reports, data exploration | Hover tooltips, pan/zoom |
| bokeh | Interactive HTML | Dashboards, web apps | Hover tooltips, pan/zoom |

Note: All backends support scatter plots, gene tracks, recombination overlay, and LD legend. SNP labels (auto-positioned with adjustText) are matplotlib-only; interactive backends use hover tooltips instead.
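Because each backend returns a different figure type, saving differs too: `savefig` for matplotlib, `write_html` or `write_image` for Plotly (static images need kaleido), and `bokeh.io.save` for Bokeh. A small dispatch helper, sketched under the assumption that the returned objects are standard matplotlib, Plotly and Bokeh figures; `save_plot` is hypothetical, not part of pyLocusZoom:

```python
from pathlib import Path


def save_plot(fig, path, dpi=150):
    """Save a figure from any backend, choosing the writer by duck typing.

    Returns the name of the backend that was assumed.
    """
    path = str(path)
    if hasattr(fig, "savefig"):  # matplotlib Figure
        fig.savefig(path, dpi=dpi)
        return "matplotlib"
    if hasattr(fig, "write_html"):  # plotly Figure
        if path.endswith(".html"):
            fig.write_html(path)
        else:
            fig.write_image(path)  # static export requires kaleido
        return "plotly"
    # Anything else is assumed to be a Bokeh model or layout. Import lazily
    # so Bokeh is only needed when a Bokeh figure is actually saved.
    from bokeh.io import save
    from bokeh.resources import CDN

    save(fig, filename=path, resources=CDN, title=Path(path).stem)
    return "bokeh"


# Demo with a stand-in object, so no plotting library is needed to try it.
class _FakePlotly:
    def __init__(self):
        self.calls = []

    def write_html(self, path):
        self.calls.append(("html", path))

    def write_image(self, path):
        self.calls.append(("image", path))


fake = _FakePlotly()
backends = (save_plot(fake, "plot.html"), save_plot(fake, "plot.png"))
print(backends, fake.calls)
```

Duck typing keeps the helper free of hard imports: matplotlib and Plotly figures are recognized by their methods, and Bokeh is imported only on the fallback path.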

Stacked Plots

Compare multiple GWAS results vertically with a shared x-axis:

fig = plotter.plot_stacked(
    [gwas_height, gwas_bmi, gwas_whr],
    chrom=1,
    start=1000000,
    end=2000000,
    panel_labels=["Height", "BMI", "WHR"],
    genes_df=genes_df,
)

[Figure: example stacked plot]

eQTL Overlay

Add expression QTL data as a separate panel:

import pandas as pd

eqtl_df = pd.DataFrame({
    "pos": [1000500, 1001200, 1002000],
    "p_value": [1e-6, 1e-4, 0.01],
    "gene": ["BRCA1", "BRCA1", "BRCA1"],
})

fig = plotter.plot_stacked(
    [gwas_df],
    chrom=1, start=1000000, end=2000000,
    eqtl_df=eqtl_df,
    eqtl_gene="BRCA1",
    genes_df=genes_df,
)

[Figure: example eQTL overlay plot]

Fine-mapping Visualization

Visualize SuSiE or other fine-mapping results with credible set coloring:

import pandas as pd

finemapping_df = pd.DataFrame({
    "pos": [1000500, 1001200, 1002000, 1003500],
    "pip": [0.85, 0.12, 0.02, 0.45],  # Posterior inclusion probability
    "cs": [1, 1, 0, 2],               # Credible set assignment (0 = not in CS)
})

fig = plotter.plot_stacked(
    [gwas_df],
    chrom=1, start=1000000, end=2000000,
    finemapping_df=finemapping_df,
    finemapping_cs_col="cs",
    genes_df=genes_df,
)
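A credible set is the smallest group of variants whose posterior probabilities add up to at least a chosen coverage, typically 95%. The sketch below shows the greedy construction for a single signal from per-variant probabilities. It is a simplification, not how SuSiE builds its sets (SuSiE forms one set per single effect from that effect's own posterior weights); `credible_set` is a hypothetical helper and the probabilities are made-up illustrative values:

```python
def credible_set(probs, coverage=0.95):
    """Greedy credible set for a single causal signal.

    probs: dict mapping position -> posterior probability that this
    variant is the causal one for the signal.
    Returns positions, highest probability first, until the cumulative
    probability reaches `coverage` (or all positions if it never does).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    chosen, total = [], 0.0
    for pos, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(pos)
        total += p
        if total >= coverage:
            break
    return chosen


# Illustrative probabilities for one signal.
probs = {1000500: 0.80, 1001200: 0.12, 1002000: 0.05, 1003500: 0.03}
print(credible_set(probs))                # 95% set
print(credible_set(probs, coverage=0.9))  # 90% set
```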

[Figure: example fine-mapping plot]

PySpark Support

For large-scale genomics data, pass PySpark DataFrames directly:

from pylocuszoom import LocusZoomPlotter, to_pandas

# PySpark DataFrame (automatically converted)
fig = plotter.plot(spark_gwas_df, chrom=1, start=1000000, end=2000000)

# Or convert manually with sampling for very large data
pandas_df = to_pandas(spark_gwas_df, sample_size=100000)

Install PySpark support: uv add "pylocuszoom[spark]" (quote the extra so shells such as zsh do not expand the brackets)

Loading Data from Files

pyLocusZoom includes loaders for common GWAS, eQTL, and fine-mapping file formats:

from pylocuszoom import (
    # GWAS loaders
    load_gwas,           # Auto-detect format
    load_plink_assoc,    # PLINK .assoc, .assoc.linear, .qassoc
    load_regenie,        # REGENIE .regenie
    load_bolt_lmm,       # BOLT-LMM .stats
    load_gemma,          # GEMMA .assoc.txt
    load_saige,          # SAIGE output
    # eQTL loaders
    load_gtex_eqtl,      # GTEx significant pairs
    load_eqtl_catalogue, # eQTL Catalogue format
    # Fine-mapping loaders
    load_susie,          # SuSiE output
    load_finemap,        # FINEMAP .snp output
    # Gene annotations
    load_gtf,            # GTF/GFF3 files
    load_bed,            # BED files
)

# Auto-detect GWAS format from filename
gwas_df = load_gwas("results.assoc.linear")

# Or use specific loader
gwas_df = load_regenie("ukb_results.regenie")

# Load gene annotations
genes_df = load_gtf("genes.gtf", feature_type="gene")
exons_df = load_gtf("genes.gtf", feature_type="exon")

# Load eQTL data
eqtl_df = load_gtex_eqtl("GTEx.signif_pairs.txt.gz", gene="BRCA1")

# Load fine-mapping results
fm_df = load_susie("susie_output.tsv")
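`load_gwas` picks a loader from the filename. The sketch below shows one way suffix-based detection can work. The suffix table mirrors the formats listed above, but the matching rules (longest suffix first, a trailing `.gz` ignored) are assumptions, not pyLocusZoom's actual logic, and `detect_gwas_format` is hypothetical. SAIGE is left out because no file suffix is listed for it.

```python
SUFFIXES = {
    ".assoc.linear": "plink",
    ".assoc.txt": "gemma",
    ".qassoc": "plink",
    ".assoc": "plink",
    ".regenie": "regenie",
    ".stats": "bolt_lmm",
}


def detect_gwas_format(filename):
    """Guess a GWAS tool from the file suffix (case-insensitive)."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    # Check longer suffixes first so a more specific suffix always wins
    # over a shorter, more generic one.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return SUFFIXES[suffix]
    raise ValueError(f"cannot infer GWAS format from {filename!r}")


for f in ("results.assoc.linear", "ukb_results.regenie", "out.assoc.txt"):
    print(f, detect_gwas_format(f))
```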

Data Formats

GWAS Results DataFrame

Expected columns (names configurable via pos_col, p_col, rs_col):

| Column | Type | Required | Description |
|---|---|---|---|
| ps | int | Yes | Genomic position in base pairs (1-based). Must match the coordinate system of the genes and recombination data. |
| p_wald | float | Yes | Association p-value (0 < p ≤ 1). Values are -log10 transformed for plotting. |
| rs | str | No | SNP identifier (e.g., "rs12345" or "chr1:12345"). Used for labeling top SNPs if snp_labels=True. |

Example:

gwas_df = pd.DataFrame({
    "ps": [1000000, 1000500, 1001000],
    "p_wald": [1e-8, 1e-6, 0.05],
    "rs": ["rs123", "rs456", "rs789"],
})
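The table's constraints are easy to check before plotting, and the -log10 transform explains why p = 0 is not allowed (it would map to infinity). A plain-Python validation sketch; `neg_log10_p` is a hypothetical helper, and pyLocusZoom's own validation may differ:

```python
import math


def neg_log10_p(rows, pos_key="ps", p_key="p_wald"):
    """Validate GWAS rows and return (position, -log10 p) pairs.

    Enforces the documented constraints: positive integer positions
    and 0 < p <= 1. NaN p-values fail the range check as well.
    """
    out = []
    for i, row in enumerate(rows):
        pos, p = row[pos_key], row[p_key]
        if not isinstance(pos, int) or pos < 1:
            raise ValueError(f"row {i}: position must be a positive int, got {pos!r}")
        if not (0 < p <= 1):
            raise ValueError(f"row {i}: p-value must satisfy 0 < p <= 1, got {p!r}")
        out.append((pos, -math.log10(p)))
    return out


rows = [
    {"ps": 1000000, "p_wald": 1e-8, "rs": "rs123"},
    {"ps": 1000500, "p_wald": 1e-6, "rs": "rs456"},
    {"ps": 1001000, "p_wald": 0.05, "rs": "rs789"},
]
print(neg_log10_p(rows))
```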

Genes DataFrame

| Column | Type | Required | Description |
|---|---|---|---|
| chr | str or int | Yes | Chromosome identifier. Accepts "1", "chr1", or 1. The "chr" prefix is stripped for matching. |
| start | int | Yes | Gene start position (bp, 1-based). Transcript start for strand-aware genes. |
| end | int | Yes | Gene end position (bp, 1-based). Must be ≥ start. |
| gene_name | str | Yes | Gene symbol displayed in track (e.g., "BRCA1", "TP53"). Keep short for readability. |

Example:

genes_df = pd.DataFrame({
    "chr": ["1", "1", "1"],
    "start": [1000000, 1050000, 1100000],
    "end": [1020000, 1080000, 1150000],
    "gene_name": ["GENE1", "GENE2", "GENE3"],
})
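The `chr` column accepts several spellings, and genes only partly inside the window still need to be drawn. The sketch below shows the normalization described in the table (strip a "chr" prefix) plus an interval-overlap filter for the plotted region; both helpers are hypothetical illustrations, not pyLocusZoom functions:

```python
def normalize_chrom(value):
    """Map 1, "1" or "chr1" to "1" (case-insensitive prefix strip)."""
    name = str(value).strip()
    return name[3:] if name.lower().startswith("chr") else name


def genes_in_region(genes, chrom, start, end):
    """Return names of genes overlapping [start, end] on `chrom`.

    A gene overlaps when it starts before the window ends and ends
    after the window starts, so partially visible genes are kept.
    """
    target = normalize_chrom(chrom)
    return [
        g["gene_name"]
        for g in genes
        if normalize_chrom(g["chr"]) == target
        and g["start"] <= end
        and g["end"] >= start
    ]


genes = [
    {"chr": "1", "start": 1000000, "end": 1020000, "gene_name": "GENE1"},
    {"chr": "1", "start": 1050000, "end": 1080000, "gene_name": "GENE2"},
    {"chr": "1", "start": 1100000, "end": 1150000, "gene_name": "GENE3"},
]
print(genes_in_region(genes, "chr1", 1_010_000, 1_060_000))  # ['GENE1', 'GENE2']
```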

Exons DataFrame (optional)

Provides exon/intron structure. If omitted, genes are drawn as simple rectangles.

| Column | Type | Required | Description |
|---|---|---|---|
| chr | str or int | Yes | Chromosome identifier. |
| start | int | Yes | Exon start position (bp). |
| end | int | Yes | Exon end position (bp). |
| gene_name | str | Yes | Parent gene symbol. Must match gene_name in genes DataFrame. |

Recombination DataFrame

| Column | Type | Required | Description |
|---|---|---|---|
| pos | int | Yes | Genomic position (bp). Should span the plotted region with reasonable density (roughly one point every 10 kb). |
| rate | float | Yes | Recombination rate in centiMorgans per megabase (cM/Mb). Typical range: 0-50 cM/Mb. |

Example:

recomb_df = pd.DataFrame({
    "pos": [1000000, 1010000, 1020000],
    "rate": [0.5, 2.3, 1.1],
})

Recombination Map Files

When using recomb_data_dir, files must be named chr{N}_recomb.tsv (e.g., chr1_recomb.tsv, chrX_recomb.tsv).

Format: Tab-separated with header row:

| Column | Description |
|---|---|
| chr | Chromosome number (without "chr" prefix) |
| pos | Position in base pairs |
| rate | Recombination rate (cM/Mb) |
| cM | Cumulative genetic distance (optional, not used for plotting) |

Example:

chr	pos	rate	cM
1	10000	0.5	0.005
1	20000	1.2	0.017
1	30000	0.8	0.025
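The optional `cM` column is a running sum of rate × interval length. The example values are consistent with each rate applying to the interval that ends at its position, measured from the previous row (from 0 for the first row): cM_i = cM_(i-1) + rate_i × (pos_i − pos_(i-1)) / 1,000,000, with rate in cM/Mb and positions in bp. That convention is inferred from the example, not documented; the sketch below recomputes the column to show it:

```python
import csv
import io

EXAMPLE = (
    "chr\tpos\trate\tcM\n"
    "1\t10000\t0.5\t0.005\n"
    "1\t20000\t1.2\t0.017\n"
    "1\t30000\t0.8\t0.025\n"
)


def cumulative_cm(rows):
    """Recompute cumulative genetic distance (cM) from rate (cM/Mb)."""
    total, prev_pos, out = 0.0, 0, []
    for row in rows:
        pos, rate = int(row["pos"]), float(row["rate"])
        total += rate * (pos - prev_pos) / 1e6  # cM/Mb × Mb = cM
        out.append(round(total, 6))
        prev_pos = pos
    return out


rows = list(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))
recomputed = cumulative_cm(rows)
listed = [float(r["cM"]) for r in rows]
print(recomputed, listed)
```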

Reference Data

Canine recombination maps are downloaded from Campbell et al. 2016 on first use.

To download the maps manually:

from pylocuszoom import download_canine_recombination_maps

download_canine_recombination_maps()

Logging

Logging uses loguru and is configured via the log_level parameter (default: "INFO"):

# Suppress logging
plotter = LocusZoomPlotter(log_level=None)

# Enable DEBUG level for troubleshooting
plotter = LocusZoomPlotter(log_level="DEBUG")

Requirements

  • Python >= 3.10
  • matplotlib >= 3.5.0
  • pandas >= 1.4.0
  • numpy >= 1.21.0
  • loguru >= 0.7.0
  • plotly >= 5.0.0
  • bokeh >= 3.8.2
  • kaleido >= 0.2.0 (for plotly static export)
  • pyliftover >= 0.4 (for CanFam4 coordinate liftover)
  • PLINK 1.9 (for LD calculations) - must be on PATH or specify plink_path

Optional:

  • pyspark >= 3.0.0 (for PySpark DataFrame support) - uv add "pylocuszoom[spark]"


License

GPL-3.0-or-later


Download files


Source Distribution

pylocuszoom-0.5.0.tar.gz (4.1 MB)

Uploaded Source

Built Distribution


pylocuszoom-0.5.0-py3-none-any.whl (64.9 kB)

Uploaded Python 3

File details

Details for the file pylocuszoom-0.5.0.tar.gz.

File metadata

  • Download URL: pylocuszoom-0.5.0.tar.gz
  • Upload date:
  • Size: 4.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pylocuszoom-0.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5aa46c51631e2c736867b85144f390da94f813146eeef5f943038a772820022e |
| MD5 | 7ee874e3b9331a2e7cfa9d13f2ba2637 |
| BLAKE2b-256 | 87200e40635ad9efe650dc0fc2eb948e954edee8a0a46682b1f5e7cdd351c39c |
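To check a downloaded file against the published SHA256 digest, hash it in chunks with the standard library (`pip hash <file>` or `sha256sum` do the same from the command line). `sha256_of` and `verify` are illustrative helpers; the expected value is the sdist digest listed above:

```python
import hashlib
import tempfile
from pathlib import Path

EXPECTED_SDIST_SHA256 = "5aa46c51631e2c736867b85144f390da94f813146eeef5f943038a772820022e"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """Raise ValueError if the file's digest differs from the published one."""
    actual = sha256_of(path)
    if actual != expected.lower():
        raise ValueError(f"SHA256 mismatch for {path}: {actual}")
    return True


# Demo on a small temporary file; use verify("pylocuszoom-0.5.0.tar.gz")
# on the real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
abc_digest = sha256_of(tmp.name)
Path(tmp.name).unlink()
print(abc_digest)
```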


Provenance

The following attestation bundles were made for pylocuszoom-0.5.0.tar.gz:

Publisher: publish.yml on michael-denyer/pyLocusZoom


File details

Details for the file pylocuszoom-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pylocuszoom-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 64.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pylocuszoom-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90f224066c22bcf53f6e538efa23ed5c99cb7293d2deaf4af3de11db6956c30d |
| MD5 | 80f02b252fb80cb5fae9f217847595b5 |
| BLAKE2b-256 | 48ff6baa0c9a2f32a2da70ed68fb462107940cb0d45ce4e85d0cedbfab1111fa |


Provenance

The following attestation bundles were made for pylocuszoom-0.5.0-py3-none-any.whl:

Publisher: publish.yml on michael-denyer/pyLocusZoom

