pyLocusZoom

Regional association plots for GWAS results with LD coloring, gene tracks, and recombination rate overlays.

Inspired by LocusZoom and locuszoomr.

Features

  • LD coloring: SNPs colored by linkage disequilibrium (R²) with the lead variant
  • Gene track: Annotated gene/exon positions below the association plot
  • Recombination rate: Overlay showing recombination rate across the region (built-in maps for Canis lupus familiaris only; other species can supply their own data)
  • SNP labels: Automatic labeling of top SNPs with RS ID or nearest gene
  • Species support: Built-in Canis lupus familiaris (CanFam3.1/CanFam4), Felis catus (FelCat9), or custom species
  • CanFam4 support: Automatic coordinate liftover for recombination maps
  • Multiple backends: matplotlib (static), plotly (interactive), bokeh (dashboards)
  • Stacked plots: Compare multiple GWAS/phenotypes vertically
  • eQTL overlay: Expression QTL data as separate panel
  • PySpark support: Handles large-scale genomics DataFrames
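
To make the LD coloring feature concrete, here is a minimal sketch of how R² values could be grouped into color bins. The five equal-width bins and the `ld_bin` helper are assumptions chosen for illustration; the bins pyLocusZoom actually uses are not stated here.

```python
from bisect import bisect_right

# Assumed bin edges (not taken from pyLocusZoom): five equal-width R² bins.
EDGES = [0.2, 0.4, 0.6, 0.8]
LABELS = ["0.0-0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "0.8-1.0"]


def ld_bin(r2):
    """Return the label of the bin containing r2, or "unknown" if r2 is missing."""
    if r2 is None:
        return "unknown"
    if not 0.0 <= r2 <= 1.0:
        raise ValueError(f"R² must lie in [0, 1], got {r2!r}")
    # bisect_right places a value equal to an edge into the higher bin.
    return LABELS[bisect_right(EDGES, r2)]


for value in (0.05, 0.2, 0.85, None):
    print(value, "->", ld_bin(value))
```

Each label would then be mapped to a color when the points are drawn; SNPs without an LD estimate get their own category.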

Installation

uv add pylocuszoom

Or with pip:

pip install pylocuszoom

Quick Start

from pylocuszoom import LocusZoomPlotter

# Initialize plotter (loads reference data for dog)
plotter = LocusZoomPlotter(species="dog")

# Create regional plot
fig = plotter.plot(
    gwas_df,                    # DataFrame with ps, p_wald, rs columns
    chrom=1,
    start=1000000,
    end=2000000,
    lead_pos=1500000,           # Highlight lead SNP
)

fig.savefig("regional_plot.png", dpi=150)

Full Example

from pylocuszoom import LocusZoomPlotter

plotter = LocusZoomPlotter(
    species="dog",                      # or "cat", or None for custom
    plink_path="/path/to/plink",        # Optional, auto-detects if on PATH
)

fig = plotter.plot(
    gwas_df,
    chrom=1,
    start=1000000,
    end=2000000,
    lead_pos=1500000,
    ld_reference_file="genotypes.bed",  # For LD calculation
    genes_df=genes_df,                  # Gene annotations
    exons_df=exons_df,                  # Exon annotations
    show_recombination=True,            # Overlay recombination rate
    snp_labels=True,                    # Label top SNPs
    label_top_n=5,                      # How many to label
    pos_col="ps",                       # Column name for position
    p_col="p_wald",                     # Column name for p-value
    rs_col="rs",                        # Column name for SNP ID
    figsize=(12, 8),
)
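
The `label_top_n` option labels the most significant SNPs in the plotted window. As a rough illustration of that selection (not the library's own code), the sketch below picks the `n` smallest p-values among rows that fall inside the region. The `top_snps` helper and the plain-dict rows are hypothetical stand-ins for the DataFrame.

```python
import heapq


def top_snps(rows, start, end, n=5):
    """Return the n rows with the smallest p-values inside [start, end]."""
    in_region = [row for row in rows if start <= row["ps"] <= end]
    return heapq.nsmallest(n, in_region, key=lambda row: row["p_wald"])


rows = [
    {"rs": "rs1", "ps": 1_100_000, "p_wald": 1e-3},
    {"rs": "rs2", "ps": 1_500_000, "p_wald": 1e-9},
    {"rs": "rs3", "ps": 1_700_000, "p_wald": 1e-6},
    {"rs": "rs4", "ps": 2_500_000, "p_wald": 1e-12},  # outside the window
]

labelled = top_snps(rows, start=1_000_000, end=2_000_000, n=2)
print([row["rs"] for row in labelled])
```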

Genome Builds

The default genome build for dog is CanFam3.1. For CanFam4 data:

plotter = LocusZoomPlotter(species="dog", genome_build="canfam4")

Recombination maps are automatically lifted over from CanFam3.1 to CanFam4 coordinates using the UCSC liftOver chain file.
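
For readers unfamiliar with liftover, the idea can be sketched in a few lines: a chain file describes aligned blocks between two assemblies, and a position is shifted by the offset of the block that contains it. The blocks and the `lift` helper below are made up for illustration; a real chain file also covers strand and coordinate conventions, which this sketch ignores.

```python
from bisect import bisect_right

# Hypothetical ungapped blocks: (source_start, source_end, target_start),
# half-open intervals, sorted by source_start. Not real CanFam coordinates.
BLOCKS = [(1000, 2000, 5000), (2500, 3000, 6200)]
STARTS = [block[0] for block in BLOCKS]


def lift(pos):
    """Map a source position to the target assembly, or None if unaligned."""
    index = bisect_right(STARTS, pos) - 1
    if index < 0:
        return None  # before the first block
    source_start, source_end, target_start = BLOCKS[index]
    if pos >= source_end:
        return None  # falls in a gap between blocks
    return target_start + (pos - source_start)


for position in (1500, 2200, 2500):
    print(position, "->", lift(position))
```

Positions that fall in a gap have no counterpart in the target build, so any map entries at such positions cannot be carried over.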

Using with Other Species

# Cat (LD and gene tracks, user provides recombination data)
plotter = LocusZoomPlotter(species="cat")

# Custom species (provide all reference data)
plotter = LocusZoomPlotter(
    species=None,
    recomb_data_dir="/path/to/recomb_maps/",
)

# Or provide data per-plot
fig = plotter.plot(
    gwas_df,
    chrom=1, start=1000000, end=2000000,
    recomb_df=my_recomb_dataframe,
    genes_df=my_genes_df,
)

Interactive Backends

Choose between static (matplotlib) and interactive (plotly, bokeh) outputs:

# Static publication-quality plot (default)
plotter = LocusZoomPlotter(species="dog", backend="matplotlib")
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000)
fig.savefig("plot.png", dpi=150)

# Interactive with plotly (hover tooltips, zoom/pan)
plotter = LocusZoomPlotter(species="dog", backend="plotly")
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000)
fig.write_html("plot.html")

# Interactive with bokeh (dashboard-friendly)
plotter = LocusZoomPlotter(species="dog", backend="bokeh")
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000)

Interactive plots show SNP details (RS ID, p-value, R²) on hover.

Stacked Plots

Compare multiple GWAS results vertically with shared x-axis:

fig = plotter.plot_stacked(
    [gwas_height, gwas_bmi, gwas_whr],
    chrom=1,
    start=1000000,
    end=2000000,
    panel_labels=["Height", "BMI", "WHR"],
    genes_df=genes_df,
)

eQTL Overlay

Add expression QTL data as a separate panel:

import pandas as pd

eqtl_df = pd.DataFrame({
    "pos": [1000500, 1001200, 1002000],
    "p_value": [1e-6, 1e-4, 0.01],
    "gene": ["BRCA1", "BRCA1", "BRCA1"],
})

fig = plotter.plot_stacked(
    [gwas_df],
    chrom=1, start=1000000, end=2000000,
    eqtl_df=eqtl_df,
    eqtl_gene="BRCA1",
    genes_df=genes_df,
)

PySpark Support

For large-scale genomics data, pass PySpark DataFrames directly:

from pylocuszoom import LocusZoomPlotter, to_pandas

# PySpark DataFrame (automatically converted)
fig = plotter.plot(spark_gwas_df, chrom=1, start=1000000, end=2000000)

# Or convert manually with sampling for very large data
pandas_df = to_pandas(spark_gwas_df, sample_size=100000)

Install PySpark support: uv add "pylocuszoom[spark]"
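
How `to_pandas` draws its sample is not described here. As one general way to take a fixed-size uniform sample from data too large to hold in memory, the sketch below uses reservoir sampling over any iterable of rows. The `reservoir_sample` function is hypothetical and not part of pylocuszoom.

```python
import random


def reservoir_sample(rows, sample_size, seed=0):
    """Return up to sample_size rows chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    sample = []
    for index, row in enumerate(rows):
        if index < sample_size:
            sample.append(row)  # fill the reservoir first
        else:
            # Keep the new row with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                sample[slot] = row
    return sample


subset = reservoir_sample(range(1_000_000), sample_size=10)
print(len(subset))
```

Uniform sampling thins the background of non-significant SNPs but can also drop peak variants, so filtering to the region of interest before sampling is usually the safer order.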

Data Formats

GWAS Results DataFrame

Expected columns (names configurable via pos_col, p_col, rs_col):

| Column | Type | Required | Description |
|--------|------|----------|-------------|
| ps | int | Yes | Genomic position in base pairs (1-based). Must match coordinate system of genes/recombination data. |
| p_wald | float | Yes | Association p-value (0 < p ≤ 1). Values are -log10 transformed for plotting. |
| rs | str | No | SNP identifier (e.g., "rs12345" or "chr1:12345"). Used for labeling top SNPs if snp_labels=True. |

Example:

import pandas as pd

gwas_df = pd.DataFrame({
    "ps": [1000000, 1000500, 1001000],
    "p_wald": [1e-8, 1e-6, 0.05],
    "rs": ["rs123", "rs456", "rs789"],
})
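
The p-value constraint (0 < p ≤ 1) matters because the y-axis shows -log10(p), which is undefined at zero. A minimal sketch of that transform with an explicit range check follows; `neg_log10_p` is a hypothetical helper, and plain lists stand in for the p_wald column.

```python
import math


def neg_log10_p(p_values):
    """Return -log10(p) for each p-value, rejecting values outside (0, 1]."""
    transformed = []
    for p in p_values:
        # The comparison is also False for NaN, so missing values are rejected.
        if not (0.0 < p <= 1.0):
            raise ValueError(f"p-value outside (0, 1]: {p!r}")
        transformed.append(-math.log10(p))
    return transformed


scores = neg_log10_p([1e-8, 1e-6, 0.05])
print([round(score, 2) for score in scores])
```

A p-value that underflowed to 0.0 in an upstream tool would raise here instead of silently producing an infinite y-value.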

Genes DataFrame

| Column | Type | Required | Description |
|--------|------|----------|-------------|
| chr | str or int | Yes | Chromosome identifier. Accepts "1", "chr1", or 1. The "chr" prefix is stripped for matching. |
| start | int | Yes | Gene start position (bp, 1-based). Transcript start for strand-aware genes. |
| end | int | Yes | Gene end position (bp, 1-based). Must be ≥ start. |
| gene_name | str | Yes | Gene symbol displayed in track (e.g., "BRCA1", "TP53"). Keep short for readability. |

Example:

genes_df = pd.DataFrame({
    "chr": ["1", "1", "1"],
    "start": [1000000, 1050000, 1100000],
    "end": [1020000, 1080000, 1150000],
    "gene_name": ["GENE1", "GENE2", "GENE3"],
})
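
The chr column accepts "1", "chr1", or 1, with the "chr" prefix stripped for matching. The sketch below shows one way such a normalization could look, which is handy for checking that your own annotation tables will line up. `normalize_chrom` is a hypothetical helper, not the library's internal function.

```python
def normalize_chrom(value):
    """Return the chromosome as a string without a leading "chr" prefix."""
    text = str(value).strip()
    if text.lower().startswith("chr"):
        text = text[3:]
    return text


for raw in ("1", "chr1", 1, "chrX"):
    print(repr(raw), "->", repr(normalize_chrom(raw)))
```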

Exons DataFrame (optional)

Provides exon/intron structure. If omitted, genes are drawn as simple rectangles.

| Column | Type | Required | Description |
|--------|------|----------|-------------|
| chr | str or int | Yes | Chromosome identifier. |
| start | int | Yes | Exon start position (bp). |
| end | int | Yes | Exon end position (bp). |
| gene_name | str | Yes | Parent gene symbol. Must match gene_name in genes DataFrame. |

Recombination DataFrame

| Column | Type | Required | Description |
|--------|------|----------|-------------|
| pos | int | Yes | Genomic position (bp). Should span the plotted region with reasonable density (every ~10kb). |
| rate | float | Yes | Recombination rate in centiMorgans per megabase (cM/Mb). Typical range: 0-50 cM/Mb. |

Example:

recomb_df = pd.DataFrame({
    "pos": [1000000, 1010000, 1020000],
    "rate": [0.5, 2.3, 1.1],
})
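
Because the recombination data should span the plotted region at a reasonable density, it can help to check coverage before plotting. The sketch below reports the widest stretch of the region without a data point. `max_gap` is a hypothetical helper that works on a plain list of positions.

```python
def max_gap(positions, start, end):
    """Return the widest stretch of [start, end] with no recombination point."""
    inside = sorted(pos for pos in positions if start <= pos <= end)
    # Include the region edges so uncovered flanks count as gaps too.
    bounds = [start, *inside, end]
    return max(right - left for left, right in zip(bounds, bounds[1:]))


positions = [1_000_000, 1_010_000, 1_020_000]
print(max_gap(positions, start=1_000_000, end=1_030_000))
print(max_gap(positions, start=1_000_000, end=2_000_000))
```

A gap much larger than about 10 kb suggests the overlay will be interpolated over a long distance or left empty there.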

Recombination Map Files

When using recomb_data_dir, files must be named chr{N}_recomb.tsv (e.g., chr1_recomb.tsv, chrX_recomb.tsv).

Format: Tab-separated with header row:

| Column | Description |
|--------|-------------|
| chr | Chromosome number (without "chr" prefix) |
| pos | Position in base pairs |
| rate | Recombination rate (cM/Mb) |
| cM | Cumulative genetic distance (optional, not used for plotting) |

Example:

chr	pos	rate	cM
1	10000	0.5	0.005
1	20000	1.2	0.017
1	30000	0.8	0.025
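
If your recombination map lives in another format, a file with this layout can be produced with the standard library alone. The sketch below writes one chromosome to a temporary directory using the documented naming scheme. `write_recomb_map` is a hypothetical helper, not a pylocuszoom function.

```python
import csv
import tempfile
from pathlib import Path


def write_recomb_map(out_dir, chrom, rows):
    """Write (pos, rate, cM) rows to <out_dir>/chr<chrom>_recomb.tsv."""
    path = Path(out_dir) / f"chr{chrom}_recomb.tsv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["chr", "pos", "rate", "cM"])
        writer.writerows([chrom, pos, rate, cm] for pos, rate, cm in rows)
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_recomb_map(
        tmp_dir, "1", [(10000, 0.5, 0.005), (20000, 1.2, 0.017)]
    )
    file_name = path.name
    lines = path.read_text().splitlines()

print(file_name)
print(lines[0].split("\t"))
```

The directory holding these files is what would be passed as recomb_data_dir.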

Reference Data

Dog recombination maps are downloaded from Campbell et al. 2016 on first use.

To manually download:

from pylocuszoom import download_dog_recombination_maps

download_dog_recombination_maps()

Logging

Logging uses loguru and is configured via the log_level parameter (default: "INFO"):

# Suppress logging
plotter = LocusZoomPlotter(log_level=None)

# Enable DEBUG level for troubleshooting
plotter = LocusZoomPlotter(log_level="DEBUG")

Requirements

  • Python >= 3.10
  • matplotlib >= 3.5.0
  • pandas >= 1.4.0
  • numpy >= 1.21.0
  • loguru >= 0.7.0
  • plotly >= 5.0.0
  • bokeh >= 3.8.2
  • kaleido >= 0.2.0 (for plotly static export)
  • pyliftover >= 0.4 (for CanFam4 coordinate liftover)
  • PLINK 1.9 (for LD calculations); must be on PATH, or pass its location via plink_path

Optional:

  • pyspark >= 3.0.0 (for PySpark DataFrame support) - uv add "pylocuszoom[spark]"
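
To check ahead of time whether PLINK will be found on PATH, the standard library's `shutil.which` is enough. The sketch below prefers an explicit path and otherwise searches PATH. The `find_plink` helper and the candidate executable names are assumptions, not the library's detection logic.

```python
import shutil


def find_plink(plink_path=None, candidates=("plink", "plink1.9")):
    """Return an explicit path if given, else the first candidate on PATH."""
    if plink_path is not None:
        return plink_path
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None  # PLINK not found


location = find_plink()
print(location if location else "PLINK not found on PATH; pass plink_path")
```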

License

GPL-3.0-or-later
