pyLocusZoom
Regional association plots for GWAS results with LD coloring, gene tracks, and recombination rate overlays.
Inspired by LocusZoom and locuszoomr.
Features
- LD coloring: SNPs colored by linkage disequilibrium (R²) with lead variant
- Gene track: Annotated gene/exon positions below the association plot
- Recombination rate: Overlay showing recombination rate across the region (built-in maps for Canis lupus familiaris only; other species can supply their own data)
- SNP labels: Automatic labeling of top SNPs with RS ID or nearest gene
- Species support: Built-in Canis lupus familiaris (CanFam3.1/CanFam4), Felis catus (FelCat9), or custom species
- CanFam4 support: Automatic coordinate liftover for recombination maps
- Multiple backends: matplotlib (static), plotly (interactive), bokeh (dashboards)
- Stacked plots: Compare multiple GWAS/phenotypes vertically
- eQTL overlay: Expression QTL data as separate panel
- PySpark support: Handles large-scale genomics DataFrames
Installation
uv add pylocuszoom
Or with pip:
pip install pylocuszoom
Quick Start
from pylocuszoom import LocusZoomPlotter
# Initialize plotter (loads reference data for dog)
plotter = LocusZoomPlotter(species="dog")
# Create regional plot
fig = plotter.plot(
    gwas_df,  # DataFrame with ps, p_wald, rs columns
    chrom=1,
    start=1000000,
    end=2000000,
    lead_pos=1500000,  # Highlight lead SNP
)
fig.savefig("regional_plot.png", dpi=150)
Full Example
from pylocuszoom import LocusZoomPlotter
plotter = LocusZoomPlotter(
    species="dog",  # or "cat", or None for custom
    plink_path="/path/to/plink",  # Optional, auto-detects if on PATH
)
fig = plotter.plot(
    gwas_df,
    chrom=1,
    start=1000000,
    end=2000000,
    lead_pos=1500000,
    ld_reference_file="genotypes.bed",  # For LD calculation
    genes_df=genes_df,  # Gene annotations
    exons_df=exons_df,  # Exon annotations
    show_recombination=True,  # Overlay recombination rate
    snp_labels=True,  # Label top SNPs
    label_top_n=5,  # How many to label
    pos_col="ps",  # Column name for position
    p_col="p_wald",  # Column name for p-value
    rs_col="rs",  # Column name for SNP ID
    figsize=(12, 8),
)
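As a rough illustration of what snp_labels and label_top_n might do, the sketch below picks the N most significant SNPs and labels each with its RS ID, falling back to the nearest gene when the ID is missing. This is an assumption based on the feature list, not pyLocusZoom's actual code; pick_labels and the tuple layouts are hypothetical.

```python
import heapq


def pick_labels(snps, genes, top_n=5):
    """Return (position, label) for the top_n SNPs with the smallest p-values.

    snps:  list of (position, p_value, rs_id_or_None) tuples (hypothetical layout)
    genes: list of (start, end, gene_name) tuples (hypothetical layout)
    """

    def nearest_gene(pos):
        # Distance is 0 when the SNP lies inside the gene body.
        def distance(gene):
            start, end, _ = gene
            if start <= pos <= end:
                return 0
            return min(abs(pos - start), abs(pos - end))

        return min(genes, key=distance)[2] if genes else None

    # Smallest p-values are the most significant associations.
    top = heapq.nsmallest(top_n, snps, key=lambda snp: snp[1])
    return [(pos, rs if rs else nearest_gene(pos)) for pos, _, rs in top]


snps = [
    (1000000, 1e-8, "rs123"),
    (1000500, 1e-6, None),
    (1001000, 0.05, "rs789"),
]
genes = [(1000000, 1020000, "GENE1"), (1050000, 1080000, "GENE2")]
labels = pick_labels(snps, genes, top_n=2)
```

Here the second SNP has no RS ID, so it would be labeled with the gene it falls inside.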
Genome Builds
The default genome build for dog is CanFam3.1. For CanFam4 data:
plotter = LocusZoomPlotter(species="dog", genome_build="canfam4")
Recombination maps are automatically lifted over from CanFam3.1 to CanFam4 coordinates using the UCSC liftOver chain file.
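To give a feel for what coordinate liftover involves, here is a minimal sketch that maps positions through a list of aligned blocks. It is a conceptual illustration only: the block values are made up, the function name lift_position is hypothetical, and it does not use pyliftover or a real chain file (which also records chromosome names, strand, and alignment scores).

```python
import bisect

# Hypothetical aligned blocks: (source_start, source_end, target_start),
# half-open [source_start, source_end), sorted by source_start.
BLOCKS = [
    (0, 1000, 500),
    (1200, 2000, 1500),
]


def lift_position(pos, blocks=BLOCKS):
    """Map a source position to target coordinates, or None if it falls in a gap."""
    starts = [block[0] for block in blocks]
    # Find the last block that starts at or before pos.
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0:
        return None
    src_start, src_end, dst_start = blocks[i]
    if pos >= src_end:
        return None  # Position lies between aligned blocks
    return dst_start + (pos - src_start)


lifted = [lift_position(p) for p in (100, 1100, 1300)]
```

Positions that fall between blocks have no counterpart in the target assembly, so a lifted recombination map can end up with slightly fewer points than the original.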
Using with Other Species
# Cat (LD and gene tracks, user provides recombination data)
plotter = LocusZoomPlotter(species="cat")
# Custom species (provide all reference data)
plotter = LocusZoomPlotter(
    species=None,
    recomb_data_dir="/path/to/recomb_maps/",
)
# Or provide data per-plot
fig = plotter.plot(
    gwas_df,
    chrom=1, start=1000000, end=2000000,
    recomb_df=my_recomb_dataframe,
    genes_df=my_genes_df,
)
Interactive Backends
Choose between static (matplotlib) and interactive (plotly, bokeh) outputs:
# Static publication-quality plot (default)
plotter = LocusZoomPlotter(species="dog", backend="matplotlib")
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000)
fig.savefig("plot.png", dpi=150)
# Interactive with plotly (hover tooltips, zoom/pan)
plotter = LocusZoomPlotter(species="dog", backend="plotly")
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000)
fig.write_html("plot.html")
# Interactive with bokeh (dashboard-friendly)
plotter = LocusZoomPlotter(species="dog", backend="bokeh")
fig = plotter.plot(gwas_df, chrom=1, start=1000000, end=2000000)
Interactive plots show SNP details (RS ID, p-value, R²) on hover.
Stacked Plots
Compare multiple GWAS results vertically with shared x-axis:
fig = plotter.plot_stacked(
    [gwas_height, gwas_bmi, gwas_whr],
    chrom=1,
    start=1000000,
    end=2000000,
    panel_labels=["Height", "BMI", "WHR"],
    genes_df=genes_df,
)
eQTL Overlay
Add expression QTL data as a separate panel:
import pandas as pd

eqtl_df = pd.DataFrame({
    "pos": [1000500, 1001200, 1002000],
    "p_value": [1e-6, 1e-4, 0.01],
    "gene": ["BRCA1", "BRCA1", "BRCA1"],
})
fig = plotter.plot_stacked(
    [gwas_df],
    chrom=1, start=1000000, end=2000000,
    eqtl_df=eqtl_df,
    eqtl_gene="BRCA1",
    genes_df=genes_df,
)
PySpark Support
For large-scale genomics data, pass PySpark DataFrames directly:
from pylocuszoom import LocusZoomPlotter, to_pandas
# PySpark DataFrame (automatically converted)
fig = plotter.plot(spark_gwas_df, chrom=1, start=1000000, end=2000000)
# Or convert manually with sampling for very large data
pandas_df = to_pandas(spark_gwas_df, sample_size=100000)
Install PySpark support: uv add "pylocuszoom[spark]" (the quotes keep shells such as zsh from interpreting the brackets)
Data Formats
GWAS Results DataFrame
Columns (names configurable via pos_col, p_col, rs_col):
| Column | Type | Required | Description |
|---|---|---|---|
| ps | int | Yes | Genomic position in base pairs (1-based). Must match coordinate system of genes/recombination data. |
| p_wald | float | Yes | Association p-value (0 < p ≤ 1). Values are -log10 transformed for plotting. |
| rs | str | No | SNP identifier (e.g., "rs12345" or "chr1:12345"). Used for labeling top SNPs if snp_labels=True. |
Example:
import pandas as pd

gwas_df = pd.DataFrame({
    "ps": [1000000, 1000500, 1001000],
    "p_wald": [1e-8, 1e-6, 0.05],
    "rs": ["rs123", "rs456", "rs789"],
})
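The p-value constraint and the -log10 transform described in the table can be sketched in plain Python as follows. The helper name neg_log10_p is hypothetical and only illustrates the arithmetic; it is not part of the pyLocusZoom API.

```python
import math


def neg_log10_p(p_values):
    """Validate p-values (0 < p <= 1) and return their -log10 transforms."""
    transformed = []
    for p in p_values:
        # The comparison also rejects NaN, since NaN fails every comparison.
        if not (0 < p <= 1):
            raise ValueError(f"p-value out of range (0, 1]: {p!r}")
        transformed.append(-math.log10(p))
    return transformed


y_values = neg_log10_p([1e-8, 1e-6, 0.05])
```

The lower bound is exclusive because p = 0 has no finite -log10 value; such rows usually come from floating-point underflow upstream and are worth checking before plotting.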
Genes DataFrame
| Column | Type | Required | Description |
|---|---|---|---|
| chr | str or int | Yes | Chromosome identifier. Accepts "1", "chr1", or 1. The "chr" prefix is stripped for matching. |
| start | int | Yes | Gene start position (bp, 1-based). Transcript start for strand-aware genes. |
| end | int | Yes | Gene end position (bp, 1-based). Must be ≥ start. |
| gene_name | str | Yes | Gene symbol displayed in track (e.g., "BRCA1", "TP53"). Keep short for readability. |
Example:
genes_df = pd.DataFrame({
    "chr": ["1", "1", "1"],
    "start": [1000000, 1050000, 1100000],
    "end": [1020000, 1080000, 1150000],
    "gene_name": ["GENE1", "GENE2", "GENE3"],
})
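The chromosome matching rule from the table (accept "1", "chr1", or 1, and strip the "chr" prefix) could look roughly like this. normalize_chrom is a hypothetical helper, and the case-insensitive prefix check is an assumption rather than documented behaviour.

```python
def normalize_chrom(value):
    """Return the chromosome as a string without a leading "chr"."""
    text = str(value).strip()
    # Assumption: treat "chr", "Chr", and "CHR" prefixes alike.
    if text.lower().startswith("chr"):
        text = text[3:]
    return text


normalized = [normalize_chrom(v) for v in ("1", "chr1", 1, "chrX")]
```

Normalizing both the GWAS chromosome argument and the chr column this way before comparing them avoids silent mismatches between "chr1" and 1.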
Exons DataFrame (optional)
Provides exon/intron structure. If omitted, genes are drawn as simple rectangles.
| Column | Type | Required | Description |
|---|---|---|---|
| chr | str or int | Yes | Chromosome identifier. |
| start | int | Yes | Exon start position (bp). |
| end | int | Yes | Exon end position (bp). |
| gene_name | str | Yes | Parent gene symbol. Must match gene_name in genes DataFrame. |
Recombination DataFrame
| Column | Type | Required | Description |
|---|---|---|---|
| pos | int | Yes | Genomic position (bp). Should span the plotted region with reasonable density (every ~10kb). |
| rate | float | Yes | Recombination rate in centiMorgans per megabase (cM/Mb). Typical range: 0-50 cM/Mb. |
Example:
recomb_df = pd.DataFrame({
    "pos": [1000000, 1010000, 1020000],
    "rate": [0.5, 2.3, 1.1],
})
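Before plotting custom recombination data, it can help to check that it spans the region at roughly the suggested density. The sketch below is one way to do that; check_recomb_coverage and its max_gap default of 10,000 bp are illustrative assumptions, not part of pyLocusZoom.

```python
def check_recomb_coverage(positions, start, end, max_gap=10_000):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ordered = sorted(positions)
    if not ordered:
        return ["no recombination positions supplied"]
    if ordered[0] > start:
        problems.append(f"first position {ordered[0]} is after region start {start}")
    if ordered[-1] < end:
        problems.append(f"last position {ordered[-1]} is before region end {end}")
    # Compare each position with its successor to find sparse stretches.
    for left, right in zip(ordered, ordered[1:]):
        if right - left > max_gap:
            problems.append(f"gap of {right - left} bp between {left} and {right}")
    return problems


ok = check_recomb_coverage([1000000, 1010000, 1020000], 1000000, 1020000)
sparse = check_recomb_coverage([1000000, 1050000], 1000000, 1050000)
```

With a DataFrame, the positions could be passed as recomb_df["pos"].tolist().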
Recombination Map Files
When using recomb_data_dir, files must be named chr{N}_recomb.tsv (e.g., chr1_recomb.tsv, chrX_recomb.tsv).
Format: Tab-separated with header row:
| Column | Description |
|---|---|
| chr | Chromosome number (without "chr" prefix) |
| pos | Position in base pairs |
| rate | Recombination rate (cM/Mb) |
| cM | Cumulative genetic distance (optional, not used for plotting) |
chr pos rate cM
1 10000 0.5 0.005
1 20000 1.2 0.017
1 30000 0.8 0.025
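To sanity-check a map file against this layout, something like the following standard-library sketch may be useful. The helper names recomb_filename and read_recomb_map are hypothetical, and the parser only reads the three columns used for plotting, so a missing cM column is tolerated.

```python
import csv
import io


def recomb_filename(chrom):
    """Build the expected file name, e.g. chr1_recomb.tsv."""
    return f"chr{chrom}_recomb.tsv"


def read_recomb_map(handle):
    """Parse a tab-separated recombination map into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        rows.append(
            {
                "chr": record["chr"],
                "pos": int(record["pos"]),
                "rate": float(record["rate"]),
            }
        )
    return rows


# In-memory sample mirroring the layout shown above.
sample = "chr\tpos\trate\tcM\n1\t10000\t0.5\t0.005\n1\t20000\t1.2\t0.017\n"
rows = read_recomb_map(io.StringIO(sample))
```

For a file on disk, open it with open(path, newline="") and pass the handle to read_recomb_map.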
Reference Data
Dog recombination maps are downloaded from Campbell et al. 2016 on first use.
To manually download:
from pylocuszoom import download_dog_recombination_maps
download_dog_recombination_maps()
Logging
Logging uses loguru and is configured via the log_level parameter (default: "INFO"):
# Suppress logging
plotter = LocusZoomPlotter(log_level=None)
# Enable DEBUG level for troubleshooting
plotter = LocusZoomPlotter(log_level="DEBUG")
Requirements
- Python >= 3.10
- matplotlib >= 3.5.0
- pandas >= 1.4.0
- numpy >= 1.21.0
- loguru >= 0.7.0
- plotly >= 5.0.0
- bokeh >= 3.8.2
- kaleido >= 0.2.0 (for plotly static export)
- pyliftover >= 0.4 (for CanFam4 coordinate liftover)
- PLINK 1.9 (for LD calculations): must be on PATH, or specify plink_path
Optional:
- pyspark >= 3.0.0 (for PySpark DataFrame support): uv add "pylocuszoom[spark]"
License
GPL-3.0-or-later