Multi-track circular and linear Manhattan plot generation for GWAS summary statistics

pycmplot

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
|  PACKAGE FOR CIRCULAR AND LINEAR MANHATTAN PLOTTING  |
|                    Kevin Esoh, 2026                  |
|                    kesohku1@jh.edu                   |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

This package takes any number of per-SNP/variant summary statistics files, whether from GWAS or from selection scans (e.g. iHS, EHH, FST), and generates Manhattan plots. Given a single file, it generates a single one-track Manhattan plot. Multiple files result in a multi-track stacked Manhattan plot.

In the process, the package generates a hits summary table for variants whose p-value (or whichever significance statistic is used) falls below the user-specified significance threshold. This table contains annotated gene names, along with other annotations, which are then used to annotate the plots.

Importantly, the package can convert hg19 genomic coordinates to hg38 coordinates. This means that summary stats obtained using different imputation panels, for instance, can be processed in the same run. Users can simply concatenate multiple summary stats files, such as those for the same trait analysed with different imputation panels, and add a new column specifying the genome build (hg19 or hg38) of each variant. The --build_column option then tells the package which column holds the build, and the package lifts over all hg19 positions to hg38 so that hits table generation and plotting use one unified coordinate system.
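The concatenation step described above can be sketched with pandas. This is only an illustration: the column name BUILD, the variant IDs, and the positions are made-up placeholders, not values from the package or from real data.

```python
import pandas as pd

# Two toy summary-stats tables for the same trait (placeholder values only).
hg19_stats = pd.DataFrame(
    {"CHR": [1, 2], "POS": [1000, 2000], "SNP": ["var1", "var2"], "P": [1e-9, 0.2]}
)
hg38_stats = pd.DataFrame(
    {"CHR": [1, 3], "POS": [1500, 3000], "SNP": ["var3", "var4"], "P": [4e-8, 0.7]}
)

# Tag every variant with the genome build it was reported in.
hg19_stats["BUILD"] = "hg19"
hg38_stats["BUILD"] = "hg38"

# Stack the tables into one file-ready frame with a fresh row index.
combined = pd.concat([hg19_stats, hg38_stats], ignore_index=True)
print(combined)

# To write it out for the command line (tab-separated, gzip inferred from .gz):
# combined.to_csv("combined.tsv.gz", sep="\t", index=False)
```

The resulting file would then be passed to --sum_stats, with --build_column BUILD pointing at the new column.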

A key feature of the package is its ability to auto-detect certain columns if they are omitted on the command line or in the Python API:

  • Chromosome column: -chr, --chrom_column or omitted
  • Basepair position column: -pos, --pos_column or omitted
  • SNP or Marker ID column: -snp, --snp_column or omitted
  • P-value (or whatever value) column: -p, --pval_column or omitted
  • Build version column: -b, --build_column or omitted

Candidate names for each of the columns are shown below.

# Resolve column names
chr_candidates = [chrom, 'CHR', 'CHROM', 'Chromosome', '#CHROM', '#CHR', 'Chrom', 'chrom', 'chr', 'chromosome', '#chr', '#chrom']
pos_candidates = [pos, 'BP', 'POS', 'bp', 'pos', 'Basepair']
snp_candidates = [snp, 'SNP', 'RSID', 'rsID', 'MarkerName', 'MarkerID', 'Predictor', 'Marker', 'SNPID', 'ID']
pvl_candidates = [pcol, 'P', 'P-value', 'Wald_P', 'pvalue', 'p_val', 'pval']
bld_candidates = [build, 'BUILD', 'Genome', 'Genome_Build', 'Genome-build']
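One way such candidate lists could be used is to take the first candidate that appears in the file header, with the user-supplied name (or None) in first position so that it wins when present. The helper name resolve_column below is hypothetical and this is only a sketch of the idea, not the package's actual implementation.

```python
from typing import Iterable, Optional, Sequence


def resolve_column(
    columns: Iterable[str], candidates: Sequence[Optional[str]]
) -> Optional[str]:
    """Return the first candidate found in `columns`, or None if none match.

    Hypothetical helper: a user-supplied name (possibly None) is expected to
    come first in `candidates`, so it takes priority over the built-in guesses.
    """
    available = set(columns)
    for name in candidates:
        if name is not None and name in available:
            return name
    return None


header = ["#CHROM", "POS", "ID", "P", "BETA"]
chrom = None  # nothing was passed for the chromosome column
chr_candidates = [chrom, "CHR", "CHROM", "Chromosome", "#CHROM", "#CHR"]

print(resolve_column(header, chr_candidates))  # prints "#CHROM"
```

Matching here is exact and case-sensitive, which is why the candidate lists spell out several capitalisations of the same name.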

GWAS summary stats files can be very large. To improve speed and memory efficiency, it is highly recommended to use -tp, --trim_pval with a value that excludes variants with a p-value above a certain threshold, e.g. 0.01 (1e-2) or 0.001 (1e-3).
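To illustrate why trimming helps, the sketch below streams a tab-separated table row by row and keeps only rows at or below a p-value cutoff, so memory use does not grow with the size of the input. The function name trim_by_pvalue and the toy data are assumptions made for this example; the package's own trimming may work differently.

```python
import csv
import io


def trim_by_pvalue(lines, p_col="P", max_p=0.01, delimiter="\t"):
    """Yield rows (as dicts) whose p-value is at or below `max_p`.

    Rows are read one at a time. Rows with a missing or non-numeric
    p-value are skipped rather than raising an error.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    for row in reader:
        try:
            p = float(row[p_col])
        except (KeyError, TypeError, ValueError):
            continue
        if p <= max_p:
            yield row


# Toy input standing in for a summary-stats file (placeholder values).
example = io.StringIO(
    "CHR\tPOS\tSNP\tP\n"
    "1\t1000\tvar1\t0.5\n"
    "1\t2000\tvar2\t3e-9\n"
    "2\t500\tvar3\tNA\n"
    "2\t900\tvar4\t0.004\n"
)
kept = list(trim_by_pvalue(example, max_p=0.01))
print([row["SNP"] for row in kept])  # prints ['var2', 'var4']
```

For a gzipped file, the same function can be fed a handle opened with gzip.open(path, "rt") from the standard library.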


Installation

From PyPI

pip install pycmplot

From GitHub

git clone https://github.com/esohkevin/pycmplot.git

cd pycmplot

pip install -e .

# or

pip install -e . --break-system-packages

Use a Python virtual environment if a local installation is not possible

python -m venv ~/bin/pycmplot

source ~/bin/pycmplot/bin/activate

pip install --upgrade pip setuptools wheel

# then follow any of the installation steps above

Test the installation

pycmplot -h

Dependencies

Package          Purpose
pandas, numpy    Data loading & statistics
matplotlib       Plotting backend
pycirclize       Circular (Circos-style) tracks
natsort          Natural chromosome sorting
adjustText       Label collision avoidance
pyliftover       hg19 to hg38 coordinate conversion
Pillow           Image utilities

Command-line usage

Linear Manhattan (default)

pycmplot \
  --sum_stats HbF.tsv.gz,MCV.txt.gz,MCH.tsv.gz \
  --labels HbF,MCV,MCH \
  --logp \
  --signif_line \
  --highlight \
  --annotate GENE \
  --output_dir ./results \
  --output_format png \
  --dpi 300

Circular Manhattan

pycmplot \
  --sum_stats HbF.tsv.gz,MCV.tsv.gz \
  --labels HbF,MCV \
  --mode cm \
  --trim_pval 0.01 \
  --logp \
  --signif_threshold \
  --plot_title "RBC Traits" \
  --output_dir ./results

Key options

Flag                         Description                                                      Default
-s, --sum_stats              Comma-separated sumstats files                                   required
-l, --labels                 Comma-separated track labels                                     required
-b, --build_column           Genome build column name (containing hg18/hg19/hg38)             required
-m, --mode                   lm linear or cm circular                                         lm
-qq, --qq_plot               Also generate a QQ-plot                                          off (coming soon...)
--logp                       Plot -log10(p)                                                   off
-sig, --signif_threshold     Genome-wide significance threshold                               off (auto 0.05/N)
-sigl, --signif_line         Value for genome-wide significance line if different from -sig   5e-8
-sug, --suggest_threshold    Threshold for suggestive signals                                 off
-hl, --highlight             Highlight significant loci                                       off
-a, --annotate               Annotate with SNP or GENE                                        SNP
-tp, --trim_pval             Trim variants above this p-value for speed                       off
-st, --sort_track            Sort tracks by label or chrom_len                                input order
-od, --output_dir            Output directory                                                 .
-of, --output_format         Output format (png, pdf, svg, jpg)                               png

Run pycmplot -h for the full option list.
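The arithmetic behind --logp and a 0.05/N style threshold can be sketched as follows. Reading N as the number of tests (a Bonferroni correction) is an assumption here, and the function names are hypothetical; check pycmplot -h for how the package itself defines N.

```python
import math


def neg_log10(p):
    """Return -log10(p), the y-axis value of a Manhattan plot with log scaling."""
    if not 0 < p <= 1:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Return alpha / n_tests (assumes N means the number of tests)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# The conventional genome-wide line of 5e-8 corresponds to 0.05 over one million tests.
threshold = bonferroni_threshold(1_000_000)
print(threshold, round(neg_log10(threshold), 3))
```

On the -log10 scale a smaller p-value plots higher, so a variant is significant when its plotted value lies above the line.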


Python API

from pycmplot import plot_linear
import pandas as pd

df1 = pd.read_csv("HbF.tsv.gz", sep="\t")
df2 = pd.read_csv("MCV.tsv.gz", sep="\t")

plot_linear(
    tracks=[df1, df2],
    track_labels=["HbF", "MCV"],
    chr_col="CHR",
    pos_col="POS",
    p_col="P",
    logp=True,
    highlight=True,
    plot_title="results/HbF_MCV.png",
    figsize=(15, 8),
)

Package structure

pycmplot/
├── pyproject.toml
├── setup.py
├── setup.cfg
├── README.md
└── pycmplot/
      ├── __init__.py          # public API exports
      ├── __main__.py          # python -m pycmplot
      ├── _core.py             # main() orchestration
      ├── cli.py               # argparse definitions
      ├── constants.py         # chromosome lengths, biotype weights
      ├── resources.py         # external resource path config
      ├── io.py                # sumstat loading, delimiter detection
      ├── stats.py             # get_lead_snps, get_highlight_snps
      ├── liftover.py          # lazy hg19→hg38 liftover
      ├── annotation.py        # nearest-gene annotation, hits table
      └── plotting/
          ├── __init__.py
          ├── linear.py        # plot_linear
          └── circular.py      # plot_circular, compute_track_radii_dict
