
Geneview: A python package for genomics data visualization.


geneview: A python package for visualizing genomics data


geneview is a library for making attractive and informative genomics graphics in Python. It is built on top of matplotlib and tightly integrated with the PyData stack, including support for numpy and pandas data structures. The project is under active development.

Some of the features that geneview offers are:

  • Manhattan plot — GWAS association results with significance thresholds, top-SNP annotation, and chromosome zoom.
  • Q-Q plot — Quantile-quantile plots for P-value distributions with genomic inflation factor (λ).
  • Admixture plot — Population structure visualization from ADMIXTURE output (.Q files) with hierarchical clustering.
  • Venn diagram — Set intersection diagrams for 2–6 datasets with customizable petal labels and colors.
  • Karyotype plot — Cytogenetic band visualization with G-banding color schemes.
  • Color palettes — Curated color schemes (XKCD RGB, Circos, matplotlib colormaps) optimized for genomics figures.
  • High-level abstractions for structuring grids of plots that let you easily build complex visualizations.

Installation

To install the released version, run:

pip install geneview

This command installs geneview and all of its dependencies.

Install from source

git clone https://github.com/ShujiaHuang/geneview.git
cd geneview
pip install .

Quick start

Manhattan and Q-Q plot

The plots below use gwas.csv, a PLINK 2.x association output file from the geneview-data directory, as input. Here is a preview of its format:

#CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P
chr1 904165 1_904165 G A A ADD 282 -0.0908897 0.195476 -0.464967 0.642344
chr1 1563691 1_1563691 T G G ADD 271 0.447021 0.422194 1.0588 0.290715
chr1 1707740 1_1707740 T G G ADD 283 0.149911 0.161387 0.928888 0.353805
chr1 2284195 1_2284195 T C C ADD 275 -0.024704 0.13966 -0.176887 0.859739
chr1 2779043 1_2779043 T C T ADD 272 -0.111771 0.139929 -0.79877 0.425182
chr1 2944527 1_2944527 G A A ADD 276 -0.054472 0.166038 -0.32807 0.743129
chr1 3803755 1_3803755 T C T ADD 283 -0.0392713 0.128528 -0.305547 0.760193
chr1 4121584 1_4121584 A G G ADD 279 0.120902 0.127063 0.951511 0.342239
chr1 4170048 1_4170048 C T T ADD 280 0.250807 0.143423 1.74873 0.0815274
chr1 4180842 1_4180842 C T T ADD 277 0.209195 0.146122 1.43165 0.153469
chr1 6053630 1_6053630 T G G ADD 269 -0.210917 0.129069 -1.63414 0.103503
chr1 7569602 1_7569602 C T C ADD 281 -0.136834 0.13265 -1.03154 0.303249
chr1 7575666 1_7575666 T C C ADD 277 -0.231278 0.159448 -1.45049 0.14815
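
For readers who want to see which columns matter, here is a minimal sketch that parses a few rows of the preview above using only the standard library. The helper name parse_assoc is hypothetical and is not part of geneview; in practice gv.load_dataset("gwas") returns the data ready for plotting.

```python
# Three rows copied from the preview above (whitespace-separated).
PREVIEW = """\
#CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P
chr1 904165 1_904165 G A A ADD 282 -0.0908897 0.195476 -0.464967 0.642344
chr1 1563691 1_1563691 T G G ADD 271 0.447021 0.422194 1.0588 0.290715
chr1 1707740 1_1707740 T G G ADD 283 0.149911 0.161387 0.928888 0.353805
"""


def parse_assoc(text):
    """Hypothetical helper: return one dict per SNP with the four plotted columns."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        records.append({
            "#CHROM": row["#CHROM"],   # chromosome name
            "POS": int(row["POS"]),    # position in base pairs
            "ID": row["ID"],           # SNP identifier
            "P": float(row["P"]),      # association P-value
        })
    return records


records = parse_assoc(PREVIEW)
for record in records:
    print(record)
```

Only #CHROM, POS, ID and P are kept here because those are the columns the plotting functions below refer to.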

Manhattan plot with default parameters

The manhattanplot() function in geneview takes a data frame with columns containing the chromosome name or ID, the chromosomal position, the P-value, and optionally the SNP name (e.g. the rsID in dbSNP).

By default, manhattanplot() looks for the column names output by PLINK 2 association results, namely #CHROM, POS, P, and ID, although different column names can be specified by the user. Calling manhattanplot() with a data frame of GWAS results as the single argument draws a basic Manhattan plot, defaulting to a dark blue and light blue color scheme.

import matplotlib.pyplot as plt
import geneview as gv

# load data
df = gv.load_dataset("gwas")
# Plot a basic Manhattan plot with horizontal x-tick labels and display it on screen.
ax = gv.manhattanplot(data=df)
plt.show()

manhattan_plot.png

Rotate the x-axis tick labels by setting xticklabel_kws to avoid label overlap:

ax = gv.manhattanplot(data=df, xticklabel_kws={"rotation": "vertical"})

manhattan_plot.png

Or rotate the labels 45 degrees by setting xticklabel_kws={"rotation": 45}.

When run with default parameters, manhattanplot() draws horizontal lines at $-\log_{10}(1 \times 10^{-5})$ for "suggestive" associations and at $-\log_{10}(5 \times 10^{-8})$ for the "genome-wide significant" threshold. These lines can be moved to different locations or turned off completely with the arguments suggestiveline and genomewideline, respectively.

ax = gv.manhattanplot(data=df,
                      suggestiveline=None,  # Turn off suggestiveline
                      genomewideline=None,  # Turn off genomewideline
                      xticklabel_kws={"rotation": "vertical"})

manhattan_plot_xviertical_noline.png
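
To make the two default thresholds concrete, the short sketch below converts them to the y-axis scale of the plot. It uses only the standard library; neg_log10 is an illustrative helper, not a geneview function.

```python
import math


def neg_log10(p):
    """Convert a P-value to the -log10 scale used on the y-axis."""
    return -math.log10(p)


suggestive = neg_log10(1e-5)   # "suggestive" threshold
genomewide = neg_log10(5e-8)   # "genome-wide significant" threshold

print(f"suggestive line at y = {suggestive:.2f}")
print(f"genome-wide line at y = {genomewide:.2f}")
```

The suggestive line therefore sits at 5 on the y-axis and the genome-wide line at roughly 7.3.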

The behavior of manhattanplot() changes slightly when results from only a single chromosome are used. Instead of plotting alternating colors and chromosome IDs on the x-axis, it plots each SNP's position on the chromosome:

# plot only results of chromosome 8.
gv.manhattanplot(data=df, CHR="chr8", xlabel="Chromosome 8")

manhattan_plot_xviertical_noline.png
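
The difference between the two x-axis layouts can be sketched as follows. This is a simplified illustration with toy positions, not geneview's internal code: it assumes chromosomes are laid end to end in the genome-wide view and uses the largest observed position as a stand-in for each chromosome's length.

```python
def genome_wide_x(snps, chrom_order):
    """x = offset of the SNP's chromosome + position within that chromosome."""
    # Largest observed position per chromosome stands in for its length.
    span = {chrom: 0 for chrom in chrom_order}
    for chrom, pos in snps:
        span[chrom] = max(span[chrom], pos)

    # Each chromosome starts where the previous one ended.
    offsets, running = {}, 0
    for chrom in chrom_order:
        offsets[chrom] = running
        running += span[chrom]

    return [offsets[chrom] + pos for chrom, pos in snps]


def single_chrom_x(snps, chrom):
    """With one chromosome, the raw position is used directly."""
    return [pos for c, pos in snps if c == chrom]


# Toy (chromosome, position) pairs, invented for illustration.
snps = [("chr1", 100), ("chr1", 900), ("chr2", 50), ("chr2", 400), ("chr8", 250)]

genome_x = genome_wide_x(snps, ["chr1", "chr2", "chr8"])
chr8_x = single_chrom_x(snps, "chr8")
print(genome_x)  # [100, 900, 950, 1300, 1550]
print(chr8_x)    # [250]
```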

The manhattanplot() function can highlight SNPs with a significant GWAS signal and annotate the top SNP, the one with the lowest P-value:

ax = gv.manhattanplot(data=df,
                      sign_marker_p=1e-6,  # highlight the significant SNPs with the ``sign_marker_color`` color.
                      is_annotate_topsnp=True,  # annotate the top SNP
                      xticklabel_kws={"rotation": "vertical"})

manhattan_anno_plot.png
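
Conceptually, the selection behind this plot is a threshold filter plus a minimum. The sketch below shows that idea on toy records; it is an assumption about the general logic, and geneview may apply further rules (for example the ld_block_size parameter used later in this document). The function name and the strict "<" comparison are illustrative.

```python
def significant_and_top(records, sign_marker_p):
    """Return SNPs below the P-value threshold and the SNP with the lowest P."""
    significant = [r for r in records if r["P"] < sign_marker_p]
    top = min(records, key=lambda r: r["P"])
    return significant, top


# Toy records, invented for illustration.
toy = [
    {"ID": "snp_a", "P": 0.04},
    {"ID": "snp_b", "P": 3e-7},
    {"ID": "snp_c", "P": 2e-9},
    {"ID": "snp_d", "P": 0.5},
]

significant, top = significant_and_top(toy, sign_marker_p=1e-6)
print([r["ID"] for r in significant])  # ['snp_b', 'snp_c']
print(top["ID"])                       # snp_c
```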

Additionally, highlighting SNPs of interest can be combined with limiting to a single chromosome to enable "zooming" into a particular region containing SNPs of interest.

manhattan_anno_plot.png
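
A sketch of how the combination described above might be assembled, using only parameters that already appear in this section (CHR, xlabel, sign_marker_p, is_annotate_topsnp). The helper zoom_kwargs is hypothetical, and whether a given chromosome contains significant SNPs depends on the data.

```python
def zoom_kwargs(chrom, sign_marker_p=1e-6):
    """Hypothetical helper: keyword arguments for a single-chromosome 'zoom'."""
    number = chrom[3:] if chrom.startswith("chr") else chrom
    return {
        "CHR": chrom,                      # restrict the plot to one chromosome
        "xlabel": f"Chromosome {number}",  # label the x-axis accordingly
        "sign_marker_p": sign_marker_p,    # highlight SNPs below this P-value
        "is_annotate_topsnp": True,        # annotate the top SNP
    }


kwargs = zoom_kwargs("chr8")
print(kwargs)

# With geneview installed and df loaded as in the examples above:
# ax = gv.manhattanplot(data=df, **kwargs)
```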

Show a better manhattan plot

Further graphical parameters can be passed to manhattanplot() to control things like the plot title, point character, size, and colors. Here is an example:

import matplotlib.pyplot as plt
import geneview as gv

# common parameters for plotting
plt_params = {
    "pdf.fonttype": 42,
    "font.sans-serif": "Arial",
    "legend.fontsize": 14,
    "axes.titlesize": 18,
    "axes.labelsize": 16,
    "xtick.labelsize": 14,
    "ytick.labelsize": 14
}
plt.rcParams.update(plt_params)

# Create a manhattan plot
f, ax = plt.subplots(figsize=(12, 4), facecolor="w", edgecolor="k")
xtick = set(["chr" + i for i in list(map(str, range(1, 10))) + ["11", "13", "15", "18", "21", "X"]])
_ = gv.manhattanplot(data=df,
                     marker=".",
                     sign_marker_p=1e-6,  # Genome wide significant p-value
                     sign_marker_color="r",
                     snp="ID",  # The column name of annotation information for top SNPs.

                     title="Test",
                     xtick_label_set=xtick,
                  
                     xlabel="Chromosome",
                     ylabel=r"$-log_{10}{(P)}$",

                     sign_line_cols=["#D62728", "#2CA02C"],
                     hline_kws={"linestyle": "--", "lw": 1.3},

                     is_annotate_topsnp=True,
                     ld_block_size=50000,  # 50000 bp
                     text_kws={"fontsize": 12,
                               "arrowprops": dict(arrowstyle="-", color="k", alpha=0.6)},
                     ax=ax)

manhattan.png
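
The xtick set in the example above is built in a compact one-liner. The sketch below rebuilds the same set more explicitly and lists which labels it leaves out, assuming (this is not stated in the text) that the data covers chr1 to chr22 plus chrX and that xtick_label_set limits which chromosome labels are drawn.

```python
# Same label set as in the example above, written as a set comprehension.
shown = {f"chr{i}" for i in list(range(1, 10)) + [11, 13, 15, 18, 21, "X"]}

# Assumed full chromosome list: autosomes 1-22 plus X.
all_chroms = [f"chr{i}" for i in list(range(1, 23)) + ["X"]]

# Chromosomes whose tick labels would be skipped under that assumption.
hidden = [c for c in all_chroms if c not in shown]

print(sorted(shown))
print(hidden)
```

Skipping every other label among the shorter chromosomes is one way to keep horizontal labels from overlapping.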

QQ plot with default parameters

The qqplot() function generates a Q-Q plot to visualize the distribution of association P-values. It takes a vector of P-values as its only required argument.

import matplotlib.pyplot as plt
import geneview as gv

# load data
df = gv.load_dataset("gwas")
# Plot a basic Q-Q plot and display the figure on screen.
ax = gv.qqplot(data=df["P"])
plt.show()

qq.png
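
For intuition, the points of a Q-Q plot pair each observed P-value with the value expected under a uniform null distribution, both on the -log10 scale. The sketch below uses the (i - 0.5) / n plotting position, which is one common convention and an assumption here; geneview may use a different one. The P-values are toy numbers.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) pairs on the -log10 scale, in ascending order."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    # Expected quantiles of n uniform P-values, using (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Toy P-values, invented for illustration.
points = qq_points([0.5, 0.05, 0.005, 0.9])
for expected, observed in points:
    print(f"expected={expected:.3f}  observed={observed:.3f}")
```

Points lying above the diagonal indicate P-values smaller than expected under the null.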

Show a better QQ plot

Further graphical parameters can be passed to qqplot() to control the plot title, axis labels, point characters, colors, point sizes, etc. Here is an example:

import matplotlib.pyplot as plt
import geneview as gv

f, ax = plt.subplots(figsize=(6, 6), facecolor="w", edgecolor="k")
_ = gv.qqplot(data=df["P"],
              marker="o",
              title="Test",
              xlabel=r"Expected $-log_{10}{(P)}$",
              ylabel=r"Observed $-log_{10}{(P)}$",
              ax=ax)
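
The genomic inflation factor (λ) mentioned in the feature list is commonly defined as the median of the observed chi-square statistics (1 degree of freedom) divided by the median expected under the null, about 0.455. The sketch below follows that standard definition with the standard library only; it is not taken from geneview's source, and the package's exact computation may differ.

```python
from statistics import NormalDist, median  # NormalDist needs Python 3.8+

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(pvalues):
    """Genomic inflation factor for two-sided P-values in the range (0, 1]."""
    # A two-sided P-value p corresponds to a z-score at the p/2 quantile;
    # squaring it gives a 1-df chi-square statistic.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P-values mimic a well-calibrated (null) study.
uniform_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
lam = inflation_factor(uniform_p)
print(f"lambda = {lam:.3f}")
```

A value near 1 suggests no inflation; values well above 1 can point to population stratification or other confounding.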

Admixture plot

Generate an admixture plot directly from raw ADMIXTURE output. A simple example:

import matplotlib.pyplot as plt
from geneview import load_dataset
from geneview import admixtureplot

f, ax = plt.subplots(1, 1, figsize=(14, 2), facecolor="w", constrained_layout=True, dpi=300)
admixtureplot(data=load_dataset("admixture_output.Q"), 
              population_info=load_dataset("admixture_population.info"),
              ylabel_kws={"rotation": 45, "ha": "right"},
              ax=ax)

admixtureplot

Or, with an explicit population order and additional styling:

import matplotlib.pyplot as plt
import geneview as gv

admixture_output_fn = gv.load_dataset("admixture_output.Q")
population_group_fn = gv.load_dataset("admixture_population.info")

# define the order for population to plot
pop_group_1kg = ["KHV", "CDX", "CHS", "CHB", "JPT", "BEB", "STU", "ITU", "GIH", "PJL", "FIN", 
                 "CEU", "GBR", "IBS", "TSI", "PEL", "PUR", "MXL", "CLM", "ASW", "ACB", "GWD", 
                 "MSL", "YRI", "ESN", "LWK"]

f, ax = plt.subplots(1, 1, figsize=(14, 2), facecolor="w", constrained_layout=True, dpi=300)
gv.admixtureplot(data=admixture_output_fn,
                 population_info=population_group_fn,
                 edgewidth=2.0,
                 group_order=pop_group_1kg,
                 shuffle_popsample_kws={"frac": 0.5},
                 ylabel_kws={"rotation": 45, "ha": "right"},
                 ax=ax)

admixtureplot
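
To illustrate what ordering by population means, the sketch below groups rows of a tiny Q matrix by population label. An ADMIXTURE .Q file has one row per individual and one column per ancestral cluster, with each row summing to 1. The per-sample label list and the helper order_samples are assumptions made for this example; they do not describe the actual layout of admixture_population.info or geneview's implementation.

```python
# Toy Q matrix (K = 3), invented for illustration: one row per individual.
Q_TEXT = """\
0.90 0.05 0.05
0.10 0.80 0.10
0.85 0.10 0.05
0.05 0.15 0.80
"""


def order_samples(q_rows, sample_pops, group_order):
    """Hypothetical helper: group Q-matrix rows by population, following group_order."""
    rank = {pop: i for i, pop in enumerate(group_order)}
    # sorted() is stable, so samples keep their input order within a group.
    indices = sorted(range(len(q_rows)), key=lambda i: rank[sample_pops[i]])
    return [(sample_pops[i], q_rows[i]) for i in indices]


q_rows = [[float(x) for x in line.split()] for line in Q_TEXT.splitlines()]
sample_pops = ["CHB", "CEU", "JPT", "YRI"]  # assumed: one label per row

ordered = order_samples(q_rows, sample_pops, ["CHB", "JPT", "CEU", "YRI"])
for pop, fractions in ordered:
    print(pop, fractions)
```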

Venn plots

Venn diagrams are supported for 2 to 6 sets.

Venn.png

Minimal venn plot example

import geneview as gv

table = {
    "Dataset 1": {"A", "B", "D", "E"},
    "Dataset 2": {"C", "F", "B", "G"},
    "Dataset 3": {"J", "C", "K"}
}
ax = gv.venn(table) 

venn.png

Manual adjustment of petal labels

If necessary, the labels on the petals (i.e., various intersections in the Venn diagram) can be adjusted manually.

For this, generate_petal_labels() can be called first to get the petal_labels dictionary, which can be modified.

After modification, pass petal_labels to venn().

from numpy.random import choice
import geneview as gv

dataset_dict = {
    name: set(choice(1000, 250, replace=False))
    for name in list("ABCD")
}

petal_labels = gv.generate_petal_labels(dataset_dict.values(), fmt="{logic}\n({percentage:.1f}%)") 
ax = gv.venn(data=petal_labels, names=list(dataset_dict.keys()), legend_use_petal_color=True)

venn4.png
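
The {logic} field in the format string above suggests that each petal is identified by its membership pattern. As a sketch of that idea (assuming the pattern is a binary string with one digit per dataset, which is not confirmed here), the following counts the elements in each exclusive region of the three-set example from the minimal Venn plot. petal_counts is an illustrative helper, not a geneview function.

```python
def petal_counts(datasets):
    """Count elements per membership pattern, e.g. '110' = in sets 1 and 2 only."""
    sets = list(datasets.values())
    universe = set().union(*sets)
    counts = {}
    for item in universe:
        logic = "".join("1" if item in s else "0" for s in sets)
        counts[logic] = counts.get(logic, 0) + 1
    return counts


table = {
    "Dataset 1": {"A", "B", "D", "E"},
    "Dataset 2": {"C", "F", "B", "G"},
    "Dataset 3": {"J", "C", "K"},
}

counts = petal_counts(table)
for logic in sorted(counts):
    print(logic, counts[logic])
```

Patterns with no elements, such as "111" in this example, simply do not appear in the result.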

Karyotype plot

Karyotype plots display cytogenetic bands with standard G-banding stain colors.

import matplotlib.pyplot as plt
import geneview as gv

k_fn = gv.load_dataset("karyotype_human_hg19.txt")
fig, ax = plt.subplots(figsize=(20, 5))
_ = gv.karyoplot(k_fn, ax=ax)
plt.show()

Dependencies

Geneview supports Python 3.7+ and requires the following packages:

Citation

If you use geneview in your research, please cite:

Huang, S. geneview: A python package for visualizing genomics data. https://github.com/ShujiaHuang/geneview

License

Released under the GPL-3.0 license.
