
Library for gene set enrichment analysis (ontology analysis) and visualisation


The library overview

The dgeontology library provides a simple way of: (1) performing GSEA (gene set enrichment analysis/ontology analysis) on DGE (differential gene expression) or similar results; (2) visualisation of integrated GSEA and DGE results in one highly informative and visually appealing circular chart. Sizes of the final circular chart slices correspond to the number of results linked to a given ontology label (results falling into a given ontology group/category). Subsequent and adjacent radial fragments of the slices are coloured according to each DGE result fold change value (red for up-regulation and blue for down-regulation). Since the fold change values are first sorted, each slice of the final circular chart becomes a heatmap of fold change scale across a given ontology label (group).
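
The geometry described above can be illustrated with a small sketch. It does not use dgeontology itself; the toy DataFrame and the column names label and log2fc are hypothetical, and the snippet only shows how slice sizes and within-slice ordering could be derived from such data.

```python
import pandas as pd

# Hypothetical toy data: one row per significant result, with an
# ontology label and a log2 fold change (names are illustrative only).
toy_df = pd.DataFrame(
    {
        'label': ['Metabolism', 'Metabolism', 'Metabolism', 'Transport'],
        'log2fc': [2.0, -1.5, 0.5, -3.0],
    }
)

# Slice size: the angular span is proportional to the number of
# results linked to each ontology label.
counts = toy_df['label'].value_counts()
angles = 360.0 * counts / counts.sum()

# Slice content: fold changes are sorted within each label, so adjacent
# radial fragments form a gradient (a heatmap) across the slice.
sorted_folds = {
    label: sorted(group['log2fc'].tolist())
    for label, group in toy_df.groupby('label')
}

print(angles.to_dict())
print(sorted_folds)
```

In this toy case three of the four results belong to one label, so that label would occupy three quarters of the circle.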

Library installation using pip

Installation of the dgeontology library with pip is quite straightforward:

pip install dgeontology

A quick example of the library usage

The input data used in this example can be found in the input subdirectory in the root directory of this repository. Once you have the dgeontology library installed, you can test its functionality using the aforementioned input data and the code below.

# Import Pandas library for the input data handling and
# the dgeont_plot() function from the dgeontology library
# for rendering DGE/GSEA pie charts.
import pandas as pd
from dgeontology import dgeont_plot

# Load DGE results for a complete population,
# i.e. even for those entities (genes, proteins)
# that could be present in the sample (bacterial transcriptome,
# proteome, etc.) but were not (e.g. non-transcribed genes).
# In this example two groups were compared:
# wt51e_lg - wild type, mt51e_lg - mutant.
# Importantly, the column that contains IDs of analysed
# entities is set to be the DataFrame index.
dge_df = pd.read_csv(
    'input/mt51e_wt51e_DGE.tsv',
    index_col = 'locus_tag',
    sep = '\t'
)

# Load metadata for all entities (genes, transcripts, proteins, etc.).
# Importantly, the column that contains IDs of analysed
# entities is set to be the DataFrame index.
meta_df = pd.read_csv(
    'input/rn.tsv',
    index_col = 'locus_tag',
    sep = '\t'
)

# Run the dgeont_plot() function providing all 7 required arguments
# (2 positional and 5 keyword) and additionally modifying values of
# 3 optional ones in order to center the pie chart within the figure better.
fin_df, filt_df, ont_df, fig, ax, ax_bar = dgeont_plot(
    dge_df, meta_df, fold_col='log2FoldChange', pval_col='padj', onts_col='cog',
    fold_th=1.0, fdr_th=0.05, fig_h=2.7, xmin=-1.75, xmax=2.15
)

# Save the rendered pie chart to a PNG file with a non-transparent
# background and the resolution of 300 DPI.
fig.savefig('test_plot.png', transparent=False, dpi=300)

More examples in Jupyter notebooks

The above and more examples of dgeontology library usage are presented and described in detail in the dgeontology_basic_examples.ipynb and dgeontology_extra_examples.ipynb Jupyter notebooks, which are available in this repository.

The dgeont_plot() function in detail

The dgeont_plot() function takes DGE or similar results and, based on the provided metadata (ontology labels), performs GSEA and renders a rich circular chart that depicts the results of the analysis. The function requires 7 obligatory arguments (2 positional and 5 keyword arguments). The default values of the other 22 optional keyword arguments can be modified in order to fine-tune the final chart.

Required positional arguments:

  • dge_df – Pandas DataFrame containing DGE results. The DataFrame must be indexed with analysed entities IDs (e.g. transcript IDs).
  • meta_df – Pandas DataFrame linking entities IDs to ontology labels. The DataFrame must also be indexed with analysed entities IDs.

Required keyword arguments:

  • fold_col – the name of the column in dge_df that contains fold change values, a string value.
  • pval_col – the name of the column in dge_df that contains FDR values, a string value.
  • onts_col – the name of the column in meta_df that contains ontology labels, a string value.
  • fold_th – a minimal threshold value for fold_col (fold change) absolute values used for filtering the results in dge_df, a float value.
  • fdr_th – a maximal threshold value for pval_col (FDR) used for filtering the results in dge_df, a float value.
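
The filtering controlled by fold_th and fdr_th can be pictured with plain pandas. This is a minimal sketch and not the library's code: the data are made up, and whether dgeontology treats the thresholds as inclusive or strict is an assumption here.

```python
import pandas as pd

# Hypothetical DGE table indexed by entity ID (values are made up).
dge_df = pd.DataFrame(
    {
        'log2FoldChange': [2.3, -0.4, -1.8, 1.2],
        'padj': [0.001, 0.01, 0.03, 0.20],
    },
    index=pd.Index(['g1', 'g2', 'g3', 'g4'], name='locus_tag'),
)

fold_col, pval_col = 'log2FoldChange', 'padj'
fold_th, fdr_th = 1.0, 0.05

# Assumption: both thresholds are inclusive; the library may use
# strict inequalities instead.
keep = (dge_df[fold_col].abs() >= fold_th) & (dge_df[pval_col] <= fdr_th)
filt_df = dge_df[keep]

print(filt_df.index.tolist())
```

Note that the fold change threshold applies to absolute values, so strongly down-regulated entities pass the filter as well.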

Optional keyword arguments that allow using additional ontology data:

  • type_col – the name of the column in meta_df that describes the sequence type, a string value. The column is used solely with respect to ncRNA and tRNA values. Importantly, when type_col is not None, any other ontology labels, if provided in the remaining columns, are ignored for rows described as ncRNA and tRNA. Default value: None (do not use ncRNA and tRNA sequence types as ontology labels).
  • bont_col – the name of the column that contains additional ontology data, a string value. The values of the column are treated as binary (false for empty/NA values, true for any non-empty value), and rows with a true value are assigned bont_label. If bont_col is not None, bont_label is merged with the labels provided in onts_col. Default value: None (no binary ontology column is provided).
  • bont_label – if bont_col is not None, bont_label must be set to a string value that will be treated as an extra ontology label for any row that is non-empty with respect to bont_col. Default value: None (no binary ontology column is provided).
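
A rough sketch of the binary ontology idea follows. The column names (cog, virulence), the label text and the way labels are combined per entity are all hypothetical; the library's internal handling of bont_col may differ.

```python
import pandas as pd

# Hypothetical metadata: 'cog' plays the role of onts_col and
# 'virulence' the role of bont_col.
meta_df = pd.DataFrame(
    {
        'cog': ['Transport', None, 'Metabolism'],
        'virulence': ['yes', 'T3SS effector', None],
    },
    index=pd.Index(['g1', 'g2', 'g3'], name='locus_tag'),
)

bont_col = 'virulence'
bont_label = 'Virulence factors'

# Any non-missing, non-empty value counts as True, whatever its text.
is_member = meta_df[bont_col].fillna('').astype(str).str.strip() != ''

# Assumption: the extra label is simply appended to the regular one.
labels = {}
for entity in meta_df.index:
    entity_labels = []
    if pd.notna(meta_df.at[entity, 'cog']):
        entity_labels.append(meta_df.at[entity, 'cog'])
    if is_member[entity]:
        entity_labels.append(bont_label)
    labels[entity] = entity_labels

print(labels)
```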

Optional keyword arguments that allow modifying the set of ontology labels being used:

  • sel_onts – ontology labels that are to be depicted in the final pie chart, a list of string values. Default value: None (depict all ontology labels ordered in a descending order with respect to the number of results linked to them).
  • skip_onts – ontology labels that are not to be depicted in the final pie chart, a list of string values. Default value: None (do not skip any ontology label).
  • min_size – the minimal count of results (rows from the filtered dge_df) assigned to an ontology label that is required for the label to be depicted in the final pie chart, an integer value. Default value: 0 (depict all ontology labels).
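
How the three options above might interact can be sketched with a small helper. The function select_labels() is hypothetical and not part of dgeontology; in particular, the order in which the options are applied and the inclusive min_size comparison are assumptions.

```python
import pandas as pd

# Hypothetical counts of filtered results per ontology label.
counts = pd.Series(
    {'Transport': 12, 'Metabolism': 30, 'Motility': 2, 'Unknown': 7}
)

def select_labels(counts, sel_onts=None, skip_onts=None, min_size=0):
    """Return the labels to depict, following the documented options."""
    if sel_onts is not None:
        # Keep the caller's order, ignoring labels with no results.
        chosen = [lab for lab in sel_onts if lab in counts.index]
    else:
        # Default: all labels, largest group first.
        chosen = counts.sort_values(ascending=False).index.tolist()
    skip = set(skip_onts or [])
    return [
        lab for lab in chosen
        if lab not in skip and counts[lab] >= min_size
    ]

print(select_labels(counts, skip_onts=['Unknown'], min_size=5))
print(select_labels(counts, sel_onts=['Motility', 'Transport']))
```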

Optional keyword arguments that allow modifying the formatting of the final pie chart figure and axes:

  • fig_w – Matplotlib Figure width in inches, a float value. Default value: 10.0.
  • fig_h – Matplotlib Figure height in inches, a float value. Default value: 3.0.
  • dpi – Matplotlib Figure resolution in DPI (dots per inch), a float value. Default value: 150.0.
  • xmin – the lower limit value for the X axis, a float value. Default value: -2.5.
  • xmax – the upper limit value for the X axis, a float value. Default value: 2.5.

Optional keyword arguments that allow modifying the formatting of the final pie chart elements:

  • pie_r – the radius of the pie chart scaffold circle, a float value. Default value: 0.30.
  • scale – general scale factor, a float value. Change to increase or decrease the relative wedge radial sizes, especially if inner parts pass through the middle of the chart. Default value: 0.03.
  • angle_offset – the angle offset for placing wedges on the scaffold circle in degrees (0.0 - 360.0), a float value. By default the third wedge/slice starts at 12 o'clock (top center), which seems to be optimal for size-ordered slices. Default value: 0.0.
  • margin – the margin between each wedge and the number of cases, as well as between that number and the terminal part of the connector that joins a wedge and a label, a float value. Default value: 0.03.
  • label_at – the X coordinate at which ontology (wedge) labels are left-aligned on the right side of the pie chart, or -X at which ontology (wedge) labels are right-aligned on the left side of the pie chart, a float value. Change it to bring labels closer or move further from the pie chart. Default value: 0.70.
  • label_height – the vertical span a label is assumed to occupy, a float value. Change to increase or decrease the vertical spacing between adjacent labels, especially in case of overlapping labels. Default value: 0.08.
  • label_font – the font size for ontology labels, a float value. Default value: 8.5.
  • num_font – the font size for numbers of results, a float value. Default value: 6.0.
  • scale_bar_label – a label that appears above the scale bar, a string value. Default value: 'Log$_{2}$ fold change'.
  • sbar_font – the scale bar font size, a float value. Default value: 8.0.
  • max_fold – the maximum fold change value for the fold scale, a float value. If None, it is set automatically to the highest absolute fold value. Set it manually if you want to generate charts that depict results in a fixed scale. Default value: None (automatic scale).
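
The difference between the automatic and a fixed fold scale can be shown with a short sketch. The helper fold_to_unit() is hypothetical; it only illustrates the documented rule that an unset max_fold falls back to the highest absolute fold value. Clipping of values beyond a manually set max_fold is an assumption.

```python
import pandas as pd

def fold_to_unit(folds, max_fold=None):
    """Map fold changes to [-1, 1] for a symmetric red/blue colour scale."""
    if max_fold is None:
        # Automatic scale: the highest absolute fold value.
        max_fold = folds.abs().max()
    # Assumption: values beyond a fixed max_fold saturate at the ends.
    return (folds / max_fold).clip(-1.0, 1.0)

folds = pd.Series([-4.0, -1.0, 2.0])

auto = fold_to_unit(folds)         # scaled by 4.0, the largest |fold|
fixed = fold_to_unit(folds, 8.0)   # same data on a fixed, wider scale

print(auto.tolist())
print(fixed.tolist())
```

With a fixed scale, the same fold change maps to the same colour intensity in every chart, which is what makes charts from different comparisons directly comparable.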

The function returns a tuple of six elements:

  • fin_df – dge_df merged with meta_df on the index column, a Pandas DataFrame.
  • filt_df – fin_df filtered with respect to fold_th and fdr_th that is used for GSEA, a Pandas DataFrame.
  • ont_df – GSEA results for all ontology labels, a Pandas DataFrame.
  • fig – Matplotlib Figure with the final pie chart.
  • ax – Matplotlib Axes with the final pie chart.
  • ax_bar – Matplotlib Axes with the scale bar.
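
The first returned element, fin_df, is described as dge_df merged with meta_df on the index. A minimal pandas sketch of such a merge is given below; the data are made up, and the join type (a left join that keeps only entities present in dge_df) is an assumption rather than documented behaviour.

```python
import pandas as pd

# Hypothetical inputs, both indexed by the same entity IDs.
dge_df = pd.DataFrame(
    {'log2FoldChange': [2.3, -1.8], 'padj': [0.001, 0.03]},
    index=pd.Index(['g1', 'g2'], name='locus_tag'),
)
meta_df = pd.DataFrame(
    {'cog': ['Transport', 'Metabolism', 'Motility']},
    index=pd.Index(['g1', 'g2', 'g3'], name='locus_tag'),
)

# Assumption: a left join on the shared index.
fin_df = dge_df.join(meta_df, how='left')

print(fin_df)
```

This is also why both input DataFrames must be indexed with the entity IDs: the index is the only key that links a DGE result to its ontology label.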

