Implementation of GREAT in Python

greatpy

Installation

You need Python 3.8 or newer installed on your system. If you do not have Python installed, we recommend installing Miniconda <https://docs.conda.io/en/latest/miniconda.html>.

Options to install greatpy:

  1. Install the latest release of greatpy from PyPI <https://pypi.org/project/greatpy/>:
 pip install greatpy
  2. Install the latest development version:
 pip install git+https://github.com/theislab/greatpy.git@main

Notebooks

  • Create regulatory domains file (regdom)
  • Enrichment test (binomial/hypergeometric)
  • Plotting of results
  • Comparisons with GREAT

Getting started

Please refer to

What is greatpy:

greatpy is a bioinformatics method that associates custom genomic regions with Gene Ontology (GO) terms by weighting genomic neighborhoods. It is based on and inspired by GREAT (Genomic Regions Enrichment of Annotations Tool).

[Figure: overview of GREAT, taken from the GREAT article]

Usage:

1. Create regulatory domains from TSS

  • Build regulatory domains from two tab-separated files (.tsv or .bed format):
    1. Transcription start site (TSS) annotations file, with the columns: chromosome_number \t position \t strand \t gene_name.
    2. Chromosome sizes file, with the columns: chromosome_number \t chromosome_size.

See data for input files
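
As a quick illustration of the two input layouts described above, the following sketch writes a tiny TSS file and a chromosome sizes file with the standard library. The gene names, coordinates, and file names are hypothetical, and writing the files without a header line is an assumption; check the bundled data files for the exact convention.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical example rows: chromosome_number, position, strand, gene_name
TSS_ROWS = [
    ("chr1", 100000, "+", "GENE_A"),
    ("chr1", 500000, "-", "GENE_B"),
]
# Hypothetical example rows: chromosome_number, chromosome_size
CHR_SIZE_ROWS = [("chr1", 3000000)]


def write_tsv(path, rows):
    """Write rows as tab-separated values (no header line, by assumption)."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def read_tsv(path):
    """Read a tab-separated file back as a list of string lists."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


workdir = Path(tempfile.mkdtemp())
tss_path = workdir / "tss.bed"
chr_size_path = workdir / "chr_size.bed"
write_tsv(tss_path, TSS_ROWS)
write_tsv(chr_size_path, CHR_SIZE_ROWS)
```

The resulting paths can then be passed as the tss_file and chr_sizes_file arguments.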

import greatpy

regdom = greatpy.tl.create_regdom(
    tss_file=Input_TSS_path,  # eg : "../data/human/hg38/tss.bed"
    chr_sizes_file=Input_chromosome_size_path,  # eg : "../data/human/hg38/chr_size.bed"
    association_rule="Basalplusextention",
    out_path=path_save_output,
)

Allowed association rules are:

  • Basalplusextention
  • OneCloset
  • TwoCloset
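
To make the basal-plus-extension rule more concrete, here is a minimal sketch for a single chromosome. It follows the rule as described in the GREAT article: a basal domain around each TSS, extended in both directions up to the nearest gene's basal domain or a maximum distance. The defaults of 5 kb upstream, 1 kb downstream, and 1 Mb maximum extension are those of GREAT. The function name and data layout are illustrative assumptions; this is not greatpy's implementation.

```python
def basal_plus_extension(genes, chr_size, upstream=5_000, downstream=1_000,
                         max_extension=1_000_000):
    """Return {gene_name: (start, end)} for genes on one chromosome.

    genes: list of (name, tss, strand) tuples, strand being "+" or "-".
    Illustrative sketch only, not greatpy's own code.
    """
    genes = sorted(genes, key=lambda g: g[1])

    # Basal domain: upstream/downstream are relative to the gene's strand.
    basal = []
    for _, tss, strand in genes:
        if strand == "+":
            lo, hi = tss - upstream, tss + downstream
        else:
            lo, hi = tss - downstream, tss + upstream
        basal.append((max(lo, 0), min(hi, chr_size)))

    domains = {}
    for i, (name, tss, _) in enumerate(genes):
        lo, hi = basal[i]
        # Nearest basal domain boundaries on each side (chromosome ends otherwise).
        left_limit = max((b[1] for b in basal[:i]), default=0)
        right_limit = min((b[0] for b in basal[i + 1:]), default=chr_size)
        # Extend, but never shrink below the gene's own basal domain.
        start = min(lo, max(tss - max_extension, left_limit))
        end = max(hi, min(tss + max_extension, right_limit))
        domains[name] = (start, end)
    return domains


# Hypothetical genes on a 3 Mb chromosome
genes = [("GENE_A", 100_000, "+"), ("GENE_B", 500_000, "-"), ("GENE_C", 2_500_000, "+")]
domains = basal_plus_extension(genes, chr_size=3_000_000)
```

In this example GENE_B is flanked by a distant neighbor on the right, so its domain stops at the 1 Mb extension limit rather than at GENE_C's basal domain.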

2. Get enrichment of GO terms in the test genomic regions

  • This step calculates the significance of a custom set of genomic annotations through peak-gene mapping, using distal cis-regulatory regions of the genome.
  • Input files:
    • The test file should have the following columns: chr \t chr_start \t chr_end.
    • The regulatory domain file should have the following columns: chr \t chr_start \t chr_end \t name \t tss \t strand.
    • The chromosome size file should have the following columns: chromosome_number \t chromosome_size.
    • The annotation file should have the following columns: ensembl \t id \t name \t ontology.group \t gene.name \t symbol.

See test cases for genomic input files.
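
The peak-gene mapping mentioned above can be pictured with a small sketch: each test region is reduced to its midpoint and associated with every gene whose regulatory domain contains that point. Treating a region as its midpoint follows the GREAT description; the function name, tuple layout, and half-open interval convention are assumptions made for illustration, not greatpy's API.

```python
from typing import List, Tuple

Domain = Tuple[str, int, int, str]  # chr, chr_start, chr_end, gene name
Region = Tuple[str, int, int]       # chr, chr_start, chr_end


def genes_for_region(region: Region, regdom: List[Domain]) -> List[str]:
    """Return the genes whose regulatory domain contains the region midpoint."""
    chrom, start, end = region
    midpoint = (start + end) // 2
    return [
        name
        for d_chrom, d_start, d_end, name in regdom
        if d_chrom == chrom and d_start <= midpoint < d_end
    ]


# Hypothetical regulatory domains (overlapping domains are allowed)
regdom = [
    ("chr1", 0, 499_000, "GENE_A"),
    ("chr1", 101_000, 1_500_000, "GENE_B"),
    ("chr2", 0, 800_000, "GENE_C"),
]
hits = genes_for_region(("chr1", 200_000, 200_400), regdom)
```

A region can map to several genes when domains overlap, and to none when it falls outside every domain.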

res = greatpy.tl.enrichment(
    test_file=Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_file=regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    chr_size_file=chromosome_size_path_or_df,  # eg : "../data/human/hg38/chr_size.bed"
    annotation_file=annotation_path_or_df,  # eg : "../data/human/ontologies.csv"
)

greatpy.tl.enrichment supports the following tests:

  • binom (default True): calculates the binomial p-value.
  • hypergeom (default True): calculates the hypergeometric p-value.
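
For intuition, the two p-values can be written out with the standard library. This sketch assumes the GREAT formulation: the binomial test asks how likely it is that at least k of n test regions fall in the fraction p of the genome covered by a term's regulatory domains, and the hypergeometric test asks how likely it is that at least k of the genes hit by the test regions carry the term. Function and parameter names are illustrative, not greatpy's.

```python
from math import comb


def binomial_p_value(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    k: test regions falling in the term's regulatory domains
    n: total number of test regions
    p: fraction of the genome covered by the term's regulatory domains
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeom_p_value(k: int, total_genes: int, annotated_genes: int,
                      hit_genes: int) -> float:
    """P(X >= k) when drawing hit_genes out of total_genes without replacement.

    k: hit genes that are annotated with the term
    annotated_genes: genes annotated with the term in the whole genome
    """
    upper = min(annotated_genes, hit_genes)
    favorable = sum(
        comb(annotated_genes, i) * comb(total_genes - annotated_genes, hit_genes - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(total_genes, hit_genes)


# Hypothetical counts: 5 of 100 regions hit a term covering 1% of the genome
p_binom = binomial_p_value(k=5, n=100, p=0.01)
# Hypothetical counts: 4 of 50 hit genes carry a term annotating 200 of 20000 genes
p_hyper = hypergeom_p_value(k=4, total_genes=20_000, annotated_genes=200, hit_genes=50)
```

Direct summation is fine for small counts; for large inputs a statistics library with survival functions is the more robust choice.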

You can also apply a Bonferroni and/or FDR correction to the resulting p-values:

res = greatpy.tl.set_fdr(res, alpha=0.05)
res = greatpy.tl.set_bonferroni(res, alpha=0.05)
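
The sketch below shows what the two corrections do to a list of p-values. Bonferroni multiplies each p-value by the number of tests. For the FDR it uses the Benjamini-Hochberg procedure, a common choice; whether greatpy applies exactly this procedure is an assumption. The function names are illustrative.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.5]  # hypothetical p-values
bonf = bonferroni(p_values)
fdr = benjamini_hochberg(p_values)
```

Terms whose adjusted value stays below the chosen alpha (0.05 in the calls above) are the ones retained as significant.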

3. Plot

1 Distribution of custom genomic annotations in regulatory domains
  • Number of gene associations per genomic region.
  • Distance to the associated gene TSS for each genomic region studied.
  • Absolute distance to the associated gene TSS for each genomic region studied.
import matplotlib.pyplot as plt

# One panel per plot: associations per region, signed distance, absolute distance
fig, ax = plt.subplots(1, 3, figsize=(30, 8))
greatpy.pl.graph_nb_asso_per_peaks(
    Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    ax[0],
)
greatpy.pl.graph_dist_tss(
    Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    ax[1],
)
greatpy.pl.graph_absolute_dist_tss(
    Input_path_or_df,  # eg : "../data/tests/test_data/input/10_MAX.bed"
    regdom_path_or_df,  # eg : "../data/human/hg38/regdom.bed"
    ax[2],
)
plt.show()
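
The quantities behind the second and third panels can be sketched as follows, assuming the distance is measured from the region midpoint to the TSS and oriented by the gene's strand so that negative values mean upstream. This convention and the function name are assumptions for illustration; greatpy's plotting functions may compute the distance differently.

```python
def distance_to_tss(start: int, end: int, tss: int, strand: str) -> int:
    """Signed distance from the region midpoint to the TSS.

    Negative values are upstream of the TSS, positive values downstream,
    relative to the direction of transcription.
    """
    midpoint = (start + end) // 2
    offset = midpoint - tss
    return offset if strand == "+" else -offset


# Hypothetical region located 850 bp before a TSS at position 1000
signed_plus = distance_to_tss(100, 200, 1000, "+")
signed_minus = distance_to_tss(100, 200, 1000, "-")
absolute = abs(signed_plus)
```

The absolute distance simply drops the sign, which is why the third panel does not distinguish upstream from downstream regions.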

2 Enrichments by GO terms (dotplot) - one input
# enrichment_df: assumed to be the DataFrame returned by greatpy.tl.enrichment (res in step 2)
plot = enrichment_df.rename(columns={"binom_p_value": "p_value", "go_term": "name"})
plt.figure(figsize=(10, 10))
greatpy.pl.plot_enrich(plot)

3 Enrichments by GO terms (dotplot) - multiple inputs
test = ["name_bindome_biosample_1", "name_bindome_biosample_2", "..."]
tmp_df = greatpy.tl.enrichment_multiple(
    tests=test,
    regdom_file="../data/human/hg38/regulatory_domain.bed",
    chr_size_file="../data/human/hg38/chr_size.bed",
    annotation_file="../data/human/ontologies.csv",
    binom=True,
    hypergeom=True,
)

[Figure: dotplot of enrichments across multiple samples]

Notes

Both the binomial and the hypergeometric test are susceptible to biases, which you should keep in mind when interpreting the results. The binomial test reduces the bias of the hypergeometric test by explicitly accounting for the size of each gene's regulatory domain, whereas the hypergeometric test compensates for the bias of the binomial test by counting each gene only once. The two tests are complementary, and we recommend analyzing their results together.

Release notes

See the changelog.

Contact

For questions and help requests, you can reach out in the scverse discourse. If you found a bug, please use the issue tracker.

Citation

If greatpy is useful for your research, please consider citing it as:

@software{greatpy,
author = {Ibarra, Mauger-Birocheau},
doi = {},
month = {},
title = {{greatpy}},
url = {https://github.com/theislab/greatpy},
year = {2022}
}

References

@article{GREAT,
author   = {McLean, C.
            and Bristor, D.
            and Hiller, M.
            and others},
title    = {GREAT improves functional interpretation of cis-regulatory regions},
journal  = {Nat Biotechnol},
year     = {2010},
month    = {May},
day      = {02},
volume   = {28},
number   = {5},
pages    = {495--501},
doi      = {10.1038/nbt.1630},
url      = {https://doi.org/10.1038/nbt.1630}
}
@Manual{rGREAT,
title = {rGREAT: GREAT Analysis - Functional Enrichment on Genomic Regions},
author = {Zuguang Gu},
year = {2022},
note = {https://github.com/jokergoo/rGREAT, http://great.stanford.edu/public/html/},
}
