goenrichment

GO enrichment analysis from a list of gene names using a precomputed database

GO Enrichment package

This package runs GO enrichment analysis from a list of gene names using a precomputed database. The GO terms are analyzed using a hypergeometric test.
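As an illustration of the hypergeometric test mentioned above, here is a minimal sketch of an over-representation p-value for a single GO term. The function name and its parameters are hypothetical and are not taken from the package; the package itself may compute the test differently (for example with scipy, where `scipy.stats.hypergeom.sf(hits - 1, population, annotated, sample)` gives the same tail probability).

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric variable X.

    population: number of genes in the background database
    annotated:  background genes annotated with the GO term
    sample:     number of genes in the query list
    hits:       query genes annotated with the GO term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total
```

For example, with 10 background genes of which 4 carry the term, a query of 3 genes with 2 hits gives (36 + 4) / 120 = 1/3.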

GO enrichment database

The GO graph structure is created from the Gene Ontology OBO file http://current.geneontology.org/ontology/go.obo
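To make the idea of a GO graph built from the OBO file concrete, the following sketch reads `[Term]` stanzas and records `is_a` parents using only the standard library. It is an assumption-laden illustration: the function names are hypothetical, only `is_a` edges are followed, and depth is taken as the shortest path to a root, which may differ from what the package does internally.

```python
def parse_obo_is_a(lines):
    """Map each term id found in [Term] stanzas to the set of its is_a parents."""
    parents = {}
    term_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[4:]
            parents.setdefault(term_id, set())
        elif in_term and term_id and line.startswith("is_a: "):
            # "is_a: GO:0008150 ! biological_process" -> "GO:0008150"
            parents[term_id].add(line[6:].split("!")[0].strip())
    return parents


def depth(term, parents):
    """Length of the shortest is_a path from `term` up to a root (assumed definition)."""
    if not parents.get(term):
        return 0
    return 1 + min(depth(p, parents) for p in parents[term])


# Usage with a local copy of the file:
# with open("go.obo") as handle:
#     parents = parse_obo_is_a(handle)
```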

NCBI gene

The NCBI gene database is used to add genes to the GO term graph. The required files are: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz and ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz

These files can be filtered for a specific taxonomy ID. This example is for human (9606):

taxid=9606
gunzip -c gene_info.gz | grep -P "^${taxid}\t" > gene_info_${taxid}
gzip gene_info_${taxid}
gunzip -c gene2go.gz | grep -P "^${taxid}\t" > gene2go_${taxid}
gzip gene2go_${taxid}
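Where `grep -P` is not available, the same filtering can be sketched in Python. This is a hypothetical helper, not part of the package; it assumes, as in the shell commands above, that the taxonomy ID is the first tab-separated column of each line.

```python
import gzip


def filter_by_taxid(src_path, dst_path, taxid):
    """Copy lines whose first column equals `taxid`; return how many were kept."""
    prefix = f"{taxid}\t"
    kept = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            # Matching on "taxid<tab>" avoids keeping e.g. 96060 when asking for 9606.
            if line.startswith(prefix):
                dst.write(line)
                kept += 1
    return kept


# Usage (file names follow the shell example; the output is already gzipped):
# filter_by_taxid("gene_info.gz", "gene_info_9606.gz", 9606)
# filter_by_taxid("gene2go.gz", "gene2go_9606.gz", 9606)
```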

Uniprot GOA

The Uniprot GOA files can also be used to add more genes to the GO graph. The complete file is: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz

Uniprot GOA also provides files pre-filtered by organism: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
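For readers unfamiliar with the GAF layout, the sketch below extracts gene symbol and GO term pairs from GAF lines. It relies on the GAF 2.x column order (column 3 is the gene symbol, column 5 the GO ID, column 13 the taxon, with `!` marking comment lines). The function name is hypothetical, qualifiers such as NOT are ignored here, and nothing in it reflects how the package itself parses the file.

```python
def gaf_gene_go_pairs(lines, taxid=None):
    """Yield (gene symbol, GO id) pairs from GAF-formatted lines."""
    for line in lines:
        if line.startswith("!"):
            continue  # header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # malformed or truncated line
        symbol, go_id, taxon = cols[2], cols[4], cols[12]
        # Column 13 looks like "taxon:9606", optionally "taxon:A|taxon:B".
        if taxid is not None and f"taxon:{taxid}" not in taxon.split("|"):
            continue
        yield symbol, go_id


# Usage on the gzipped download:
# import gzip
# with gzip.open("goa_uniprot_all.gaf.gz", "rt") as handle:
#     pairs = set(gaf_gene_go_pairs(handle, taxid=9606))
```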

TSV file: gene<tab>GO term

Any TSV file that maps gene names to GO terms can also be included in the database. The file only needs the gene name in the first column and the GO term in the second column. Any extra columns are ignored.
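The two-column layout described above can be checked before building a database with a small reader like the one below. This is only a sketch with a hypothetical function name; it keeps the first two columns, ignores the rest, and skips lines that lack either value.

```python
from collections import defaultdict


def read_gene_go_tsv(lines):
    """Return {gene name: set of GO terms} from gene<tab>GO term lines."""
    gene_to_go = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip lines without both a gene name and a GO term.
        if len(cols) < 2 or not cols[0] or not cols[1]:
            continue
        gene_to_go[cols[0]].add(cols[1])
    return dict(gene_to_go)
```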

Ensembl BioMart

Ensembl data can also be included using the BioMart tool on the Ensembl BioMart website: http://useast.ensembl.org/biomart/. With this tool, a TSV file can be generated with gene names in the first column and GO terms in the second column.

Database creation

This example is for human. Please note that all input files should be gzipped.

goenrich_createdb --gene_info gene_info.gz --gene2go gene2go.gz --goa_uniprot goa_uniprot_all.gaf.gz --gobo go.obo --taxid 9606 -o goenrichDB_20190419.pickle

Usage

usage: goenrich_createdb [-h] [--gene_info GENE_INFO] [--gene2go GENE2GO]
                     [--tsv TSV] [--goenrichDB GOENRICHDB]
                     [--goa_uniprot GOA_UNIPROT] [--gobo GOBO] [--taxid TAXID]
                     -o O

Creates pickle data structure used by "goenrich.py"

optional arguments:
    -h, --help            show this help message and exit
    --gene_info GENE_INFO
                        NCBI gene_info file
    --gene2go GENE2GO     NCBI gene2go file
    --tsv TSV             TSV file with at least two columns: Gene_name<tab>GO
                        terms
    --goenrichDB GOENRICHDB
                        Previous created goenrich pickle file. The new genes
                        will be added to this database
    --goa_uniprot GOA_UNIPROT
                        Uniprot GOA file GAF format
    --gobo GOBO           UGO Obo file from Gene Ontology
    --taxid TAXID         Process genes for tax id if it is possible
    -o O                  Pickle output file name

Pre-computed databases

Some pre-computed databases are available at https://ftp.ncbi.nlm.nih.gov/pub/goenrichment/

GO enrichment analysis

The analysis is run with the goenrich command (the goenrich.py script). The input is a text file with one gene name per line.

goenrich --goenrichDB gene2GO_human.pickle -i query.tsv -o goenrich.tsv

The human database (gene2GO_human.pickle in the example above) can be downloaded from https://ftp.ncbi.nlm.nih.gov/pub/goenrichment/goenrichDB_human.pickle
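The input file for the command above can be produced from any collection of gene names. The helper below is a hypothetical convenience, not part of the package; it writes one name per line and drops blanks and duplicates, on the assumption that repeated names add nothing to the query.

```python
def write_gene_list(genes, path):
    """Write unique, non-empty gene names to `path`, one per line; return the count."""
    # dict.fromkeys keeps the first occurrence of each name in input order.
    unique = list(dict.fromkeys(g.strip() for g in genes if g.strip()))
    with open(path, "w") as handle:
        handle.writelines(f"{gene}\n" for gene in unique)
    return len(unique)


# Usage (the file name matches the command-line example):
# write_gene_list(["TP53", "BRCA1", "TP53"], "query.tsv")
```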

usage: goenrich [-h] -i I -o O [--goenrichDB GOENRICHDB]
                   [--min_category_depth MIN_CATEGORY_DEPTH]
                   [--min_category_size MIN_CATEGORY_SIZE]
                   [--max_category_size MAX_CATEGORY_SIZE] [--alpha ALPHA]

Calculate GO enrichment from a list of genes. Default database organism: human

optional arguments:
    -h, --help            show this help message and exit
    -i I                  Input list of gene names
    -o O                  TSV file with all results
    --goenrichDB GOENRICHDB
                        Gene2GO pickle file created with "goenrichDB.py". If
                        not provided the database is loaded from:
    --min_category_depth MIN_CATEGORY_DEPTH
                        Min GO term graph depth to include in the report.
                        Default: 4
    --min_category_size MIN_CATEGORY_SIZE
                        Min number of gene in a GO term to include in the
                        report. Default: 3
    --max_category_size MAX_CATEGORY_SIZE
                        Max number of gene in a GO term to include in the
                        report. Default: 500
    --alpha ALPHA         Alpha value for p-value correction. Default: 0.05
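The help text does not say which p-value correction the alpha value applies to. As one common possibility, the sketch below implements the Benjamini-Hochberg adjustment in plain Python; this choice is an assumption, not a description of the package. With statsmodels installed, `statsmodels.stats.multitest.multipletests(pvalues, alpha=0.05, method="fdr_bh")` performs the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Flag the tests whose adjusted p-value is at or below `alpha`."""
    return [p <= alpha for p in benjamini_hochberg(pvalues)]
```

GO terms whose adjusted p-value stays at or below alpha would then be the ones reported as enriched.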

Requirements

Python 3.8
- numpy
- scipy
- statsmodels
- pandas
- networkx

Public Domain notice

National Center for Biotechnology Information.

This software is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the authors' official duties as United States Government employees and thus cannot be copyrighted. This software is freely available to the public for use. The National Library of Medicine and the U.S. Government have not placed any restriction on its use or reproduction.

Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the NLM and the U.S. Government do not and cannot warrant the performance or results that may be obtained by using this software or data. The NLM and the U.S. Government disclaim all warranties, express or implied, including warranties of performance, merchantability or fitness for any particular purpose.

Please cite NCBI in any work or product based on this material.

Release history

1.0.3 (this version): Nov 1, 2021
1.0.2: May 22, 2021
1.0.1: Mar 29, 2021
0.0.10a3 (pre-release): Aug 27, 2019
0.0.9a3 (pre-release): Jul 3, 2019
0.0.8a3 (pre-release): Jun 6, 2019
0.0.5a3 (pre-release): Apr 22, 2019
0.0.4a3 (pre-release): Apr 19, 2019
0.0.3a3 (pre-release): Apr 19, 2019
0.0.2a3 (pre-release): Apr 19, 2019

Download files

Source Distribution

goenrichment-1.0.3.tar.gz (10.6 kB)

Uploaded Nov 1, 2021 Source

Built Distribution

goenrichment-1.0.3-py3-none-any.whl (13.6 kB)

Uploaded Nov 1, 2021 Python 3

File details

Details for the file goenrichment-1.0.3.tar.gz.

File metadata

Download URL: goenrichment-1.0.3.tar.gz
Upload date: Nov 1, 2021
Size: 10.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.2.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for goenrichment-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`7de141c20914883b6f7716ef5c5c2e445b80175463ad074c8280207259165c9b`
MD5	`f30da2a0735593863d120a59852cfe03`
BLAKE2b-256	`61b682ed10ce4a571bca8a99cf603178571deb60cdd4639cd7e2b4e70aae024a`

File details

Details for the file goenrichment-1.0.3-py3-none-any.whl.

File metadata

Download URL: goenrichment-1.0.3-py3-none-any.whl
Upload date: Nov 1, 2021
Size: 13.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.2.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for goenrichment-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`90c097226f986f81861dd0c7a7eb34c7c03a1215fc7217458703c55bcede9876`
MD5	`839cf5b93afea95fe0475a020385fa5b`
BLAKE2b-256	`abaef72c3023e779a9cddcd2fb6267d729913bbac095d29eeb2049aa65310a9b`
