API-enabled Gene Annotation
Project description
annoPipeline - an API-enabled gene annotation pipeline
annoPipeline uses APIs from mygene.info and Entrez esummary to annotate a user-provided list of gene symbols.
It generates a pandas DataFrame with the gene symbol, gene name, Entrez ID, and bibliographic information for up to 5 PubMed publications for which a functional reference was provided (see GeneRIF for more about functional references).
Designed to be useful for tasks such as:
- identifying relevant publications for a given function
- analyzing publication trends for genes belonging to a common pathway
- identifying influential PIs for a given gene network
Requirements:
Written for use with Python 3.7; not tested on other versions.
annoPipeline requires:
- numpy >= 1.16.2
- pandas >= 0.24.2
- Biopython >= 1.73
- openpyxl >= 2.6.1
- requests >= 2.21.0
To Install:
pip install annoPipeline
Or clone the repo from GitHub. Then, in the annoPipeline directory, run:
python setup.py install
Missing dependencies are installed automatically; this may take a few seconds.
Example usage:
Execute the full annotation pipeline on a list of gene symbols like this:
import annoPipeline as ap
# define a list of genes you would like annotated
geneList = ['CDK2', 'FGFR1', 'SLC6A4']
# run the full annotation pipeline (the individual steps are described below)
df = ap.annoPipeline(geneList) # returns pandas df with annotations for gene and bibliographic info.
- By default, ap.annoPipeline saves the annotation output to an Excel file named by joining the geneList symbols with '_'.
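The naming rule above can be illustrated with a short sketch. The helper name `default_excel_name` is hypothetical, and the `.xlsx` extension is an assumption; annoPipeline's actual suffix may differ.

```python
def default_excel_name(gene_list, extension=".xlsx"):
    """Join gene symbols with '_' to build an output filename.

    Hypothetical helper; the '.xlsx' extension is an assumption.
    """
    return "_".join(gene_list) + extension


print(default_excel_name(['CDK2', 'FGFR1', 'SLC6A4']))  # CDK2_FGFR1_SLC6A4.xlsx
```

With pandas, a DataFrame could then be written with `df.to_excel(default_excel_name(geneList), index=False)`.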
Warning!
If querying a single gene, still pass as a list. For example:
import annoPipeline as ap
df = ap.annoPipeline(['CDK2']) # wrap even a single symbol in []; this will be fixed in a later version
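Until that fix lands, a small wrapper can normalize the input before calling the pipeline. This is a sketch: `as_gene_list` is a hypothetical helper and not part of annoPipeline v0.0.1. One plausible reason a bare string misbehaves (our assumption, not confirmed by the source) is that Python iterates a string character by character.

```python
def as_gene_list(genes):
    """Return genes as a list, wrapping a bare symbol string.

    Hypothetical helper, not part of annoPipeline v0.0.1.
    """
    if isinstance(genes, str):
        return [genes]
    return list(genes)


# A bare string would otherwise be iterated character by character:
print(list('CDK2'))                     # ['C', 'D', 'K', '2']
print(as_gene_list('CDK2'))             # ['CDK2']
print(as_gene_list(('CDK2', 'FGFR1')))  # ['CDK2', 'FGFR1']
```

Usage: `df = ap.annoPipeline(as_gene_list('CDK2'))`.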
v0.0.1 Functionality
Task 1:
- From the MyGeneInfo API, use the "Gene query service" GET method to return details for a given list of human gene symbols.
- From the returned JSON, parse out the "name", "symbol", and "entrezgene" values and print them to the screen.
Use queryGenes():
import annoPipeline as ap
geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList) # returns a list of dicts keyed by the default mygene fields (symbol, name, taxid, entrezgene, ensemblgene)
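To show what Task 1 involves outside the package, here is a minimal sketch that calls the mygene.info query service directly with the standard library. The endpoint URL, the `q=symbol:...` syntax, and the `species`/`fields` parameters reflect our understanding of the public mygene.info v3 API and should be checked against its documentation. The function names are hypothetical and are not annoPipeline's internals.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed mygene.info v3 query endpoint.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_query_url(symbol, species="human", fields="symbol,name,entrezgene"):
    """Build a GET URL for the mygene.info gene query service."""
    params = {"q": f"symbol:{symbol}", "species": species, "fields": fields}
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def parse_hits(payload):
    """Keep only name, symbol, and entrezgene from each hit in a query response."""
    return [
        {key: hit.get(key) for key in ("name", "symbol", "entrezgene")}
        for hit in payload.get("hits", [])
    ]


def query_symbol(symbol):
    """Fetch and parse the hits for one symbol (requires network access)."""
    with urlopen(build_query_url(symbol)) as resp:
        return parse_hits(json.load(resp))
```

Printing the parsed values, as Task 1 asks, could then look like `for row in query_symbol('CDK2'): print(row)`. Keeping URL building and parsing separate from the network call makes both easy to check offline.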
Task 2:
- Using the appropriate identifier from the above result, send a query to the MyGeneInfo "Gene annotation services" method for each gene
- From the resulting JSON, collate up to 5 GeneRIF descriptions per gene
- Write the results to an Excel spreadsheet with columns: gene_symbol, gene_name, entrez_id, generifs
Use getAnno():
import annoPipeline as ap
geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1, saveExcel=True) # saveExcel defaults to False
- Returns a pandas DataFrame with the genes and up to 5 GeneRIFs each from mygene.info.
- saveExcel defaults to False; pass saveExcel=True to save the output to Excel.
- If True, the Excel file is named by joining the geneList symbols with '_'.
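The collation step in Task 2 can be sketched as follows. We assume the mygene.info gene annotation endpoint is `https://mygene.info/v3/gene/<id>` and that its `generif` field is a list of objects with `pubmed` and `text` keys (possibly a single unwrapped object when a gene has only one entry). These details, and the helper names, are assumptions rather than annoPipeline's documented behavior.

```python
from urllib.parse import quote

# Assumed mygene.info v3 gene annotation endpoint.
MYGENE_GENE_URL = "https://mygene.info/v3/gene/"


def build_gene_url(entrez_id, fields="symbol,name,entrezgene,generif"):
    """Build a GET URL for the mygene.info gene annotation service."""
    return f"{MYGENE_GENE_URL}{quote(str(entrez_id))}?fields={fields}"


def collate_generifs(gene_doc, limit=5):
    """Return one row with up to `limit` GeneRIF texts and their PubMed IDs."""
    generifs = gene_doc.get("generif", [])
    if isinstance(generifs, dict):  # assume a single GeneRIF may come back unwrapped
        generifs = [generifs]
    selected = generifs[:limit]
    return {
        "gene_symbol": gene_doc.get("symbol"),
        "gene_name": gene_doc.get("name"),
        "entrez_id": gene_doc.get("entrezgene"),
        "generifs": [rif.get("text") for rif in selected],
        # PubMed IDs are kept so bibliographic details can be added later (Task 3).
        "pubmed_ids": [rif.get("pubmed") for rif in selected],
    }
```

A list of such rows can be turned into a table with `pandas.DataFrame(rows)` and saved with `DataFrame.to_excel`, which can use openpyxl to write `.xlsx` files.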
Task 3:
- Use the PubMed IDs associated with the above GeneRIF content to extract additional bibliographic information.
Use addBibs():
import annoPipeline as ap
geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1)
l3 = ap.addBibs(l2) # returns the df from getAnno with bibliographic info added for each GeneRIF publication
- Currently returns the following bibliographic information when available:
- PubDate
- Source
- Title
- LastAuthor
- DOI
- PmcRefCount
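A minimal sketch of the Entrez esummary lookup behind Task 3, assuming Biopython's `Bio.Entrez` module (listed in the requirements) is used. The field names above match the keys that `Entrez.read` typically returns for PubMed esummary records, but whether addBibs works exactly this way is our assumption; `extract_bib` and `fetch_summaries` are hypothetical helpers.

```python
# Fields listed above; missing ones are filled with None.
BIB_FIELDS = ("PubDate", "Source", "Title", "LastAuthor", "DOI", "PmcRefCount")


def extract_bib(summary):
    """Keep the bibliographic fields of one esummary record, using None when absent."""
    return {field: summary.get(field) for field in BIB_FIELDS}


def fetch_summaries(pubmed_ids, email):
    """Look up PubMed document summaries via Entrez esummary (requires network)."""
    from Bio import Entrez  # imported here so extract_bib works without Biopython

    Entrez.email = email  # NCBI asks clients to identify themselves
    handle = Entrez.esummary(db="pubmed", id=",".join(str(i) for i in pubmed_ids))
    try:
        records = Entrez.read(handle)
    finally:
        handle.close()
    return [extract_bib(record) for record in records]
```

Batching all PubMed IDs for a gene into one comma-separated request keeps the number of E-utilities calls low, which matters because NCBI rate-limits clients.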
File details
Details for the file annoPipeline-0.0.1.tar.gz.
File metadata
- Download URL: annoPipeline-0.0.1.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | ed4d17e48eadb908f76c06839ab27670250e11686d1602b4820882cacc58ea00
MD5 | 93c9e719d5691b3bfa5c47dfe023f2c2
BLAKE2b-256 | fdab578fb6f6863f06729d45184ea985778d9848a503518f9c2356773aa91002
File details
Details for the file annoPipeline-0.0.1-py3-none-any.whl.
File metadata
- Download URL: annoPipeline-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | a60b89c80a1c1144c0587e1f3f1766ce8afaf432229f2fe30a276610c18516b1
MD5 | 636d79ba9b92281e959916825e7d4f4f
BLAKE2b-256 | 135a6045cf8ff00ba2299465ce3f171cddf70552025a2afef020c768b8ac6784