
API-enabled Gene Annotation

Project description

annoPipeline - an API-enabled gene annotation pipeline

annoPipeline uses the mygene.info and NCBI Entrez esummary APIs to annotate a user-provided list of gene symbols.

It generates a pandas DataFrame with the gene symbol, gene name, Entrez ID, and bibliographic information for up to 5 PubMed publications for which a functional reference was provided (see GeneRIF, Gene Reference Into Function, for more about functional references).

Designed to be useful for tasks such as:

  • identifying relevant publications for a given function
  • analyzing publication trends for genes belonging to a common pathway
  • identifying influential PIs for a given gene network

Requirements:

  • Written for Python 3.7; not tested on other versions.

  • annoPipeline requires:

    • numpy >= 1.16.2
    • pandas >= 0.24.2
    • Biopython >= 1.73
    • openpyxl >= 2.6.1
    • requests >= 2.21.0

To Install:

pip install annoPipeline

Or clone the repository from GitHub. Then, in the annoPipeline directory, run:

python setup.py install

Any missing dependencies are installed automatically; this may take a few seconds.

Example usage:

Execute the full annotation pipeline on a list of gene symbols like this:

import annoPipeline as ap

# define a list of genes you would like annotated
geneList = ['CDK2', 'FGFR1', 'SLC6A4']

# annoPipeline executes the full annotation pipeline (see the individual functions below).
df = ap.annoPipeline(geneList) # returns a pandas DataFrame with gene annotations and bibliographic info
  • By default, ap.annoPipeline saves the annotation output to an Excel file named after the geneList symbols joined by '_'.
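The default output name described above amounts to joining the symbols with '_'. A minimal sketch: the helper name `excel_filename` and the `.xlsx` extension are assumptions, not taken from the annoPipeline source.

```python
def excel_filename(gene_list, ext=".xlsx"):
    """Hypothetical helper: join gene symbols with '_' to form a file name.

    The '.xlsx' extension is an assumption; check the file annoPipeline
    actually writes.
    """
    return "_".join(gene_list) + ext


print(excel_filename(["CDK2", "FGFR1", "SLC6A4"]))  # CDK2_FGFR1_SLC6A4.xlsx
```

With pandas and openpyxl installed, a name built this way could be passed to `DataFrame.to_excel`.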

Warning!

If querying a single gene, still pass it as a list. For example:

import annoPipeline as ap

df = ap.annoPipeline(['CDK2']) # single-gene queries still need the [] (to be fixed in a later version)
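Until that fix lands, a small wrapper can normalize the input before calling the pipeline. This is a hypothetical sketch (`as_gene_list` is not part of annoPipeline). One plausible reason a bare string misbehaves is that Python iterates over a string character by character, so code looping over 'CDK2' would see 'C', 'D', 'K', '2'.

```python
def as_gene_list(genes):
    """Hypothetical helper: wrap a single symbol in a list; copy other iterables to a list."""
    if isinstance(genes, str):
        return [genes]
    return list(genes)


print(as_gene_list("CDK2"))             # ['CDK2']
print(as_gene_list(["CDK2", "FGFR1"]))  # ['CDK2', 'FGFR1']
```

Usage would then look like `df = ap.annoPipeline(as_gene_list('CDK2'))`.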

v0.0.1 Functionality

Task 1:

  1. From the MyGeneInfo API, use the “Gene query service” GET method to return details on a given list of human gene symbols.
  2. From the returned JSON, parse out the “name”, “symbol” and “entrezgene” values and print them to the screen.

Use queryGenes():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']

l1 = ap.queryGenes(geneList) # returns a list of dicts whose keys are the default mygene fields (symbol, name, taxid, entrezgene, ensemblgene)
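Task 1 can be sketched against the public MyGene.info REST endpoint using only the standard library. This illustrates the idea and is not the queryGenes() implementation; the v3 URL, the `symbol:` fielded query, and the helper names `query_symbol` and `parse_hits` are assumptions of this sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public MyGene.info query endpoint (v3).
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def query_symbol(symbol, species="human"):
    """GET the MyGene.info query service for one gene symbol (network call)."""
    params = urlencode({
        "q": f"symbol:{symbol}",              # fielded query on the symbol field
        "species": species,
        "fields": "symbol,name,entrezgene",   # only request what Task 1 needs
    })
    with urlopen(f"{MYGENE_QUERY_URL}?{params}") as resp:
        return json.load(resp)


def parse_hits(response):
    """Keep only symbol, name and entrezgene from each hit; absent keys become None."""
    return [
        {key: hit.get(key) for key in ("symbol", "name", "entrezgene")}
        for hit in response.get("hits", [])
    ]
```

For example, `parse_hits(query_symbol('CDK2'))` would return one dict per matching human gene; this needs network access.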

Task 2:

  1. Using the appropriate identifier from the above result, send a query to the MyGeneInfo “Gene annotation services” method for each gene
  2. From the resulting JSON, collate up to 5 GeneRIF descriptions per gene
  3. Write the results to an Excel spreadsheet with columns: gene_symbol, gene_name, entrez_id, generifs

Use getAnno():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1, saveExcel=True) # saveExcel defaults to False
  • Returns a pandas DataFrame with the genes and up to 5 GeneRIFs per gene from mygene.info.
  • saveExcel defaults to False; to save the output to Excel, pass saveExcel=True.
  • If True, the Excel file is named after the geneList symbols joined by '_'.
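Task 2 can be sketched in the same style: fetch the `generif` field for an Entrez ID, then keep at most five entries per gene under the column names listed above. This is not the getAnno() source; the `/v3/gene/<id>?fields=generif` URL form, the `pubmed`/`text` keys of each GeneRIF, and the helper names `fetch_generifs` and `build_row` are assumptions.

```python
import json
from urllib.request import urlopen

# Assumed MyGene.info gene annotation endpoint, restricted to the generif field.
MYGENE_GENE_URL = "https://mygene.info/v3/gene/{}?fields=generif"


def fetch_generifs(entrez_id):
    """GET the annotation service for one Entrez ID and return its GeneRIFs as a list."""
    with urlopen(MYGENE_GENE_URL.format(entrez_id)) as resp:
        generif = json.load(resp).get("generif", [])
    # A single GeneRIF may be returned as a dict rather than a one-item list.
    return [generif] if isinstance(generif, dict) else generif


def build_row(gene, generifs, limit=5):
    """Build one output row (Task 2 column names), keeping at most `limit` GeneRIFs."""
    return {
        "gene_symbol": gene.get("symbol"),
        "gene_name": gene.get("name"),
        "entrez_id": gene.get("entrezgene"),
        "generifs": [
            {"pubmed": rif.get("pubmed"), "text": rif.get("text")}
            for rif in generifs[:limit]
        ],
    }
```

Rows built this way could be passed to `pandas.DataFrame(rows)`; the exact layout annoPipeline uses for the GeneRIF columns may differ.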

Task 3:

  1. Use the PubMed IDs associated with the above GeneRIF content to extract additional bibliographic information.

Use addBibs():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1)
l3 = ap.addBibs(l2) # returns a df with the genes, up to 5 GeneRIFs from mygene.info, and their bibliographic info
  • Currently returns the following bibliographic information when available:
    • PubDate
    • Source
    • Title
    • LastAuthor
    • DOI
    • PmcRefCount
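The esummary step can be sketched with Biopython's Entrez module, which is already a dependency. This is an illustration rather than the addBibs() source: it assumes the six fields above are read from the PubMed document summaries under those names, and `fetch_summaries` and `summarize` are hypothetical helper names.

```python
# Bibliographic fields listed above, as keys of a parsed PubMed document summary.
FIELDS = ("PubDate", "Source", "Title", "LastAuthor", "DOI", "PmcRefCount")


def fetch_summaries(pmids, email):
    """Call Entrez esummary on PubMed for a list of PMIDs (network call).

    Biopython is imported here so that summarize() works without it installed.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks E-utilities users to supply a contact address
    handle = Entrez.esummary(db="pubmed", id=",".join(str(p) for p in pmids))
    try:
        return Entrez.read(handle)  # one record per PMID
    finally:
        handle.close()


def summarize(record):
    """Pick the bibliographic fields from one summary; missing ones become None."""
    return {field: record.get(field) for field in FIELDS}
```

A call would look like `[summarize(r) for r in fetch_summaries(pmids, "you@example.org")]`, where `pmids` holds the PubMed IDs collected from the GeneRIFs.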



Download files


Source Distribution

annoPipeline-0.0.1.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

annoPipeline-0.0.1-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file annoPipeline-0.0.1.tar.gz.

File metadata

  • Download URL: annoPipeline-0.0.1.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for annoPipeline-0.0.1.tar.gz
Algorithm Hash digest
SHA256 ed4d17e48eadb908f76c06839ab27670250e11686d1602b4820882cacc58ea00
MD5 93c9e719d5691b3bfa5c47dfe023f2c2
BLAKE2b-256 fdab578fb6f6863f06729d45184ea985778d9848a503518f9c2356773aa91002


File details

Details for the file annoPipeline-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: annoPipeline-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for annoPipeline-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a60b89c80a1c1144c0587e1f3f1766ce8afaf432229f2fe30a276610c18516b1
MD5 636d79ba9b92281e959916825e7d4f4f
BLAKE2b-256 135a6045cf8ff00ba2299465ce3f171cddf70552025a2afef020c768b8ac6784
