API-enabled Gene Annotation

annoPipeline - an API-enabled gene annotation pipeline

annoPipeline uses APIs from mygene.info and Entrez esummary to annotate a user-provided list of gene symbols.

Generates a pandas DataFrame containing the gene symbol, gene name, Entrez ID, and bibliographic information for up to 5 PubMed publications per gene for which a functional reference was provided (see GeneRIF for more about functional references).

Designed to be useful for tasks such as:

- identifying relevant publications for a given function
- analyzing publication trends for genes belonging to a common pathway
- identifying influential PIs for a given gene network

Requirements:

Written for Python 3.7; not tested on other versions.
annoPipeline requires:
- numpy >= 1.16.2
- pandas >= 0.24.2
- Biopython >= 1.73
- openpyxl >= 2.6.1
- requests >= 2.21.0

To Install:

pip install annoPipeline

Or clone the repo from github. Then, in the annoPipeline directory, run:

python setup.py install

Any missing dependencies will be installed automatically, which may take a few seconds.

Example usage:

Execute the full annotation pipeline on a list of gene symbols like this:

import annoPipeline as ap

# define a list of genes you would like annotated
geneList = ['CDK2', 'FGFR1', 'SLC6A4']

# annoPipeline runs the full annotation pipeline (see the individual functions below).
df = ap.annoPipeline(geneList) # returns a pandas df with gene annotations and bibliographic info

By default, ap.annoPipeline saves the annotation output to an Excel file named after the geneList symbols joined by '_'.
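
The naming rule can be sketched as below. This is only an illustration: the helper name default_excel_name is hypothetical and the '.xlsx' extension is an assumption, so the file the package actually writes may differ in detail.

```python
def default_excel_name(gene_list, ext=".xlsx"):
    """Join gene symbols with '_' to form a file name (hypothetical helper).

    The '.xlsx' extension is an assumption, not confirmed by the package.
    """
    return "_".join(gene_list) + ext


print(default_excel_name(["CDK2", "FGFR1", "SLC6A4"]))
```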

Warning!

When querying a single gene, still pass it as a list. For example:

import annoPipeline as ap

df = ap.annoPipeline(['CDK2']) # for single gene queries still include [] - will be fixed in later version
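
Until that is fixed, a small guard on the caller's side can wrap a lone symbol before it reaches the pipeline. The helper as_gene_list is hypothetical and not part of annoPipeline. It relies only on the fact that a Python string is itself iterable, so code that loops over a bare 'CDK2' would see single characters rather than one symbol.

```python
def as_gene_list(genes):
    """Return genes as a list, wrapping a lone symbol string (hypothetical helper)."""
    if isinstance(genes, str):
        return [genes]
    return list(genes)


# Possible use with the package (not executed here):
# df = ap.annoPipeline(as_gene_list('CDK2'))
print(as_gene_list("CDK2"))
print(as_gene_list(("CDK2", "FGFR1")))
```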

v0.0.1 Functionality

Task 1:

From the MyGeneInfo API, use the "Gene query service" GET method to return details on a given list of human gene symbols.
From the returned JSON, parse out the "name", "symbol" and "entrezgene" values and print them to the screen.

Use queryGenes():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']

l1 = ap.queryGenes(geneList) # returns list of dicts where keys are default mygene fields (symbol,name,taxid,entrezgene,ensemblgene)
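
For readers curious what the Task 1 request might look like underneath, here is a minimal sketch using only the standard library. It is not annoPipeline's implementation: the helper names build_query_url, parse_hits, and query_gene are hypothetical, and the sample payload is hand-written to mirror the general shape of a MyGene.info v3 query response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_query_url(symbol):
    """Build a gene query URL for one human gene symbol."""
    params = {
        "q": f"symbol:{symbol}",
        "species": "human",
        "fields": "name,symbol,entrezgene",
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def parse_hits(payload):
    """Keep only name, symbol, and entrezgene from each hit in the response."""
    wanted = ("name", "symbol", "entrezgene")
    return [{key: hit.get(key) for key in wanted} for hit in payload.get("hits", [])]


def query_gene(symbol):
    """Fetch and parse one symbol (performs a network request)."""
    with urlopen(build_query_url(symbol)) as response:
        return parse_hits(json.load(response))


# Hand-written sample payload (illustrative), so the parsing runs offline.
sample = {
    "hits": [
        {
            "_id": "1017",
            "entrezgene": "1017",
            "name": "cyclin dependent kinase 2",
            "symbol": "CDK2",
        }
    ]
}
for gene in parse_hits(sample):
    print(gene["symbol"], gene["name"], gene["entrezgene"])
```

Separating URL building and parsing from the network call keeps the parsing testable without contacting the service.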

Task 2:

Using the appropriate identifier from the above result, send a query to the MyGeneInfo "Gene annotation services" method for each gene.
From the resulting JSON, collate up to 5 generif descriptions per gene.
Write the results to an Excel spreadsheet with columns: gene_symbol, gene_name, entrez_id, generifs.

Use getAnno():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1, saveExcel=True) # saveExcel defaults False

Returns a pandas df with genes and up to 5 generifs from mygene.info.
saveExcel defaults to False; pass saveExcel=True to save the output to Excel.
If True, the Excel file is named after the geneList symbols joined by '_'.
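
The "up to 5 generifs per gene" step can be sketched as a small pure function. This is an assumption-laden illustration rather than the package's code: collate_generifs is a hypothetical name, the record is assumed to carry a 'generif' field whose entries have 'pubmed' and 'text' keys, and the sample record uses placeholder values.

```python
def collate_generifs(annotation, limit=5):
    """Return up to `limit` generif entries from one gene annotation record."""
    generifs = annotation.get("generif", [])
    if isinstance(generifs, dict):
        # Assumption: a lone entry may arrive as a dict instead of a list.
        generifs = [generifs]
    return [
        {"pubmed": entry.get("pubmed"), "text": entry.get("text")}
        for entry in generifs[:limit]
    ]


# Placeholder record with seven entries; only the first five are kept.
record = {
    "_id": "1017",
    "generif": [{"pubmed": n, "text": f"placeholder text {n}"} for n in range(1, 8)],
}
for rif in collate_generifs(record):
    print(rif["pubmed"], rif["text"])
```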

Task 3:

Use the Pubmed IDs associated with the above generif content to extract additional bibliographic information.

Use addBibs():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1)
l3 = ap.addBibs(l2) # returns df with genes, up to 5 generifs from mygene.info, and bibliographic info for each PubMed ID

Currently returns the following bibliographic information when available:
- PubDate
- Source
- Title
- LastAuthor
- DOI
- PmcRefCount
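
As a rough sketch of how these fields could be pulled from an Entrez esummary record, the snippet below selects them by name and fills in None when a field is absent. The names extract_bib and fetch_summaries are hypothetical, and the use of Biopython's Bio.Entrez.esummary here is an assumption about one way to obtain the records, not a description of how addBibs works internally.

```python
BIB_FIELDS = ("PubDate", "Source", "Title", "LastAuthor", "DOI", "PmcRefCount")


def extract_bib(summary):
    """Pick the listed fields from one esummary record, using None when absent."""
    return {field: summary.get(field) for field in BIB_FIELDS}


def fetch_summaries(pmids, email):
    """Fetch PubMed summaries with Biopython (performs a network request)."""
    from Bio import Entrez  # imported here so extract_bib works without Biopython

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esummary(db="pubmed", id=",".join(str(p) for p in pmids))
    try:
        return [extract_bib(record) for record in Entrez.read(handle)]
    finally:
        handle.close()


# Placeholder record (not real bibliographic data) to show the "when available" behavior.
partial = {"Title": "Example title", "Source": "Example journal"}
print(extract_bib(partial))
```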

Release history

This version

0.0.1

Apr 16, 2019

Download files

Source Distribution

annoPipeline-0.0.1.tar.gz (5.0 kB)

Uploaded Apr 16, 2019 Source

Built Distribution

annoPipeline-0.0.1-py3-none-any.whl (6.9 kB)

Uploaded Apr 16, 2019 Python 3

File details

Details for the file annoPipeline-0.0.1.tar.gz.

File metadata

Download URL: annoPipeline-0.0.1.tar.gz
Upload date: Apr 16, 2019
Size: 5.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for annoPipeline-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`ed4d17e48eadb908f76c06839ab27670250e11686d1602b4820882cacc58ea00`
MD5	`93c9e719d5691b3bfa5c47dfe023f2c2`
BLAKE2b-256	`fdab578fb6f6863f06729d45184ea985778d9848a503518f9c2356773aa91002`

File details

Details for the file annoPipeline-0.0.1-py3-none-any.whl.

File metadata

Download URL: annoPipeline-0.0.1-py3-none-any.whl
Upload date: Apr 16, 2019
Size: 6.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for annoPipeline-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a60b89c80a1c1144c0587e1f3f1766ce8afaf432229f2fe30a276610c18516b1`
MD5	`636d79ba9b92281e959916825e7d4f4f`
BLAKE2b-256	`135a6045cf8ff00ba2299465ce3f171cddf70552025a2afef020c768b8ac6784`
