Skip to main content

Python Client for MyGene.Info services.

Project description

https://pepy.tech/badge/mygene https://img.shields.io/pypi/dm/mygene.svg https://badge.fury.io/py/mygene.svg https://img.shields.io/pypi/pyversions/mygene.svg https://img.shields.io/pypi/format/mygene.svg https://img.shields.io/pypi/status/mygene.svg

Intro

MyGene.Info provides simple-to-use REST web services to query/retrieve gene annotation data. It’s designed with simplicity and performance emphasized. mygene, is an easy-to-use Python wrapper to access MyGene.Info services.

Since v3.1.0, mygene Python package has become a thin wrapper of underlying biothings_client package, a universal Python client for all BioThings APIs, including MyGene.info. The installation of mygene will install biothings_client automatically. The following code snippets are essentially equivalent:

  • Continue using mygene package

    In [1]: import mygene
    In [2]: mg = mygene.MyGeneInfo()
  • Use biothings_client package directly

    In [1]: from biothings_client import get_client
    In [2]: mg = get_client('gene')

After that, the use of mg instance is exactly the same, e.g. the usage examples below.

Requirements

python >=2.7 (including python3)

(Python 2.6 might still work, but it’s not supported any more since v3.1.0.)

biothings_client (>=0.2.0, install using “pip install biothings_client”)

Optional dependencies

pandas (install using “pip install pandas”) is required for returning a list of gene objects as DataFrame.

Installation

Option 1

pip install mygene

Option 2

download/extract the source code and run:

python setup.py install
Option 3

install the latest code directly from the repository:

pip install -e git+https://github.com/biothings/mygene.py#egg=mygene

Version history

CHANGES.txt

Tutorial

Documentation

http://mygene-py.readthedocs.org/

Usage

In [1]: import mygene

In [2]: mg = mygene.MyGeneInfo()

In [3]: mg.getgene(1017)
Out[3]:
{'_id': '1017',
 'entrezgene': 1017,
 'name': 'cyclin-dependent kinase 2',
 'symbol': 'CDK2',
 'taxid': 9606,
 ...
}

# use "fields" parameter to return a subset of fields
In [4]: mg.getgene(1017, fields='name,symbol,refseq')
Out[4]:
{'_id': '1017',
 'name': 'cyclin-dependent kinase 2',
 'refseq': {'genomic': ['AC_000144.1',
   'NC_000012.11',
   'NG_028086.1',
   'NT_029419.12',
   'NW_001838059.1'],
  'protein': ['NP_001789.2', 'NP_439892.2'],
  'rna': ['NM_001798.3', 'NM_052827.2']},
 'symbol': 'CDK2'}

In [5]: mg.getgene(1017, fields=['name', 'symbol', 'refseq.rna'])
Out[5]:
{'_id': '1017',
 'name': 'cyclin-dependent kinase 2',
 'refseq': {'rna': ['NM_001798.5', 'NM_052827.3']},
 'symbol': 'CDK2'}


In [6]: mg.getgenes([1017,1018,'ENSG00000148795'], fields='name,symbol,entrezgene,taxid')
Out[6]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '1018',
  'entrezgene': 1018,
  'name': 'cyclin-dependent kinase 3',
  'query': '1018',
  'symbol': 'CDK3',
  'taxid': 9606},
 {'_id': '1586',
  'entrezgene': 1586,
  'name': 'cytochrome P450, family 17, subfamily A, polypeptide 1',
  'query': 'ENSG00000148795',
  'symbol': 'CYP17A1',
  'taxid': 9606}]

# return results in Pandas DataFrame
In [7]: mg.getgenes([1017,1018,'ENSG00000148795'], fields='name,symbol,entrezgene,taxid', as_dataframe=True)
Out[7]:
                  _id  entrezgene  \
query
1017             1017        1017
1018             1018        1018
ENSG00000148795  1586        1586

                                                              name   symbol  \
query
1017                                     cyclin-dependent kinase 2     CDK2
1018                                     cyclin-dependent kinase 3     CDK3
ENSG00000148795  cytochrome P450, family 17, subfamily A, polyp...  CYP17A1

                 taxid
query
1017              9606
1018              9606
ENSG00000148795   9606

[3 rows x 5 columns]

In [8]:  mg.query('cdk2', size=5)
Out[8]:
{'hits': [{'_id': '1017',
   '_score': 373.24667,
   'entrezgene': 1017,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'CDK2',
   'taxid': 9606},
  {'_id': '12566',
   '_score': 353.90176,
   'entrezgene': 12566,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'Cdk2',
   'taxid': 10090},
  {'_id': '362817',
   '_score': 264.88477,
   'entrezgene': 362817,
   'name': 'cyclin dependent kinase 2',
   'symbol': 'Cdk2',
   'taxid': 10116},
  {'_id': '52004',
   '_score': 21.221401,
   'entrezgene': 52004,
   'name': 'CDK2-associated protein 2',
   'symbol': 'Cdk2ap2',
   'taxid': 10090},
  {'_id': '143384',
   '_score': 18.617256,
   'entrezgene': 143384,
   'name': 'CDK2-associated, cullin domain 1',
   'symbol': 'CACUL1',
   'taxid': 9606}],
 'max_score': 373.24667,
 'took': 10,
 'total': 28}

In [9]: mg.query('reporter:1000_at')
Out[9]:
{'hits': [{'_id': '5595',
   '_score': 11.163337,
   'entrezgene': 5595,
   'name': 'mitogen-activated protein kinase 3',
   'symbol': 'MAPK3',
   'taxid': 9606}],
 'max_score': 11.163337,
 'took': 6,
 'total': 1}

In [10]: mg.query('symbol:cdk2', species='human')
Out[10]:
{'hits': [{'_id': '1017',
   '_score': 84.17707,
   'entrezgene': 1017,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'CDK2',
   'taxid': 9606}],
 'max_score': 84.17707,
 'took': 27,
 'total': 1}

In [11]: mg.querymany([1017, '695'], scopes='entrezgene', species='human')
Finished.
Out[11]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '695',
  'entrezgene': 695,
  'name': 'Bruton agammaglobulinemia tyrosine kinase',
  'query': '695',
  'symbol': 'BTK',
  'taxid': 9606}]

In [12]: mg.querymany([1017, '695'], scopes='entrezgene', species=9606)
Finished.
Out[12]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '695',
  'entrezgene': 695,
  'name': 'Bruton agammaglobulinemia tyrosine kinase',
  'query': '695',
  'symbol': 'BTK',
  'taxid': 9606}]

In [13]: mg.querymany([1017, '695'], scopes='entrezgene', species=9606, as_dataframe=True)
Finished.
Out[13]:
        _id  entrezgene                                       name symbol  \
query
1017   1017        1017                  cyclin-dependent kinase 2   CDK2
695     695         695  Bruton agammaglobulinemia tyrosine kinase    BTK

       taxid
query
1017    9606
695     9606

[2 rows x 5 columns]

In [14]: mg.querymany([1017, '695', 'NA_TEST'], scopes='entrezgene', species='human')
Finished.
Out[14]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '695',
  'entrezgene': 695,
  'name': 'Bruton agammaglobulinemia tyrosine kinase',
  'query': '695',
  'symbol': 'BTK',
  'taxid': 9606},
 {'notfound': True, 'query': 'NA_TEST'}]

# query all human kinases using fetch_all parameter:
In [15]: kinases = mg.query('name:kinase', species='human', fetch_all=True)
In [16]: kinases
Out [16]" <generator object _fetch_all at 0x7fec027d2eb0>

# kinases is a Python generator, now you can loop through it to get all 1073 hits:
In [16]: for gene in kinases:
   ....:     print gene['_id'], gene['symbol']
Out [16]: <output omitted here>

Contact

Drop us any question or feedback:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mygene-3.2.1.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

mygene-3.2.1-py2.py3-none-any.whl (5.4 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file mygene-3.2.1.tar.gz.

File metadata

  • Download URL: mygene-3.2.1.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.5

File hashes

Hashes for mygene-3.2.1.tar.gz
Algorithm Hash digest
SHA256 72fc6f7b316fd7ef9716a2c72fc6ea812b6c22396004f67150f15f068cc78d9f
MD5 382613cf435045fcaabb90828dd48153
BLAKE2b-256 0c7d690e82d8a1218bf7fe5fd0a15c857e302f6d8daf880f60d4f050bf512ed8

See more details on using hashes here.

File details

Details for the file mygene-3.2.1-py2.py3-none-any.whl.

File metadata

  • Download URL: mygene-3.2.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.5

File hashes

Hashes for mygene-3.2.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 3e2f067e631433e829626ef6cee9754d7fc5a591cb6a5c6023f40fa22fbe5927
MD5 4cdfacf766056890ec78e0a8ac2f0f02
BLAKE2b-256 1c4eefae7b0e2b27c725921acda52e3f3ee7b0c9bd8849b2bb1ae02554cc6a62

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page