Skip to main content

Python Client for MyGene.Info services.

Project description

https://pepy.tech/badge/mygene https://img.shields.io/pypi/dm/mygene.svg https://badge.fury.io/py/mygene.svg https://img.shields.io/pypi/pyversions/mygene.svg https://img.shields.io/pypi/format/mygene.svg https://img.shields.io/pypi/status/mygene.svg

Intro

MyGene.Info provides simple-to-use REST web services to query/retrieve gene annotation data. It’s designed with simplicity and performance emphasized. mygene, is an easy-to-use Python wrapper to access MyGene.Info services.

Since v3.1.0, mygene Python package has become a thin wrapper of underlying biothings_client package, a universal Python client for all BioThings APIs, including MyGene.info. The installation of mygene will install biothings_client automatically. The following code snippets are essentially equivalent:

  • Continue using mygene package

    In [1]: import mygene
    In [2]: mg = mygene.MyGeneInfo()
  • Use biothings_client package directly

    In [1]: from biothings_client import get_client
    In [2]: mg = get_client('gene')

After that, the use of mg instance is exactly the same, e.g. the usage examples below.

Requirements

python >=2.7 (including python3)

(Python 2.6 might still work, but it’s not supported any more since v3.1.0.)

biothings_client (>=0.2.0, install using “pip install biothings_client”)

Optional dependencies

pandas (install using “pip install pandas”) is required for returning a list of gene objects as DataFrame.

Installation

Option 1

pip install mygene

Option 2

download/extract the source code and run:

python setup.py install
Option 3

install the latest code directly from the repository:

pip install -e git+https://github.com/biothings/mygene.py#egg=mygene

Version history

CHANGES.txt

Tutorial

Documentation

http://mygene-py.readthedocs.org/

Usage

In [1]: import mygene

In [2]: mg = mygene.MyGeneInfo()

In [3]: mg.getgene(1017)
Out[3]:
{'_id': '1017',
 'entrezgene': 1017,
 'name': 'cyclin-dependent kinase 2',
 'symbol': 'CDK2',
 'taxid': 9606,
 ...
}

# use "fields" parameter to return a subset of fields
In [4]: mg.getgene(1017, fields='name,symbol,refseq')
Out[4]:
{'_id': '1017',
 'name': 'cyclin-dependent kinase 2',
 'refseq': {'genomic': ['AC_000144.1',
   'NC_000012.11',
   'NG_028086.1',
   'NT_029419.12',
   'NW_001838059.1'],
  'protein': ['NP_001789.2', 'NP_439892.2'],
  'rna': ['NM_001798.3', 'NM_052827.2']},
 'symbol': 'CDK2'}

In [5]: mg.getgene(1017, fields=['name', 'symbol', 'refseq.rna'])
Out[5]:
{'_id': '1017',
 'name': 'cyclin-dependent kinase 2',
 'refseq': {'rna': ['NM_001798.5', 'NM_052827.3']},
 'symbol': 'CDK2'}


In [6]: mg.getgenes([1017,1018,'ENSG00000148795'], fields='name,symbol,entrezgene,taxid')
Out[6]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '1018',
  'entrezgene': 1018,
  'name': 'cyclin-dependent kinase 3',
  'query': '1018',
  'symbol': 'CDK3',
  'taxid': 9606},
 {'_id': '1586',
  'entrezgene': 1586,
  'name': 'cytochrome P450, family 17, subfamily A, polypeptide 1',
  'query': 'ENSG00000148795',
  'symbol': 'CYP17A1',
  'taxid': 9606}]

# return results in Pandas DataFrame
In [7]: mg.getgenes([1017,1018,'ENSG00000148795'], fields='name,symbol,entrezgene,taxid', as_dataframe=True)
Out[7]:
                  _id  entrezgene  \
query
1017             1017        1017
1018             1018        1018
ENSG00000148795  1586        1586

                                                              name   symbol  \
query
1017                                     cyclin-dependent kinase 2     CDK2
1018                                     cyclin-dependent kinase 3     CDK3
ENSG00000148795  cytochrome P450, family 17, subfamily A, polyp...  CYP17A1

                 taxid
query
1017              9606
1018              9606
ENSG00000148795   9606

[3 rows x 5 columns]

In [8]:  mg.query('cdk2', size=5)
Out[8]:
{'hits': [{'_id': '1017',
   '_score': 373.24667,
   'entrezgene': 1017,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'CDK2',
   'taxid': 9606},
  {'_id': '12566',
   '_score': 353.90176,
   'entrezgene': 12566,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'Cdk2',
   'taxid': 10090},
  {'_id': '362817',
   '_score': 264.88477,
   'entrezgene': 362817,
   'name': 'cyclin dependent kinase 2',
   'symbol': 'Cdk2',
   'taxid': 10116},
  {'_id': '52004',
   '_score': 21.221401,
   'entrezgene': 52004,
   'name': 'CDK2-associated protein 2',
   'symbol': 'Cdk2ap2',
   'taxid': 10090},
  {'_id': '143384',
   '_score': 18.617256,
   'entrezgene': 143384,
   'name': 'CDK2-associated, cullin domain 1',
   'symbol': 'CACUL1',
   'taxid': 9606}],
 'max_score': 373.24667,
 'took': 10,
 'total': 28}

In [9]: mg.query('reporter:1000_at')
Out[9]:
{'hits': [{'_id': '5595',
   '_score': 11.163337,
   'entrezgene': 5595,
   'name': 'mitogen-activated protein kinase 3',
   'symbol': 'MAPK3',
   'taxid': 9606}],
 'max_score': 11.163337,
 'took': 6,
 'total': 1}

In [10]: mg.query('symbol:cdk2', species='human')
Out[10]:
{'hits': [{'_id': '1017',
   '_score': 84.17707,
   'entrezgene': 1017,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'CDK2',
   'taxid': 9606}],
 'max_score': 84.17707,
 'took': 27,
 'total': 1}

In [11]: mg.querymany([1017, '695'], scopes='entrezgene', species='human')
Finished.
Out[11]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '695',
  'entrezgene': 695,
  'name': 'Bruton agammaglobulinemia tyrosine kinase',
  'query': '695',
  'symbol': 'BTK',
  'taxid': 9606}]

In [12]: mg.querymany([1017, '695'], scopes='entrezgene', species=9606)
Finished.
Out[12]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '695',
  'entrezgene': 695,
  'name': 'Bruton agammaglobulinemia tyrosine kinase',
  'query': '695',
  'symbol': 'BTK',
  'taxid': 9606}]

In [13]: mg.querymany([1017, '695'], scopes='entrezgene', species=9606, as_dataframe=True)
Finished.
Out[13]:
        _id  entrezgene                                       name symbol  \
query
1017   1017        1017                  cyclin-dependent kinase 2   CDK2
695     695         695  Bruton agammaglobulinemia tyrosine kinase    BTK

       taxid
query
1017    9606
695     9606

[2 rows x 5 columns]

In [14]: mg.querymany([1017, '695', 'NA_TEST'], scopes='entrezgene', species='human')
Finished.
Out[14]:
[{'_id': '1017',
  'entrezgene': 1017,
  'name': 'cyclin-dependent kinase 2',
  'query': '1017',
  'symbol': 'CDK2',
  'taxid': 9606},
 {'_id': '695',
  'entrezgene': 695,
  'name': 'Bruton agammaglobulinemia tyrosine kinase',
  'query': '695',
  'symbol': 'BTK',
  'taxid': 9606},
 {'notfound': True, 'query': 'NA_TEST'}]

# query all human kinases using fetch_all parameter:
In [15]: kinases = mg.query('name:kinase', species='human', fetch_all=True)
In [16]: kinases
Out [16]" <generator object _fetch_all at 0x7fec027d2eb0>

# kinases is a Python generator, now you can loop through it to get all 1073 hits:
In [16]: for gene in kinases:
   ....:     print gene['_id'], gene['symbol']
Out [16]: <output omitted here>

Contact

Drop us any question or feedback:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mygene-3.2.2.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

mygene-3.2.2-py2.py3-none-any.whl (5.4 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file mygene-3.2.2.tar.gz.

File metadata

  • Download URL: mygene-3.2.2.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for mygene-3.2.2.tar.gz
Algorithm Hash digest
SHA256 e729cabbc28cf5afb221bca1ab637883b375cb1a3e2f067587ec79f71affdaea
MD5 c52d987c25c68e15b35efebf636de339
BLAKE2b-256 0aeca256003f84196aa3fdd65a7c6f5adfc0688398fb66442eba75b39c9b7627

See more details on using hashes here.

File details

Details for the file mygene-3.2.2-py2.py3-none-any.whl.

File metadata

  • Download URL: mygene-3.2.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for mygene-3.2.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 18d85d1b28ecee2be31d844607fb0c5f7d7c58573278432df819ee2a5e88fe46
MD5 1d18c51963a35f89aa30ef674ca186e1
BLAKE2b-256 a7b7132b1673c0ec00881d49d56c09624942fa0ebd2fc21d73d80647efa082e9

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page