
Structured Python interface to NCBI E-Utilities.

Project description

eutils is a Python package to simplify searching, fetching, and parsing records from NCBI using their E-utilities interface.

STATUS: This code is alpha. There are no known bugs, but the code supports only a limited subset of E-Utilities replies. PubMed, Gene, RefSeq (nucleotide), and dbSNP data are well-represented; others are not represented at all.


News

  • 0.3.0.post0 was just released. BEHAVIOR CHANGE: Client() no longer caches by default. Add Client(cache=True) to get previous behavior.

Features

  • simple Pythonic interface for searching and fetching

  • automatic query rate throttling per NCBI guidelines

  • optional sqlite-based caching of compressed replies

  • “façades” that facilitate access to essential attributes in replies
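To illustrate the idea behind the sqlite-based caching of compressed replies listed above, here is a minimal sketch using only the standard library. It is not the eutils implementation: the `ReplyCache` class, its table layout, and the choice of zlib are assumptions made for illustration.

```python
import sqlite3
import zlib


class ReplyCache:
    """Hypothetical cache: stores zlib-compressed reply text in sqlite."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a file path would persist.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS replies (key TEXT PRIMARY KEY, body BLOB)"
        )

    def put(self, key, reply_text):
        """Compress the reply and store it, replacing any earlier entry."""
        blob = zlib.compress(reply_text.encode("utf-8"))
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO replies (key, body) VALUES (?, ?)",
                (key, blob),
            )

    def get(self, key):
        """Return the cached reply text, or None on a cache miss."""
        row = self._db.execute(
            "SELECT body FROM replies WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        return zlib.decompress(row[0]).decode("utf-8")
```

A caller would look up a key derived from the request (for example, the database name plus the query parameters) and only contact NCBI on a miss.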

A Quick Example

As of May 1, 2018, NCBI throttles requests based on whether a client is registered. Unregistered clients are limited to 3 requests/second; registered clients are granted 10 requests/second, and may request more. See the NCBI Announcement for more information.

The eutils package will automatically throttle requests according to NCBI guidelines (3 or 10 requests/second without or with an API key, respectively).
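The throttling rule described above (3 requests/second without an API key, 10 with one) can be sketched as a simple minimum-interval limiter. This is an illustration only, not the code eutils uses; the `Throttle` class and its `clock` and `sleep` parameters are hypothetical names introduced here.

```python
import time


class Throttle:
    """Hypothetical limiter: spaces calls at least 1/rate seconds apart."""

    def __init__(self, api_key=None, clock=time.monotonic, sleep=time.sleep):
        # Rates taken from the NCBI guideline quoted above.
        self.rate = 10 if api_key else 3
        self.min_interval = 1.0 / self.rate
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each request keeps a single-threaded client within the limit. The injectable `clock` and `sleep` exist only so the behavior can be checked without real delays.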

$ pip install eutils
$ ipython

>>> import os
>>> from eutils import Client

# Initialize a client, which handles all caching and query throttling.
# The API key is read from the NCBI_API_KEY environment variable, if set.
>>> ec = Client(api_key=os.environ.get("NCBI_API_KEY"))

# search for tumor necrosis factor genes
# any valid NCBI query may be used
>>> esr = ec.esearch(db='gene',term='tumor necrosis factor')

# fetch a gene record by id (gene id 7157 is human TP53)
>>> egs = ec.efetch(db='gene', id=7157)

# One may fetch multiple genes at a time. These are returned as an
# EntrezgeneSet. We'll grab the first (and only) child, which returns
# an instance of the Entrezgene class.
>>> eg = egs.entrezgenes[0]

# Easily access some basic information about the gene
>>> eg.hgnc, eg.maploc, eg.description, eg.type, eg.genus_species
('TP53', '17p13.1', 'tumor protein p53', 'protein-coding', 'Homo sapiens')

# get a list of genomic references
>>> sorted([(r.acv, r.label) for r in eg.references])
[('NC_000017.11', 'Chromosome 17 Reference GRCh38...'),
 ('NC_018928.2', 'Chromosome 17 Alternate ...'),
 ('NG_017013.2', 'RefSeqGene')]

# Get the first three products defined on GRCh38
#>>> [p.acv for p in eg.references[0].products][:3]
#['NM_001126112.2', 'NM_001276761.1', 'NM_000546.5']

# As a sample, grab the first product defined on this reference (order is arbitrary)
>>> mrna = eg.references[0].products[0]
>>> str(mrna)
'GeneCommentary(acv=NM_001126112.2,type=mRNA,heading=Reference,label=transcript variant 2)'

# mrna.genomic_coords provides access to the exon definitions on this reference

>>> mrna.genomic_coords.gi, mrna.genomic_coords.strand
('568815581', -1)

>>> mrna.genomic_coords.intervals
[(7687376, 7687549), (7676520, 7676618), (7676381, 7676402),
(7675993, 7676271), (7675052, 7675235), (7674858, 7674970),
(7674180, 7674289), (7673700, 7673836), (7673534, 7673607),
(7670608, 7670714), (7668401, 7669689)]

# and the mrna has a product, the resulting protein:
>>> str(mrna.products[0])
'GeneCommentary(acv=NP_001119584.1,type=peptide,heading=Reference,label=isoform a)'
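As a small follow-up to the session above, the exon intervals can be post-processed with plain Python. The sketch below copies the intervals from the output shown and computes per-exon lengths. The README does not say whether the end coordinate is inclusive, so `inclusive_end` is a hypothetical flag covering both conventions; `exon_lengths` and `genomic_order` are likewise illustrative helpers, not part of eutils.

```python
# Intervals copied from the mrna.genomic_coords.intervals output above.
intervals = [
    (7687376, 7687549), (7676520, 7676618), (7676381, 7676402),
    (7675993, 7676271), (7675052, 7675235), (7674858, 7674970),
    (7674180, 7674289), (7673700, 7673836), (7673534, 7673607),
    (7670608, 7670714), (7668401, 7669689),
]


def exon_lengths(intervals, inclusive_end=False):
    """Length of each (start, end) interval under the chosen convention."""
    extra = 1 if inclusive_end else 0
    return [end - start + extra for start, end in intervals]


def genomic_order(intervals):
    """Intervals sorted by ascending start coordinate."""
    return sorted(intervals)


lengths = exon_lengths(intervals)
total_length = sum(lengths)
```

In the output above the intervals are listed by descending coordinate, which matches the reported strand of -1; `genomic_order` returns them in ascending order instead.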

Important Notes

  • You are encouraged to browse issues. Please report any issues you find.

  • Use a pip version specifier to stay within a single minor release series for API stability. For example, eutils >=0.1,<0.2.
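To make the effect of a specifier such as `>=0.1,<0.2` concrete, the sketch below checks which version strings fall inside that range. It handles only plain dotted numeric versions and is not how pip evaluates specifiers (pip follows the full PEP 440 rules); `parse_version` and `within_minor_series` are hypothetical helpers.

```python
def parse_version(text):
    """Turn a plain dotted version such as '0.1.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def within_minor_series(version, lower="0.1", upper="0.2"):
    """True if lower <= version < upper, comparing numeric components."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

With the defaults, `within_minor_series("0.1.5")` holds while `within_minor_series("0.2.0")` does not, which is the behavior the pin is meant to guarantee.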

Developing and Contributing

Contributions of bug reports, code patches, and documentation are welcome!

Development occurs in the default branch. Please work in feature branches or bookmarks from the default branch. Feature branches should be named for the eutils issue they fix, as in 121-update-xml-facades. When merging, use a commit message like “closes #121: update xml facades to new-style interface”. (“closes #n” is recognized automatically and closes the ticket upon pushing.)

The included Makefile automates many tasks. In particular, make develop prepares a development environment and make test runs the unit tests. (Please run the tests before committing!)

Again, thanks for your contributions.


