ncbiutils

Retrieve article records from NCBI via E-utilities

Making retrieval of records from the National Center for Biotechnology Information (NCBI) E-Utilities simpler.

Installation

Set up a virtual environment. Here, we use miniconda to create an environment named testenv:

$ conda create --name testenv python=3.8
$ conda activate testenv

Then install the package in the testenv environment:

$ pip install ncbiutils

Usage

The ncbiutils module exposes a PubMedFetch class, an easy-to-configure and easy-to-use wrapper for the EFetch E-Utility. By default, PubMedFetch retrieves PubMed article records, each referenced by its PubMed identifier (PMID).

from ncbiutils.ncbiutils import PubMedFetch
import json

# Initialize a list of PubMed identifiers for the records we wish to retrieve
uids = ['16186693', '29083299']

# Create an instance, optionally provide an E-Utility API key
pubmed_fetch = PubMedFetch()

# Retrieve the records
# Returns a generator that yields results for a chunk of the input PMIDs (see Options)
chunks = pubmed_fetch.get_citations(uids)

# Iterate over the results
for chunk in chunks:
    # A Chunk is a namedtuple with 3 fields:
    #   - error: Includes network errors as well as HTTP status >=400
    #   - citations: article records, each wrapped as a Citation
    #   - ids: input ids for chunk
    error, citations, ids = chunk

    # Citation class can be represented as a dict
    print(json.dumps(citations[0].dict()))

# Output as JSON
{
   "pmid":"16186693",
   "pmc":"None",
   "doi":"10.1159/000087186",
   "title":"Searching the MEDLINE literature database through PubMed: a short guide.",
   "abstract":"The Medline database from the National Library of Medicine (NLM) contains more than 12 million bibliographic citations from over 4,600 international biomedical journals...",
   "author_list":[
      {
         "fore_name":"Edith",
         "last_name":"Motschall",
         "initials":"E",
         "collective_name":"None",
         "orcid":"None",
         "affiliations":[
            "Institut für Medizinische Biometrie und Medizinische Informatik, Universität Freiburg, Germany. motschall@mi.ukl.uni-freiburg.de"
         ],
         "emails":[
            "motschall@..."
         ]
      },
      ...
   ],
   "journal":{
      "title":"Onkologie",
      "issn":[
         "0378-584X"
      ],
      "volume":"28",
      "issue":"10",
      "pub_year":"2005",
      "pub_month":"Oct",
      "pub_day":"None"
   },
   "publication_type_list":[
      "D016428",
      "D016454"
   ],
   "correspondence":[],
   "mesh_list":[
      {
         "descriptor_name":{
            "ui":"D003628",
            "value":"Database Management Systems"
         }
      },
      {
         "descriptor_name":{
            "ui":"D016206",
            "value":"Databases, Bibliographic"
         }
      },
      {
         "descriptor_name":{
            "ui":"D016247",
            "value":"Information Storage and Retrieval"
         },
         "qualifier_name":[
            {
               "ui":"Q000379",
               "value":"methods"
            }
         ]
      },
     ...
   ]
}
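
Since Citation appears to be a pydantic-style model (note the dict() call above), its fields should also be accessible as attributes. A minimal sketch of consuming results, using only the Chunk structure and field names shown above:

for error, citations, ids in pubmed_fetch.get_citations(uids):
    if error is not None:
        # A network failure or an HTTP status >= 400 for this chunk
        print(f'Failed to retrieve {ids}: {error}')
        continue
    for citation in citations:
        # Field names follow the JSON output above
        print(citation.pmid, citation.title)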

Options

Configure the PubMedFetch instance through its constructor:

  • db : DbEnum
    • Set the database to process: either PMC records (<!DOCTYPE pmc-articleset ...>) or PubMed records (<!DOCTYPE PubmedArticleSet ...>, the default)
  • retmax : int
    • Maximum number of records to return in a chunk (default/max 10000)
  • api_key : str
    • API key for NCBI E-Utilities; supplying a key permits a higher request rate
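
For example, a configured instance might look like the following. This is a sketch using only the options documented above; 'my-api-key' is a placeholder, and the db option is omitted since the DbEnum members aren't listed here:

from ncbiutils.ncbiutils import PubMedFetch

pubmed_fetch = PubMedFetch(
    retmax=5000,           # yield chunks of up to 5000 records
    api_key='my-api-key',  # optional NCBI E-Utilities API key
)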


Testing

As this project was built with poetry, you'll need to install poetry to get its development dependencies.

Once installed, clone the GitHub repository:

$ git clone https://github.com/PathwayCommons/ncbiutils
$ cd ncbiutils

Install the project:

$ poetry install

Run the test script:

$ ./test.sh

Under the hood, the tests are run with pytest. The test script also does a lint check with flake8 and type check with mypy.
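
If you'd rather invoke the tools individually, the rough equivalents are as follows (the exact flags live in test.sh):

$ poetry run pytest
$ poetry run flake8
$ poetry run mypy .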

Publishing a release

A GitHub workflow will automatically version and release this package to PyPI following a push directly to main or the merge of a pull request into main. By default, a push or merge to main bumps the patch version.

We use Python Semantic Release (PSR) to manage versioning. When commits follow a well-defined message structure, PSR scans the messages and bumps the version in accordance with semver.

For a patch bump:

$ git commit -m "fix(ncbiutils): some comment for this patch version"

For a minor bump:

$ git commit -m "feat(ncbiutils): some comment for this minor version bump"

For a major version bump (a release with breaking changes):

$ git commit -m "feat(mod_plotting): some comment for this release\n\nBREAKING CHANGE: other footer text."
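
Note that many shells pass the \n characters above through literally rather than as newlines. To guarantee the BREAKING CHANGE footer lands in its own paragraph, you can pass a second -m flag, which git records as a separate paragraph:

$ git commit -m "feat(mod_plotting): some comment for this release" -m "BREAKING CHANGE: other footer text."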


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ncbiutils-0.7.1.tar.gz (14.0 kB)

Uploaded Source

Built Distribution

ncbiutils-0.7.1-py3-none-any.whl (13.7 kB)

Uploaded Python 3

File details

Details for the file ncbiutils-0.7.1.tar.gz.

File metadata

  • Download URL: ncbiutils-0.7.1.tar.gz
  • Size: 14.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for ncbiutils-0.7.1.tar.gz:

  • SHA256: 365227162881d84e9d87873ad5cdbb0d5a6f24392d904aecfcfef1e938577cbf
  • MD5: e5f3a72542ee535f54abbf8d3bd32cc1
  • BLAKE2b-256: 42cec63613e5324adc28b0efe22f560cb9a6df1e50d9dc8b23077a33aa080d51


File details

Details for the file ncbiutils-0.7.1-py3-none-any.whl.

File metadata

  • Download URL: ncbiutils-0.7.1-py3-none-any.whl
  • Size: 13.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for ncbiutils-0.7.1-py3-none-any.whl:

  • SHA256: 2e58949ce20d8a9134d155b49d9819054835e971dbb85f2b7a68717eb39e9caf
  • MD5: 7d599ed85bd749e2d650cc51c7b4c799
  • BLAKE2b-256: cd7d17595a3645f5ad108a82a88413b5c276aecc5bfa73d1a885ce0b1bbc9c9f

