Turbocharge a PubMed literature search using citation data from the NIH
PubMed ID (PMID) Cite
Turbocharge a PubMed literature search in biomedicine, biochemistry, chemistry, behavioral science, and other life sciences by linking citation data from the National Institutes of Health (NIH) with PubMed IDs (PMIDs) using the command line rather than clicking and clicking and clicking on Google Scholar "Cited by N" links.
Table of Contents
- Quickstart on the command line
- 1) Download citation counts and data for a research paper
- 2) Forward citation search: following a paper's Cited by links or Forward snowballing
- 3) Backward citation search: following the links to a paper's references or Backward snowballing
- 4) Summarize a group of citations
- 5) Search PubMed from the command line
- Examples in Jupyter notebooks using the pmidcite Python library
- Installation & citation:
1) Download citation counts and data for a research paper
$ icite -H 26032263
- This paper (PMID 26032263) has
- This paper is performing well (74th percentile in column %) compared to its peers.
The NIH percentile grouping (column G) helps to highlight the better-performing papers by sorting the citing papers by group first, then by publication year.
The sort places the lower-performing papers, such as those in group 1, at the back.
New papers appear at the beginning of a sorted list no matter how many citations they have, which helps researchers find the latest discoveries.
Grouping papers by NIH percentile is a novel feature created by dvklopfenstein for this project.
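The group-then-year ordering described above can be sketched in Python. This is only an illustration, not pmidcite code: it assumes whitespace-separated output lines in which field 6 is the group (G) and field 7 is the year, as in the header printed by icite -H. The helper names and the sample rows are hypothetical placeholders.

```python
# Hypothetical helper that mirrors `sort -k6 -r` on icite output lines.
# Assumption: fields are whitespace-separated, field 6 is the group (G)
# and field 7 is the publication year.

# Rank the groups so that new papers (i) come first, then 4 down to 0
GROUP_RANK = {"i": 5, "4": 4, "3": 3, "2": 2, "1": 1, "0": 0}


def sort_key(line):
    """Return (group rank, year) for one icite output line."""
    fields = line.split()
    return GROUP_RANK[fields[5]], int(fields[6])


def sort_papers(lines):
    """Sort by group first (i, 4, 3, 2, 1, 0), then newest year first."""
    return sorted(lines, key=sort_key, reverse=True)


if __name__ == "__main__":
    # Placeholder rows with made-up PMIDs, counts, and titles (illustration only)
    rows = [
        "CIT 1 R. .A... 20 1 2019 3 0 30 au(A Author) Placeholder paper one",
        "CIT 2 R. .A... 95 4 2020 80 0 41 au(B Author) Placeholder paper two",
        "CIT 3 R. .A... -1 i 2022 0 0 12 au(C Author) Placeholder paper three",
        "CIT 4 R. .A... 60 2 2021 9 0 25 au(D Author) Placeholder paper four",
    ]
    for row in sort_papers(rows):
        print(row)
```

Ranking the groups explicitly, rather than comparing raw text as sort does, keeps the order independent of the shell locale.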
2) Forward citation search
Also known as following a paper's Cited by links or Forward snowballing
$ icite -H; icite 26032263 --load_citations | sort -k6 -r
$ icite -H; icite 26032263 -c | sort -k6 -r
3) Backward citation search
Also known as following links to a paper's references or Backward snowballing
$ icite -H; icite 26032263 --load_references | sort -k6 -r
$ icite -H; icite 26032263 -r | sort -k6 -r
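For scripted searches, the forward and backward commands can be assembled in Python before being run. The sketch below only builds argument lists from the flags shown in this README (--load_citations, --load_references, -o); the function name icite_command is a hypothetical helper, not part of pmidcite.

```python
import shlex


def icite_command(pmid, citations=False, references=False, outfile=None):
    """Build the argument list for one icite call (hypothetical helper)."""
    cmd = ["icite", str(pmid)]
    if citations:
        cmd.append("--load_citations")   # same as -c: forward citation search
    if references:
        cmd.append("--load_references")  # same as -r: backward citation search
    if outfile is not None:
        cmd += ["-o", outfile]           # write the results to a file
    return cmd


if __name__ == "__main__":
    # Print the commands; to execute one, pass the list to
    # subprocess.run(cmd, check=True) on a machine with pmidcite installed.
    print(shlex.join(icite_command(26032263, citations=True)))
    print(shlex.join(icite_command(26032263, references=True)))
```

Building a list rather than a shell string avoids quoting problems when the output file name contains spaces.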
4) Summarize a group of citations
- 4a) Examine a paper with PMID 30022098. Print the column headers (-H):
  $ icite -H 30022098
- 4b) Download the details about each paper (-c) that cites 30022098 into a file (-o goatools_cites.txt):
  $ icite 30022098 -c -o goatools_cites.txt
- 4c) Summarize the overall performance of the 300+ citing papers contained in goatools_cites.txt:
  $ summarize_papers goatools_cites.txt -p TOP CIT CLI
4a) Examine a paper with PMID 30022098. Print the column headers (-H):
$ icite -H 30022098
COL 2        3  4       5 6    7   8   9  10 au(authors)
TYP PMID     RP HAMCc   % G YEAR cit cli ref au(authors) title
TOP 30022098 R. .A..c 100 4 2018 318   1  23 au(D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
Paper with PMID 30022098 is cited by 318 (cit) other research papers and 1 (cli) clinical study. It has 23 (ref) references.
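To work with these columns in a script, one output line can be split into named fields. The following is a minimal sketch that assumes the whitespace-separated layout of the header row (TYP PMID RP HAMCc % G YEAR cit cli ref, then authors and title); the Paper class and parse_line function are illustrative names, not the pmidcite API.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """One icite output line (field names follow the printed header)."""
    typ: str          # TOP, CIT, or CLI
    pmid: int
    percentile: int   # column %
    group: str        # column G
    year: int
    cit: int          # number of citing research papers
    cli: int          # number of citing clinical studies
    ref: int          # number of references
    rest: str         # authors and title, left unparsed


def parse_line(line):
    """Split one line into a Paper; assumes ten leading fields before the authors."""
    fields = line.split(maxsplit=10)
    return Paper(
        typ=fields[0], pmid=int(fields[1]),
        percentile=int(fields[4]), group=fields[5], year=int(fields[6]),
        cit=int(fields[7]), cli=int(fields[8]), ref=int(fields[9]),
        rest=fields[10],
    )


if __name__ == "__main__":
    line = ("TOP 30022098 R. .A..c 100 4 2018 318 1 23 au(D V Klopfenstein) "
            "GOATOOLS: A Python library for Gene Ontology analyses.")
    paper = parse_line(line)
    print(paper.pmid, paper.cit, paper.cli, paper.ref)
```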
4b) Download the details about each paper (-c) that cites 30022098 into a file (-o goatools_cites.txt):
$ icite 30022098 -c -o goatools_cites.txt
The requested paper (PMID=30022098) is described in one line in goatools_cites.txt:
$ grep TOP goatools_cites.txt
TOP 30022098 R. .A..c 100 4 2018 318   1  23 au(D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
The paper (PMID=30022098) is cited by 318 (CIT) research papers plus 1 (CLI) clinical study:
$ grep CIT goatools_cites.txt | wc -l
318
$ grep CLI goatools_cites.txt | wc -l
1
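The same tallies can be produced in one pass with Python. This sketch assumes each line of the icite output file starts with its type (TOP, CIT, or CLI); count_types is a hypothetical helper name and the CIT and CLI rows in the demo are made-up placeholders.

```python
from collections import Counter


def count_types(lines):
    """Count lines by their first field, similar to `grep TYPE file | wc -l`."""
    return Counter(line.split(maxsplit=1)[0] for line in lines if line.strip())


if __name__ == "__main__":
    # The TOP line is from this README; the other rows are placeholders
    lines = [
        "TOP 30022098 R. .A..c 100 4 2018 318 1 23 au(D V Klopfenstein) GOATOOLS: "
        "A Python library for Gene Ontology analyses.",
        "CIT 1 R. .A... 50 2 2020 4 0 30 au(A Author) Placeholder paper one",
        "CIT 2 R. .A... 70 3 2021 9 0 28 au(B Author) Placeholder paper two",
        "CLI 3 R. .A... 40 2 2019 2 0 15 au(C Author) Placeholder clinical study",
    ]
    print(count_types(lines))
```

To count a real file, pass the open file handle, for example count_types(open("goatools_cites.txt")). Unlike grep, which matches the pattern anywhere in a line, this counts only the leading type field, so a title that happens to contain "CIT" is not miscounted.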
4c) Summarize all the papers in goatools_cites.txt
NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see? Open an issue to comment.
$ summarize_papers goatools_cites.txt -p TOP CIT CLI
i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 papers goatools_cites.txt
- Output is on one line so many files containing sets of PMIDs may be compared. TBD: Add multiline verbose option.
- The groups are listed from newest (i) and top-performing (4), through 3 and very good (2), to overlooked (the lowest groups).
- The percentage of papers from goatools_cites.txt in each group follows the group name, for example i=033.4%.
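Because each summary is a single line, several summaries can be compared programmatically. The sketch below parses the group=percent pairs from one summary line; it assumes the format of the example output above, and parse_summary is a hypothetical helper name.

```python
import re


def parse_summary(line):
    """Return {group: percent} parsed from a one-line summarize_papers summary."""
    pairs = re.findall(r"\b(\w)=(\d+\.\d+)%", line)
    return {group: float(percent) for group, percent in pairs}


if __name__ == "__main__":
    summary = ("i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% "
               "4 years:2018-2022 320 papers goatools_cites.txt")
    for group, percent in parse_summary(summary).items():
        print(group, percent)
```

The dictionary keeps the groups in the order they appear on the line, so the output runs from i down to 0.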
5) Download citations for all papers returned from a PubMed search
- Do a search in PubMed
- Save all results into a file containing all PMIDs found by the search
- Download the list of PMIDs
- Run icite to analyze all the PMIDs
$ icite -i pmid-HIVANDDNAm-set.txt -o pmid-HIVANDDNAm-icite.txt
$ grep TOP pmid-HIVANDDNAm-icite.txt | sort -k6
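Before handing a saved PMID list to icite, it can be useful to check it. The sketch below assumes the file downloaded from PubMed holds one PMID per line, which is an assumption about the export format; read_pmids is a hypothetical helper, not part of pmidcite.

```python
def read_pmids(text):
    """Return unique PMIDs, in first-seen order, from text with one PMID per line."""
    seen = {}
    for line in text.splitlines():
        token = line.strip()
        if token.isdigit():          # skip blank lines and anything non-numeric
            seen.setdefault(int(token), None)
    return list(seen)


if __name__ == "__main__":
    # Both PMIDs appear in this README; the duplicate and blank line are deliberate
    sample = "30022098\n26032263\n\n30022098\n"
    print(read_pmids(sample))
```

For a real file, read it first, for example read_pmids(open("pmid-HIVANDDNAm-set.txt").read()).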
Command Line Interface (CLI)
A Command-Line Interface (CLI) can be preferable to a Graphical User Interface (GUI) because:
- processing can be automated from a script
- time-consuming mouse clicking is reduced
- more data can be seen at once on a text screen than in a browser, giving the researcher a better overall impression of the full set of information 
Researchers who use Linux or Mac already work from the command line. Researchers who use Windows can get that Linux-like command line feeling while still running native Windows programs by downloading Cygwin from https://www.cygwin.com/ .
PubMed vs Google Scholar
In 2013, Boeker et al. recommended that a scientific search interface contain five integrated search criteria. PubMed implements all five, while Google Scholar did not in 2013 and does not today.
Google's highly popular implementation of the forward citation search through its ubiquitous "Cited by N" links is a "better" experience than PubMed's forward citation search implementation.
But if your research is in the health sciences and you are amenable to working from the command line, you can use PubMed in your browser plus citation data downloaded from the NIH on the command line with pmidcite. The NIH's citation data includes a paper's ranking among its co-citation network.
What is in PubMed? Take a quick tour
PubMed is a search interface and toolset used to access over 30.5 million article records from databases such as:
- MEDLINE: a highly selective database started in the 1960s
- PubMed Central (PMC): an open-access database for full-text papers that are free of cost
- Additional content such as books and articles published before the 1960s
To install from PyPI
$ pip3 install pmidcite
To install locally
$ git clone https://github.com/dvklopfenstein/pmidcite.git
$ cd ./pmidcite
$ pip3 install .
Save your literature search in a GitHub repo.
1. Add a pmidcite init file
Add a .pmidciterc init file to a non-git managed directory, such as home (~)
$ icite --generate-rcfile | tee ~/.pmidciterc
[pmidcite]
email = firstname.lastname@example.org
# To download PubMed search results, get an NCBI API key here:
# https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities
apikey = MY_LONG_HEX_NCBI_API_KEY
tool = my_scripts
$ export PMIDCITECONF=~/.pmidciterc
Do not put .pmidciterc under version control (for example, in a GitHub repo) because it contains your personal email and your private NCBI API key.
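A script can check the init file before making requests. The sketch below reads the [pmidcite] section with Python's standard configparser and flags the placeholder API key. It assumes the file follows the INI layout shown above; how pmidcite itself locates and parses the file is not specified here, and the function names are hypothetical.

```python
import configparser
import os


def rcfile_path():
    """Return the init file path: $PMIDCITECONF if set, else ~/.pmidciterc (assumed default)."""
    return os.path.expanduser(os.environ.get("PMIDCITECONF", "~/.pmidciterc"))


def read_settings(text):
    """Parse init-file text and return the [pmidcite] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["pmidcite"])


def has_real_apikey(settings):
    """Return False while the apikey is missing or still the generated placeholder."""
    key = settings.get("apikey", "")
    return bool(key) and key != "MY_LONG_HEX_NCBI_API_KEY"


if __name__ == "__main__":
    sample = (
        "[pmidcite]\n"
        "email = firstname.lastname@example.org\n"
        "# To download PubMed search results, get an NCBI API key\n"
        "apikey = MY_LONG_HEX_NCBI_API_KEY\n"
        "tool = my_scripts\n"
    )
    settings = read_settings(sample)
    print(rcfile_path())
    print(settings["email"], has_real_apikey(settings))
```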
2. NCBI E-Utils API key
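To download PubMed abstracts and PubMed search results using NCBI's E-Utils, get an NCBI API key using these instructions:
https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities
Then set the apikey value in the config file (~/.pmidciterc).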
See the contributing guide for detailed instructions on how to get started contributing to the pmidcite project.
How to Cite
If you use pmidcite in your research or literature search, please cite paper 1 (pmidcite) and paper 3 (NIH citation data).
Please also consider reading and citing Gusenbauer's response (paper 2) about improving search for all during the information avalanche of these times:
1. The pmidcite paper:
   Commentary to Gusenbauer and Haddaway 2020: Evaluating Retrieval Qualities of PubMed and Google Scholar
   Klopfenstein DV and Dampier W
   2020 | Research Synthesis Methods | PMID: 33031632 | DOI: 10.1002/jrsm.1456
2. Gusenbauer's response to the pmidcite paper:
   What every Researcher should know about Searching – Clarified Concepts, Search Advice, and an Agenda to improve Finding in Academia
   Gusenbauer M and Haddaway N
   2020 | Research Synthesis Methods | PMID: 33031639 | DOI: 10.1002/jrsm.1457
3. The NIH citation data used by pmidcite (Scientific Influence, Translation, and Citation counts):
   The NIH Open Citation Collection: A public access, broad coverage resource
   Hutchins BI ... Santangelo GM
   2019 | PLoS Biology | PMID: 31600197 | DOI: 10.1371/journal.pbio.3000385
Please consider reading and citing the paper which inspired the creation of pmidcite and the authors' response to our paper:
- Which Academic Search Systems are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed and 26 other Resources
Gusenbauer M and Haddaway N
2019 | Research Synthesis Methods | PMID: 31614060 | DOI: 10.1002/jrsm.1378
Mentioned in this README are also these outstanding contributions:
Relative Citation Ratio (RCR): A New Metric That Uses Citation Rates to Measure Influence at the Article Level
Hutchins BI, Yuan X, Anderson JM, and Santangelo GM
2016 | PLoS Biology | PMID: 27599104 | DOI: 10.1371/journal.pbio.1002541
Google Scholar as replacement for systematic literature searches: good relative recall and precision are not enough
Boeker M et al.
2013 | BMC Medical Research Methodology | PMID: 24160679 | DOI: 10.1186/1471-2288-13-131
- PMIDCITE Manuscript with the original text box formatting
- Gusenbauer's Response
Copyright (C) 2019-present pmidcite, DV Klopfenstein, PhD. All rights reserved.