pubmed-id

Simple interface to query or scrape IDs from PubMed (The US National Library of Medicine).

This tool was originally developed to obtain temporal data for the well-known PubMed graph dataset.

Usage

Command line interface

A CLI is included for querying PubMed through its API or by web scraping.

usage: pubmed-id [-h] [-o OUTPUT_FILE] [-m {api,citedin,refs,scrape}]
                 [-w WORKERS] [-c SIZE] [--email ADDRESS] [--tool NAME]
                 [--quiet] [--log-level {critical,error,warning,info,debug}]
                 ID [ID ...]

positional arguments:
  ID                    IDs to query (separated by whitespaces).

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        File to write results to (default: 'PubMedAPI.json').
  -m {api,citedin,refs,scrape}, --method {api,citedin,refs,scrape}
                        Method to obtain data with (default: 'api').
  -w WORKERS, --max-workers WORKERS
                        Number of processes to use (optional).
  -c SIZE, --chunksize SIZE
                        Number of objects sent to each worker (optional).
  --email ADDRESS       Your e-mail address (required to query API only).
  --tool NAME           Tool name (optional, used to query API only).
  --quiet               Does not print results (limited to a single item only
                        by default).
  --log-level {critical,error,warning,info,debug}
                        Logging level (optional).
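
To drive the CLI from a script, the argument list can be assembled programmatically. The sketch below uses only flags listed in the help text above; `build_command` is a hypothetical helper written for this illustration and is not part of pubmed-id.

```python
import shlex


def build_command(ids, method="api", output_file=None, email=None, quiet=False):
    """Assemble a pubmed-id argument list from the flags in the help text."""
    if method not in {"api", "citedin", "refs", "scrape"}:
        raise ValueError(f"unsupported method: {method!r}")
    command = ["pubmed-id", "--method", method]
    if output_file is not None:
        command += ["--output-file", str(output_file)]
    if email is not None:
        command += ["--email", email]
    if quiet:
        command.append("--quiet")
    # IDs are positional arguments and go last, one argument per ID.
    command += [str(i) for i in ids]
    return command


command = build_command([6798965, 15356126], method="scrape",
                        output_file="results.json")
print(shlex.join(command))
# pubmed-id --method scrape --output-file results.json 6798965 15356126
```

With the package installed, the resulting list could be passed to `subprocess.run(command, check=True)`.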

Importing as a class

Quick example of how to set up the class for obtaining data from the API:

>>> from pubmed_id import PubMedAPI
>>> api = PubMedAPI(email="myemail@domain.com", tool="MyToolName")

For more information on the API, please check the official documentation.

Scrape data from website

Scraping the PMID or PMCID instead returns more data (strings shortened for brevity):

>>> api(6798965, method="scrape")

{
  "6798965": {
    "date": "1981 Aug 1",
    "title": "Characterization of N-glycosylated...",
    "abstract": "The N epsilon-glycosylation of...",
    "author_names": "A Le Pape;J P Muh;A J Bailey",
    "author_ids": "6798965;6798965;6798965",
    "doi": "PMC1163140",
    "pmid": "6798965"
  }
}

Note: some papers are unavailable from the API but still return data when scraped, e.g., PMID 15356126. Please follow the NCBI Usage Guidelines (see the note below).
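
The scraped record stores authors as a single semicolon-separated string and the date as text. A minimal sketch of post-processing such a record follows, assuming the field names and formats match the example output above. `split_authors` and `parse_date` are hypothetical helpers, not part of pubmed-id. Dates in other formats are returned as `None` rather than guessed, and `%b` relies on an English locale.

```python
from datetime import datetime

# Sample record copied (and trimmed) from the scrape output shown above.
record = {
    "date": "1981 Aug 1",
    "title": "Characterization of N-glycosylated...",
    "author_names": "A Le Pape;J P Muh;A J Bailey",
    "pmid": "6798965",
}


def split_authors(record):
    """Split the semicolon-separated author string into a list of names."""
    names = record.get("author_names") or ""
    return [name.strip() for name in names.split(";") if name.strip()]


def parse_date(text):
    """Parse a 'YYYY Mon D' date; return None if the format differs."""
    try:
        return datetime.strptime(text, "%Y %b %d").date()
    except ValueError:
        return None


authors = split_authors(record)
published = parse_date(record["date"])
print(authors, published)
```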

Obtain data from API

By default, the returned data is a dictionary with the PMCID, the PMID, and the DOI of a paper:

>>> api(6798965, method="api")

{
  "pmcid": "PMC1163140",
  "pmid": "6798965",
  "doi": "10.1042/bj1970405"
}

Either an integer (PMID), a string (PMID or PMCID), or a list is accepted as input when calling the class directly.
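
When IDs come from mixed sources, it can help to normalize them before passing them on. The sketch below is one way a caller might do that. `normalize_ids` and `is_pmcid` are hypothetical helpers, and the "PMC" prefix check is an assumption based on the PMCID shown above, not a description of how the library handles its inputs.

```python
def normalize_ids(ids):
    """Return a list of ID strings from an int, a str, or a list of either."""
    if isinstance(ids, (int, str)):
        ids = [ids]
    return [str(i).strip() for i in ids]


def is_pmcid(identifier):
    """Treat IDs that start with 'PMC' as PMCIDs, everything else as PMIDs."""
    return identifier.upper().startswith("PMC")


ids = normalize_ids([6798965, "PMC1163140"])
pmcids = [i for i in ids if is_pmcid(i)]
pmids = [i for i in ids if not is_pmcid(i)]
print(pmids, pmcids)
```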

Note: NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays. See more: Usage Guidelines.
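
A caller-side sketch of honoring these limits is shown below. Whether pubmed-id throttles requests internally is not stated here, so this assumes the caller is responsible. `RateLimiter` and `is_off_peak` are hypothetical names, and `is_off_peak` expects a datetime already expressed in Eastern time.

```python
import time
from datetime import datetime


class RateLimiter:
    """Space calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def is_off_peak(eastern_time):
    """True on weekends, or on weekdays from 9:00 PM up to 5:00 AM Eastern."""
    if eastern_time.weekday() >= 5:  # Saturday or Sunday
        return True
    return eastern_time.hour >= 21 or eastern_time.hour < 5


limiter = RateLimiter(max_per_second=3)
for pmid in ["6798965", "15356126"]:
    limiter.wait()
    # A real script would issue one request here, e.g. api(pmid, method="api").
```

The clock and sleep functions are injectable so the spacing logic can be checked without real delays.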

Get paper references

Returns the list of references cited by a paper:

>>> api(6798965, method="refs")

{
  "6798965": [
    "7430347",
    "..."
  ]
}

Get citations for a paper

Returns the list of papers that cite a paper:

>>> api(15356126, method="citedin")

{
  "15356126": [
    "32868408",
    "..."
  ]
}
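
Since the tool was built with a graph dataset in mind, one natural next step is to turn these results into directed citation edges. The sketch below assumes the dictionary shapes shown in the two examples above, with the elided entries left out. `edges_from_refs` and `edges_from_citedin` are hypothetical helpers, not part of pubmed-id.

```python
def edges_from_refs(refs):
    """Each paper cites its references: (citing, cited) = (paper, reference)."""
    return [(paper, ref) for paper, cited in refs.items() for ref in cited]


def edges_from_citedin(citedin):
    """Each listed paper cites the key: (citing, cited) = (citer, paper)."""
    return [(citer, paper) for paper, citers in citedin.items() for citer in citers]


# Sample inputs trimmed from the example outputs above.
refs = {"6798965": ["7430347"]}
citedin = {"15356126": ["32868408"]}

# Merge both directions into one de-duplicated, sorted edge list.
edges = sorted(set(edges_from_refs(refs) + edges_from_citedin(citedin)))
print(edges)
# [('32868408', '15356126'), ('6798965', '7430347')]
```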
