pubmed-id

Simple interface to query or scrape IDs from PubMed (The US National Library of Medicine).

This tool was originally developed to obtain temporal data for the well-known PubMed graph dataset.

Usage

Command line interface

A CLI is included for querying PubMed, either through its API or by web scraping.

usage: pubmed-id [-h] [-o OUTPUT_FILE] [-m {api,citedin,refs,scrape}]
                 [-w WORKERS] [-c SIZE] [--email ADDRESS] [--tool NAME]
                 [--quiet] [--log-level {critical,error,warning,info,debug}]
                 ID [ID ...]

positional arguments:
  ID                    IDs to query (separated by whitespaces).

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        File to write results to (default: 'PubMedAPI.json').
  -m {api,citedin,refs,scrape}, --method {api,citedin,refs,scrape}
                        Method to obtain data with (default: 'api').
  -w WORKERS, --max-workers WORKERS
                        Number of processes to use (optional).
  -c SIZE, --chunksize SIZE
                        Number of objects sent to each worker (optional).
  --email ADDRESS       Your e-mail address (required to query API only).
  --tool NAME           Tool name (optional, used to query API only).
  --quiet               Does not print results (limited to a single item only
                        by default).
  --log-level {critical,error,warning,info,debug}
                        Logging level (optional).
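As an illustration of how the flags above combine, the following sketch assembles a command line from Python. It uses only options listed in the help text; the helper name `build_command` and the file name `results.json` are placeholders chosen for this example, not part of the package.

```python
import shlex


def build_command(ids, email, method="api", output_file=None):
    """Assemble a pubmed-id argument list (hypothetical helper, not part of the package)."""
    argv = ["pubmed-id", "--method", method, "--email", email]
    if output_file is not None:
        argv += ["--output-file", output_file]
    # Positional IDs come last, one argument per ID.
    argv += [str(i) for i in ids]
    return argv


argv = build_command(
    [6798965, 15356126],
    email="myemail@domain.com",
    output_file="results.json",
)
# Print a shell-ready string; pass `argv` to subprocess.run to execute it instead.
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` once the package is installed.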

Importing as a class

A quick example of how to set up the class for querying the API:

>>> from pubmed_id import PubMedAPI
>>> api = PubMedAPI(email="myemail@domain.com", tool="MyToolName")

For more information on the API, please check the official documentation.

Obtain data from API

By default, the returned data is a dictionary with the PMCID, the PMID, and the DOI of a paper:

>>> api(6798965)

{
  "pmcid": "PMC1163140",
  "pmid": "6798965",
  "doi": "10.1042/bj1970405"
}

When called directly, the class accepts an integer (PMID), a string (PMID or PMCID), or a list of IDs.
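To make the accepted input forms concrete, here is a minimal sketch of how such inputs could be normalized before a query. Both `normalize_ids` and `is_pmcid` are hypothetical helpers written for this example; they are not the package's own code, and the package may handle inputs differently.

```python
from typing import Iterable, List, Union

IdLike = Union[int, str]


def normalize_ids(ids: Union[IdLike, Iterable[IdLike]]) -> List[str]:
    """Return a flat list of ID strings from an int, a str, or an iterable of either."""
    if isinstance(ids, (int, str)):
        ids = [ids]
    return [str(i).strip() for i in ids]


def is_pmcid(identifier: str) -> bool:
    """Assumption: a PMCID is 'PMC' followed by digits, while a PMID is digits only."""
    return identifier.upper().startswith("PMC") and identifier[3:].isdigit()


print(normalize_ids(6798965))
print(normalize_ids(["PMC1163140", 6798965]))
print(is_pmcid("PMC1163140"))
```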

Note: NCBI recommends that users post no more than three URL requests per second and limit large jobs to weekends or to weekdays between 9:00 PM and 5:00 AM Eastern time. See the NCBI Usage Guidelines for details.
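One way to stay within the three-requests-per-second recommendation in a single process is to space out calls. The sketch below is an assumption about how a caller might do this; `RateLimiter` is a hypothetical class written for this example, not something the package provides.

```python
import time


class RateLimiter:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last = None  # start time of the previous call, if any

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter(rate=3.0)
for pmid in ["6798965", "15356126"]:
    limiter.wait()
    # A real query such as api(pmid) would go here.
    print("ready to query", pmid)
```

This only throttles one process. When several workers run in parallel, each would need a proportionally lower rate or a shared limiter.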

Scrape data from website

Scraping by PMID or PMCID instead returns more data (strings shortened for brevity):

>>> api(6798965, method="scrape")

{
  "6798965": {
    "date": "1981 Aug 1",
    "title": "Characterization of N-glycosylated...",
    "abstract": "The N epsilon-glycosylation of...",
    "author_names": "A Le Pape;J P Muh;A J Bailey",
    "author_ids": "6798965;6798965;6798965",
    "doi": "PMC1163140",
    "pmid": "6798965"
  }
}
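The scraped record stores authors as a semicolon-separated string and the date as plain text. A short sketch of post-processing these fields follows; the helper names are hypothetical, and the date parser assumes the "YYYY Mon D" layout shown above, which may not hold for every record.

```python
from datetime import datetime

# Two fields copied from the sample record above.
record = {
    "date": "1981 Aug 1",
    "author_names": "A Le Pape;J P Muh;A J Bailey",
}


def split_field(value, sep=";"):
    """Split a delimited string into a list, dropping empty entries."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def parse_date(value):
    """Parse a 'YYYY Mon D' string; other layouts would need extra handling."""
    return datetime.strptime(value, "%Y %b %d").date()


authors = split_field(record["author_names"])
published = parse_date(record["date"])
print(authors, published.isoformat())
```

Note that `%b` matches abbreviated month names in the current locale, so this assumes an English or C locale.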

Note: some papers are unavailable from the API, but still return data when scraped, e.g., PMID 15356126.

Get paper references

Returns the list of references cited by a paper:

>>> api(6798965, method="refs")

{
  "6798965": [
    "7430347",
    "..."
  ]
}

Get citations for a paper

Returns the list of papers that cite a paper:

>>> api(15356126, method="citedin")

{
  "15356126": [
    "32868408",
    "..."
  ]
}
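Since both methods return a mapping from a PMID to a list of PMIDs, the two results can be merged into a directed citation edge list. The sketch below shows one possible way, using the sample values above with the elided entries left out; the function names are hypothetical and not part of the package.

```python
def reference_edges(refs):
    """Turn {pmid: [referenced pmids]} into (citing, cited) pairs."""
    return [(pmid, ref) for pmid, targets in refs.items() for ref in targets]


def citation_edges(citedin):
    """Turn {pmid: [citing pmids]} into (citing, cited) pairs."""
    return [(citing, pmid) for pmid, sources in citedin.items() for citing in sources]


# Sample values taken from the outputs above.
refs = {"6798965": ["7430347"]}
citedin = {"15356126": ["32868408"]}

# A set removes edges reported by both methods.
edges = sorted(set(reference_edges(refs)) | set(citation_edges(citedin)))
print(edges)
```

Every edge points from the citing paper to the cited one, so the two sources can be combined without tracking which method produced an edge.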
