
Simple interface to query or scrape IDs from PubMed.


pubmed-id

Simple interface to query or scrape IDs from PubMed, the biomedical literature database of the US National Library of Medicine.

This tool was originally developed to obtain temporal data for the well-known PubMed graph dataset.

Usage

Command line interface

A command-line interface (CLI) is included for querying PubMed through its API or by web scraping.

usage: pubmed-id [-h] [-o OUTPUT_FILE] [-m METHOD] [-w WORKERS] [-c SIZE]
                 [--email ADDRESS] [--tool NAME] [--quiet]
                 ID [ID ...]

positional arguments:
  ID                    IDs to query (separated by whitespaces).

options:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        File to write results to (default: 'PubMedAPI.json').
  -m METHOD, --method METHOD
                        Method to obtain data with (default: 'api'). Choices:
                        ('api', 'citedin', 'refs', 'scrape').
  -w WORKERS, --max-workers WORKERS
                        Number of processes to use (optional).
  -c SIZE, --chunksize SIZE
                        Number of objects sent to each worker (optional).
  --email ADDRESS       Your e-mail address (required to query API only).
  --tool NAME           Tool name (optional, used to query API only).
  --quiet               Does not print results (limited to a single item only
                        by default).
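To drive the CLI from a Python script (for example, to batch many IDs), one option is to build the argument list from the documented flags and pass it to `subprocess.run`. This is a minimal sketch: `build_command` and `run_cli` are hypothetical helpers, and the assumption that the output file holds JSON is inferred only from the default name `PubMedAPI.json`.

```python
import json
import subprocess
from pathlib import Path


def build_command(ids, method="api", output_file="PubMedAPI.json",
                  email=None, tool=None, max_workers=None, quiet=True):
    """Build an argv list for the pubmed-id CLI (hypothetical helper)."""
    cmd = ["pubmed-id", "--method", method, "--output-file", output_file]
    if method == "api":
        # The help text marks --email as required for API queries only.
        if email is None:
            raise ValueError("--email is required when method='api'")
        cmd += ["--email", email]
        if tool is not None:
            cmd += ["--tool", tool]
    if max_workers is not None:
        cmd += ["--max-workers", str(max_workers)]
    if quiet:
        cmd.append("--quiet")
    # Positional IDs go last, one argument per ID.
    cmd += [str(i) for i in ids]
    return cmd


def run_cli(ids, **kwargs):
    """Run the CLI, then load its output file (assumed to be JSON)."""
    subprocess.run(build_command(ids, **kwargs), check=True)
    output_file = kwargs.get("output_file", "PubMedAPI.json")
    return json.loads(Path(output_file).read_text())
```

Keeping the argument list as a Python list (rather than a shell string) avoids any quoting issues with e-mail addresses or file names.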

Importing as a class

A quick example of how to obtain data from the API:

>>> from pubmed_id import PubMedAPI
>>>
>>> api = PubMedAPI(email="myemail@domain.com", tool="MyToolName")

For more information on the API, please check the official documentation.

Obtain data from API

By default, the returned data is a dictionary with the PMCID, the PMID, and the DOI of a paper:

>>> api(6798965)

{
  "pmcid": "PMC1163140",
  "pmid": "6798965",
  "doi": "10.1042/bj1970405"
}
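The `email` and `tool` arguments follow the convention NCBI asks for on its web APIs, including the PMC ID Converter, which returns exactly this PMCID/PMID/DOI triple. It is therefore plausible, though not documented here, that the default `api` method wraps that service. Under that assumption, the sketch below builds a request URL and reduces a JSON response to the dictionary shape shown above. The endpoint URL and the `records` response field are our assumptions based on NCBI's public documentation and may have changed; `build_idconv_url`, `parse_idconv`, and `fetch_ids` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; check NCBI's current ID Converter documentation.
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def build_idconv_url(ids, email, tool):
    """Build an ID Converter request for one or more PMIDs/PMCIDs."""
    params = {
        "ids": ",".join(str(i) for i in ids),  # comma-separated ID list
        "format": "json",
        "tool": tool,
        "email": email,
    }
    return f"{IDCONV_URL}?{urlencode(params)}"


def parse_idconv(payload):
    """Keep only the pmcid, pmid, and doi fields of each returned record."""
    keys = ("pmcid", "pmid", "doi")
    return [{k: rec.get(k) for k in keys} for rec in payload.get("records", [])]


def fetch_ids(ids, email, tool):
    """Perform the HTTP request and parse the (assumed) JSON response."""
    with urlopen(build_idconv_url(ids, email, tool), timeout=30) as resp:
        return parse_idconv(json.load(resp))
```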

The instance accepts an integer (PMID), a string (PMID or PMCID), or a list of these as input when called directly.
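If you wrap the class in your own code, it can help to validate and normalize these mixed inputs before calling it. A minimal sketch, assuming a PMID is an all-digit string and a PMCID is `PMC` followed by digits; `normalize_ids` is a hypothetical helper, not part of `pubmed_id`.

```python
import re

PMID_RE = re.compile(r"[0-9]+")
PMCID_RE = re.compile(r"PMC([0-9]+)", re.IGNORECASE)


def normalize_ids(value):
    """Return a list of (kind, id) pairs from an int, str, or list input."""
    items = value if isinstance(value, (list, tuple)) else [value]
    result = []
    for item in items:
        text = str(item).strip()
        pmcid = PMCID_RE.fullmatch(text)
        if PMID_RE.fullmatch(text):
            result.append(("pmid", text))
        elif pmcid:
            # Canonicalize the prefix to upper case, e.g. "pmc1" -> "PMC1".
            result.append(("pmcid", "PMC" + pmcid.group(1)))
        else:
            raise ValueError(f"not a PMID or PMCID: {item!r}")
    return result
```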

Note: NCBI recommends that users post no more than three URL requests per second and run large jobs either on weekends or between 9:00 PM and 5:00 AM Eastern time on weekdays. See more: Usage Guidelines.
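A client that follows these guidelines could throttle its own calls with a small sliding-window limiter and check whether a large job falls in the recommended off-peak window. This is a sketch under our own assumptions: `RateLimiter` and `is_off_peak` are hypothetical helpers (the package may or may not throttle requests itself), and `is_off_peak` expects a datetime already expressed in US Eastern time, e.g. `datetime.now(ZoneInfo("America/New_York"))`.

```python
import time
from datetime import datetime


class RateLimiter:
    """Allow at most `max_calls` calls to start within any `period` seconds."""

    def __init__(self, max_calls=3, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable for testing
        self.sleep = sleep
        self.calls = []  # start times of recent calls

    def wait(self):
        """Sleep if needed, then record the current call."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the sliding window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))


def is_off_peak(eastern_now: datetime) -> bool:
    """True on weekends, or 9 PM to 5 AM on weekdays (US Eastern time)."""
    if eastern_now.weekday() >= 5:  # Saturday=5, Sunday=6
        return True
    return eastern_now.hour >= 21 or eastern_now.hour < 5
```

Calling `limiter.wait()` before each `api(...)` call keeps a loop at or under three requests per second.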

Scrape data from website

Scraping the PMID or PMCID instead returns more data (strings shortened for brevity):

>>> api(6798965, method="scrape")

{
  "6798965": {
    "date": "1981 Aug 1",
    "title": "Characterization of N-glycosylated...",
    "abstract": "The N epsilon-glycosylation of...",
    "author_names": "A Le Pape;J P Muh;A J Bailey",
    "author_ids": "6798965;6798965;6798965",
    "doi": "PMC1163140",
    "pmid": "6798965"
  }
}
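The scraped author fields are single strings joined with semicolons. A minimal sketch that splits them back into lists, assuming `;` never occurs inside an author name or ID; `split_authors` is a hypothetical helper that takes one inner record, e.g. `result["6798965"]`.

```python
def split_authors(record):
    """Split semicolon-joined author fields into two parallel lists."""
    names = [n.strip() for n in record.get("author_names", "").split(";") if n.strip()]
    ids = [i.strip() for i in record.get("author_ids", "").split(";") if i.strip()]
    return names, ids
```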

Note: some papers are unavailable through the API but still return data when scraped (e.g., PMID 15356126).
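Since some papers are only reachable by scraping, a caller might try the API first and fall back to scraping. A sketch, assuming an unavailable paper shows up as an empty result or an exception from the default method (this README does not say which); `fetch_with_fallback` is a hypothetical helper and `api` is any `PubMedAPI` instance.

```python
def fetch_with_fallback(api, pmid):
    """Query the API; on an empty result or an error, scrape instead."""
    try:
        result = api(pmid)
    except Exception:  # assumption: failures may surface as exceptions
        result = None
    if result:
        return "api", result
    return "scrape", api(pmid, method="scrape")
```

Returning the method name alongside the data makes it easy to record which papers needed the slower scraping path.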

Get paper references

Returns the list of references cited by a paper:

>>> api(6798965, method="refs")

{
  "6798965": [
    "7430347",
    "..."
  ]
}
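Since the tool was built to gather data for a citation graph, a natural next step is turning the `refs` output (a paper mapped to the PMIDs it cites) into directed edges. A sketch, assuming the dictionary shape shown above; `refs_to_edges` is a hypothetical helper.

```python
def refs_to_edges(refs):
    """Turn {citing_pmid: [cited_pmid, ...]} into (citing, cited) edges."""
    return [(str(citing), str(cited))
            for citing, cited_list in refs.items()
            for cited in cited_list]
```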

Get citations for a paper

Returns the list of papers that cite a given paper:

>>> api(15356126, method="citedin")

{
  "15356126": [
    "32868408",
    "..."
  ]
}
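The `citedin` output points the other way (a paper mapped to the papers that cite it), so its pairs must be reversed to keep the citing-to-cited direction that `refs` naturally gives. A sketch under the same assumption about the dictionary shape; `citedin_to_edges` and `merge_edges` are hypothetical helpers, the latter removing duplicates when both queries report the same citation.

```python
def citedin_to_edges(citedin):
    """Turn {cited_pmid: [citing_pmid, ...]} into (citing, cited) edges."""
    return [(str(citing), str(cited))
            for cited, citing_list in citedin.items()
            for citing in citing_list]


def merge_edges(*edge_lists):
    """Union several edge lists, dropping duplicates in first-seen order."""
    seen = {}
    for edges in edge_lists:
        for edge in edges:
            seen.setdefault(edge, None)
    return list(seen)
```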




Download files


Source Distribution

pubmed_id-1.0.tar.gz (8.3 kB)

Built Distribution


pubmed_id-1.0-py3-none-any.whl (7.6 kB)

File details

Details for the file pubmed_id-1.0.tar.gz.

File metadata

  • Download URL: pubmed_id-1.0.tar.gz
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for pubmed_id-1.0.tar.gz

  Algorithm     Hash digest
  SHA256        2ffc9df66a5429484420c9327a6816a47b9fe48639c3be200128625b394cd2a5
  MD5           8729d4d668c9690aaafe492eba3dd54b
  BLAKE2b-256   a806797269ea9863749a1b681fbe47d1167a38914d878168ebeface5612a1344
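To check a downloaded file against the SHA256 digest listed here, you can hash it locally with Python's standard `hashlib`. A minimal sketch; `sha256_of` is a hypothetical helper name, and the file is read in chunks so large files need not fit in memory.

```python
import hashlib
from pathlib import Path

# SHA256 digest of pubmed_id-1.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "2ffc9df66a5429484420c9327a6816a47b9fe48639c3be200128625b394cd2a5"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

`sha256_of("pubmed_id-1.0.tar.gz") == EXPECTED_SHA256` should then hold for an intact download. pip's hash-checking mode (`--hash=sha256:...` in a requirements file) performs the same check at install time.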


File details

Details for the file pubmed_id-1.0-py3-none-any.whl.

File metadata

  • Download URL: pubmed_id-1.0-py3-none-any.whl
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for pubmed_id-1.0-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        4db1ddcdac03b29a89bcc676a4d1dbefa28f54b69ad44fa438e8145062bf52db
  MD5           13ee96c002b534af37955a4580d3a0f0
  BLAKE2b-256   d18247f82bad8ade91268eb72693ff85fd848b5532ae4b1520a352b2dc82e4d8

