huntpdf

A Python-based CLI tool that fetches PDFs of arXiv, DOI, and PubMed articles.

Installation

pip install huntpdf
playwright install chromium  # one-time setup for browser fallback

Usage

huntpdf <query> [-o OUTPUT]

Where <query> can be:

Input Type    Example
arXiv ID      huntpdf 2301.07041
DOI           huntpdf 10.1038/nature12373
PMC ID        huntpdf PMC4056847
PubMed ID     huntpdf 25428566
URL           huntpdf https://arxiv.org/abs/2301.07041
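The input types above have distinct shapes, so a query can be told apart by pattern alone. The following is a minimal sketch of that idea, not huntpdf's actual detection code: the regular expressions and the `classify_query` name are assumptions made for illustration, and the arXiv pattern only covers new-style IDs such as 2301.07041.

```python
import re

# Hypothetical patterns for illustration; huntpdf's real rules may differ.
# Order matters: the most specific shapes are checked before bare digits.
_PATTERNS = [
    ("url", re.compile(r"^https?://", re.IGNORECASE)),
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("pmc", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("arxiv", re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")),
    ("pmid", re.compile(r"^\d+$")),
]


def classify_query(query: str) -> str:
    """Return a label for the query type, or "unknown" if nothing matches."""
    q = query.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(q):
            return kind
    return "unknown"


if __name__ == "__main__":
    for example in ["2301.07041", "10.1038/nature12373", "PMC4056847", "25428566"]:
        print(example, classify_query(example))
```

Checking bare digits last keeps a PubMed ID from shadowing the other forms.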

Output is JSON:

{"status": "success", "pdf_path": "/absolute/path/to/file.pdf"}
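Because the output is JSON, huntpdf is easy to call from a script. The sketch below wraps the CLI with `subprocess` and parses the result. It assumes the JSON object is the only thing written to standard output. Only the success shape is documented above, so any other status is simply treated as a failure. `fetch_pdf` and `parse_result` are hypothetical helper names.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional


def parse_result(stdout: str) -> Path:
    """Parse huntpdf's JSON output and return the path of the saved PDF."""
    result = json.loads(stdout)
    # Only {"status": "success", ...} is documented; anything else is
    # treated as a failure here because the error shape is not specified.
    if result.get("status") != "success":
        raise RuntimeError(f"huntpdf did not report success: {result!r}")
    return Path(result["pdf_path"])


def fetch_pdf(query: str, output: Optional[str] = None) -> Path:
    """Run `huntpdf <query> [-o OUTPUT]` and return the PDF path."""
    cmd = ["huntpdf", query]
    if output is not None:
        cmd += ["-o", output]
    # Assumption: the JSON object is the sole content of stdout.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_result(proc.stdout)
```

With huntpdf installed, `fetch_pdf("2301.07041", output="paper.pdf")` would return the path reported in `pdf_path`.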

Options

  • -o, --output PATH — Save PDF to a specific path (default: auto-generated in current directory)

Supported URL patterns

  • arxiv.org/abs/{id} and arxiv.org/pdf/{id}
  • doi.org/{doi}
  • pubmed.ncbi.nlm.nih.gov/{pmid}
  • pmc.ncbi.nlm.nih.gov/articles/{PMCID}
  • Direct PDF links (any URL serving application/pdf)
  • Other URLs (attempted via headless browser fallback)
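The URL patterns listed above can be mapped back to identifiers with the standard library alone. This is an illustrative sketch, not huntpdf's own routing code: `identify_url` and its return labels are hypothetical. Direct PDF links are not detected here, because that requires inspecting the response's content type.

```python
from urllib.parse import urlparse


def identify_url(url: str) -> tuple[str, str]:
    """Map a supported URL to a (kind, identifier) pair.

    URLs that match none of the known hosts come back as ("other", url),
    which corresponds to the direct-download or browser-fallback cases.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    path = parsed.path.strip("/")

    if host == "arxiv.org" and path.startswith(("abs/", "pdf/")):
        ident = path.split("/", 1)[1]
        return "arxiv", ident.removesuffix(".pdf")
    if host == "doi.org" and path:
        return "doi", path
    if host == "pubmed.ncbi.nlm.nih.gov" and path:
        return "pmid", path.split("/")[0]
    if host == "pmc.ncbi.nlm.nih.gov" and path.startswith("articles/"):
        return "pmc", path.split("/")[1]
    return "other", url


if __name__ == "__main__":
    print(identify_url("https://arxiv.org/abs/2301.07041"))
    print(identify_url("https://doi.org/10.1038/nature12373"))
```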

Download strategy

For each URL, huntpdf tries the following in order:

  1. Direct HTTP download — fast streaming via httpx
  2. Headless browser — Playwright-based fallback for JavaScript-rendered pages
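The two-step strategy above is a fallback chain: try the cheap method first and move on only if it fails or returns something that is not a PDF. The sketch below shows that pattern in isolation. It is not huntpdf's implementation. The httpx and Playwright steps are replaced by stand-in functions so the example runs without network access, and all names are hypothetical.

```python
from typing import Callable, Iterable, Optional

# A strategy takes a URL and returns the downloaded bytes, or None.
Strategy = Callable[[str], Optional[bytes]]


def looks_like_pdf(data: Optional[bytes]) -> bool:
    """PDF files begin with the magic bytes %PDF-."""
    return data is not None and data.startswith(b"%PDF-")


def download_with_fallback(url: str, strategies: Iterable[Strategy]) -> Optional[bytes]:
    """Try each strategy in order and return the first result that is a PDF."""
    for strategy in strategies:
        try:
            data = strategy(url)
        except Exception:
            continue  # a failing strategy hands over to the next one
        if looks_like_pdf(data):
            return data
    return None


# Stand-ins for the real steps (direct HTTP, then headless browser).
def direct_http(url: str) -> Optional[bytes]:
    raise ConnectionError("simulated: server refused the plain request")


def headless_browser(url: str) -> Optional[bytes]:
    return b"%PDF-1.7 simulated document"


if __name__ == "__main__":
    pdf = download_with_fallback("https://example.org/paper", [direct_http, headless_browser])
    print("got a PDF" if pdf else "all strategies failed")
```

Validating the bytes, not just the HTTP status, matters because a paywalled page often responds successfully with HTML instead of a PDF.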

For DOI lookups, huntpdf first tries the PDF URL reported by Unpaywall. If that download fails (for example, because the URL leads to a paywall), huntpdf queries the Semantic Scholar API for alternate open-access copies, then retries through its existing resolvers (arXiv, PMC) using any identifiers it discovers.
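To make the Semantic Scholar step concrete, here is a sketch of how alternate copies could be derived from a Semantic Scholar Graph API paper record. The endpoint and the `externalIds` and `openAccessPdf` fields belong to the public Graph API. Whether huntpdf requests exactly these fields is an assumption, as is the handling of the `PubMedCentral` value as a bare number. The function names are hypothetical, and no network request is made.

```python
from urllib.parse import quote

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def s2_lookup_url(doi: str) -> str:
    """Build a Graph API URL that asks for identifiers and an open-access PDF."""
    return f"{S2_BASE}/DOI:{quote(doi, safe='/')}?fields=externalIds,openAccessPdf"


def candidates_from_s2(paper: dict) -> list[tuple[str, str]]:
    """Turn a paper record into (kind, value) pairs to try next."""
    candidates: list[tuple[str, str]] = []

    open_access = paper.get("openAccessPdf") or {}
    if open_access.get("url"):
        candidates.append(("url", open_access["url"]))

    ids = paper.get("externalIds") or {}
    if ids.get("ArXiv"):
        candidates.append(("arxiv", str(ids["ArXiv"])))
    if ids.get("PubMedCentral"):
        pmc = str(ids["PubMedCentral"])
        # Assumption: the value may lack the "PMC" prefix, so add it if missing.
        if not pmc.upper().startswith("PMC"):
            pmc = "PMC" + pmc
        candidates.append(("pmc", pmc))
    return candidates


if __name__ == "__main__":
    sample = {"externalIds": {"ArXiv": "2301.07041"}, "openAccessPdf": None}
    print(s2_lookup_url("10.1038/nature12373"))
    print(candidates_from_s2(sample))
```

Each returned pair can then be handed to the matching resolver, which is the "existing resolvers" reuse described above.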

Environment variables

  • UNPAYWALL_EMAIL (required for DOI lookups) — Your email address for Unpaywall API requests
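The email is needed because Unpaywall's REST API expects an `email` query parameter on every request. The sketch below shows how the variable could feed into a lookup URL and how a PDF link could be read from the response. It assumes the Unpaywall v2 endpoint and its `best_oa_location.url_for_pdf` field. The function names are hypothetical and are not part of huntpdf's interface.

```python
import os
from typing import Optional
from urllib.parse import quote, urlencode


def unpaywall_url(doi: str, email: Optional[str] = None) -> str:
    """Build an Unpaywall v2 lookup URL, reading UNPAYWALL_EMAIL if needed."""
    email = email or os.environ.get("UNPAYWALL_EMAIL")
    if not email:
        raise RuntimeError("UNPAYWALL_EMAIL must be set for DOI lookups")
    query = urlencode({"email": email})
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{query}"


def pdf_url_from_unpaywall(record: dict) -> Optional[str]:
    """Return the best open-access PDF URL from a response, if there is one."""
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf")


if __name__ == "__main__":
    print(unpaywall_url("10.1038/nature12373", email="you@example.org"))
```

Failing early when the variable is missing gives a clearer error than a rejected API request.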

Development

pip install -e ".[dev]"
pytest

Run only unit tests (skip integration tests that hit real APIs):

pytest -m "not integration"
