CLI tool to fetch PDFs of arXiv, DOI, and PubMed articles
huntpdf
A Python-based CLI tool that fetches PDFs of articles identified by arXiv ID, DOI, PubMed ID, PMC ID, or URL.
Installation
pip install huntpdf
playwright install chromium # one-time setup for browser fallback
Usage
huntpdf <query> [-o OUTPUT]
Where <query> can be:
| Input Type | Example |
|---|---|
| arXiv ID | huntpdf 2301.07041 |
| DOI | huntpdf 10.1038/nature12373 |
| PMC ID | huntpdf PMC4056847 |
| PubMed ID | huntpdf 25428566 |
| URL | huntpdf https://arxiv.org/abs/2301.07041 |
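To make the input types above concrete, here is a minimal sketch of how a query string could be told apart. The regular expressions and the `classify_query` name are assumptions made for illustration; huntpdf's own detection rules are not documented here and may differ (for example, old-style arXiv IDs such as `hep-th/9901001` are not covered by this sketch).

```python
import re

# Illustrative patterns only; these are assumptions, not huntpdf's real rules.
# Order matters: more specific shapes are checked before the bare-digits PMID.
_PATTERNS = [
    ("url", re.compile(r"^https?://", re.IGNORECASE)),
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("pmc", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("arxiv", re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")),
    ("pmid", re.compile(r"^\d{1,8}$")),
]


def classify_query(query: str) -> str:
    """Return a label for the query's input type, or "unknown"."""
    text = query.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(text):
            return kind
    return "unknown"
```

With these patterns, each example in the table maps to a distinct label, and anything unrecognised falls through to `"unknown"` rather than being guessed at.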
Output is JSON:
{"status": "success", "pdf_path": "/absolute/path/to/file.pdf"}
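Because the output is JSON, huntpdf is easy to call from a script. The sketch below assumes the JSON is written to standard output and that any `status` other than `"success"` means failure; only the success shape is documented above, so both points are assumptions. The helper names `pdf_path_from_output` and `run_huntpdf` are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def pdf_path_from_output(stdout: str) -> Path:
    """Extract the saved PDF path from huntpdf's JSON output, or raise."""
    result = json.loads(stdout)
    if result.get("status") != "success":
        # Assumption: anything other than "success" is treated as a failure.
        raise RuntimeError(f"huntpdf did not report success: {result!r}")
    return Path(result["pdf_path"])


def run_huntpdf(query: str) -> Path:
    """Run the CLI for one query and return the path of the downloaded PDF."""
    # Assumption: the JSON result is printed on stdout.
    completed = subprocess.run(
        ["huntpdf", query], capture_output=True, text=True, check=False
    )
    return pdf_path_from_output(completed.stdout)
```

A call such as `run_huntpdf("2301.07041")` would then return a `Path` that the rest of a pipeline can open directly.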
Options
-o, --output PATH: Save PDF to a specific path (default: auto-generated in current directory)
Supported URL patterns
- arxiv.org/abs/{id} and arxiv.org/pdf/{id}
- doi.org/{doi}
- pubmed.ncbi.nlm.nih.gov/{pmid}
- pmc.ncbi.nlm.nih.gov/articles/{PMCID}
- Direct PDF links (any URL serving application/pdf)
- Other URLs (attempted via headless browser fallback)
Download strategy
For each URL, huntpdf tries the following strategies in order, stopping at the first one that succeeds:
- Direct HTTP download: fast streaming via httpx
- Headless browser: Playwright-based fallback for JavaScript-rendered pages
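The two-step strategy can be sketched as a simple fallback chain. This is an illustration under stated assumptions, not huntpdf's actual code: `Strategy`, `download_direct`, and `fetch_with_fallback` are hypothetical names, the 30-second timeout is arbitrary, and the browser step is left as a callable you would supply yourself (for instance one built on Playwright).

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple

# A strategy takes (url, dest) and returns True if it saved a PDF to dest.
Strategy = Callable[[str, Path], bool]


def download_direct(url: str, dest: Path) -> bool:
    """Stream the response to disk, but only if the server says it is a PDF."""
    import httpx  # imported lazily so the rest of the sketch runs without it

    with httpx.stream("GET", url, follow_redirects=True, timeout=30.0) as response:
        if response.status_code != 200:
            return False
        content_type = response.headers.get("content-type", "")
        if "application/pdf" not in content_type.lower():
            return False
        with dest.open("wb") as fh:
            for chunk in response.iter_bytes():
                fh.write(chunk)
    return True


def fetch_with_fallback(
    url: str, dest: Path, strategies: Iterable[Tuple[str, Strategy]]
) -> Optional[str]:
    """Try each named strategy in order; return the name of the first that works."""
    for name, strategy in strategies:
        try:
            if strategy(url, dest):
                return name
        except Exception:
            # A failing strategy should not stop the chain; move to the next one.
            continue
    return None
```

Passing `[("direct", download_direct), ("browser", my_browser_download)]` mirrors the order described above, where `my_browser_download` stands in for whatever headless-browser routine is available.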
For DOI lookups specifically, if the Unpaywall PDF URL fails (for example, because the copy is paywalled), huntpdf queries the Semantic Scholar API for alternate open-access copies, then passes any identifiers it discovers to the existing arXiv and PMC resolvers.
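The DOI fallback can be illustrated with a few pure helpers that build the API URLs and read the relevant fields from the responses. The endpoints and field names (`best_oa_location.url_for_pdf` for Unpaywall; `externalIds` with `ArXiv` and `PubMedCentral` keys for Semantic Scholar) follow those services' public APIs as I understand them; the function names are hypothetical and how huntpdf itself parses these responses is an assumption.

```python
from typing import List, Optional
from urllib.parse import quote


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall lookup URL for a DOI."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?email={quote(email)}"


def pdf_url_from_unpaywall(record: dict) -> Optional[str]:
    """Return the best open-access PDF URL from an Unpaywall record, if any."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")


def semantic_scholar_url(doi: str) -> str:
    """Build the Semantic Scholar Graph API lookup URL for a DOI."""
    return (
        "https://api.semanticscholar.org/graph/v1/paper/"
        f"DOI:{quote(doi, safe='/')}?fields=externalIds,openAccessPdf"
    )


def fallback_queries(paper: dict) -> List[str]:
    """Turn Semantic Scholar externalIds into queries for arXiv and PMC resolvers."""
    ids = paper.get("externalIds") or {}
    queries = []
    if ids.get("ArXiv"):
        queries.append(str(ids["ArXiv"]))
    if ids.get("PubMedCentral"):
        pmc = str(ids["PubMedCentral"])
        # Normalise to the "PMC1234567" form whether or not the prefix is present.
        queries.append(pmc if pmc.upper().startswith("PMC") else f"PMC{pmc}")
    return queries
```

Keeping these helpers free of network calls makes the fallback logic easy to unit-test with canned responses, while the actual HTTP requests stay in a thin outer layer.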
Environment variables
UNPAYWALL_EMAIL (required for DOI lookups): Your email address for Unpaywall API requests
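For scripts that wrap huntpdf, a small preflight check can surface a missing variable before any DOI lookup is attempted. This is only a sketch: the `unpaywall_email` helper and its error message are hypothetical, and huntpdf's own behaviour when the variable is unset is not described here.

```python
import os
from typing import Mapping


def unpaywall_email(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured Unpaywall email, or raise if it is missing or blank."""
    email = env.get("UNPAYWALL_EMAIL", "").strip()
    if not email:
        raise RuntimeError("Set UNPAYWALL_EMAIL before running DOI lookups")
    return email
```

Accepting the environment as a parameter keeps the check testable without modifying the real process environment.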
Development
pip install -e ".[dev]"
pytest
Run only unit tests (skip integration tests that hit real APIs):
pytest -m "not integration"
File details
Details for the file huntpdf-1.0.0.tar.gz.
File metadata
- Download URL: huntpdf-1.0.0.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba346e7694f546ecc4831b9ea290aa845d53c605c813a55c3333920eb9376b6a |
| MD5 | 5663514fe190b33ba56e74931702792b |
| BLAKE2b-256 | 832cd2fb171db9f1d9ebcf790f72028ebf0e2a222f15637633a49b0e58915dd3 |
File details
Details for the file huntpdf-1.0.0-py3-none-any.whl.
File metadata
- Download URL: huntpdf-1.0.0-py3-none-any.whl
- Upload date:
- Size: 13.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d24a68f3ad2d1fa169279a3bfb5b9b7c684495f7cf991a4d01432fbb2366c00e |
| MD5 | fc634c5131224baa18f991401df80b25 |
| BLAKE2b-256 | 2e3a124483a58423641d828f3d724afbcf2aa8fcf800bb396608f04fd592f63a |