CLI to search, download, and convert academic papers (arXiv, Semantic Scholar) into Markdown — built for AI/ML researchers.
paperhound
paperhound — sniff out academic papers from the command line.
A small, fast CLI for AI/ML researchers who want a single tool to search, inspect, and download papers from arXiv and Semantic Scholar and convert them to Markdown. Conversion is powered by docling, so the resulting Markdown is clean enough to feed straight into an LLM context.
Features
- 🔎 Unified search: one query, all backends. arXiv and Semantic Scholar are queried in parallel and the results are merged and deduplicated.
- 📄 Inspect before downloading: paperhound show <id> prints the abstract and metadata so you can decide if it's worth a download.
- ⬇️ Download by identifier: arXiv id, DOI, Semantic Scholar paper id, or any paper URL. Open-access PDFs are resolved automatically.
- 📝 PDF → Markdown via docling: paperhound convert paper.pdf or paperhound get <id> for the full pipeline.
- 🤖 Agent-ready: ships with a SKILL.md and JSON output mode so any Claude / OpenAI / local agent can drive the CLI.
- 🧪 Heavily tested: every module has unit tests; live integration tests are gated behind an environment variable.
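The merge-and-deduplicate step of unified search can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the Paper record, its field names, and the matching rules (DOI, version-less arXiv id, normalized title) are hypothetical and not taken from paperhound's source.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Paper:
    # Hypothetical record type; the real paperhound model may differ.
    title: str
    source: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None


def identity_keys(paper: Paper) -> set:
    """Return every key under which this paper could match a duplicate."""
    normalized_title = re.sub(r"[^a-z0-9]+", " ", paper.title.lower()).strip()
    keys = {"title:" + normalized_title}
    if paper.doi:
        keys.add("doi:" + paper.doi.lower())
    if paper.arxiv_id:
        # Ignore a trailing version suffix such as "v3".
        keys.add("arxiv:" + re.sub(r"v\d+$", "", paper.arxiv_id))
    return keys


def merge_results(*result_lists: list) -> list:
    """Concatenate provider results, keeping the first copy of each paper."""
    seen = set()
    merged = []
    for results in result_lists:
        for paper in results:
            keys = identity_keys(paper)
            is_duplicate = not seen.isdisjoint(keys)
            seen |= keys  # remember the duplicate's extra identifiers too
            if not is_duplicate:
                merged.append(paper)
    return merged
```

Matching on any shared key, rather than a single preferred one, lets an arXiv record without a DOI still collapse with a Semantic Scholar record that carries both identifiers.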
Installation
pip install paperhound
or with uv:
uv tool install paperhound
Python 3.10+ is required. Docling depends on PyTorch and downloads its model weights on first use, so the very first conversion may take a while.
Quick start
# Search across all providers
paperhound search "diffusion transformers" --limit 5
# Show the abstract for a specific paper
paperhound show 2401.12345
paperhound show 10.1038/s41586-020-2649-2 # DOI works too
paperhound show https://arxiv.org/abs/1706.03762 # ...and URLs
# Download the PDF
paperhound download 1706.03762 -o ./papers/
# Convert a local PDF to Markdown
paperhound convert ./papers/1706.03762.pdf -o attention.md
# Or do it all at once: search-resolve, download, convert, clean up
paperhound get 1706.03762 -o attention.md
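The download, convert, clean up sequence behind the get command can be sketched roughly as follows. This is not paperhound's implementation: get_markdown and its download and convert parameters are hypothetical names, and the two steps are passed in as callables so the sketch stays independent of any real network or docling call.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def get_markdown(
    identifier: str,
    output: Path,
    download: Callable[[str, Path], Path],
    convert: Callable[[Path], str],
    keep_pdf: bool = False,
) -> Path:
    """Download a paper into a scratch directory, convert it, then clean up."""
    with tempfile.TemporaryDirectory(prefix="paperhound-") as tmp:
        pdf_path = download(identifier, Path(tmp))  # returns the saved PDF path
        output.write_text(convert(pdf_path), encoding="utf-8")
        if keep_pdf:
            # Keep a copy next to the Markdown file before the scratch dir goes away.
            shutil.copy2(pdf_path, output.with_suffix(".pdf"))
    return output
```

With docling installed, the convert callable could plausibly wrap DocumentConverter().convert(...) followed by document.export_to_markdown(), but that wiring is left out here.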
JSON output for scripts and agents
paperhound search "graph neural networks" --json | jq '.[].title'
paperhound show 1706.03762 --json
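For scripts that prefer Python over jq, a minimal sketch of consuming the JSON output might look like this. It assumes, as the jq filter above implies, that search --json prints a JSON array of objects with a title key; the helper names are made up for illustration.

```python
import json
import subprocess


def search_titles(raw_json: str) -> list:
    """Python equivalent of jq '.[].title' on the search output."""
    return [paper["title"] for paper in json.loads(raw_json)]


def run_search(query: str, limit: int = 5) -> str:
    """Run the CLI and return its stdout; requires paperhound on PATH."""
    completed = subprocess.run(
        ["paperhound", "search", query, "--limit", str(limit), "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Calling search_titles(run_search("graph neural networks")) would then yield the list of titles; any field other than title is not documented here and should be checked against real output.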
Commands
| Command | Description |
|---|---|
| paperhound search <query> | Run a unified search. Options: --limit, --source arxiv\|semantic_scholar, --year-min, --year-max, --json. |
| paperhound show <id> | Fetch a paper's metadata and abstract. |
| paperhound download <id> -o <path> | Download a paper PDF. |
| paperhound convert <pdf> -o <md> | Convert a PDF (or any docling-supported file/URL) to Markdown. |
| paperhound get <id> -o <md> | Download and convert in one step. Pass --keep-pdf to keep the PDF. |
| paperhound version | Print the installed version. |
Run paperhound <command> --help for full options.
Identifier formats
paperhound accepts whatever you have on hand:
- arXiv ids: 2401.12345, 2401.12345v3, cs.AI/0301001, arXiv:2401.12345
- DOIs: 10.1038/s41586-020-2649-2, doi:10.1038/...
- Semantic Scholar paper ids: 40-char hex
- URLs: arxiv.org/abs/..., arxiv.org/pdf/..., doi.org/..., semanticscholar.org/paper/...
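To make the formats above concrete, here is a rough sketch of how such identifiers could be told apart. The regular expressions and the classify function are assumptions written for this example; paperhound's actual resolver may accept more forms or apply different rules.

```python
import re
from urllib.parse import urlparse

ARXIV_NEW = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")               # 2401.12345v3
ARXIV_OLD = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")   # cs.AI/0301001
DOI = re.compile(r"^10\.\d{4,9}/\S+$")
S2_ID = re.compile(r"^[0-9a-f]{40}$")


def classify(identifier: str) -> tuple:
    """Return (kind, normalized_id) or raise ValueError for unknown input."""
    text = identifier.strip()
    lowered = text.lower()
    # Strip the optional "arXiv:" / "doi:" prefixes first.
    if lowered.startswith("arxiv:"):
        text = text[len("arxiv:"):]
    elif lowered.startswith("doi:"):
        text = text[len("doi:"):]
    if ARXIV_NEW.match(text) or ARXIV_OLD.match(text):
        return ("arxiv", text)
    if DOI.match(text):
        return ("doi", text)
    if S2_ID.match(text.lower()):
        return ("semantic_scholar", text.lower())
    # Otherwise treat the input as a URL, with or without a scheme.
    parsed = urlparse(text if "://" in text else "https://" + text)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path.strip("/")
    if host == "arxiv.org" and path.startswith(("abs/", "pdf/")):
        arxiv_id = path.split("/", 1)[1]
        if arxiv_id.endswith(".pdf"):
            arxiv_id = arxiv_id[:-4]
        return ("arxiv", arxiv_id)
    if host == "doi.org" and path:
        return ("doi", path)
    if host == "semanticscholar.org" and path.startswith("paper/"):
        return ("semantic_scholar", path.rsplit("/", 1)[-1])
    raise ValueError("unrecognized paper identifier: " + identifier)
```

Bare identifiers are checked before URLs so that a DOI such as 10.1038/... is never mistaken for a host name.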
Configuration
| Env var | Purpose |
|---|---|
| SEMANTIC_SCHOLAR_API_KEY | Optional. Lifts the public rate limit for the Semantic Scholar Graph API. |
| PAPERHOUND_RUN_INTEGRATION | Set to 1 to run live integration tests. |
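As an illustration of how these two variables might be consumed, the sketch below builds request headers and checks the integration flag. The function names are hypothetical; the x-api-key header is the one the Semantic Scholar API uses for keyed requests, but how paperhound wires it in is an assumption.

```python
import os
from typing import Mapping, Optional


def semantic_scholar_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers, adding the API key only when one is configured."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/json"}
    api_key = env.get("SEMANTIC_SCHOLAR_API_KEY", "").strip()
    if api_key:
        headers["x-api-key"] = api_key
    return headers


def integration_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Live tests run only when PAPERHOUND_RUN_INTEGRATION is exactly "1"."""
    env = os.environ if env is None else env
    return env.get("PAPERHOUND_RUN_INTEGRATION") == "1"
```

Passing the environment in as a mapping keeps both helpers easy to test without touching the real process environment.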
Use it from agents
paperhound is designed to be driven by AI agents. The repo includes a ready-to-install
skill at skills/paperhound/SKILL.md that documents
every command, recommends the JSON output flag, and gives an end-to-end example.
Drop the skill into your agent's skill directory (e.g. ~/.claude/skills/) and
the agent will know how to search papers, fetch abstracts, and produce Markdown.
Development
make install # uv sync --extra dev
make test # unit tests
make test-integration # live API tests (PAPERHOUND_RUN_INTEGRATION=1)
make check # lint + format check + tests (run before pushing)
The test suite uses respx to mock HTTP responses, so unit tests do not touch
the network. Provider clients are dependency-injected, which makes the
aggregator and CLI fully unit-testable.
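The dependency-injection idea can be shown with a small, self-contained sketch. SearchProvider, Aggregator, and FakeProvider are hypothetical names invented for this example, not paperhound's classes; the point is only that an aggregator that receives its providers can be exercised with in-memory fakes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class SearchProvider(Protocol):
    # Hypothetical interface; the real provider clients may look different.
    name: str

    def search(self, query: str, limit: int) -> list: ...


class Aggregator:
    def __init__(self, providers):
        self._providers = list(providers)  # injected, never constructed here

    def search(self, query: str, limit: int = 5) -> list:
        """Query every provider in parallel and concatenate the results."""
        if not self._providers:
            return []
        with ThreadPoolExecutor(max_workers=len(self._providers)) as pool:
            # pool.map preserves provider order in its results.
            batches = list(pool.map(lambda p: p.search(query, limit), self._providers))
        return [paper for batch in batches for paper in batch]


class FakeProvider:
    """In-memory stand-in that records calls instead of hitting the network."""

    def __init__(self, name: str, titles: list):
        self.name = name
        self.titles = titles
        self.calls = []

    def search(self, query: str, limit: int) -> list:
        self.calls.append((query, limit))
        return [{"title": t, "source": self.name} for t in self.titles[:limit]]
```

A unit test then builds Aggregator([FakeProvider(...), FakeProvider(...)]) and asserts on both the returned papers and the recorded calls, with no HTTP involved.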
Releasing to PyPI
- Bump version in pyproject.toml and paperhound/__init__.py.
- Tag the release: git tag v0.1.1 && git push --tags.
- The Publish to PyPI GitHub Action builds and publishes via PyPI Trusted Publishing. No API token is required; just configure the trusted publisher once on PyPI.
License
MIT — see LICENSE.