# PaperHub

Fetch HuggingFace Daily Papers and produce Jupyter-friendly AI summaries — one paper, one agent.
PaperHub fetches HuggingFace Daily Papers by programmatic date filters, assigns each paper to its own AI summarization agent, and renders Jupyter-friendly summaries by default.
Two ways to use PaperHub:

- **Terminal CLI** (`paperhub`) — an interactive launcher with a REPL interface. Primary mode.
- **Python / Jupyter** — import `PaperHub` and call `hub.run(...)` directly.
## Install

```bash
python3 -m pip install -e ".[dev]"
```

Optional provider extras:

```bash
python3 -m pip install -e ".[anthropic]"  # adds the Anthropic client
python3 -m pip install -e ".[google]"     # adds the Google Gemini client
```
## Configuration
PaperHub reads configuration from shell environment variables and from its own
per-user config file. The CLI can save provider keys for you without touching a
project-level .env file:
```bash
paperhub version
paperhub set-key openai
paperhub set-key anthropic
paperhub set-key google
paperhub api-keys
paperhub config-path
```
Inside the interactive launcher, use `/set-key openai`. The config file lives
under the OS-specific user config directory, for example
`~/Library/Application Support/paperhub/.env` on macOS. Environment variables
still override values saved there.
| Variable | Purpose |
|---|---|
| `OPENAI_API_KEY` | Default provider key |
| `ANTHROPIC_API_KEY` | Optional, used when `--provider anthropic` |
| `GOOGLE_API_KEY` | Optional, used when `--provider google` |
| `PAPERHUB_PROVIDER` | Override default provider (default: `openai`) |
| `PAPERHUB_MODEL` | Optional global model override |
| `PAPERHUB_OPENAI_MODEL` | OpenAI default model (default: `gpt-5.4-mini`) |
| `PAPERHUB_OPENAI_REASONING_EFFORT` | OpenAI reasoning effort (default: `xhigh`) |
| `PAPERHUB_ANTHROPIC_MODEL` | Anthropic default model (default: `claude-haiku-4-5-20251001`) |
| `PAPERHUB_GOOGLE_MODEL` | Google default model (default: `gemini-3-flash-preview`) |
| `PAPERHUB_CONCURRENCY` | Max concurrent paper agents (default: 5) |
| `PAPERHUB_MAX_PDF_CHARS` | Truncation cap for PDF text (default: 60000) |
| `PAPERHUB_CACHE_DIR` | Override the on-disk cache location |
## Terminal CLI (Primary)
Start the interactive launcher:
```bash
paperhub
```
The launcher opens a REPL with a status dashboard showing the current provider, model, API key state, date range, and top paper count. Use commands to configure and run:
```text
/provider
/provider openai
/version
/model
/model gpt-5.4-mini
/model default
/language
/date 2026-05
/date 2026-05-15
/date 2026-W18
/date 2026-05-01 2026-05-31
/top 5
/metadata
/run
/set-key openai
/keys
/config-path
/api-keys
/quit
```
### Date formats for `/date`

| Example | Period | Description |
|---|---|---|
| `/date 2026-05` | month | May 2026 |
| `/date 2026` | year | Full year 2026 |
| `/date 2026-05-15` | day | Single day |
| `/date 2026-W18` | week | ISO week 18 of 2026 |
| `/date 2026-05-01 2026-05-31` | custom | Inclusive date range |
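Each period ultimately resolves to an inclusive `(start, end)` date pair. A stdlib-only sketch of that mapping — the function name and signature are illustrative, not the actual `dates.py` API:

```python
import calendar
from datetime import date, timedelta

# Illustrative sketch: resolve a period spec to an inclusive (start, end) pair.
def period_range(period, year, month=None, day=None, week=None):
    if period == "day":
        d = date(year, month, day)
        return d, d
    if period == "month":
        last = calendar.monthrange(year, month)[1]  # number of days in month
        return date(year, month, 1), date(year, month, last)
    if period == "year":
        return date(year, 1, 1), date(year, 12, 31)
    if period == "week":
        start = date.fromisocalendar(year, week, 1)  # ISO Monday
        return start, start + timedelta(days=6)
    raise ValueError(f"unknown period: {period}")
```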
- `/metadata` fetches HuggingFace paper metadata only and does not call an LLM.
- If the selected provider key is missing, `/run` prints setup guidance.
- `/api-keys` shows key status and setup help.
You can also pass startup flags:

```bash
paperhub --provider anthropic
paperhub --model claude-haiku-4-5-20251001
paperhub --top-n 10
```
## Python / Jupyter API
```python
from datetime import date

from paperhub import PaperHub

hub = PaperHub(provider="openai")

# Month
hub.run(period="month", year=2026, month=5, top_n=10)

# Single day
hub.run(period="day", year=2026, month=5, day=1, top_n=5)

# ISO week
hub.run(period="week", year=2026, week=18, top_n=5)

# Full year
hub.run(period="year", year=2026, top_n=20)

# Custom range
hub.run(period="custom", start=date(2026, 4, 15), end=date(2026, 4, 30), top_n=15)
```
`run` returns a `list[PaperSummary]`; in non-Jupyter contexts pass
`display=False` and call `render_plain` yourself if you do not need the
Markdown side effect.
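The non-Jupyter flow can be sketched with a stand-in summary type; `FakeSummary`, its fields, and `render_plain_sketch` are hypothetical placeholders, not PaperHub's actual `PaperSummary` model or `render_plain` function:

```python
from dataclasses import dataclass

# Hypothetical stand-in for PaperSummary; the real model lives in
# paperhub.models and may have different fields.
@dataclass
class FakeSummary:
    title: str
    summary: str

def render_plain_sketch(summaries):
    """Illustrative plain-text rendering for contexts without Markdown display."""
    return "\n\n".join(f"== {s.title} ==\n{s.summary}" for s in summaries)
```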
For a step-by-step notebook, open `examples/03_jupyter_quickstart.ipynb`.
## Provider Selection
```python
PaperHub(provider="openai")                                # OpenAI default model
PaperHub(provider="openai", model="gpt-5.4-mini")
PaperHub(provider="anthropic")                             # Anthropic default model
PaperHub(provider="anthropic", model="claude-haiku-4-5-20251001")
PaperHub(provider="google", model="gemini-3-flash-preview")
```
If `model` is omitted, PaperHub picks the selected provider's default model.
Provider-specific config values such as `PAPERHUB_OPENAI_MODEL` override those
defaults. `PAPERHUB_MODEL` remains available as a global override. OpenAI uses
`PAPERHUB_OPENAI_REASONING_EFFORT=xhigh` by default; set it to an empty value
to let the OpenAI API choose its model default.
Provider SDKs are imported lazily — installing `paperhub` includes OpenAI by
default and does not require the Anthropic or Google packages unless you use
those providers.
## Caching
PaperHub caches metadata, PDF text, and summaries in the OS-specific user
cache directory, for example `~/Library/Caches/paperhub/paperhub.sqlite` on
macOS and `~/.cache/paperhub/paperhub.sqlite` on Linux. Override the location
with `PAPERHUB_CACHE_DIR`.

Summary entries are keyed by `(arxiv_id, model, language)`, so swapping the
model or output language triggers a clean re-run of the summaries while still
reusing the cached PDF text.
A second invocation with the same papers and model:
- Reuses the cached PDF text (no arXiv hit, no extraction).
- Reuses the cached summary (no LLM call).
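The `(arxiv_id, model, language)` keying can be sketched with stdlib `sqlite3`; the schema and function below are illustrative, not the actual `paperhub.sqlite` layout:

```python
import sqlite3

# Illustrative sketch of a summary cache keyed by (arxiv_id, model, language);
# the real paperhub.sqlite schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS summaries (
        arxiv_id TEXT NOT NULL,
        model TEXT NOT NULL,
        language TEXT NOT NULL,
        summary TEXT NOT NULL,
        PRIMARY KEY (arxiv_id, model, language)
    )"""
)

def get_or_summarize(arxiv_id, model, language, summarize):
    row = conn.execute(
        "SELECT summary FROM summaries WHERE arxiv_id=? AND model=? AND language=?",
        (arxiv_id, model, language),
    ).fetchone()
    if row:
        return row[0]  # cache hit: no LLM call
    summary = summarize(arxiv_id)
    conn.execute(
        "INSERT INTO summaries VALUES (?, ?, ?, ?)",
        (arxiv_id, model, language, summary),
    )
    return summary
```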
## Tests, Lint, Typecheck, Build
```bash
python3 -m pytest                # unit tests, no live network or LLM keys needed
python3 -m ruff check .
python3 -m ruff format --check .
python3 -m mypy src tests
python3 -m build                 # build wheel + sdist
```
The unit tests use a fake LLM client and `httpx.MockTransport`; no real
provider keys are required.
## Project Layout
```text
src/paperhub/
  __init__.py        # PaperHub public API
  config.py          # Settings (pydantic-settings)
  models.py          # PaperMeta, PaperSummary, RunRequest
  dates.py           # period → (start, end)
  fetchers/          # HF JSON API (default) + HTML fallback
  pdf/               # arXiv download + text extraction
  agents/            # LLMClient protocol + provider clients + PaperAgent
  orchestrator.py    # asyncio.Semaphore parallelism
  cache.py           # SQLite cache
  formatter.py       # Markdown / plain-text rendering
  interactive_cli.py # `paperhub` interactive launcher
tests/               # pytest suite, mocked HTTP and fake LLM
docs/ARCHITECTURE.md
docs/HOW_TO_START.md
docs/API_KEYS.md
examples/README.md
examples/03_jupyter_quickstart.ipynb
```
## Troubleshooting

- **"No papers found"**: HuggingFace may not yet have published Daily Papers
  for that date. Use `/metadata` in the interactive launcher to check the
  fetcher without an LLM call.
- **PDF text comes back tiny**: some arXiv PDFs use unusual layouts. PaperHub
  falls back to `pdfplumber`; if both extractors return short text, the agent
  passes the abstract through as the input text instead of failing.
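The fallback behavior above can be sketched generically; the extractor list, `extract_text` helper, and length threshold are illustrative, not PaperHub's actual extraction code:

```python
# Illustrative sketch of the extraction fallback chain: try each extractor
# in order, and if every result is suspiciously short, fall back to the
# paper's abstract rather than failing. The threshold is made up.
MIN_CHARS = 500

def extract_text(pdf_bytes, abstract, extractors):
    for extract in extractors:
        try:
            text = extract(pdf_bytes)
        except Exception:
            continue  # a broken extractor just moves us to the next one
        if text and len(text) >= MIN_CHARS:
            return text
    return abstract  # all extractors came back short: use the abstract
```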