Fetch HuggingFace Daily Papers and produce Jupyter-friendly AI summaries — one paper, one agent.

PaperHub

PaperHub fetches HuggingFace Daily Papers for a date filter you specify (day, ISO week, month, year, or custom range), assigns each paper to its own AI summarization agent, and renders Jupyter-friendly summaries by default.

Two ways to use PaperHub:

  1. Terminal CLI (paperhub) — interactive launcher with a REPL interface. Primary mode.
  2. Python / Jupyter — import PaperHub and call hub.run(...) directly.

Install

python3 -m pip install -e ".[dev]"

Optional provider extras:

python3 -m pip install -e ".[anthropic]"  # adds the Anthropic client
python3 -m pip install -e ".[google]"     # adds the Google Gemini client

Configuration

PaperHub reads configuration from shell environment variables and from its own per-user config file. The CLI can save provider keys for you without touching a project-level .env file:

paperhub version
paperhub set-key openai
paperhub set-key anthropic
paperhub set-key google
paperhub check-llm
paperhub api-keys
paperhub config-path

Inside the interactive launcher, use /set-key openai. The config file lives under the OS-specific user config directory, for example ~/Library/Application Support/paperhub/.env on macOS. Environment variables still override values saved there.
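
The override rule above can be illustrated with a minimal sketch. It assumes the saved file uses plain KEY=VALUE lines; `parse_env_text` and `resolve` are hypothetical helpers written for illustration and are not part of PaperHub's API.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(
    name: str,
    saved: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the environment value if present, else the saved value, else None."""
    env = os.environ if env is None else env
    if name in env:  # assumption: any value present in the environment wins
        return env[name]
    return saved.get(name)


saved = parse_env_text('# saved by set-key\nOPENAI_API_KEY="sk-saved"\nPAPERHUB_PROVIDER=openai\n')
print(resolve("PAPERHUB_PROVIDER", saved, env={"PAPERHUB_PROVIDER": "anthropic"}))
```

Passing `env` explicitly keeps the sketch testable; leaving it out falls back to the real process environment.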

After set-key, PaperHub immediately sends one tiny request to the selected provider/model and reports whether the key and LLM are working. You can repeat that check later with paperhub check-llm or /check-llm.

Variable                          Purpose
OPENAI_API_KEY                    Default provider key
ANTHROPIC_API_KEY                 Optional, used when --provider anthropic
GOOGLE_API_KEY                    Optional, used when --provider google
PAPERHUB_PROVIDER                 Override default provider (default: openai)
PAPERHUB_MODEL                    Optional global model override
PAPERHUB_OPENAI_MODEL             OpenAI default model (default: gpt-5.4-mini)
PAPERHUB_OPENAI_REASONING_EFFORT  OpenAI reasoning effort (default: xhigh)
PAPERHUB_ANTHROPIC_MODEL          Anthropic default model (default: claude-haiku-4-5-20251001)
PAPERHUB_GOOGLE_MODEL             Google default model (default: gemini-3-flash-preview)
PAPERHUB_CONCURRENCY              Max concurrent paper agents (default: 5)
PAPERHUB_MAX_PDF_CHARS            Truncation cap for PDF text (default: 60000)
PAPERHUB_CACHE_DIR                Override the on-disk cache location

Terminal CLI (Primary)

Start the interactive launcher:

paperhub

The launcher opens a REPL with a status dashboard showing the current provider, model, API key state, date range, and top paper count. Use commands to configure and run:

/provider
/provider openai
/version
/model
/model gpt-5.4-mini
/model default
/language
/date 2026-05
/date 2026-05-15
/date 2026-W18
/date 2026-05-01 2026-05-31
/top 5
/metadata
/run
/set-key openai
/keys
/check-llm
/config-path
/api-keys
/quit

Date formats for /date

Example                      Period  Description
/date 2026-05                month   May 2026
/date 2026                   year    Full year 2026
/date 2026-05-15             day     Single day
/date 2026-W18               week    ISO week 18 of 2026
/date 2026-05-01 2026-05-31  custom  Inclusive date range
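
To make the formats above concrete, here is a minimal sketch of how such arguments could be mapped to an inclusive (start, end) range. `parse_date_spec` is a hypothetical function written for illustration, not PaperHub's actual dates.py; it assumes ISO weeks run Monday through Sunday, as in the ISO 8601 calendar.

```python
from __future__ import annotations

import calendar
import re
from datetime import date, timedelta


def parse_date_spec(*args: str) -> tuple[str, date, date]:
    """Map /date-style arguments to (period, start, end), both ends inclusive."""
    if len(args) == 2:
        start, end = date.fromisoformat(args[0]), date.fromisoformat(args[1])
        if end < start:
            raise ValueError("end date precedes start date")
        return "custom", start, end
    if len(args) != 1:
        raise ValueError("expected one or two arguments")
    spec = args[0]
    if m := re.fullmatch(r"(\d{4})-W(\d{1,2})", spec):
        # ISO week: day 1 is Monday, so the week ends six days later.
        start = date.fromisocalendar(int(m[1]), int(m[2]), 1)
        return "week", start, start + timedelta(days=6)
    if m := re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", spec):
        day = date(int(m[1]), int(m[2]), int(m[3]))
        return "day", day, day
    if m := re.fullmatch(r"(\d{4})-(\d{2})", spec):
        year, month = int(m[1]), int(m[2])
        last_day = calendar.monthrange(year, month)[1]
        return "month", date(year, month, 1), date(year, month, last_day)
    if m := re.fullmatch(r"(\d{4})", spec):
        year = int(m[1])
        return "year", date(year, 1, 1), date(year, 12, 31)
    raise ValueError(f"unrecognized date spec: {spec!r}")


print(parse_date_spec("2026-W18"))
```

Under these assumptions, 2026-W18 covers Monday 2026-04-27 through Sunday 2026-05-03.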

/metadata fetches HuggingFace paper metadata only and does not call an LLM. If the selected provider key is missing, /run prints setup guidance. /api-keys shows key status and setup help. /check-llm sends a tiny live provider request and confirms that the selected key/model can respond.

You can also pass startup flags:

paperhub --provider anthropic
paperhub --model claude-haiku-4-5-20251001
paperhub --top-n 10

Python / Jupyter API

from datetime import date
from paperhub import PaperHub

hub = PaperHub(provider="openai")

# Month
hub.run(period="month", year=2026, month=5, top_n=10)

# Single day
hub.run(period="day", year=2026, month=5, day=1, top_n=5)

# ISO week
hub.run(period="week", year=2026, week=18, top_n=5)

# Full year
hub.run(period="year", year=2026, top_n=20)

# Custom range
hub.run(period="custom", start=date(2026, 4, 15), end=date(2026, 4, 30), top_n=15)

run returns a list[PaperSummary]. In non-Jupyter contexts, pass display=False to skip the Markdown display side effect, then call render_plain yourself on the returned summaries.

For a step-by-step notebook, open examples/03_jupyter_quickstart.ipynb.

Provider Selection

PaperHub(provider="openai")                             # OpenAI default model
PaperHub(provider="openai", model="gpt-5.4-mini")
PaperHub(provider="anthropic")                          # Anthropic default model
PaperHub(model="claude-haiku-4-5-20251001", provider="anthropic")
PaperHub(model="gemini-3-flash-preview", provider="google")

If model is omitted, PaperHub picks the selected provider's default model. Provider-specific config values such as PAPERHUB_OPENAI_MODEL override those defaults. PAPERHUB_MODEL remains available as a global override. OpenAI uses PAPERHUB_OPENAI_REASONING_EFFORT=xhigh by default; set it to an empty value to let the OpenAI API choose its model default.
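
The resolution order described above might look like the following sketch. The default model names come from the configuration table; the exact precedence (explicit argument, then PAPERHUB_MODEL, then the provider-specific variable, then the built-in default) is an assumption inferred from this paragraph, and `resolve_model` is a hypothetical helper, not PaperHub's API.

```python
from __future__ import annotations

from typing import Mapping, Optional

# Built-in defaults as listed in the configuration table.
BUILTIN_DEFAULTS = {
    "openai": "gpt-5.4-mini",
    "anthropic": "claude-haiku-4-5-20251001",
    "google": "gemini-3-flash-preview",
}


def resolve_model(
    provider: str,
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a model name; the precedence here is assumed, not confirmed."""
    env = {} if env is None else env
    if explicit:  # model= passed by the caller
        return explicit
    if env.get("PAPERHUB_MODEL"):  # global override
        return env["PAPERHUB_MODEL"]
    specific = env.get(f"PAPERHUB_{provider.upper()}_MODEL")
    if specific:  # provider-specific default
        return specific
    return BUILTIN_DEFAULTS[provider]


print(resolve_model("anthropic"))
```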

Provider SDKs are imported lazily. Installing paperhub includes the OpenAI client by default; the Anthropic and Google packages are needed only if you use those providers.

Caching

PaperHub caches metadata, PDF text, and summaries in the OS-specific user cache directory, for example ~/Library/Caches/paperhub/paperhub.sqlite on macOS and ~/.cache/paperhub/paperhub.sqlite on Linux. Override the location with PAPERHUB_CACHE_DIR. Summary entries are keyed by (arxiv_id, model, language), so switching models or output language produces fresh summaries while still reusing the cached PDF text.

A second invocation with the same papers, model, and language:

  • Reuses the cached PDF text (no arXiv hit, no extraction).
  • Reuses the cached summary (no LLM call).
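
The keying scheme above can be sketched with the standard sqlite3 module. The table name, column layout, and `SummaryCache` class are illustrative assumptions, not PaperHub's actual cache.py schema.

```python
from __future__ import annotations

import sqlite3
from typing import Optional


class SummaryCache:
    """Toy summary cache keyed by (arxiv_id, model, language)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS summaries ("
            " arxiv_id TEXT NOT NULL,"
            " model TEXT NOT NULL,"
            " language TEXT NOT NULL,"
            " summary TEXT NOT NULL,"
            " PRIMARY KEY (arxiv_id, model, language))"
        )

    def get(self, arxiv_id: str, model: str, language: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT summary FROM summaries"
            " WHERE arxiv_id = ? AND model = ? AND language = ?",
            (arxiv_id, model, language),
        ).fetchone()
        return row[0] if row else None

    def put(self, arxiv_id: str, model: str, language: str, summary: str) -> None:
        # The composite primary key makes a repeated put overwrite the old row.
        self._db.execute(
            "INSERT OR REPLACE INTO summaries VALUES (?, ?, ?, ?)",
            (arxiv_id, model, language, summary),
        )
        self._db.commit()


cache = SummaryCache()
cache.put("0000.00000", "gpt-5.4-mini", "en", "A short summary.")
print(cache.get("0000.00000", "gpt-5.4-mini", "en"))               # cache hit
print(cache.get("0000.00000", "claude-haiku-4-5-20251001", "en"))  # miss: different model
```

Because the model and language are part of the key, a lookup under a different model misses and would trigger a new LLM call, while an entry stored under another key (such as cached PDF text) would be unaffected.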

Tests, Lint, Typecheck, Build

python3 -m pytest          # unit tests, no live network or LLM keys needed
python3 -m ruff check .
python3 -m ruff format --check .
python3 -m mypy src tests
python3 -m build           # build wheel + sdist

The unit tests use a fake LLM client and httpx.MockTransport; no real provider keys are required.

Project Layout

src/paperhub/
  __init__.py           # PaperHub public API
  config.py             # Settings (pydantic-settings)
  models.py             # PaperMeta, PaperSummary, RunRequest
  dates.py              # period → (start, end)
  fetchers/             # HF JSON API (default) + HTML fallback
  pdf/                  # arXiv download + text extraction
  agents/               # LLMClient protocol + provider clients + PaperAgent
  orchestrator.py       # asyncio.Semaphore parallelism
  cache.py              # SQLite cache
  formatter.py          # Markdown / plain-text rendering
  interactive_cli.py    # `paperhub` interactive launcher
tests/                  # pytest suite, mocked HTTP and fake LLM
docs/ARCHITECTURE.md
docs/HOW_TO_START.md
docs/API_KEYS.md
examples/README.md
examples/03_jupyter_quickstart.ipynb
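
The layout lists asyncio.Semaphore parallelism for orchestrator.py. As a rough illustration of that pattern (not the project's actual code), the sketch below caps how many per-paper coroutines run at once; `run_bounded` and `fake_agent` are hypothetical names, and the default limit of 5 mirrors the PAPERHUB_CONCURRENCY default.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> List[str]:
    """Run worker(item) for every item with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str) -> str:
        async with semaphore:  # blocks while `limit` workers are already running
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_agent(paper_id: str) -> str:
    """Stand-in for one summarization agent; sleeps instead of calling an LLM."""
    await asyncio.sleep(0.01)
    return f"summary:{paper_id}"


results = asyncio.run(run_bounded(["a", "b", "c"], fake_agent, limit=2))
print(results)
```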

Troubleshooting

  • "No papers found": HuggingFace may not yet have published Daily Papers for that date. Use /metadata in the interactive launcher to check the fetcher without an LLM call.
  • PDF text comes back tiny: some arXiv PDFs use unusual layouts. PaperHub falls back to pdfplumber; if both extractors return too little text, the agent uses the abstract as its input text instead of failing.
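
The extraction fallback described in the last item can be pictured as a chain: try each extractor in order, and use the abstract if none returns enough text. This is a sketch under assumptions; `choose_input_text` and the `min_chars` threshold are hypothetical, and the extractors are passed in as plain callables rather than real PDF libraries.

```python
from typing import Callable, Sequence


def choose_input_text(
    pdf_bytes: bytes,
    extractors: Sequence[Callable[[bytes], str]],
    abstract: str,
    min_chars: int = 500,  # hypothetical threshold for "too short"
) -> str:
    """Return the first sufficiently long extraction, else the abstract."""
    for extract in extractors:
        try:
            text = extract(pdf_bytes)
        except Exception:
            continue  # a failing extractor is treated like an empty result
        if len(text.strip()) >= min_chars:
            return text
    return abstract


def primary(_: bytes) -> str:
    return "x"  # simulates an extractor defeated by an unusual layout


def fallback(_: bytes) -> str:
    return "full text " * 100  # simulates a successful second extractor


print(len(choose_input_text(b"%PDF", [primary, fallback], abstract="Abstract text.")))
```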
