CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.

research-hub

One sentence in. Cluster + papers + AI brief out. ~50 seconds. Zotero + Obsidian + NotebookLM, wired together for AI agents — no API key required.

PyPI Tests MCP tools Python License: MIT CI: Linux · macOS · Windows

Traditional Chinese → README.zh-TW.md


📋 Prerequisites (check these first)

| Need | Why | How |
|---|---|---|
| Python 3.10+ | All package code | python --version |
| Obsidian (free) | research-hub writes notes into a vault that Obsidian renders | Download at obsidian.md |
| Google account with NotebookLM | Powers the brief generation | Visit notebooklm.google.com once and accept terms |
| Chrome | patchright drives your local Chrome (no separate API key) | Install Chrome; init will probe it |
| Zotero account + API key (researcher/humanities only) | Sync papers + PDFs across devices | zotero.org/settings/keys |
| (optional) claude / codex / gemini CLI | Powers auto --with-crystals for fully automated runs | Install whichever AI CLI you already use |

research-hub init runs a first-run readiness check at the end that flags whichever of these is missing — no need to memorize the list.


⚡ Install + first run (60 seconds total)

pip install "research-hub-pipeline[playwright,secrets]"    # quoted so shells like zsh do not expand the brackets
research-hub init                                          # interactive: persona + Zotero/NLM + readiness check
research-hub notebooklm login                              # one-time Google sign-in
research-hub auto "harness engineering for LLM agents"     # done — 50s later you have 8 papers + a brief

Want fully automated end-to-end (search → ingest → NLM brief → cached AI answers)?

research-hub auto "harness engineering" --with-crystals    # auto-pipes through claude/codex/gemini CLI

If a supported LLM CLI is on your PATH, --with-crystals runs the crystal generation step automatically. If not, the prompt is saved to .research_hub/artifacts/<slug>/crystal-prompt.md and the Next Steps banner tells you exactly what to paste where.
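
The fallback described above can be sketched in Python. This is a minimal illustration, not research-hub's actual code: the function names and the order in which the CLIs are tried are assumptions; only the three CLI names and the prompt path come from the text.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional

# CLI names come from the README; the order of preference is an assumption.
SUPPORTED_CLIS = ("claude", "codex", "gemini")


def pick_llm_cli(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first supported LLM CLI found on PATH, or None."""
    for name in SUPPORTED_CLIS:
        if which(name):
            return name
    return None


def crystal_prompt_path(slug: str, root: Path = Path(".research_hub")) -> Path:
    """Location of the saved prompt when no CLI is available (path from the README)."""
    return root / "artifacts" / slug / "crystal-prompt.md"


def plan_crystal_step(slug: str, which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Describe what happens next: automatic run, or manual paste of the saved prompt."""
    cli = pick_llm_cli(which)
    if cli is not None:
        return f"run crystals via {cli}"
    return f"paste prompt from {crystal_prompt_path(slug).as_posix()}"
```

Passing `which` as a parameter keeps the sketch testable without touching the real PATH.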

Three ways to drive it after install:

| Path | What you do | What runs under the hood |
|---|---|---|
| 🤖 Talk to Claude (recommended) | "Claude, research harness engineering for me" | Claude calls auto_research_topic(...) via MCP: one tool call |
| 💻 One-line CLI | research-hub auto "topic" | Same orchestrator, called directly |
| 🖱 Click in dashboard | research-hub serve --dashboard → Manage tab | Same actions, button-driven |

All three drive the same orchestrator. Pick whichever your hands are on.


🤖 Talk to Claude — 30-second setup

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}
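
If the config file already lists other MCP servers, the entry has to be merged rather than pasted over the file. The helper below is a rough sketch of such a merge, assuming the file is plain JSON with a top-level "mcpServers" object as shown above; `add_research_hub_server` is a hypothetical name, not part of research-hub.

```python
import json
from pathlib import Path

# The server entry exactly as shown in the README snippet above.
SERVER_ENTRY = {"command": "research-hub", "args": ["serve"]}


def add_research_hub_server(config_path: Path) -> dict:
    """Merge the research-hub entry into the config, keeping any other servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file the same as an empty JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    config.setdefault("mcpServers", {})["research-hub"] = dict(SERVER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Call it with the per-OS path listed in the Troubleshooting table.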

Restart Claude Desktop. Then:

You: "Claude, find me 5 papers on agent-based modeling and put them in a notebook."
Claude: calls auto_research_topic(topic="agent-based modeling", max_papers=5) → 5 papers ingested + NotebookLM brief URL, in ~50 s.

You: "What's the SOTA in my llm-evaluation-harness cluster?"
Claude: calls read_crystal("llm-evaluation-harness", "sota-and-open-problems") → 180-word pre-written answer with citations. ~1 KB read, 0 abstracts fetched at query time.

81 MCP tools in total — full reference: docs/mcp-tools.md. The big ones:

| Tool | What it replaces |
|---|---|
| auto_research_topic(topic) | 7-step CLI flow (search → ingest → bundle → upload → generate → download) |
| cleanup_garbage(everything=True) | du -sh .research_hub/bundles/* + manual rm -rf |
| tidy_vault() | doctor --autofix + dedup rebuild + bases emit --force + cleanup preview |
| ask_cluster_notebooklm(cluster, question) | Open NotebookLM tab, paste question, copy answer |
| read_crystal(cluster, slot) | Re-read 20 paper abstracts to answer the same question again |
| list_claims(cluster, min_confidence) | Skim hub overview hoping a claim is in the right paragraph |
| add_paper(arxiv_id, cluster) | Manual Zotero add → manual Obsidian note → manual NotebookLM upload |

📊 At a glance — every feature in one table

| Capability | Command (or MCP tool) | Notes |
|---|---|---|
| Lazy mode: one sentence in, brief out | auto "topic" / auto_research_topic | search → ingest → NLM brief in ~50s |
| Lazy maintenance | tidy / tidy_vault | doctor + dedup + bases + cleanup preview |
| GC accumulated junk | cleanup --all --apply / cleanup_garbage | bundles + debug logs + stale artifacts |
| Ad-hoc NLM Q&A | ask --cluster X "Q?" / ask_cluster_notebooklm | dual backend (NLM + crystal cache) |
| Pre-computed crystals | crystal emit / apply | 10 canonical Q→A per cluster, ~1 KB/answer |
| Structured memory | memory emit / apply + list_entities/claims/methods | typed entities, claims with confidence, method taxonomies |
| Live dashboard | serve --dashboard | 6 tabs, persona-aware, Manage tab buttons execute CLI directly |
| 4 personas, 1 codebase | RESEARCH_HUB_PERSONA set to researcher, humanities, analyst, or internal | vocabulary + hidden tabs adapt |
| 100% orphan coverage | clusters rebind --emit then --apply | 8-heuristic chain, auto-create-from-folder proposals |
| Health checks (12+) | doctor / doctor --autofix | mechanical backfills, patchright Chrome probe |
| Multi-backend search | search "query" | arXiv + Semantic Scholar (default) + Crossref DOI lookup |
| Cluster autosplit | clusters analyze --split-suggestion | networkx greedy modularity on citation graph |
| Obsidian Bases dashboard | bases emit / emit_cluster_base | auto-generated .base per cluster (auto-refreshes on ingest) |
| NotebookLM upload | notebooklm upload --cluster X | patchright + persistent Chrome (no API key, no quota) |
| Citation graph | vault graph-colors | networkx + Obsidian graph view colors |
| Local file ingest | import-folder /path | PDF / DOCX / MD / TXT / URL (analyst persona) |
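
The persona switch in the table is driven by the RESEARCH_HUB_PERSONA environment variable. As a hedged illustration of how a tool might read it, here is a small sketch; the function names, the case-insensitive matching, and the error on unknown values are assumptions, while the four persona names, the researcher default, and the Zotero split come from this README.

```python
import os
from typing import Mapping, Optional

PERSONAS = ("researcher", "humanities", "analyst", "internal")
DEFAULT_PERSONA = "researcher"  # the README names researcher as the default persona


def resolve_persona(env: Optional[Mapping[str, str]] = None) -> str:
    """Read RESEARCH_HUB_PERSONA, falling back to the default when unset."""
    env = os.environ if env is None else env
    value = env.get("RESEARCH_HUB_PERSONA", DEFAULT_PERSONA).strip().lower()
    if value not in PERSONAS:
        # Rejecting unknown values is an assumption made for this sketch.
        raise ValueError(f"unknown persona {value!r}; expected one of {PERSONAS}")
    return value


def uses_zotero(persona: str) -> bool:
    """Per the README, only the researcher and humanities personas use Zotero."""
    return persona in ("researcher", "humanities")
```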

→ Full lazy-mode guide · → All commands · → MCP reference


🖥 What the dashboard looks like

research-hub serve --dashboard opens http://127.0.0.1:8765/ — six tabs, all driven by the same data your CLI sees.

  • Overview: treemap + storage map + recent feed + crystals coverage
  • Library: clusters drilled into sub-topics + per-paper rows
  • Briefings: NotebookLM brief preview + artifact links
  • Diagnostics: health badges + drift alerts (grouped by kind in v0.48)
  • Manage: every CLI action as a button (rename / merge / split / NLM upload / ask / polish-markdown / bases emit)
  • Writing: quote capture + draft composer + BibTeX export

→ Dashboard walkthrough · → All 4 persona variants


🧠 What makes it different

1. Pre-computed answers, not lazy retrieval

Every RAG system still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.

For each cluster you generate ~10 canonical Q→A crystals once with any LLM. Later queries read a pre-written paragraph (~1 KB), not 20 paper abstracts (~30 KB) — 30× compression with quality that doesn't degrade at query time. Underneath, a structured memory layer holds the entities, typed claims with confidence, and method taxonomies that crystals reference. AI agents query via list_entities, list_claims(min_confidence="high"), list_methods — no RAG over prose, structured lookup over structured data.

Example cluster: hub/llm-evaluation-harness/ has 10 crystals + 14 entities + 12 claims + 7 methods. → Why this is not RAG
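
To make the difference from retrieval concrete, the sketch below models crystals and claims as plain in-memory data. It is a toy under stated assumptions, not the project's storage format: the `Claim` class, the helper names, and the low/medium confidence levels are hypothetical (only "high" appears in this README).

```python
from dataclasses import dataclass

# Only "high" appears in the README; the other levels and their order are assumed.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Claim:
    text: str
    confidence: str


def filter_claims(claims, min_confidence="low"):
    """Structured lookup: keep claims at or above the requested confidence."""
    floor = CONFIDENCE_RANK[min_confidence]
    return [c for c in claims if CONFIDENCE_RANK[c.confidence] >= floor]


def lookup_crystal(crystals, cluster, slot):
    """Return the pre-written answer; no abstracts are read at query time."""
    return crystals[(cluster, slot)]


def compression_ratio(abstract_bytes, crystal_bytes):
    """Size of the raw abstracts relative to one pre-written answer."""
    return abstract_bytes / crystal_bytes
```

With the figures quoted above, `compression_ratio(30_000, 1_000)` gives the 30× number: about 30 KB of abstracts against about 1 KB per crystal.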

2. Three control surfaces, one orchestrator

CLI, dashboard buttons, and MCP tools all call the same Python orchestrator. There is no "REST mode" or "API mode" with diverging behavior. Whatever you can do at the shell, Claude can do via MCP, and vice versa.
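
The shape of that design can be sketched as thin wrappers around one function. Everything here is illustrative: `run_auto`, `cli_main`, and `mcp_auto_research_topic` are hypothetical stand-ins, and the default of 8 papers is an assumption; the step names and the --max-papers flag are taken from this README.

```python
import argparse

# Step names from the README's tool table; the orchestrator itself is a stand-in.
PIPELINE_STEPS = ("search", "ingest", "bundle", "upload", "generate", "download")


def run_auto(topic: str, max_papers: int = 8) -> dict:
    """Hypothetical single orchestrator that every surface calls."""
    return {"topic": topic, "max_papers": max_papers, "steps": list(PIPELINE_STEPS)}


def cli_main(argv: list[str]) -> dict:
    """CLI surface: parse arguments, then delegate to the orchestrator."""
    parser = argparse.ArgumentParser(prog="research-hub")
    parser.add_argument("command", choices=["auto"])
    parser.add_argument("topic")
    parser.add_argument("--max-papers", type=int, default=8)
    args = parser.parse_args(argv)
    return run_auto(args.topic, args.max_papers)


def mcp_auto_research_topic(topic: str, max_papers: int = 8) -> dict:
    """MCP surface: same orchestrator, no separate code path."""
    return run_auto(topic, max_papers)
```

Because both wrappers only translate inputs, behavior cannot drift between surfaces.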

3. Provider-agnostic by design

No OpenAI / Anthropic API key required. All AI generation uses an emit / apply pattern: emit writes a self-contained prompt to stdout, you paste it into your AI of choice (Claude, GPT, Gemini, local model), and apply ingests the JSON response. NotebookLM browser automation uses your own logged-in Chrome, so there is no quota and no per-token billing.
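
A minimal sketch of the emit / apply round trip follows. The prompt wording and the JSON shape (an "answers" list of slot/answer pairs) are invented for illustration and are not the format research-hub actually emits or accepts; `emit_prompt` and `apply_response` are hypothetical names.

```python
import json


def emit_prompt(cluster: str, questions: dict[str, str]) -> str:
    """Build a self-contained prompt; `questions` maps slot name to question."""
    lines = [
        f"Cluster: {cluster}",
        'Reply with JSON only, shaped as {"answers": [{"slot": "...", "answer": "..."}]}.',
        "Questions:",
    ]
    lines += [f"- [{slot}] {question}" for slot, question in questions.items()]
    return "\n".join(lines)


def apply_response(raw: str, questions: dict[str, str]) -> dict[str, str]:
    """Parse the pasted JSON reply and check that every slot was answered."""
    data = json.loads(raw)
    answers = {item["slot"]: item["answer"] for item in data["answers"]}
    missing = sorted(set(questions) - set(answers))
    if missing:
        raise ValueError(f"response is missing slots: {missing}")
    return answers
```

Because the model only ever sees text and returns text, any provider, or a local model, can sit in the middle.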


📦 Install variants

# Researcher / Humanities (Zotero + NotebookLM)
pip install "research-hub-pipeline[playwright,secrets]"

# Analyst / Internal KM (no Zotero, import local files)
pip install "research-hub-pipeline[import,secrets]"

# Everything for development
pip install -e '.[dev,playwright,import,secrets,mcp]'

Python 3.10+. Optional npm install -g defuddle-cli for cleaner URL imports.


📚 Docs

  • First 10 minutes: Guided tour for each persona
  • Lazy-mode reference: The 4 one-sentence commands
  • Dashboard walkthrough: Tab-by-tab tour with persona recipes
  • MCP tools reference: All 81 tools, categorized + signatures
  • Personas: 4 persona profiles + per-persona feature matrix
  • Cluster integrity: 6 failure modes × 4 personas mitigation matrix
  • Anti-RAG / crystals: Why pre-computed Q→A beats retrieval
  • NotebookLM setup + troubleshooting: patchright + persistent Chrome (v0.42+)
  • Import folder: Local PDF/DOCX/MD/TXT/URL ingest
  • Papers input schema: Ingestion pipeline reference
  • Upgrade guide: Migrating from older versions
  • Audit reports: audit_v0.26.md, audit_v0.45.md
  • CHANGELOG: Per-version release notes

🩺 Troubleshooting (first-run problems)

| Symptom | Cause | Fix |
|---|---|---|
| research-hub init says chrome WARN patchright cannot launch Chrome | Chrome not installed, or patchright cannot find it | Install Chrome from chrome.com; rerun research-hub doctor to re-probe |
| research-hub notebooklm login opens browser but Google blocks login | Headless / new device challenge | The browser is patchright (real Chrome): click "Yes, it's me" on your phone, then complete login normally |
| research-hub auto fails at search step with 0 papers | Topic too narrow, or a transient arXiv / Semantic Scholar outage | Re-run with --max-papers 20 or rephrase the topic; both backends are fault-tolerant |
| research-hub auto fails at nlm.upload with "Generation button not found" | NotebookLM UI changed, or you're not logged in | Run research-hub notebooklm login again; if it persists, file an issue with the nlm-debug-*.jsonl from .research_hub/ |
| auto --with-crystals falls back to "no LLM CLI on PATH" | None of the claude, codex, or gemini CLIs is installed | Install whichever AI CLI you use; or generate crystals manually with crystal emit → paste → crystal apply |
| Claude Desktop doesn't see the MCP server | claude_desktop_config.json not in expected location | macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json · restart Claude Desktop after editing |
| init reports zotero WARN but I don't use Zotero | Default persona is researcher, which expects Zotero | Re-run research-hub init --persona analyst (or internal); these personas skip Zotero entirely |

For everything else: research-hub doctor --autofix repairs the common mechanical issues; the report tells you which subsystem to look at.


🛠 Status

  • Latest: v0.49.0 (2026-04-19) — auto Next Steps banner + --with-crystals LLM-CLI bridge + first-run readiness check, see CHANGELOG.md
  • Tests: 1537 passing on the fast suite (CI: Linux + Windows + macOS × Python 3.10/3.11/3.12 = 9 jobs)
  • MCP tools: 81 (v0.47 added auto / cleanup / tidy as MCP tools; v0.49 extended auto_research_topic with do_crystals / llm_cli)
  • Dependencies: pyzotero, pyyaml, requests, rapidfuzz, networkx, platformdirs (all pure-Python)
  • Optional: [playwright] for NotebookLM, [import] for local file ingest, [secrets] for OS-keyring credential storage

👩‍💻 For developers

git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q                     # 1537 passing

Contributing: CONTRIBUTING.md. Security: SECURITY.md.

Package on PyPI: research-hub-pipeline · CLI entry point: research-hub

License

MIT. See LICENSE.
