CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.

research-hub

One sentence in. Cluster + papers + AI brief out. ~50 seconds. Zotero + Obsidian + NotebookLM, wired together for AI agents — no API key required.

Traditional Chinese (繁體中文) → README.zh-TW.md

📋 Prerequisites (check these first)

Need	Why	How
Python 3.10+	All package code	`python --version`
Obsidian (free)	research-hub writes notes into a vault Obsidian renders	Download at obsidian.md
Google account with NotebookLM	Powers the brief generation	Visit notebooklm.google.com once and accept terms
Chrome	patchright drives your local Chrome (no separate API key)	Install Chrome — `init` will probe it
Zotero account + API key (researcher/humanities only)	Sync papers + PDFs across devices	zotero.org/settings/keys
(optional) `claude` / `codex` / `gemini` CLI	Powers `auto --with-crystals` for fully automated runs	Install whichever AI CLI you already use

`research-hub init` ends with a first-run readiness check that flags whichever of these is missing, so there is no need to memorize the list.

⚡ Install + first run (60 seconds total)

pip install "research-hub-pipeline[playwright,secrets]"    # quotes stop zsh from globbing the brackets
research-hub init                                          # interactive: persona + Zotero/NLM + readiness check
research-hub notebooklm login                              # one-time Google sign-in
research-hub auto "harness engineering for LLM agents"     # done — 50s later you have 8 papers + a brief

Want fully automated end-to-end (search → ingest → NLM brief → cached AI answers)?

research-hub auto "harness engineering" --with-crystals    # auto-pipes through claude/codex/gemini CLI
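If you want to launch these runs from a script rather than a shell, the command line can be assembled as an argument list. This is a minimal sketch: the helper name `build_auto_command` is hypothetical, and it only uses the `auto` subcommand and the `--with-crystals` / `--max-papers` flags that appear in this README. Whether the flags can be combined in this order is an assumption.

```python
import shlex
from typing import List, Optional


def build_auto_command(
    topic: str,
    with_crystals: bool = False,
    max_papers: Optional[int] = None,
) -> List[str]:
    """Build the argv list for a `research-hub auto` run (hypothetical helper)."""
    cmd = ["research-hub", "auto", topic]
    if max_papers is not None:
        # Flag name taken from the troubleshooting table.
        cmd += ["--max-papers", str(max_papers)]
    if with_crystals:
        cmd.append("--with-crystals")
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_auto_command("harness engineering", with_crystals=True)))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with multi-word topics.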

Not sure what to ask for? Plan first, then act (v0.50):

research-hub plan "I want to learn about harness engineering"
# Prints: suggested topic, cluster, max_papers (auto-tuned for "thesis"/"learn" intents),
# warns about existing-cluster collisions, then prints the exact `auto` command to run.

When using Claude Desktop, just say "Claude, research X". Claude calls `plan_research_workflow` first to confirm the plan with you before kicking off `auto_research_topic`.

If a supported LLM CLI is on your PATH, `--with-crystals` runs the crystal generation step automatically. If not, the prompt is saved to `.research_hub/artifacts/<slug>/crystal-prompt.md` and the Next Steps banner tells you exactly what to paste where.
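To check in advance which branch you will hit, the PATH lookup can be approximated in a few lines of Python. This is a sketch, not research-hub's actual detection code: the function name `find_llm_cli` and the preference order are assumptions, and only the three CLI names come from the prerequisites table.

```python
import shutil
from typing import Callable, Optional, Sequence

# CLI names from the prerequisites table; the order is an assumption.
SUPPORTED_CLIS = ("claude", "codex", "gemini")


def find_llm_cli(
    candidates: Sequence[str] = SUPPORTED_CLIS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate CLI found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    cli = find_llm_cli()
    if cli is None:
        print("No LLM CLI on PATH; expect the saved crystal-prompt.md fallback.")
    else:
        print(f"Found LLM CLI on PATH: {cli}")
```

The `which` parameter exists only so the lookup can be exercised without touching the real PATH.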

🎬 30-second demo (real terminal output, not a mock-up)

This is the actual output from running research-hub auto "LLM agents agent-based modeling social simulation" --with-crystals on the maintainer's Windows zh-TW vault during the v0.49.5 verification (full record in CHANGELOG.md v0.49.4):

$ research-hub auto "LLM agents agent-based modeling social simulation" --with-crystals
[OK] cluster        created: llm-agents-agent-based-modeling-social
[OK] zotero.bind    created collection 9FHZCK4N for llm-agents-agent-based-modeling-social
[OK] search         8 results
[OK] ingest         8 papers in raw/llm-agents-agent-based-modeling-social/
[OK] nlm.bundle     7 PDFs (24 MB)
[OK] nlm.upload     8 succeeded
[OK] nlm.generate   brief generation triggered
[OK] nlm.download   1893 chars saved
[OK] crystals       10 crystals via claude

============================================================
Done in 187s. Cluster: llm-agents-agent-based-modeling-social
============================================================
  NotebookLM: https://notebooklm.google.com/notebook/99866b50-3b71-4d84-9e19-7682bbc85e2d
  Brief:      .research_hub/artifacts/.../brief-20260420T020640Z.txt

Next steps (copy-paste any of these):

  # Read the cached SOTA answer (~1 KB, no LLM call)
  research-hub crystal read --cluster llm-agents-agent-based-modeling-social \
                            --slug sota-and-open-problems

  # Ad-hoc Q&A against the uploaded notebook
  research-hub ask llm-agents-agent-based-modeling-social "what's the main risk?"

  # Or talk to Claude Desktop with the research-hub MCP installed:
  > "Claude, what's in my llm-agents-agent-based-modeling-social cluster?"

What that single command produced:

Artifact	Where	Size
8 paper PDFs	Zotero collection `9FHZCK4N` (auto-created)	24 MB
8 Obsidian notes with frontmatter	`raw/llm-agents-agent-based-modeling-social/`	8 × ~3 KB
NotebookLM notebook with all 8 sources	notebooklm.google.com/notebook/99866b50-...	n/a
AI brief (downloaded)	`.research_hub/artifacts/.../brief-*.txt`	1.9 KB
10 cached canonical Q→A crystals	`hub/llm-agents-agent-based-modeling-social/crystals/`	10 × ~4 KB

After this 187 s run, every subsequent question against this cluster reads a cached crystal in under a second — no LLM call, no API quota burn.

Three ways to drive it after install:

Path	What you do	What runs under the hood
🤖 Talk to Claude (recommended)	"Claude, research harness engineering for me"	Claude calls `auto_research_topic(...)` via MCP — one tool call
💻 One-line CLI	`research-hub auto "topic"`	Same orchestrator, called directly
🖱 Click in dashboard	`research-hub serve --dashboard` → Manage tab	Same actions, button-driven

All three drive the same orchestrator. Pick whichever your hands are on.

🤖 Talk to Claude — 30-second setup

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}
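If the config file already lists other MCP servers, pasting the snippet by hand risks overwriting them. The following sketch merges the entry instead. The helper name `add_research_hub_server` is hypothetical, and you have to pass the config path for your own OS (the troubleshooting table lists the macOS and Windows locations).

```python
import json
from pathlib import Path


def add_research_hub_server(config_path: Path) -> dict:
    """Add the research-hub MCP entry to a Claude Desktop config, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    # Same entry as the JSON snippet above.
    servers["research-hub"] = {"command": "research-hub", "args": ["serve"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the existing file first if it contains settings you care about; the sketch rewrites it in place.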

Restart Claude Desktop. Then:

You: "Claude, find me 5 papers on agent-based modeling and put them in a notebook."
Claude: calls `auto_research_topic(topic="agent-based modeling", max_papers=5)` → 5 papers ingested + NotebookLM brief URL, in about 50 s.

You: "What's the SOTA in my llm-evaluation-harness cluster?"
Claude: calls `read_crystal("llm-evaluation-harness", "sota-and-open-problems")` → 180-word pre-written answer with citations. ~1 KB read, 0 abstracts fetched at query time.

82 MCP tools in total (full reference: docs/mcp-tools.md). The big ones:

Tool	What it replaces
`auto_research_topic(topic)`	7-step CLI flow (search → ingest → bundle → upload → generate → download)
`cleanup_garbage(everything=True)`	`du -sh .research_hub/bundles/*` + manual `rm -rf`
`tidy_vault()`	`doctor --autofix` + `dedup rebuild` + `bases emit --force` + cleanup preview
`ask_cluster_notebooklm(cluster, question)`	Open NotebookLM tab, paste question, copy answer
`read_crystal(cluster, slot)`	Re-read 20 paper abstracts to answer the same question again
`list_claims(cluster, min_confidence)`	Skim hub overview hoping a claim is in the right paragraph
`add_paper(arxiv_id, cluster)`	Manual Zotero add → manual Obsidian note → manual NotebookLM upload

📊 At a glance — every feature in one table

Capability	Command (or MCP tool)	Notes
Lazy mode — one sentence in, brief out	`auto "topic"` / `auto_research_topic`	search → ingest → NLM brief in ~50s
Lazy maintenance	`tidy` / `tidy_vault`	doctor + dedup + bases + cleanup preview
GC accumulated junk	`cleanup --all --apply` / `cleanup_garbage`	bundles + debug logs + stale artifacts
Ad-hoc NLM Q&A	`ask --cluster X "Q?"` / `ask_cluster_notebooklm`	dual backend (NLM + crystal cache)
Pre-computed crystals	`crystal emit / apply`	10 canonical Q→A per cluster, ~1 KB/answer
Structured memory	`memory emit / apply` + `list_entities/claims/methods`	typed entities, claims with confidence, method taxonomies
Live dashboard	`serve --dashboard`	6 tabs, persona-aware, Manage tab buttons execute CLI directly
4 personas, 1 codebase	`RESEARCH_HUB_PERSONA=researcher|humanities|analyst|internal`	vocabulary + hidden tabs adapt
100% orphan coverage	`clusters rebind --emit` then `--apply`	8-heuristic chain, auto-create-from-folder proposals
Health checks (12+)	`doctor` / `doctor --autofix`	mechanical backfills, patchright Chrome probe
Multi-backend search	`search "query"`	arXiv + Semantic Scholar (default) + Crossref DOI lookup
Cluster autosplit	`clusters analyze --split-suggestion`	networkx greedy modularity on citation graph
Obsidian Bases dashboard	`bases emit` / `emit_cluster_base`	auto-generated `.base` per cluster (auto-refreshes on ingest)
NotebookLM upload	`notebooklm upload --cluster X`	patchright + persistent Chrome (no API key, no quota)
Citation graph	`vault graph-colors`	networkx + Obsidian graph view colors
Local file ingest	`import-folder /path`	PDF / DOCX / MD / TXT / URL (analyst persona)
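The persona row in the table above is driven by a single environment variable. As an illustration of how such a switch is typically read, here is a minimal sketch. The function name `resolve_persona`, the case-insensitive matching, and the error on unknown values are assumptions, not research-hub's documented behavior. The four persona names and the `researcher` default come from this README.

```python
import os
from typing import Mapping

# Persona names from the table; "researcher" as default follows the troubleshooting notes.
PERSONAS = ("researcher", "humanities", "analyst", "internal")
DEFAULT_PERSONA = "researcher"


def resolve_persona(env: Mapping[str, str] = os.environ) -> str:
    """Read RESEARCH_HUB_PERSONA, falling back to the default when unset or empty."""
    value = env.get("RESEARCH_HUB_PERSONA", "").strip().lower()
    if not value:
        return DEFAULT_PERSONA
    if value not in PERSONAS:
        # Rejecting unknown values is an assumption made for this sketch.
        raise ValueError(f"unknown persona {value!r}; expected one of {PERSONAS}")
    return value
```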

→ Full lazy-mode guide · → All commands · → MCP reference

🖥 What the dashboard looks like

`research-hub serve --dashboard` opens http://127.0.0.1:8765/ with six tabs, all driven by the same data your CLI sees.



Overview: treemap + storage map + recent feed + crystals coverage
Library: clusters drilled into sub-topics + per-paper rows
Briefings: NotebookLM brief preview + artifact links
Diagnostics: health badges + drift alerts (grouped by kind in v0.48)
Manage: every CLI action as a button (rename / merge / split / NLM upload / ask / polish-markdown / bases emit)
Writing: quote capture + draft composer + BibTeX export

→ Dashboard walkthrough · → All 4 persona variants

🧠 What makes it different

1. Pre-computed answers, not lazy retrieval

Every RAG system still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.

For each cluster you generate ~10 canonical Q→A crystals once with any LLM. Later queries read a pre-written paragraph (~1 KB), not 20 paper abstracts (~30 KB) — 30× compression with quality that doesn't degrade at query time. Underneath, a structured memory layer holds the entities, typed claims with confidence, and method taxonomies that crystals reference. AI agents query via list_entities, list_claims(min_confidence="high"), list_methods — no RAG over prose, structured lookup over structured data.

Example cluster from the maintainer's vault: hub/llm-evaluation-harness/ has 10 crystals + 14 entities + 12 claims + 7 methods, all generated once. After research-hub auto "harness engineering" --with-crystals your own vault will look the same. → Why this is not RAG
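The core idea, answering from a pre-written file instead of re-assembling context, can be sketched as a plain file lookup. This is an illustration only: the `hub/<cluster>/crystals/` directory comes from this README, but the helper name `read_cached_answer` and the one-file-per-slug `.md` layout are assumptions.

```python
from pathlib import Path
from typing import Optional


def read_cached_answer(vault: Path, cluster: str, slug: str) -> Optional[str]:
    """Return the pre-written answer for (cluster, slug), or None on a cache miss.

    Storing each crystal as <slug>.md inside hub/<cluster>/crystals/ is an
    assumption made for this sketch.
    """
    path = vault / "hub" / cluster / "crystals" / f"{slug}.md"
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")
```

A hit costs one small file read and no model call. On a miss the caller would fall back to a live question against the notebook.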

2. Three control surfaces, one orchestrator

CLI, dashboard buttons, and MCP tools all call the same Python orchestrator. There is no "REST mode" or "API mode" with diverging behavior. Whatever you can do at the shell, Claude can do via MCP, and vice versa.

3. Provider-agnostic by design

No OpenAI / Anthropic API key required. All AI generation uses an `emit` / `apply` pattern: `emit` writes a self-contained prompt to stdout, you paste it into your AI of choice (Claude, GPT, Gemini, local model), and `apply` ingests the JSON response. NotebookLM browser automation uses your own logged-in Chrome, so there is no quota and no per-token billing.
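To make the emit / apply split concrete, here is a minimal sketch of the two halves. It is not the project's implementation: the function names `emit_prompt` and `apply_response` and the JSON shape are assumptions chosen for illustration. The point is that the model call happens outside the program, so any provider works.

```python
import json
import sys
from typing import List


def emit_prompt(cluster: str, questions: List[str]) -> str:
    """Build a self-contained prompt asking for one JSON answer per question."""
    lines = [
        f"You are summarizing the research cluster '{cluster}'.",
        "Answer each question below and reply with JSON only, shaped as",
        '{"answers": [{"question": "...", "answer": "..."}]}',
        "",
    ]
    lines += [f"{i}. {q}" for i, q in enumerate(questions, start=1)]
    return "\n".join(lines)


def apply_response(raw: str, questions: List[str]) -> dict:
    """Validate the pasted JSON and return a question -> answer mapping."""
    data = json.loads(raw)
    answers = {item["question"]: item["answer"] for item in data["answers"]}
    missing = [q for q in questions if q not in answers]
    if missing:
        raise ValueError(f"response is missing answers for: {missing}")
    return answers


if __name__ == "__main__":
    # "emit": print the prompt so it can be pasted into any model.
    sys.stdout.write(emit_prompt("llm-evaluation-harness", ["What is the SOTA?"]) + "\n")
```

Validating the response at apply time means a malformed paste fails loudly instead of silently caching a broken answer.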

⚖️ How it compares to the alternatives

Honest, side-by-side. research-hub doesn't replace any of these — it stitches them together so an AI agent can drive them all.

What you can do	Zotero alone	NotebookLM alone	Generic RAG (LangChain etc.)	Obsidian-Zotero plugin	research-hub
Search arXiv + Semantic Scholar in one command	❌	❌	DIY	❌	✅ `auto "topic"`
One-shot ingest into Zotero and Obsidian and NotebookLM	❌	❌	DIY	partial (Z↔O only)	✅ `auto`
AI brief from your collection	❌	✅ (manual)	DIY	❌	✅ auto-generated
Cached canonical Q→A so the AI doesn't re-RAG every query	❌	❌	❌ (RAG re-fetches)	❌	✅ crystals (~1 KB/answer)
Structured memory layer (entities + typed claims + methods)	❌	❌	unstructured chunks	❌	✅ `list_entities/claims/methods`
Direct AI-agent control via MCP	❌	❌	DIY MCP server	❌	✅ 82 MCP tools
Live HTML dashboard with action buttons	❌	❌	❌	❌	✅ `serve --dashboard`
Auto-cluster papers + detect drift + auto-rebind orphans	❌	❌	❌	❌	✅ `clusters rebind`
Per-cluster Obsidian Bases dashboard	❌	❌	❌	❌	✅ `bases emit`
No API key required for AI	n/a	✅	❌	n/a	✅
Local-first vault you own	✅ (cloud-sync)	❌ (Google)	depends	✅	✅
Cost per 1000 queries	n/a	quota-limited	~$5–50 (token billing)	n/a	$0 (cached crystals)

The honest takeaway: research-hub is only worth it if you already use at least two of Zotero, Obsidian, and NotebookLM and want an AI agent to drive the workflow. If you only use one, the simpler tools alone are fine.

📦 Install variants

# Researcher / Humanities (Zotero + NotebookLM)
pip install "research-hub-pipeline[playwright,secrets]"

# Analyst / Internal KM (no Zotero, import local files)
pip install "research-hub-pipeline[import,secrets]"

# Everything for development (run from a clone of the repository)
pip install -e ".[dev,playwright,import,secrets,mcp]"

Requires Python 3.10+. Optionally, run `npm install -g defuddle-cli` for cleaner URL imports.

📚 Docs


First 10 minutes	Guided tour for each persona
Lazy-mode reference	The 4 one-sentence commands
Dashboard walkthrough	Tab-by-tab tour with persona recipes
MCP tools reference	All 82 tools, categorized + signatures
Personas	4 persona profiles + per-persona feature matrix
Cluster integrity	6 failure modes × 4 personas mitigation matrix
Anti-RAG / crystals	Why pre-computed Q→A beats retrieval
NotebookLM setup + troubleshooting	patchright + persistent Chrome (v0.42+)
Import folder	Local PDF/DOCX/MD/TXT/URL ingest
Papers input schema	Ingestion pipeline reference
Upgrade guide	Migrating from older versions
Audit reports	`audit_v0.26.md` … `audit_v0.45.md`
CHANGELOG	Per-version release notes

🩺 Troubleshooting (first-run problems)

Symptom	Cause	Fix
`research-hub init` says `chrome WARN patchright cannot launch Chrome`	Chrome not installed, or patchright cannot find it	Install Chrome from chrome.com; rerun `research-hub doctor` to re-probe
`research-hub notebooklm login` opens browser but Google blocks login	Headless / new device challenge	The browser is patchright (real Chrome) — click "Yes, it's me" on your phone, then complete login normally
`research-hub auto` fails at `search` step with `0 papers`	Topic too narrow, or a transient arXiv / Semantic Scholar outage	Re-run with `--max-papers 20` or rephrase the topic; both backends are fault-tolerant
`research-hub auto` fails at `nlm.upload` with "Generation button not found"	NotebookLM UI changed, or you're not logged in	Run `research-hub notebooklm login` again; if the problem persists, file an issue with the `nlm-debug-*.jsonl` from `.research_hub/`
`auto --with-crystals` falls back to "no LLM CLI on PATH"	Neither `claude`, `codex`, nor `gemini` CLI installed	Install whichever AI CLI you use; or generate crystals manually with `crystal emit` → paste → `crystal apply`
Claude Desktop doesn't see the MCP server	`claude_desktop_config.json` not in expected location	macOS: `~/Library/Application Support/Claude/claude_desktop_config.json` · Windows: `%APPDATA%\Claude\claude_desktop_config.json` · restart Claude Desktop after editing
`init` reports `zotero WARN` but I don't use Zotero	Default persona is `researcher` which expects Zotero	Re-run `research-hub init --persona analyst` (or `internal`) — these personas skip Zotero entirely

For everything else: `research-hub doctor --autofix` repairs the common mechanical issues; the report tells you which subsystem to look at.

🛠 Status

Latest: v0.50.0 (2026-04-20) — intent planner: research-hub plan "..." + plan_research_workflow MCP tool turn freeform user intent into a confirmed plan before auto fires. See CHANGELOG.md.
Tests: 1552 passing on the fast suite (CI: Linux + Windows + macOS × Python 3.10/3.11/3.12 = 9 jobs)
MCP tools: 82 (v0.47 added auto / cleanup / tidy; v0.49 extended auto_research_topic with do_crystals / llm_cli; v0.50 added plan_research_workflow)
End-to-end verified: as of v0.49.5, the full lazy-mode flow — auto "topic" --with-crystals → search → ingest → NotebookLM brief → cached AI answers — is verified working on a Windows zh-TW machine with the real claude CLI. See CHANGELOG.md v0.49.4 for the full per-stage results table.
Dependencies: pyzotero, pyyaml, requests, rapidfuzz, networkx, platformdirs (all pure-Python)
Optional: [playwright] for NotebookLM, [import] for local file ingest, [secrets] for OS-keyring credential storage

👩‍💻 For developers

git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e ".[dev,playwright]"
python -m pytest -q                     # 1552 passing

Contributing: CONTRIBUTING.md. Security: SECURITY.md.

Package on PyPI: research-hub-pipeline · CLI entry point: research-hub

License

MIT. See LICENSE.

Download files

Source Distribution

research_hub_pipeline-0.50.0.tar.gz (42.8 MB)

Uploaded Apr 20, 2026 Source

Built Distribution

research_hub_pipeline-0.50.0-py3-none-any.whl (373.7 kB)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file research_hub_pipeline-0.50.0.tar.gz.

File metadata

Download URL: research_hub_pipeline-0.50.0.tar.gz
Upload date: Apr 20, 2026
Size: 42.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for research_hub_pipeline-0.50.0.tar.gz
Algorithm	Hash digest
SHA256	`940fc4a4c0d982628d7570a2c017572287d16683acdcab122ec49315e59132db`
MD5	`46ecfedd90017c8464cee08a66461af1`
BLAKE2b-256	`0be9a0044c0023eb673c9a220e60148826d7943c4d06c2b38eeef24ad63e5eaf`

File details

Details for the file research_hub_pipeline-0.50.0-py3-none-any.whl.

File metadata

Download URL: research_hub_pipeline-0.50.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 373.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for research_hub_pipeline-0.50.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db80251b1ab9716190688087e8cd39bfcc6caf18f19c985be73732b9206d0668`
MD5	`12672d3dd355b91d13bb96aa79917ce2`
BLAKE2b-256	`688e8eae63be6949244e28ec99425dc5adb8ba4a24db3433a4fdb35f6bcf122b`
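To check a downloaded file against the SHA256 digests listed above, a short standard-library script is enough. This is a generic sketch using `hashlib`. The helper names `sha256_of` and `matches` are hypothetical, and you supply the file path and the expected digest yourself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Streaming in chunks matters for the 42.8 MB source distribution; the wheel is small enough that either approach works.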
