
CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.


research-hub

Build your research cluster once. Ask AI about it thousands of times. Zotero + Obsidian + NotebookLM, wired together for AI agents.


Traditional Chinese documentation → README.zh-TW.md



The 10-minute story

Say you want to get up to speed on "harness engineering for LLM agents", a subfield that barely existed 6 months ago. The traditional workflow: search arXiv, skim abstracts, manually note key claims, fight Obsidian, and occasionally wish you had a RAG pipeline. That takes about 2 hours.

research-hub workflow:

research-hub clusters new --query "LLM evaluation harness" --slug llm-evaluation-harness
research-hub search "language model evaluation harness" --to-papers-input \
    --cluster llm-evaluation-harness > papers.json
research-hub ingest --cluster llm-evaluation-harness --no-verify

3 minutes later your vault has 6 key papers with structured notes. Now push them to NotebookLM and pull back an auto-generated briefing:

research-hub notebooklm bundle   --cluster llm-evaluation-harness
research-hub notebooklm upload   --cluster llm-evaluation-harness   # Chrome CDP-attach, no API key
research-hub notebooklm generate --cluster llm-evaluation-harness --type brief
research-hub notebooklm download --cluster llm-evaluation-harness --type brief
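If you would rather script these four steps than type them, a thin wrapper is enough. The sketch below only assembles the commands shown above and runs them in order; the helper names `notebooklm_commands` and `run_pipeline` are illustrative and are not part of research-hub.

```python
import subprocess

# The four subcommands from the walkthrough, in the order they must run.
STEPS = ("bundle", "upload", "generate", "download")


def notebooklm_commands(cluster: str, artifact_type: str = "brief") -> list[list[str]]:
    """Build the four CLI invocations for one cluster (hypothetical helper)."""
    commands = []
    for step in STEPS:
        cmd = ["research-hub", "notebooklm", step, "--cluster", cluster]
        # In the walkthrough, only generate and download take --type.
        if step in ("generate", "download"):
            cmd += ["--type", artifact_type]
        commands.append(cmd)
    return commands


def run_pipeline(cluster: str, artifact_type: str = "brief") -> None:
    """Run each step in order and stop at the first non-zero exit code."""
    for cmd in notebooklm_commands(cluster, artifact_type):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("llm-evaluation-harness")` would issue the same four commands as above. Because `check=True` raises on the first failure, a failed upload does not trigger a generate against missing sources.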

Or click the same actions in research-hub serve --dashboard -> Manage on that cluster card. In live mode the buttons run the identical CLI flow, so there is no extra terminal step.

Your vault now has .research_hub/artifacts/llm-evaluation-harness/brief-*.txt — a ~300-character synthesis covering all 6 papers, generated by NotebookLM from the uploaded sources (no prompt engineering needed, no headless-browser hacks — it attaches to your existing Chrome session).

2 more minutes: generate the AI summary layer (crystals) and the structured entity/claim registry (memory):

research-hub crystal emit --cluster llm-evaluation-harness > prompt.md
# (paste prompt to Claude/GPT, save response as crystals.json)
research-hub crystal apply --cluster llm-evaluation-harness --scored crystals.json

research-hub memory emit  --cluster llm-evaluation-harness > mem-prompt.md
# (paste prompt to Claude/GPT, save response as memory.json)
research-hub memory apply --cluster llm-evaluation-harness --scored memory.json
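Both apply steps read a JSON file that you saved by hand from a chat window, and chat models often wrap JSON in a code fence. A quick well-formedness check before applying can save a round trip. This is a minimal sketch: `extract_json` is a hypothetical helper, and since the schema the apply commands expect is not described here, the check stops at "does it parse".

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing clean


def extract_json(response_text: str):
    """Parse JSON from a pasted LLM reply, tolerating one surrounding code fence."""
    text = response_text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line (with or without a language tag)
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence if present
        text = "\n".join(lines)
    # json.JSONDecodeError propagates if the reply is not valid JSON.
    return json.loads(text)
```

One way to use it: read the pasted reply, pass it through `extract_json`, and write the result back with `json.dump` as `crystals.json` or `memory.json` before running the apply command.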

Now open Claude Desktop and ask:

You: "What's the current SOTA in LLM evaluation harness?"
Claude (via MCP): calls read_crystal("llm-evaluation-harness", "sota-and-open-problems") → gets a pre-written 180-word answer with paper citations. ~1 KB read, 0 abstracts fetched at query time.

That pre-written answer is a crystal. You paid the reasoning cost once; every subsequent question is ~1 KB of cached analysis. See hub/llm-evaluation-harness/crystals/ in your vault for the 10 canonical Q&As generated above.


What makes it different

1. Crystals — pre-computed answers, not lazy retrieval (v0.28)

Every RAG system, including Karpathy's "LLM wiki", still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.

For each cluster you generate ~10 canonical Q→A crystals once, using any LLM you like. When an AI agent later asks "what's the SOTA in X?", it reads a pre-written paragraph, not 20 paper abstracts. Context read per query: ~1 KB (crystal read) vs ~30 KB (cluster digest), roughly 30× less.

Because the quality was pre-computed, it doesn't degrade at query time. See the harness-engineering example crystals — one folder, 10 Q&As answering "what is this field?", "what are the main threads?", "where do experts disagree?", "what's SOTA?", etc.
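The 30× figure is simply the ratio of bytes read per query. The sketch below reproduces that arithmetic on synthetic strings sized to match the ~30 KB and ~1 KB figures; the function names and the 20 × 1,536-byte split are assumptions for illustration, not measurements from a real vault.

```python
def utf8_size(text: str) -> int:
    """Size of a string in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def context_ratio(digest_texts: list[str], crystal_text: str) -> float:
    """How many times more bytes the digest costs than one crystal."""
    digest_bytes = sum(utf8_size(t) for t in digest_texts)
    return digest_bytes / utf8_size(crystal_text)


# Synthetic stand-ins, not real paper text.
abstracts = ["x" * 1536] * 20   # 20 abstracts, 30,720 bytes in total
crystal = "y" * 1024            # one pre-written answer, 1,024 bytes
ratio = context_ratio(abstracts, crystal)
print(ratio)  # 30.0
```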

→ Why this is not RAG

2. Structured memory layer — entities, claims, methods (v0.36)

Crystals store prose. Memory stores the underlying structure: named entities (benchmarks, models, concepts), typed claims with confidence + supporting papers, and method taxonomies. For the harness cluster:

hub/llm-evaluation-harness/memory.json
├── 14 entities  (vla-eval, SafeHarness, M*, LIBERO, SEC-bench, ...)
├── 12 claims    ("Harness is locus of progress", "Specialized beats generic +22%", ...)
└── 7 methods    (reflective code evolution, lifecycle-integrated defense, ...)

AI agents query entities via list_entities, claims via list_claims(min_confidence="high"), methods via list_methods. No RAG over prose — structured lookup over structured data.
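To make "structured lookup" concrete, here is a minimal sketch of a confidence filter over a memory-like dictionary. The field names (`claims`, `text`, `confidence`, `papers`), the three confidence levels, and the function `filter_claims` are assumptions made for illustration; the actual memory.json schema and the real `list_claims` tool may differ.

```python
# Assumed ordering of confidence labels, lowest to highest.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def filter_claims(memory: dict, min_confidence: str = "low") -> list[dict]:
    """Return claims at or above the given confidence level (hypothetical helper)."""
    threshold = CONFIDENCE_ORDER[min_confidence]
    return [
        claim
        for claim in memory.get("claims", [])
        if CONFIDENCE_ORDER[claim["confidence"]] >= threshold
    ]


# Placeholder data, not taken from a real cluster.
example_memory = {
    "claims": [
        {"text": "example claim A", "confidence": "high", "papers": ["paper-1"]},
        {"text": "example claim B", "confidence": "medium", "papers": ["paper-2"]},
        {"text": "example claim C", "confidence": "low", "papers": []},
    ]
}
high_only = filter_claims(example_memory, min_confidence="high")
```

The point of the design is that the filter is a plain comparison on a typed field, so the result is deterministic and needs no embedding search or re-ranking.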

3. 4 personas, 1 codebase, dashboard adapts (v0.38)

Same vault, 4 rendered dashboards:

| Persona | Install | Dashboard vocabulary | Hidden tabs |
|---|---|---|---|
| Researcher (PhD STEM, Zotero) | pip install research-hub-pipeline[playwright,secrets] | Cluster / Crystal / Paper / Citation graph | (none) |
| Humanities (Zotero, quote-heavy) | pip install research-hub-pipeline[playwright,secrets] | Theme / Synthesis / Source | (none) |
| Analyst (industry, no Zotero) | pip install research-hub-pipeline[import,secrets] | Topic / AI Brief / Document | Diagnostics, Bind-Zotero |
| Internal KM (lab / company) | pip install research-hub-pipeline[import,secrets] | Project area / AI Brief / Document | Diagnostics, Bind-Zotero |

Side-by-side screenshots: docs/personas.md. Your first 10 minutes guide →

4. Live dashboard with direct execution (v0.27, expanded v0.42/v0.43/v0.44)

research-hub serve --dashboard

Localhost HTTP dashboard at http://127.0.0.1:8765/. Every Manage-tab button directly executes the CLI — no copy-paste.

5 tabs:

  • Overview — treemap + storage map + recent additions
  • Library — cluster cards with paper rows
  • Briefings — NotebookLM brief preview + artifact links
  • Diagnostics — health badges + drift alerts
  • Manage — per-cluster actions (rename / merge / split / NLM upload / NLM ask / polish-markdown / bases emit)

→ Full dashboard walkthrough

5. Cluster integrity + 100% orphan coverage (v0.37 + v0.39)

Papers drift out of their clusters over time; rebind v2 catches it. On the maintainer's vault with 1063 orphaned papers, coverage rose from 33% to 100% using an 8-heuristic chain plus auto-create-from-folder proposals.

research-hub doctor                       # catches 12+ classes of drift
research-hub clusters rebind --emit       # proposes 80%+ assignments
research-hub clusters rebind --apply report.md --auto-create-new
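The general shape of a heuristic chain can be sketched in a few lines: try each heuristic in order, take the first proposal, and report coverage as the fraction of notes that received one. The two heuristics below (`by_folder`, `by_tag`) and the note fields are invented for illustration; they are not the eight heuristics research-hub uses, which are not listed here.

```python
from typing import Callable, Optional

# A heuristic inspects one note and proposes a cluster slug, or None.
Heuristic = Callable[[dict], Optional[str]]


def by_folder(note: dict) -> Optional[str]:
    """Hypothetical heuristic: use the note's folder name as the cluster slug."""
    return note.get("folder") or None


def by_tag(note: dict) -> Optional[str]:
    """Hypothetical heuristic: use the note's first tag as the cluster slug."""
    tags = note.get("tags", [])
    return tags[0] if tags else None


def propose(note: dict, chain: list[Heuristic]) -> Optional[str]:
    """Return the first proposal produced by the chain, in order."""
    for heuristic in chain:
        slug = heuristic(note)
        if slug:
            return slug
    return None


def coverage(notes: list[dict], chain: list[Heuristic]) -> float:
    """Fraction of notes for which some heuristic produced a proposal."""
    if not notes:
        return 0.0
    assigned = sum(1 for note in notes if propose(note, chain) is not None)
    return assigned / len(notes)
```

Ordering matters in a chain like this: earlier heuristics win ties, so the most reliable signal would normally go first and the broadest fallback last.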

→ 6 failure modes × 4 personas mitigation matrix


Install

# Researcher / Humanities (use Zotero + NotebookLM)
# Quotes keep shells such as zsh from expanding the square brackets.
pip install "research-hub-pipeline[playwright,secrets]"

# Analyst / Internal KM (no Zotero, import local files)
pip install "research-hub-pipeline[import,secrets]"

research-hub init              # 4-option interactive persona prompt
research-hub serve --dashboard # opens browser

Requires Python 3.10+. No OpenAI/Anthropic API key is required: research-hub is provider-agnostic. All AI generation uses the emit/apply pattern, in which research-hub emits a prompt, you run it through your own AI, and research-hub applies the saved response.

For Claude Code / Claude Desktop users

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}
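If your claude_desktop_config.json already lists other MCP servers, the entry needs to be merged in rather than pasted over the file. A minimal sketch of that merge follows; `add_research_hub` and `update_config_file` are hypothetical helpers, and the path to the config file is left to the caller because its location varies by platform.

```python
import json
from pathlib import Path


def add_research_hub(config: dict) -> dict:
    """Return a copy of the config with the research-hub server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any servers already configured
    servers["research-hub"] = {"command": "research-hub", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into the file at `path`, creating the file if it is missing."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_research_hub(config), indent=2) + "\n", encoding="utf-8")
```

Claude Desktop typically needs a restart before it picks up a changed config file.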

60 MCP tools cover: paper ingest, cluster CRUD, labels, quotes, draft composition, citation graph, NotebookLM, crystal generation, fit-check, autofill, cluster memory, cluster rebind workflows.

Then talk to Claude:

"Claude, what's in my llm-evaluation-harness cluster?" → read_crystal("what-is-this-field") → 180-word answer
"Claude, which claims have high confidence?" → list_claims(cluster="llm-evaluation-harness", min_confidence="high") → 10 structured claims with paper refs
"Claude, add arxiv 2310.06770 to LLM-SE cluster" → add_paper(...) → Zotero + Obsidian + NotebookLM entries


Status

  • Latest: v0.41.0 (2026-04-19)
  • Tests: 1423 passing, 15 skipped, 3 xfailed (CI: Linux + Windows + macOS × Python 3.10/3.11/3.12)
  • Platforms: Windows, macOS, Linux
  • Python: 3.10+
  • Dependencies: pyzotero, pyyaml, requests, rapidfuzz, networkx, platformdirs (all pure-Python)
  • Optional: playwright extra for NotebookLM browser automation

Architecture docs

Workflow reference

| Stage | Command | What it does |
|---|---|---|
| Init | init / doctor | First-time config + health check (doctor has 12+ checks, --autofix for mechanical backfills) |
| Find | search / verify / discover new | Multi-backend paper search + DOI resolution + AI-scored discovery |
| Ingest | add / ingest / import-folder | One-shot or bulk paper ingest into Zotero + Obsidian |
| Organize | clusters new/list/show/bind/merge/split/rename/delete/rebind/scaffold-missing | Cluster CRUD + 8-heuristic rebind + hub scaffolding |
| Topic | topic scaffold/propose/assign/build | Sub-topic notes from subtopics: frontmatter |
| Label | label / find --label / paper prune / paper lookup-doi | Canonical label vocabulary + Crossref DOI backfill |
| Crystal | crystal emit/apply/list/read/check | Pre-computed canonical Q→A answers |
| Memory | memory emit/apply/list/read | Structured entities/claims/methods registry |
| Analyze | clusters analyze --split-suggestion | Citation-graph community detection for big clusters |
| Sync | sync status / pipeline repair | Detect + repair Zotero ↔ Obsidian drift |
| Dashboard | dashboard / serve --dashboard / vault graph-colors | Static HTML or live HTTP server + auto-refresh Obsidian graph |
| NotebookLM | notebooklm bundle/upload/generate/download | Browser-automated NLM flows (CDP attach) |
| Write | quote / compose-draft / cite | Quote capture, markdown draft assembly, BibTeX export |

For developers

git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q  # 1423 passing

Contributing: see CONTRIBUTING.md. Reporting security issues: see SECURITY.md.

Package name on PyPI: research-hub-pipeline
CLI entry point: research-hub

License

MIT. See LICENSE.
