CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.
research-hub
Build your research cluster once. Ask AI about it thousands of times. Zotero + Obsidian + NotebookLM, wired together for AI agents.
Traditional Chinese documentation → README.zh-TW.md
The 10-minute story
Say you want to get up to speed on "harness engineering for LLM agents" — a subfield that barely existed 6 months ago. Traditional workflow: search arXiv, skim abstracts, manually note key claims, fight Obsidian, occasionally wish you had a RAG. 2 hours.
research-hub workflow:
research-hub clusters new --query "LLM evaluation harness" --slug llm-evaluation-harness
research-hub search "language model evaluation harness" --to-papers-input \
--cluster llm-evaluation-harness > papers.json
research-hub ingest --cluster llm-evaluation-harness --no-verify
3 minutes later your vault has 6 key papers with structured notes. Now push them to NotebookLM and pull back an auto-generated briefing:
research-hub notebooklm bundle --cluster llm-evaluation-harness
research-hub notebooklm upload --cluster llm-evaluation-harness # Chrome CDP-attach, no API key
research-hub notebooklm generate --cluster llm-evaluation-harness --type brief
research-hub notebooklm download --cluster llm-evaluation-harness --type brief
Or click the same actions in research-hub serve --dashboard -> Manage on that cluster card. In live mode the buttons run the identical CLI flow, so there is no extra terminal step.
Your vault now has .research_hub/artifacts/llm-evaluation-harness/brief-*.txt — a ~300-character synthesis covering all 6 papers, generated by NotebookLM from the uploaded sources (no prompt engineering needed, no headless-browser hacks — it attaches to your existing Chrome session).
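If you would rather script the four NotebookLM steps than type them, a minimal sketch follows. It only assembles the commands shown above and runs them in order; the helper names (`notebooklm_commands`, `run_notebooklm_flow`) and the `dry_run` switch are illustrative assumptions, not part of research-hub.

```python
import shlex
import subprocess

# The four NotebookLM steps, in the order used in the walkthrough.
NOTEBOOKLM_STEPS = ("bundle", "upload", "generate", "download")


def notebooklm_commands(cluster: str, artifact_type: str = "brief") -> list[list[str]]:
    """Build the argv list for each NotebookLM step of one cluster."""
    commands = []
    for step in NOTEBOOKLM_STEPS:
        argv = ["research-hub", "notebooklm", step, "--cluster", cluster]
        # In the walkthrough, only generate and download take --type.
        if step in ("generate", "download"):
            argv += ["--type", artifact_type]
        commands.append(argv)
    return commands


def run_notebooklm_flow(cluster: str, dry_run: bool = True) -> None:
    """Print each command; run it unless dry_run is set. Stops at the first failure."""
    for argv in notebooklm_commands(cluster):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_notebooklm_flow("llm-evaluation-harness")
```

With `dry_run=True` (the default here) the script only prints the commands, which is a cheap way to check them before letting it drive your Chrome session.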
2 more minutes: generate the AI summary layer (crystals) and the structured entity/claim registry (memory):
research-hub crystal emit --cluster llm-evaluation-harness > prompt.md
# (paste prompt to Claude/GPT, save response as crystals.json)
research-hub crystal apply --cluster llm-evaluation-harness --scored crystals.json
research-hub memory emit --cluster llm-evaluation-harness > mem-prompt.md
research-hub memory apply --cluster llm-evaluation-harness --scored memory.json
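The manual step in the middle (saving the model's reply as JSON) is the easiest place for this flow to break, since chat models often wrap JSON in a Markdown code fence. The sketch below checks that a saved reply parses before you hand it to `apply`. It assumes nothing about the crystal or memory schema, only that the file must be valid JSON; both helper names are hypothetical.

```python
import json
from pathlib import Path

# Markdown code-fence marker, built from single backticks so this example stays fence-safe.
FENCE = "`" * 3


def load_scored_response(path: Path):
    """Parse a saved LLM reply as JSON, tolerating a surrounding Markdown fence."""
    lines = path.read_text(encoding="utf-8").strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]  # drop the opening fence line
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]  # drop the closing fence line
    # Raises json.JSONDecodeError if the remaining text is not valid JSON.
    return json.loads("\n".join(lines))


def apply_command(kind: str, cluster: str, scored: Path) -> list[str]:
    """Build the apply command for either layer ("crystal" or "memory")."""
    if kind not in ("crystal", "memory"):
        raise ValueError(f"unknown layer: {kind}")
    return ["research-hub", kind, "apply", "--cluster", cluster, "--scored", str(scored)]
```

Calling `load_scored_response(Path("crystals.json"))` first means a malformed reply fails in your own script, before `apply` is ever run.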
Now open Claude Desktop and ask:
You: "What's the current SOTA in LLM evaluation harness?"
Claude (via MCP): calls read_crystal("llm-evaluation-harness", "sota-and-open-problems") → gets a pre-written 180-word answer with paper citations. ~1 KB read, 0 abstracts fetched at query time.
That pre-written answer is a crystal. You paid the reasoning cost once; every subsequent question is ~1 KB of cached analysis. See hub/llm-evaluation-harness/crystals/ in your vault for the 10 canonical Q&As generated above.
What makes it different
1. Crystals — pre-computed answers, not lazy retrieval (v0.28)
Every RAG system, including Karpathy's "LLM wiki", still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.
For each cluster you generate ~10 canonical Q→A crystals once, using any LLM you like. When an AI agent later asks "what's the SOTA in X?", it reads a pre-written paragraph — not 20 paper abstracts. Token cost per query: ~1 KB (crystal read) vs ~30 KB (cluster digest). 30× compression.
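The arithmetic behind the 30× figure, as a small sketch. The two sizes are the approximate numbers quoted above; the one-time cost of generating the crystals is not stated here, so it is left out.

```python
CRYSTAL_READ_KB = 1     # approximate size of one crystal read (figure quoted above)
CLUSTER_DIGEST_KB = 30  # approximate size of one cluster digest (figure quoted above)


def compression_ratio() -> float:
    """How many times smaller a crystal read is than a digest read."""
    return CLUSTER_DIGEST_KB / CRYSTAL_READ_KB


def context_saved_kb(queries: int) -> int:
    """Context avoided over a number of queries, ignoring the one-time generation cost."""
    return queries * (CLUSTER_DIGEST_KB - CRYSTAL_READ_KB)


print(compression_ratio())     # 30.0
print(context_saved_kb(1000))  # 29000
```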
Because the quality was pre-computed, it doesn't degrade at query time. See the harness-engineering example crystals — one folder, 10 Q&As answering "what is this field?", "what are the main threads?", "where do experts disagree?", "what's SOTA?", etc.
2. Structured memory layer — entities, claims, methods (v0.36)
Crystals store prose. Memory stores the underlying structure: named entities (benchmarks, models, concepts), typed claims with confidence + supporting papers, and method taxonomies. For the harness cluster:
hub/llm-evaluation-harness/memory.json
├── 14 entities (vla-eval, SafeHarness, M*, LIBERO, SEC-bench, ...)
├── 12 claims ("Harness is locus of progress", "Specialized beats generic +22%", ...)
└── 7 methods (reflective code evolution, lifecycle-integrated defense, ...)
AI agents query entities via list_entities, claims via list_claims(min_confidence="high"), methods via list_methods. No RAG over prose — structured lookup over structured data.
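To make "structured lookup over structured data" concrete, here is a sketch of a confidence filter over a memory-like document. The field names (`claims`, `confidence`, `papers`), the three confidence levels, and the sample values are assumptions made for illustration; the real memory.json schema and the real `list_claims` tool may differ.

```python
# Assumed ordering of confidence levels, for illustration only.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}

# Hypothetical memory document. Claim texts are taken from the tree above;
# the confidence values and paper ids are invented placeholders.
SAMPLE_MEMORY = {
    "entities": [{"name": "LIBERO"}, {"name": "SEC-bench"}],
    "claims": [
        {"text": "Harness is locus of progress", "confidence": "high", "papers": ["paper-a"]},
        {"text": "Specialized beats generic +22%", "confidence": "medium", "papers": ["paper-b"]},
    ],
    "methods": [{"name": "reflective code evolution"}],
}


def filter_claims(memory: dict, min_confidence: str = "low") -> list[dict]:
    """Return claims whose confidence is at or above min_confidence."""
    floor = CONFIDENCE_RANK[min_confidence]
    return [
        claim
        for claim in memory.get("claims", [])
        if CONFIDENCE_RANK[claim["confidence"]] >= floor
    ]


for claim in filter_claims(SAMPLE_MEMORY, min_confidence="high"):
    print(claim["text"], claim["papers"])
```

The lookup is a plain comparison on a typed field, with no embedding or text search involved, which is the contrast with RAG that this section draws.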
3. 4 personas, 1 codebase, dashboard adapts (v0.38)
Same vault, 4 rendered dashboards:
| Persona | Install | Dashboard vocabulary | Hidden tabs |
|---|---|---|---|
| Researcher (PhD STEM, Zotero) | pip install research-hub-pipeline[playwright,secrets] | Cluster / Crystal / Paper / Citation graph | (none) |
| Humanities (Zotero, quote-heavy) | pip install research-hub-pipeline[playwright,secrets] | Theme / Synthesis / Source | (none) |
| Analyst (industry, no Zotero) | pip install research-hub-pipeline[import,secrets] | Topic / AI Brief / Document | Diagnostics, Bind-Zotero |
| Internal KM (lab / company) | pip install research-hub-pipeline[import,secrets] | Project area / AI Brief / Document | Diagnostics, Bind-Zotero |
Side-by-side screenshots: docs/personas.md. Your first 10 minutes guide →
4. Live dashboard with direct execution (v0.27, expanded v0.42/v0.43/v0.44)
research-hub serve --dashboard
Localhost HTTP dashboard at http://127.0.0.1:8765/. Every Manage-tab button directly executes the CLI — no copy-paste.
5 tabs:
- Overview — treemap + storage map + recent additions
- Library — cluster cards with paper rows
- Briefings — NotebookLM brief preview + artifact links
- Diagnostics — health badges + drift alerts
- Manage — per-cluster actions (rename / merge / split / NLM upload / NLM ask / polish-markdown / bases emit)
5. Cluster integrity + 100% orphan coverage (v0.37 + v0.39)
Papers drift; rebind v2 catches it. On the maintainer's 1063-orphan vault, coverage rose from 33% to 100% via an 8-heuristic chain plus auto-create-from-folder proposals.
research-hub doctor # catches 12+ classes of drift
research-hub clusters rebind --emit # proposes 80%+ assignments
research-hub clusters rebind --apply report.md --auto-create-new
→ 6 failure modes × 4 personas mitigation matrix
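As a rough picture of what a heuristic chain and a coverage number mean, the sketch below tries heuristics in order and takes the first proposal. The eight real heuristics are not listed in this README, so the two shown here (`by_folder`, `by_title_keyword`) and the paper fields they read are invented stand-ins, not research-hub's implementation.

```python
from typing import Callable, Optional

# A heuristic looks at one paper and the known cluster slugs, and may propose a slug.
Heuristic = Callable[[dict, list[str]], Optional[str]]


def by_folder(paper: dict, clusters: list[str]) -> Optional[str]:
    """Hypothetical heuristic: the paper's folder name is itself a cluster slug."""
    folder = paper.get("folder", "")
    return folder if folder in clusters else None


def by_title_keyword(paper: dict, clusters: list[str]) -> Optional[str]:
    """Hypothetical heuristic: the slug, with hyphens as spaces, appears in the title."""
    title = paper.get("title", "").lower()
    for slug in clusters:
        if slug.replace("-", " ") in title:
            return slug
    return None


def propose(paper: dict, clusters: list[str], chain: list[Heuristic]) -> Optional[str]:
    """Return the first proposal in the chain, or None if every heuristic passes."""
    for heuristic in chain:
        slug = heuristic(paper, clusters)
        if slug is not None:
            return slug
    return None


def coverage(papers: list[dict], clusters: list[str], chain: list[Heuristic]) -> float:
    """Fraction of papers that receive a proposal from the chain."""
    if not papers:
        return 0.0
    assigned = sum(1 for p in papers if propose(p, clusters, chain) is not None)
    return assigned / len(papers)
```

Ordering the chain from most to least reliable means a weak signal is only consulted when every stronger one has declined.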
Install
# Researcher / Humanities (use Zotero + NotebookLM)
pip install research-hub-pipeline[playwright,secrets]
# Analyst / Internal KM (no Zotero, import local files)
pip install research-hub-pipeline[import,secrets]
research-hub init # 4-option interactive persona prompt
research-hub serve --dashboard # opens browser
Python 3.10+. No OpenAI/Anthropic API key is required: research-hub is provider-agnostic (all AI generation uses an emit/apply pattern, so you feed the prompts to your own AI).
For Claude Code / Claude Desktop users
Add to claude_desktop_config.json:
{
"mcpServers": {
"research-hub": {
"command": "research-hub",
"args": ["serve"]
}
}
}
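If your claude_desktop_config.json already lists other MCP servers, pasting the block above over it would drop them. A minimal sketch of merging the entry instead follows. The function name is hypothetical, and the config file's location varies by platform, so the path is left as a parameter.

```python
import json
from pathlib import Path

# The server entry from the JSON snippet above.
RESEARCH_HUB_SERVER = {"command": "research-hub", "args": ["serve"]}


def add_research_hub_server(config_path: Path) -> dict:
    """Add the research-hub entry to an MCP config file, keeping any other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Create the mcpServers section if it is missing, then set only our own key.
    servers = config.setdefault("mcpServers", {})
    servers["research-hub"] = dict(RESEARCH_HUB_SERVER)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Running it twice is harmless: the second call rewrites the same entry and leaves every other key untouched.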
60 MCP tools cover: paper ingest, cluster CRUD, labels, quotes, draft composition, citation graph, NotebookLM, crystal generation, fit-check, autofill, cluster memory, cluster rebind workflows.
Then talk to Claude:
- "Claude, what's in my llm-evaluation-harness cluster?" → read_crystal("what-is-this-field") → 180-word answer
- "Claude, which claims have high confidence?" → list_claims(cluster="llm-evaluation-harness", min_confidence="high") → 10 structured claims with paper refs
- "Claude, add arxiv 2310.06770 to LLM-SE cluster" → add_paper(...) → Zotero + Obsidian + NotebookLM entries
Status
- Latest: v0.41.0 (2026-04-19)
- Tests: 1423 passing, 15 skipped, 3 xfailed (CI: Linux + Windows + macOS × Python 3.10/3.11/3.12)
- Platforms: Windows, macOS, Linux
- Python: 3.10+
- Dependencies: pyzotero, pyyaml, requests, rapidfuzz, networkx, platformdirs (all pure-Python)
- Optional: playwright extra for NotebookLM browser automation
Architecture docs
- Your first 10 minutes — guided tour for each of the 4 personas
- User personas — 4 persona profiles with per-persona feature matrix
- Cluster integrity — 6 failure modes + mitigation matrix across all 4 personas
- MCP tools reference — all 60 tools categorized + signatures
- Example Claude Desktop flow — worked example: ingest → crystallize → query
- Import folder — local file ingest for analyst persona (PDF/DOCX/MD/TXT/URL)
- Anti-RAG crystals — why pre-computed Q→A beats retrieval
- Upgrade guide — migrating from older versions
- Task-level workflows — v0.33+ 5 MCP wrappers (ask/brief/sync/compose/collect)
- Screenshot workflow — re-render any dashboard tab
- Audit reports — audit_v0.26.md … audit_v0.41.md
- NotebookLM setup — CDP attach flow + troubleshooting
- Papers input schema — ingestion pipeline reference
Workflow reference
| Stage | Command | What it does |
|---|---|---|
| Init | init / doctor | First-time config + health check (doctor has 12+ checks, --autofix for mechanical backfills) |
| Find | search / verify / discover new | Multi-backend paper search + DOI resolution + AI-scored discovery |
| Ingest | add / ingest / import-folder | One-shot or bulk paper ingest into Zotero + Obsidian |
| Organize | clusters new/list/show/bind/merge/split/rename/delete/rebind/scaffold-missing | Cluster CRUD + 8-heuristic rebind + hub scaffolding |
| Topic | topic scaffold/propose/assign/build | Sub-topic notes from subtopics: frontmatter |
| Label | label / find --label / paper prune / paper lookup-doi | Canonical label vocabulary + Crossref DOI backfill |
| Crystal | crystal emit/apply/list/read/check | Pre-computed canonical Q→A answers |
| Memory | memory emit/apply/list/read | Structured entities/claims/methods registry |
| Analyze | clusters analyze --split-suggestion | Citation-graph community detection for big clusters |
| Sync | sync status / pipeline repair | Detect + repair Zotero ↔ Obsidian drift |
| Dashboard | dashboard / serve --dashboard / vault graph-colors | Static HTML or live HTTP server + auto-refresh Obsidian graph |
| NotebookLM | notebooklm bundle/upload/generate/download | Browser-automated NLM flows (CDP attach) |
| Write | quote / compose-draft / cite | Quote capture, markdown draft assembly, BibTeX export |
For developers
git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q # 1423 passing
Contributing: see CONTRIBUTING.md. Reporting security issues: see SECURITY.md.
Package name on PyPI: research-hub-pipeline
CLI entry point: research-hub
License
MIT. See LICENSE.
File details
Details for the file research_hub_pipeline-0.45.0.tar.gz.
File metadata
- Download URL: research_hub_pipeline-0.45.0.tar.gz
- Upload date:
- Size: 18.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b1cdc086c2872a4ca220f31ab9e4da410a5ffafc7b5c4f30d85edcb9de70e74 |
| MD5 | 91d37b0dea9cdc673e5cf1585232f124 |
| BLAKE2b-256 | e9adb8fb3f9e2afb15cdb1590e8b7dc3586c5f1cee076fb97fdc06316631b4ed |
File details
Details for the file research_hub_pipeline-0.45.0-py3-none-any.whl.
File metadata
- Download URL: research_hub_pipeline-0.45.0-py3-none-any.whl
- Upload date:
- Size: 350.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c58d40e002dc5ca2b199e61273d01a68fe99709af2aba4365ac49cdf23eeed11 |
| MD5 | eb395f552af642bb8ecfad1742caa96f |
| BLAKE2b-256 | 608c5d8261ef44c9e8a4f60f7dd4a324cc195247b47489018ebdfcfd8c2eea32 |