# research-hub

Zotero + Obsidian + NotebookLM, wired together for AI agents.

CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.

Traditional Chinese README → README.zh-TW.md

## What this is
A CLI + MCP server that does three things at once:
- Ingest academic papers into Zotero (citations) + Obsidian (structured notes) + NotebookLM (briefings) with one command.
- Organize papers into clusters, sub-topics, and an Obsidian graph coloured by research label.
- Serve 52 MCP tools so Claude Code / Codex / any MCP-compatible AI can drive the whole thing.
Built for PhD students and research teams who already use AI agents daily and don't want to context-switch between six tabs.
## Source code vs vault

research-hub lives in two separate locations on your computer. This is intentional:

|  | Source code | Vault |
|---|---|---|
| What | The Python package + CLI | Your research data |
| Where | `site-packages/research_hub/` (managed by pip) | `~/knowledge-base/` (default; you choose during init) |
| Contains | CLI, MCP server, dashboard renderer | Paper notes, Obsidian graph, crystals, Zotero sync |
| Shared? | Yes: same package for every user | No: each user has their own vault |
After `pip install`, run `research-hub init` to create your vault. If you already have an Obsidian vault, point `init` at it; research-hub adds its folders alongside your existing notes without overwriting anything.
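The non-overwriting behaviour is easy to picture. A minimal sketch, assuming a simple "create only what's missing" scaffold; the folder names here are placeholders, not research-hub's actual vault layout:

```python
from pathlib import Path
import tempfile

# Hypothetical vault sub-folders -- the real names are chosen by
# `research-hub init`; these are placeholders for illustration only.
SUBFOLDERS = ["papers", "clusters", "crystals", "dashboard"]

def scaffold(vault: Path) -> list[Path]:
    """Add research-hub folders next to existing notes, never overwriting."""
    created = []
    for name in SUBFOLDERS:
        d = vault / name
        if not d.exists():  # leave anything that already exists untouched
            d.mkdir(parents=True)
            created.append(d)
    return created

with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "papers").mkdir()  # pretend this folder already existed
    print([p.name for p in scaffold(vault)])  # -> ['clusters', 'crystals', 'dashboard']
```

Running `scaffold` twice is a no-op the second time, which is the property the paragraph above describes.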
Run `research-hub where` at any time to see exactly where your config and vault live.
## What makes it different

### 1. Crystals: pre-computed answers, not lazy retrieval (v0.28)
Every RAG system, including Karpathy's "LLM wiki", still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.
For each research cluster, you generate ~10 canonical Q→A "crystals" once (via emit/apply, using any LLM you like). When an AI agent asks "what's the SOTA in X?", it reads a pre-written 100-word paragraph — not 20 paper abstracts.
```shell
research-hub crystal emit --cluster llm-agents-software-engineering > prompt.md
# Feed prompt.md to Claude/GPT/Gemini, save the answer as crystals.json
research-hub crystal apply --cluster llm-agents-software-engineering --scored crystals.json
```
Token cost per cluster-level query: ~1 KB (crystal read) vs ~30 KB (cluster digest). 30× compression without losing quality, because the quality was pre-computed.
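The crystal file schema isn't documented in this README. As a purely hypothetical sketch of the idea, plus the arithmetic behind the ~30× figure:

```python
import json

# Hypothetical crystals.json shape -- the real schema may differ.
crystals = [
    {
        "question": "What's the SOTA in LLM agents for software engineering?",
        "answer": "A pre-written ~100-word canonical paragraph goes here...",
        "sources": ["2310.06770"],
    },
]
blob = json.dumps(crystals, indent=2)

# Back-of-envelope token economics from the numbers above:
crystal_read_kb = 1     # one pre-written answer
cluster_digest_kb = 30  # ~20 paper abstracts assembled at query time
print(f"compression: {cluster_digest_kb // crystal_read_kb}x")  # prints "compression: 30x"
```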
### 2. Live dashboard with direct execution (v0.27)

```shell
research-hub serve --dashboard
```

This opens a local HTTP dashboard at http://127.0.0.1:8765/. Every Manage-tab button executes the corresponding CLI command directly instead of copying it to the clipboard. Vault changes are pushed to the browser via Server-Sent Events, with a fallback to static clipboard mode when the server isn't running.
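Server-Sent Events is a plain `text/event-stream` protocol, so the dashboard's push channel can be read by any HTTP client. Here is a minimal parser sketch; the `vault-changed` event name and JSON payload are illustrative assumptions, not research-hub's documented wire format:

```python
def parse_sse(stream_text: str) -> list[tuple[str, str]]:
    """Split a text/event-stream body into (event, data) pairs."""
    events = []
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one event
            if data_lines:
                events.append((event, "\n".join(data_lines)))
            event, data_lines = "message", []
    return events

sample = 'event: vault-changed\ndata: {"path": "papers/2310.06770.md"}\n\n'
print(parse_sse(sample))  # -> [('vault-changed', '{"path": "papers/2310.06770.md"}')]
```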
### 3. Obsidian graph auto-coloured by label (v0.27)

```shell
research-hub vault graph-colors --refresh
```

This writes 14 colour groups to `.obsidian/graph.json`: 5 for cluster paths + 9 for paper labels (seed, core, method, benchmark, survey, application, tangential, deprecated, archived). Every `research-hub dashboard` run auto-refreshes them. Open Obsidian's Graph View and your vault is visually structured by meaning, not just by file tree.
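Obsidian keeps graph colours in `.obsidian/graph.json` as an array of query/colour pairs. A sketch of what a label-based refresh could generate; the `["label":...]` query syntax, the `colorGroups` key, and the colour values are assumptions for illustration, not research-hub's actual output:

```python
import json

# The nine paper labels named above.
LABELS = ["seed", "core", "method", "benchmark", "survey",
          "application", "tangential", "deprecated", "archived"]

# Pair each label with a search query and a 24-bit colour.
# Query syntax and rgb ints here are illustrative assumptions.
color_groups = [
    {"query": f'["label":{label}]', "color": {"a": 1, "rgb": 0x100000 * i}}
    for i, label in enumerate(LABELS, start=1)
]
print(len(color_groups))  # prints 9
print(json.dumps(color_groups[0]))
```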
### 4. Sub-topic-aware Library + citation-graph cluster split (v0.27)

Big clusters (say, 331 papers) no longer render as a flat list; they're grouped by sub-topic, each expandable. And if your cluster has no sub-topics yet:

```shell
research-hub clusters analyze --cluster my-big-cluster --split-suggestion
```

This uses the Semantic Scholar citation graph plus networkx community detection to suggest 3-8 coherent sub-topics, and writes a markdown report for you to review before running `topic apply-assignments`.
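networkx ships community detection out of the box. A toy sketch of the kind of split this step performs; the toy graph and the specific algorithm choice (`greedy_modularity_communities`) are assumptions, since the real pipeline may use a different method:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy citation graph: papers are nodes, citation links are edges.
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("B", "C"), ("A", "C"),  # one tight group
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),  # another tight group
    ("C", "X"),                          # a single weak bridge
])

# Modularity maximization splits the graph at the weak bridge,
# yielding the two triangles as separate communities.
communities = greedy_modularity_communities(G)
print(sorted(sorted(c) for c in communities))
```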
## Install

```shell
pip install research-hub-pipeline
research-hub init               # interactive config + vault layout
research-hub serve --dashboard  # opens browser
```

Python 3.10+. No OpenAI or Anthropic API key is required: research-hub is provider-agnostic. All AI generation uses the emit/apply pattern, so you feed prompts to whatever AI you already use.
## For Claude Code / Claude Desktop users

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}
```

Then talk to Claude:

- "Claude, add arxiv 2310.06770 to a new cluster called LLM-SE"
- "Claude, generate crystals for the LLM-SE cluster"
- "Claude, what's this cluster about?" → Claude calls `list_crystals` + `read_crystal` and gets the pre-written 100-word answer
52 MCP tools cover: paper ingest, cluster CRUD, labels, quotes, draft composition, citation graph, NotebookLM, crystal generation, fit-check, autofill.
## Quickstart (5 commands)

```shell
# 1. Initialize vault
research-hub init

# 2. Ingest one paper
research-hub add 10.48550/arxiv.2310.06770 --cluster llm-agents

# 3. Open the live dashboard
research-hub serve --dashboard

# 4. Generate crystals once you have a few papers
research-hub crystal emit --cluster llm-agents > prompt.md
# (feed prompt.md to your AI, save the response as crystals.json)
research-hub crystal apply --cluster llm-agents --scored crystals.json

# 5. Ask your AI questions: it reads crystals, not papers
# (via Claude Desktop MCP, or any MCP-compatible client)
```
## Status

- Latest: v0.28.0 (2026-04-15)
- Tests: 1113 passing, 12 skipped, 5 xfail baselines (documented search-quality issues)
- Platforms: Windows, macOS, Linux
- Python: 3.10+
- Dependencies: `pyzotero`, `pyyaml`, `requests`, `rapidfuzz`, `networkx`, `platformdirs` (all pure Python)
- Optional: `playwright` extra for NotebookLM browser automation
## Architecture docs

- Anti-RAG crystals: why pre-computed Q→A beats retrieval
- Audit reports: `audit_v0.26.md`, `audit_v0.27.md`, `audit_v0.28.md`
- NotebookLM setup: CDP attach flow + troubleshooting
- Papers input schema: ingestion pipeline reference
## Workflow reference

| Stage | Command | What it does |
|---|---|---|
| Init | `init` / `doctor` | First-time config + health check |
| Find | `search` / `verify` / `discover new` | Multi-backend paper search + DOI resolution + AI-scored discovery |
| Ingest | `add` / `ingest` | One-shot or bulk paper ingest into Zotero + Obsidian |
| Organize | `clusters new/list/show/bind/merge/split/rename/delete` | Cluster CRUD |
| Topic | `topic scaffold/propose/assign/build` | Sub-topic notes from `subtopics:` frontmatter |
| Label | `label` / `find --label` / `paper prune` | Canonical label vocabulary (seed/core/method/...) |
| Crystal | `crystal emit/apply/list/read/check` | Pre-computed canonical Q→A answers |
| Analyze | `clusters analyze --split-suggestion` | Citation-graph community detection for big clusters |
| Sync | `sync status` / `pipeline repair` | Detect + repair Zotero ↔ Obsidian drift |
| Dashboard | `dashboard` / `serve --dashboard` / `vault graph-colors` | Static HTML or live HTTP server + auto-refreshed Obsidian graph |
| NotebookLM | `notebooklm bundle/upload/generate/download` | Browser-automated NLM flows (CDP attach) |
| Write | `quote` / `compose-draft` / `cite` | Quote capture, markdown draft assembly, BibTeX export |
## Two personas

| Persona | Install | Zotero? | Best for |
|---|---|---|---|
| Researcher (default) | `pip install research-hub-pipeline[playwright]` | Yes | PhD students, academic literature review |
| Analyst | `research-hub init --persona analyst` | No (Obsidian only) | Industry research, white papers, technical docs |
Both personas get the same dashboard, MCP server, and crystal system.
## For developers

```shell
git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q   # 1113 passing
```

Package name on PyPI: `research-hub-pipeline`. CLI entry point: `research-hub`.
## License

MIT. See LICENSE.
## File details

Details for the file `research_hub_pipeline-0.29.0.tar.gz`.

### File metadata

- Download URL: research_hub_pipeline-0.29.0.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2733908ca3cfb0e29cef7e7b6fb77efd065e11c114fe556eb9185a4dc282a1b5` |
| MD5 | `92e05efdcccd351978b7c65bc52f7b6a` |
| BLAKE2b-256 | `f9cf7de55e7107c9eef9c4fac8993443b8957ce6d31155c5890fc4ba7d76655a` |
## File details

Details for the file `research_hub_pipeline-0.29.0-py3-none-any.whl`.

### File metadata

- Download URL: research_hub_pipeline-0.29.0-py3-none-any.whl
- Upload date:
- Size: 284.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ee1c03c911fe1b4d9ae668e54a8c249ef05d7bc64f213746266b67625ce06b91` |
| MD5 | `6f9f2a02dcdeabc42e90517553c0dcb5` |
| BLAKE2b-256 | `3e16a5cc8233fc067dbc64c77bf7f9b722ea4e6020db793a5dc7da0c30dafcf7` |