
CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.


research-hub

Zotero + Obsidian + NotebookLM, wired together for AI agents.


Traditional Chinese README → README.zh-TW.md

Dashboard Overview


What this is

A CLI + MCP server that does three things at once:

  1. Ingest academic papers into Zotero (citations) + Obsidian (structured notes) + NotebookLM (briefings) — one command.
  2. Organize papers into clusters, sub-topics, and an Obsidian graph coloured by research label.
  3. Serve 56 MCP tools so Claude Code / Codex / any MCP-compatible AI can drive the whole thing.

Built for PhD students and research teams who already use AI agents daily and don't want to context-switch between six tabs.

Source code vs vault

research-hub has two separate locations on your computer. This is intentional:

|          | Source code                                  | Vault                                               |
| -------- | -------------------------------------------- | --------------------------------------------------- |
| What     | The Python package + CLI                     | Your research data                                  |
| Where    | site-packages/research_hub/ (managed by pip) | ~/knowledge-base/ (default, you choose during init) |
| Contains | CLI, MCP server, dashboard renderer          | Paper notes, Obsidian graph, crystals, Zotero sync  |
| Shared?  | Yes — same package for every user            | No — each user has their own vault                  |

After pip install, run research-hub init to create your vault. If you already have an Obsidian vault, point init at it — research-hub adds its folders alongside your existing notes without overwriting anything.

Run research-hub where at any time to see exactly where your config and vault live.

What makes it different

1. Crystals — pre-computed answers, not lazy retrieval (v0.28)

Every RAG system, including Karpathy's "LLM wiki", still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.

For each research cluster, you generate ~10 canonical Q→A "crystals" once (via emit/apply, using any LLM you like). When an AI agent asks "what's the SOTA in X?", it reads a pre-written 100-word paragraph — not 20 paper abstracts.

research-hub crystal emit --cluster llm-agents-software-engineering > prompt.md
# Feed prompt.md to Claude/GPT/Gemini, save answer as crystals.json
research-hub crystal apply --cluster llm-agents-software-engineering --scored crystals.json

Token cost per cluster-level query: ~1 KB (crystal read) vs ~30 KB (cluster digest). 30× compression without losing quality, because the quality was pre-computed.
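A crystal is just a stored Q→A pair, so the on-disk cost is easy to see. A sketch of what one scored entry might look like (field names and answer text are hypothetical; the real schema comes from the emit prompt):

```python
import json

# Hypothetical shape of one pre-computed crystal; field names and the
# answer text are illustrative, not research-hub's actual schema.
crystal = {
    "question": "What is the SOTA in LLM agents for software engineering?",
    "answer": "Repository-aware agents that iterate patch generation against "
              "SWE-bench-style harnesses currently lead this cluster.",
    "sources": ["arxiv:2310.06770"],
    "score": 0.92,
}

payload = json.dumps([crystal])
# One crystal is a few hundred bytes -- well under the ~1 KB read quoted above
print(len(payload.encode()), "bytes")
```

Reading one such entry at query time is what keeps the per-question token bill flat, regardless of how many papers the cluster holds.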

→ Why this is not RAG

2. Live dashboard with direct execution (v0.27)

research-hub serve --dashboard

Opens a localhost HTTP dashboard at http://127.0.0.1:8765/. Every Manage-tab button executes its CLI command directly instead of copying it to the clipboard, and vault changes are pushed to the browser via Server-Sent Events. When the server isn't running, the dashboard falls back to static clipboard mode.
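Server-Sent Events are plain `text/event-stream` lines: `event:` and `data:` fields terminated by a blank line. A minimal client-side parser sketch (the `vault-change` event name and payload here are assumptions, not the documented protocol):

```python
def parse_sse(stream_lines):
    """Parse text/event-stream lines into (event, data) pairs.

    SSE frames are newline-separated field lines; a blank line ends a frame.
    """
    event, data = "message", []
    for line in stream_lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # blank line dispatches the accumulated frame
            if data:
                yield event, "\n".join(data)
            event, data = "message", []

# Example frame of the kind the dashboard might emit on a vault change
frames = ["event: vault-change", 'data: {"path": "papers/2310.06770.md"}', ""]
print(list(parse_sse(frames)))
# → [('vault-change', '{"path": "papers/2310.06770.md"}')]
```

Any EventSource-capable browser handles this natively, which is why the live dashboard needs no websocket dependency.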

Live dashboard

3. Obsidian graph auto-coloured by label (v0.27)

research-hub vault graph-colors --refresh

Writes 14 colour groups to .obsidian/graph.json: 5 for cluster paths + 9 for paper labels (seed, core, method, benchmark, survey, application, tangential, deprecated, archived). Every research-hub dashboard run auto-refreshes them. Open Obsidian's Graph View: your vault is visually structured by meaning, not just by file tree.
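For reference, Obsidian stores graph colour groups as objects with a search `query` and an `{"a": ..., "rgb": ...}` colour. A sketch of generating the nine label groups (the query syntax and hex colours here are assumptions, not the exact values research-hub writes):

```python
import json

# Nine per-label colour groups; colours are illustrative choices.
LABEL_COLORS = {
    "seed": 0x2ECC71, "core": 0x3498DB, "method": 0x9B59B6,
    "benchmark": 0xE67E22, "survey": 0xF1C40F, "application": 0x1ABC9C,
    "tangential": 0x95A5A6, "deprecated": 0xE74C3C, "archived": 0x7F8C8D,
}

def color_groups(labels):
    # Obsidian's graph.json stores each group as a search query plus an
    # {"a": alpha, "rgb": integer} colour value
    return [
        {"query": f'["label":{name}]', "color": {"a": 1, "rgb": rgb}}
        for name, rgb in labels.items()
    ]

graph_config = {"colorGroups": color_groups(LABEL_COLORS)}
print(json.dumps(graph_config["colorGroups"][0]))
```

The first query listed wins when a note matches several groups, which is why label groups and cluster-path groups can coexist in one file.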

Obsidian Graph coloured by label

4. Sub-topic-aware Library + citation-graph cluster split (v0.27)

Big clusters (say, 331 papers) no longer render as a flat list. They're grouped by sub-topic, each expandable. If your cluster has no sub-topics yet:

research-hub clusters analyze --cluster my-big-cluster --split-suggestion

Uses the Semantic Scholar citation graph plus networkx community detection to suggest 3–8 coherent sub-topics, writing a markdown report for you to review before running topic apply-assignments.
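To see the idea without pulling in networkx, here is a toy union-find grouping of a citation graph. research-hub itself relies on networkx's community-detection algorithms, which can split one dense component into several sub-topics; connected components are just the simplest stand-in for the intuition:

```python
from collections import defaultdict

def communities(edges):
    """Group papers by citation connectivity using union-find.

    A crude proxy for community detection: papers that cite each other
    (directly or transitively) end up in the same group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in edges:
        union(a, b)
    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)

# Toy citation graph: two clumps suggest two sub-topics
cites = [("A", "B"), ("B", "C"), ("X", "Y")]
print(communities(cites))  # two groups fall out: {A, B, C} and {X, Y}
```

The real split-suggestion report then names each group and leaves the final assignment to you via topic apply-assignments.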

Library tab with sub-topics


Install

pip install research-hub-pipeline
research-hub init              # interactive config + vault layout
research-hub serve --dashboard # opens browser

Python 3.10+. No OpenAI/Anthropic API key required — research-hub is provider-agnostic: all AI generation uses the emit/apply pattern, so you feed prompts to your own AI.

For Claude Code / Claude Desktop users

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}

Then talk to Claude:

  • "Claude, add arxiv 2310.06770 to a new cluster called LLM-SE"
  • "Claude, generate crystals for the LLM-SE cluster"
  • "Claude, what's this cluster about?" → Claude calls list_crystals + read_crystal and gets the pre-written 100-word answer

56 MCP tools cover: paper ingest, cluster CRUD, labels, quotes, draft composition, citation graph, NotebookLM, crystal generation, fit-check, autofill, and cluster memory.

Quickstart (5 commands)

# 1. Initialize vault
research-hub init

# 2. Ingest one paper
research-hub add 10.48550/arxiv.2310.06770 --cluster llm-agents

# 3. Open live dashboard
research-hub serve --dashboard

# 4. Generate crystals once you have a few papers
research-hub crystal emit --cluster llm-agents > prompt.md
# (feed prompt.md to your AI, save response as crystals.json)
research-hub crystal apply --cluster llm-agents --scored crystals.json

# 5. Ask your AI questions — it reads crystals, not papers
# (via Claude Desktop MCP, or any MCP-compatible client)
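Since the scored JSON in step 4 comes back from a chat model, a quick sanity check before crystal apply can save a round trip. A sketch assuming each entry carries question and answer fields (hypothetical names; match them to the prompt that crystal emit produces):

```python
import json

def check_crystals(text, required=("question", "answer")):
    """Return a list of problems found in an AI-produced crystals file.

    Field names are assumptions for illustration; adapt them to the
    schema requested by the emit prompt.
    """
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top level should be a list of crystals"]
    problems = []
    for i, entry in enumerate(entries):
        for key in required:
            if not entry.get(key):
                problems.append(f"crystal {i} missing '{key}'")
    return problems

print(check_crystals('[{"question": "What is SOTA?", "answer": ""}]'))
# → ["crystal 0 missing 'answer'"]
```

An empty problem list means the file is at least structurally ready for crystal apply.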

Status

  • Latest: v0.38.0 (2026-04-18)
  • Tests: 1369 passing, 15 skipped, 3 xfailed
  • Platforms: Windows, macOS, Linux
  • Python: 3.10+
  • Dependencies: pyzotero, pyyaml, requests, rapidfuzz, networkx, platformdirs (all pure-Python)
  • Optional: playwright extra for NotebookLM browser automation

Architecture docs

Workflow reference

| Stage      | Command                                               | What it does                                                       |
| ---------- | ----------------------------------------------------- | ------------------------------------------------------------------ |
| Init       | init / doctor                                         | First-time config + health check                                   |
| Find       | search / verify / discover new                        | Multi-backend paper search + DOI resolution + AI-scored discovery  |
| Ingest     | add / ingest                                          | One-shot or bulk paper ingest into Zotero + Obsidian               |
| Organize   | clusters new/list/show/bind/merge/split/rename/delete | Cluster CRUD                                                       |
| Topic      | topic scaffold/propose/assign/build                   | Sub-topic notes from subtopics: frontmatter                        |
| Label      | label / find --label / paper prune                    | Canonical label vocabulary (seed/core/method/...)                  |
| Crystal    | crystal emit/apply/list/read/check                    | Pre-computed canonical Q→A answers                                 |
| Analyze    | clusters analyze --split-suggestion                   | Citation-graph community detection for big clusters                |
| Sync       | sync status / pipeline repair                         | Detect + repair Zotero ↔ Obsidian drift                            |
| Dashboard  | dashboard / serve --dashboard / vault graph-colors    | Static HTML or live HTTP server + auto-refreshed Obsidian graph    |
| NotebookLM | notebooklm bundle/upload/generate/download            | Browser-automated NLM flows (CDP attach)                           |
| Write      | quote / compose-draft / cite                          | Quote capture, markdown draft assembly, BibTeX export              |

Four personas

| Persona              | Install                                             | Zotero?            | Best for                                       |
| -------------------- | --------------------------------------------------- | ------------------ | ---------------------------------------------- |
| Researcher (default) | pip install research-hub-pipeline[playwright]       | Yes                | PhD STEM literature reviews                    |
| Analyst              | research-hub init --persona analyst                 | No — Obsidian only | Industry research, white papers, technical docs |
| Humanities           | research-hub init + quote-first workflow            | Yes                | Books, talks, archives, quote-heavy reading    |
| Internal KM          | research-hub init --persona analyst + import-folder | No — Obsidian only | Policies, mixed internal docs, vendor notes    |

All 4 personas now share the same dashboard, MCP server, and crystal system, and the test matrix covers cluster-integrity flows across each persona.

For developers

git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q  # 1369 passing

Package name on PyPI: research-hub-pipeline
CLI entry point: research-hub

License

MIT. See LICENSE.
