CLI + MCP server for Zotero + Obsidian + NotebookLM research pipelines. Run `research-hub init` after install.

research-hub

Build your research cluster once. Ask AI about it thousands of times. Zotero + Obsidian + NotebookLM, wired together for AI agents.

Traditional Chinese README → README.zh-TW.md

Install (under 60 seconds)

pip install "research-hub-pipeline[playwright,secrets]"   # quotes keep zsh from globbing the brackets
research-hub init                 # interactive: pick persona + Zotero/NLM
research-hub notebooklm login     # one-time Google sign-in for NotebookLM

Requires Python 3.10+. Optionally, run `npm install -g defuddle-cli` for cleaner URL imports.

The 1-minute story (v0.46 lazy mode)

research-hub auto "harness engineering for LLM agents"

That single command:

Slugifies the topic into a cluster
Searches arXiv + Semantic Scholar (8 papers default)
Ingests them into Zotero + Obsidian
Bundles + uploads to NotebookLM
Generates + downloads a brief into .research_hub/artifacts/<slug>/

~50 seconds end-to-end. No prompt engineering. No copy-paste between systems.
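The first step, turning a topic into a cluster slug, can be sketched as follows. This is a minimal illustration that assumes the slug is the lowercased topic with alphanumeric runs joined by hyphens, which matches the `llm-evaluation-harness` example used later in this README. The function name `slugify` is hypothetical, and the tool's actual rules (length limits, stop words, collision handling) may differ.

```python
import re


def slugify(topic: str) -> str:
    """Lowercase the topic and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", topic.lower())
    return "-".join(words)


print(slugify("LLM evaluation harness"))  # llm-evaluation-harness
```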

The 4 lazy commands you actually need (full guide):

research-hub auto "topic"                # find + save + brief in one command
research-hub ask  "question" --cluster X  # ad-hoc Q&A against an uploaded notebook
research-hub tidy                         # one-shot maintenance: doctor + dedup + bases + cleanup preview
research-hub cleanup --all --apply        # GC bundles + debug logs + old artifacts

Or click them as buttons in research-hub serve --dashboard → Manage tab (dashboard walkthrough).

The longhand version (when you want control)

If you want to inspect each step, the same flow expanded:

research-hub clusters new --query "LLM evaluation harness" --slug llm-evaluation-harness
research-hub search "language model evaluation harness" --to-papers-input \
    --cluster llm-evaluation-harness > papers.json
research-hub ingest --cluster llm-evaluation-harness --no-verify

research-hub notebooklm bundle   --cluster llm-evaluation-harness
research-hub notebooklm upload   --cluster llm-evaluation-harness   # patchright + persistent Chrome, no API key
research-hub notebooklm generate --cluster llm-evaluation-harness --type brief
research-hub notebooklm download --cluster llm-evaluation-harness --type brief

research-hub notebooklm ask --cluster llm-evaluation-harness \
    --question "What are the 3 main research threads?"

Your vault now has .research_hub/artifacts/llm-evaluation-harness/brief-*.txt — a synthesis of all uploaded papers, generated by NotebookLM (no API key needed — patchright drives your local Chrome).
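If you want to pick up the newest brief from a script, something like the sketch below may help. It assumes only the path pattern quoted above (`.research_hub/artifacts/<slug>/brief-*.txt` under the vault root); the helper name `latest_brief` and the choice of modification time as the ordering are assumptions, not part of research-hub.

```python
from pathlib import Path
from typing import Optional


def latest_brief(vault: Path, slug: str) -> Optional[Path]:
    """Return the most recently modified brief-*.txt for a cluster, or None."""
    artifact_dir = vault / ".research_hub" / "artifacts" / slug
    briefs = sorted(artifact_dir.glob("brief-*.txt"), key=lambda p: p.stat().st_mtime)
    return briefs[-1] if briefs else None


if __name__ == "__main__":
    # Assumes the current directory is the vault root.
    brief = latest_brief(Path("."), "llm-evaluation-harness")
    print(brief if brief else "no brief downloaded yet")
```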

2 more minutes: generate the AI summary layer (crystals) and the structured entity/claim registry (memory):

research-hub crystal emit --cluster llm-evaluation-harness > prompt.md
# (paste prompt to Claude/GPT, save response as crystals.json)
research-hub crystal apply --cluster llm-evaluation-harness --scored crystals.json

research-hub memory emit  --cluster llm-evaluation-harness > mem-prompt.md
# (paste prompt to Claude/GPT, save response as memory.json)
research-hub memory apply --cluster llm-evaluation-harness --scored memory.json

Now open Claude Desktop and ask:

You: "What's the current SOTA in LLM evaluation harness?"
Claude (via MCP): calls read_crystal("llm-evaluation-harness", "sota-and-open-problems") → gets a pre-written 180-word answer with paper citations. ~1 KB read, 0 abstracts fetched at query time.

That pre-written answer is a crystal. You paid the reasoning cost once; every subsequent question is ~1 KB of cached analysis. See hub/llm-evaluation-harness/crystals/ in your vault for the 10 canonical Q&As generated above.

What makes it different

1. Crystals — pre-computed answers, not lazy retrieval (v0.28)

Every RAG system, including Karpathy's "LLM wiki", still assembles context at query time. research-hub's answer: store the AI's reasoning, not the inputs.

For each cluster you generate ~10 canonical Q→A crystals once, using any LLM you like. When an AI agent later asks "what's the SOTA in X?", it reads a pre-written paragraph, not 20 paper abstracts. Context read per query: ~1 KB (crystal) vs ~30 KB (cluster digest), roughly a 30× reduction.

Because the quality was pre-computed, it doesn't degrade at query time. See the harness-engineering example crystals — one folder, 10 Q&As answering "what is this field?", "what are the main threads?", "where do experts disagree?", "what's SOTA?", etc.
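The idea can be illustrated with a small sketch: answers are written once and stored under a key, so a query is a plain lookup rather than a retrieval step. Everything here is hypothetical (the `Crystal` class, the `STORE` dict, the `lookup_crystal` name, the placeholder answer text); only the size figures (~1 KB vs ~30 KB) and the cluster/crystal slugs come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crystal:
    question: str
    answer: str  # written once, ahead of time, by whichever LLM you used


# Hypothetical store keyed by (cluster slug, crystal slug); the answer text
# is a placeholder, not real output.
STORE = {
    ("llm-evaluation-harness", "sota-and-open-problems"): Crystal(
        question="What's the SOTA?",
        answer="<pre-written paragraph with paper citations>",
    ),
}


def lookup_crystal(cluster: str, slug: str) -> str:
    """Return the stored answer; no search or abstract fetching happens here."""
    return STORE[(cluster, slug)].answer


# Size figures quoted in the text: ~1 KB per crystal vs ~30 KB per digest.
CRYSTAL_KB, DIGEST_KB = 1, 30


def kb_saved(queries: int) -> int:
    """Context kilobytes avoided by reading crystals instead of digests."""
    return queries * (DIGEST_KB - CRYSTAL_KB)


print(lookup_crystal("llm-evaluation-harness", "sota-and-open-problems"))
print(kb_saved(1000))  # 29000
```

Over a thousand questions, the ~29 KB difference per query adds up to about 29 MB of context that never has to be assembled.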

→ Why this is not RAG

2. Structured memory layer — entities, claims, methods (v0.36)

Crystals store prose. Memory stores the underlying structure: named entities (benchmarks, models, concepts), typed claims with confidence + supporting papers, and method taxonomies. For the harness cluster:

hub/llm-evaluation-harness/memory.json
├── 14 entities  (vla-eval, SafeHarness, M*, LIBERO, SEC-bench, ...)
├── 12 claims    ("Harness is locus of progress", "Specialized beats generic +22%", ...)
└── 7 methods    (reflective code evolution, lifecycle-integrated defense, ...)

AI agents query entities via list_entities, claims via list_claims(min_confidence="high"), methods via list_methods. No RAG over prose — structured lookup over structured data.
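A confidence-filtered claim lookup might look roughly like the sketch below. The real `memory.json` schema is not shown in this README, so the `Claim` fields, the three-level confidence scale and its ordering, and the `filter_claims` helper are all assumptions for illustration; the sample claims are placeholders rather than entries from the harness cluster.

```python
from dataclasses import dataclass, field

# Assumed ordering of confidence levels (not confirmed by the project docs).
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Claim:
    text: str
    confidence: str
    papers: list[str] = field(default_factory=list)


def filter_claims(claims: list[Claim], min_confidence: str = "low") -> list[Claim]:
    """Keep claims whose confidence is at or above the requested floor."""
    floor = CONFIDENCE_RANK[min_confidence]
    return [c for c in claims if CONFIDENCE_RANK[c.confidence] >= floor]


# Placeholder data for illustration only.
claims = [
    Claim("claim A", "high", ["paper-1", "paper-2"]),
    Claim("claim B", "medium", ["paper-3"]),
    Claim("claim C", "low"),
]

for claim in filter_claims(claims, min_confidence="high"):
    print(claim.text, claim.papers)
```

Because each claim is a typed record, the filter is a comparison on a field rather than a search over prose.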

3. 4 personas, 1 codebase, dashboard adapts (v0.38)

Same vault, 4 rendered dashboards:

| Persona | Install | Dashboard vocabulary | Hidden tabs |
| --- | --- | --- | --- |
| Researcher (PhD STEM, Zotero) | `pip install "research-hub-pipeline[playwright,secrets]"` | Cluster / Crystal / Paper / Citation graph | (none) |
| Humanities (Zotero, quote-heavy) | `pip install "research-hub-pipeline[playwright,secrets]"` | Theme / Synthesis / Source | (none) |
| Analyst (industry, no Zotero) | `pip install "research-hub-pipeline[import,secrets]"` | Topic / AI Brief / Document | Diagnostics, Bind-Zotero |
| Internal KM (lab / company) | `pip install "research-hub-pipeline[import,secrets]"` | Project area / AI Brief / Document | Diagnostics, Bind-Zotero |

Side-by-side screenshots: docs/personas.md. Your first 10 minutes guide →

4. Live dashboard with direct execution (v0.27, expanded v0.42/v0.43/v0.44)

research-hub serve --dashboard

Localhost HTTP dashboard at http://127.0.0.1:8765/. Every Manage-tab button directly executes the CLI — no copy-paste.

5 tabs:

Overview — treemap + storage map + recent additions
Library — cluster cards with paper rows
Briefings — NotebookLM brief preview + artifact links
Diagnostics — health badges + drift alerts
Manage — per-cluster actions (rename / merge / split / NLM upload / NLM ask / polish-markdown / bases emit)

→ Full dashboard walkthrough

5. Cluster integrity + 100% orphan coverage (v0.37 + v0.39)

Papers drift; rebind v2 catches it. On the maintainer's 1063-orphan vault, coverage rose from 33% to 100% via an 8-heuristic chain plus auto-create-from-folder proposals.

research-hub doctor                       # catches 12+ classes of drift
research-hub clusters rebind --emit       # proposes 80%+ assignments
research-hub clusters rebind --apply report.md --auto-create-new
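A heuristic chain of this kind can be pictured as an ordered list of functions, each of which either proposes a cluster for an orphan paper or passes. The sketch below is a generic illustration, not the project's rebind code: the two heuristics (`by_folder`, `by_tag`), the paper dict fields, and the `cluster/` tag prefix are all hypothetical, and the real chain has eight heuristics that are not described here.

```python
from typing import Callable, Optional

Heuristic = Callable[[dict], Optional[str]]


def by_folder(paper: dict) -> Optional[str]:
    """Hypothetical heuristic: use the containing folder name as the cluster."""
    return paper.get("folder") or None


def by_tag(paper: dict) -> Optional[str]:
    """Hypothetical heuristic: use the first tag of the form cluster/<slug>."""
    for tag in paper.get("tags", []):
        if tag.startswith("cluster/"):
            return tag.split("/", 1)[1]
    return None


def propose(paper: dict, chain: list[Heuristic]) -> Optional[str]:
    """Try each heuristic in order; the first non-empty proposal wins."""
    for heuristic in chain:
        cluster = heuristic(paper)
        if cluster:
            return cluster
    return None


def coverage(papers: list[dict], chain: list[Heuristic]) -> float:
    """Fraction of papers for which some heuristic proposes a cluster."""
    if not papers:
        return 1.0
    assigned = sum(1 for p in papers if propose(p, chain) is not None)
    return assigned / len(papers)


orphans = [
    {"folder": "llm-evaluation-harness"},
    {"tags": ["cluster/llm-se"]},
    {},  # nothing to go on; stays an orphan
]
print(coverage(orphans, [by_folder]))
print(coverage(orphans, [by_folder, by_tag]))
```

Adding heuristics to the end of the chain can only raise coverage, which is one plausible reading of how a longer chain closes the gap on a large orphan set.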

→ 6 failure modes × 4 personas mitigation matrix

Install

# Researcher / Humanities (use Zotero + NotebookLM)
pip install "research-hub-pipeline[playwright,secrets]"

# Analyst / Internal KM (no Zotero, import local files)
pip install "research-hub-pipeline[import,secrets]"

research-hub init              # 4-option interactive persona prompt
research-hub serve --dashboard # opens browser

Requires Python 3.10+. No OpenAI/Anthropic API key is required: research-hub is provider-agnostic (all AI generation uses the emit/apply pattern; you feed the prompts to your own AI).
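In outline, an emit/apply round trip has two halves: one function writes a prompt for you to paste into any model, and another validates whatever JSON you paste back. The sketch below shows that shape only. The function names, the prompt wording, and the `slug`/`answer` response keys are assumptions made for illustration; the actual prompt and JSON schema used by `crystal emit` / `crystal apply` are not documented in this README.

```python
import json


def emit_prompt(cluster: str, questions: list[str]) -> str:
    """Build a plain-text prompt that can be pasted into any LLM."""
    lines = [
        f"Cluster: {cluster}",
        "Answer each question. Reply with a JSON list of objects",
        'that each have the keys "slug" and "answer".',
        "",
    ]
    lines += [f"- {q}" for q in questions]
    return "\n".join(lines)


def apply_response(raw: str) -> dict[str, str]:
    """Validate the pasted-back JSON and return a slug -> answer mapping."""
    items = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(items, list):
        raise ValueError("expected a JSON list")
    result: dict[str, str] = {}
    for item in items:
        if not isinstance(item, dict) or not {"slug", "answer"} <= item.keys():
            raise ValueError(f"missing keys in {item!r}")
        result[item["slug"]] = item["answer"]
    return result


print(emit_prompt("llm-evaluation-harness", ["What is this field?"]))
print(apply_response('[{"slug": "what-is-this-field", "answer": "..."}]'))
```

Keeping the model call outside the tool is what makes the pattern provider-agnostic: the tool only ever sees text going out and JSON coming back.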

For Claude Code / Claude Desktop users

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "research-hub": {
      "command": "research-hub",
      "args": ["serve"]
    }
  }
}
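If you already have other MCP servers configured, you will want to merge this entry rather than overwrite the file. A minimal sketch of that merge is below. The helper names are hypothetical, and the location of `claude_desktop_config.json` varies by platform, so the path is left as a parameter.

```python
import json
from pathlib import Path

SERVER_NAME = "research-hub"
SERVER_ENTRY = {"command": "research-hub", "args": ["serve"]}


def add_server(config: dict) -> dict:
    """Return a copy of config with the research-hub entry under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[SERVER_NAME] = dict(SERVER_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON config (or start empty), merge the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_server(config), indent=2) + "\n", encoding="utf-8")
```

Existing servers and unrelated top-level keys are preserved; only the `research-hub` entry is added or replaced.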

78 MCP tools cover: paper ingest, cluster CRUD, labels, quotes, draft composition, citation graph, NotebookLM (upload/generate/download/ask), crystal generation, fit-check, autofill, cluster memory, cluster rebind workflows, Obsidian Bases dashboard generation (emit_cluster_base).

Then talk to Claude:

"Claude, what's in my llm-evaluation-harness cluster?" → read_crystal("what-is-this-field") → 180-word answer
"Claude, which claims have high confidence?" → list_claims(cluster="llm-evaluation-harness", min_confidence="high") → 10 structured claims with paper refs
"Claude, add arxiv 2310.06770 to LLM-SE cluster" → add_paper(...) → Zotero + Obsidian + NotebookLM entries

Status

Latest: v0.46.0 (2026-04-19), see CHANGELOG.md
Tests: 1520 passing, 15 skipped, 2 xfailed (CI: Linux + Windows + macOS × Python 3.10/3.11/3.12)
Platforms: Windows, macOS, Linux
Python: 3.10+
Dependencies: pyzotero, pyyaml, requests, rapidfuzz, networkx, platformdirs (all pure-Python)
Optional: playwright extra for NotebookLM browser automation

Architecture docs

Your first 10 minutes — guided tour for each of the 4 personas
User personas — 4 persona profiles with per-persona feature matrix
Cluster integrity — 6 failure modes + mitigation matrix across all 4 personas
MCP tools reference — all 60 tools categorized + signatures
Example Claude Desktop flow — worked example: ingest → crystallize → query
Import folder — local file ingest for analyst persona (PDF/DOCX/MD/TXT/URL)
Anti-RAG crystals — why pre-computed Q→A beats retrieval
Upgrade guide — migrating from older versions
Task-level workflows — v0.33+ 5 MCP wrappers (ask/brief/sync/compose/collect)
Screenshot workflow — re-render any dashboard tab
Audit reports — audit_v0.26.md … audit_v0.45.md
NotebookLM setup + troubleshooting — patchright + persistent Chrome (v0.42+)
Dashboard walkthrough — tab-by-tab tour with persona-specific recipes (v0.44)
Validation log v0.43 — 11-paper NotebookLM stress test + dual-backend ask cross-check
Papers input schema — ingestion pipeline reference

Workflow reference

| Stage | Command | What it does |
| --- | --- | --- |
| Init | `init` / `doctor` | First-time config + health check (doctor has 12+ checks, `--autofix` for mechanical backfills) |
| Find | `search` / `verify` / `discover new` | Multi-backend paper search + DOI resolution + AI-scored discovery |
| Ingest | `add` / `ingest` / `import-folder` | One-shot or bulk paper ingest into Zotero + Obsidian |
| Organize | `clusters new/list/show/bind/merge/split/rename/delete/rebind/scaffold-missing` | Cluster CRUD + 8-heuristic rebind + hub scaffolding |
| Topic | `topic scaffold/propose/assign/build` | Sub-topic notes from `subtopics:` frontmatter |
| Label | `label` / `find --label` / `paper prune` / `paper lookup-doi` | Canonical label vocabulary + Crossref DOI backfill |
| Crystal | `crystal emit/apply/list/read/check` | Pre-computed canonical Q→A answers |
| Memory | `memory emit/apply/list/read` | Structured entities/claims/methods registry |
| Analyze | `clusters analyze --split-suggestion` | Citation-graph community detection for big clusters |
| Sync | `sync status` / `pipeline repair` | Detect + repair Zotero ↔ Obsidian drift |
| Dashboard | `dashboard` / `serve --dashboard` / `vault graph-colors` | Static HTML or live HTTP server (v0.44 Manage tab buttons drive the v0.42/v0.43 actions below) |
| NotebookLM | `notebooklm bundle/upload/generate/download/ask` | Browser-automated NLM flows (v0.42 patchright + persistent Chrome). `ask` does ad-hoc Q&A against the uploaded notebook |
| Obsidian | `vault polish-markdown` / `bases emit` | v0.42 callout/block-ID polish on paper notes. v0.43 auto-generated `.base` dashboard per cluster (auto-refreshes on `ingest`/`topic build` since v0.45) |
| Write | `quote` / `compose-draft` / `cite` | Quote capture, markdown draft assembly, BibTeX export |

For developers

git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e ".[dev,playwright]"
python -m pytest -q  # 1520 passing

Contributing: see CONTRIBUTING.md. Reporting security issues: see SECURITY.md.

Package name on PyPI: research-hub-pipeline
CLI entry point: research-hub

License

MIT. See LICENSE.

Release history

1.0.0

May 26, 2026

0.91.1

May 17, 2026

0.91.0

May 16, 2026

0.90.0

May 15, 2026

0.89.2

May 15, 2026

0.89.1

May 15, 2026

0.89.0

May 14, 2026

0.88.15

May 14, 2026

0.88.14

May 14, 2026

0.88.13

May 14, 2026

0.88.12

May 14, 2026

0.88.11

May 14, 2026

0.88.10

May 14, 2026

0.88.9

May 14, 2026

0.88.8

May 14, 2026

0.88.7

May 13, 2026

0.88.6

May 13, 2026

0.88.5

May 13, 2026

0.88.4

May 13, 2026

0.88.3

May 13, 2026

0.88.2

May 13, 2026

0.88.1

May 13, 2026

0.88.0

May 13, 2026

0.87.2

May 13, 2026

0.87.1.1

May 13, 2026

0.87.1

May 13, 2026

0.87.0

May 13, 2026

0.86.3

May 13, 2026

0.73.0

Apr 30, 2026

0.72.0

Apr 29, 2026

0.71.2

Apr 29, 2026

0.71.1

Apr 29, 2026

0.71.0

Apr 28, 2026

0.70.1

Apr 28, 2026

0.70.0

Apr 28, 2026

0.69.0

Apr 27, 2026

0.68.5

Apr 28, 2026

0.68.4

Apr 28, 2026

0.68.3

Apr 26, 2026

0.68.2

Apr 26, 2026

0.68.0

Apr 26, 2026

0.67.0

Apr 26, 2026

0.66.1

Apr 25, 2026

0.66.0

Apr 25, 2026

0.64.2

Apr 24, 2026

0.60.0

Apr 21, 2026

0.58.0

Apr 21, 2026

0.56.0

Apr 21, 2026

0.55.0

Apr 21, 2026

0.54.0

Apr 20, 2026

0.53.2

Apr 20, 2026

0.53.1

Apr 20, 2026

0.53.0

Apr 20, 2026

0.52.0

Apr 20, 2026

0.51.0

Apr 20, 2026

0.50.1

Apr 20, 2026

0.50.0

Apr 20, 2026

0.49.5

Apr 20, 2026

0.49.4

Apr 20, 2026

0.49.3

Apr 20, 2026

0.49.2

Apr 20, 2026

0.49.1

Apr 20, 2026

0.49.0

Apr 19, 2026

0.48.0

Apr 19, 2026

0.47.0

Apr 19, 2026

0.46.0 (this version)

Apr 19, 2026

0.45.0

Apr 19, 2026

0.44.0

Apr 19, 2026

0.43.0

Apr 19, 2026

0.42.0

Apr 19, 2026

0.41.1

Apr 19, 2026

0.41.0

Apr 19, 2026

0.40.2

Apr 19, 2026

0.40.1

Apr 19, 2026

0.40.0

Apr 19, 2026

0.39.0

Apr 19, 2026

0.38.1

Apr 18, 2026

0.38.0

Apr 18, 2026

0.37.3

Apr 18, 2026

0.37.2

Apr 18, 2026

0.37.1

Apr 18, 2026

0.37.0

Apr 18, 2026

0.36.0

Apr 18, 2026

0.35.0

Apr 18, 2026

0.34.0

Apr 18, 2026

0.33.3

Apr 18, 2026

0.33.2

Apr 17, 2026

0.33.1

Apr 17, 2026

0.33.0

Apr 17, 2026

0.32.0

Apr 17, 2026

0.31.1

Apr 17, 2026

0.31.0

Apr 17, 2026

0.30.0

Apr 17, 2026

0.29.0

Apr 17, 2026

0.28.0

Apr 15, 2026

0.27.0

Apr 15, 2026

0.26.0

Apr 14, 2026

0.25.0

Apr 14, 2026

0.24.0

Apr 14, 2026

0.23.1

Apr 14, 2026

0.23.0

Apr 14, 2026

0.22.0

Apr 14, 2026

0.21.0

Apr 13, 2026

0.20.2

Apr 13, 2026

0.20.1

Apr 13, 2026

0.20.0

Apr 13, 2026

0.19.1

Apr 13, 2026

0.18.0

Apr 13, 2026

0.17.0

Apr 13, 2026

0.16.0

Apr 13, 2026

0.15.0

Apr 13, 2026

0.14.0

Apr 13, 2026

0.13.0

Apr 13, 2026

0.12.0

Apr 13, 2026

0.11.0

Apr 13, 2026

0.10.0

Apr 13, 2026

0.9.0

Apr 12, 2026

0.8.2

Apr 12, 2026

0.8.1

Apr 12, 2026

0.8.0

Apr 12, 2026

0.7.0

Apr 12, 2026

0.6.0

Apr 12, 2026

0.5.0

Apr 12, 2026

Download files

Source Distribution

research_hub_pipeline-0.46.0.tar.gz (30.3 MB)

Uploaded Apr 19, 2026 Source

Built Distribution

research_hub_pipeline-0.46.0-py3-none-any.whl (359.1 kB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file research_hub_pipeline-0.46.0.tar.gz.

File metadata

Download URL: research_hub_pipeline-0.46.0.tar.gz
Upload date: Apr 19, 2026
Size: 30.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for research_hub_pipeline-0.46.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `546f8f954c8856e55ad7b9e82a4f4b4520f569d288193ffd60d57b896bad4480` |
| MD5 | `d47d2a76f01489f39be4e4ac6528919b` |
| BLAKE2b-256 | `f9c0b9697dd6bb1d582360d9ccbddd6618b5c700c1997bb73cba832c741a8edf` |

See more details on using hashes here.
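To check a downloaded archive against the SHA256 digest listed above, a short script along these lines should work. It uses only the standard-library `hashlib` module; the helper names and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for research_hub_pipeline-0.46.0.tar.gz.
EXPECTED_SHA256 = "546f8f954c8856e55ad7b9e82a4f4b4520f569d288193ffd60d57b896bad4480"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a 30 MB archive is not read in one go."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("research_hub_pipeline-0.46.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```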

File details

Details for the file research_hub_pipeline-0.46.0-py3-none-any.whl.

File metadata

Download URL: research_hub_pipeline-0.46.0-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 359.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for research_hub_pipeline-0.46.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `658c941e30794ba7b0e39ee07689134d95bd0b4ca5c4ba21d15a4db9c379ebe8` |
| MD5 | `0d5129783b75c87521e45151eff47a22` |
| BLAKE2b-256 | `6fb4593b465d214cee410cbc3f22b35c675700b6b6dcaed111493b9b326ff25d` |

See more details on using hashes here.
