Zotero -> Obsidian -> NotebookLM research pipeline
Project description
research-hub-pipeline
Zotero -> Obsidian -> NotebookLM research pipeline. Search, verify, save, organize, and upload academic papers -> all from the terminal.
Install
pip install research-hub-pipeline
# With NotebookLM browser automation:
pip install research-hub-pipeline[playwright]
playwright install chromium
research-hub requires Python 3.10 or newer. Install the playwright
extra only if you plan to use NotebookLM login, upload, or generation.
Two personas, one tool
research-hub supports two workflows out of the box:
Researcher (default)
pip install research-hub-pipeline[playwright]
research-hub init # interactive: vault + Zotero + library id
research-hub doctor # 7-check health diagnostic
research-hub add 10.1234/example # one-shot fetch + ingest
research-hub notebooklm upload --cluster my-topic
You get: Zotero items with full citation metadata + Obsidian research notes + NotebookLM upload + AI suggestions + BibTeX export.
Data analyst (no Zotero)
pip install research-hub-pipeline[playwright]
research-hub init --persona analyst # skips Zotero prompts
research-hub doctor # Zotero checks marked "skipped"
research-hub add 10.1234/example # writes to Obsidian only
research-hub notebooklm upload --cluster my-topic
You get: Obsidian-only knowledge base + NotebookLM upload + AI suggestions. No Zotero account required. Perfect for industry research, white papers, and technical documentation.
Quick start
-
Run the first-time setup wizard:
research-hub initThis creates your config file, vault folders, and optional Zotero settings.
-
Verify the installation:
research-hub doctorThe health check reports config, vault paths, Zotero credentials, dedup index status, Chrome availability, and NotebookLM session state.
-
Search for papers and verify DOIs while printing results:
research-hub search "graph neural networks" --verify
-
Ask for cluster and related-paper suggestions for a DOI:
research-hub suggest 10.1145/3448016.3452841 --json
-
Ingest papers into your Zotero + Obsidian workflow:
research-hub ingest --cluster my-cluster
-
Upload a prepared cluster to NotebookLM:
research-hub notebooklm upload --cluster my-cluster
-
Export a citation when you need it:
research-hub cite 10.1145/3448016.3452841 --format bibtex
If you are starting from scratch, a common first session is:
research-hub init
research-hub doctor
research-hub clusters new --query "graph neural networks" --slug my-cluster
research-hub search "graph neural networks" --verify
research-hub ingest --cluster my-cluster
research-hub notebooklm login --cdp
research-hub notebooklm upload --cluster my-cluster
Features
The CLI is organized around a paper workflow: find papers, verify identifiers, save notes, keep clusters aligned, and push selected clusters into NotebookLM.
| Stage | Command | Description |
|---|---|---|
| Search | search |
Query Semantic Scholar for new papers by keyword |
| Verify | verify --doi / verify --arxiv / verify --paper |
Check that a DOI, arXiv ID, or fuzzy title match resolves |
| Save | run / ingest |
Run the ingestion pipeline and write Zotero + Obsidian outputs |
| Organize | clusters new / clusters list / clusters show / clusters bind |
Create, inspect, and bind topic clusters |
| Suggest | suggest <id> [--json] |
Recommend a cluster and related existing papers |
| Sync | sync status / sync reconcile |
Detect and reconcile Zotero -> Obsidian drift |
| Upload | notebooklm upload |
Upload bundle sources to NotebookLM using a saved browser session |
| Generate | notebooklm generate --type brief |
Trigger NotebookLM artifact generation |
| Cite | cite <id> --format bibtex |
Export BibTeX, BibLaTeX, RIS, or CSL-JSON |
| Maintain | index / status / cleanup / synthesize |
Rebuild indexes, check progress, deduplicate hub pages, generate synthesis pages |
| Setup | init / doctor |
Create config and validate the local environment |
Search and verify
Use search to discover candidate papers before adding anything to your
vault:
research-hub search "retrieval augmented generation" --limit 10
Add --verify when you want each DOI checked against doi.org before
it is printed:
research-hub search "retrieval augmented generation" --limit 10 --verify
Use verify directly when you already know the identifier:
research-hub verify --doi 10.48550/arXiv.1706.03762
research-hub verify --arxiv 1706.03762
research-hub verify --paper "Attention Is All You Need" --paper-year 2017
research-hub verify --paper "Attention Is All You Need" --paper-author Vaswani
Run ingestion
run and ingest both execute the pipeline. Use --dry-run when you
want to validate config and inputs without writing anything:
research-hub ingest --cluster my-cluster --dry-run
research-hub ingest --cluster my-cluster --query "retrieval augmented generation"
research-hub run --cluster my-cluster --query "retrieval augmented generation"
Verification is enabled by default for run and ingest. Skip DOI or
arXiv checks only when you want faster batch work:
research-hub ingest --cluster my-cluster --no-verify
Organize by cluster
Clusters give the rest of the workflow a stable slug for folders, Zotero collections, and NotebookLM notebooks:
research-hub clusters new --query "retrieval augmented generation" --name "RAG" --slug rag
research-hub clusters list
research-hub clusters show rag
research-hub clusters bind rag --zotero ABCD1234 --obsidian rag --notebooklm "RAG Notebook"
Get suggestions before saving
suggest accepts a DOI, arXiv ID, or quoted title and scores both
cluster matches and related existing notes:
research-hub suggest 10.48550/arXiv.1706.03762
research-hub suggest 1706.03762 --top 8
research-hub suggest "Attention Is All You Need" --json
This is the fastest way to decide whether a new paper belongs in an existing cluster before you ingest it.
Track status and repair drift
Use status for reading progress, sync status for cross-system drift,
and sync reconcile to create missing Obsidian notes from Zotero:
research-hub status
research-hub status --cluster rag
research-hub sync status
research-hub sync status --cluster rag
research-hub sync reconcile --cluster rag --dry-run
research-hub sync reconcile --cluster rag --execute
Export citations
cite can export one paper by DOI or filename stem, or an entire
cluster:
research-hub cite 10.48550/arXiv.1706.03762 --format bibtex
research-hub cite 10.48550/arXiv.1706.03762 --format ris
research-hub cite --cluster rag --format biblatex --out rag.bib
Supported output formats are bibtex, biblatex, ris, and
csljson.
Maintain the vault
These commands help keep the local knowledge base consistent:
research-hub index
research-hub cleanup --dry-run
research-hub cleanup
research-hub synthesize
research-hub synthesize --cluster rag --graph-colors
research-hub migrate-yaml --dry-run
research-hub migrate-yaml --assign-cluster rag --folder raw --force
index rebuilds the dedup index from Zotero and Obsidian. cleanup
removes duplicate wiki links in hub pages. synthesize writes cluster
summary pages. migrate-yaml patches older notes to the current YAML
layout.
NotebookLM workflow
NotebookLM commands are separate because they require browser automation:
research-hub notebooklm login --cdp
research-hub notebooklm bundle --cluster rag
research-hub notebooklm upload --cluster rag --dry-run
research-hub notebooklm upload --cluster rag
research-hub notebooklm generate --cluster rag --type brief
research-hub notebooklm generate --cluster rag --type all
Use --headless for unattended runs or --visible when you want to
watch the browser. generate --type supports brief, audio,
mind-map, video, and all.
Configuration
research-hub init writes a JSON config file automatically. By default
the config file lives at ~/.config/research-hub/config.json on Linux
and macOS, or %APPDATA%/research-hub/config.json on Windows.
If you want to point the CLI at a different file, set
RESEARCH_HUB_CONFIG.
The vault root comes from config field knowledge_base.root or from
RESEARCH_HUB_ROOT. If neither is set, the default root is
~/knowledge-base.
Useful path overrides:
RESEARCH_HUB_RAWRESEARCH_HUB_HUBRESEARCH_HUB_PROJECTSRESEARCH_HUB_LOGSRESEARCH_HUB_GRAPH
Zotero credentials can come from config or environment variables:
ZOTERO_API_KEYZOTERO_LIBRARY_IDZOTERO_LIBRARY_TYPERESEARCH_HUB_DEFAULT_COLLECTION
The typical vault layout created by research-hub init is:
<vault>/
raw/
hub/
logs/
.research_hub/
After setup, research-hub doctor is the fastest way to confirm the
config file, vault, Zotero credentials, dedup index, Chrome, and saved
NotebookLM session all look valid.
NotebookLM integration
NotebookLM support uses a Chrome DevTools Protocol attach flow rather
than a Playwright-launched browser, which avoids the Google bot check
that blocks normal automated logins. Install Chrome, run
research-hub notebooklm login --cdp once, then use
notebooklm bundle, notebooklm upload, and notebooklm generate for
cluster-level workflows. Full setup, troubleshooting, and selector
maintenance notes are in docs/notebooklm.md.
For developers
If you are working from a clone instead of installing from PyPI:
git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q
Useful developer checks:
research-hub --help
research-hub doctor
research-hub search "test query" --limit 3
The package entry point is research-hub, while the PyPI package name
is research-hub-pipeline.
License
MIT. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file research_hub_pipeline-0.20.2.tar.gz.
File metadata
- Download URL: research_hub_pipeline-0.20.2.tar.gz
- Upload date:
- Size: 373.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f1410c175a15052a94a2f02e2446ffb8cdf18e31ee0b176022f6c8137e577d6d
|
|
| MD5 |
40957699e3bbb0b49ef84a68cd6ac8b1
|
|
| BLAKE2b-256 |
b088fe2142dc8b6c99a02cdf4d6ce8524d3c0c3dddebc3ae192d73e2634f76aa
|
File details
Details for the file research_hub_pipeline-0.20.2-py3-none-any.whl.
File metadata
- Download URL: research_hub_pipeline-0.20.2-py3-none-any.whl
- Upload date:
- Size: 235.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cf3c6d25ef9eba347202232501e9de8d540fa132a819b17c88504425a238cedc
|
|
| MD5 |
83ca8d9f1652e7694415f7c9a5d1fc3a
|
|
| BLAKE2b-256 |
452a12a44ec6d0c4f69cd757eae8b99ab254ee52e244324c7f1682e945d07187
|