research-hub-pipeline

Zotero -> Obsidian -> NotebookLM research pipeline. Search, verify, save, organize, and upload academic papers, all from the terminal.

Install

pip install research-hub-pipeline
# With NotebookLM browser automation:
pip install "research-hub-pipeline[playwright]"   # quoted so shells such as zsh do not glob the brackets
playwright install chromium

research-hub requires Python 3.10 or newer. Install the playwright extra only if you plan to use NotebookLM login, upload, or generation.

Two personas, one tool

research-hub supports two workflows out of the box:

Researcher (default)

pip install "research-hub-pipeline[playwright]"
research-hub init                  # interactive: vault + Zotero + library id
research-hub doctor                # 7-check health diagnostic
research-hub add 10.1234/example   # one-shot fetch + ingest
research-hub notebooklm upload --cluster my-topic

You get: Zotero items with full citation metadata + Obsidian research notes + NotebookLM upload + AI suggestions + BibTeX export.

Data analyst (no Zotero)

pip install "research-hub-pipeline[playwright]"
research-hub init --persona analyst   # skips Zotero prompts
research-hub doctor                    # Zotero checks marked "skipped"
research-hub add 10.1234/example       # writes to Obsidian only
research-hub notebooklm upload --cluster my-topic

You get: Obsidian-only knowledge base + NotebookLM upload + AI suggestions. No Zotero account required. Perfect for industry research, white papers, and technical documentation.

Quick start

Run the first-time setup wizard:
```
research-hub init
```
This creates your config file, vault folders, and optional Zotero settings.

Verify the installation:
```
research-hub doctor
```
The health check reports config, vault paths, Zotero credentials, dedup index status, Chrome availability, and NotebookLM session state.
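
To make the idea of a multi-check diagnostic concrete, here is a minimal sketch of a check runner that reports each check as ok, fail, or skipped (the analyst persona marks Zotero checks as skipped). The names `CheckResult` and `run_checks` and the check labels are hypothetical and chosen for illustration; this is not how research-hub doctor is actually implemented.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    status: str  # "ok", "fail", or "skipped"
    detail: str = ""


def run_checks(
    checks: dict[str, Callable[[], bool]],
    skip: frozenset[str] = frozenset(),
) -> list[CheckResult]:
    """Run every check in order; a crashing check is reported, not raised."""
    results: list[CheckResult] = []
    for name, check in checks.items():
        if name in skip:
            results.append(CheckResult(name, "skipped"))
            continue
        try:
            passed = check()
            results.append(CheckResult(name, "ok" if passed else "fail"))
        except Exception as exc:  # keep going so one failure does not hide the rest
            results.append(CheckResult(name, "fail", str(exc)))
    return results
```

Catching exceptions per check means a missing dependency in one area still lets the remaining checks report their state.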

Search for papers and verify each DOI before it is printed:

research-hub search "graph neural networks" --verify

Ask for cluster and related-paper suggestions for a DOI:

research-hub suggest 10.1145/3448016.3452841 --json

Ingest papers into your Zotero + Obsidian workflow:
```
research-hub ingest --cluster my-cluster
```

Upload a prepared cluster to NotebookLM:

research-hub notebooklm upload --cluster my-cluster

Export a citation when you need it:

research-hub cite 10.1145/3448016.3452841 --format bibtex

If you are starting from scratch, a common first session is:

research-hub init
research-hub doctor
research-hub clusters new --query "graph neural networks" --slug my-cluster
research-hub search "graph neural networks" --verify
research-hub ingest --cluster my-cluster
research-hub notebooklm login --cdp
research-hub notebooklm upload --cluster my-cluster

Features

The CLI is organized around a paper workflow: find papers, verify identifiers, save notes, keep clusters aligned, and push selected clusters into NotebookLM.

| Stage | Command | Description |
| --- | --- | --- |
| Search | `search` | Query Semantic Scholar for new papers by keyword |
| Verify | `verify --doi` / `verify --arxiv` / `verify --paper` | Check that a DOI, arXiv ID, or fuzzy title match resolves |
| Save | `run` / `ingest` | Run the ingestion pipeline and write Zotero + Obsidian outputs |
| Organize | `clusters new` / `clusters list` / `clusters show` / `clusters bind` | Create, inspect, and bind topic clusters |
| Suggest | `suggest <id> [--json]` | Recommend a cluster and related existing papers |
| Sync | `sync status` / `sync reconcile` | Detect and reconcile Zotero -> Obsidian drift |
| Upload | `notebooklm upload` | Upload bundle sources to NotebookLM using a saved browser session |
| Generate | `notebooklm generate --type brief` | Trigger NotebookLM artifact generation |
| Cite | `cite <id> --format bibtex` | Export BibTeX, BibLaTeX, RIS, or CSL-JSON |
| Maintain | `index` / `status` / `cleanup` / `synthesize` | Rebuild indexes, check progress, deduplicate hub pages, generate synthesis pages |
| Setup | `init` / `doctor` | Create config and validate the local environment |

Search and verify

Use search to discover candidate papers before adding anything to your vault:

research-hub search "retrieval augmented generation" --limit 10

Add --verify when you want each DOI checked against doi.org before it is printed:

research-hub search "retrieval augmented generation" --limit 10 --verify
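
As a rough illustration of what checking a DOI against doi.org involves, the sketch below validates the DOI's shape, builds the resolver URL, and treats a non-error HTTP response as "resolves". The helper names and the use of `urllib` are assumptions made for this example, not research-hub's actual code.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import quote

# Loose syntactic pattern for modern DOIs: "10.", a 4-9 digit registrant, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value: str) -> bool:
    """Cheap syntactic check before spending a network request."""
    return bool(DOI_PATTERN.match(value.strip()))


def doi_url(doi: str) -> str:
    """Build the doi.org resolver URL, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if doi.org leads to a reachable landing page.

    Hypothetical helper: some publishers reject HEAD requests or automated
    clients, so False here is not proof that the DOI is invalid.
    """
    if not looks_like_doi(doi):
        return False
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False
```

Running the syntactic check first avoids a network round trip for strings that cannot be DOIs at all.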

Use verify directly when you already know the identifier:

research-hub verify --doi 10.48550/arXiv.1706.03762
research-hub verify --arxiv 1706.03762
research-hub verify --paper "Attention Is All You Need" --paper-year 2017
research-hub verify --paper "Attention Is All You Need" --paper-author Vaswani
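
The first two commands above point at the same paper, because arXiv registers DOIs under the 10.48550 prefix. The following sketch shows one way to tell the three identifier kinds apart and to map an arXiv ID to its DOI. The function names and regular expressions are illustrative assumptions, and only new-style arXiv IDs (such as 1706.03762) are handled.

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, optionally with a version suffix.
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def classify_identifier(value: str) -> str:
    """Return "doi", "arxiv", or "title" for a user-supplied string."""
    value = value.strip()
    if DOI_RE.match(value):
        return "doi"
    if ARXIV_RE.match(value):
        return "arxiv"
    # Anything else is treated as a free-text title.
    return "title"


def arxiv_to_doi(arxiv_id: str) -> str:
    """Map an arXiv ID to its DOI, dropping any version suffix."""
    base = re.sub(r"v\d+$", "", arxiv_id.strip())
    return f"10.48550/arXiv.{base}"
```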

Run ingestion

run and ingest both execute the pipeline. Use --dry-run when you want to validate config and inputs without writing anything:

research-hub ingest --cluster my-cluster --dry-run
research-hub ingest --cluster my-cluster --query "retrieval augmented generation"
research-hub run --cluster my-cluster --query "retrieval augmented generation"

Verification is enabled by default for run and ingest. Pass --no-verify to skip the DOI and arXiv checks only when you want faster batch work:

research-hub ingest --cluster my-cluster --no-verify

Organize by cluster

Clusters give the rest of the workflow a stable slug for folders, Zotero collections, and NotebookLM notebooks:

research-hub clusters new --query "retrieval augmented generation" --name "RAG" --slug rag
research-hub clusters list
research-hub clusters show rag
research-hub clusters bind rag --zotero ABCD1234 --obsidian rag --notebooklm "RAG Notebook"

Get suggestions before saving

suggest accepts a DOI, arXiv ID, or quoted title and scores both cluster matches and related existing notes:

research-hub suggest 10.48550/arXiv.1706.03762
research-hub suggest 1706.03762 --top 8
research-hub suggest "Attention Is All You Need" --json

This is the fastest way to decide whether a new paper belongs in an existing cluster before you ingest it.
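
The scoring method behind suggest is not documented in this README. Purely to illustrate the idea of ranking clusters for a new paper, the sketch below scores each cluster by Jaccard overlap between the paper title and the cluster's query string. The function names and the choice of Jaccard similarity are assumptions, not the tool's algorithm.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_clusters(
    title: str, clusters: dict[str, str], top: int = 5
) -> list[tuple[str, float]]:
    """Return up to `top` (slug, score) pairs, best match first."""
    title_tokens = tokens(title)
    scored = [
        (slug, jaccard(title_tokens, tokens(query)))
        for slug, query in clusters.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

With clusters `{"rag": "retrieval augmented generation", "gnn": "graph neural networks"}`, a title such as "Graph Neural Networks for Recommendation" ranks `gnn` first because three of its five distinct words match that cluster's query.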

Track status and repair drift

Use status for reading progress, sync status for cross-system drift, and sync reconcile to create missing Obsidian notes from Zotero:

research-hub status
research-hub status --cluster rag
research-hub sync status
research-hub sync status --cluster rag
research-hub sync reconcile --cluster rag --dry-run
research-hub sync reconcile --cluster rag --execute
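
The dry-run and execute modes above follow a common plan-then-apply pattern. A minimal sketch, assuming items are matched by a shared key and that `create_note` is a caller-supplied callback (both are hypothetical details, not research-hub's internals):

```python
from collections.abc import Callable


def find_drift(zotero_keys: set[str], obsidian_keys: set[str]) -> dict[str, list[str]]:
    """Report items present in one system but not the other."""
    return {
        "missing_in_obsidian": sorted(zotero_keys - obsidian_keys),
        "missing_in_zotero": sorted(obsidian_keys - zotero_keys),
    }


def reconcile(
    zotero_keys: set[str],
    obsidian_keys: set[str],
    create_note: Callable[[str], None],
    execute: bool = False,
) -> list[str]:
    """Return the notes that need creating; only create them when execute=True."""
    missing = find_drift(zotero_keys, obsidian_keys)["missing_in_obsidian"]
    if execute:
        for key in missing:
            create_note(key)
    return missing
```

Because the plan is computed the same way in both modes, a dry run shows exactly what an execute run would do.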

Export citations

cite can export one paper by DOI or filename stem, or an entire cluster:

research-hub cite 10.48550/arXiv.1706.03762 --format bibtex
research-hub cite 10.48550/arXiv.1706.03762 --format ris
research-hub cite --cluster rag --format biblatex --out rag.bib

Supported output formats are bibtex, biblatex, ris, and csljson.
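
For readers unfamiliar with the BibTeX format, the sketch below renders a metadata dictionary as a single entry. The `to_bibtex` helper and the citation key are hypothetical, and the sketch does no escaping of special LaTeX characters, so it illustrates the output shape rather than the exporter itself.

```python
def to_bibtex(key: str, fields: dict[str, str], entry_type: str = "article") -> str:
    """Render one BibTeX entry; field order follows the dictionary."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex(
    "vaswani2017attention",  # hypothetical citation key
    {
        "title": "Attention Is All You Need",
        "author": "Vaswani, Ashish and others",
        "year": "2017",
        "doi": "10.48550/arXiv.1706.03762",
    },
)
print(entry)
```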

Maintain the vault

These commands help keep the local knowledge base consistent:

research-hub index
research-hub cleanup --dry-run
research-hub cleanup
research-hub synthesize
research-hub synthesize --cluster rag --graph-colors
research-hub migrate-yaml --dry-run
research-hub migrate-yaml --assign-cluster rag --folder raw --force

index rebuilds the dedup index from Zotero and Obsidian. cleanup removes duplicate wiki links in hub pages. synthesize writes cluster summary pages. migrate-yaml patches older notes to the current YAML layout.
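
To illustrate what removing duplicate wiki links can look like, the sketch below drops repeated bullet lines that consist only of an Obsidian-style `[[link]]`, keeping the first occurrence. The regular expression and the bullet-only rule are assumptions for this example; the actual cleanup command may use different criteria.

```python
import re

# A bullet line holding exactly one wiki link, with an optional "#heading" or "|alias" part.
LINK_ITEM = re.compile(r"\s*[-*]\s*\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]\s*")


def dedupe_link_lines(text: str) -> str:
    """Remove later bullet lines whose link target was already listed."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.splitlines():
        match = LINK_ITEM.fullmatch(line)
        if match:
            target = match.group(1).strip().lower()
            if target in seen:
                continue  # duplicate bullet: skip it
            seen.add(target)
        kept.append(line)  # prose and headings are always preserved
    return "\n".join(kept)
```

Restricting the rule to link-only bullets keeps prose sentences that happen to mention an already listed note.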

NotebookLM workflow

NotebookLM commands are separate because they require browser automation:

research-hub notebooklm login --cdp
research-hub notebooklm bundle --cluster rag
research-hub notebooklm upload --cluster rag --dry-run
research-hub notebooklm upload --cluster rag
research-hub notebooklm generate --cluster rag --type brief
research-hub notebooklm generate --cluster rag --type all

Use --headless for unattended runs or --visible when you want to watch the browser. generate --type supports brief, audio, mind-map, video, and all.

Configuration

research-hub init writes a JSON config file automatically. By default the config file lives at ~/.config/research-hub/config.json on Linux and macOS, or %APPDATA%/research-hub/config.json on Windows.

If you want to point the CLI at a different file, set RESEARCH_HUB_CONFIG.
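
The lookup order described above can be sketched as follows. This is a minimal illustration, assuming the environment variable takes priority over the platform default; the function name and the fallback used when APPDATA is unset are hypothetical.

```python
import os
import sys
from collections.abc import Mapping
from pathlib import Path


def default_config_path(
    env: Mapping[str, str] = os.environ, platform: str = sys.platform
) -> Path:
    """Resolve the config file location from the override or the platform default."""
    override = env.get("RESEARCH_HUB_CONFIG")
    if override:
        return Path(override).expanduser()
    if platform.startswith("win"):
        # Assumption: fall back to the home directory if APPDATA is unset.
        base = Path(env.get("APPDATA", str(Path.home())))
    else:
        base = Path.home() / ".config"
    return base / "research-hub" / "config.json"
```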

The vault root comes from config field knowledge_base.root or from RESEARCH_HUB_ROOT. If neither is set, the default root is ~/knowledge-base.
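
The README does not say which source wins when both the config field and the environment variable are set. The sketch below assumes the environment variable takes precedence, which is a common convention but not confirmed for research-hub; the function name is also hypothetical.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Any


def resolve_vault_root(
    config: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Path:
    """Pick the vault root from the environment, the config, or the default."""
    # Assumption: RESEARCH_HUB_ROOT overrides the config file.
    env_root = env.get("RESEARCH_HUB_ROOT")
    if env_root:
        return Path(env_root).expanduser()
    config_root = config.get("knowledge_base", {}).get("root")
    if config_root:
        return Path(config_root).expanduser()
    return Path.home() / "knowledge-base"
```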

Useful path overrides:

RESEARCH_HUB_RAW
RESEARCH_HUB_HUB
RESEARCH_HUB_PROJECTS
RESEARCH_HUB_LOGS
RESEARCH_HUB_GRAPH

Zotero credentials can come from config or environment variables:

ZOTERO_API_KEY
ZOTERO_LIBRARY_ID
ZOTERO_LIBRARY_TYPE
RESEARCH_HUB_DEFAULT_COLLECTION

The typical vault layout created by research-hub init is:

<vault>/
  raw/
  hub/
  logs/
  .research_hub/
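
For reference, the layout above is straightforward to reproduce by hand. The sketch below creates the four directories under a given root; the `create_vault` helper is hypothetical and only mirrors the folder names listed above, not whatever else research-hub init writes.

```python
import tempfile
from pathlib import Path

VAULT_DIRS = ("raw", "hub", "logs", ".research_hub")


def create_vault(root: Path) -> list[Path]:
    """Create the vault folders if missing and return their paths."""
    created: list[Path] = []
    for name in VAULT_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real vault.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_vault(Path(tmp)):
            print(folder.name)
```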

After setup, research-hub doctor is the fastest way to confirm the config file, vault, Zotero credentials, dedup index, Chrome, and saved NotebookLM session all look valid.

NotebookLM integration

NotebookLM support uses a Chrome DevTools Protocol attach flow rather than a Playwright-launched browser, which avoids the Google bot check that blocks normal automated logins. Install Chrome, run research-hub notebooklm login --cdp once, then use notebooklm bundle, notebooklm upload, and notebooklm generate for cluster-level workflows. Full setup, troubleshooting, and selector maintenance notes are in docs/notebooklm.md.
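
For context, attaching over CDP with Playwright generally looks like the sketch below: Chrome is started by the user with remote debugging enabled (for example `--remote-debugging-port=9222`; recent Chrome releases may also require a non-default `--user-data-dir`), and the script connects to it instead of launching its own browser. The port, the helper names, and the assumption that research-hub uses Playwright's `connect_over_cdp` are all illustrative; see docs/notebooklm.md for the actual flow.

```python
def cdp_endpoint(port: int = 9222, host: str = "127.0.0.1") -> str:
    """URL of a locally running Chrome's DevTools endpoint (port is an assumption)."""
    return f"http://{host}:{port}"


def attach_to_chrome(endpoint: str) -> str:
    """Attach to an already running Chrome and return the NotebookLM page title."""
    # Imported lazily so this module loads even without the playwright extra.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        # Reuse the existing context so the logged-in Google session is kept.
        context = browser.contexts[0]
        page = context.new_page()
        page.goto("https://notebooklm.google.com/")
        return page.title()
```

Because the browser is a normal user-launched Chrome, the Google login happens interactively once and later runs reuse that session.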

For developers

If you are working from a clone instead of installing from PyPI:

git clone https://github.com/WenyuChiou/research-hub.git
cd research-hub
pip install -e '.[dev,playwright]'
python -m pytest -q

Useful developer checks:

research-hub --help
research-hub doctor
research-hub search "test query" --limit 3

The installed command is research-hub, while the PyPI package name is research-hub-pipeline.

License

MIT. See LICENSE.

Download files

Source Distribution

research_hub_pipeline-0.20.2.tar.gz (373.7 kB)

Uploaded Apr 13, 2026 Source

Built Distribution

research_hub_pipeline-0.20.2-py3-none-any.whl (235.7 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file research_hub_pipeline-0.20.2.tar.gz.

File metadata

Download URL: research_hub_pipeline-0.20.2.tar.gz
Upload date: Apr 13, 2026
Size: 373.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for research_hub_pipeline-0.20.2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `f1410c175a15052a94a2f02e2446ffb8cdf18e31ee0b176022f6c8137e577d6d` |
| MD5 | `40957699e3bbb0b49ef84a68cd6ac8b1` |
| BLAKE2b-256 | `b088fe2142dc8b6c99a02cdf4d6ce8524d3c0c3dddebc3ae192d73e2634f76aa` |

File details

Details for the file research_hub_pipeline-0.20.2-py3-none-any.whl.

File metadata

Download URL: research_hub_pipeline-0.20.2-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 235.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for research_hub_pipeline-0.20.2-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `cf3c6d25ef9eba347202232501e9de8d540fa132a819b17c88504425a238cedc` |
| MD5 | `83ca8d9f1652e7694415f7c9a5d1fc3a` |
| BLAKE2b-256 | `452a12a44ec6d0c4f69cd757eae8b99ab254ee52e244324c7f1682e945d07187` |
