Scan a Zotero collection, summarize each paper's PDF, and write the summary back as a note (LangGraph + DeepSeek).
zotery
Scan a Zotero collection, read each paper's attached PDF, generate a structured summary with an LLM (DeepSeek, Google Gemini, an OpenAI-compatible endpoint, or a local Ollama model), and write that summary back into Zotero as a child note on the paper.
Names: the PyPI package is zotery; the installed command is zotero-summarizer (the Python module is zotero_summarizer).
Every summary contains four sections:
- Motivation & Main Problem
- Key Findings
- Methodology
- Future Work
The pipeline is orchestrated with LangGraph:
START → load_items → process_paper → summarize → write_note → END
↑__________________________| (loops per paper)
load_items scans the collection · process_paper finds + downloads + extracts
the PDF · summarize calls the LLM for a structured PaperSummary · write_note
renders it to HTML and pushes it to Zotero.
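As a rough illustration of the last two steps, the sketch below models the four-section summary as a plain dataclass and renders it to note HTML. The field names, the `render_note_html` helper, and the heading markup are assumptions made for this example; the package's actual PaperSummary schema and HTML layout may differ.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class PaperSummary:
    """Hypothetical stand-in for the structured summary (field names assumed)."""
    motivation: str
    key_findings: str
    methodology: str
    future_work: str


# Section titles as listed above, paired with the assumed field names.
SECTIONS = [
    ("Motivation & Main Problem", "motivation"),
    ("Key Findings", "key_findings"),
    ("Methodology", "methodology"),
    ("Future Work", "future_work"),
]


def render_note_html(summary: PaperSummary) -> str:
    """Render the four sections as simple HTML for a note body."""
    parts = []
    for title, field_name in SECTIONS:
        # Escape both title and body so stray < or & cannot break the markup.
        body = escape(getattr(summary, field_name))
        parts.append(f"<h2>{escape(title)}</h2>\n<p>{body}</p>")
    return "\n".join(parts)
```

Keeping the rendering separate from the LLM call means the same summary object can be printed for a dry run or pushed to Zotero without regenerating it.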
How it connects to Zotero
It uses pyzotero as the connector, which speaks to both Zotero APIs:
- Web API (ZOTERO_LOCAL=false): the Zotero cloud library, accessed with an API key. Required to write notes back, because Zotero's local API is read-only. Needs Zotero Sync turned on (so the library exists on zotero.org) and a write-enabled key. PDFs are still read locally from disk (see ZOTERO_STORAGE_DIR), so you do not need Zotero file sync.
- Local API (ZOTERO_LOCAL=true): the running Zotero 7 desktop app. No API key; reads PDFs straight off disk. Good for read-only previews (--dry-run), but cannot write notes (the local API rejects writes).
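Reading a PDF from disk can be sketched as a simple directory lookup. The example assumes the usual Zotero layout, where each attachment lives in a subfolder of the storage directory named after its 8-character attachment key. The `find_local_pdf` helper is hypothetical and is not taken from zotero_client.py.

```python
from pathlib import Path
from typing import Optional

# Default location mentioned in this README; override via ZOTERO_STORAGE_DIR.
DEFAULT_STORAGE_DIR = Path.home() / "Zotero" / "storage"


def find_local_pdf(storage_dir: Path, attachment_key: str) -> Optional[Path]:
    """Return the first PDF in <storage_dir>/<attachment_key>/, or None."""
    folder = storage_dir / attachment_key
    if not folder.is_dir():
        return None
    # Sort for a deterministic pick if the folder holds several PDFs.
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    return pdfs[0] if pdfs else None
```

A lookup like this only needs the attachment key from the item metadata, which is why file sync is not required in Web API mode.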
Prefer an MCP server? The summarization core (summarizer.py + graph.py) is independent of how items are fetched, so you can swap zotero_client.py for a Zotero MCP client. pyzotero is the default because it needs no extra service and reads local PDFs directly.
Install
Requires Python 3.10+ (the LangChain stack no longer supports 3.9).
From PyPI (current version 0.0.1):
pip install zotery
# or, with uv:
uv tool install zotery # installs the `zotero-summarizer` command globally
This puts the zotero-summarizer command on your PATH. Then create a config file
from the template and edit it (see below):
curl -O https://raw.githubusercontent.com/mkassaf/zotero-summarizer/main/.env.example
mv .env.example .env
# edit .env, or export the variables in your shell instead
.env is optional: every setting can also come from real environment variables or CLI flags. See the configuration sections below.
Install from source (for development)
git clone https://github.com/mkassaf/zotero-summarizer.git
cd zotero-summarizer
python3 -m venv .venv
source .venv/bin/activate
pip install -e . # or: uv sync
cp .env.example .env
# then edit .env (see below)
Configure Zotero (.env)
To write notes you need the Web API:
- Turn on sync: Zotero → Settings → Sync → log in. This puts your library metadata on zotero.org so the API can see it. (File sync is optional — PDFs are read locally.)
- Create a write-enabled key: https://www.zotero.org/settings/keys/new — check "Allow library access" and "Allow write access".
ZOTERO_LOCAL=false
ZOTERO_LIBRARY_TYPE=user
ZOTERO_LIBRARY_ID=your-username # username OR numeric userID both work
ZOTERO_API_KEY=your-write-key
# Optional: where PDFs live on disk. Auto-detected to ~/Zotero/storage if unset.
# ZOTERO_STORAGE_DIR=/Users/you/Zotero/storage
ZOTERO_LIBRARY_ID accepts your username — it's resolved to the numeric id
the Web API requires, using your API key. The numeric id works too.
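The settings above can be read and sanity-checked with a few lines of standard-library code. This is a minimal sketch, not the package's config.py: the `ZoteroSettings` class, the helper names, and the fallback defaults are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ZoteroSettings:
    """Hypothetical container for the ZOTERO_* variables shown above."""
    local: bool
    library_type: str
    library_id: Optional[str]
    api_key: Optional[str]


def load_zotero_settings(env: Mapping[str, str] = os.environ) -> ZoteroSettings:
    """Read the ZOTERO_* variables from an environment-like mapping."""
    # Assumed defaults: Web API mode and a user library.
    local = env.get("ZOTERO_LOCAL", "false").strip().lower() in {"1", "true", "yes"}
    return ZoteroSettings(
        local=local,
        library_type=env.get("ZOTERO_LIBRARY_TYPE", "user"),
        library_id=env.get("ZOTERO_LIBRARY_ID") or None,
        api_key=env.get("ZOTERO_API_KEY") or None,
    )


def can_write_notes(settings: ZoteroSettings) -> bool:
    """Writing needs the Web API plus a library id and an API key."""
    return (not settings.local) and bool(settings.library_id) and bool(settings.api_key)
```

Checking `can_write_notes` before a run gives an early, clear failure instead of a rejected write after the summaries have already been generated.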
Configure the LLM
Pick one provider:
| Provider | Settings | Standard key env var | Notes |
|---|---|---|---|
| DeepSeek (default) | LLM_PROVIDER=deepseek, LLM_MODEL=deepseek-chat | DEEPSEEK_API_KEY | Key from https://platform.deepseek.com. |
| Google Gemini | LLM_PROVIDER=google, LLM_MODEL=gemini-2.5-flash | GOOGLE_API_KEY | Fast, recommended for big runs. |
| OpenAI-compatible | LLM_PROVIDER=openai, LLM_MODEL=gpt-4o-mini, LLM_BASE_URL=... | OPENAI_API_KEY | OpenAI, Together, vLLM, etc. |
| Ollama (local, free) | LLM_PROVIDER=ollama, LLM_MODEL=qwen3:8b | (none) | Needs Ollama running + ollama pull qwen3:8b. Native JSON-schema output. Slower per paper. |
Where the API key comes from
The LLM key is resolved in this order — first match wins:
- CLI flag: --llm-api-key sk-... (highest precedence; never written to disk).
- Generic override: LLM_API_KEY (works for any provider).
- Provider's standard env var: DEEPSEEK_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY (see the table). Use these if you already export your keys globally in your shell; nothing extra to configure here.
The Zotero key works the same way: --zotero-api-key overrides ZOTERO_API_KEY.
# Example: provider + key entirely from the command line, no .env needed
zotero-summarizer "Literature Review" \
--llm-api-key "$MY_KEY" --zotero-api-key "$ZKEY"
# Example: rely on a globally-exported key (e.g. in ~/.zshrc)
export OPENAI_API_KEY=sk-...
LLM_PROVIDER=openai LLM_MODEL=gpt-4o-mini zotero-summarizer "Literature Review"
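The resolution order described above (CLI flag, then LLM_API_KEY, then the provider's standard variable) can be expressed as a small function. This is a sketch under assumptions: `resolve_llm_api_key` and `PROVIDER_KEY_VARS` are hypothetical names, and the real logic in the package may be organized differently.

```python
import os
from typing import Mapping, Optional

# Standard key variables per provider, as listed in the provider table.
PROVIDER_KEY_VARS = {
    "deepseek": "DEEPSEEK_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def resolve_llm_api_key(
    provider: str,
    cli_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first key found: CLI flag, LLM_API_KEY, provider variable."""
    if cli_key:
        return cli_key
    generic = env.get("LLM_API_KEY")
    if generic:
        return generic
    var = PROVIDER_KEY_VARS.get(provider)
    if var is None:
        # Providers such as ollama have no standard key variable.
        return None
    return env.get(var) or None
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.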
Ollama tip: the default base URL is http://127.0.0.1:11434. Use 127.0.0.1, not localhost: localhost can resolve to IPv6/Docker and miss your models.
Usage
After pip install zotery, use the zotero-summarizer command (or, from a
source checkout, python -m zotero_summarizer):
# Summarize every paper in a collection (by name or 8-char key)
zotero-summarizer "Literature Review"
# Preview first: generate + print summaries, write nothing
zotero-summarizer "Literature Review" --dry-run --limit 3
# Re-summarize papers that already have an AI note
zotero-summarizer ABCD1234 --force
Override the provider per-run without editing .env:
LLM_PROVIDER=google LLM_MODEL=gemini-2.5-flash zotero-summarizer "Literature Review"
Flags:
| flag | meaning |
|---|---|
| --limit N | only process the first N papers |
| --dry-run | generate and print summaries, but don't write notes to Zotero |
| --force | re-summarize even if an AI summary note already exists |
| --llm-api-key KEY | LLM API key; overrides LLM_API_KEY and the provider env var |
| --zotero-api-key KEY | Zotero Web API key; overrides ZOTERO_API_KEY |
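For readers who want to script against the same interface, the flags in the table could be declared with argparse roughly as follows. Whether cli.py actually uses argparse is an assumption; `build_parser` and the help strings are written for this example only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; a sketch, not the package's cli.py."""
    parser = argparse.ArgumentParser(prog="zotero-summarizer")
    parser.add_argument("collection", help="collection name or 8-char key")
    parser.add_argument("--limit", type=int, default=None, metavar="N",
                        help="only process the first N papers")
    parser.add_argument("--dry-run", action="store_true",
                        help="print summaries without writing notes")
    parser.add_argument("--force", action="store_true",
                        help="re-summarize papers that already have an AI note")
    parser.add_argument("--llm-api-key", default=None, metavar="KEY",
                        help="overrides LLM_API_KEY and the provider env var")
    parser.add_argument("--zotero-api-key", default=None, metavar="KEY",
                        help="overrides ZOTERO_API_KEY")
    return parser
```

argparse maps `--dry-run` to the attribute `dry_run`, so the parsed values read as `args.dry_run`, `args.llm_api_key`, and so on.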
Re-runs are idempotent: papers that already have an AI summary note are
skipped unless you pass --force.
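One way to implement this skip rule is to look for a marker on the paper's existing child notes. The sketch below assumes that AI notes carry a tag and that notes arrive in the Zotero Web API item shape (a `data` dict with a `tags` list). The tag name `ai-summary` and both helpers are hypothetical, since this README does not say how the package recognizes its own notes.

```python
from typing import Iterable

# Hypothetical marker tag; the real detection method is not documented here.
AI_NOTE_MARKER = "ai-summary"


def has_ai_summary(child_notes: Iterable[dict]) -> bool:
    """True if any child note carries the marker tag."""
    for note in child_notes:
        tags = {t.get("tag") for t in note.get("data", {}).get("tags", [])}
        if AI_NOTE_MARKER in tags:
            return True
    return False


def should_summarize(child_notes: Iterable[dict], force: bool = False) -> bool:
    """Skip papers that already have an AI note unless force is set."""
    return force or not has_ai_summary(child_notes)
```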
How it works
| file | responsibility |
|---|---|
| config.py | load .env; build the LLM (DeepSeek / Google / Ollama / OpenAI) |
| zotero_client.py | list collection papers, find/download PDFs, write notes |
| pdf_utils.py | extract text from PDF bytes |
| summarizer.py | prompt + structured (PaperSummary) output + note HTML |
| graph.py | the LangGraph pipeline |
| cli.py | argument parsing and the run report |
Notes & limits
- Writing requires the Web API. The local API is read-only; use it only for reading/--dry-run.
- Scanned/image-only PDFs yield no text and are skipped (no OCR).
- Long PDFs are truncated to MAX_PDF_CHARS (default 48k chars) to stay within the model's context window.
- PDFs are fetched via the API, falling back to ZOTERO_STORAGE_DIR (the local storage/ folder, auto-detected at ~/Zotero/storage). This means Web API mode works without Zotero file sync.
- Never commit your .env: it holds your API keys (it's already in .gitignore).
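The skip and truncation rules for extracted text amount to two small checks. This sketch assumes "48k" means 48,000 characters and that truncation keeps the start of the document. `has_extractable_text` and `truncate_text` are illustrative names, not functions from pdf_utils.py.

```python
# Assumed reading of the "48k chars" default; the exact value may differ.
MAX_PDF_CHARS = 48_000


def has_extractable_text(text: str) -> bool:
    """Scanned or image-only PDFs typically yield empty or whitespace-only text."""
    return bool(text.strip())


def truncate_text(text: str, max_chars: int = MAX_PDF_CHARS) -> str:
    """Keep at most max_chars characters from the start of the text."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars]
```

Truncating by characters is only a proxy for the model's token limit, so the safe value depends on the model and on the language of the paper.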
Download files
File details
Details for the file zotery-0.0.1.tar.gz.
File metadata
- Download URL: zotery-0.0.1.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9a13a26d1cce01423ed80935ab46b7561ec0c1c68589f05e54f8f843bf5dcfd |
| MD5 | 4720ed615cd1c76e817af897fe684612 |
| BLAKE2b-256 | e9678c458010b42417a4816d896b7991cbe5da27a7dbd71c0ca601c8bb0fd6ec |
Provenance
The following attestation bundles were made for zotery-0.0.1.tar.gz:
Publisher: publish.yml on mkassaf/zotero-summarizer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zotery-0.0.1.tar.gz
- Subject digest: b9a13a26d1cce01423ed80935ab46b7561ec0c1c68589f05e54f8f843bf5dcfd
- Sigstore transparency entry: 1848784546
- Sigstore integration time:
- Permalink: mkassaf/zotero-summarizer@199ca408d24c9b3edc737620bec786b69e58c161
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/mkassaf
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@199ca408d24c9b3edc737620bec786b69e58c161
- Trigger Event: release
File details
Details for the file zotery-0.0.1-py3-none-any.whl.
File metadata
- Download URL: zotery-0.0.1-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 832c6b175cc559bffee957f88421b698b37c5cc5b40b3d5e107d4663a8912308 |
| MD5 | fe0b3f697336aceb9e8954b2cc74edcc |
| BLAKE2b-256 | 3be82a7b2d7c8208f254b929359da559a39d74d510c33b870e4dbd2fe9fff280 |
Provenance
The following attestation bundles were made for zotery-0.0.1-py3-none-any.whl:
Publisher: publish.yml on mkassaf/zotero-summarizer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zotery-0.0.1-py3-none-any.whl
- Subject digest: 832c6b175cc559bffee957f88421b698b37c5cc5b40b3d5e107d4663a8912308
- Sigstore transparency entry: 1848785020
- Sigstore integration time:
- Permalink: mkassaf/zotero-summarizer@199ca408d24c9b3edc737620bec786b69e58c161
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/mkassaf
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@199ca408d24c9b3edc737620bec786b69e58c161
- Trigger Event: release