Scan a Zotero collection, summarize each paper's PDF, and write the summary back as a note (LangGraph + DeepSeek).

zotery

Scan a Zotero collection, read each paper's attached PDF, run an LLM (DeepSeek, Google Gemini, an OpenAI-compatible endpoint, or a local Ollama model) over it, and write the result back into Zotero as a child note on the paper.

Install it as zotery, run it as zotery. (The Python module is zotero_summarizer.)

Two modes:

Summary (default) — a structured 4-part summary of each paper: Motivation & Main Problem · Key Findings · Methodology · Future Work.
Research question (--rq "...") — instead of a generic summary, the model reads each paper against your question and writes back: a relevance judgement (high/medium/low/none), a grounded answer, its reasoning, findings, and verbatim supporting snippets quoted from the paper. Great for screening a large collection during a literature review.

The pipeline is orchestrated with LangGraph:

START → load_items → process_paper → summarize → write_note → END
                          ↑__________________________|   (loops per paper)

load_items scans the collection · process_paper finds + downloads + extracts the PDF · summarize calls the LLM for a structured PaperSummary (or an RQAnswer in --rq mode) · write_note renders it to HTML and pushes it to Zotero.
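The control flow in the diagram can be sketched without LangGraph as a plain loop over a shared state dict. This is only an illustration of the routing: the node bodies are stubs, and the state keys (`queue`, `current`, `text`, `summary`, `written`) and the `run` helper are hypothetical names, not zotery's actual state or API.

```python
from typing import Any, Dict, List

State = Dict[str, Any]

def load_items(state: State) -> State:
    # Stub: the real node scans the Zotero collection.
    state["queue"] = list(state["papers"])
    state["written"] = []
    return state

def process_paper(state: State) -> State:
    # Stub: the real node finds, downloads, and extracts the PDF.
    paper = state["queue"].pop(0)
    state["current"] = paper
    state["text"] = f"full text of {paper}"
    return state

def summarize(state: State) -> State:
    # Stub: the real node calls the LLM for structured output.
    state["summary"] = f"summary of {state['current']}"
    return state

def write_note(state: State) -> State:
    # Stub: the real node renders HTML and pushes it to Zotero.
    state["written"].append((state["current"], state["summary"]))
    return state

def route_after_write(state: State) -> str:
    # Loop back while papers remain, otherwise finish.
    return "process_paper" if state["queue"] else "END"

def run(papers: List[str]) -> State:
    state = load_items({"papers": papers})
    node = "process_paper" if state["queue"] else "END"
    while node != "END":
        for step in (process_paper, summarize, write_note):
            state = step(state)
        node = route_after_write(state)
    return state
```

In a graph framework the `route_after_write` function would typically be attached as a conditional edge after the last node, which is what produces the per-paper loop.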

How it connects to Zotero

It uses pyzotero as the connector, which speaks to both Zotero APIs:

Web API (ZOTERO_LOCAL=false) — the Zotero cloud library, via an API key. Required to write notes back, because Zotero's local API is read-only. Needs Zotero Sync turned on (so the library exists on zotero.org) and a write-enabled key. PDFs are still read locally from disk (see ZOTERO_STORAGE_DIR), so you do not need Zotero file sync.
Local API (ZOTERO_LOCAL=true) — the running Zotero 7 desktop app. No API key, reads PDFs straight off disk. Good for read-only previews (--dry-run), but cannot write notes (the local API rejects writes).
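Reading a PDF off disk amounts to looking inside the attachment's folder under the storage directory. A minimal sketch, assuming Zotero's usual layout of `storage/<attachment key>/<file>.pdf`; `find_local_pdf` is a hypothetical helper for illustration, not a function from zotery or pyzotero.

```python
from pathlib import Path
from typing import Optional

def find_local_pdf(storage_dir: Path, attachment_key: str) -> Optional[Path]:
    """Return the first PDF stored for an attachment, or None if absent."""
    folder = storage_dir / attachment_key
    if not folder.is_dir():
        return None
    # Sort for a deterministic choice if several PDFs share the folder.
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    return pdfs[0] if pdfs else None
```

For example, `find_local_pdf(Path.home() / "Zotero" / "storage", "ABCD1234")` would look in the default storage location for the attachment with that (made-up) key.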

Prefer an MCP server? The summarization core (summarizer.py + graph.py) is independent of how items are fetched, so you can swap zotero_client.py for a Zotero MCP client. pyzotero is the default because it needs no extra service and reads local PDFs directly.

Install

Requires Python 3.10+ (the LangChain stack no longer supports 3.9).

From PyPI (current version 0.1.1):

pip install zotery
# or, with uv:
uv tool install zotery      # installs the `zotery` command globally

This puts the zotery command on your PATH. Then create a config file from the template and edit it (see below):

curl -O https://raw.githubusercontent.com/mkassaf/zotero-summarizer/main/.env.example
mv .env.example .env
# edit .env, or export the variables in your shell instead

.env is optional — every setting can also come from real environment variables or CLI flags. See Configuration below.

Install from source (for development)

git clone https://github.com/mkassaf/zotero-summarizer.git
cd zotero-summarizer

python3 -m venv .venv
source .venv/bin/activate
pip install -e .            # or: uv sync

cp .env.example .env
# then edit .env  (see below)

Configure Zotero (`.env`)

To write notes you need the Web API:

1. Turn on sync: Zotero → Settings → Sync → log in. This puts your library metadata on zotero.org so the API can see it. (File sync is optional; PDFs are read locally.)
2. Create a write-enabled key at https://www.zotero.org/settings/keys/new and check "Allow library access" and "Allow write access".

ZOTERO_LOCAL=false
ZOTERO_LIBRARY_TYPE=user
ZOTERO_LIBRARY_ID=your-username      # username OR numeric userID both work
ZOTERO_API_KEY=your-write-key

# Optional: where PDFs live on disk. Auto-detected to ~/Zotero/storage if unset.
# ZOTERO_STORAGE_DIR=/Users/you/Zotero/storage

ZOTERO_LIBRARY_ID accepts your username — it's resolved to the numeric id the Web API requires, using your API key. The numeric id works too.
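One way such a username lookup could work is to ask the Web API which user the key belongs to. The sketch below is an assumption about the mechanism, not zotery's confirmed implementation: it assumes the key-info endpoint `https://api.zotero.org/keys/current` returns JSON with `userID` and `username` fields, and `resolve_library_id` and `fetch_key_info` are hypothetical names. The fetch function is injectable so the logic can be tried without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

KEY_INFO_URL = "https://api.zotero.org/keys/current"  # assumed endpoint

def fetch_key_info(api_key: str) -> Dict[str, Any]:
    """Ask the Web API which user an API key belongs to."""
    request = urllib.request.Request(
        KEY_INFO_URL, headers={"Zotero-API-Key": api_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

def resolve_library_id(
    library_id: str,
    api_key: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_key_info,
) -> str:
    """Return a numeric user id for either a numeric id or a username."""
    if library_id.isdigit():
        return library_id  # already numeric, no lookup needed
    info = fetch(api_key)
    if str(info.get("username", "")).lower() != library_id.lower():
        raise ValueError(f"API key does not belong to user {library_id!r}")
    return str(info["userID"])
```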

Configure the LLM

Pick one provider:

Provider	Settings	Standard key env var	Notes
DeepSeek (default)	`LLM_PROVIDER=deepseek` `LLM_MODEL=deepseek-chat`	`DEEPSEEK_API_KEY`	Key from https://platform.deepseek.com.
Google Gemini	`LLM_PROVIDER=google` `LLM_MODEL=gemini-2.5-flash`	`GOOGLE_API_KEY`	Fast, recommended for big runs.
OpenAI-compatible	`LLM_PROVIDER=openai` `LLM_MODEL=gpt-4o-mini` `LLM_BASE_URL=...`	`OPENAI_API_KEY`	OpenAI, Together, vLLM, etc.
Ollama (local, free)	`LLM_PROVIDER=ollama` `LLM_MODEL=qwen3:8b`	(none)	Needs Ollama running + `ollama pull qwen3:8b`. Native JSON-schema output. Slower per paper.

Where the API key comes from

The LLM key is resolved in this order; the first match wins:

1. CLI flag: --llm-api-key sk-... (highest precedence; never written to disk).
2. Generic override: LLM_API_KEY (works for any provider).
3. Provider's standard env var: DEEPSEEK_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY (see the table). Use these if you already export your keys globally in your shell; nothing extra to configure here.

The Zotero key works the same way: --zotero-api-key overrides ZOTERO_API_KEY.

# Example: both API keys passed on the command line, so no key has to live in .env
zotery "Literature Review" \
  --llm-api-key "$MY_KEY" --zotero-api-key "$ZKEY"

# Example: rely on a globally-exported key (e.g. in ~/.zshrc)
export OPENAI_API_KEY=sk-...
LLM_PROVIDER=openai LLM_MODEL=gpt-4o-mini zotery "Literature Review"
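The lookup order for the LLM key can be sketched as a first-match search over three sources. This is a minimal illustration, not zotery's code: `resolve_llm_key` and `PROVIDER_KEY_VARS` are hypothetical names, and only the variable names themselves come from the provider table.

```python
import os
from typing import Mapping, Optional

# Provider-specific variables from the provider table; Ollama needs no key.
PROVIDER_KEY_VARS = {
    "deepseek": "DEEPSEEK_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
}

def resolve_llm_key(
    provider: str,
    cli_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty key: CLI flag, LLM_API_KEY, provider var."""
    candidates = [cli_key, env.get("LLM_API_KEY")]
    provider_var = PROVIDER_KEY_VARS.get(provider)
    if provider_var is not None:
        candidates.append(env.get(provider_var))
    for candidate in candidates:
        if candidate:  # skip None and empty strings
            return candidate
    return None
```

Passing the environment as a parameter keeps the function easy to test; for instance, `resolve_llm_key("openai", env={"OPENAI_API_KEY": "sk-x"})` falls through to the provider variable because the two higher-precedence sources are empty.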

Ollama tip: the default base URL is http://127.0.0.1:11434. Use 127.0.0.1, not localhost — localhost can resolve to IPv6/Docker and miss your models.

Usage

After pip install zotery, use the zotery command (or, from a source checkout, python -m zotero_summarizer):

# Summarize every paper in a collection (by name or 8-char key)
zotery "Literature Review"

# Preview first: generate + print summaries, write nothing
zotery "Literature Review" --dry-run --limit 3

# Re-summarize papers that already have an AI note
zotery ABCD1234 --force

Override the provider per-run without editing .env:

LLM_PROVIDER=google LLM_MODEL=gemini-2.5-flash zotery "Literature Review"

Flags:

flag	meaning
`--rq "QUESTION"`	answer a research question per paper instead of summarizing
`--note-title "..."`	custom heading/title for the written note (see below)
`--limit N`	only process the first N papers
`--dry-run`	generate and print results, but don't write notes to Zotero
`--force`	redo papers that already have a matching note
`--llm-api-key KEY`	LLM API key; overrides `LLM_API_KEY` and the provider env var
`--zotero-api-key KEY`	Zotero Web API key; overrides `ZOTERO_API_KEY`

Custom note title. Zotero shows a note's first heading as its title. By default that's 🤖 AI Summary (or 🤖 Research-Question Analysis with --rq); override it with --note-title to make notes easy to spot in a project:

zotery "SW agentic arch" --note-title "📌 LitReview 2026 — multi-agent" \
  --rq "What architectural patterns are proposed?"

The title is purely cosmetic — re-run detection keys off a hidden marker in the note footer, not the title, so changing it never breaks the idempotency check.

Re-runs are idempotent: papers that already have a matching note are skipped unless you pass --force (summary notes and per-question RQ notes are tracked separately, so a summary and several different --rq runs can coexist).
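The skip logic described above can be sketched as a marker check against a paper's existing notes. The marker format here is made up for illustration, since the text only says that a hidden marker lives in the note footer; `note_marker` and `should_skip` are hypothetical names.

```python
import hashlib
from typing import Iterable, Optional

def note_marker(question: Optional[str] = None) -> str:
    """Build a marker identifying the note kind: the summary or one question."""
    if question is None:
        return "[zotery:summary]"
    # Hash the exact question so each distinct question gets its own marker.
    digest = hashlib.sha256(question.encode("utf-8")).hexdigest()[:12]
    return f"[zotery:rq:{digest}]"

def should_skip(
    existing_notes: Iterable[str],
    question: Optional[str] = None,
    force: bool = False,
) -> bool:
    """Skip a paper when a note with the matching marker already exists."""
    if force:
        return False
    marker = note_marker(question)
    return any(marker in note for note in existing_notes)
```

Because the check looks only at the marker, the visible heading can change freely between runs without causing a paper to be processed twice.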

Answer a research question (`--rq`)

Screen a collection against a specific question. For every paper, zotery writes a note with a relevance rating, a grounded answer, the model's reasoning, findings, and verbatim snippets quoted from the paper:

# Always preview first — see the answers without touching your library
zotery "SW agentic arch" --dry-run --limit 5 \
  --rq "What architectural patterns are proposed for multi-agent LLM systems?"

# Looks good? Run it for real (writes one RQ note per paper)
zotery "SW agentic arch" \
  --rq "What architectural patterns are proposed for multi-agent LLM systems?"

# Locally & free with Ollama
LLM_PROVIDER=ollama LLM_MODEL=qwen3:8b zotery "SW agentic arch" \
  --rq "How is agent reliability evaluated?"

Each note is tagged with its exact question, so you can ask several different questions over the same collection and each produces its own note. Papers that don't address the question come back with relevance none and a one-line note saying so — handy for quickly excluding irrelevant papers.
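The shape of a per-paper answer can be sketched as a small record with the fields listed above. This is a stand-in using dataclasses, not the package's actual `RQAnswer` model; `RQAnswerSketch` and `verbatim_snippets` are hypothetical. The snippet filter is an extra check you might run yourself when screening, and is not something the source says zotery does.

```python
from dataclasses import dataclass, field
from typing import List

RELEVANCE_LEVELS = ("high", "medium", "low", "none")

@dataclass
class RQAnswerSketch:
    relevance: str
    answer: str
    reasoning: str
    findings: List[str] = field(default_factory=list)
    snippets: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject ratings outside the four documented levels.
        if self.relevance not in RELEVANCE_LEVELS:
            raise ValueError(f"unknown relevance: {self.relevance!r}")

def _squash(text: str) -> str:
    # Collapse whitespace runs, since PDF extraction breaks lines arbitrarily.
    return " ".join(text.split())

def verbatim_snippets(result: RQAnswerSketch, paper_text: str) -> List[str]:
    """Keep only snippets that occur in the paper, ignoring whitespace runs."""
    haystack = _squash(paper_text)
    return [s for s in result.snippets if _squash(s) and _squash(s) in haystack]
```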

Tip: papers without an extractable PDF (scanned/image-only, or no attachment) are skipped in both modes. Use --dry-run --limit N to sanity-check output quality before a large run.

Run locally & free with Ollama

Ollama runs an LLM on your own machine — no API key, no per-token cost, nothing leaves your computer. Good for private libraries or large runs you don't want to pay for. It's slower per paper and quality depends on the model you pick.

1. Install Ollama and pull a model (a ~5 GB instruct model is a good start):

# Install: https://ollama.com/download  (or `brew install ollama` on macOS)
ollama serve            # start the server (skip if the desktop app is running)
ollama pull qwen3:8b    # download the model

2. Point zotery at Ollama and run:

# One-off, all on the command line (no .env edits, no key needed):
LLM_PROVIDER=ollama LLM_MODEL=qwen3:8b zotery "Literature Review"

# Safe first run: print summaries without writing notes to Zotero
LLM_PROVIDER=ollama LLM_MODEL=qwen3:8b zotery "Literature Review" --dry-run --limit 3

Or set it once in .env and just run zotery "Literature Review":

LLM_PROVIDER=ollama
LLM_MODEL=qwen3:8b
# LLM_BASE_URL=http://127.0.0.1:11434   # optional; this is the default

Notes:

No --llm-api-key / LLM_API_KEY needed — Ollama is keyless. (You still need a Zotero Web API key to write notes; use --dry-run to skip that.)
Use any model you've pulled — e.g. LLM_MODEL=llama3.1:8b, LLM_MODEL=mistral. Bigger models give better summaries but run slower.
If Ollama runs on another host/port, set LLM_BASE_URL (e.g. http://192.168.1.10:11434). Use 127.0.0.1, not localhost — localhost can resolve to IPv6/Docker and miss your local models.

How it works

file	responsibility
`config.py`	load `.env`; build the LLM (DeepSeek / Google / Ollama / OpenAI)
`zotero_client.py`	list collection papers, find/download PDFs, write notes
`pdf_utils.py`	extract text from PDF bytes
`summarizer.py`	prompts + structured output (`PaperSummary` / `RQAnswer`) + note HTML
`graph.py`	the LangGraph pipeline
`cli.py`	argument parsing and the run report

Notes & limits

Writing requires the Web API. The local API is read-only; use it only for reading/--dry-run.
Scanned/image-only PDFs yield no text and are skipped (no OCR).
Long PDFs are truncated to MAX_PDF_CHARS (default 48k chars) to stay within the model's context window.
PDFs are fetched via the API, falling back to ZOTERO_STORAGE_DIR (the local storage/ folder, auto-detected at ~/Zotero/storage). This means Web API mode works without Zotero file sync.
Never commit your .env — it holds your API keys (it's already in .gitignore).
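The truncation limit mentioned above can be sketched as a simple character cut. This assumes that MAX_PDF_CHARS is read from the environment and that "48k" means 48,000 characters; both are guesses from the wording, and `truncate_text` is a hypothetical helper.

```python
import os
from typing import Mapping

DEFAULT_MAX_PDF_CHARS = 48_000  # assumption: "48k" read as 48,000 characters

def truncate_text(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Cut extracted PDF text to the configured character limit."""
    limit = int(env.get("MAX_PDF_CHARS", DEFAULT_MAX_PDF_CHARS))
    return text[:limit]
```

A plain prefix cut keeps the opening sections of a paper and drops whatever falls past the limit, so a very long paper loses its tail rather than its start.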

Project details

Maintainers

massaf

Release history

Version	Released
0.1.1 (this version)	Jun 17, 2026
0.1.0	Jun 17, 2026
0.0.3	Jun 17, 2026
0.0.2	Jun 17, 2026
0.0.1	Jun 17, 2026

Download files

Source Distribution

zotery-0.1.1.tar.gz (23.1 kB), uploaded Jun 17, 2026, Source

Built Distribution

zotery-0.1.1-py3-none-any.whl (21.0 kB), uploaded Jun 17, 2026, Python 3

File details

Details for the file zotery-0.1.1.tar.gz.

File metadata

Download URL: zotery-0.1.1.tar.gz
Upload date: Jun 17, 2026
Size: 23.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zotery-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`886a4957a76319a7db47dac0d1ea388a58c299d6f496bec5a6f471c12947ce4c`
MD5	`b40d00409a0c0010f82596057a91384d`
BLAKE2b-256	`da61467f6a2717426dd34969bb3920afa10dbd87a5d2a3b3cbaf56909773067b`

Provenance

The following attestation bundles were made for zotery-0.1.1.tar.gz:

Publisher: publish.yml on mkassaf/zotero-summarizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zotery-0.1.1.tar.gz
- Subject digest: 886a4957a76319a7db47dac0d1ea388a58c299d6f496bec5a6f471c12947ce4c
- Sigstore transparency entry: 1853405852
- Sigstore integration time: Jun 17, 2026
Source repository:
- Permalink: mkassaf/zotero-summarizer@acb091b01bb8a035ab2320fe3d9be5951fcf6a10
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/mkassaf
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@acb091b01bb8a035ab2320fe3d9be5951fcf6a10
- Trigger Event: release

File details

Details for the file zotery-0.1.1-py3-none-any.whl.

File metadata

Download URL: zotery-0.1.1-py3-none-any.whl
Upload date: Jun 17, 2026
Size: 21.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zotery-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4bcfd80b82cd1fd508ea74bd96a4057b1a04fbb368467dc7edbda38b8485bbbe`
MD5	`15bdb6405a0ead67dff52bd8b6e90f06`
BLAKE2b-256	`47338354dc948a8ae71dc79a0859bda8d261f382d676ffa3609f0e1a2daa089e`

Provenance

The following attestation bundles were made for zotery-0.1.1-py3-none-any.whl:

Publisher: publish.yml on mkassaf/zotero-summarizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zotery-0.1.1-py3-none-any.whl
- Subject digest: 4bcfd80b82cd1fd508ea74bd96a4057b1a04fbb368467dc7edbda38b8485bbbe
- Sigstore transparency entry: 1853405883
- Sigstore integration time: Jun 17, 2026
Source repository:
- Permalink: mkassaf/zotero-summarizer@acb091b01bb8a035ab2320fe3d9be5951fcf6a10
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/mkassaf
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@acb091b01bb8a035ab2320fe3d9be5951fcf6a10
- Trigger Event: release
