Extract and format bibliographic data from Zotero databases

zotlib

Tools for extracting and formatting bibliographic data from Zotero 8 databases.

Reads directly from Zotero's local SQLite database, so no API key is needed. Export collections as CSV or APA-formatted references, generate PDF cover images with thumbnails, and extract annotated PDFs with baked-in highlights and markdown notes. Includes a CLI for common workflows and a Python API for custom pipelines. Built for Zotero 8; older versions are untested and likely incompatible due to schema differences.
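
To illustrate what reading the local database directly can look like, here is a minimal sketch using only the standard library. It is not zotlib's own code. The helper names open_read_only and list_collection_names are hypothetical, and the query assumes a collections table with a collectionName column.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file read-only so the library data cannot be changed."""
    # A file: URI with mode=ro makes SQLite reject any write on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_collection_names(conn):
    """Return collection names (assumes a collections.collectionName column)."""
    rows = conn.execute(
        "SELECT collectionName FROM collections ORDER BY collectionName"
    )
    return [name for (name,) in rows]
```

Opening the file read-only makes accidental writes fail with an error rather than silently changing the library.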

Project Structure

zotlib/
├── zotlib/                      # Python library
│   ├── cli.py                   # CLI commands
│   ├── config.py                # Database path discovery
│   ├── database.py              # SQLite interface
│   ├── extractors.py            # Data extraction functions
│   ├── exporters.py             # Collection export (annotations + PDFs)
│   ├── backup.py                # Zotero directory backup
│   ├── tables.py                # Zotero database table definitions
│   ├── covers.py                # PDF cover generation
│   ├── paths.py                 # Path resolution and filename utilities
│   └── formatters/apa.py        # APA citation formatter
├── scripts/                     # Utility scripts
│   ├── extract-annotations.js   # Annotation extractor (interactive + headless)
│   ├── create-parent-item.js    # Create parents for standalone PDFs
│   └── run-extract.sh           # Shell wrapper for headless extraction
├── tests/                       # Test suite
└── pyproject.toml               # Project configuration

Installation

uv add zotlib

From source:

git clone https://github.com/gitronald/zotlib.git
cd zotlib
uv sync

From a specific branch:

uv add git+https://github.com/gitronald/zotlib.git@dev

Configuration

Run zotlib init to auto-discover Zotero paths and save them to zotlib.toml:

zotlib init
database: /mnt/c/Users/rer/Zotero/zotero.sqlite
pdfs_dir: /mnt/i/My Drive/zotero-pdfs

Saved to zotlib.toml

The config file stores the database and linked PDFs directory:

[zotlib]
database = "/path/to/zotero.sqlite"
pdfs_dir = "/path/to/linked-pdfs"

Paths are resolved in the following order, highest priority first (for both the database and the PDFs directory):

  1. CLI flag: --database, --pdfs-dir
  2. Environment variable: ZOTERO_DATABASE
  3. Config file: zotlib.toml
  4. Auto-discovery: checks common locations (Linux, WSL, macOS)

CLI Commands

Explore

Browse collections and inspect database schema. The show-tables command documents Zotero's largely undocumented SQLite table structure, including column descriptions and types.

# List available collections
zotlib show-collections

# Show database tables
zotlib show-tables
zotlib show-tables items
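
For readers who want to inspect the schema from Python, the sketch below lists tables and their columns using SQLite's own metadata (sqlite_master and PRAGMA table_info). It shows only names and declared types, not the column descriptions that show-tables provides, and the function name describe_tables is hypothetical.

```python
import sqlite3


def describe_tables(conn, table=None):
    """Map each table name to a list of (column name, declared type) pairs."""
    if table is None:
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
    else:
        names = [table]

    schema = {}
    for name in names:
        # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[name] = [(col[1], col[2]) for col in columns]
    return schema
```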

Export

Export collection data in multiple formats. Supports linked attachments via --pdfs-dir for PDFs stored outside Zotero's default storage.

# Export all tables as CSV
zotlib export-csv

# Export a collection as CSV
zotlib export-csv -c publications

# Format a collection as APA references
zotlib export-apa -c publications

# Generate cover images and thumbnails
zotlib export-covers -c publications

# Export annotated PDFs and markdown notes
zotlib export-annotations -c mycollection
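
As a rough illustration of what APA formatting involves, the sketch below formats a single journal article as plain text. It is a simplified stand-in, not the formatter in formatters/apa.py. All function names are hypothetical, and it ignores italics, hyphenated given names, and APA's special rule for very long author lists.

```python
def initials(given_names):
    """Turn "Jane Q." into "J. Q." (hyphenated given names are not handled)."""
    return " ".join(f"{part[0]}." for part in given_names.split())


def format_authors(authors):
    """Join (family, given) pairs APA-style, with "&" before the last author."""
    names = [f"{family}, {initials(given)}" for family, given in authors]
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + ", & " + names[-1]


def format_article(authors, year, title, journal, volume, issue, pages):
    """Format one journal article as a plain-text APA-style reference."""
    return (
        f"{format_authors(authors)} ({year}). {title}. "
        f"{journal}, {volume}({issue}), {pages}."
    )


# Made-up example data, for illustration only.
reference = format_article(
    [("Doe", "Jane Q."), ("Roe", "Richard")],
    2021, "A study of things", "Journal of Examples", 5, 2, "10-20",
)
```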

Backup

Archives the entire Zotero data directory as a bzip2-compressed .tar.bz2 file, showing a progress bar while it runs. Saves to data/backups/zotero-YYYY-MM-DD.tar.bz2 by default. Use -o to specify a custom output path or -d to point to a different database.

zotlib backup
zotlib backup -o ~/backups/zotero-2026-03-21.tar.bz2
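
The same kind of date-stamped archive can be produced with the standard library's tarfile module. The sketch below is an assumption about the general approach, not the code in backup.py. The function name backup_directory and its parameters are hypothetical, and it omits the progress bar.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup_directory(source_dir, backups_dir="data/backups", today=None):
    """Archive source_dir as zotero-YYYY-MM-DD.tar.bz2 and return the archive path."""
    source = Path(source_dir)
    stamp = (today or date.today()).isoformat()
    target = Path(backups_dir) / f"zotero-{stamp}.tar.bz2"
    target.parent.mkdir(parents=True, exist_ok=True)

    # "w:bz2" writes a bzip2-compressed tar archive.
    with tarfile.open(target, "w:bz2") as archive:
        # arcname stores paths relative to the directory name, not the full path.
        archive.add(source, arcname=source.name)
    return target
```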

Output structure

output/
├── export-csv/                     # Bibliographic metadata
│   └── publications.csv
├── export-apa/                     # APA-formatted references
│   └── publications.md
├── export-covers/                  # PDF cover images
│   └── publications/
│       ├── fullsize/
│       └── thumbnails/
└── export-annotations/             # Annotated PDFs + notes
    └── mycollection/
        └── author-year-title/
            ├── paper.pdf
            └── annotations.md

Export annotations features

  • Multi-attachment support: each PDF gets only its own annotations
  • Standalone attachment support: PDFs added directly to a collection
  • Linked attachment resolution via --pdfs-dir
  • "REVIEW: " prefix stripping from titles
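
A minimal sketch of the title cleanup and the author-year-title folder naming shown in the output structure. The exact rules zotlib applies are not documented here, so the function names, the six-word title limit, and the ASCII-only slug are all assumptions.

```python
import re


def strip_review_prefix(title):
    """Drop a leading "REVIEW: " marker from a title, if present."""
    return title.removeprefix("REVIEW: ")  # str.removeprefix needs Python 3.9+


def folder_name(author, year, title, max_words=6):
    """Build a lowercase, hyphen-separated author-year-title folder name."""
    # Keep only ASCII letters and digits; other characters act as separators.
    author_words = re.findall(r"[a-z0-9]+", author.lower())
    title_words = re.findall(r"[a-z0-9]+", strip_review_prefix(title).lower())
    parts = [*author_words, str(year), *title_words[:max_words]]
    return "-".join(parts)
```

Note that this simple slug drops non-ASCII letters, so a real implementation would likely need transliteration for names with accents.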

Python API

from zotlib import ZoteroDatabase, extract_cv_items, format_cv_as_apa

db = ZoteroDatabase("/path/to/zotero.sqlite")
items = extract_cv_items(db, collection_name="mypapers")
apa_output = format_cv_as_apa(items, output_path="output/apa.md")

Zotero JavaScript Scripts

Work in progress: utilities for Zotero's JavaScript console (Tools > Developer > Run JavaScript). Never modify the Zotero SQLite database directly from Python; for any write operation, use these JS scripts, which go through Zotero's API.

create-parent-item.js

Creates parent document items for standalone PDF attachments in a collection. Useful when PDFs were added directly without metadata: the script creates a parent item using the filename as the title, then re-parents the attachment to it.

extract-annotations.js

Extracts annotations from the selected item's PDFs as markdown. Auto-detects its context:

  • Interactive (Tools > Developer > Run JavaScript): shows a file save dialog
  • Headless (via HTTP debug API): writes to ~/Desktop/zotero-annotations/

To run headlessly:

./scripts/run-extract.sh

Requires: Settings > Advanced > "Allow other applications to communicate with Zotero"

The shell script should work on macOS, where Zotero and the terminal share the same localhost. On WSL, the script calls Zotero's debug HTTP endpoint on 127.0.0.1:23119, but localhost inside WSL does not reach the Windows host by default. You may need to use the Windows host IP instead, or run the curl command from PowerShell.
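
Before running the headless script, it can help to check whether the port is reachable at all from your shell's environment. The sketch below is a generic TCP check, not part of zotlib. The function name can_reach is hypothetical; the default host and port are the ones mentioned above.

```python
import socket


def can_reach(host="127.0.0.1", port=23119, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

On WSL, calling can_reach with the Windows host IP in place of the default host shows whether the bridge works before you edit the script.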

Annotation Format

Types

Type        Extracted Data
Highlight   Text + comment + color
Note        Comment text
Underline   Text + comment
Image       Comment only (image not exported)

Color Labels

Hex Code   Label
#ffd400    yellow
#ff6666    red
#5fb236    green
#2ea8e5    blue
#a28ae5    purple
#e56eee    magenta
#f19837    orange
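
The color table maps naturally onto a dictionary lookup. In the sketch below, the hex codes and labels are taken from the table above. The markdown layout produced by annotation_to_markdown is a hypothetical format for illustration and may not match what the extractor actually writes.

```python
COLOR_LABELS = {
    "#ffd400": "yellow",
    "#ff6666": "red",
    "#5fb236": "green",
    "#2ea8e5": "blue",
    "#a28ae5": "purple",
    "#e56eee": "magenta",
    "#f19837": "orange",
}


def color_label(hex_code):
    """Return the label for a hex color, or the code itself if it is unknown."""
    return COLOR_LABELS.get(hex_code.lower(), hex_code)


def annotation_to_markdown(kind, text="", comment="", color=""):
    """Render one annotation as a small markdown snippet (hypothetical layout)."""
    header = f"**{kind.capitalize()}**"
    if color:
        header += f" ({color_label(color)})"
    parts = [header]
    if text:
        parts.append(f"> {text}")  # quoted highlight or underline text
    if comment:
        parts.append(comment)
    # Blank lines keep the quote and the comment as separate markdown blocks.
    return "\n\n".join(parts)
```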
