Replace preprint BibTeX entries with published versions and validate bibliography references
BibTeX Updater
Tools for managing BibTeX bibliographies: automatically update preprints to published versions, validate references against external databases, and filter to only cited references.
Installation
From PyPI (Recommended)
pip install bibtex-updater
# With Google Scholar support
pip install bibtex-updater[scholarly]
# With Zotero support
pip install bibtex-updater[zotero]
# All optional dependencies
pip install bibtex-updater[all]
From Source
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
uv sync --extra dev --extra all
Using uv (No Installation)
Run directly without cloning using uv:
# Run any command directly
uv run --with "bibtex-updater[all]" bibtex-update references.bib -o updated.bib
# Or use the provided wrapper script (run from a checkout of the repository)
./scripts/bibtex-x update references.bib -o updated.bib
./scripts/bibtex-x check references.bib
./scripts/bibtex-x filter paper.tex -b references.bib -o filtered.bib
CLI Commands
| Command | Description |
|---|---|
| bibtex-update | Replace preprints with published versions |
| bibtex-check | Validate references exist with correct metadata |
| bibtex-filter | Filter to only cited entries |
| bibtex-zotero | Update preprints in Zotero library |
| bibtex-zotero-organize | Organize Zotero items into collections by research taxonomy |
| bibtex-obsidian-keywords | AI-powered keyword generation for Obsidian paper notes |
Quick Start
Update Preprints
# Update preprints to published versions
bibtex-update references.bib -o updated.bib
# Preview changes (dry run)
bibtex-update references.bib --dry-run --verbose
Validate References (Fact-Check)
# Check if references exist and have correct metadata
bibtex-check references.bib --report report.json
# Strict mode: exit with an error if hallucinated or not-found entries are detected
bibtex-check references.bib --strict
Filter Bibliography
# Filter to only cited entries
bibtex-filter paper.tex -b references.bib -o filtered.bib
# Multiple tex files
bibtex-filter *.tex -b references.bib -o filtered.bib
Update Zotero Library
# Set credentials (get from zotero.org/settings/keys)
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"
# Preview changes
bibtex-zotero --dry-run
# Apply updates
bibtex-zotero
Sync BibTeX Updates to Zotero
When updating a .bib file, you can simultaneously update matching entries in your Zotero library:
# Set Zotero credentials
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"
# Update bib file AND sync to Zotero
bibtex-update references.bib -o updated.bib --zotero
# Preview Zotero changes only (bib changes still apply)
bibtex-update references.bib -o updated.bib --zotero --zotero-dry-run
# Limit to a specific Zotero collection
bibtex-update references.bib -o updated.bib --zotero --zotero-collection ABCD1234
The sync matches bib entries to Zotero items by:
- arXiv ID - Most reliable for preprints
- DOI - For preprints with DOIs (e.g., bioRxiv)
- Title + Author - Fuzzy matching as fallback
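The matching order above can be sketched as a simple cascade. This is a minimal illustration, not the package's implementation: the `match_item` function, the flat dictionary fields (`arxiv_id`, `doi`, `title`, `first_author`), and the 0.9 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and reduce everything except letters and digits to single spaces."""
    return " ".join("".join(c.lower() if c.isalnum() else " " for c in text).split())


def match_item(bib_entry: dict, zotero_items: list[dict], threshold: float = 0.9):
    """Return (item, strategy) for the first strategy that matches, else (None, None).

    Field names and the threshold are hypothetical; real Zotero items are nested.
    """
    # 1. arXiv ID: exact match
    arxiv_id = bib_entry.get("arxiv_id")
    if arxiv_id:
        for item in zotero_items:
            if item.get("arxiv_id") == arxiv_id:
                return item, "arxiv"

    # 2. DOI: case-insensitive match
    doi = (bib_entry.get("doi") or "").lower()
    if doi:
        for item in zotero_items:
            if (item.get("doi") or "").lower() == doi:
                return item, "doi"

    # 3. Fallback: fuzzy title match, restricted to items with the same first author
    title = normalize(bib_entry.get("title", ""))
    author = normalize(bib_entry.get("first_author", ""))
    if not title:
        return None, None
    best, best_score = None, 0.0
    for item in zotero_items:
        if author and normalize(item.get("first_author", "")) != author:
            continue
        score = SequenceMatcher(None, title, normalize(item.get("title", ""))).ratio()
        if score > best_score:
            best, best_score = item, score
    if best is not None and best_score >= threshold:
        return best, "title+author"
    return None, None
```

Trying the exact identifiers first keeps the fuzzy comparison as a last resort, which is where false matches are most likely.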
Standalone Scripts
For environments without pip (e.g., Overleaf), filter_bibliography.py can be used directly as it has no dependencies:
# Copy the script and run directly
python filter_bibliography.py paper.tex -b references.bib -o filtered.bib
Documentation
| Document | Description |
|---|---|
| docs/BIBTEX_UPDATER.md | Full BibTeX updater documentation |
| docs/REFERENCE_FACT_CHECKER.md | Full reference fact-checker documentation |
| docs/ZOTERO_UPDATER.md | Full Zotero updater documentation |
| docs/FILTER_BIBLIOGRAPHY.md | Full filter documentation |
| docs/LANDSCAPE.md | Databases, competing tools, and ecosystem landscape |
| examples/ | Example workflows and configuration files |
Overleaf Integration
Both tools integrate with Overleaf via GitHub Actions or latexmkrc.
GitHub Actions (Recommended)
- Enable GitHub sync in Overleaf (Menu -> Sync -> GitHub)
- Copy a workflow from examples/workflows/ to .github/workflows/
- Changes synced from Overleaf automatically trigger updates
latexmkrc (Direct Overleaf)
For filter_bibliography.py only (no dependencies required):
- Upload filter_bibliography.py to your Overleaf project
- Create .latexmkrc based on examples/latexmkrc
- Recompile - filtered bibliography appears in your file list
Features
BibTeX Updater (bibtex-update)
- Multi-source resolution: arXiv, OpenAlex, Europe PMC, Crossref, DBLP, ACL Anthology, Semantic Scholar, Google Scholar
- High accuracy: Title and author fuzzy matching with confidence thresholds
- ACL Anthology support: Zero-overhead resolution for NLP papers (ACL, EMNLP, NAACL, etc.)
- Batch processing: Multiple files with concurrent workers (default: 8)
- Deduplication: Merge duplicates by DOI or normalized title+authors
- Smart caching: On-disk cache + semantic resolution cache with TTL
- Per-service rate limiting: Optimized rate limits per API (Crossref, S2, DBLP, ACL Anthology, arXiv, OpenAlex, Europe PMC)
- Batch API support: Faster bulk lookups via arXiv/S2/Crossref batch endpoints
- Resolution tracking: --mark-resolved tags updated entries so they are skipped on re-runs
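To illustrate the deduplication feature listed above, here is a minimal sketch of merging duplicates by DOI, falling back to a normalized title+authors key. The helper names (`dedup_key`, `deduplicate`) and the normalization rules are assumptions for this example; the package's own logic may differ.

```python
import re
import unicodedata


def _norm(text: str) -> str:
    """Strip accents, braces, punctuation, and case so near-identical strings compare equal."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_key(entry: dict) -> tuple:
    """Prefer the DOI as identity; otherwise use normalized title + authors."""
    doi = (entry.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title+authors", _norm(entry.get("title", "")), _norm(entry.get("author", "")))


def deduplicate(entries: list[dict]) -> list[dict]:
    """Keep the first entry per key and fill its missing fields from later duplicates."""
    merged: dict[tuple, dict] = {}
    for entry in entries:
        key = dedup_key(entry)
        if key in merged:
            for field, value in entry.items():
                merged[key].setdefault(field, value)
        else:
            merged[key] = dict(entry)
    return list(merged.values())
```

One limitation of this simplified key: an entry with a DOI and a duplicate without one end up under different keys, so a fuller version would need a second pass that links the two.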
Zotero Updater (bibtex-zotero)
- Direct Zotero integration: Fetches and updates items via Zotero API
- Same resolution pipeline: Uses the same multi-source resolution
- Preserves metadata: Keeps notes, tags, and attachments intact
- Idempotent: Already-published papers are automatically skipped
- Dry-run mode: Preview changes before applying
- Tag-based chunking: Track processing state with preprint-upgraded/preprint-checked/preprint-error tags
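One way to picture tag-based chunking: items that already carry a state tag are skipped, and the remainder is processed in fixed-size chunks. The sketch below is an assumption about how such tracking could work, not the tool's actual behavior; `needs_processing`, `next_chunk`, and the `retry_errors` option are hypothetical names.

```python
# Tag names are taken from the feature list; their exact semantics are assumed here.
STATE_TAGS = {"preprint-upgraded", "preprint-checked", "preprint-error"}


def needs_processing(item_tags: set[str], retry_errors: bool = False) -> bool:
    """An item is pending if it has no state tag, or only an error tag when retrying."""
    state = item_tags & STATE_TAGS
    if not state:
        return True
    return retry_errors and state == {"preprint-error"}


def next_chunk(items: list[dict], size: int, retry_errors: bool = False) -> list[dict]:
    """Return up to `size` pending items, preserving library order."""
    pending = [i for i in items if needs_processing(set(i["tags"]), retry_errors)]
    return pending[:size]
```

Because state lives in the tags themselves, an interrupted run can resume by calling `next_chunk` again without any local bookkeeping.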
Zotero Organizer (bibtex-zotero-organize)
- AI-powered taxonomy: Organize items into hierarchical collections automatically
- Multiple backends: Claude, OpenAI, or local embeddings for classification
- Caching: Classification results cached to reduce API calls
- Batch processing: Configurable limits and dry-run mode
Obsidian Keywords (bibtex-obsidian-keywords)
- AI-powered keywords: Generate [[wikilinks]] for Obsidian paper notes
- Multiple backends: Claude, OpenAI, or local embeddings
- Smart skipping: --min-keywords to skip notes that already have enough keywords
- Topics file: Provide existing topics for consistent tagging across notes
- Dry-run mode: Preview changes before modifying files
Reference Fact-Checker (bibtex-check)
- Multi-source validation: Crossref, DBLP, Semantic Scholar
- Detailed mismatch detection: Title, author, year, venue comparisons
- Hallucination detection: Identifies likely fabricated references
- Structured reports: JSON and JSONL output formats
- CI/CD integration: Strict mode with exit codes for automation
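The field comparisons listed above (title, author, year, venue) could look roughly like the following. This is a hedged sketch only: `find_mismatches`, the dictionary layout with an `authors` list of last names, and the 0.9 title threshold are assumptions, and the real checker's report format is not reproduced here.

```python
from difflib import SequenceMatcher


def _clean(value) -> str:
    """Lowercase and collapse whitespace; accepts non-string values such as integer years."""
    return " ".join(str(value).lower().split())


def find_mismatches(entry: dict, record: dict, title_threshold: float = 0.9) -> dict:
    """Return {field: (claimed, found)} for each field that disagrees with the database record."""
    mismatches = {}

    # Title: fuzzy comparison tolerates casing and minor punctuation differences
    similarity = SequenceMatcher(
        None, _clean(entry.get("title", "")), _clean(record.get("title", ""))
    ).ratio()
    if similarity < title_threshold:
        mismatches["title"] = (entry.get("title"), record.get("title"))

    # Authors: compare sets of last names, ignoring order and case
    claimed = {_clean(a) for a in entry.get("authors", [])}
    found = {_clean(a) for a in record.get("authors", [])}
    if claimed and found and claimed != found:
        mismatches["authors"] = (entry["authors"], record["authors"])

    # Year and venue: exact comparison after cleaning, only when both sides have the field
    for field in ("year", "venue"):
        if field in entry and field in record and _clean(entry[field]) != _clean(record[field]):
            mismatches[field] = (entry[field], record[field])
    return mismatches
```

An empty result means the reference agrees with the record on every compared field; a reference that matches no record at all would need separate handling.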
Filter Bibliography (bibtex-filter)
- Zero dependencies: Uses only Python standard library
- Works on Overleaf: No pip install needed
- Multiple bib files: Merge and filter from multiple sources
- Citation detection: Supports natbib, biblatex, and standard LaTeX citations
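As a rough illustration of standard-library-only citation detection, the sketch below extracts citation keys from LaTeX source with a single regular expression. The pattern and the `cited_keys` function are assumptions written for this example; they are not taken from filter_bibliography.py, which may handle more cases.

```python
import re

# Matches commands containing "cite" (\cite, \citep, \citet*, \textcite, \parencite,
# \autocite, ...), any number of optional [..] arguments, then the {key1,key2} group.
CITE_RE = re.compile(r"\\[A-Za-z]*cite[A-Za-z]*\*?\s*(?:\[[^\]]*\]\s*)*\{([^}]*)\}")


def cited_keys(tex: str) -> set[str]:
    """Return the set of citation keys used in a LaTeX source string."""
    # Drop comments: an unescaped % through the end of the line
    tex = re.sub(r"(?<!\\)%.*", "", tex)
    keys: set[str] = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys
```

Note that this simple pattern also picks up `\nocite{*}` as the key `*`, so a caller would likely want to treat that value as "keep everything".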
Python API
from bibtex_updater import Detector, Resolver, Updater, HttpClient, RateLimiter, DiskCache

# Create HTTP client with rate limiting and caching
rate_limiter = RateLimiter(req_per_min=30)
cache = DiskCache(".cache.json")
http_client = HttpClient(
    timeout=30.0,
    user_agent="bibtex-updater/0.5.0",
    rate_limiter=rate_limiter,
    cache=cache,
)

# Detect preprints
# `entry` is a single BibTeX entry loaded elsewhere (loading is not shown here)
detector = Detector()
detection = detector.detect(entry)

if detection.is_preprint:
    # Resolve to published version
    resolver = Resolver(http_client)
    candidate = resolver.resolve(detection)

    # Only accept high-confidence matches
    if candidate and candidate.confidence >= 0.9:
        # Update the entry
        updater = Updater()
        updated_entry = updater.update_entry(entry, candidate.record, detection)
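For intuition about what a `req_per_min` setting implies, here is a minimal sketch of a minimum-interval rate limiter. It is not the package's `RateLimiter`; the class name `SimpleRateLimiter`, its `wait` method, and the injectable `clock`/`sleep` parameters are hypothetical and exist only to make the idea concrete and testable.

```python
import threading
import time


class SimpleRateLimiter:
    """Space requests at least 60 / req_per_min seconds apart (illustrative only)."""

    def __init__(self, req_per_min: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / req_per_min
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request slot is free; return the seconds slept."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the following slot before releasing the lock so that
            # concurrent workers queue up instead of firing together.
            self._next_allowed = max(now, self._next_allowed) + self.interval
        if delay:
            self._sleep(delay)
        return delay
```

With `req_per_min=30`, calls to `wait()` are spaced two seconds apart; a client would call it once before each outgoing request.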
Development
# Clone and install in development mode
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
uv sync --extra dev --extra all
# Run tests
uv run pytest tests/ -v
# Run tests with coverage
uv run pytest tests/ -v --cov=bibtex_updater --cov-report=term-missing
# Code quality
pre-commit run --all-files
# Build package
uv build
# Check package
uv run twine check dist/*
License
MIT License - see LICENSE for details.