
Replace preprint BibTeX entries with published versions and validate bibliography references

Project description

BibTeX Updater

Tools for managing BibTeX bibliographies: automatically update preprints to published versions, validate references against external databases, and filter to only cited references.

Installation

From PyPI (Recommended)

pip install bibtex-updater

# With Google Scholar support
pip install "bibtex-updater[scholarly]"

# With Zotero support
pip install "bibtex-updater[zotero]"

# All optional dependencies
pip install "bibtex-updater[all]"

From Source

git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
pip install -e ".[dev]"

CLI Commands

Command          Description
bibtex-update    Replace preprints with their published versions
bibtex-check     Validate that references exist and have correct metadata
bibtex-filter    Filter a bibliography to only the cited entries
bibtex-zotero    Update preprints in a Zotero library

Quick Start

Update Preprints

# Update preprints to published versions
bibtex-update references.bib -o updated.bib

# Preview changes (dry run)
bibtex-update references.bib --dry-run --verbose
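How bibtex-update decides that an entry is a preprint is not spelled out here. As a rough illustration only, the sketch below applies a simple heuristic to a BibTeX entry represented as a bibtexparser-style dict (an assumption): an entry whose journal, booktitle, or howpublished field names a preprint server counts as a preprint, and an entry with no venue falls back to arXiv-specific fields. The function looks_like_preprint and the PREPRINT_SERVERS list are hypothetical and are not part of the package's API.

```python
# Hypothetical preprint heuristic; not the package's actual Detector logic.
PREPRINT_SERVERS = ("arxiv", "biorxiv", "medrxiv", "chemrxiv", "ssrn")


def looks_like_preprint(entry: dict[str, str]) -> bool:
    """Return True if a bibtexparser-style entry dict looks like a preprint."""
    fields = {key.lower(): str(value).lower() for key, value in entry.items()}
    venue = fields.get("journal") or fields.get("booktitle") or fields.get("howpublished") or ""
    if venue:
        # A named venue counts as a preprint only if it is a preprint server.
        return any(server in venue for server in PREPRINT_SERVERS)
    # No venue at all: fall back to arXiv-specific fields or an arXiv URL.
    return fields.get("archiveprefix") == "arxiv" or "arxiv.org/abs/" in fields.get("url", "")


preprint = {"ID": "vaswani2017", "journal": "arXiv preprint arXiv:1706.03762"}
published = {"ID": "vaswani2017nips", "booktitle": "Advances in Neural Information Processing Systems"}
print(looks_like_preprint(preprint))   # True
print(looks_like_preprint(published))  # False
```

Note that this heuristic treats an entry carrying both an arXiv eprint field and a real journal as already published; a production detector would likely need more nuanced rules.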

Validate References (Fact-Check)

# Check whether references exist and have correct metadata
bibtex-check references.bib --report report.json

# Strict mode: exit with an error if any entry is not found or appears hallucinated
bibtex-check references.bib --strict
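In CI, --strict matters only through the process exit status. The sketch below illustrates that gating logic over a list of per-entry results; the "key" and "status" fields and the status names are hypothetical and are not taken from the actual report.json schema.

```python
import sys

# Hypothetical status names; the real bibtex-check report may label entries differently.
FAILING_STATUSES = {"not_found", "hallucinated"}


def strict_exit_code(results: list[dict[str, str]]) -> int:
    """Return 1 if any entry has a failing status, else 0, logging failures to stderr."""
    failures = [result["key"] for result in results if result.get("status") in FAILING_STATUSES]
    for key in failures:
        print(f"FAIL: {key}", file=sys.stderr)
    return 1 if failures else 0


results = [
    {"key": "vaswani2017", "status": "verified"},
    {"key": "smith2099", "status": "not_found"},
]
print(strict_exit_code(results))  # 1 (smith2099 was not found)
```

A CI step would hand this value to sys.exit(), so a single failing entry stops the pipeline.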

Filter Bibliography

# Filter to only cited entries
bibtex-filter paper.tex -b references.bib -o filtered.bib

# Multiple tex files
bibtex-filter *.tex -b references.bib -o filtered.bib
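Conceptually, filtering keeps only the .bib entries whose keys appear in the set of cited keys. The sketch below, which is not the code of filter_bibliography.py, splits a .bib string into entries by counting braces (so nested braces in titles survive) and keeps the cited ones in file order. split_entries and filter_bib are hypothetical names; when the same key appears twice, as can happen after concatenating several .bib files, this sketch keeps the first definition.

```python
import re

# Matches the start of an entry such as "@article{key," and captures type and key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def split_entries(bib_text: str) -> dict[str, str]:
    """Map each citation key to its full entry text, in file order (first definition wins)."""
    entries: dict[str, str] = {}
    pos = 0
    while (match := ENTRY_START.search(bib_text, pos)) is not None:
        depth = 0
        end = None
        for i in range(bib_text.index("{", match.start()), len(bib_text)):
            if bib_text[i] == "{":
                depth += 1
            elif bib_text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced braces: stop instead of guessing where the entry ends
        entries.setdefault(match.group(2), bib_text[match.start():end])
        pos = end
    return entries


def filter_bib(bib_text: str, cited: set[str]) -> str:
    """Return only the entries whose keys are in `cited`, separated by blank lines."""
    kept = [text for key, text in split_entries(bib_text).items() if key in cited]
    return "\n\n".join(kept) + "\n"


bib = """
@article{used2020,
  title = {A {Nested} Title},
  year = {2020}
}
@misc{unused2021,
  title = {Never cited}
}
"""
print(filter_bib(bib, {"used2020"}))
```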

Update Zotero Library

# Set credentials (get from zotero.org/settings/keys)
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"

# Preview changes
bibtex-zotero --dry-run

# Apply updates
bibtex-zotero
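The two environment variables identify whose library to read and authorize access to it. The sketch below shows how such credentials are typically used against the Zotero Web API v3: it builds, but does not send, a request for the preprint items in a user library. Whether bibtex-zotero queries the API this way (rather than, for example, through a client library) is an assumption; build_preprint_request is a hypothetical helper, and the page size of 50 is arbitrary.

```python
import os
import urllib.parse
import urllib.request


def build_preprint_request(library_id: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a Zotero Web API v3 request for preprint items in a user library."""
    query = urllib.parse.urlencode({"itemType": "preprint", "limit": 50})
    url = f"https://api.zotero.org/users/{library_id}/items?{query}"
    # The API key authorizes the request; the version header pins the API version.
    headers = {"Zotero-API-Key": api_key, "Zotero-API-Version": "3"}
    return urllib.request.Request(url, headers=headers)


request = build_preprint_request(
    os.environ.get("ZOTERO_LIBRARY_ID", "your_user_id"),
    os.environ.get("ZOTERO_API_KEY", "your_api_key"),
)
print(request.full_url)
# Sending it with urllib.request.urlopen(request) requires valid credentials and network access.
```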

Standalone Scripts

For environments without pip (e.g., Overleaf), filter_bibliography.py can be run directly, since it depends only on the Python standard library:

# Copy the script and run directly
python filter_bibliography.py paper.tex -b references.bib -o filtered.bib

Documentation

Document                         Description
docs/BIBTEX_UPDATER.md           Full BibTeX updater documentation
docs/REFERENCE_FACT_CHECKER.md   Full reference fact-checker documentation
docs/ZOTERO_UPDATER.md           Full Zotero updater documentation
docs/FILTER_BIBLIOGRAPHY.md      Full filter documentation
examples/                        Example workflows and configuration files

Overleaf Integration

These tools can be integrated with Overleaf via GitHub Actions, and filter_bibliography.py can also run directly on Overleaf via latexmkrc.

GitHub Actions (Recommended)

  1. Enable GitHub sync in Overleaf (Menu -> Sync -> GitHub)
  2. Copy a workflow from examples/workflows/ to .github/workflows/
  3. Changes synced from Overleaf automatically trigger updates

latexmkrc (Direct Overleaf)

This option works only for filter_bibliography.py, which needs no dependencies:

  1. Upload filter_bibliography.py to your Overleaf project
  2. Create .latexmkrc based on examples/latexmkrc
  3. Recompile; the filtered bibliography appears in your file list

Features

BibTeX Updater (bibtex-update)

  • Multi-source resolution: arXiv, Crossref, DBLP, Semantic Scholar, Google Scholar
  • Fuzzy matching: Title and author fuzzy matching with confidence thresholds to reject weak candidates
  • Batch processing: Processes multiple files with concurrent workers
  • Deduplication: Merge duplicates by DOI or normalized title+authors
  • Caching: On-disk cache to avoid repeated API calls
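Deduplication "by DOI or normalized title+authors" can be read as: two entries are duplicates if they share a DOI (compared case-insensitively) or if their normalized titles and author surnames agree. The sketch below is one way to implement that rule, not the package's own code; normalize, surname, dedup_keys, and deduplicate are hypothetical names, and the normalization (lowercasing, stripping accents, braces, and punctuation) is an assumption.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, drop accents, BibTeX braces/backslashes and punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = re.sub(r"[{}\\]", "", text)
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def surname(name: str) -> str:
    """Return the normalized surname from "Last, First" or "First Last"."""
    name = name.strip()
    if "," in name:
        return normalize(name.split(",")[0])
    parts = name.split()
    return normalize(parts[-1]) if parts else ""


def dedup_keys(entry: dict[str, str]) -> set[str]:
    """All identity keys for an entry: its DOI and/or its title plus author surnames."""
    keys = set()
    doi = entry.get("doi", "").strip().lower()
    if doi:
        keys.add("doi:" + doi)
    title = normalize(entry.get("title", ""))
    if title:
        surnames = sorted(surname(a) for a in entry.get("author", "").split(" and "))
        keys.add("ta:" + title + "|" + ",".join(surnames))
    return keys


def deduplicate(entries: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first entry of each duplicate group, matching on any shared key."""
    kept, seen = [], set()
    for entry in entries:
        keys = dedup_keys(entry)
        if not keys & seen:
            kept.append(entry)
        seen |= keys  # remember every key so chains of duplicates collapse
    return kept


a = {"title": "Attention Is All You Need", "author": "Vaswani, Ashish and Shazeer, Noam", "doi": "10.1000/example.1"}
b = {"title": "Attention is all you need.", "author": "Ashish Vaswani and Noam Shazeer"}
print(len(deduplicate([a, b])))  # 1
```

Matching on surnames alone is a simplification: two different papers with identical titles and author surnames would be merged.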

Zotero Updater (bibtex-zotero)

  • Direct Zotero integration: Fetches and updates items via Zotero API
  • Same resolution pipeline: Uses the same multi-source resolution as bibtex-update
  • Preserves metadata: Keeps notes, tags, and attachments intact
  • Idempotent: Already-published papers are automatically skipped
  • Dry-run mode: Preview changes before applying
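Idempotence follows if an item qualifies for updating only while it still looks like a preprint: once it has been converted to a published item, later runs skip it. The rule below is a hypothetical sketch of such a check over Zotero item data. The field names itemType, DOI, and publicationTitle are standard Zotero fields, but the rule itself is an assumption, not bibtex-zotero's logic. It relies on the fact that DOIs registered by arXiv use the 10.48550 prefix.

```python
# DOIs registered by arXiv (e.g. 10.48550/arXiv.1706.03762) do not indicate publication.
ARXIV_DOI_PREFIX = "10.48550/"


def needs_update(item: dict[str, str]) -> bool:
    """Hypothetical rule: update an item only while it still looks like a preprint."""
    doi = item.get("DOI", "").strip().lower()
    if doi and not doi.startswith(ARXIV_DOI_PREFIX):
        return False  # carries a non-arXiv DOI: treat as already published
    if item.get("itemType") == "preprint":
        return True
    # Some libraries may store arXiv papers as journal articles with an arXiv "journal".
    return "arxiv" in item.get("publicationTitle", "").lower()


before = {"itemType": "preprint", "DOI": "10.48550/arXiv.1706.03762", "title": "Attention Is All You Need"}
after = {"itemType": "conferencePaper", "DOI": "10.1000/example.1", "title": "Attention Is All You Need"}
print(needs_update(before), needs_update(after))  # True False
```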

Reference Fact-Checker (bibtex-check)

  • Multi-source validation: Crossref, DBLP, Semantic Scholar
  • Detailed mismatch detection: Title, author, year, venue comparisons
  • Hallucination detection: Identifies likely fabricated references
  • Structured reports: JSON and JSONL output formats
  • CI/CD integration: Strict mode with exit codes for automation
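Once a candidate record has been fetched from Crossref, DBLP, or Semantic Scholar, mismatch detection amounts to comparing fields pairwise. The sketch below shows one plausible scheme, not bibtex-check's actual rules: the year must match exactly, other fields are compared with difflib similarity against an arbitrary 0.9 threshold, and a missing candidate is reported as not found. Each result is written as one JSON object per line (JSONL). The functions compare and field_matches, the field list, and the status names are hypothetical.

```python
import json
from difflib import SequenceMatcher
from typing import Optional

FIELDS = ("title", "author", "year", "venue")


def field_matches(field: str, local: str, remote: str, threshold: float) -> bool:
    """Exact comparison for the year, fuzzy comparison for everything else."""
    if field == "year":
        return local.strip() == remote.strip()
    ratio = SequenceMatcher(None, local.lower().strip(), remote.lower().strip()).ratio()
    return ratio >= threshold


def compare(key: str, local: dict[str, str], remote: Optional[dict[str, str]], threshold: float = 0.9) -> dict:
    """Compare a local entry with the best remote match and return a report record."""
    if remote is None:
        return {"key": key, "status": "not_found", "mismatches": []}
    mismatches = [
        field
        for field in FIELDS
        if field in local and field in remote and not field_matches(field, local[field], remote[field], threshold)
    ]
    return {"key": key, "status": "mismatch" if mismatches else "verified", "mismatches": mismatches}


local = {"title": "Attention Is All You Need", "author": "Vaswani, Ashish", "year": "2018"}
remote = {"title": "Attention is all you need", "author": "Vaswani, Ashish", "year": "2017"}
for record in (compare("vaswani2017", local, remote), compare("ghost2023", {"title": "A Paper That Does Not Exist"}, None)):
    print(json.dumps(record))  # one JSON object per line (JSONL)
```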

Filter Bibliography (bibtex-filter)

  • Zero dependencies: Uses only Python standard library
  • Works on Overleaf: No pip install needed
  • Multiple bib files: Merge and filter from multiple sources
  • Citation detection: Supports natbib, biblatex, and standard LaTeX citations
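Citation detection boils down to finding every cite-style command and splitting its key argument on commas. The regex sketch below is a simplified stand-in for whatever filter_bibliography.py actually does: it accepts any command name containing "cite" (covering \cite, natbib's \citep and \citet, biblatex's \parencite, \textcite and \autocite, and \nocite), an optional star, up to two optional arguments, and it ignores commented-out text. cited_keys is a hypothetical name; biblatex multi-cite commands such as \cites{a}{b} are only partially handled, since only the first key group is read.

```python
import re

CITE_RE = re.compile(
    r"\\[a-zA-Z]*cite[a-zA-Z]*\*?"  # \cite, \citep, \citet*, \parencite, \textcite, \autocite, \nocite, ...
    r"(?:\s*\[[^\]]*\]){0,2}"       # up to two optional arguments, e.g. [see][p.~5]
    r"\s*\{([^}]*)\}"               # the mandatory {key1,key2} argument
)


def cited_keys(tex: str) -> set[str]:
    """Return all citation keys used in a LaTeX source string."""
    tex = re.sub(r"(?<!\\)%.*", "", tex)  # drop comments, but keep escaped \%
    keys: set[str] = set()
    for group in CITE_RE.findall(tex):
        keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


tex = r"""
As shown by \citet{vaswani2017} and others \citep[see][p.~5]{devlin2019, brown2020},
\parencite{lecun2015} and \autocite[12]{goodfellow2014} are cited too.
% \cite{commented_out} is ignored
"""
print(sorted(cited_keys(tex)))
# ['brown2020', 'devlin2019', 'goodfellow2014', 'lecun2015', 'vaswani2017']
```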

Python API

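from bibtex_updater import Detector, Resolver, Updater, HttpClient, RateLimiter, DiskCache

# Create an HTTP client with rate limiting and on-disk caching
rate_limiter = RateLimiter(req_per_min=30)
cache = DiskCache(".cache.json")
http_client = HttpClient(
    timeout=30.0,
    user_agent="bibtex-updater/0.1.0",
    rate_limiter=rate_limiter,
    cache=cache
)

# A parsed BibTeX entry; shown here as a bibtexparser-style dict (assumed format)
entry = {
    "ENTRYTYPE": "article",
    "ID": "vaswani2017attention",
    "title": "Attention Is All You Need",
    "author": "Vaswani, Ashish and Shazeer, Noam",
    "journal": "arXiv preprint arXiv:1706.03762",
    "year": "2017",
}

# Detect whether the entry is a preprint
detector = Detector()
detection = detector.detect(entry)

if detection.is_preprint:
    # Resolve to the published version
    resolver = Resolver(http_client)
    candidate = resolver.resolve(detection)

    # Only accept high-confidence matches
    if candidate and candidate.confidence >= 0.9:
        # Update the entry with the published metadata
        updater = Updater()
        updated_entry = updater.update_entry(entry, candidate.record, detection)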

Development

# Clone and install in development mode
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
pip install -e ".[dev,all]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ -v --cov=bibtex_updater --cov-report=term-missing

# Code quality
pre-commit run --all-files

# Build package
python -m build

# Check package
twine check dist/*

License

MIT License - see LICENSE for details.


Download files


Source Distribution

bibtex_updater-0.1.0.tar.gz (93.6 kB)

Uploaded Source

Built Distribution


bibtex_updater-0.1.0-py3-none-any.whl (56.8 kB)

Uploaded Python 3

File details

Details for the file bibtex_updater-0.1.0.tar.gz.

File metadata

  • Download URL: bibtex_updater-0.1.0.tar.gz
  • Upload date:
  • Size: 93.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bibtex_updater-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e7868d3ea305de0096a293a904a80ba00ba382dbe2254b37ee818f7a1f49619e
MD5 5ddb0c7c0d64014e73a98765f1c3184f
BLAKE2b-256 11035d436792a15ae587faf4e38fabb9803c01500a0c6a83aa22a9565435f971


Provenance

The following attestation bundles were made for bibtex_updater-0.1.0.tar.gz:

Publisher: publish.yml on rpatrik96/bibtexupdater

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file bibtex_updater-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bibtex_updater-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 56.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bibtex_updater-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 687530aad8105482db392ad2c10cd3c35170512fee2068cffb8efc1e87c7c048
MD5 4c4ee1e5b748754798fe7dd4f9886313
BLAKE2b-256 f4cb19e5f7245acfffd59f509a494b2cc85cd4322d1ae94357e8531d078fa96c


Provenance

The following attestation bundles were made for bibtex_updater-0.1.0-py3-none-any.whl:

Publisher: publish.yml on rpatrik96/bibtexupdater

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
