Verify BibTeX references against OpenAlex & CrossRef APIs to detect errors and AI hallucinations

BibTeX Verifier

BibTeX Verifier is an open-source CLI tool that automatically validates every reference in a .bib file against two authoritative academic databases, OpenAlex and CrossRef. It checks each entry's title, authors, year, and other metadata to catch typos, wrong years, misattributed authors, and AI-hallucinated citations before they reach your paper.


Features

| Feature | Description |
|---------|-------------|
| AI hallucination detection | Flags entries that have no match in OpenAlex or CrossRef |
| Dual-source verification | CrossRef (exact DOI lookup) + OpenAlex (fuzzy title search) |
| Field-level checking | Title, year, first-author last name, author count |
| Markdown report | Human-readable report with per-entry details and a summary table |
| JSON output | Machine-readable raw results for further processing |
| CLI & Python API | Use as a command or import as a library |
| No registration needed | OpenAlex is free and open; CrossRef is public |
| Rate-limit safe | Built-in throttling and exponential back-off on HTTP errors |
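
The throttling and back-off behaviour listed above can be illustrated with a short sketch. This is not the package's own code: `Throttle`, `call_with_backoff`, and the retry defaults are hypothetical names and values chosen for illustration. Only the 0.15 s interval mirrors the documented `--rate-limit` default.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Throttle:
    """Enforce a minimum gap between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval: float = 0.15,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        now = self._clock()
        if now < self._next_allowed:
            # Too early: sleep until the next permitted call time.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def call_with_backoff(fetch: Callable[[], T], retries: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call fetch(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            if attempt == retries - 1:
                return None  # give up after the last attempt
            sleep(base_delay * (2 ** attempt))
    return None
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting; a caller would typically invoke `Throttle.wait()` before each request and wrap the request itself in `call_with_backoff`.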

Online Tool

No installation needed — use the web interface directly:

https://altman-conquer.github.io/bibtex-verifier/

Upload your .bib file and get a verification report in your browser. The file is never sent to an intermediary server; all API calls are made directly from your browser to OpenAlex and CrossRef.


Quick Start

pip install bibtex-verifier
bibverify my_paper.bib

This generates my_paper.report.md with a full verification report.


Installation

From PyPI:

pip install bibtex-verifier

From source:

git clone https://github.com/your-username/bibtex-verifier.git
cd bibtex-verifier
pip install -e .

Requirements: Python 3.9+, no API keys required.


Usage

CLI

# Basic usage
bibverify paper.bib

# Save report to a custom path
bibverify paper.bib --output reports/verification.md

# Also export raw JSON results
bibverify paper.bib --json

# Use your email for higher API rate limits (Polite Pool)
bibverify paper.bib --email you@university.edu

# Adjust fuzzy-match thresholds
bibverify paper.bib --title-threshold 85 --author-threshold 70

All Options

| Option | Default | Description |
|--------|---------|-------------|
| BIB_FILE | | Path to the .bib file to verify |
| --output / -o | <bib>.report.md | Report output file path |
| --json | false | Also write a .json results file |
| --title-threshold | 82 | Minimum fuzzy score for title match (0–100) |
| --author-threshold | 72 | Minimum fuzzy score for author match (0–100) |
| --email | | Email for Polite Pool (higher rate limits) |
| --rate-limit | 0.15 | Seconds between API calls |
| --version / -V | | Show version and exit |

Python API

from pathlib import Path
from bibtex_verifier.loader import load_bib
from bibtex_verifier.apis import oa_search, oa_extract, crossref_by_doi, crossref_extract
from bibtex_verifier.comparator import compare_entry
from bibtex_verifier.report import build_markdown_report

entries = load_bib(Path("paper.bib"))

results = []
for entry in entries:
    # Try CrossRef first if DOI is available
    api_data, source, score = None, None, 0
    if entry.get("doi"):
        msg = crossref_by_doi(entry["doi"])
        if msg:
            api_data = crossref_extract(msg)
            source = "crossref"
            score = 100  # DOI is exact
    # Fall back to OpenAlex
    if not source:
        paper = oa_search(entry.get("title", ""))
        if paper:
            api_data = oa_extract(paper)
            source = "openalex"
            score = paper["_match_score"]

    result = compare_entry(entry, api_data=api_data, source=source, match_score=score)
    results.append(result)

print(build_markdown_report(results, bib_filename="paper.bib"))

Sample Output

BibTeX Verifier v0.1.0
Parsing paper.bib ...
Found 8 entries — estimated time: ~5s

  [  1/8] Vaswani2017attention                    [DOI] OK      score= 98%
  [  2/8] He2016resnet                                  OK      score= 95%
  [  3/8] Touvron2023llama                              WARN    score= 97%
  [  4/8] Brown2020gpt3                                 WARN    score= 99%
  [  5/8] Smith2020vit                                  ERR     score= 94%
  [  6/8] Devlin2019bert                                ERR     score= 81%
  [  7/8] Johnson2021hallucinated                       N/F     score=  0%
  [  8/8] LeCun1989backprop                       [DOI] OK      score=100%

Done! Report saved to paper.report.md

┌─────────────────────────┐
│  Verification Summary   │
├────────────────┬────────┤
│ ✅ OK          │ 3      │
│ ⚠️  WARNING    │ 2      │
│ ❌ ERROR       │ 2      │
│ 🔍 NOT_FOUND   │ 1      │
└────────────────┴────────┘

The generated Markdown report (its headings and labels are in Chinese) looks like:

# BibTeX 引用验证报告

> 验证文件: `paper.bib`  共 8 条引用

## 汇总

| 状态 | 数量 |
|------|------|
| ✅ 正常 (OK) | 3 |
| ⚠️ 警告 (WARNING) | 2 |
| ❌ 错误 (ERROR) | 2 |
| 🔍 未找到 (NOT_FOUND) | 1 |

## ❌ 错误 (ERROR) (2 条)

### `Smith2020vit`
- **标题 (bib)**: An Image is Worth 16x16 Words...
- **验证来源**: OPENALEX (标题匹配度 94%)
- **问题**:
  - 第一作者姓氏不匹配: bib='smith', 实际='dosovitskiy' (相似度 0%)

How It Works

.bib file
    │
    ▼
┌─────────────┐
│   loader    │  Parse entries with bibtexparser
└──────┬──────┘
       │  entry dict
       ▼
┌─────────────────────────────────────────┐
│              Lookup chain               │
│                                         │
│  1. DOI present?                        │
│     └─► CrossRef exact lookup           │
│                                         │
│  2. No DOI / CrossRef miss?             │
│     └─► OpenAlex fuzzy title search     │
│         (rapidfuzz token_sort_ratio)    │
└────────────────────┬────────────────────┘
                     │  api_data dict
                     ▼
           ┌──────────────────┐
           │   comparator     │  Check title / year /
           │                  │  author / count
           └────────┬─────────┘
                    │  result dict
                    ▼
           ┌──────────────────┐
           │     report       │  Markdown + JSON
           └──────────────────┘
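
The fuzzy title step in the diagram can be approximated with the standard library alone. The sketch below is a stand-in, not the package's code: it uses `difflib.SequenceMatcher` instead of rapidfuzz's `token_sort_ratio`, so scores will differ somewhat from the real tool. The lowercasing and punctuation stripping are also assumptions, and `token_sort_score` and `best_title_match` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def token_sort_score(a: str, b: str) -> float:
    """Approximate token-sort similarity on a 0-100 scale (stdlib stand-in)."""

    def normalise(text: str) -> str:
        # Lowercase, replace punctuation with spaces, then sort the tokens
        # so that word order no longer affects the comparison.
        cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
        return " ".join(sorted(cleaned.split()))

    return 100.0 * SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_title_match(bib_title: str, candidates: Iterable[str],
                     threshold: float = 82.0) -> Optional[Tuple[str, float]]:
    """Return the best-scoring candidate title, or None if below the threshold."""
    scored = [(token_sort_score(bib_title, c), c) for c in candidates]
    if not scored:
        return None
    score, title = max(scored)
    return (title, score) if score >= threshold else None
```

Sorting tokens before comparing makes the score insensitive to word order, which is why a reordered title can still score highly, while an invented title that shares few words with any candidate falls below the threshold and is reported as not found.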

Status Levels

| Status | Meaning |
|--------|---------|
| OK | All checked fields match within thresholds |
| WARNING | Minor discrepancy (year ±1, too few authors); review recommended |
| ERROR | Significant mismatch (title or author wrong); likely an error |
| NOT_FOUND | No matching paper found; possible AI hallucination or misspelling |
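
One way to read these levels is as an ordered severity scale where the worst per-field finding decides the entry's status. The sketch below assumes exactly that; the `Status` enum, `overall_status`, and `summarise` are hypothetical and may not match how the package aggregates results internally.

```python
from collections import Counter
from enum import IntEnum
from typing import Dict, Iterable


class Status(IntEnum):
    """Severity order assumed for illustration: higher means worse."""
    OK = 0
    WARNING = 1
    ERROR = 2
    NOT_FOUND = 3


def overall_status(found: bool, field_levels: Iterable[Status]) -> Status:
    """Pick the entry status: NOT_FOUND if no match, else the worst field level."""
    if not found:
        return Status.NOT_FOUND
    return max(field_levels, default=Status.OK)


def summarise(statuses: Iterable[Status]) -> Dict[str, int]:
    """Count entries per status name, as a summary table would."""
    return {status.name: n for status, n in Counter(statuses).items()}
```

For example, an entry with one WARNING field and one ERROR field would be reported as ERROR, and an entry with no findings at all as OK.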

Data Sources

OpenAlex

  • URL: openalex.org
  • Free and open, no registration or API key required
  • Rate limit: ~10 req/s (tool defaults to ~7 req/s for safety)
  • Coverage: 250M+ works
  • Tip: Providing --email enables the Polite Pool with higher rate limits

CrossRef

  • URL: crossref.org
  • Free, no registration required
  • Used only when a DOI is present in the .bib entry (exact lookup)
  • Coverage: 150M+ DOI-registered works
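
For readers who want to query the two services by hand, the following sketch builds the request URLs without sending anything. It assumes the public endpoints `https://api.openalex.org/works` (with the `search`, `per-page`, and `mailto` query parameters) and `https://api.crossref.org/works/{doi}`. The helper names are hypothetical, the DOI in the usage note is a placeholder, and the package's own `apis.py` may build its requests differently.

```python
from typing import Optional
from urllib.parse import quote, urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
CROSSREF_WORKS = "https://api.crossref.org/works"


def openalex_search_url(title: str, email: Optional[str] = None,
                        per_page: int = 5) -> str:
    """Build an OpenAlex full-text search URL for a title."""
    params = {"search": title, "per-page": per_page}
    if email:
        # Supplying an email address opts the request into the Polite Pool.
        params["mailto"] = email
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def crossref_doi_url(doi: str, email: Optional[str] = None) -> str:
    """Build a CrossRef exact-lookup URL for a DOI."""
    # DOIs contain '/', so percent-encode the whole identifier.
    url = f"{CROSSREF_WORKS}/{quote(doi, safe='')}"
    if email:
        url += "?" + urlencode({"mailto": email})
    return url
```

Calling `crossref_doi_url("10.1000/xyz123")` yields a URL with the slash encoded as `%2F`; fetching it (for instance with `urllib.request.urlopen`) is left out so the sketch has no network side effects.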

Match Thresholds

Thresholds control sensitivity. Lowering them may reduce false positives at the cost of missing real errors.

| Parameter | Default | Controls |
|-----------|---------|----------|
| --title-threshold | 82 | Minimum token_sort_ratio score for title fuzzy match |
| --author-threshold | 72 | Minimum ratio score for first-author last-name match |
| Year tolerance | ±1 = WARNING, >1 = WARNING | Preprints often appear a year before formal publication |
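
To make the effect of these settings concrete, here is a small sketch of the two kinds of check. It is illustrative only: `passes_threshold` and `year_status` are hypothetical helpers. The table above lists WARNING for any year gap, so the sketch defaults to that, and `large_gap_status` is an assumed knob for callers who want gaps of more than one year graded more strictly.

```python
def passes_threshold(score: float, threshold: float) -> bool:
    """A fuzzy score passes when it reaches the configured minimum."""
    return score >= threshold


def year_status(bib_year: int, api_year: int,
                large_gap_status: str = "WARNING") -> str:
    """Grade a year difference between the .bib entry and the database record."""
    gap = abs(bib_year - api_year)
    if gap == 0:
        return "OK"
    if gap == 1:
        # Typical of a preprint cited with its later publication year.
        return "WARNING"
    # Assumption: how larger gaps are graded is left to the caller.
    return large_gap_status
```

With the default title threshold of 82, a score of 81 fails (`passes_threshold(81, 82)` is false), whereas running with `--title-threshold 80` would let the same score pass. This is the trade-off described above: a lower threshold produces fewer false alarms but can also hide a genuinely wrong title.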

Development

# Clone and install in development mode
git clone https://github.com/your-username/bibtex-verifier.git
cd bibtex-verifier
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint
ruff check bibtex_verifier/

Project Structure

bibtex_verifier/
├── __init__.py     # version
├── loader.py       # .bib file parsing
├── apis.py         # OpenAlex & CrossRef clients, HTTP helpers
├── comparator.py   # field-level comparison logic
├── report.py       # Markdown/JSON report generation
└── cli.py          # Typer CLI (bibverify command)
tests/
├── conftest.py     # shared mock data
├── test_loader.py
├── test_apis.py
├── test_comparator.py
└── test_report.py
examples/
└── example_paper.bib   # demo file covering all verification scenarios

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/my-feature)
  3. Add tests for your changes
  4. Ensure pytest tests/ -v and ruff check bibtex_verifier/ both pass
  5. Open a Pull Request

Known Limitations

  • Conference proceedings may have lower match scores due to inconsistent venue naming across databases.
  • Chinese/Japanese author names may trigger false positives in the author comparison; consider lowering --author-threshold in such cases.
  • OpenAlex coverage of very old papers (pre-1990) may be incomplete.
  • The tool checks metadata only — it does not verify that the cited content actually supports your claim.
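
The author-name limitation is easier to see with a concrete parser. The sketch below shows one common way to pull the first author's last name out of a BibTeX `author` field. It is an assumption-laden illustration, not the package's comparator: the helper names are hypothetical, and brace-protected names such as `{Barnes and Noble}` are not handled.

```python
import re
from typing import List


def split_authors(author_field: str) -> List[str]:
    """Split a BibTeX author field on its ' and ' separator."""
    parts = re.split(r"\s+and\s+", author_field.strip())
    return [p.strip() for p in parts if p.strip()]


def first_author_last_name(author_field: str) -> str:
    """Return the first author's last name, lowercased ('' if none)."""
    authors = split_authors(author_field)
    if not authors:
        return ""
    first = authors[0]
    if "," in first:
        # "Last, First" form: the family name precedes the comma.
        last = first.split(",", 1)[0]
    else:
        # "First Last" form: assume the final token is the family name.
        last = first.split()[-1]
    return last.strip().lower()
```

Under these rules `"Vaswani, Ashish and Shazeer, Noam"` yields `vaswani`, but a family-name-first entry written without a comma, such as `"Zhang Wei"`, yields `wei` rather than `zhang`. A mismatch of that kind is reported even though the citation is correct, which is the false positive described above; writing names as `Zhang, Wei` in the .bib file avoids the ambiguity.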

License

MIT License. See LICENSE for details.


Citation

If you use this tool in your research, please cite:

@software{bibtex_verifier2025,
  title   = {BibTeX Verifier: Automatic Reference Validation Against OpenAlex and CrossRef},
  year    = {2025},
  url     = {https://github.com/your-username/bibtex-verifier},
  license = {MIT},
}
