Verify BibTeX references against OpenAlex & CrossRef APIs to detect errors and AI hallucinations
BibTeX Verifier
BibTeX Verifier is an open-source CLI tool that automatically validates every reference in a .bib file against two authoritative academic databases — OpenAlex and CrossRef — to catch typos, wrong years, misattributed authors, and AI-hallucinated citations before they reach your paper.
It checks the title, author, and year metadata of every entry in the .bib file, so researchers can catch citation errors and AI-hallucinated references before submitting a paper.
Features
| Feature | Description |
|---|---|
| AI hallucination detection | Flags papers that simply do not exist in any academic database |
| Dual-source verification | CrossRef (exact DOI lookup) + OpenAlex (fuzzy title search) |
| Field-level checking | Title, year, first-author last name, author count |
| Markdown report | Human-readable report with per-entry details and a summary table |
| JSON output | Machine-readable raw results for further processing |
| CLI & Python API | Use as a command or import as a library |
| No registration needed | OpenAlex is free and open; CrossRef is public |
| Rate-limit safe | Built-in throttling and exponential back-off on HTTP errors |
Quick Start
pip install bibtex-verifier
bibverify my_paper.bib
This generates my_paper.report.md with a full verification report.
Installation
From PyPI:
pip install bibtex-verifier
From source:
git clone https://github.com/Altman-conquer/bibtex-verifier.git
cd bibtex-verifier
pip install -e .
Requirements: Python 3.9+, no API keys required.
Usage
CLI
# Basic usage
bibverify paper.bib
# Save report to a custom path
bibverify paper.bib --output reports/verification.md
# Also export raw JSON results
bibverify paper.bib --json
# Use your email for higher API rate limits (Polite Pool)
bibverify paper.bib --email you@university.edu
# Adjust fuzzy-match thresholds
bibverify paper.bib --title-threshold 85 --author-threshold 70
All Options
| Option | Default | Description |
|---|---|---|
| BIB_FILE | - | Path to the .bib file to verify |
| --output / -o | <bib>.report.md | Report output file path |
| --json | false | Also write a .json results file |
| --title-threshold | 82 | Minimum fuzzy score for title match (0–100) |
| --author-threshold | 72 | Minimum fuzzy score for author match (0–100) |
| --email | - | Email for Polite Pool (faster rate limits) |
| --rate-limit | 0.15 | Seconds between API calls |
| --version / -V | - | Show version and exit |
Python API
from pathlib import Path

from bibtex_verifier.loader import load_bib
from bibtex_verifier.apis import oa_search, oa_extract, crossref_by_doi, crossref_extract
from bibtex_verifier.comparator import compare_entry
from bibtex_verifier.report import build_markdown_report

entries = load_bib(Path("paper.bib"))
results = []
for entry in entries:
    # Try CrossRef first if DOI is available
    api_data, source, score = None, None, 0
    if entry.get("doi"):
        msg = crossref_by_doi(entry["doi"])
        if msg:
            api_data = crossref_extract(msg)
            source = "crossref"
            score = 100  # DOI is exact
    # Fall back to OpenAlex
    if not source:
        paper = oa_search(entry.get("title", ""))
        if paper:
            api_data = oa_extract(paper)
            source = "openalex"
            score = paper["_match_score"]
    result = compare_entry(entry, api_data=api_data, source=source, match_score=score)
    results.append(result)

print(build_markdown_report(results, bib_filename="paper.bib"))
Sample Output
BibTeX Verifier v0.1.0
Parsing paper.bib ...
Found 8 entries — estimated time: ~5s
[ 1/8] Vaswani2017attention [DOI] OK score= 98%
[ 2/8] He2016resnet OK score= 95%
[ 3/8] Touvron2023llama WARN score= 97%
[ 4/8] Brown2020gpt3 WARN score= 99%
[ 5/8] Smith2020vit ERR score= 94%
[ 6/8] Devlin2019bert ERR score= 81%
[ 7/8] Johnson2021hallucinated N/F score= 0%
[ 8/8] LeCun1989backprop [DOI] OK score=100%
Done! Report saved to paper.report.md
┌─────────────────────────┐
│ Verification Summary │
├────────────────┬────────┤
│ ✅ OK │ 3 │
│ ⚠️ WARNING │ 2 │
│ ❌ ERROR │ 2 │
│ 🔍 NOT_FOUND │ 1 │
└────────────────┴────────┘
The generated Markdown report, which the tool writes in Chinese, looks like:
# BibTeX 引用验证报告
> 验证文件: `paper.bib` 共 8 条引用
## 汇总
| 状态 | 数量 |
|------|------|
| ✅ 正常 (OK) | 3 |
| ⚠️ 警告 (WARNING) | 2 |
| ❌ 错误 (ERROR) | 2 |
| 🔍 未找到 (NOT_FOUND) | 1 |
## ❌ 错误 (ERROR) (2 条)
### `Smith2020vit`
- **标题 (bib)**: An Image is Worth 16x16 Words...
- **验证来源**: OPENALEX (标题匹配度 94%)
- **问题**:
- 第一作者姓氏不匹配: bib='smith', 实际='dosovitskiy' (相似度 0%)
How It Works
.bib file
│
▼
┌─────────────┐
│ loader │ Parse entries with bibtexparser
└──────┬──────┘
│ entry dict
▼
┌─────────────────────────────────────────┐
│ Lookup chain │
│ │
│ 1. DOI present? │
│ └─► CrossRef exact lookup │
│ │
│ 2. No DOI / CrossRef miss? │
│ └─► OpenAlex fuzzy title search │
│ (rapidfuzz token_sort_ratio) │
└────────────────────┬────────────────────┘
│ api_data dict
▼
┌──────────────────┐
│ comparator │ Check title / year /
│ │ author / count
└────────┬─────────┘
│ result dict
▼
┌──────────────────┐
│ report │ Markdown + JSON
└──────────────────┘
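The lookup chain in the diagram can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: `lookup`, `token_sort_score`, and the injected `by_doi` / `by_title` callables are hypothetical names, and the scoring uses the standard library's `difflib` as a stand-in for rapidfuzz's `token_sort_ratio`, so the scores will differ from the real tool's. It also assumes the best title candidate is returned whatever its score, with thresholds applied later by the comparator.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional, Tuple

Record = dict  # hypothetical record shape, e.g. {"title": ..., "year": ...}


def token_sort_score(a: str, b: str) -> float:
    """Stdlib stand-in for token_sort_ratio: sort the tokens, then compare (0-100)."""
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def lookup(
    entry: dict,
    by_doi: Callable[[str], Optional[Record]],
    by_title: Callable[[str], Iterable[Record]],
) -> Tuple[Optional[Record], Optional[str], float]:
    """Return (record, source, score); source is None when nothing was found."""
    # Step 1: an exact DOI lookup wins whenever it succeeds.
    doi = entry.get("doi")
    if doi:
        record = by_doi(doi)
        if record is not None:
            return record, "crossref", 100.0

    # Step 2: no DOI or a miss, so pick the best-scoring title candidate.
    title = entry.get("title", "")
    best, best_score = None, 0.0
    for candidate in by_title(title):
        score = token_sort_score(title, candidate.get("title", ""))
        if score > best_score:
            best, best_score = candidate, score
    if best is None:
        return None, None, 0.0
    return best, "openalex", best_score
```

Passing the two fetchers in as arguments keeps the sketch free of network calls, which also makes the fallback order easy to test with fake data.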
Status Levels
| Status | Meaning |
|---|---|
| OK | All checked fields match within thresholds |
| WARNING | Minor discrepancy (year ±1, too few authors) — review recommended |
| ERROR | Significant mismatch (title or author wrong) — likely an error |
| NOT_FOUND | No matching paper found — possible AI hallucination or misspelling |
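One way to read the table: an entry takes the most severe level among its field-level findings, and NOT_FOUND applies when no record was matched at all. The sketch below illustrates that reading only; `overall_status` and `SEVERITY` are hypothetical names, and the package may combine findings differently.

```python
from typing import List

# Severity ranks for the three "found" statuses (assumed ordering).
SEVERITY = {"OK": 0, "WARNING": 1, "ERROR": 2}


def overall_status(found: bool, issue_levels: List[str]) -> str:
    """Collapse per-field issue levels into a single entry status."""
    if not found:
        return "NOT_FOUND"
    # With no issues the default rank 0 maps back to "OK".
    worst = max((SEVERITY[level] for level in issue_levels), default=0)
    return next(name for name, rank in SEVERITY.items() if rank == worst)


print(overall_status(True, ["WARNING", "ERROR"]))  # ERROR
```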
Data Sources
OpenAlex
- URL: openalex.org
- Free and open, no registration or API key required
- Rate limit: ~10 req/s (tool defaults to ~7 req/s for safety)
- Coverage: 250M+ works
- Tip: Providing --email enables the Polite Pool with higher rate limits
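The throttling and exponential back-off listed under Features could look roughly like the sketch below. It assumes a fixed pause before every call (0.15 s, matching the --rate-limit default) and a delay that doubles after each failure. `call_with_backoff`, `max_retries`, and `base_delay` are hypothetical names and values, not the package's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch: Callable[[], T],
    *,
    rate_limit: float = 0.15,   # seconds to wait before each call
    max_retries: int = 4,       # assumed retry budget
    base_delay: float = 1.0,    # assumed first back-off delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), pausing before each attempt and doubling the wait after failures."""
    for attempt in range(max_retries + 1):
        sleep(rate_limit)
        try:
            return fetch()
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists only so the timing can be checked in tests without actually waiting.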
CrossRef
- URL: crossref.org
- Free, no registration required
- Used only when a DOI is present in the .bib entry (exact lookup)
- Coverage: 150M+ DOI-registered works
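For orientation, the two kinds of request described above map onto the public REST endpoints roughly as follows. This sketch only builds the URLs and makes no network calls. The helper names and the `per_page` default are hypothetical, and the package's own client in apis.py may use different parameters.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def crossref_work_url(doi: str) -> str:
    """Exact lookup: the DOI is part of the path, so keep '/' but escape the rest."""
    return "https://api.crossref.org/works/" + quote(doi.strip(), safe="/")


def openalex_search_url(title: str, email: Optional[str] = None, per_page: int = 5) -> str:
    """Search works by title; adding mailto opts in to the Polite Pool."""
    params = {"search": title, "per-page": per_page}
    if email:
        params["mailto"] = email
    return "https://api.openalex.org/works?" + urlencode(params)


print(crossref_work_url("10.5555/12345678"))
print(openalex_search_url("Attention Is All You Need", email="you@university.edu"))
```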
Match Thresholds
Thresholds control sensitivity. Lowering them may reduce false positives at the cost of missing real errors.
| Parameter | Default | Controls |
|---|---|---|
| --title-threshold | 82 | Minimum token_sort_ratio score for title fuzzy match |
| --author-threshold | 72 | Minimum ratio score for first-author last-name match |
| Year tolerance | ±1 = WARNING, >1 = WARNING | Preprints often appear a year before formal publication |
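To make the author threshold and the year rule concrete, here is a small sketch of how such checks might be applied. It is an illustration under several assumptions: `difflib` stands in for the rapidfuzz ratio (so scores will not match the tool's), the author field is split naively on " and ", and `check_fields` plus the `first_author_last` key are hypothetical names. Following the table as written, any year difference is reported as a WARNING.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Stdlib stand-in for a fuzzy ratio score in the range 0-100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def first_author_last_name(author_field: str) -> str:
    """Handle both 'Last, First and ...' and 'First Last and ...' forms."""
    first = author_field.split(" and ")[0].strip()
    if not first:
        return ""
    if "," in first:
        return first.split(",")[0].strip().lower()
    return first.split()[-1].lower()


def check_fields(bib: dict, api: dict, author_threshold: float = 72.0) -> List[Tuple[str, str]]:
    """Return (level, message) pairs for the author and year checks."""
    issues = []
    bib_last = first_author_last_name(bib.get("author", ""))
    api_last = api.get("first_author_last", "").lower()
    if bib_last and api_last and similarity(bib_last, api_last) < author_threshold:
        issues.append(("ERROR", f"first-author last name mismatch: {bib_last!r} vs {api_last!r}"))
    bib_year, api_year = bib.get("year"), api.get("year")
    if bib_year and api_year and int(bib_year) != int(api_year):
        diff = abs(int(bib_year) - int(api_year))
        issues.append(("WARNING", f"year differs by {diff}"))
    return issues
```

Lowering `author_threshold` makes the name check more forgiving, which is the trade-off described at the top of this section.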
Development
# Clone and install in development mode
git clone https://github.com/Altman-conquer/bibtex-verifier.git
cd bibtex-verifier
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Lint
ruff check bibtex_verifier/
Project Structure
bibtex_verifier/
├── __init__.py # version
├── loader.py # .bib file parsing
├── apis.py # OpenAlex & CrossRef clients, HTTP helpers
├── comparator.py # field-level comparison logic
├── report.py # Markdown/JSON report generation
└── cli.py # Typer CLI (bibverify command)
tests/
├── conftest.py # shared mock data
├── test_loader.py
├── test_apis.py
├── test_comparator.py
└── test_report.py
examples/
└── example_paper.bib # demo file covering all verification scenarios
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feat/my-feature)
- Add tests for your changes
- Ensure pytest tests/ -v and ruff check bibtex_verifier/ both pass
- Open a Pull Request
Known Limitations
- Conference proceedings may have lower match scores due to inconsistent venue naming across databases.
- Chinese/Japanese author names may trigger false positives in the author comparison; consider lowering --author-threshold in such cases.
- OpenAlex coverage of very old papers (pre-1990) may be incomplete.
- The tool checks metadata only — it does not verify that the cited content actually supports your claim.
License
MIT License. See LICENSE for details.
Citation
If you use this tool in your research, please cite:
@software{bibtex_verifier2025,
title = {BibTeX Verifier: Automatic Reference Validation Against OpenAlex and CrossRef},
year = {2025},
url = {https://github.com/Altman-conquer/bibtex-verifier},
license = {MIT},
}