bibguard
Detect hallucinated and broken citations in academic papers.
One command to verify every reference in your .bib file against five scholarly databases. Catches phantom DOIs, fabricated arXiv IDs, author mismatches, retracted papers, and AI-hallucinated citations.
pip install bibguard # Python
npx bibguard paper.bib # Node.js (zero install)
Why
Large language models hallucinate citations. Copy-paste errors corrupt metadata. Retracted papers slip through review. bibguard catches these problems before submission.
- 5 sources: arXiv, Crossref, DBLP, Semantic Scholar, OpenAlex
- Phantom ID detection: Valid-format DOI/arXiv that doesn't resolve = hallucination signal
- Kill-shot logic: A phantom ID cannot be overridden by a similar search result
- TeX cross-audit: Find `\cite{key}` with no `.bib` entry, and orphan entries never cited
- Duplicate detection: Flag near-identical entries with different keys
- Auto-fix: Generate a corrected `.bib` with missing DOIs and eprint IDs filled in
- Type-aware: `@misc`/`@online` entries won't false-alarm as "hallucinated"
- Zero heavy dependencies: Core requires only `requests` + `bibtexparser`
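The phantom-ID and kill-shot bullets above can be pictured with a small sketch. This is not bibguard's implementation: the `classify` function, the regular expressions, the `resolves` callback, and the 0.9 / 0.6 similarity thresholds are all assumptions made for illustration. The point it shows is the ordering: a well-formed identifier that fails to resolve decides the verdict before any title-search score is looked at.

```python
import re
from typing import Callable, Optional

# Loose format checks; illustrative patterns, not bibguard's own.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def classify(
    doi: Optional[str],
    arxiv_id: Optional[str],
    resolves: Callable[[str], bool],
    best_title_similarity: float,
) -> str:
    """Return "OK", "WARN", or "FAIL" for one entry (hypothetical rule set)."""
    for identifier, pattern in ((doi, DOI_RE), (arxiv_id, ARXIV_RE)):
        if identifier and pattern.match(identifier) and not resolves(identifier):
            # Kill-shot: a valid-format ID that does not resolve is decisive,
            # so the title-search score below is never consulted.
            return "FAIL"
    # Assumed thresholds, chosen only to make the sketch concrete.
    if best_title_similarity >= 0.9:
        return "OK"
    if best_title_similarity >= 0.6:
        return "WARN"
    return "FAIL"


# A stand-in resolver backed by a set, so the sketch runs offline.
known_ids = {"10.1000/real"}
print(classify("10.1000/fake", None, known_ids.__contains__, 0.95))  # FAIL
print(classify("10.1000/real", None, known_ids.__contains__, 0.95))  # OK
```

In a real checker the `resolves` callback would query the relevant database; keeping it injectable makes the decision rule testable without network access.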
Choose Your Tool
| If you are... | Use | Install |
|---|---|---|
| A researcher who prefers GUI | Desktop App | Download .dmg / .msi / .AppImage — double-click, done |
| A LaTeX user comfortable with terminal | Python CLI | pip install bibguard |
| A JS/TS developer or CI pipeline | npm package | npx bibguard paper.bib |
| Checking citations while browsing | Browser Extension | Chrome / Firefox |
| Need semantic NLI + Bayesian scoring | IntegriRef | Full L0–L4 stack |
Install
# Python
pip install bibguard # minimal
pip install bibguard[fast] # + RapidFuzz for better title matching
pip install bibguard[all] # + RapidFuzz + PyMuPDF for PDF parsing
# Node.js / TypeScript (zero dependencies)
npx bibguard paper.bib # run directly
npm install bibguard # as library
# Desktop app (no terminal needed)
# Download from https://github.com/GeoffreyWang1117/bibguard-desktop/releases
# Browser extension
# Download from https://github.com/GeoffreyWang1117/bibguard-ext
Python requires 3.9+. Node.js requires 18+. Desktop app requires no dependencies.
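The `fast` extra adds RapidFuzz for title matching, and the feature list mentions flagging near-identical entries. As a rough illustration of what title matching involves, the sketch below uses the standard library's `difflib` as a stand-in for RapidFuzz. The function names, the normalization rules, and the 0.95 threshold are assumptions, not bibguard internals.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_title(title: str) -> str:
    """Lowercase, drop BibTeX braces and punctuation, collapse whitespace.

    ASCII-only for brevity; accented characters are treated as separators.
    """
    title = re.sub(r"[{}]", "", title.lower())
    title = re.sub(r"[^a-z0-9 ]+", " ", title)
    return " ".join(title.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_duplicates(titles_by_key: dict[str, str], threshold: float = 0.95):
    """Return pairs of keys whose titles are near-identical."""
    return [
        (k1, k2)
        for (k1, t1), (k2, t2) in combinations(titles_by_key.items(), 2)
        if title_similarity(t1, t2) >= threshold
    ]


titles = {
    "vaswani2017": "{Attention} Is All You Need",
    "attention_paper": "Attention is all you need.",
    "he2016": "Deep Residual Learning for Image Recognition",
}
print(find_duplicates(titles))  # [('vaswani2017', 'attention_paper')]
```

The pairwise comparison is quadratic in the number of entries, which is acceptable for a typical bibliography but is one reason a faster matcher is worth an optional dependency.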
Usage
CLI
# Basic: verify all entries in a .bib file
bibguard references.bib
# With TeX cross-audit (finds phantom \cite and orphan entries)
bibguard references.bib --tex main.tex
# Save report + auto-fix
bibguard references.bib --tex main.tex --out report.md --fix fixed.bib
# Parallel verification (auto-selects workers for large files)
bibguard references.bib -w 4
# JSON output (for CI pipelines)
bibguard references.bib --json --out report.json
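To make the `--tex` cross-audit concrete, here is a minimal sketch of the idea: collect the keys cited in the TeX source, collect the keys defined in the `.bib` file, and compare the two sets. The `cross_audit` function and both regular expressions are assumptions for illustration; a real implementation would also need to handle comments, `\input` files, and `\nocite`.

```python
import re

# Matches \cite, \citep, \citet*, etc., with optional [..] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# Matches "@type{key," at the start of a BibTeX entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def cross_audit(tex_source: str, bib_source: str):
    """Return (cited but undefined keys, defined but never cited keys)."""
    cited = {
        key.strip()
        for group in CITE_RE.findall(tex_source)
        for key in group.split(",")
        if key.strip()
    }
    defined = {
        key
        for kind, key in ENTRY_RE.findall(bib_source)
        if kind.lower() not in {"comment", "string", "preamble"}
    }
    return sorted(cited - defined), sorted(defined - cited)


tex = r"See \cite{smith2020, doe2021} and \citep[p.~3]{ghost2023}."
bib = """
@article{smith2020, title={A}}
@misc{doe2021, title={B}}
@book{unused1999, title={C}}
"""
missing, orphans = cross_audit(tex, bib)
print(missing)  # ['ghost2023']
print(orphans)  # ['unused1999']
```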
Python API
from bibguard import verify_bib, verify_entry
# Verify entire .bib file (parallel with 4 workers)
results, report = verify_bib("references.bib", tex_path="main.tex", workers=4)
for r in results:
    if r.overall != "OK":
        print(f"{r.overall}: {r.key} -- {r.title}")
# Verify a single entry
from bibguard.parsers.bibtex import parse_bib
entries = parse_bib("references.bib")
result = verify_entry(entries[0])
print(result.overall, result.checks)
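A common next step is to group results by status before reporting. The helper below is a sketch that relies only on the `overall` and `key` attributes used in the example above; `group_by_status` is a hypothetical name, and the demo objects are stand-ins so the snippet runs without bibguard installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_status(results):
    """Map each overall status to the list of entry keys that received it."""
    groups = defaultdict(list)
    for r in results:
        groups[r.overall].append(r.key)
    return dict(groups)


# Stand-in objects; real results would come from verify_bib(...).
demo = [
    SimpleNamespace(key="smith2020", overall="OK"),
    SimpleNamespace(key="ghost2023", overall="FAIL"),
    SimpleNamespace(key="doe2021", overall="WARN"),
    SimpleNamespace(key="lee2019", overall="OK"),
]
print(group_by_status(demo))
# {'OK': ['smith2020', 'lee2019'], 'FAIL': ['ghost2023'], 'WARN': ['doe2021']}
```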
TypeScript / Browser
import { parseBib, verifyAll } from "bibguard";
const entries = parseBib(bibText);
const results = await verifyAll(entries, (i, total, key, status) => {
  console.log(`[${i}/${total}] ${key}: ${status}`);
});
All five APIs support CORS, so the library works directly in the browser without a proxy.
Exit codes
| Code | Meaning |
|---|---|
| 0 | All entries OK or WARN |
| 1 | At least one FAIL |
| 2 | Input error (file not found) |
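The table implies a simple rule: WARN never fails a run, and a single FAIL does. The sketch below restates that rule and shows one way a script might gate on the exit code. The helper names are hypothetical, and `run_bibguard` assumes only that the `bibguard` command is on the PATH and accepts a `.bib` path as shown earlier.

```python
import subprocess

EXIT_MEANINGS = {
    0: "all entries OK or WARN",
    1: "at least one FAIL",
    2: "input error (file not found)",
}


def exit_code_for(statuses):
    """Mirror the table: any FAIL gives 1, otherwise 0 (WARN does not fail)."""
    return 1 if "FAIL" in statuses else 0


def describe(code):
    """Human-readable meaning of a bibguard exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_bibguard(bib_path, runner=subprocess.run):
    """Run the CLI and return its exit code; `runner` is injectable for tests."""
    return runner(["bibguard", bib_path]).returncode


print(exit_code_for(["OK", "WARN", "OK"]))  # 0
print(exit_code_for(["OK", "FAIL"]))        # 1
print(describe(2))                          # input error (file not found)
```

In a CI job, a nonzero return from `run_bibguard("references.bib")` would be the signal to fail the build.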
Benchmark
Golden test set (58 cases)
Reproduce with python tests/bench_golden.py.
| Category | Metric | Result |
|---|---|---|
| Hallucinated (14 fabricated) | Detected as FAIL | 14/14 (100%) |
| Chimera (5 mixed-metadata) | Detected as >= WARN | 5/5 (100%) |
| Real papers (10 legitimate) | False positive (FAIL) | 0/10 (0%) |
Large-scale validation (200 cases)
Sampled from crawled datasets (800 hallucinated, 400 chimera, 656 real, 153 retracted). Reproduce with python tests/bench_large.py.
| Category | N | OK | WARN | FAIL | Key metric |
|---|---|---|---|---|---|
| Hallucinated | 50 | 0 | 0 | 50 | 100% detected (all FAIL) |
| Chimera | 50 | 0 | 18 | 32 | 100% detected |
| Real papers | 50 | 43 | 7 | 0 | 86% clean, 0% false positive |
| Retracted | 50 | 49 | 1 | 0 | 2% flagged (L0 limitation) |
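The key metrics in the table follow directly from the OK / WARN / FAIL counts. The snippet below recomputes them, treating WARN or FAIL as "flagged" (the golden set counts a chimera as detected at WARN or above) and treating only FAIL on a real paper as a false positive. The variable and function names are illustrative.

```python
# (OK, WARN, FAIL) counts copied from the large-scale table above.
rows = {
    "Hallucinated": (0, 0, 50),
    "Chimera": (0, 18, 32),
    "Real papers": (43, 7, 0),
    "Retracted": (49, 1, 0),
}


def pct(part, total):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / total)


def flagged_rate(counts):
    """Share of entries rated WARN or FAIL."""
    ok, warn, fail = counts
    return pct(warn + fail, ok + warn + fail)


for name, (ok, warn, fail) in rows.items():
    n = ok + warn + fail
    print(f"{name}: OK {pct(ok, n)}%, FAIL {pct(fail, n)}%, flagged {flagged_rate((ok, warn, fail))}%")
# Hallucinated: OK 0%, FAIL 100%, flagged 100%
# Chimera: OK 0%, FAIL 64%, flagged 100%
# Real papers: OK 86%, FAIL 0%, flagged 14%
# Retracted: OK 98%, FAIL 0%, flagged 2%
```

Note that 7 of the 50 real papers (14%) receive a WARN; the 0% false-positive figure refers to FAIL verdicts only.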
For semantic NLI, citation graph analysis, and Bayesian risk scoring, see IntegriRef.
AI Coding Assistant Integration
bibguard ships with skill/rule definitions for major AI coding assistants.
Claude Code
mkdir -p ~/.claude/commands
curl -o ~/.claude/commands/bibguard.md \
https://raw.githubusercontent.com/GeoffreyWang1117/bibguard/main/.claude/commands/bibguard.md
Then use /bibguard paper.bib in Claude Code.
OpenAI Codex CLI
mkdir -p ~/.codex/skills/bibguard
curl -o ~/.codex/skills/bibguard/SKILL.md \
https://raw.githubusercontent.com/GeoffreyWang1117/bibguard/main/.codex/skills/bibguard/SKILL.md
Cursor
mkdir -p .cursor/rules
curl -o .cursor/rules/bibguard.md \
https://raw.githubusercontent.com/GeoffreyWang1117/bibguard/main/.cursor/rules/bibguard.md
Any other assistant
bibguard paper.bib --json --out report.json
API sources
| Source | Lookup method | CORS | Coverage |
|---|---|---|---|
| arXiv | ID resolution | Yes | CS, Physics, Math |
| Crossref | DOI resolution | Yes | 150M+ records |
| DBLP | Title search | Yes | CS papers |
| Semantic Scholar | Title search | Yes | 200M+ papers |
| OpenAlex | Title search | Yes | 250M+ works |
All queries respect rate limits. No API keys required.
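The README does not describe how rate limits are respected. One simple approach, shown here purely as an assumption, is a per-source limiter that enforces a minimum interval between requests. `MinIntervalLimiter` is a hypothetical class, and the interval values are placeholders rather than the limits any of these services publish.

```python
import time


class MinIntervalLimiter:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep    # without real waiting
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# One limiter per source; the intervals are placeholder values.
limiters = {
    "crossref": MinIntervalLimiter(0.1),
    "arxiv": MinIntervalLimiter(3.0),
}

limiters["crossref"].wait()  # first call returns immediately
limiters["crossref"].wait()  # second call sleeps for the rest of the interval
```

Calling `wait()` immediately before each HTTP request keeps the requests to a given source spaced out, independent of how fast the entries are processed.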
Contributing
git clone https://github.com/GeoffreyWang1117/bibguard.git
cd bibguard
pip install -e ".[dev]"
pytest
Related
- IntegriRef -- Full L0-L4 verification stack with semantic NLI (93.5%), citation graph analysis, and Bayesian risk scoring
- bibguard-desktop -- Cross-platform desktop app (Windows/macOS/Linux, no dependencies)
- bibguard-js -- TypeScript version (zero deps, browser-native)
- bibguard-ext -- Chrome/Firefox browser extension
Contributors
See CONTRIBUTORS.md for detailed attribution.
- Geoffrey Wang -- Architecture, core algorithms, phantom-ID detection, kill-shot logic, benchmark design
- Claude (Anthropic) -- Modular refactoring, output formatting, packaging, documentation
Support
If you find bibguard useful, please consider giving it a star on GitHub — it helps others discover the project.
Have a bug report, feature request, or suggestion? Open an issue — all feedback is welcome.
Roadmap
- Parallel verification -- shipped in v0.3.1: `--workers N` flag for concurrent verification via a thread pool
- Async I/O -- migrate from `requests` to `asyncio` + `aiohttp` for further speedup on 500+ entry files
- Batch API queries -- leverage Crossref, Semantic Scholar, and OpenAlex batch endpoints to reduce per-entry overhead
- Caching layer -- local cache for repeated lookups across runs
- Retracted paper detection -- integrate the Retraction Watch database for L0-level retraction flagging
License
Apache License 2.0. See LICENSE for details.
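Summary (translated from the Chinese README)
bibguard: a citation-hallucination detector for academic papers
One command to detect fabricated citations in a paper. Python + TypeScript + desktop app + browser extension.
Choose your tool
| Use case | Recommended option | Install |
|---|---|---|
| Researchers who prefer not to use the command line | Desktop app | Download .dmg / .msi / .AppImage and double-click to run |
| LaTeX users comfortable with the terminal | Python CLI | pip install bibguard |
| JS/TS developers or CI pipelines | npm package | npx bibguard paper.bib |
| Checking citations while browsing the web | Browser extension | Chrome / Firefox |
| Need semantic NLI + Bayesian scoring | IntegriRef | Full L0–L4 verification stack |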
pip install bibguard # Python
npx bibguard paper.bib # Node.js
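Core capabilities
- Phantom DOI / arXiv ID detection -- a well-formed ID that does not exist is the strongest hallucination signal
- Kill-shot logic -- a phantom ID is not overridden by search results for a similar paper
- @misc type awareness -- non-paper entries (news, documentation, etc.) are not misreported as hallucinations
- TeX cross-audit -- finds `\cite{key}` keys that have no definition in the `.bib` file
- Auto-fix -- fills in missing DOI and eprint fields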
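Benchmarks
| Test set | Hallucinations detected | False positives on real papers (FAIL) | Real papers rated OK |
|---|---|---|---|
| Golden (58 cases) | 100% (14/14) | 0% | -- |
| Large-scale (200 cases) | 100% (50/50) | 0% | 86% |
The TypeScript version has zero dependencies and runs directly in the browser (all 5 APIs support CORS). Claude Code, Codex, and Cursor are supported.
For the full version, use IntegriRef (L0-L4, including semantic NLI at 93.5% and Bayesian risk scoring).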
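Support the project
If bibguard helps you, please give it a Star on GitHub ⭐ to help more people discover the tool. Suggestions or questions? Open an Issue.
Roadmap
- Parallel verification -- shipped in v0.3.1: `--workers N` for thread-pool parallel verification
- Async I/O -- migrate from `requests` to `asyncio` + `aiohttp` to further speed up files with 500+ entries
- Batch API queries -- use the Crossref, Semantic Scholar, and OpenAlex batch endpoints to reduce per-entry overhead
- Local cache -- cache verified entries across runs
- Retracted paper detection -- integrate the Retraction Watch database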