Verify .bib file citations against academic databases (Semantic Scholar, DBLP, Open Library)
HaRC - Hallucinated Reference Checker
Verify BibTeX citations against academic databases. Catches fake, misspelled, or incorrect references in your .bib files.
Supports:
- Papers: Semantic Scholar + DBLP (with DOI/arXiv ID lookup)
- Books: Open Library (with ISBN lookup)
- URLs: Reachability and title verification
Installation
# Using uv (recommended)
uv add harcx
# Using pip
pip install harcx
CLI Usage
# Check a .bib file
harcx references.bib
# Quiet mode (suppress progress output)
harcx references.bib -q
# Also check URL citations
harcx references.bib --check-urls
# Custom author match threshold (default: 0.6)
harcx references.bib --threshold 0.7
# With Semantic Scholar API key (for higher rate limits)
harcx references.bib --api-key YOUR_API_KEY
Example Output
Parsed 50 entries from references.bib
[1/50] Checking (article): smith2023
Trying arXiv ID: 2301.12345
Found (author match: 1.00)
[2/50] Checking (book): goodfellow2016deep
Trying Open Library title search
Found (author match: 0.75)
[3/50] Checking (article): fake2023
Trying Semantic Scholar title search
Trying DBLP title search
ISSUE: Not found in Semantic Scholar or DBLP
============================================================
Found 1 entries requiring attention:
============================================================
[fake2023]
Title: This Paper Does Not Exist
Bib Authors: fake fakerson
Year: 2023
Issue: Not found in Semantic Scholar or DBLP
Python API
from reference_checker import check_citations, check_web_citations
# Check citations - returns entries that weren't verified
issues = check_citations("references.bib")
for result in issues:
    print(f"{result.entry.key}: {result.message}")
# Check URL citations
url_issues = check_web_citations("references.bib")
for result in url_issues:
    print(f"{result.entry.key}: {result.url} - {result.message}")
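In a CI job or pre-commit hook, the returned list can be mapped to an exit status. The sketch below is one possible way to do that. `report` is a hypothetical helper, not part of harcx, and it relies only on the `entry.key` and `message` attributes used in the example above.

```python
import sys
from typing import Iterable


def report(issues: Iterable) -> int:
    """Print one line per unverified entry and return a shell-style exit code.

    Hypothetical helper, not part of harcx. It only assumes each result
    exposes `entry.key` and `message`.
    """
    count = 0
    for result in issues:
        # Write to stderr so the report is visible even when stdout is piped.
        print(f"{result.entry.key}: {result.message}", file=sys.stderr)
        count += 1
    # 0 means every entry was verified, 1 means at least one needs attention.
    return 1 if count else 0
```

A script could then end with `sys.exit(report(check_citations("references.bib")))` so that the job fails whenever any entry needs attention.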
API Reference
def check_citations(
    bib_file: str,
    author_threshold: float = 0.6,
    year_tolerance: int = 1,
    api_key: str | None = None,
    verbose: bool = False,
) -> list[CheckResult]
def check_web_citations(
    bib_file: str,
    title_threshold: float = 0.6,
    verbose: bool = False,
) -> list[WebCheckResult]
How It Works
- Parse - Reads the .bib file and extracts entries
- Lookup - Tries DOI → arXiv ID → title search (papers) or ISBN → title search (books)
- Match - Compares authors using fuzzy matching
- Report - Returns entries that couldn't be verified
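The lookup step can be pictured as a chain of fallbacks, where each lookup runs only if the earlier ones found nothing. The following is a minimal sketch of that pattern, not HaRC's actual code. `first_match` and the stub lookups are hypothetical names, and the use of the `doi` and `eprint` BibTeX fields is an assumption. Real lookups would query Semantic Scholar, DBLP, or Open Library.

```python
from typing import Callable, Optional

# A lookup takes a BibTeX entry (as a dict) and returns a record or None.
Lookup = Callable[[dict], Optional[dict]]


def first_match(
    entry: dict, lookups: list[tuple[str, Lookup]]
) -> tuple[Optional[str], Optional[dict]]:
    """Try each lookup in order and return (step name, record) for the first hit."""
    for name, lookup in lookups:
        record = lookup(entry)
        if record is not None:
            return name, record
    return None, None


# Stub lookups standing in for real database queries (illustrative only).
def by_doi(entry: dict) -> Optional[dict]:
    return {"title": entry["title"]} if "doi" in entry else None


def by_arxiv_id(entry: dict) -> Optional[dict]:
    return {"title": entry["title"]} if "eprint" in entry else None


def by_title(entry: dict) -> Optional[dict]:
    return None  # pretend the title search finds nothing


# Order mirrors the paper lookup described above: DOI, then arXiv ID, then title.
PAPER_LOOKUPS = [("DOI", by_doi), ("arXiv ID", by_arxiv_id), ("title search", by_title)]

# This entry has no DOI, so the chain falls through to the arXiv ID step.
step, record = first_match({"title": "Some Paper", "eprint": "2301.12345"}, PAPER_LOOKUPS)
```

Keeping the steps in an ordered list makes the priority explicit, and a separate list could describe the ISBN → title search order for books.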
A citation is verified when all of the following hold:
- Found in a database (Semantic Scholar, DBLP, or Open Library)
- Author match score ≥ threshold (default: 60%)
- Year matches within tolerance (default: ±1 year)
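These criteria can be read as a single predicate. The sketch below assumes all three conditions must hold and uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. HaRC's own fuzzy matcher may score names differently. `author_score`, `is_verified`, and the shape of `record` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def author_score(bib_authors: list[str], db_authors: list[str]) -> float:
    """Average, over the .bib authors, of the best similarity to any database author."""
    if not bib_authors or not db_authors:
        return 0.0
    total = 0.0
    for bib_name in bib_authors:
        total += max(
            SequenceMatcher(None, bib_name.lower(), db_name.lower()).ratio()
            for db_name in db_authors
        )
    return total / len(bib_authors)


def is_verified(
    bib_authors: list[str],
    bib_year: int,
    record: Optional[dict],
    author_threshold: float = 0.6,
    year_tolerance: int = 1,
) -> bool:
    """Apply the three criteria: found, authors similar enough, year close enough."""
    if record is None:  # not found in any database
        return False
    if author_score(bib_authors, record["authors"]) < author_threshold:
        return False
    return abs(bib_year - record["year"]) <= year_tolerance


# Illustrative database record (made-up data).
record = {"authors": ["Jane Smith", "Wei Chen"], "year": 2023}

one_year_off = is_verified(["Jane Smith", "Wei Chen"], 2024, record)    # within ±1 year
three_years_off = is_verified(["Jane Smith", "Wei Chen"], 2026, record)  # outside tolerance
```

Raising `author_threshold` makes the check stricter about name variants such as initials, while a larger `year_tolerance` accommodates preprints that were published in a later year.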
Development
git clone https://github.com/YOUR_USERNAME/HaRC.git
cd HaRC
uv sync --all-extras
uv run pytest tests/ -v
License
MIT