Verify .bib file citations against academic databases (Semantic Scholar, DBLP, Google Scholar, Open Library)
HaRC - Hallucinated Reference Checker
Verify BibTeX citations against academic databases. Catches fake, misspelled, or incorrect references in your .bib files before submission.
Features
| Source | Lookup Methods | Entry Types |
|---|---|---|
| Semantic Scholar | DOI, arXiv ID, title search | Papers |
| DBLP | Title search | Papers |
| Google Scholar | Title search | Papers |
| Open Library | ISBN, title search | Books |
Additional capabilities:
- Fuzzy author matching - Handles name variations, initials, and spelling differences
- URL verification - Checks reachability and title matching for web citations
- Smart fallback - Tries multiple databases until a valid match is found
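To give a feel for what fuzzy author matching involves, here is a minimal sketch using only the standard library. It is not HaRC's actual implementation: the helpers `normalize`, `name_score`, and `author_match` are hypothetical names, and the scoring rule (surname similarity, halved when first initials disagree) is an assumption chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> tuple[str, str]:
    """Split a name into (first initial, lowercase surname)."""
    name = name.strip()
    if "," in name:
        # BibTeX style: "Surname, Given"
        last, _, first = name.partition(",")
    else:
        # Natural style: "Given Surname"
        first, _, last = name.rpartition(" ")
    first = first.strip().lower()
    last = last.strip().lower()
    return (first[:1], last)


def name_score(a: str, b: str) -> float:
    """Score two names in [0, 1]: surname similarity, halved if initials differ."""
    init_a, last_a = normalize(a)
    init_b, last_b = normalize(b)
    score = SequenceMatcher(None, last_a, last_b).ratio()
    if init_a and init_b and init_a != init_b:
        score *= 0.5
    return score


def author_match(bib_authors: list[str], db_authors: list[str]) -> float:
    """Average, over the bib authors, of each one's best match in the database list."""
    if not bib_authors or not db_authors:
        return 0.0
    best = [max(name_score(a, b) for b in db_authors) for a in bib_authors]
    return sum(best) / len(best)
```

Under this scheme "Goodfellow, Ian" and "I. Goodfellow" score as identical, while a misspelled surname lowers the score without dropping it to zero.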
Installation
# Using uv (recommended)
uv add harcx
# Using pip
pip install harcx
Quick Start
# Basic usage
harcx references.bib
# Also verify URL citations
harcx references.bib --check-urls
# Quiet mode (errors only)
harcx references.bib -q
CLI Reference
harcx [OPTIONS] BIB_FILE
Options:
  -q, --quiet               Suppress progress output
  --threshold FLOAT         Author match threshold (0.0-1.0, default: 0.6)
  --api-key KEY             Semantic Scholar API key for higher rate limits
  --check-urls              Verify URL citations for reachability
  --title-threshold FLOAT   URL title match threshold (0.0-1.0, default: 0.6)
  -h, --help                Show help message
Example Output
Parsed 50 entries from references.bib
[1/50] Checking (article): smith2023
  Trying arXiv ID: 2301.12345
  Found (author match: 1.00)
[2/50] Checking (book): goodfellow2016deep
  Trying Open Library title search
  Found (author match: 0.75)
[3/50] Checking (article): suspicious2023
  Trying Semantic Scholar title search
  Trying DBLP title search
  Trying Google Scholar title search
  ISSUE: Not found in Semantic Scholar, DBLP, or Google Scholar
============================================================
Found 1 entries requiring attention:
============================================================
[suspicious2023]
  Title: This Paper Does Not Exist
  Bib Authors: Suspicious Author
  Year: 2023
  Issue: Not found in Semantic Scholar, DBLP, or Google Scholar
Python API
from reference_checker import check_citations, check_web_citations

# Check citations - returns entries that weren't verified
issues = check_citations("references.bib")
for result in issues:
    print(f"{result.entry.key}: {result.message}")

# Check URL citations
url_issues = check_web_citations("references.bib")
for result in url_issues:
    print(f"{result.entry.key}: {result.url} - {result.message}")
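Because the functions return only the entries that need attention, the result can double as a pass/fail signal in a pre-submission script. The sketch below assumes nothing beyond the `entry.key` and `message` attributes used in the example above; `report_issues` is a hypothetical helper, and the stand-in objects exist only so the snippet runs without network access.

```python
from types import SimpleNamespace


def report_issues(issues) -> int:
    """Print one line per unverified entry and return a process exit code.

    Relies only on the `entry.key` and `message` attributes; this helper
    is hypothetical and not part of the package.
    """
    for result in issues:
        print(f"{result.entry.key}: {result.message}")
    return 1 if issues else 0


# Stand-in objects so the sketch runs offline.
# In real use: issues = check_citations("references.bib")
fake_issues = [
    SimpleNamespace(
        entry=SimpleNamespace(key="suspicious2023"),
        message="Not found in Semantic Scholar, DBLP, or Google Scholar",
    )
]
exit_code = report_issues(fake_issues)
```

Passing the return value to `sys.exit()` makes a build or commit hook fail whenever at least one citation could not be verified.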
Function Signatures
def check_citations(
    bib_file: str,
    author_threshold: float = 0.6,  # Minimum author match score
    year_tolerance: int = 1,        # Allowed year difference (±)
    api_key: str | None = None,     # Semantic Scholar API key
    verbose: bool = False,          # Print progress
) -> list[CheckResult]

def check_web_citations(
    bib_file: str,
    title_threshold: float = 0.6,   # Minimum title match score
    verbose: bool = False,          # Print progress
) -> list[WebCheckResult]
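The `title_threshold` parameter implies a similarity score between the title in the .bib entry and the title of the fetched page. How HaRC computes that score is not documented here; the following is a rough sketch of one plausible approach, using the standard library's `html.parser` and `difflib`. The names `TitleParser`, `page_title`, and `title_score` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join(parser.title.split())


def title_score(bib_title: str, html: str) -> float:
    """Case-insensitive similarity in [0, 1] between a bib title and a page title."""
    a = " ".join(bib_title.lower().split())
    b = page_title(html).lower()
    return SequenceMatcher(None, a, b).ratio()
```

A reachable URL whose page title is something like "404 Not Found" would score far below the 0.6 default and be reported, which is the kind of case a title check is meant to catch.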
How It Works
┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────┐
│ Parse .bib  │ ──▶ │    Lookup    │ ──▶ │ Fuzzy Match │ ──▶ │  Report  │
│    file     │     │ (DOI/title)  │     │   Authors   │     │  Issues  │
└─────────────┘     └──────────────┘     └─────────────┘     └──────────┘
Lookup Order (Papers):
1. DOI lookup (Semantic Scholar)
2. arXiv ID lookup (Semantic Scholar)
3. Title search (Semantic Scholar → DBLP → Google Scholar)
Lookup Order (Books):
1. ISBN lookup (Open Library)
2. Title search (Open Library → Semantic Scholar → DBLP → Google Scholar)
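The lookup orders above amount to a fallback chain: try each method in turn and stop at the first one that returns a usable record. A minimal sketch of that pattern follows. The function `first_match` and the stub lookups are hypothetical and stand in for real database clients; a real lookup could also return `None` when a record is found but fails validation, so that the chain moves on to the next source.

```python
from typing import Callable, Optional

# A lookup takes a BibTeX entry (as a dict) and returns a record or None.
Lookup = Callable[[dict], Optional[dict]]


def first_match(
    entry: dict, lookups: list[tuple[str, Lookup]]
) -> tuple[Optional[str], Optional[dict]]:
    """Try each (label, lookup) pair in order; return the first hit."""
    for label, lookup in lookups:
        record = lookup(entry)
        if record is not None:
            return label, record
    return None, None


# Hypothetical stand-ins for real database clients.
def by_doi(entry: dict) -> Optional[dict]:
    return None  # pretend the DOI lookup found nothing


def by_title_dblp(entry: dict) -> Optional[dict]:
    return {"title": entry["title"], "year": 2016}


paper_chain = [
    ("DOI (Semantic Scholar)", by_doi),
    ("Title (DBLP)", by_title_dblp),
]
source, record = first_match({"title": "Deep Learning"}, paper_chain)
```

Keeping the chain as plain data makes it straightforward to use a different order for books than for papers.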
A citation is verified when all of the following hold:
- It is found in at least one database
- The author match score is ≥ the threshold (default: 0.6)
- The year matches within the tolerance (default: ±1 year)
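The three conditions above can be written as a single predicate. This is a sketch, not the package's code: `is_verified` is a hypothetical function, its defaults mirror the documented `author_threshold` and `year_tolerance`, and skipping the year check when a year is missing is an assumption.

```python
from typing import Optional


def is_verified(
    found: bool,
    author_score: float,
    bib_year: Optional[int],
    db_year: Optional[int],
    author_threshold: float = 0.6,
    year_tolerance: int = 1,
) -> bool:
    """Return True only if every verification condition holds."""
    if not found:
        return False
    if author_score < author_threshold:
        return False
    # Assumption: a missing year on either side is not treated as a mismatch.
    if bib_year is not None and db_year is not None:
        return abs(bib_year - db_year) <= year_tolerance
    return True
```

For instance, an entry dated 2016 whose database record says 2017 still passes with the default tolerance, which accommodates preprints that were formally published a year later.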
Rate Limits
- Semantic Scholar: ~3 req/sec (faster with API key)
- DBLP: ~1 req/sec
- Google Scholar: ~0.5 req/sec (may block excessive requests)
- Open Library: ~1 req/sec
Get a free Semantic Scholar API key at semanticscholar.org/product/api
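Per-source limits like these are commonly enforced with a minimum interval between requests. The sketch below shows that idea. The `Throttle` class is hypothetical, and the rates are the approximate figures listed above, not values read from the package.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls for one source."""

    def __init__(self, requests_per_second: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the most recent permitted call

    def wait(self) -> float:
        """Sleep if needed; return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# Approximate rates taken from the list above.
throttles = {
    "Semantic Scholar": Throttle(3.0),
    "DBLP": Throttle(1.0),
    "Google Scholar": Throttle(0.5),
    "Open Library": Throttle(1.0),
}
```

Calling `throttles["DBLP"].wait()` before each DBLP request would space requests at least one second apart. The injectable `clock` and `sleep` arguments exist so the behavior can be tested without real delays.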
Development
git clone https://github.com/gurusha01/HaRC.git
cd HaRC
uv sync --all-extras
uv run pytest tests/ -v
License
MIT