Verify .bib file citations against academic databases (Semantic Scholar, DBLP, Open Library)

HaRC - Hallucinated Reference Checker

Verify BibTeX citations against academic databases. Catches fake, misspelled, or incorrect references in your .bib files.

Supports:

  • Papers: Semantic Scholar + DBLP (with DOI/arXiv ID lookup)
  • Books: Open Library (with ISBN lookup)
  • URLs: Reachability and title verification

Installation

# Using uv (recommended)
uv add harcx

# Using pip
pip install harcx

CLI Usage

# Check a .bib file
harcx references.bib

# Quiet mode (suppress progress output)
harcx references.bib -q

# Also check URL citations
harcx references.bib --check-urls

# Custom author match threshold (default: 0.6)
harcx references.bib --threshold 0.7

# With Semantic Scholar API key (for higher rate limits)
harcx references.bib --api-key YOUR_API_KEY

Example Output

Parsed 50 entries from references.bib
[1/50] Checking (article): smith2023
    Trying arXiv ID: 2301.12345
  Found (author match: 1.00)
[2/50] Checking (book): goodfellow2016deep
    Trying Open Library title search
  Found (author match: 0.75)
[3/50] Checking (article): fake2023
    Trying Semantic Scholar title search
    Trying DBLP title search
  ISSUE: Not found in Semantic Scholar or DBLP

============================================================
Found 1 entries requiring attention:
============================================================

[fake2023]
  Title: This Paper Does Not Exist
  Bib Authors: fake fakerson
  Year: 2023
  Issue: Not found in Semantic Scholar or DBLP

Python API

from reference_checker import check_citations, check_web_citations

# Check citations - returns entries that weren't verified
issues = check_citations("references.bib")

for result in issues:
    print(f"{result.entry.key}: {result.message}")

# Check URL citations
url_issues = check_web_citations("references.bib")

for result in url_issues:
    print(f"{result.entry.key}: {result.url} - {result.message}")
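
The returned list can also drive a CI gate. The following is a minimal sketch, not part of the library: `summarize_issues` is a hypothetical helper, and the `Entry` and `Result` classes are stand-ins that only mimic the `entry.key` and `message` attributes used above, so the example runs without network access.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str  # hypothetical stand-in for a parsed BibTeX entry


@dataclass
class Result:
    entry: Entry
    message: str  # hypothetical stand-in for a check result


def summarize_issues(issues) -> tuple[int, list[str]]:
    """Return (exit_code, report_lines) for a list of result-like objects."""
    lines = [f"{r.entry.key}: {r.message}" for r in issues]
    exit_code = 1 if lines else 0  # non-zero when anything needs attention
    return exit_code, lines


# With the real library, this list would come from check_citations("references.bib").
demo = [Result(Entry("fake2023"), "Not found in Semantic Scholar or DBLP")]
code, report = summarize_issues(demo)
print("\n".join(report))
```

In a CI script, `code` could be passed to `sys.exit` so that unverified citations fail the build.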

API Reference

def check_citations(
    bib_file: str,
    author_threshold: float = 0.6,
    year_tolerance: int = 1,
    api_key: str | None = None,
    verbose: bool = False,
) -> list[CheckResult]

def check_web_citations(
    bib_file: str,
    title_threshold: float = 0.6,
    verbose: bool = False,
) -> list[WebCheckResult]

How It Works

  1. Parse - Reads the .bib file and extracts entries
  2. Lookup - Tries DOI → arXiv ID → title search (papers) or ISBN → title search (books)
  3. Match - Compares authors using fuzzy matching
  4. Report - Returns entries that couldn't be verified
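
The lookup order in step 2 can be pictured as a fallback chain that stops at the first hit. This is an illustrative sketch only, not HaRC's actual code: `first_match`, the toy `FAKE_DB`, and the BibTeX field names (`doi`, `eprint`, `title`) are assumptions, and the lambdas stand in for real database queries.

```python
from typing import Callable, Optional

# A lookup takes a BibTeX entry (as a dict) and returns a record or None.
Lookup = Callable[[dict], Optional[dict]]


def first_match(
    entry: dict, lookups: list[tuple[str, str, Lookup]]
) -> Optional[tuple[str, dict]]:
    """Try each (name, required_field, lookup) in order; return the first hit."""
    for name, field, lookup in lookups:
        if not entry.get(field):
            continue  # skip strategies whose identifier is missing from the entry
        record = lookup(entry)
        if record is not None:
            return name, record
    return None


# Toy in-memory "database" keyed by arXiv ID; real lookups would call a web API.
FAKE_DB = {"2301.12345": {"title": "An Example Paper", "year": 2023}}

paper_lookups = [
    ("DOI", "doi", lambda e: None),  # pretend the DOI lookup misses
    ("arXiv ID", "eprint", lambda e: FAKE_DB.get(e["eprint"])),
    ("title search", "title", lambda e: None),  # pretend the title search misses
]

entry = {"title": "An Example Paper", "eprint": "2301.12345"}
hit = first_match(entry, paper_lookups)
print(hit)
```

A book chain would follow the same shape with ISBN first and title search second.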

A citation is verified when:

  • Found in a database (Semantic Scholar, DBLP, or Open Library)
  • Author match score ≥ threshold (default: 0.6, i.e. 60%)
  • Year matches within tolerance (default: ±1 year)
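
These criteria can be sketched as a single predicate. This sketch assumes that all three conditions must hold together, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in for the fuzzy matcher; how HaRC actually scores authors is not stated here. The function names and the 0.8 per-name cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two author names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def author_match_score(bib_authors, db_authors, per_name_cutoff=0.8) -> float:
    """Fraction of .bib authors that closely match some database author.

    The per-name cutoff is an assumption made for this illustration.
    """
    if not bib_authors:
        return 0.0
    matched = sum(
        1
        for a in bib_authors
        if any(name_similarity(a, b) >= per_name_cutoff for b in db_authors)
    )
    return matched / len(bib_authors)


def is_verified(record, bib_authors, bib_year, author_threshold=0.6, year_tolerance=1):
    """Apply the three criteria: found, author score, and year tolerance."""
    if record is None:  # not found in any database
        return False
    score = author_match_score(bib_authors, record["authors"])
    year_ok = abs(record["year"] - bib_year) <= year_tolerance
    return score >= author_threshold and year_ok


record = {
    "authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"],
    "year": 2016,
}
print(is_verified(record, ["Ian Goodfellow", "Yoshua Bengio"], 2017))
print(is_verified(record, ["Ian Goodfellow", "Yoshua Bengio"], 2019))
```

The first call passes because both authors match and 2017 is within one year of 2016; the second fails on the year check alone.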

Development

git clone https://github.com/YOUR_USERNAME/HaRC.git
cd HaRC
uv sync --all-extras
uv run pytest tests/ -v

License

MIT

Download files

Source Distribution

harcx-0.1.1.tar.gz (15.7 kB)

Uploaded Source

Built Distribution

harcx-0.1.1-py3-none-any.whl (17.7 kB)

Uploaded Python 3

File details

Details for the file harcx-0.1.1.tar.gz.

File metadata

  • Download URL: harcx-0.1.1.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for harcx-0.1.1.tar.gz
Algorithm Hash digest
SHA256 ad71a027311ddc339a897127bcdd52cc33e79e5a559a68f99f24996e2310fe7f
MD5 5ceca0d2d61c6ad7b881e7e7c2ab1f1b
BLAKE2b-256 e4ac03de3f8c19f8599e9a08d7c9ce2669c3c981c90d4de9eca86fe5a310eea2

File details

Details for the file harcx-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: harcx-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for harcx-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 736faf5706833529cb902efa7f4067507d2f181fd3d06f658edfe459d8c50b30
MD5 603bf6dd83cee8a65787dce86b0e7021
BLAKE2b-256 bd8556fb9bf57b1f549df51dce975eb0269a1b9291d46c56f837b449561ce8c7
