
Academic reference verification tool with multi-database search and AI-powered fraud detection


VerifyRef

License: GPL v3 | Python 3.8+

A tool for verifying the authenticity of academic references in PDF documents using multiple academic databases and optional AI-powered analysis.

Important Note for Reviewers
This tool may produce false positives: authentic references can sometimes be flagged as suspicious or unverified. Common causes include:

  • New papers not yet indexed in databases
  • Author name format variations (e.g., "J. Smith" vs "John Smith")
  • Regional or specialized venues with limited database coverage
  • OCR/extraction errors from PDF processing

Always manually verify flagged references before making decisions. VerifyRef is a screening tool to assist human reviewers, not a replacement for careful manual checking.

Why VerifyRef?

While reviewing a journal submission, I found a reference that listed my brother, a businessman with no connection to cryptography, as a co-author, alongside a well-known researcher, of a paper on symmetric-key cryptanalysis. My brother had nothing to do with that paper. This prompted me to inspect that reference and the others in the submission, which turned out to be partially AI-generated and to contain multiple fake references.

Checking dozens of references by hand was time-consuming, so I created VerifyRef to extract references automatically and verify them against trusted academic databases. Here is the verification summary it produced for that submission:

                   Verification Summary                   
╭──────────────────────────┬───────┬────────────┬────────╮
│ Classification           │ Count │ Percentage │ Status │
├──────────────────────────┼───────┼────────────┼────────┤
│ [+] AUTHENTIC            │    11 │      61.1% │   *    │
│ [?] SUSPICIOUS           │     6 │      33.3% │   *    │
│ [X] FAKE                 │     0 │       0.0% │   -    │
│ [~] AUTHOR MANIPULATION  │     1 │       5.6% │   *    │
│ [-] FABRICATED           │     0 │       0.0% │   -    │
│ [!] INCONCLUSIVE         │     0 │       0.0% │   -    │
╰──────────────────────────┴───────┴────────────┴────────╯

[REVIEW RECOMMENDED] Some references require manual verification

This tool helps reviewers quickly identify potentially problematic references and signs of AI-generated content, making peer review more efficient. VerifyRef is an assistant that streamlines verification, not a replacement for human judgment: it may occasionally misclassify authentic references, so always double-check flagged items manually.

Features

  • Multi-database verification across 8+ academic databases
  • PDF processing using GROBID (works out of the box with the public GROBID server)
  • Retraction detection via CrossRef and Retraction Watch
  • Author manipulation detection (real titles with fake authors)
  • Optional AI verification using free (Gemini, Groq, Ollama) or paid (OpenAI) providers
  • Book reference handling for textbooks that may not appear in paper databases
  • Parallel processing with multi-threaded database queries
  • JSON and text output formats
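The multi-threaded database queries can be pictured roughly as follows. This is a minimal sketch, not VerifyRef's actual code: query_all and the plain search callables are hypothetical stand-ins for the real *_client.py modules. Each database is queried in its own worker thread, so one slow service does not hold up the others, and a failing service yields an empty result instead of aborting the run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# Hypothetical client shape: take a title, return candidate titles found.
# Real database clients would return richer records (authors, year, DOI, ...).
SearchFn = Callable[[str], List[str]]


def query_all(title: str, clients: Dict[str, SearchFn], max_workers: int = 8) -> Dict[str, List[str]]:
    """Query every database concurrently and collect results by database name."""
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per database; map each future back to its database name.
        futures = {pool.submit(fn, title): name for name, fn in clients.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # An unreachable or failing database must not abort the others.
                results[name] = []
    return results
```

Collecting results with as_completed means the total wait is roughly that of the slowest database rather than the sum of all of them.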

Installation

From PyPI (Recommended)

pip install verifyref

# Run verification
verifyref paper.pdf -o results.txt

From Source

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run verification (uses public GROBID server automatically)
python verifyref.py paper.pdf -o results.txt

Docker Installation

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
docker build -t verifyref .

# Interactive mode
docker run -it --rm -v "$(pwd):/app/workspace" verifyref

# Inside the container:
cd /app/workspace/
verifyref paper.pdf -o results.txt

Local GROBID (Optional)

For faster processing or privacy, run GROBID locally:

docker run -d -p 8070:8070 lfoppiano/grobid:0.8.2
export GROBID_URL="http://localhost:8070"
python verifyref.py paper.pdf

VerifyRef automatically detects and uses local GROBID when available.

Usage

Basic Usage

# Verify references in a PDF
python verifyref.py paper.pdf -o results.txt

# Search for a specific citation
python verifyref.py --cite "Differential Cryptanalysis of DES"

# Verify a single reference
python verifyref.py --verify "Author, A.: Title. Venue, 2024"
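The --verify flag takes a single reference string such as "Author, A.: Title. Venue, 2024". As a rough illustration of how that shape can be split into fields, here is a minimal regex-based sketch. It is hypothetical: parse_reference and its pattern are not VerifyRef's parser (PDF references go through GROBID and extractor/reference_parser.py), and real references vary far more than this one layout.

```python
import re
from typing import Optional, TypedDict


class ParsedReference(TypedDict):
    authors: str
    title: str
    venue: str
    year: int


# Hypothetical pattern for the "Authors: Title. Venue, Year" layout only.
_REF_PATTERN = re.compile(
    r"^(?P<authors>.+?):\s*"     # authors: everything up to the first colon
    r"(?P<title>.+?)\.\s+"       # title: ends at the first period followed by a space
    r"(?P<venue>.+?),\s*"        # venue: runs up to the comma before the year
    r"(?P<year>\d{4})\.?\s*$"    # four-digit year, optional trailing period
)


def parse_reference(text: str) -> Optional[ParsedReference]:
    """Split one reference string into fields, or return None if it does not fit."""
    match = _REF_PATTERN.match(text.strip())
    if match is None:
        return None
    return ParsedReference(
        authors=match["authors"],
        title=match["title"],
        venue=match["venue"],
        year=int(match["year"]),
    )
```

Requiring whitespace after the title's closing period keeps version numbers such as "v2.0" inside the title instead of splitting there.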

Advanced Options

# Verification rigor levels
python verifyref.py paper.pdf --rigor strict    # High precision
python verifyref.py paper.pdf --rigor balanced  # Default
python verifyref.py paper.pdf --rigor lenient   # High recall

# Context-aware search
python verifyref.py --cite "cryptanalysis" --context cs
python verifyref.py --cite "gene therapy" --context bio

# AI-enhanced verification
python verifyref.py paper.pdf --enable-ai

# Verbose output
python verifyref.py paper.pdf --verbose
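One way to picture the rigor levels is as different acceptance thresholds on match similarity: a higher bar accepts fewer matches (higher precision), and a lower bar accepts more (higher recall). This is a hypothetical sketch. Only the balanced value mirrors the 55% AUTHENTIC cut-off in the Classification System section; the strict and lenient numbers are illustrative guesses, not VerifyRef's settings.

```python
# Hypothetical thresholds. Only "balanced" (0.55) comes from this README;
# the "strict" and "lenient" values are illustrative assumptions.
RIGOR_THRESHOLDS = {"strict": 0.70, "balanced": 0.55, "lenient": 0.40}


def accepts_match(similarity: float, rigor: str = "balanced") -> bool:
    """Return True if a database match is strong enough at the given rigor level."""
    if rigor not in RIGOR_THRESHOLDS:
        raise ValueError(f"unknown rigor level: {rigor!r}")
    return similarity > RIGOR_THRESHOLDS[rigor]
```

With these numbers, a 60% match passes under balanced but not under strict, and a 45% match passes only under lenient.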

AI Verification Setup

VerifyRef supports multiple AI providers. Ollama is recommended because it runs locally, is free, and has no rate limits:

# Option 1: Ollama (free, local, no rate limits)
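brew install ollama      # macOS (Homebrew); other platforms have their own installers
ollama serve             # runs in the foreground; use a second terminal for the next steps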
ollama pull llama3.2
export AI_PROVIDER="ollama"
python verifyref.py paper.pdf --enable-ai

# Option 2: Google Gemini (free tier)
export AI_PROVIDER="gemini"
export GOOGLE_GEMINI_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai

# Option 3: Groq (free tier)
export AI_PROVIDER="groq"
export GROQ_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai
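The options above follow one pattern: AI_PROVIDER selects the provider, and hosted providers also need their own key variable. A minimal sketch of that lookup, assuming exactly the variable names shown above, is below. resolve_ai_provider is hypothetical, not VerifyRef's code. OpenAI is left out because the AI_PROVIDER value for it is not documented here.

```python
import os
from typing import Mapping, Optional, Tuple

# Key variable per provider, as shown in this README. Ollama runs locally
# and needs no key. OpenAI is omitted: its AI_PROVIDER value is not documented.
PROVIDER_KEY_VARS = {
    "ollama": None,
    "gemini": "GOOGLE_GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def resolve_ai_provider(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (provider, api_key) from the environment, or raise ValueError."""
    provider = env.get("AI_PROVIDER", "").strip().lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unsupported or missing AI_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEY_VARS[provider]
    if key_var is None:
        return provider, None
    key = env.get(key_var, "")
    if not key:
        raise ValueError(f"{key_var} must be set when AI_PROVIDER={provider}")
    return provider, key
```

Passing a plain dict instead of os.environ makes the lookup easy to test without touching the real environment.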

Classification System

VerifyRef uses a 5-category system to evaluate reference authenticity:

Category              Criteria                                           Action
AUTHENTIC             High similarity (>55%), multiple database matches  Accept
SUSPICIOUS            Moderate similarity (25-55%), limited evidence     Manual review
FABRICATED            Very low similarity (<25%), no database matches    Investigate
AUTHOR_MANIPULATION   Title matches but authors differ significantly     Flag misconduct
INCONCLUSIVE          Parsing errors, books, or network issues           Re-verify

Retracted papers are flagged with a warning regardless of classification.
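The table can be read as a decision procedure over match evidence. The sketch below encodes its similarity bands under stated assumptions. It is hypothetical and simplified: the 0.5 author-similarity cut-off and the "two or more databases" reading of "multiple database matches" are illustrative choices, and the real logic in verifier/classifier.py weighs more evidence. Retraction is a separate warning and is not modeled here.

```python
def classify(title_similarity: float, author_similarity: float,
             database_matches: int, parse_failed: bool = False) -> str:
    """Map match evidence (similarities in [0, 1]) to a category from the table above."""
    # Unparseable references cannot be judged. VerifyRef also treats books
    # and network failures as INCONCLUSIVE; those cases are not modeled here.
    if parse_failed:
        return "INCONCLUSIVE"
    # Strong title match but mostly different authors (0.5 is an assumption).
    if title_similarity > 0.55 and author_similarity < 0.5:
        return "AUTHOR_MANIPULATION"
    # High similarity confirmed by at least two databases.
    if title_similarity > 0.55 and database_matches >= 2:
        return "AUTHENTIC"
    # Moderate similarity, or a high score with limited corroboration.
    if title_similarity >= 0.25:
        return "SUSPICIOUS"
    return "FABRICATED"
```

Checking author manipulation before authenticity matters: a real title with fabricated authors would otherwise score as AUTHENTIC on title similarity alone.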

Database Integration

Primary Databases (no API key required):

  • OpenAlex - Comprehensive coverage (200M+ works)
  • DBLP - Computer Science
  • IACR - Cryptography
  • ArXiv - Preprints
  • CrossRef - DOI metadata and retraction status

Enhanced with API Keys (optional):

  • Semantic Scholar - Higher rate limits
  • PubMed - Biomedical (NCBI key)
  • Springer Nature - STM publications

Smart Fallback:

  • Google Scholar - Used only when other databases find poor matches (<70% similarity)
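The Google Scholar fallback rule amounts to checking the best title match from the other databases against a 70% threshold. Below is a minimal sketch that uses Python's difflib.SequenceMatcher as a stand-in similarity measure. That is an assumption: this README does not say which metric VerifyRef uses, and title_similarity and needs_scholar_fallback are hypothetical helpers.

```python
from difflib import SequenceMatcher
from typing import Iterable

SCHOLAR_FALLBACK_THRESHOLD = 0.70  # "<70% similarity" from this README


def title_similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive similarity ratio in [0, 1] (difflib stand-in)."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def needs_scholar_fallback(query_title: str, candidate_titles: Iterable[str]) -> bool:
    """True when no candidate from the primary databases is a good enough match."""
    best = max((title_similarity(query_title, c) for c in candidate_titles), default=0.0)
    return best < SCHOLAR_FALLBACK_THRESHOLD
```

Gating the fallback this way means Google Scholar is contacted only for the references the primary databases could not resolve.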

Configuration

Edit config.py to configure:

# Required
CROSSREF_EMAIL = "your.email@domain.com"

# Optional API keys
SEMANTIC_SCHOLAR_API_KEY = ""
NCBI_API_KEY = ""
SPRINGER_API_KEY = ""

# AI providers (for --enable-ai)
GOOGLE_GEMINI_API_KEY = ""
GROQ_API_KEY = ""
OPENAI_API_KEY = ""

# Database toggles
ENABLE_CROSSREF = True
ENABLE_GOOGLE_SCHOLAR = True
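CROSSREF_EMAIL is presumably required because CrossRef's public REST API asks clients to identify themselves, for example with a mailto parameter, to be routed to its "polite" pool. A minimal sketch of building such a search URL is below. crossref_search_url is hypothetical, not VerifyRef's client; query.bibliographic, rows, and mailto are CrossRef's documented query parameters.

```python
from urllib.parse import urlencode

CROSSREF_EMAIL = "your.email@domain.com"  # same setting as in config.py


def crossref_search_url(citation: str, email: str = CROSSREF_EMAIL, rows: int = 5) -> str:
    """Build a CrossRef /works search URL that identifies the caller via mailto."""
    if "@" not in email:
        raise ValueError("CROSSREF_EMAIL must be a real contact address")
    params = {"query.bibliographic": citation, "rows": rows, "mailto": email}
    return "https://api.crossref.org/works?" + urlencode(params)
```

For example, crossref_search_url("Differential Cryptanalysis of DES", "a@b.org") yields a URL ending in "query.bibliographic=Differential+Cryptanalysis+of+DES&rows=5&mailto=a%40b.org".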

GROBID Configuration

VerifyRef uses a smart fallback chain for PDF processing:

  1. Public GROBID server (default, no setup required)
  2. Local GROBID (if running on localhost:8070)
  3. PyMuPDF fallback (lower accuracy, used when GROBID unavailable)

Override the default GROBID URL:

export GROBID_URL="http://localhost:8070"
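The fallback chain can be sketched as "probe each GROBID candidate in order and use the first one that answers, otherwise fall back to PyMuPDF". The probe uses GROBID's /api/isalive endpoint. Everything else is an assumption: choose_grobid is hypothetical, GROBID_URL is tried first as an override, the remaining order follows the numbered list above, and PUBLIC_GROBID_URL is a placeholder because this README does not give the public server's address.

```python
import os
import urllib.request
from typing import Callable, List, Mapping, Optional

LOCAL_GROBID_URL = "http://localhost:8070"
# Placeholder: the public server's real address is not given in this README.
PUBLIC_GROBID_URL = "https://public-grobid.example"


def grobid_is_alive(base_url: str, timeout: float = 3.0) -> bool:
    """Probe GROBID's /api/isalive endpoint; any failure counts as unavailable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):  # URLError/timeouts are OSErrors; bad URLs raise ValueError
        return False


def choose_grobid(env: Mapping[str, str] = os.environ,
                  is_alive: Callable[[str], bool] = grobid_is_alive) -> Optional[str]:
    """Return the first reachable GROBID URL, or None to signal the PyMuPDF fallback."""
    candidates: List[str] = []
    if env.get("GROBID_URL"):
        candidates.append(env["GROBID_URL"])  # explicit override goes first
    candidates += [PUBLIC_GROBID_URL, LOCAL_GROBID_URL]
    for url in candidates:
        if is_alive(url):
            return url
    return None
```

Injecting is_alive keeps the selection logic testable without any network access.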

Project Structure

verifyref/
├── verifyref.py              # CLI entry point
├── config.py                 # Configuration
├── grobid/
│   ├── client.py             # GROBID client with smart fallback
│   └── fallback_parser.py    # PyMuPDF fallback parser
├── extractor/
│   └── reference_parser.py   # Reference parsing
├── verifier/
│   ├── multi_database_verifier.py
│   ├── classifier.py         # Classification logic
│   ├── ai_verifier.py        # AI verification
│   ├── doi_validation_client.py  # DOI and retraction checking
│   └── *_client.py           # Database clients
└── utils/
    ├── helpers.py
    ├── report_generator.py
    └── ...

Troubleshooting

Issue                   Solution
No references found     Check PDF quality; try a different PDF
GROBID timeout          Public server may be busy; try local GROBID
High INCONCLUSIVE rate  Use --rigor lenient
AI rate limits          Use Ollama (no limits) or wait for cooldown
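"Wait for cooldown" is usually automated with exponential backoff: retry after a delay that doubles on each rate-limit error, plus random jitter. This is a generic sketch of that technique, not VerifyRef's retry logic. RateLimitError is a hypothetical stand-in for whatever "too many requests" error a provider's client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's 'too many requests' error."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimitError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error
            # Delay doubles each attempt; jitter keeps parallel retries from syncing.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")
```

With the defaults, waits fall in roughly 1-2 s, 2-3 s, 4-5 s, and 8-9 s before the final attempt.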

Ethical Usage

VerifyRef follows strict ethical guidelines:

  • API-only access (no web scraping)
  • Respects all service rate limits
  • No personal data collection
  • Proper attribution in requests
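Respecting a service's rate limit often comes down to spacing requests at least a fixed interval apart, even when several threads share one client. This is a minimal, hypothetical sketch of such a limiter. MinIntervalLimiter is not VerifyRef's class, and each service's actual limits are not listed here, so the interval is a parameter you would set per service.

```python
import threading
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Allow at most one request every `interval` seconds, shared across threads."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        with self._lock:  # held while sleeping so callers are serialized
            now = self._clock()
            if self._next_allowed is not None and now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Each database client would call wait() immediately before sending a request. Injecting clock and sleep makes the limiter testable without real delays.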

Contributing

See contributing.md for guidelines.

License

GNU General Public License v3 (GPLv3)

Copyright (C) 2025-2026 Hosein Hadipour


Caution

VerifyRef is designed to assist in verification of academic references and should not be used as a sole determinant of reference authenticity. It is intended to complement human judgment in the peer review process.



Download files


Source Distribution

verifyref-1.1.1.tar.gz (153.8 kB)

Uploaded Source

Built Distribution


verifyref-1.1.1-py3-none-any.whl (179.7 kB)

Uploaded Python 3

File details

Details for the file verifyref-1.1.1.tar.gz.

File metadata

  • Download URL: verifyref-1.1.1.tar.gz
  • Size: 153.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for verifyref-1.1.1.tar.gz
Algorithm Hash digest
SHA256 1641504aa254cd570dec72e4110d0c44b14377b05bd2fa6e89b1169dbca351e8
MD5 773f9be3f8782c896ed32443c292ae2b
BLAKE2b-256 75c4e6187e286630e874b6b7055e567a6e292f15d7cccacea17118605b5f2296


File details

Details for the file verifyref-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: verifyref-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 179.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for verifyref-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6b3efb0d2365c5c9f5212d72cf816877ed992d87838da288a30c36688bde5478
MD5 73fddcaa98ba6183ebb9aa446d6a1fa2
BLAKE2b-256 a4b45e3eb8bb5330c5a8916cb59ca47e09d8a68a219c8377c6c958c6ac4a3dcc
