
Academic reference verification tool with multi-database search and AI-powered fraud detection


VerifyRef

License: GPL v3 · Python 3.8+

A tool for verifying the authenticity of academic references in PDF documents using multiple academic databases and optional AI-powered analysis.

⚠️ Important Note for Reviewers
This tool may produce false positives — authentic references can sometimes be flagged as suspicious or unverified. This can happen due to:

  • New papers not yet indexed in databases
  • Author name format variations (e.g., "J. Smith" vs "John Smith")
  • Regional or specialized venues with limited database coverage
  • OCR/extraction errors from PDF processing

Always manually verify flagged references before making decisions. VerifyRef is a screening tool to assist human reviewers, not a replacement for careful manual checking.
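
Author-name variation is one of the most common causes of false alarms. The sketch below shows one way to compare names loosely by reducing each to a surname plus first initial. It is a hypothetical helper (`author_key` is not part of VerifyRef's API), and it assumes either "Given Surname" order or the "Surname, Given" form.

```python
import re
import unicodedata


def author_key(name: str) -> str:
    """Reduce an author name to "surname|initial" for loose comparison.

    Hypothetical helper: handles "John Smith", "J. Smith" and "Smith, John".
    """
    # Drop accents so "Müller" and "Muller" compare equal.
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = text.strip().lower()
    if not text:
        return ""
    if "," in text:
        # "Smith, John": surname before the comma, given names after it.
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        # "John Smith" / "J. Smith": assume the last token is the surname.
        tokens = text.split()
        surname, given = tokens[-1], " ".join(tokens[:-1])
    initial = re.sub(r"[^a-z]", "", given)[:1]
    surname = re.sub(r"[^a-z-]", "", surname)
    return f"{surname}|{initial}"
```

Multi-word surnames such as "van der Berg" reduce to their last word, so this normalization can also merge names that should stay distinct. Treat a key match as a hint, not as proof.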

Why VerifyRef?

While reviewing a journal submission, I found a reference that listed my brother, a businessman with no connection to cryptography, as a co-author of a paper on symmetric-key cryptanalysis with a well-known researcher. My brother had nothing to do with this paper. This prompted me to inspect that reference and the others in the submission, which turned out to be partially AI-generated and to contain multiple fake references.

Manually checking dozens of references was time-consuming, so I created VerifyRef to automatically extract references and verify them against trusted academic databases. Here is the verification summary VerifyRef produced for that paper:

                   Verification Summary
+---------------------------+-------+------------+--------+
| Classification            | Count | Percentage | Status |
+---------------------------+-------+------------+--------+
| [+] AUTHENTIC             |     6 |      33.3% |   *    |
| [?] SUSPICIOUS            |     8 |      44.4% |   *    |
| [X] FAKE                  |     0 |       0.0% |   -    |
| [~] AUTHOR MANIPULATION   |     0 |       0.0% |   -    |
| [-] FABRICATED            |     4 |      22.2% |   *    |
| [!] INCONCLUSIVE          |     0 |       0.0% |   -    |
+---------------------------+-------+------------+--------+

[REVIEW RECOMMENDED] Some references could not be verified - please double-check flagged items

This tool helps reviewers quickly identify potentially problematic references and signs of AI-generated content, making peer review more efficient. VerifyRef is not a replacement for human judgment; it is an assistant that streamlines verification. Because it may occasionally misclassify authentic references, always double-check flagged items manually.

Features

  • Multi-database verification across 8+ academic databases
  • PDF processing using GROBID (works out of the box with the public server)
  • Retraction detection via CrossRef and Retraction Watch
  • Author manipulation detection (real titles with fake authors)
  • Optional AI verification using free (Gemini, Groq, Ollama) or paid (OpenAI) providers
  • Book reference handling for textbooks that may not appear in paper databases
  • Parallel processing with multi-threaded database queries
  • JSON and text output formats
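
Database lookups spend most of their time waiting on the network, so threads are a natural fit for the parallel querying listed above. Below is a minimal sketch using the standard `concurrent.futures` module. The client callables and their similarity-score return values are assumptions for illustration, not VerifyRef's internal interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional

# Hypothetical client signature: takes a reference string and returns a
# similarity score in [0, 1], or None when the database has no match.
Client = Callable[[str], Optional[float]]


def query_all(
    reference: str, clients: Dict[str, Client], max_workers: int = 8
) -> Dict[str, Optional[float]]:
    """Query every database client concurrently and collect their scores."""
    results: Dict[str, Optional[float]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(client, reference): name for name, client in clients.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # A timeout or HTTP error in one database must not sink the others.
                results[name] = None
    return results
```

Catching exceptions per future means one slow or failing database only weakens the evidence for a reference instead of aborting the whole run.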

Installation

From PyPI (Recommended)

pip install verifyref

# Run verification
verifyref paper.pdf -o results.txt

From Source

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run verification (uses public GROBID server automatically)
python verifyref.py paper.pdf -o results.txt

Docker Installation

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
docker build -t verifyref .

# Interactive mode
docker run -it --rm -v "$(pwd):/app/workspace" verifyref

# Inside the container:
cd /app/workspace/
verifyref paper.pdf -o results.txt

Local GROBID (Optional)

For faster processing, or to avoid sending PDFs to a shared public server, run GROBID locally:

docker run -d -p 8070:8070 lfoppiano/grobid:0.8.2
export GROBID_URL="http://localhost:8070"
python verifyref.py paper.pdf

VerifyRef automatically detects and uses a local GROBID instance when one is available.

Usage

Basic Usage

# Verify references in a PDF
python verifyref.py paper.pdf -o results.txt

# Search for a specific citation
python verifyref.py --cite "Differential Cryptanalysis of DES"

# Verify a single reference
python verifyref.py --verify "Author, A.: Title. Venue, 2024"
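
To screen several submissions at once, you can wrap the documented `verifyref <pdf> -o <report>` call in a small script. This sketch assumes the `verifyref` command from the PyPI install is on your `PATH` and that a non-zero exit status means a run failed; `build_command` and `verify_folder` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(pdf: Path, out_dir: Path) -> List[str]:
    """Build the documented CLI call: verifyref <pdf> -o <report>."""
    report = out_dir / f"{pdf.stem}_results.txt"
    return ["verifyref", str(pdf), "-o", str(report)]


def verify_folder(pdf_dir: str, out_dir: str) -> int:
    """Run VerifyRef on every PDF in pdf_dir and return the number of failed runs."""
    source, target = Path(pdf_dir), Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    failures = 0
    for pdf in sorted(source.glob("*.pdf")):
        # Assumption: a non-zero exit status signals a failed verification run.
        completed = subprocess.run(build_command(pdf, target))
        if completed.returncode != 0:
            failures += 1
            print(f"VerifyRef failed on {pdf.name}", file=sys.stderr)
    return failures
```

For example, `verify_folder("submissions", "reports")` would write `reports/<name>_results.txt` for each PDF in `submissions/`.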

Advanced Options

# Verification rigor levels
python verifyref.py paper.pdf --rigor strict    # High precision
python verifyref.py paper.pdf --rigor balanced  # Default
python verifyref.py paper.pdf --rigor lenient   # High recall

# Context-aware search
python verifyref.py --cite "cryptanalysis" --context cs
python verifyref.py --cite "gene therapy" --context bio

# AI-enhanced verification
python verifyref.py paper.pdf --enable-ai

# Verbose output
python verifyref.py paper.pdf --verbose

AI Verification Setup

VerifyRef supports multiple AI providers. Ollama is recommended because it runs locally, is free, and has no rate limits:

# Option 1: Ollama (free, local, no rate limits)
brew install ollama
ollama serve
ollama pull llama3.2
export AI_PROVIDER="ollama"
python verifyref.py paper.pdf --enable-ai

# Option 2: Google Gemini (free tier)
export AI_PROVIDER="gemini"
export GOOGLE_GEMINI_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai

# Option 3: Groq (free tier)
export AI_PROVIDER="groq"
export GROQ_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai
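
Before a long `--enable-ai` run, a quick pre-flight check can catch a missing key. This hypothetical helper covers only the three `AI_PROVIDER` values and the environment variable names shown above; it does not know what VerifyRef itself does when `AI_PROVIDER` is unset or set to another value.

```python
import os
from typing import Mapping, Optional

# Environment variable each documented provider needs; Ollama runs locally
# and needs no API key.
REQUIRED_KEY: Mapping[str, Optional[str]] = {
    "ollama": None,
    "gemini": "GOOGLE_GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def check_ai_env(env: Mapping[str, str] = os.environ) -> str:
    """Return a description of the first problem found, or "" if none."""
    provider = env.get("AI_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_KEY:
        return f"AI_PROVIDER should be one of {sorted(REQUIRED_KEY)}, got {provider!r}"
    key_var = REQUIRED_KEY[provider]
    if key_var is not None and not env.get(key_var):
        return f"{key_var} is not set for AI_PROVIDER={provider!r}"
    return ""
```

Calling `print(check_ai_env() or "AI settings look usable")` in the same shell session reads the live environment.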

Classification System

VerifyRef uses a 5-category system to evaluate reference authenticity:

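+---------------------+---------------------------------------------------+-----------------+
| Category            | Criteria                                          | Action          |
+---------------------+---------------------------------------------------+-----------------+
| AUTHENTIC           | High similarity (>55%), multiple database matches | Accept          |
| SUSPICIOUS          | Moderate similarity (25-55%), limited evidence    | Manual review   |
| FABRICATED          | Very low similarity (<25%), no database matches   | Investigate     |
| AUTHOR_MANIPULATION | Title matches but authors differ significantly    | Flag misconduct |
| INCONCLUSIVE        | Parsing errors, books, or network issues          | Re-verify       |
+---------------------+---------------------------------------------------+-----------------+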

Retracted papers are flagged with a warning regardless of classification.
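
The thresholds in the classification table can be read as a simple decision procedure. The sketch below is one illustrative interpretation, not the logic in `verifier/classifier.py`. The `Evidence` fields, the reading of "multiple" as two or more databases, and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what the database lookups found."""

    similarity: float         # best match similarity, 0.0 to 1.0
    matching_databases: int   # number of databases that returned a match
    authors_match: bool = True
    parsed_ok: bool = True
    retracted: bool = False


def classify(ev: Evidence) -> str:
    """Apply the table's thresholds in a fixed (assumed) order."""
    if not ev.parsed_ok:
        category = "INCONCLUSIVE"
    elif ev.similarity > 0.55 and not ev.authors_match:
        # The title is found, but the listed authors do not line up.
        category = "AUTHOR_MANIPULATION"
    elif ev.similarity > 0.55 and ev.matching_databases >= 2:
        category = "AUTHENTIC"
    elif ev.similarity >= 0.25:
        # Moderate similarity, or a strong match in only one database.
        category = "SUSPICIOUS"
    else:
        category = "FABRICATED"
    # A retraction is reported on top of the category, never instead of it.
    return f"{category} (RETRACTED)" if ev.retracted else category
```

Under these assumptions, a strong title match found in only one database lands in SUSPICIOUS, in line with the "limited evidence" wording in the table.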

Database Integration

Primary Databases (no API key required):

  • OpenAlex - Comprehensive coverage (200M+ works)
  • DBLP - Computer Science
  • IACR - Cryptography
  • arXiv - Preprints
  • CrossRef - DOI metadata and retraction status

Enhanced with API Keys (optional):

  • Semantic Scholar - Higher rate limits with a key
  • PubMed - Biomedical (NCBI key)
  • Springer Nature - STM publications

Smart Fallback:

  • Google Scholar - Used only when other databases find poor matches (<70% similarity)
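
The Google Scholar rule amounts to a conditional second lookup. Here is a minimal sketch, assuming each database reports a similarity between 0 and 1. `with_scholar_fallback` and the `scholar_lookup` callable are hypothetical names; only the 0.70 cut-off comes from the text above.

```python
from typing import Callable, Iterable, Optional, Tuple

SCHOLAR_THRESHOLD = 0.70  # "poor match" cut-off quoted in this README


def with_scholar_fallback(
    reference: str,
    primary_scores: Iterable[Optional[float]],
    scholar_lookup: Callable[[str], Optional[float]],
) -> Tuple[float, bool]:
    """Return (best similarity, whether Google Scholar was consulted)."""
    best = max((s for s in primary_scores if s is not None), default=0.0)
    if best >= SCHOLAR_THRESHOLD:
        # Good enough already: skip the extra, more rate-sensitive source.
        return best, False
    scholar = scholar_lookup(reference)
    if scholar is not None:
        best = max(best, scholar)
    return best, True
```

Keeping the fallback conditional means most references never reach the extra source at all.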

Configuration

Edit config.py to set the following options:

# Required
CROSSREF_EMAIL = "your.email@domain.com"

# Optional API keys
SEMANTIC_SCHOLAR_API_KEY = ""
NCBI_API_KEY = ""
SPRINGER_API_KEY = ""

# AI providers (for --enable-ai)
GOOGLE_GEMINI_API_KEY = ""
GROQ_API_KEY = ""
OPENAI_API_KEY = ""

# Database toggles
ENABLE_CROSSREF = True
ENABLE_GOOGLE_SCHOLAR = True
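
A contact email such as `CROSSREF_EMAIL` matters because the CrossRef REST API accepts a `mailto` parameter and routes clients that identify themselves to its "polite" pool. The sketch below builds such a query with the standard library. Whether VerifyRef sends the email exactly this way is an assumption, and `crossref_search_url` and `crossref_titles` are hypothetical helpers.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_EMAIL = "your.email@domain.com"  # placeholder, as in config.py


def crossref_search_url(citation: str, email: str = CROSSREF_EMAIL, rows: int = 5) -> str:
    """Build a CrossRef REST API search for a free-text citation."""
    params = {"query.bibliographic": citation, "rows": rows, "mailto": email}
    return "https://api.crossref.org/works?" + urlencode(params)


def crossref_titles(citation: str, email: str = CROSSREF_EMAIL) -> List[str]:
    """Fetch candidate works and return their first-listed titles (network call)."""
    with urlopen(crossref_search_url(citation, email), timeout=30) as response:
        data = json.load(response)
    # CrossRef stores "title" as a list of strings on each work record.
    return [item["title"][0] for item in data["message"]["items"] if item.get("title")]
```

Only `crossref_titles` touches the network; the URL builder can be inspected offline.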

GROBID Configuration

VerifyRef uses a smart fallback chain for PDF processing:

  1. Public GROBID server (default, no setup required)
  2. Local GROBID (if running on localhost:8070)
  3. PyMuPDF fallback (lower accuracy, used when GROBID unavailable)

Override the default GROBID URL:

export GROBID_URL="http://localhost:8070"
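
The fallback chain can be approximated by probing each candidate server's `/api/isalive` endpoint (part of GROBID's REST API) and falling back to PyMuPDF when none answers. In this sketch, `GROBID_URL` is tried first when set, the remaining candidates follow the order of the numbered list, and the public server URL is a parameter because this README does not state it. The helper names are hypothetical.

```python
import os
from typing import Callable, List
from urllib.request import urlopen


def grobid_alive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the GROBID server at base_url answers /api/isalive."""
    try:
        with urlopen(base_url.rstrip("/") + "/api/isalive", timeout=timeout) as response:
            return response.read().strip().lower() == b"true"
    except (OSError, ValueError):
        # OSError covers connection errors, HTTP errors and timeouts;
        # ValueError covers malformed URLs.
        return False


def pick_backend(public_url: str, probe: Callable[[str], bool] = grobid_alive) -> str:
    """Return the first responsive GROBID URL, or "pymupdf" if none respond."""
    candidates: List[str] = []
    override = os.environ.get("GROBID_URL")
    if override:
        candidates.append(override)  # an explicit GROBID_URL wins
    candidates += [public_url, "http://localhost:8070"]  # order of the list above
    for url in candidates:
        if probe(url):
            return url
    return "pymupdf"  # lower-accuracy local fallback
```

Passing the probe as a parameter keeps the selection logic testable without any running server.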

Project Structure

verifyref/
├── verifyref.py              # CLI entry point
├── config.py                 # Configuration
├── grobid/
│   ├── client.py             # GROBID client with smart fallback
│   └── fallback_parser.py    # PyMuPDF fallback parser
├── extractor/
│   └── reference_parser.py   # Reference parsing
├── verifier/
│   ├── multi_database_verifier.py
│   ├── classifier.py         # Classification logic
│   ├── ai_verifier.py        # AI verification
│   ├── doi_validation_client.py  # DOI and retraction checking
│   └── *_client.py           # Database clients
└── utils/
    ├── helpers.py
    ├── report_generator.py
    └── ...

Troubleshooting

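+------------------------+---------------------------------------------+
| Issue                  | Solution                                    |
+------------------------+---------------------------------------------+
| No references found    | Check PDF quality; try a different PDF      |
| GROBID timeout         | Public server may be busy; try local GROBID |
| High INCONCLUSIVE rate | Use --rigor lenient                         |
| AI rate limits         | Use Ollama (no limits) or wait for cooldown |
+------------------------+---------------------------------------------+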

Ethical Usage

VerifyRef follows these ethical guidelines:

  • API-only access (no web scraping)
  • Respects all service rate limits
  • No personal data collection
  • Proper attribution in requests
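
Respecting rate limits usually comes down to spacing out the requests sent to each service. Below is a minimal, thread-safe sketch of a per-service limiter. The class name and any interval you pass are assumptions; take the actual limits from each database's documentation.

```python
import threading
import time
from typing import Callable


class MinIntervalLimiter:
    """Enforce a minimum delay between successive requests to one service."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0
        self._lock = threading.Lock()

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve that slot."""
        # Holding the lock while sleeping queues concurrent callers one by one.
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.min_interval
```

Create one limiter per database and call `wait()` before each request. The injectable `clock` and `sleep` arguments let the timing be tested without real delays.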

Contributing

See contributing.md for guidelines.

License

GNU General Public License v3 (GPLv3)

Copyright (C) 2025-2026 Hosein Hadipour

Documentation

Caution

VerifyRef is designed to assist in verification of academic references and should not be used as a sole determinant of reference authenticity. It is intended to complement human judgment in the peer review process.



Download files


Source Distribution

verifyref-1.1.0.tar.gz (153.7 kB)

Uploaded Source

Built Distribution


verifyref-1.1.0-py3-none-any.whl (179.7 kB)

Uploaded Python 3

File details

Details for the file verifyref-1.1.0.tar.gz.

File metadata

  • Download URL: verifyref-1.1.0.tar.gz
  • Size: 153.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for verifyref-1.1.0.tar.gz
Algorithm Hash digest
SHA256 c1fae62821a0b65c4d11be205f1cad9e4a9af6f97e9787f2b7980768462c691a
MD5 56af1ccf3218d40e37d49cb013e25377
BLAKE2b-256 d57bdc78dc529b42d618695a0cacdec2e1fac607adcb5a36f0d57fa3ba07c571


File details

Details for the file verifyref-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: verifyref-1.1.0-py3-none-any.whl
  • Size: 179.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for verifyref-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9bc0039782eabecb0af425fa70c1541ead114d73ec883971da628d561b8e203e
MD5 e8f9277ed0355e3da4e8107050af9815
BLAKE2b-256 d82be2d3c324beffeb4269723a793e6f9361b1c5c71d25aa9ca9995f5a838578

