
Academic reference verification tool with multi-database search and AI-powered fraud detection

VerifyRef

License: GPL v3 | Python 3.8+

A tool for verifying the authenticity of academic references in PDF documents using multiple academic databases and optional AI-powered analysis.

Why VerifyRef?

While reviewing a journal submission, I found a reference that listed my brother, a businessman with no connection to cryptography, as co-author (alongside a well-known researcher) of a paper on symmetric-key cryptanalysis. He had nothing to do with any such paper. That prompted me to check the submission's other references as well: the manuscript turned out to be partially AI-generated and contained multiple fake references.

Manually checking dozens of references was time-consuming, so I created VerifyRef to automatically extract and verify references against trusted academic databases. Here is the summary of the output for that paper:

                   Verification Summary
+---------------------------+-------+------------+--------+
| Classification            | Count | Percentage | Status |
+---------------------------+-------+------------+--------+
| [+] AUTHENTIC             |     6 |      33.3% |   *    |
| [?] SUSPICIOUS            |     8 |      44.4% |   *    |
| [X] FAKE                  |     0 |       0.0% |   -    |
| [~] AUTHOR MANIPULATION   |     0 |       0.0% |   -    |
| [-] FABRICATED            |     4 |      22.2% |   *    |
| [!] INCONCLUSIVE          |     0 |       0.0% |   -    |
+---------------------------+-------+------------+--------+

[HIGH RISK] Significant fraud detected

This tool helps reviewers quickly identify potentially fabricated references and AI-generated content, making the peer review process more efficient. Note that VerifyRef is not a replacement for human judgment but a powerful assistant to streamline the verification process.

Features

  • Multi-database verification across 8+ academic databases
  • PDF processing using GROBID (works out of the box with public server)
  • Retraction detection via CrossRef and Retraction Watch
  • Author manipulation detection (real titles with fake authors)
  • Optional AI verification using free (Gemini, Groq, Ollama) or paid (OpenAI) providers
  • Book reference handling for textbooks that may not appear in paper databases
  • Parallel processing with multi-threaded database queries
  • JSON and text output formats
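
The "parallel processing" feature can be pictured with a small thread-pool sketch. This is an illustration only, not VerifyRef's actual code: the `query_all` function and the two fake clients are hypothetical names, and each real database client is assumed to be a callable that takes a reference string and returns a list of candidate records.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# Hypothetical client type: takes a reference string, returns candidate matches.
DatabaseClient = Callable[[str], List[dict]]


def query_all(reference: str,
              clients: Dict[str, DatabaseClient],
              max_workers: int = 8) -> Dict[str, List[dict]]:
    """Query every database concurrently; a failing database yields an empty list."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(client, reference): name
                   for name, client in clients.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # One unreachable database should not abort the whole run.
                results[name] = []
    return results


# Stand-in clients so the sketch runs without network access.
def fake_dblp(ref: str) -> List[dict]:
    return [{"title": ref, "source": "dblp"}]


def fake_broken(ref: str) -> List[dict]:
    raise TimeoutError("simulated outage")


demo = query_all("Differential Cryptanalysis of DES",
                 {"dblp": fake_dblp, "broken": fake_broken})
```

Threads suit this workload because each query spends most of its time waiting on the network rather than computing.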

Installation

Quick Start (No Docker Required)

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run verification (uses public GROBID server automatically)
python verifyref.py paper.pdf -o results.txt

Docker Installation

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
docker build -t verifyref .

# Interactive mode
docker run -it --rm -v "$(pwd):/app/workspace" verifyref

# Inside the container:
cd /app/workspace/
verifyref paper.pdf -o results.txt

Local GROBID (Optional)

For faster processing or privacy, run GROBID locally:

docker run -d -p 8070:8070 lfoppiano/grobid:0.8.2
export GROBID_URL="http://localhost:8070"
python verifyref.py paper.pdf

VerifyRef automatically detects and uses local GROBID when available.
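
Whichever GROBID server is used, extracted references come back as TEI XML. The sketch below shows how such output could be reduced to title, authors, and year with the standard library. It is not VerifyRef's parser: `parse_references` is a hypothetical helper, and the embedded sample is a hand-written fragment in GROBID's TEI layout rather than real server output.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample in the shape GROBID uses for one bibliography entry.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back><div><listBibl>
<biblStruct xml:id="b0">
  <analytic>
    <title level="a" type="main">Differential Cryptanalysis of DES-like Cryptosystems</title>
    <author><persName><forename type="first">Eli</forename><surname>Biham</surname></persName></author>
    <author><persName><forename type="first">Adi</forename><surname>Shamir</surname></persName></author>
  </analytic>
  <monogr>
    <title level="j">Journal of Cryptology</title>
    <imprint><date type="published" when="1991" /></imprint>
  </monogr>
</biblStruct>
</listBibl></div></back></text></TEI>"""


def parse_references(tei_xml: str) -> list:
    """Return one dict per <biblStruct> with title, authors, and year."""
    root = ET.fromstring(tei_xml)
    references = []
    for bibl in root.iterfind(".//tei:biblStruct", TEI_NS):
        # First <title> in document order: the article title when present,
        # otherwise the book or journal title.
        title_el = bibl.find(".//tei:title", TEI_NS)
        date_el = bibl.find(".//tei:imprint/tei:date", TEI_NS)
        authors = []
        for person in bibl.iterfind(".//tei:author/tei:persName", TEI_NS):
            parts = [
                person.findtext("tei:forename", default="", namespaces=TEI_NS),
                person.findtext("tei:surname", default="", namespaces=TEI_NS),
            ]
            authors.append(" ".join(p for p in parts if p))
        references.append({
            "title": title_el.text if title_el is not None else None,
            "authors": authors,
            "year": date_el.get("when") if date_el is not None else None,
        })
    return references


refs = parse_references(SAMPLE_TEI)
```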

Usage

Basic Usage

# Verify references in a PDF
python verifyref.py paper.pdf -o results.txt

# Search for a specific citation
python verifyref.py --cite "Differential Cryptanalysis of DES"

# Verify a single reference
python verifyref.py --verify "Author, A.: Title. Venue, 2024"

Advanced Options

# Verification rigor levels
python verifyref.py paper.pdf --rigor strict    # High precision
python verifyref.py paper.pdf --rigor balanced  # Default
python verifyref.py paper.pdf --rigor lenient   # High recall

# Context-aware search
python verifyref.py --cite "cryptanalysis" --context cs
python verifyref.py --cite "gene therapy" --context bio

# AI-enhanced verification
python verifyref.py paper.pdf --enable-ai

# Verbose output
python verifyref.py paper.pdf --verbose
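
To run the same options over a folder of submissions, the CLI can be driven from a short script. This is a sketch under two assumptions: that `--rigor` and `-o` can be combined in one call, and that the script runs from the repository root where `verifyref.py` lives. `build_command` and `verify_folder` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

RIGOR_LEVELS = {"strict", "balanced", "lenient"}


def build_command(pdf: Path, out_dir: Path, rigor: str = "balanced") -> list:
    """Build the argument list for one verifyref.py run."""
    if rigor not in RIGOR_LEVELS:
        raise ValueError(f"unknown rigor level: {rigor!r}")
    report = out_dir / f"{pdf.stem}_results.txt"
    return [sys.executable, "verifyref.py", str(pdf),
            "--rigor", rigor, "-o", str(report)]


def verify_folder(folder: Path, out_dir: Path, rigor: str = "balanced") -> None:
    """Run verifyref.py once per PDF in `folder`, writing one report each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(folder.glob("*.pdf")):
        # check=False: one failing PDF should not stop the batch.
        subprocess.run(build_command(pdf, out_dir, rigor), check=False)


cmd = build_command(Path("paper.pdf"), Path("reports"), rigor="strict")
```

Calling `verify_folder(Path("submissions"), Path("reports"))` would then process every PDF in `submissions/` in turn.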

AI Verification Setup

VerifyRef supports multiple AI providers. Ollama is recommended for unlimited free usage:

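# Option 1: Ollama (free, local, no rate limits)
brew install ollama          # macOS with Homebrew; use Ollama's installer on other platforms
ollama serve                 # runs in the foreground; keep it open in a separate terminal
ollama pull llama3.2
export AI_PROVIDER="ollama"
python verifyref.py paper.pdf --enable-ai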

# Option 2: Google Gemini (free tier)
export AI_PROVIDER="gemini"
export GOOGLE_GEMINI_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai

# Option 3: Groq (free tier)
export AI_PROVIDER="groq"
export GROQ_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai
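
The environment variables above suggest a simple provider lookup. The following sketch is not taken from VerifyRef's source: `resolve_provider` is a hypothetical helper, the fallback to Ollama when `AI_PROVIDER` is unset is an assumption, and so is reading `OPENAI_API_KEY` from the environment (the README only lists it in config.py).

```python
import os
from typing import Mapping, Optional

# Provider name -> environment variable holding its key (None: no key needed).
KEY_VARIABLES = {
    "ollama": None,
    "gemini": "GOOGLE_GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",  # assumption: also readable from the environment
}


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the AI provider and its API key from environment-style settings."""
    env = os.environ if env is None else env
    # Assumption: default to the local, key-free provider when nothing is set.
    provider = env.get("AI_PROVIDER", "ollama").lower()
    if provider not in KEY_VARIABLES:
        raise ValueError(f"unsupported AI_PROVIDER: {provider!r}")
    key_variable = KEY_VARIABLES[provider]
    api_key = env.get(key_variable) if key_variable else None
    if key_variable and not api_key:
        raise RuntimeError(f"{key_variable} must be set for provider {provider!r}")
    return {"provider": provider, "api_key": api_key}


settings = resolve_provider({"AI_PROVIDER": "groq", "GROQ_API_KEY": "your-key"})
```

Failing early on a missing key gives a clearer message than a rejected API request halfway through a run.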

Classification System

VerifyRef uses a 5-category system to evaluate reference authenticity:

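+---------------------+---------------------------------------------------+-----------------+
| Category            | Criteria                                          | Action          |
+---------------------+---------------------------------------------------+-----------------+
| AUTHENTIC           | High similarity (>55%), multiple database matches | Accept          |
| SUSPICIOUS          | Moderate similarity (25-55%), limited evidence    | Manual review   |
| FABRICATED          | Very low similarity (<25%), no database matches   | Investigate     |
| AUTHOR_MANIPULATION | Title matches but authors differ significantly    | Flag misconduct |
| INCONCLUSIVE        | Parsing errors, books, or network issues          | Re-verify       |
+---------------------+---------------------------------------------------+-----------------+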

Retracted papers are flagged with a warning regardless of classification.
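
The thresholds in this section can be read as a small decision function. The sketch below is one possible reading, not the logic in classifier.py: the `Evidence` fields are hypothetical, the 0.5 author-overlap cutoff is an assumption (the README only says authors "differ significantly"), and so is treating "multiple database matches" as two or more hits.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Evidence:
    title_similarity: float   # best title match across databases, 0.0 to 1.0
    author_overlap: float     # share of cited authors found on the matched record
    database_hits: int        # number of databases that returned a match
    retracted: bool = False
    parse_failed: bool = False


def classify(evidence: Evidence) -> Tuple[str, bool]:
    """Return (category, retraction_warning) for one reference."""
    if evidence.parse_failed:
        label = "INCONCLUSIVE"
    elif evidence.title_similarity > 0.55 and evidence.author_overlap < 0.5:
        # Real title, wrong authors. The 0.5 cutoff is an assumption.
        label = "AUTHOR_MANIPULATION"
    elif evidence.title_similarity > 0.55 and evidence.database_hits >= 2:
        label = "AUTHENTIC"
    elif evidence.title_similarity >= 0.25:
        # Moderate similarity, or a strong match confirmed by only one database.
        label = "SUSPICIOUS"
    else:
        label = "FABRICATED"
    # The retraction warning is independent of the category.
    return label, evidence.retracted


result = classify(Evidence(title_similarity=0.95, author_overlap=0.0, database_hits=2))
```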

Database Integration

Primary Databases (no API key required):

  • OpenAlex - Comprehensive coverage (200M+ works)
  • DBLP - Computer Science
  • IACR - Cryptography
  • arXiv - Preprints
  • CrossRef - DOI metadata and retraction status

Enhanced with API Keys (optional):

  • Semantic Scholar - Higher rate limits
  • PubMed - Biomedical (NCBI key)
  • Springer Nature - STM publications

Smart Fallback:

  • Google Scholar - Used only when other databases find poor matches (<70% similarity)
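
The fallback rule can be sketched as a threshold check on the best title match found so far. The README does not say how similarity is computed, so `difflib.SequenceMatcher` is used here purely as a stand-in; `title_similarity` and `needs_fallback` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable

FALLBACK_THRESHOLD = 0.70  # from the README: fall back below 70% similarity


def title_similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive similarity ratio between two titles."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def needs_fallback(cited_title: str, candidate_titles: Iterable[str]) -> bool:
    """True when no primary-database candidate reaches the threshold."""
    best = max((title_similarity(cited_title, t) for t in candidate_titles),
               default=0.0)
    return best < FALLBACK_THRESHOLD


good_match = needs_fallback("Differential Cryptanalysis of DES",
                            ["differential  cryptanalysis of DES"])
no_candidates = needs_fallback("Differential Cryptanalysis of DES", [])
```

Gating the fallback on match quality keeps requests to the last-resort source to a minimum.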

Configuration

Edit config.py to configure:

# Required
CROSSREF_EMAIL = "your.email@domain.com"

# Optional API keys
SEMANTIC_SCHOLAR_API_KEY = ""
NCBI_API_KEY = ""
SPRINGER_API_KEY = ""

# AI providers (for --enable-ai)
GOOGLE_GEMINI_API_KEY = ""
GROQ_API_KEY = ""
OPENAI_API_KEY = ""

# Database toggles
ENABLE_CROSSREF = True
ENABLE_GOOGLE_SCHOLAR = True
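
The contact email presumably matters because Crossref's REST API asks clients to identify themselves through a `mailto` parameter. A minimal sketch of how that setting could end up in a query URL follows. `crossref_search_url` is a hypothetical helper, not a function from VerifyRef, and the way VerifyRef actually builds its requests is not shown in this README.

```python
from urllib.parse import urlencode

CROSSREF_EMAIL = "your.email@domain.com"  # same placeholder as in config.py


def crossref_search_url(reference: str,
                        email: str = CROSSREF_EMAIL,
                        rows: int = 5) -> str:
    """Build a Crossref works query for a free-form reference string."""
    params = {
        "query.bibliographic": reference,  # free-text bibliographic search
        "rows": rows,                      # number of candidates to return
        "mailto": email,                   # identifies the caller to Crossref
    }
    return "https://api.crossref.org/works?" + urlencode(params)


url = crossref_search_url("Differential Cryptanalysis of DES")
```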

GROBID Configuration

VerifyRef uses a smart fallback chain for PDF processing:

  1. Public GROBID server (default, no setup required)
  2. Local GROBID (if running on localhost:8070)
  3. PyMuPDF fallback (lower accuracy, used when GROBID unavailable)
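
The chain above can be sketched as "try each GROBID server in order, then give up and use PyMuPDF". This is an illustration rather than the code in grobid/client.py: `is_alive` and `choose_backend` are hypothetical names, and the public server URL in the demo is a placeholder because the README does not state it. The health check relies on GROBID's `/api/isalive` endpoint.

```python
from typing import Callable, Optional, Sequence, Tuple
from urllib.request import urlopen


def is_alive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a GROBID server answers its health-check endpoint."""
    try:
        with urlopen(f"{base_url}/api/isalive", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, HTTP errors, and timeouts.
        return False


def choose_backend(candidates: Sequence[str],
                   probe: Callable[[str], bool] = is_alive
                   ) -> Tuple[str, Optional[str]]:
    """Pick the first reachable GROBID server, else fall back to PyMuPDF."""
    for url in candidates:
        if probe(url):
            return ("grobid", url)
    return ("pymupdf", None)


# Demo with a fake probe so nothing touches the network.
# The first URL is a placeholder, not the real public server address.
servers = ["https://public-grobid.invalid", "http://localhost:8070"]
backend = choose_backend(servers, probe=lambda url: "localhost" in url)
```

Passing the probe as a parameter keeps the selection logic testable without a running server.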

Override the default GROBID URL:

export GROBID_URL="http://localhost:8070"

Project Structure

verifyref/
├── verifyref.py              # CLI entry point
├── config.py                 # Configuration
├── grobid/
│   ├── client.py             # GROBID client with smart fallback
│   └── fallback_parser.py    # PyMuPDF fallback parser
├── extractor/
│   └── reference_parser.py   # Reference parsing
├── verifier/
│   ├── multi_database_verifier.py
│   ├── classifier.py         # Classification logic
│   ├── ai_verifier.py        # AI verification
│   ├── doi_validation_client.py  # DOI and retraction checking
│   └── *_client.py           # Database clients
└── utils/
    ├── helpers.py
    ├── report_generator.py
    └── ...

Troubleshooting

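+------------------------+---------------------------------------------+
| Issue                  | Solution                                    |
+------------------------+---------------------------------------------+
| No references found    | Check PDF quality; try a different PDF      |
| GROBID timeout         | Public server may be busy; try local GROBID |
| High INCONCLUSIVE rate | Use --rigor lenient                         |
| AI rate limits         | Use Ollama (no limits) or wait for cooldown |
+------------------------+---------------------------------------------+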

Ethical Usage

VerifyRef follows strict ethical guidelines:

  • API-only access (no web scraping)
  • Respects all service rate limits
  • No personal data collection
  • Proper attribution in requests

Contributing

See contributing.md for guidelines.

License

GNU General Public License v3 (GPLv3)

Copyright (C) 2025-2026 Hosein Hadipour

Caution

VerifyRef is designed to assist in verification of academic references and should not be used as a sole determinant of reference authenticity. It is intended to complement human judgment in the peer review process.
