ValiRef

AI-Powered Citation Validation for Academic Papers

Features · Installation · Usage · How It Works · Benchmark

Python 3.12+ · License: MIT · Async First


Overview

ValiRef is an intelligent tool designed to detect hallucinated citations in academic papers. With the rise of AI-generated content, Large Language Models (LLMs) sometimes generate plausible-sounding but non-existent references. ValiRef helps researchers, reviewers, and publishers verify the authenticity of citations in PDF documents.

What ValiRef Detects

  • 🔮 Fabrication: a completely fake paper that doesn't exist. Example: a paper with a convincing title but no actual publication.
  • 👤 Attribution Error: a real paper cited with the wrong authors. Example: citing "Attention Is All You Need" by someone other than Vaswani et al.
  • 📄 Irrelevance: a real paper whose content doesn't match the claim. Example: citing a paper about NLP for a claim about computer vision.
  • 🔄 Counterfactual: a real paper cited for the opposite of its conclusion. Example: claiming a paper supports X when it actually argues against X.

Features

  • 🔍 Multi-Source Verification - Cross-references citations against ArXiv, Google Scholar, Semantic Scholar, OpenReview, OpenAlex, and DuckDuckGo
  • 🤖 AI-Powered Detection - Uses DeepSeek LLM with ReAct reasoning to analyze search results
  • ⚡ Async-First Architecture - Concurrent validation of multiple references for optimal performance
  • 📊 Rich CLI Output - Beautiful terminal interface with progress bars, real-time metrics, and detailed reports
  • 📈 Benchmark Suite - Built-in dataset generation and evaluation framework
  • 🛡️ Resilient API Handling - Token bucket rate limiting + circuit breaker pattern for reliable external API calls
  • 🎯 High Accuracy - 72% accuracy on a 100-sample benchmark, with confidence scoring and detailed reasoning

Installation

Prerequisites

  • Python 3.12 or higher
  • uv package manager (recommended) or pip

Install from PyPI (Recommended)

pip install valiref

Install from Source

# Clone the repository
git clone https://github.com/Gianthard-cyh/ValiRef.git
cd ValiRef

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env
# Edit .env and add your DeepSeek API key

Environment Configuration

Create a .env file with your API keys:

DEEPSEEK_API_KEY=your_deepseek_api_key_here

# Optional: for enhanced search capabilities
SERPAPI_API_KEY=your_serpapi_key
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_key

# Optional: LangSmith tracing
LANGCHAIN_TRACING_V2=false
LANGCHAIN_API_KEY=your_langchain_key
LANGCHAIN_PROJECT=ValiRef

Usage

Validate References in a PDF

# Basic usage
uv run python -m src.cli validate paper.pdf

# With concurrent workers (default: 5)
uv run python -m src.cli validate paper.pdf --workers 10

# Output as JSON
uv run python -m src.cli validate paper.pdf --json

# Enable verbose logging
uv run python -m src.cli validate paper.pdf --verbose

Example Output

Validation Summary for paper.pdf
Total References: 12
Validated: 12
Duration: 15.34s

┌─────────────────────────────────────────────────────────────────────┐
│ ✅ Reference #1 - REAL REFERENCE                                    │
├─────────────────────────────────────────────────────────────────────┤
│ Title: Attention Is All You Need                                    │
│ Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, et al.          │
│ Confidence: 0.98                                                    │
│                                                                     │
│ Reasoning:                                                          │
│ Found exact match on ArXiv (arxiv.org/abs/1706.03762). Title,       │
│ authors, and venue (NIPS 2017) all match the citation.              │
│                                                                     │
│ Evidence / Sources:                                                 │
│ - https://arxiv.org/abs/1706.03762                                  │
└─────────────────────────────────────────────────────────────────────┘

How It Works

ValiRef runs each document through a multi-step validation pipeline:

┌─────────────┐    ┌──────────────┐    ┌──────────────┐    ┌─────────────┐
│  PDF Input  │ →  │   Extract    │ →  │    Search    │ →  │   Validate  │
│             │    │  References  │    │  Multi-Source│    │  with LLM   │
└─────────────┘    └──────────────┘    └──────────────┘    └─────────────┘
                                                              │
                                                              ▼
                                                        ┌─────────────┐
                                                        │   Report    │
                                                        │  Results    │
                                                        └─────────────┘

1. Reference Extraction

  • Parses PDF documents using PyMuPDF
  • Uses LLM to intelligently extract structured reference data from bibliography sections
  • Handles various citation formats (APA, MLA, Chicago, etc.)

2. Multi-Source Search

Simultaneously queries multiple academic databases:

  • ArXiv - Preprint server with full-text access
  • Google Scholar - Broad academic search
  • Semantic Scholar - AI-powered academic search
  • OpenReview - Peer-reviewed conference papers
  • OpenAlex - Open academic graph
  • DuckDuckGo - Web search fallback
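
The fan-out across sources can be sketched with asyncio.gather. The source functions below are stand-ins for illustration, not ValiRef's real clients:

```python
import asyncio

# Stand-in source clients; the real tools call ArXiv, Semantic Scholar, etc.
async def search_arxiv(title: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the HTTP round-trip
    return {"source": "arxiv", "found": True}

async def search_openalex(title: str) -> dict:
    await asyncio.sleep(0)
    return {"source": "openalex", "found": False}

async def search_all(title: str) -> list[dict]:
    # Query every source concurrently; one slow source does not
    # serialize the others, and results come back in task order.
    tasks = [search_arxiv(title), search_openalex(title)]
    return await asyncio.gather(*tasks)

results = asyncio.run(search_all("Attention Is All You Need"))
```

Passing `return_exceptions=True` to `gather` would let a failing source surface as an exception object instead of cancelling the batch, which matches the graceful-degradation behavior described below.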

3. AI Validation

The HallucinationDetector uses a ReAct (Reasoning + Acting) agent powered by DeepSeek LLM:

  • Analyzes search results from all sources
  • Compares paper metadata (title, authors, abstract, venue)
  • Evaluates claims against actual paper content
  • Provides confidence scores with detailed reasoning
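
As a toy illustration of the metadata-comparison step (the real detector delegates this judgment to the LLM rather than a string metric), a fuzzy title match might look like:

```python
from difflib import SequenceMatcher

def title_similarity(cited: str, found: str) -> float:
    """Case-insensitive fuzzy similarity between the cited title
    and a search hit, in [0, 1]."""
    return SequenceMatcher(None, cited.lower(), found.lower()).ratio()

# Near-identical titles score close to 1.0:
score = title_similarity("Attention is All you Need",
                         "Attention Is All You Need")
```

String similarity catches exact and near-exact matches; judging whether a claim matches a paper's actual content is the part that needs the LLM.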

Resilient API Architecture

ValiRef implements a production-grade resilience layer for external API calls:

┌──────────────┐     ┌──────────────────┐     ┌───────────────┐
│  SearchTool  │────▶│ ToolRequestQueue │────▶│  Token Bucket │
│ (per source) │     │  (rate limiter)  │     │ (smooth flow) │
└──────────────┘     └──────────────────┘     └───────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Circuit Breaker │
                     │ (fail-fast for  │
                     │  unhealthy APIs)│
                     └─────────────────┘

Features:

  • Token Bucket Rate Limiting - Smooth request flow with configurable burst capacity per source
  • Circuit Breaker Pattern - Automatically stops requests to failing services (3 failures → OPEN, 15s recovery timeout)
  • Real-time Metrics - Live display of API call statistics, active requests, and circuit states
  • Graceful Degradation - Failed sources are marked unavailable but don't block other sources
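
The two patterns above can be sketched in a few lines. These are minimal illustrations with an injectable clock for testability, not ValiRef's actual classes; the thresholds match the numbers stated above (3 failures → OPEN, 15 s recovery):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: refills at `rate` tokens/sec,
    bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class CircuitBreaker:
    """Fail-fast breaker: `threshold` failures -> OPEN, probe again
    after `recovery` seconds (HALF_OPEN)."""
    def __init__(self, threshold=3, recovery=15.0, clock=time.monotonic):
        self.threshold, self.recovery, self.clock = threshold, recovery, clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.recovery:
            return "HALF_OPEN"  # allow one probe request through
        return "OPEN"

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()

    def record_success(self):
        self.failures, self.opened_at = 0, None
```

While the breaker is OPEN, requests to that source fail immediately instead of waiting on timeouts, which keeps the other sources' workers free.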

Benchmark

ValiRef includes a comprehensive benchmark suite for evaluating hallucination detection performance.

Performance Results

On a 100-sample mixed dataset:

Metric       Value
Accuracy     72.0%
Precision    1.0000
Recall       0.2800 (Counterfactual) / 1.0000 (Fabrication)
F1 Score     0.4375 (Counterfactual) / 1.0000 (Fabrication)
Throughput   ~0.09 samples/sec
Duration     ~18 min (100 samples)
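
The summary numbers are internally consistent: with precision 1.0 and recall 0.28 on the Counterfactual slice, F1 = 2 · (1.0 · 0.28) / (1.0 + 0.28) = 0.4375. A quick check:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Counterfactual slice from the table above:
round(f1(1.0, 0.28), 4)  # 0.4375
```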

Per-Type Performance

Hallucination Type   Accuracy   Precision   Recall   F1 Score   Samples
Fabrication          100%       1.0000      1.0000   1.0000     19
AttributionError     100%       1.0000      1.0000   1.0000     19
Irrelevance          74%        1.0000      0.7368   0.8485     19
Counterfactual       28%        1.0000      0.2800   0.4375     25
Real Papers          72%        0.0000      0.0000   0.0000     18

Generate Benchmark Dataset

uv run python scripts/generate_dataset.py \
  --topic cs.CL \
  --count 1000 \
  --output data/dataset.csv

Dataset Composition

The benchmark dataset combines real ArXiv papers with synthetic hallucinations:

Category            Description                          Percentage
Real                Genuine papers from ArXiv            50%
Fabrication         AI-generated fake papers             12.5%
Attribution Error   Real papers with wrong authors       12.5%
Irrelevance         Real papers with mismatched claims   12.5%
Counterfactual      Real papers with inverted claims     12.5%
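
For a dataset of size n, the mix above implies the following per-category counts. This is a sketch of the arithmetic only; the real generator script may distribute rounding remainders differently:

```python
# Shares from the composition table above.
COMPOSITION = {
    "Real": 0.5,
    "Fabrication": 0.125,
    "AttributionError": 0.125,
    "Irrelevance": 0.125,
    "Counterfactual": 0.125,
}

def category_counts(n: int) -> dict[str, int]:
    # Floor each share, then hand any rounding remainder to the
    # Real bucket so the counts always sum to n.
    counts = {k: int(n * p) for k, p in COMPOSITION.items()}
    counts["Real"] += n - sum(counts.values())
    return counts
```

For the default `--count 1000`, this yields 500 real papers and 125 samples of each hallucination type.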

Running Tests

# Run unit tests (fast, no external APIs)
uv run pytest

# Run integration tests (slow, requires API keys)
uv run pytest -m integration

# Run specific test
uv run pytest tests/core/test_tools.py -v

Architecture

valiref/
├── src/
│   ├── cli.py                 # Typer-based CLI interface
│   ├── cli_callbacks.py       # Progress callbacks and Live display
│   ├── core/                  # Core validation engine
│   │   ├── pipeline.py        # Async validation orchestration
│   │   ├── detector.py        # LLM-based hallucination detection
│   │   ├── extract.py         # PDF/text extraction
│   │   ├── tools.py           # Academic search tools with rate limiting
│   │   ├── search_queue.py    # Token bucket + circuit breaker
│   │   ├── tool_monitor.py    # Real-time metrics via blinker signals
│   │   ├── config.py          # Configuration management
│   │   └── logger.py          # Rich-based logging
│   ├── bench/                 # Benchmark framework
│   │   ├── crawler.py         # ArXiv paper crawler
│   │   ├── dataset.py         # Hallucination injection
│   │   ├── bench.py           # Benchmark runner with live metrics
│   │   └── schema.py          # Pydantic data models
│   └── api/                   # API interface (future)
├── scripts/
│   └── generate_dataset.py    # Dataset generation script
├── tests/                     # Test suite
└── data/                      # Benchmark datasets

Configuration

Key settings in src/core/config.py:

Setting                 Default         Description
LLM_MODEL               deepseek-chat   LLM used for validation
LLM_TEMPERATURE         0.7             Creativity vs. determinism
DETECTOR_TEMPERATURE    0.1             Lower for consistent reasoning
EXTRACTION_CHAR_LIMIT   20000           Max characters read from PDF references
MAX_WORKERS             5               Concurrent validation workers
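
Settings like these are typically overridable via the environment. A minimal sketch of that pattern, using the names from the table above (the real src/core/config.py may use pydantic-settings or another mechanism):

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Environment-overridable settings with the table's defaults."""
    llm_model: str = field(
        default_factory=lambda: os.getenv("LLM_MODEL", "deepseek-chat"))
    detector_temperature: float = field(
        default_factory=lambda: float(os.getenv("DETECTOR_TEMPERATURE", "0.1")))
    max_workers: int = field(
        default_factory=lambda: int(os.getenv("MAX_WORKERS", "5")))

settings = Settings()
```

Using `default_factory` means each `Settings()` instantiation re-reads the environment, so tests can override a value by setting the variable before constructing the object.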

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Install dev dependencies
uv sync --dev

# Run linting
uv run ruff check .
uv run ruff format .

# Run tests
uv run pytest

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments


Built with ❤️ for the research community
