A comprehensive tool for validating reference accuracy in academic papers

These details have not been verified by PyPI

Project links

Project description

RefChecker

Validate reference accuracy in academic papers. Useful for authors checking bibliographies and reviewers ensuring citations are authentic. RefChecker verifies citations against Semantic Scholar, OpenAlex, and CrossRef.

Built by Mark Russinovich with AI assistants (Cursor, GitHub Copilot, Claude Code). Watch the deep dive video.

Quick Start
Features
Sample Output
Install
Run
Output
Configure
Local Database
Testing
License

Quick Start

Web UI (Docker)

docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest

Open http://localhost:8000 in your browser.

Web UI (pip)

pip install academic-refchecker[llm,webui]
refchecker-webui

CLI (pip)

pip install academic-refchecker[llm]
academic-refchecker --paper 1706.03762
academic-refchecker --paper /path/to/paper.pdf

Performance: Set SEMANTIC_SCHOLAR_API_KEY for 1-2s per reference vs 5-10s without.

Features

Multiple formats: ArXiv papers, PDFs, LaTeX, text files
LLM-powered extraction: OpenAI, Anthropic, Google, Azure, vLLM
Multi-source verification: Semantic Scholar, OpenAlex, CrossRef
Comprehensive checks: Titles, authors, years, venues, DOIs, ArXiv IDs
Smart matching: Handles formatting variations (BERT vs B-ERT, pre-trained vs pretrained)
Detailed reports: Errors, warnings, corrected references
Bulk web checks: Upload multiple files or a ZIP in the Web UI to validate many papers at once

Sample Output

Web UI

RefChecker Web UI

CLI

📄 Processing: Attention Is All You Need
   URL: https://arxiv.org/abs/1706.03762

[1/45] Neural machine translation in linear time
       Nal Kalchbrenner et al. | 2017
       ⚠️  Warning: Year mismatch: cited '2017', actual '2016'

[2/45] Effective approaches to attention-based neural machine translation
       Minh-Thang Luong et al. | 2015
       ❌ Error: First author mismatch: cited 'Minh-Thang Luong', actual 'Thang Luong'

[3/45] Deep Residual Learning for Image Recognition
       Kaiming He et al. | 2016 | https://doi.org/10.1109/CVPR.2016.91
       ❌ Error: DOI mismatch: cited '10.1109/CVPR.2016.91', actual '10.1109/CVPR.2016.90'

============================================================
📋 SUMMARY
📚 Total references processed: 68
❌ Total errors: 55  ⚠️ Total warnings: 16  ❓ Unverified: 15

Install

PyPI (Recommended)

pip install academic-refchecker[llm,webui]  # Web UI + CLI + LLM providers
pip install academic-refchecker             # CLI only

From Source (Development)

git clone https://github.com/markrussinovich/refchecker.git && cd refchecker
python -m venv .venv && source .venv/bin/activate
pip install -e ".[llm,webui]"

Requirements: Python 3.7+ (3.10+ recommended). Node.js 18+ is only needed for Web UI development.

Run

Web UI

The Web UI shows live progress, history, and export (including corrected values).

refchecker-webui --port 8000

Tip: You can bulk-check multiple papers by selecting several files or a single ZIP; the Web UI will group them into a batch in the history sidebar.

Development (frontend)

cd web-ui
npm install
npm start

Open http://localhost:5173.

Alternative (separate servers):

# Terminal 1
python -m uvicorn backend.main:app --reload --port 8000

# Terminal 2
cd web-ui
npm run dev

Verify the backend is running:

curl http://localhost:8000/

Web UI documentation: see web-ui/README.md.

Multi-User Hosted Server (OAuth)

The hosted multi-user mode requires every visitor to sign in via OAuth (Google, GitHub, or Microsoft) before using the app. LLM API keys are entered once by each user in the Settings panel, saved in the browser's localStorage, and sent in the request body on every check — they are never stored on the server.

1. Generate a JWT Secret Key

python -c "import secrets; print(secrets.token_hex(32))"

Copy the output — this is your JWT_SECRET_KEY.

2. Register an OAuth Application

Configure at least one provider:

Google — Google Cloud Console → Create credentials → OAuth 2.0 Client ID → Web application

Authorised redirect URI: https://<your-domain>/api/auth/callback/google

GitHub — GitHub Settings › Developer settings › OAuth Apps → New OAuth App

Authorization callback URL: https://<your-domain>/api/auth/callback/github

Microsoft — Azure portal › App registrations → New registration

Redirect URI: https://<your-domain>/api/auth/callback/microsoft

3. Configure Environment Variables

git clone https://github.com/markrussinovich/refchecker.git && cd refchecker
cp .env.example .env

Edit .env with your values:

# Required
JWT_SECRET_KEY=<output from step 1>
SITE_URL=https://<your-domain>
HTTPS_ONLY=true

# At least one OAuth provider (add whichever you registered in step 2)
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...

GITHUB_CLIENT_ID=...
GITHUB_CLIENT_SECRET=...

MS_CLIENT_ID=...
MS_CLIENT_SECRET=...

# Optional tuning
ADMIN_EMAILS=your@email.com   # also grants admin to specific emails (first user is auto-admin)
MAX_CHECKS_PER_USER=3         # max concurrent checks per user (default: 3)

4. Launch with Docker Compose

docker compose up -d

The server starts on port 8000. Place it behind a TLS-terminating reverse proxy (nginx, Caddy, etc.) for HTTPS.

Verify it is running:

curl http://localhost:8000/api/auth/providers
# {"providers":["google","github"]}

Local / Development Launch

Without Docker:

pip install "academic-refchecker[llm,webui]"
JWT_SECRET_KEY=<secret> GOOGLE_CLIENT_ID=... GOOGLE_CLIENT_SECRET=... \
  refchecker-webui --port 8000

Or with hot-reload for development:

# Terminal 1 — API
JWT_SECRET_KEY=<secret> GOOGLE_CLIENT_ID=... GOOGLE_CLIENT_SECRET=... \
  python -m uvicorn backend.main:app --reload --port 8000

# Terminal 2 — Frontend (http://localhost:5173)
cd web-ui && npm run dev

Notes

Admin access: The first user to sign in is automatically granted admin rights. Additional admins can be designated via the ADMIN_EMAILS env var (comma-separated list of email addresses).
LLM API keys: Each user enters their own key in Settings → API Keys. Keys are saved in localStorage and sent per-request in the request body — never stored on or logged by the server.
Rate limiting: Each user may run up to MAX_CHECKS_PER_USER concurrent checks (default 3). The 4th simultaneous request returns HTTP 429.
CLI mode is unaffected: academic-refchecker (CLI) does not require OAuth and continues to work without any auth configuration.

Docker

Pre-built multi-architecture images are published to GitHub Container Registry on every release.

Quick Start

docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest

Open http://localhost:8000 in your browser.

With LLM API Key

Pass your API key for LLM-powered reference extraction (recommended):

# Anthropic Claude (recommended)
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=your_key ghcr.io/markrussinovich/refchecker:latest

# OpenAI
docker run -p 8000:8000 -e OPENAI_API_KEY=your_key ghcr.io/markrussinovich/refchecker:latest

# Google Gemini
docker run -p 8000:8000 -e GOOGLE_API_KEY=your_key ghcr.io/markrussinovich/refchecker:latest

Persistent Data

Mount a volume to persist check history and settings between restarts:

docker run -p 8000:8000 \
  -e ANTHROPIC_API_KEY=your_key \
  -v refchecker-data:/app/data \
  ghcr.io/markrussinovich/refchecker:latest

Docker Compose

For easier configuration with an .env file:

git clone https://github.com/markrussinovich/refchecker.git && cd refchecker
cp .env.example .env  # Add your API keys
docker compose up -d

Common commands:

docker compose logs -f    # View logs
docker compose down       # Stop
docker compose pull       # Update to latest

Available Tags

Tag	Description	Arch	Size
`latest`	Latest stable release	amd64, arm64	~800MB
`X.Y.Z`	Specific version (e.g., `2.0.18`)	amd64, arm64	~800MB

CLI

# ArXiv (ID or URL)
academic-refchecker --paper 1706.03762
academic-refchecker --paper https://arxiv.org/abs/1706.03762

# Local files
academic-refchecker --paper paper.pdf
academic-refchecker --paper paper.tex
academic-refchecker --paper paper.txt
academic-refchecker --paper refs.bib

# Faster/offline verification (local DB)
academic-refchecker --paper paper.pdf --db-path semantic_scholar_db/semantic_scholar.db

# Save results
academic-refchecker --paper 1706.03762 --output-file errors.txt

Output

RefChecker reports these result types:

Type	Description	Examples
❌ Error	Critical issues needing correction	Author/title/DOI mismatches, incorrect ArXiv IDs
⚠️ Warning	Minor issues to review	Year differences, venue variations
ℹ️ Suggestion	Recommended improvements	Add missing ArXiv/DOI URLs, small metadata fixes
❓ Unverified	Could not verify against any source	Rare publications, preprints

Verified references include discovered URLs (Semantic Scholar, ArXiv, DOI). Suggestions are non-blocking improvements.

Detailed examples

❌ Error: First author mismatch: cited 'T. Xie', actual 'Zhao Xu'
❌ Error: DOI mismatch: cited '10.5555/3295222.3295349', actual '10.48550/arXiv.1706.03762'
⚠️ Warning: Year mismatch: cited '2024', actual '2023'
ℹ️ Suggestion: Add ArXiv URL https://arxiv.org/abs/1706.03762
❓ Could not verify: Llama guard (M. A. Research, 2024)

Configure

LLM

LLM-powered extraction improves accuracy with complex bibliographies. Claude Sonnet 4 performs best; GPT-4o may hallucinate DOIs.

Provider	Env Variable	Example Model
Anthropic	`ANTHROPIC_API_KEY`	`claude-sonnet-4-20250514`
OpenAI	`OPENAI_API_KEY`	`gpt-5.2-mini`
Google	`GOOGLE_API_KEY`	`gemini-3`
Azure	`AZURE_OPENAI_API_KEY`	`gpt-4o`
vLLM	(local)	`meta-llama/Llama-3.3-70B-Instruct`

export ANTHROPIC_API_KEY=your_key
academic-refchecker --paper 1706.03762 --llm-provider anthropic

academic-refchecker --paper paper.pdf --llm-provider openai --llm-model gpt-5.2-mini
academic-refchecker --paper paper.pdf --llm-provider vllm --llm-model meta-llama/Llama-3.3-70B-Instruct

Local models (vLLM)

There is no separate “GPU Docker image”. For local inference, install the vLLM extra and run an OpenAI-compatible vLLM server:

pip install "academic-refchecker[vllm]"
python scripts/start_vllm_server.py --model meta-llama/Llama-3.3-70B-Instruct --port 8001
academic-refchecker --paper paper.pdf --llm-provider vllm --llm-endpoint http://localhost:8001/v1

Command Line

--paper PAPER              # ArXiv ID, URL, or file path
--llm-provider PROVIDER    # openai, anthropic, google, azure, vllm
--llm-model MODEL          # Override default model
--db-path PATH             # Local database for offline verification
--output-file [PATH]       # Save results (default: reference_errors.txt)
--debug                    # Verbose output

Environment Variables

# LLM
export REFCHECKER_LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=your_key           # Also: OPENAI_API_KEY, GOOGLE_API_KEY

# Performance
export SEMANTIC_SCHOLAR_API_KEY=your_key    # Higher rate limits / faster verification

Local Database

For offline verification or faster processing:

python scripts/download_db.py \
  --field "computer science" \
  --start-year 2020 --end-year 2024

academic-refchecker --paper paper.pdf --db-path semantic_scholar_db/semantic_scholar.db

Testing

490+ tests covering unit, integration, and end-to-end scenarios.

pytest tests/                    # All tests
pytest tests/unit/              # Unit only
pytest --cov=src tests/         # With coverage

See tests/README.md for details.

License

MIT License - see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

3.0.99

Apr 15, 2026

3.0.98

Apr 15, 2026

3.0.97

Apr 15, 2026

3.0.96

Apr 15, 2026

3.0.95

Apr 15, 2026

3.0.94

Apr 15, 2026

3.0.93

Apr 14, 2026

3.0.92

Apr 14, 2026

3.0.91

Apr 14, 2026

3.0.90

Apr 14, 2026

3.0.89

Apr 14, 2026

3.0.88

Apr 14, 2026

3.0.87

Apr 14, 2026

3.0.86

Apr 12, 2026

3.0.85

Apr 12, 2026

3.0.84

Apr 12, 2026

3.0.83

Apr 12, 2026

3.0.82

Apr 12, 2026

3.0.81

Apr 11, 2026

3.0.80

Apr 11, 2026

3.0.79

Apr 11, 2026

3.0.78

Apr 9, 2026

3.0.77

Apr 9, 2026

3.0.76

Apr 8, 2026

3.0.75

Apr 8, 2026

3.0.74

Apr 8, 2026

3.0.73

Apr 8, 2026

3.0.72

Apr 8, 2026

3.0.71

Apr 7, 2026

3.0.70

Apr 6, 2026

3.0.69

Apr 6, 2026

3.0.68

Apr 6, 2026

3.0.67

Apr 6, 2026

3.0.66

Apr 4, 2026

3.0.65

Apr 4, 2026

3.0.64

Apr 4, 2026

3.0.63

Apr 4, 2026

3.0.62

Apr 2, 2026

3.0.61

Apr 2, 2026

3.0.60

Apr 1, 2026

3.0.59

Apr 1, 2026

3.0.58

Apr 1, 2026

3.0.57

Apr 1, 2026

3.0.56

Apr 1, 2026

3.0.55

Mar 31, 2026

3.0.54

Mar 30, 2026

3.0.53

Mar 30, 2026

3.0.52

Mar 29, 2026

3.0.51

Mar 29, 2026

3.0.50

Mar 28, 2026

3.0.49

Mar 28, 2026

3.0.48

Mar 28, 2026

3.0.47

Mar 28, 2026

3.0.46

Mar 28, 2026

3.0.45

Mar 27, 2026

3.0.44

Mar 27, 2026

3.0.43

Mar 27, 2026

3.0.42

Mar 27, 2026

3.0.41

Mar 26, 2026

3.0.40

Mar 26, 2026

3.0.39

Mar 26, 2026

3.0.38

Mar 26, 2026

3.0.37

Mar 25, 2026

3.0.36

Mar 25, 2026

3.0.35

Mar 25, 2026

3.0.34

Mar 25, 2026

3.0.33

Mar 25, 2026

3.0.32

Mar 25, 2026

3.0.31

Mar 25, 2026

3.0.30

Mar 23, 2026

3.0.29

Mar 22, 2026

3.0.28

Mar 20, 2026

3.0.27

Mar 19, 2026

3.0.26

Mar 19, 2026

3.0.25

Mar 19, 2026

3.0.24

Mar 18, 2026

3.0.23

Mar 17, 2026

3.0.22

Mar 17, 2026

3.0.21

Mar 17, 2026

3.0.20

Mar 16, 2026

3.0.19

Mar 16, 2026

3.0.18

Mar 16, 2026

3.0.17

Mar 12, 2026

3.0.16

Mar 12, 2026

3.0.15

Mar 12, 2026

3.0.14

Mar 12, 2026

3.0.13

Mar 11, 2026

3.0.12

Mar 10, 2026

3.0.11

Mar 8, 2026

3.0.10

Mar 7, 2026

3.0.9

Mar 7, 2026

3.0.8

Mar 7, 2026

3.0.7

Mar 7, 2026

3.0.6

Mar 7, 2026

3.0.5

Mar 6, 2026

3.0.4

Mar 6, 2026

3.0.3

Mar 6, 2026

3.0.2

Mar 6, 2026

3.0.1

Mar 6, 2026

2.0.29

Mar 6, 2026

2.0.28

Mar 6, 2026

2.0.27

Mar 6, 2026

2.0.26

Mar 6, 2026

This version

2.0.25

Mar 6, 2026

2.0.24

Mar 6, 2026

2.0.23

Feb 5, 2026

2.0.22

Feb 4, 2026

2.0.21

Feb 1, 2026

2.0.20

Jan 30, 2026

2.0.19

Jan 30, 2026

2.0.18

Jan 30, 2026

2.0.17

Jan 30, 2026

2.0.16

Jan 29, 2026

2.0.15

Jan 28, 2026

2.0.14

Jan 28, 2026

2.0.13

Jan 26, 2026

2.0.12

Jan 18, 2026

2.0.11

Jan 17, 2026

2.0.10

Jan 17, 2026

2.0.9

Jan 16, 2026

2.0.8

Jan 16, 2026

2.0.7

Jan 15, 2026

2.0.6

Jan 15, 2026

2.0.5

Jan 15, 2026

2.0.4

Jan 15, 2026

2.0.3

Jan 15, 2026

2.0.2

Jan 15, 2026

2.0.1

Jan 15, 2026

1.2.69

Jan 15, 2026

1.2.68

Jan 14, 2026

1.2.67

Jan 14, 2026

1.2.66

Jan 14, 2026

1.2.65

Jan 12, 2026

1.2.64

Jan 12, 2026

1.2.63

Jan 12, 2026

1.2.62

Jan 12, 2026

1.2.61

Jan 12, 2026

1.2.60

Jan 12, 2026

1.2.59

Jan 12, 2026

1.2.58

Jan 12, 2026

1.2.57

Jan 12, 2026

1.2.56

Jan 12, 2026

1.2.55

Jan 2, 2026

1.2.54

Oct 21, 2025

1.2.53

Sep 15, 2025

1.2.52

Sep 15, 2025

1.2.51

Aug 28, 2025

1.2.50

Aug 16, 2025

1.2.49

Aug 16, 2025

1.2.48

Aug 13, 2025

1.2.47

Aug 11, 2025

1.2.46

Aug 11, 2025

1.2.45

Aug 11, 2025

1.2.44

Aug 9, 2025

1.2.43

Aug 9, 2025

1.2.42

Aug 9, 2025

1.2.41

Aug 9, 2025

1.2.40

Aug 8, 2025

1.2.39

Aug 8, 2025

1.2.38

Aug 8, 2025

1.2.37

Aug 8, 2025

1.2.36

Aug 7, 2025

1.2.35

Aug 6, 2025

1.2.34

Aug 6, 2025

1.2.33

Aug 5, 2025

1.2.32

Aug 5, 2025

1.2.31

Aug 5, 2025

1.2.30

Aug 5, 2025

1.2.29

Aug 3, 2025

1.2.28

Aug 3, 2025

1.2.27

Aug 3, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

academic_refchecker-2.0.25.tar.gz (959.4 kB view details)

Uploaded Mar 6, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

academic_refchecker-2.0.25-py3-none-any.whl (987.0 kB view details)

Uploaded Mar 6, 2026 Python 3

File details

Details for the file academic_refchecker-2.0.25.tar.gz.

File metadata

Download URL: academic_refchecker-2.0.25.tar.gz
Upload date: Mar 6, 2026
Size: 959.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for academic_refchecker-2.0.25.tar.gz
Algorithm	Hash digest
SHA256	`f251fe97b85fb77584f218618eaefad7e640bd63d073fbedf5a2fa8ec48cccc1`
MD5	`5a6a6b03b550b7f6b1df1a34d277f494`
BLAKE2b-256	`5ed51c70942a7f57243c70e51418d8ccc29e0e8e70b45b7358f550f30b4e1059`

See more details on using hashes here.

File details

Details for the file academic_refchecker-2.0.25-py3-none-any.whl.

File metadata

Download URL: academic_refchecker-2.0.25-py3-none-any.whl
Upload date: Mar 6, 2026
Size: 987.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for academic_refchecker-2.0.25-py3-none-any.whl
Algorithm	Hash digest
SHA256	`058a6cc27ef79ebaf33b8e3e7c86620f753a124ba70215eb0a070aac020d45a6`
MD5	`2637076f40d125ed3ef0c1d5588b4898`
BLAKE2b-256	`bf2af3a5206ce3cd7ca0c1ddabcad4e6721e6a773c1a92c5c2883ec98b7b21f5`

See more details on using hashes here.

academic-refchecker 2.0.25

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

RefChecker

Contents

Quick Start

Web UI (Docker)

Web UI (pip)

CLI (pip)

Features

Sample Output

Install

PyPI (Recommended)

From Source (Development)

Run

Web UI

Development (frontend)

Multi-User Hosted Server (OAuth)

1. Generate a JWT Secret Key

2. Register an OAuth Application

3. Configure Environment Variables

4. Launch with Docker Compose

Local / Development Launch

Notes

Docker

Quick Start

With LLM API Key

Persistent Data

Docker Compose

Available Tags

CLI

Output

Configure

LLM

Local models (vLLM)

Command Line

Environment Variables

Local Database

Testing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes