Cross-check LLM output against your real codebase to detect hallucinated references
hallucination-grep
Cross-check LLM output against your real codebase. Finds references to functions, classes, files, and imports that the LLM mentioned but that don't actually exist in your code.
No external AI dependencies. Pure Python + AST analysis.
Install
```bash
pip install hallucination-grep
```
Or install from source:
```bash
git clone https://github.com/yourname/hallucination-grep
cd hallucination-grep
pip install -e .
```
Usage
```bash
# From a file
hallucination-grep response.txt --codebase ./src

# From stdin (pipe from clipboard or LLM)
echo "Use the get_user_profile() function..." | hallucination-grep --codebase .

# Specific checks only
hallucination-grep response.txt --codebase . --check functions,files

# JSON output for CI
hallucination-grep response.txt --codebase . --json
```
Example output
```text
╔═════════════════════════════╗
║  HALLUCINATION GREP REPORT  ║
╚═════════════════════════════╝

Scanned: 13 lines of LLM output
Codebase: src (4 Python files, 4 total files indexed)

HALLUCINATIONS DETECTED: 5

╭────────────────────────────────────────────────────╮
│ ✗ Function: get_user_profile()                     │
│   Mentioned in: line 3 of input                    │
│   Status: Does not exist in codebase               │
│   Similar: get_user_data() in src/users.py:5       │
╰────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────╮
│ ✗ File: src/helpers/formatter.py                   │
│   Mentioned in: line 3 of input                    │
│   Status: File does not exist                      │
│   Similar: src/utils/format.py                     │
╰────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────╮
│ ✗ Import: from config import DATABASE_URL          │
│   Mentioned in: line 8 of input                    │
│   Status: Does not exist in codebase               │
│   Similar: config.settings (module)                │
╰────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────╮
│ ✗ Class: DataProcessor                             │
│   Mentioned in: line 4 of input                    │
│   Status: Does not exist                           │
│   Similar: none found                              │
╰────────────────────────────────────────────────────╯

VERIFIED REFERENCES: 4
  ✓ Function: get_user_data() (src/users.py:5)
  ✓ Class: UserManager (src/users.py:1)
  ✓ Function: save() (src/base.py:2)
  ✓ File: src/utils/format.py

Hallucination rate: 55% (5 of 9 code references)
```
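The percentage in the summary line can be reproduced from the counts in the report. This sketch assumes the denominator is hallucinated plus verified references (5 + 4 = 9) and that the tool truncates rather than rounds, since 5/9 is about 55.6%:

```python
hallucinated = 5
verified = 4
total = hallucinated + verified  # 9 code references
rate = hallucinated / total  # 0.5555...

# Assumption: int() truncation matches the "55%" in the report;
# round() would give 56% instead.
truncated = int(rate * 100)
rounded = round(rate * 100)
print(f"Hallucination rate: {truncated}% ({hallucinated} of {total} code references)")
# → Hallucination rate: 55% (5 of 9 code references)
```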
How it works
1. Extract references from the LLM output using regex:
   - Function calls: `get_user_profile()`, `fetchResults()`
   - Class names: `UserManager`, `DataProcessor`
   - File paths: `src/utils.py`, `config/settings.json`
   - Import statements: `from utils import format_response`
   - Method calls: `.save()`, `.process()`
   - Constants: `DATABASE_URL`, `SECRET_KEY`
2. Index the codebase using Python's `ast` module:
   - All defined functions and methods (with file + line)
   - All class definitions
   - All existing files
   - All module-level variable assignments
3. Cross-reference: flag anything the LLM mentioned that doesn't exist.
4. Similarity suggestions: use `difflib.get_close_matches()` to suggest what the LLM might have meant.
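The four steps above fit in a few dozen lines of standard-library Python. This is a minimal illustration, not the package's code: the regexes and the names `extract_references`, `index_codebase`, and `cross_check` are hypothetical, and only function and class checks are covered (files, imports, methods, and constants would follow the same pattern).

```python
from __future__ import annotations

import ast
import difflib
import re
import tempfile
from pathlib import Path

# Hypothetical patterns: empty-paren calls like get_user_profile() and CamelCase names.
FUNC_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(\)")
CLASS_NAME = re.compile(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+)\b")


def extract_references(text: str) -> dict[str, set[str]]:
    """Step 1: pull candidate function and class names out of LLM text."""
    return {
        "functions": set(FUNC_CALL.findall(text)),
        "classes": set(CLASS_NAME.findall(text)),
    }


def index_codebase(root: Path) -> dict[str, dict[str, str]]:
    """Step 2: map each defined function/method and class name to 'path:line'."""
    index: dict[str, dict[str, str]] = {"functions": {}, "classes": {}}
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        # ast.walk visits nested nodes too, so methods inside classes are indexed.
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                index["functions"].setdefault(node.name, f"{path}:{node.lineno}")
            elif isinstance(node, ast.ClassDef):
                index["classes"].setdefault(node.name, f"{path}:{node.lineno}")
    return index


def cross_check(
    refs: dict[str, set[str]], index: dict[str, dict[str, str]]
) -> list[tuple[str, str, str, str | None]]:
    """Steps 3 and 4: flag unknown names and suggest the closest known one."""
    findings: list[tuple[str, str, str, str | None]] = []
    for kind, names in refs.items():
        known = index[kind]
        for name in sorted(names):
            if name in known:
                # Verified: report where the definition lives.
                findings.append((kind, name, "verified", known[name]))
            else:
                # Missing: difflib picks the most similar known name above a 0.6 ratio.
                close = difflib.get_close_matches(name, known.keys(), n=1, cutoff=0.6)
                findings.append((kind, name, "missing", close[0] if close else None))
    return findings


# Demo: a throwaway one-file codebase and a short LLM answer.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "users.py").write_text(
        "class UserManager:\n"
        "    def get_user_data(self, user_id):\n"
        "        return {}\n",
        encoding="utf-8",
    )
    answer = "Call get_user_profile() on a UserManager, then pass it to DataProcessor."
    findings = cross_check(extract_references(answer), index_codebase(root))

for finding in findings:
    print(finding)
```

Run as-is, the demo should flag `get_user_profile` as missing with `get_user_data` suggested (similarity ratio of about 0.62), flag `DataProcessor` as missing with no suggestion, and verify `UserManager` at line 1 of `users.py`.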
Options
| Flag | Description |
|---|---|
| `--codebase PATH` | Directory to index (required) |
| `--check LIST` | Comma-separated check types: `functions`, `classes`, `files`, `imports`, `methods`, `variables` |
| `--json` | Machine-readable JSON output |
| `--min-confidence FLOAT` | Filter hallucinations by confidence threshold |
| `--no-color` | Disable Rich color output |
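How the CLI parses the `--check` value is not documented here. The sketch below is a hypothetical helper (`parse_checks` and `VALID_CHECKS` are illustrative names) that normalizes the comma-separated list against the six check types in the table and rejects unknown names, assuming that omitting the flag means every check runs:

```python
from __future__ import annotations

# Check types as listed in the Options table.
VALID_CHECKS = frozenset(
    {"functions", "classes", "files", "imports", "methods", "variables"}
)


def parse_checks(value: str | None) -> frozenset[str]:
    """Turn a --check value like "functions,files" into a set of check types.

    Hypothetical helper: None (flag omitted) is assumed to mean "run every check".
    """
    if value is None:
        return VALID_CHECKS
    # Tolerate stray spaces and capitalization, e.g. " Classes , imports ".
    requested = {item.strip().lower() for item in value.split(",") if item.strip()}
    unknown = requested - VALID_CHECKS
    if unknown:
        raise ValueError(f"unknown check type(s): {', '.join(sorted(unknown))}")
    return frozenset(requested)


print(sorted(parse_checks("functions,files")))  # ['files', 'functions']
```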
CI integration
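The exit code is 1 when hallucinations are found and 0 when the output is clean. Use it with `--json` for structured output. GitHub Actions runs `run:` steps with `bash -e` by default, so capture the exit status if you want the report printed before the step fails:

```yaml
# .github/workflows/llm-check.yml
- name: Check LLM response for hallucinations
  run: |
    status=0
    hallucination-grep llm_output.txt --codebase ./src --json > hallucination_report.json || status=$?
    cat hallucination_report.json
    exit "$status"
```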
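For CI scripts written in Python (a pre-commit hook, say), the documented exit codes are enough to gate a build without depending on the JSON layout, which this README does not specify. A minimal sketch; the helper name `llm_output_is_clean` is hypothetical, and treating any exit code other than 0 or 1 as a tool failure is an assumption (Click, which the CLI is built on, exits with 2 on usage errors):

```python
import subprocess
import sys


def llm_output_is_clean(cmd: list[str]) -> bool:
    """Run a checker command and map its exit code to pass/fail.

    Documented behavior: 0 means clean, 1 means hallucinations were found.
    Assumption: any other exit code signals a tool error, not a verdict.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"checker failed with exit code {result.returncode}: {result.stderr}"
    )


# Real use would pass something like
# ["hallucination-grep", "llm_output.txt", "--codebase", "./src", "--json"].
# Stand-in commands simulate the two documented exit codes here.
clean = llm_output_is_clean([sys.executable, "-c", "import sys; sys.exit(0)"])
dirty = llm_output_is_clean([sys.executable, "-c", "import sys; sys.exit(1)"])
print(clean, dirty)  # True False
```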
Architecture
```text
src/hallucination_grep/
├── __init__.py      # Public API
├── cli.py           # Click entry point, Rich output
├── extractor.py     # Regex-based reference extraction from LLM text
├── indexer.py       # AST-based codebase indexing
└── checker.py       # Cross-reference + similarity matching
```
Development
```bash
pip install -e ".[dev]"
pytest
black .
mypy src/
```
License
MIT
File details
Details for the file hallucination_grep-0.1.0.tar.gz.
File metadata
- Download URL: hallucination_grep-0.1.0.tar.gz
- Upload date:
- Size: 23.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be46036ba236e2d4365d823a2dda41f53b66f74ce64a6a52621818e42223697b |
| MD5 | 9808c22bf379b14ee71e79d060e0b75c |
| BLAKE2b-256 | a563dc6a8d5ea9c44eb4fbc7f4482e76dc35f9ac7b8bee6da75919e4332ffe8e |
File details
Details for the file hallucination_grep-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hallucination_grep-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d3ed27bfd6f320fef58061fd4db42776d8a93417a8f910ea74b6ae141876f64e |
| MD5 | e66c9bb5e529a6ab802ff20d6c4004b8 |
| BLAKE2b-256 | 791f3b1834314e72e7f362aef38fdb8e5cfde56d52f054db64046922e18a8c8d |