A tool for detecting and quantifying hallucinations in LLM responses through context and common knowledge verification

Hallucination Detection Model (HDM-2)

Paper: HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification (arXiv)
Notebook: Google Colab
HDM-2-3B Model: Hugging Face
HDM-Bench Dataset: Hugging Face

AIMon's Hallucination Detection Model-2 (HDM-2) identifies hallucinations in large language model (LLM) responses. This repository contains the inference code for HDM-2, so developers can integrate hallucination detection into their AI pipelines.

Features

Figure: LLM Response Taxonomy

As shown in the figure above, an LLM response can be broken down into four kinds of content: context-based generation, common-knowledge-based generation, enterprise-knowledge-based generation, and innocuous statements.

HDM-2 offers the following features to help classify the output into this taxonomy.

  • Token-level Detection: Identifies specific hallucinated words and spans
  • Sentence-level Classification: Classifies entire sentences as hallucinated or factual
  • Severity Scoring: Provides a quantitative measure of hallucination severity
  • Flexible Integration: Easy to integrate with existing LLM applications
  • Optimized Performance: Supports both CPU and GPU inference with optional quantization
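If you want to carry this taxonomy into your own code, the four categories, and how HDM-2 treats each according to the notes later on this page, can be sketched as a small enum. The names ResponseCategory, HDM2_TREATMENT, and is_verified_by_hdm2 are hypothetical and are not part of the hdm2 package.

```python
from enum import Enum


class ResponseCategory(Enum):
    """Hypothetical labels for the four parts of the LLM response taxonomy."""
    CONTEXT = "context-based generation"
    COMMON_KNOWLEDGE = "common-knowledge-based generation"
    ENTERPRISE_KNOWLEDGE = "enterprise-knowledge-based generation"
    INNOCUOUS = "innocuous statement"


# How each category is treated, as described in this README (not an hdm2 API).
HDM2_TREATMENT = {
    ResponseCategory.CONTEXT: "verified against the supplied context",
    ResponseCategory.COMMON_KNOWLEDGE: "verified by the common-knowledge checker",
    ResponseCategory.ENTERPRISE_KNOWLEDGE: "not handled by HDM-2",
    ResponseCategory.INNOCUOUS: "not marked as a hallucination",
}


def is_verified_by_hdm2(category: ResponseCategory) -> bool:
    """True for the two categories HDM-2 checks for support:
    against the context, or with the common-knowledge checker."""
    return category in (ResponseCategory.CONTEXT, ResponseCategory.COMMON_KNOWLEDGE)
```

Keeping the treatment in a dictionary makes it easy to see, per category, which statements still need another verification step (for example, enterprise knowledge).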

Installation

From PyPI (Recommended)

pip install hdm2

From Source

git clone https://github.com/aimonlabs/hallucination-detection-model.git
cd hallucination-detection-model
pip install -e .

For GPU acceleration (recommended for production use):

pip install "hdm2[gpu]"
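To confirm which version of the package is installed, here is a minimal sketch using only the standard library's importlib.metadata. The helper name installed_version is illustrative and not part of hdm2.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "hdm2") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"hdm2 {v}" if v else "hdm2 is not installed")
```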

Quick Start

from hdm2 import HallucinationDetectionModel

# Initialize the model
hdm = HallucinationDetectionModel()

# Prepare your inputs
prompt = "Describe what penguins are"
context = """
Penguins are flightless aquatic birds that live almost exclusively in the Southern Hemisphere. They are highly adapted for life in the water, with a countershaded dark and white plumage.
"""
response = """
Penguins are flightless aquatic birds that have evolved to thrive in cold environments, primarily in the Southern Hemisphere. Their bodies are perfectly adapted for marine life - they have wings that have evolved into flippers for swimming, dense waterproof feathers for insulation, and a countershaded dark and white plumage that provides camouflage while swimming. The black back and white front coloration helps them blend in when viewed from above or below in the water. Penguins feed primarily on fish, squid, and krill, which they catch while swimming underwater. They are highly social birds that nest in colonies, sometimes containing thousands of individuals. Of the 18 penguin species, the Emperor penguin is the largest, standing about 1.1 meters tall, while the Little Blue penguin is the smallest at around 40 centimeters.
"""

# Detect hallucinations
results = hdm.apply(prompt, context, response)

# Check results
if results['hallucination_detected']:
    print(f"Hallucination detected with severity: {results['adjusted_hallucination_severity']:.4f}")
    
    # Print hallucinated sentences
    print("\nHallucinated sentences:")
    for sentence_result in results['ck_results']:
        if sentence_result['prediction'] == 1:  # 1 indicates hallucination
            print(f"- {sentence_result['text']}")
else:
    print("No hallucinations detected.")
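If you process many responses, the sentence filtering from the Quick Start can be wrapped in a reusable helper. This sketch assumes, as the example above does, that each entry in results['ck_results'] has a 'prediction' key (1 for a hallucination) and a 'text' key. The helper name hallucinated_sentences is hypothetical.

```python
from typing import Any, Dict, List


def hallucinated_sentences(results: Dict[str, Any]) -> List[str]:
    """Return the text of sentences flagged as hallucinations.

    Assumes each entry in results["ck_results"] has "prediction"
    (1 = hallucination) and "text" keys, as in the Quick Start example.
    """
    if not results.get("hallucination_detected"):
        return []
    return [
        entry["text"]
        for entry in results.get("ck_results", [])
        if entry.get("prediction") == 1
    ]
```

With the results from hdm.apply(...), you could then write: for sentence in hallucinated_sentences(results): print(f"- {sentence}").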

Advanced Usage

Customizing Detection Parameters

# Initialize with custom device and quantization options
hdm = HallucinationDetectionModel(
    device="cuda",  # Force CUDA (GPU) usage
    load_in_8bit=True  # Use 8-bit quantization to reduce memory usage
)

# Customize detection thresholds and options
results = hdm.apply(
    prompt=prompt,
    context=context, 
    response=response,
    token_threshold=0.6,  # Increase token-level threshold (0-1)
    ck_threshold=0.8,     # Increase sentence-level threshold (0-1)
    debug=True            # Enable debug output
)
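The comments above suggest that raising token_threshold or ck_threshold means fewer tokens and sentences are flagged. The toy function below illustrates that effect on made-up probabilities. It assumes a score is flagged when it is greater than or equal to the threshold. HDM-2's exact comparison and default thresholds are not stated in this README, and flag_by_threshold is a hypothetical helper.

```python
from typing import List


def flag_by_threshold(scores: List[float], threshold: float) -> List[bool]:
    """Flag each score that meets or exceeds `threshold`.

    The inclusive (>=) comparison is an assumption for illustration.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [score >= threshold for score in scores]


# Hypothetical sentence-level probabilities, not real HDM-2 output.
scores = [0.55, 0.65, 0.85]
print(flag_by_threshold(scores, 0.5))  # [True, True, True]
print(flag_by_threshold(scores, 0.8))  # [False, False, True]
```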

Loading from Local Path

If you have previously downloaded the model, you can load its components from local paths:

hdm = HallucinationDetectionModel(
    model_components_path="path/to/model_components/",
    ck_classifier_path="path/to/ck_classifier/"
)
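To fall back to the default constructor from the Quick Start when the local copies are missing, you could build the constructor arguments conditionally. This is a sketch: local_model_kwargs is a hypothetical helper, and it only checks that the two directories exist, not that they contain valid model files.

```python
from pathlib import Path
from typing import Dict


def local_model_kwargs(components_dir: str, classifier_dir: str) -> Dict[str, str]:
    """Return local-path kwargs if both directories exist, else an empty dict.

    An empty dict makes HallucinationDetectionModel(**kwargs) behave like
    the no-argument call used in the Quick Start.
    """
    if Path(components_dir).is_dir() and Path(classifier_dir).is_dir():
        # Pass the paths through unchanged (e.g. keep any trailing slash).
        return {
            "model_components_path": components_dir,
            "ck_classifier_path": classifier_dir,
        }
    return {}


# Usage (requires hdm2):
# from hdm2 import HallucinationDetectionModel
# hdm = HallucinationDetectionModel(
#     **local_model_kwargs("path/to/model_components/", "path/to/ck_classifier/")
# )
```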

Detection with word-level annotations

from hdm2.utils.render_utils import display_hallucination_results_words

display_hallucination_results_words(
    results,
    show_scores=False, # True if you want to display scores alongside the candidate words
    color_scheme="blue-red",
    separate_classes=True, # False if you don't want separate colors for Common Knowledge sentences
)

Please refer to the model page on Hugging Face for an example of how to display word-level annotations when inspecting the model's output.

An example from a different call is shown below.

  • Color intensity indicates the score (a darker color means a higher score).
  • Words with a red background are hallucinations.
  • Words with a blue background are context hallucinations that the common-knowledge checker marked as problem-free.
  • Words with a white background are problem-free text.
  • Finally, all candidate sentences (sentences that contain context hallucinations) are shown at the bottom, together with the results from the common-knowledge checker.
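As a rough illustration of the legend, the function below maps a word's score and class to a CSS background color, with higher scores rendered more opaque (darker). This is a hypothetical sketch of a blue-red scheme, not the implementation of display_hallucination_results_words.

```python
def annotation_color(score: float, hallucinated: bool) -> str:
    """Map a word's score to a CSS rgba() background (hypothetical scheme).

    Red marks hallucinations; blue marks context hallucinations that the
    common-knowledge checker cleared. Opacity grows with the score, so
    higher scores look darker. Problem-free (white) words are not handled here.
    """
    alpha = min(max(score, 0.0), 1.0)  # clamp the score to [0, 1]
    rgb = "255, 0, 0" if hallucinated else "0, 0, 255"
    return f"rgba({rgb}, {alpha:.2f})"
```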

Note that:

  • Innocuous statements such as "Can I help you with something else?" and "Hi, I'm an AIMon bot" are not marked as hallucinations.
  • Common-knowledge statements are correctly filtered out by the common-knowledge checker even though they are not present in the context, e.g., "Heart disease remains the leading cause of death globally, according to the World Health Organization."
  • This model cannot handle statements that rely on enterprise knowledge. Please contact us if your use case requires additional capabilities.

Output Format

The apply() method returns a dictionary with the following keys:

  • hallucination_detected (bool): Whether any hallucination was detected
  • hallucination_severity (float): Overall hallucination severity score (0.0-1.0)
  • adjusted_hallucination_severity (float): Adjusted hallucination severity score (0.0-1.0) that incorporates the results from the common-knowledge model. Its value is 0.0 if all candidate sentences are common knowledge.
  • ck_results (list): Per-sentence results with hallucination probabilities
  • high_scoring_words (list): Words/spans with high hallucination scores
  • candidate_sentences (list): Sentences with potential hallucinations
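The keys above can be written down as a TypedDict for type checking, together with a small sanity check. This sketch is based only on the list above: the element types inside the three lists are not documented here, so they are typed as Any, and HDMResults and result_problems are hypothetical names.

```python
from typing import Any, Dict, List, TypedDict, get_type_hints


class HDMResults(TypedDict):
    """Shape of the dict returned by apply(), per the documented keys."""
    hallucination_detected: bool
    hallucination_severity: float
    adjusted_hallucination_severity: float
    ck_results: List[Any]
    high_scoring_words: List[Any]
    candidate_sentences: List[Any]


def result_problems(results: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the dict looks as documented."""
    problems = [
        f"missing key: {key}" for key in get_type_hints(HDMResults) if key not in results
    ]
    # Both severity scores are documented as lying in [0.0, 1.0].
    for key in ("hallucination_severity", "adjusted_hallucination_severity"):
        value = results.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{key} out of range: {value}")
    return problems
```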

Model Weights and Evaluation Dataset on HuggingFace 🤗

As a service to the community, we are releasing the weights for our 3B-parameter model, along with the evaluation split of our dataset, HDMBench. Please refer to the paper (linked below) for details on the dataset and the model architecture.

Note that this dataset is meant only for benchmarking, and it should not be used for training or hyperparameter-tuning.

The model weights are available on Hugging Face.

The HDMBench evaluation split is available on Hugging Face.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

For enterprise and commercial licensing, contact us at info@aimon.ai.

This project is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Citation

The full text of our paper 📃 is available on arXiv (arXiv:2504.07069).

If you use HDM-2 in your research, please cite:

@misc{paudel2025hallucinothallucinationdetectioncontext,
      title={HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification}, 
      author={Bibek Paudel and Alexander Lyzhov and Preetam Joshi and Puneet Anand},
      year={2025},
      eprint={2504.07069},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.07069}, 
}


Download files

Source Distribution

hdm2-0.7.0.tar.gz (33.8 kB)

Built Distribution

hdm2-0.7.0-py3-none-any.whl (33.7 kB)

File details

Details for the file hdm2-0.7.0.tar.gz.

File metadata

  • Download URL: hdm2-0.7.0.tar.gz
  • Size: 33.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for hdm2-0.7.0.tar.gz:

  • SHA256: 9b72503f01702ea88ace09d2822a698b5748a14ccc87e46cae6d530846d4a980
  • MD5: 0b6ee8adf1799a1c6934cb7811879f6f
  • BLAKE2b-256: 250dd600bc1f9e4f830bd933b667082acea2b84405d4accdc14256366876ff1c

File details

Details for the file hdm2-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: hdm2-0.7.0-py3-none-any.whl
  • Size: 33.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for hdm2-0.7.0-py3-none-any.whl:

  • SHA256: 6b4598d5cc0d6007c2444136329d2c2355c1744a5f63dc37a7d94365ebcc9eb9
  • MD5: fb74df8c26038d4e587b65162e4f1bab
  • BLAKE2b-256: f662546f58694b01ab103fc337aa102676b5e4ea9166b386b74a6a9688695f0e
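To check a downloaded file against the SHA256 digest listed above, here is a minimal sketch using hashlib from the standard library. The helper names sha256_of and matches_published_hash are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage, after downloading the wheel to the current directory:
# matches_published_hash(
#     "hdm2-0.7.0-py3-none-any.whl",
#     "6b4598d5cc0d6007c2444136329d2c2355c1744a5f63dc37a7d94365ebcc9eb9",
# )
```

pip can also enforce digests at install time through its hash-checking mode, by adding --hash=sha256:<digest> to a pinned requirement such as hdm2==0.7.0 in a requirements file.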
