Detect and reduce LLM hallucinations with semantic entropy, log-probability analysis, and web grounding

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

catch-cap

catch-cap is a Python library for detecting and reducing hallucinations in Large Language Model responses through semantic entropy analysis, log-probability monitoring, and web-grounded fact-checking.

Installation

pip install catch-cap

API Keys Setup

Set your API keys as environment variables:

export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"
export GROQ_API_KEY="your-groq-key"
export TAVILY_API_KEY="your-tavily-key"

Or use a .env file:

OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key
GROQ_API_KEY=your-groq-key
TAVILY_API_KEY=your-tavily-key
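
A quick pre-flight check can catch a missing key before the first request is made. The sketch below assumes the four variable names listed above. `missing_api_keys` is a hypothetical helper, not part of catch-cap, and which keys you actually need depends on the providers you configure.

```python
import os

# Variable names taken from the setup instructions above.
KNOWN_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "GROQ_API_KEY", "TAVILY_API_KEY")

def missing_api_keys(required=KNOWN_KEYS, env=None):
    """Return the names in `required` that are unset or blank in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]

# Only check the keys for the providers you actually use.
missing = missing_api_keys(required=("OPENAI_API_KEY", "TAVILY_API_KEY"))
if missing:
    print("Missing API keys:", ", ".join(missing))
```

If you keep keys in a .env file, call `load_dotenv()` (as in the Production Setup example) before running the check.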

What's New in v0.2.2

catch-cap v0.2.2 is now available – a major update transforming catch-cap into a production-ready hallucination detection middleware. This release brings reliability, speed, and better insight for anyone working with LLM outputs.

Highlights:

- Confidence Scoring: Each detection gives a 0–1 confidence score and a human-readable level ("High", "Medium", "Low").
- Rate Limiting: Throttle model/API usage to prevent overages and stay within quotas.
- Structured Logging: New logging for full pipeline observability and debug support.
- Graceful Degradation: If a component fails (e.g., web search times out), detection keeps going using available data.
- Automatic Retries: All API/network calls retry on transient errors, with exponential backoff.
- 10x Faster Embeddings: Embeddings are batched for performance and cost-efficiency.
- Extensive Metadata: Results include reasons, detection time, and methods used.

For full details, new configuration options, and migration guidance, see the v0.2.2 Release Notes.
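
To make the retry behavior mentioned in the highlights concrete, here is a generic sketch of retrying an async call with exponential backoff. It is illustrative only: `retry_async`, its parameters, and the choice of which exceptions count as transient are assumptions, not catch-cap's internal implementation.

```python
import asyncio
import random

async def retry_async(call, *, attempts=3, base_delay=0.5, max_delay=8.0,
                      transient=(TimeoutError, ConnectionError)):
    """Await `call()` and retry on transient errors with exponential backoff.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    for attempt in range(attempts):
        try:
            return await call()
        except transient:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so concurrent callers do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```

Non-transient exceptions propagate immediately, which keeps configuration mistakes (such as a bad API key) from being retried pointlessly.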

Quick Start

import asyncio
from catch_cap import CatchCap, CatchCapConfig, ModelConfig

async def main():
    config = CatchCapConfig(
        generator=ModelConfig(provider="openai", name="gpt-4.1-mini"),
    )
    
    detector = CatchCap(config)
    result = await detector.run("How many r's are there in strawberry?")
    
    print(f"Confabulation detected: {result.confabulation_detected}")
    if result.corrected_answer:
        print(f"Corrected answer: {result.corrected_answer}")

asyncio.run(main())

Supported Models

OpenAI

Text Generation: all models except thinking models
Embeddings: text-embedding-3-large, text-embedding-3-small
Log Probabilities: Supported

Google Gemini

Text Generation: all models except thinking models
Embeddings: text-embedding-004, embedding-001
Log Probabilities: Not supported

Groq

Text Generation: all models except thinking models
Embeddings: Use OpenAI or Gemini
Log Probabilities: Limited support

Configuration

Basic Configuration

from catch_cap import CatchCapConfig, ModelConfig

config = CatchCapConfig(
    generator=ModelConfig(provider="openai", name="gpt-4.1-mini")
)

Full Configuration

from catch_cap import (
    CatchCapConfig,
    JudgeConfig,
    LogProbConfig,
    ModelConfig,
    SemanticEntropyConfig,
    WebSearchConfig,
)

config = CatchCapConfig(
    generator=ModelConfig(
        provider="openai", 
        name="gpt-4.1-mini",
        temperature=0.7,
        max_tokens=500
    ),
    semantic_entropy=SemanticEntropyConfig(
        enabled=True,
        n_responses=3,
        threshold=0.3,
        embedding_model="text-embedding-3-large",
        embedding_provider="openai"
    ),
    logprobs=LogProbConfig(
        enabled=True,
        min_logprob=-5.0,
        fraction_threshold=0.15,
        min_flagged_tokens=5
    ),
    web_search=WebSearchConfig(
        provider="tavily",
        max_results=3,
        synthesizer_model=ModelConfig(provider="openai", name="gpt-4.1-nano")
    ),
    judge=JudgeConfig(
        model=ModelConfig(provider="openai", name="gpt-4.1-nano"),
        instructions="Compare responses for factual accuracy. Return CONSISTENT or INCONSISTENT only."
    ),
    enable_correction=True
)

Usage Examples

Minimal Setup

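# Assumes: from catch_cap import CatchCap, CatchCapConfig, ModelConfig
# `await` must run inside an async function (see Quick Start).
config = CatchCapConfig(
    generator=ModelConfig(provider="openai", name="gpt-4.1-mini")
)
detector = CatchCap(config)
result = await detector.run("How many r's are there in strawberry?")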

Semantic Entropy Only

config = CatchCapConfig(
    generator=ModelConfig(provider="gemini", name="gemini-2.0-flash"),
    semantic_entropy=SemanticEntropyConfig(n_responses=5, threshold=0.2),
    logprobs=LogProbConfig(enabled=False),
    web_search=WebSearchConfig(provider="none")
)

Maximum Detection

config = CatchCapConfig(
    generator=ModelConfig(provider="openai", name="gpt-4.1-mini"),
    semantic_entropy=SemanticEntropyConfig(n_responses=5, threshold=0.2),
    logprobs=LogProbConfig(min_logprob=-4.0, fraction_threshold=0.1),
    web_search=WebSearchConfig(
        provider="tavily",
        synthesizer_model=ModelConfig(provider="openai", name="gpt-4.1-nano")
    ),
    judge=JudgeConfig(model=ModelConfig(provider="openai", name="gpt-4"))
)

Production Setup

from dotenv import load_dotenv
load_dotenv()

config = CatchCapConfig(
    generator=ModelConfig(provider="openai", name="gpt-4.1-mini", temperature=0.7),
    semantic_entropy=SemanticEntropyConfig(n_responses=3, threshold=0.3),
    web_search=WebSearchConfig(
        provider="tavily",
        synthesizer_model=ModelConfig(provider="openai", name="gpt-4.1-nano")
    ),
    judge=JudgeConfig(model=ModelConfig(provider="openai", name="gpt-4.1-nano"))
)

Result Analysis

# Inside an async function, with `detector` created as in Quick Start.
result = await detector.run("Your_query_here")

# Basic results
print(f"Query: {result.query}")
print(f"Confabulation detected: {result.confabulation_detected}")
print(f"Original response: {result.responses[0].text}")

# Semantic entropy analysis
if result.semantic_entropy:
    print(f"Entropy score: {result.semantic_entropy.entropy_score}")
    print(f"Model confident: {result.semantic_entropy.is_confident}")

# Log probability analysis  
if result.logprob_analysis:
    print(f"Suspicious tokens ratio: {result.logprob_analysis.flagged_token_ratio}")
    print(f"Total flagged tokens: {result.logprob_analysis.flagged_token_count}")

# Judge verdict
if result.judge_verdict:
    print(f"Judge verdict: {result.judge_verdict.verdict}")
    print(f"Factually consistent: {result.judge_verdict.is_consistent}")

# Corrections
if result.corrected_answer:
    print(f"Corrected answer: {result.corrected_answer}")
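
For logging or dashboards it can help to flatten a result into a plain dict. The sketch below uses only the attribute names shown in the example above. `summarize` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in for a real result so the snippet runs without API keys.

```python
from types import SimpleNamespace

def summarize(result):
    """Flatten a detection result into a dict suitable for logging.

    Hypothetical helper; optional sections that are missing or None are skipped.
    """
    summary = {
        "query": result.query,
        "confabulation_detected": result.confabulation_detected,
    }
    entropy = getattr(result, "semantic_entropy", None)
    if entropy is not None:
        summary["entropy_score"] = entropy.entropy_score
    logprobs = getattr(result, "logprob_analysis", None)
    if logprobs is not None:
        summary["flagged_token_ratio"] = logprobs.flagged_token_ratio
    verdict = getattr(result, "judge_verdict", None)
    if verdict is not None:
        summary["judge_verdict"] = verdict.verdict
    if getattr(result, "corrected_answer", None):
        summary["corrected_answer"] = result.corrected_answer
    return summary

# Stand-in object for demonstration; a real result comes from detector.run(...).
fake = SimpleNamespace(
    query="How many r's are there in strawberry?",
    confabulation_detected=True,
    semantic_entropy=SimpleNamespace(entropy_score=0.42, is_confident=False),
    logprob_analysis=None,
    judge_verdict=None,
    corrected_answer="3",
)
print(summarize(fake))
```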

Error Handling

from catch_cap.exceptions import CatchCapError, ProviderNotAvailableError

# Inside an async function; `detector` and `query` are defined as in the earlier examples.
try:
    result = await detector.run(query)
except ProviderNotAvailableError:
    print("Model provider not available")
except CatchCapError as e:
    print(f"Detection error: {e}")

Batch Processing

queries = ["Query 1", "Query 2", "Query 3"]
results = []

for query in queries:
    result = await detector.run(query)
    results.append(result)
    print(f"Query: {query}")
    print(f"Confabulation detected: {result.confabulation_detected}")
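
The loop above processes queries one at a time. If throughput matters, a bounded-concurrency variant is one option. The sketch below is generic: `run_batch` and `max_concurrency` are hypothetical names, and it accepts any async callable so it can be tried without the library.

```python
import asyncio

async def run_batch(run, queries, max_concurrency=3):
    """Run `run(query)` for every query, at most `max_concurrency` at a time.

    Results come back in the same order as `queries`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(query):
        # The semaphore caps how many calls are in flight at once.
        async with semaphore:
            return await run(query)

    return await asyncio.gather(*(guarded(q) for q in queries))
```

With the `detector` from the earlier examples this might be called as `results = await run_batch(detector.run, queries)`. Whether a single CatchCap instance is safe to call concurrently is not stated here, so treat that as an assumption and keep `max_concurrency` low enough to respect provider quotas.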

Configuration Reference

ModelConfig

ModelConfig(
    provider="openai",    # "openai", "gemini", or "groq"
    name="gpt-4.1-mini",  # Model name
    temperature=0.7,      # Sampling temperature (0.0-2.0)
    top_p=0.9,            # Nucleus sampling
    max_tokens=1000,      # Max output tokens
    extra_args={}         # Provider-specific args
)

SemanticEntropyConfig

SemanticEntropyConfig(
    enabled=True,                    # Enable semantic entropy detection
    n_responses=5,                   # Number of responses to generate
    threshold=0.25,                  # Entropy threshold (lower = more confident)
    embedding_model="text-embedding-3-small",
    embedding_provider="openai"
)
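
For intuition about what `n_responses` and `threshold` control, here is one way an embedding-based semantic entropy score could be computed: group similar responses, then measure how evenly the responses spread across groups. This is a sketch under stated assumptions, not catch-cap's actual algorithm. The greedy clustering, the `similarity_cutoff` parameter, and the normalization to the 0 to 1 range are all illustrative choices.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_entropy(embeddings, similarity_cutoff=0.9):
    """Cluster embeddings greedily, then return entropy normalized to [0, 1].

    Hypothetical sketch: 0 means all responses agree, 1 means all differ.
    """
    clusters = []  # each cluster is a list; its first item is the representative
    for emb in embeddings:
        for cluster in clusters:
            if cosine(emb, cluster[0]) >= similarity_cutoff:
                cluster.append(emb)
                break
        else:
            clusters.append([emb])  # no similar cluster found: start a new one
    n = len(embeddings)
    if n < 2:
        return 0.0
    probs = [len(c) / n for c in clusters]
    entropy = sum(p * math.log(1 / p) for p in probs)
    return entropy / math.log(n)  # divide by the maximum possible entropy
```

Under this reading, a score above `threshold` would suggest the model gives semantically different answers to the same question, which is the signal semantic entropy is meant to capture.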

LogProbConfig

LogProbConfig(
    enabled=True,              # Enable log-prob analysis
    min_logprob=-4.5,         # Token log-prob threshold
    fraction_threshold=0.2,    # Fraction of flagged tokens needed
    min_flagged_tokens=5       # Minimum flagged tokens to trigger
)
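
The three thresholds can be read as a simple counting rule over per-token log-probabilities. The sketch below is one plausible interpretation, not the library's confirmed logic: it assumes the check triggers only when both the flagged fraction and the flagged count reach their thresholds. `flag_low_confidence` is a hypothetical name.

```python
def flag_low_confidence(token_logprobs, min_logprob=-4.5,
                        fraction_threshold=0.2, min_flagged_tokens=5):
    """Return (triggered, flagged_count, flagged_ratio) for a list of token log-probs.

    Assumption: BOTH the fraction and the count threshold must be met to trigger.
    """
    # A token is flagged when its log-probability falls below min_logprob.
    flagged = sum(1 for lp in token_logprobs if lp < min_logprob)
    ratio = flagged / len(token_logprobs) if token_logprobs else 0.0
    triggered = ratio >= fraction_threshold and flagged >= min_flagged_tokens
    return triggered, flagged, ratio

# 5 of 20 tokens are very unlikely, so both thresholds are met.
print(flag_low_confidence([-0.1] * 15 + [-6.0] * 5))
```

Requiring a minimum count as well as a fraction keeps very short responses from being flagged because of one or two unlikely tokens.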

WebSearchConfig

WebSearchConfig(
    provider="tavily",         # "tavily", "searxng", or "none"
    max_results=5,             # Number of search results
    timeout_seconds=20,        # Search timeout
    searxng_url="http://localhost:8080/search",  # For SearXNG
    synthesizer_model=ModelConfig(provider="openai", name="gpt-4.1-mini")
)

JudgeConfig

JudgeConfig(
    model=ModelConfig(provider="openai", name="gpt-4.1-mini"),
    instructions="Compare responses for factual accuracy. Return CONSISTENT or INCONSISTENT only.",
    acceptable_labels=("CONSISTENT", "INCONSISTENT")
)
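
Since the judge is instructed to return one of the `acceptable_labels`, its raw output has to be checked against that set. The sketch below shows one cautious way to do so. `parse_verdict` is a hypothetical helper, and how catch-cap itself handles unexpected judge output is not documented here.

```python
def parse_verdict(raw_output, acceptable_labels=("CONSISTENT", "INCONSISTENT")):
    """Map a judge model's raw text to one of the acceptable labels, or None.

    Hypothetical helper: trims whitespace and stray punctuation, then requires
    an exact, case-insensitive match.
    """
    cleaned = raw_output.strip().strip(".!\"'").upper()
    return cleaned if cleaned in acceptable_labels else None

print(parse_verdict(" consistent.\n"))
print(parse_verdict("The answers look INCONSISTENT to me"))
```

An exact match is used on purpose: "CONSISTENT" is a substring of "INCONSISTENT", so substring matching would misread one label as the other.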

License

MIT License

Release history

This version

0.2.2

Oct 15, 2025

0.2.0

Oct 15, 2025

0.1.4

Sep 25, 2025

0.1.3

Sep 25, 2025

0.1.2

Sep 25, 2025

0.1.1

Sep 24, 2025

0.1.0

Sep 24, 2025

Download files

Source Distribution

catch_cap-0.2.2.tar.gz (21.0 kB view details)

Uploaded Oct 15, 2025 Source

Built Distribution

catch_cap-0.2.2-py3-none-any.whl (26.8 kB view details)

Uploaded Oct 15, 2025 Python 3

File details

Details for the file catch_cap-0.2.2.tar.gz.

File metadata

Download URL: catch_cap-0.2.2.tar.gz
Upload date: Oct 15, 2025
Size: 21.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for catch_cap-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`a6c4532444f4cb1cb3c69e587143cad922055f269d4f1667d3cc5d0a27e6481b`
MD5	`9e5b8c4ff8dcb44a83c659c0b9d9cc27`
BLAKE2b-256	`a74b1568d1e60960fd3053e7d3b6003141cf841fc9449526d17392613e914c86`

File details

Details for the file catch_cap-0.2.2-py3-none-any.whl.

File metadata

Download URL: catch_cap-0.2.2-py3-none-any.whl
Upload date: Oct 15, 2025
Size: 26.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for catch_cap-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7e67774b882c7582e6039b59ec7f0c284cee4f5fb16e4ae6ec743ef909c267b6`
MD5	`00284f66a797f96880d1bfeae26c32e2`
BLAKE2b-256	`93f9763284b6917270d46810d393e1dfecc63d293580f8144a108d4f4cc13580`
