
factlens

Geometric LLM hallucination detection. No second LLM. Deterministic. Auditable.


Documentation | Research Papers | Examples | Contributing


factlens detects LLM hallucinations using embedding geometry instead of a second LLM. It computes deterministic, auditable scores from the spatial relationships between questions, responses, and source context in an embedding space. The result is a verification signal you can explain in an audit, reproduce on demand, and run in regulated environments.

Why factlens?

| Problem | How factlens solves it |
|---|---|
| Second-LLM judges are non-deterministic and expensive | Single embedding model (all-mpnet-base-v2), deterministic output, sub-second latency |
| Probabilistic scores cannot be audited | Geometric ratios and angular measurements with clear mathematical definitions |
| Regulatory compliance requires explainability | Every score traces to Euclidean distances and cosine similarities in R^n |
| One method does not fit all use cases | SGI for RAG/context verification, DGI for context-free chat, evaluate() auto-selects |

SGI: Semantic Grounding Index | DGI: Directional Grounding Index

Installation

pip install factlens

With LLM provider support:

pip install "factlens[openai]"       # OpenAI
pip install "factlens[anthropic]"    # Anthropic
pip install "factlens[google]"       # Google Generative AI
pip install "factlens[providers]"    # All providers

With framework integrations:

pip install "factlens[langchain]"    # LangChain
pip install "factlens[crewai]"       # CrewAI
pip install "factlens[semantic-kernel]"  # Semantic Kernel
pip install "factlens[autogen]"      # AutoGen
pip install "factlens[all]"          # Everything

Requirements: Python 3.10+, numpy, sentence-transformers.

Quick start

SGI -- with context (RAG verification)

SGI (Semantic Grounding Index) measures whether a response engaged with the provided context or stayed anchored to the question. It requires three inputs.

from factlens import compute_sgi

result = compute_sgi(
    question="What is the capital of France?",
    context="France is in Western Europe. Its capital is Paris.",
    response="The capital of France is Paris.",
)

print(result.value)       # 1.23 — ratio of distances
print(result.normalized)  # 0.61 — mapped to [0, 1]
print(result.flagged)     # False — above review threshold
print(result.explanation) # "SGI=1.230 — strong context engagement (pass)"

Interpretation: SGI > 1.0 means the response is closer to the context than to the question in embedding space. The response engaged with the source material.
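The geometry behind this ratio can be sketched with plain numpy. The vectors below are hypothetical low-dimensional stand-ins for real sentence embeddings (factlens uses 768-dimensional all-mpnet-base-v2 vectors internally):

```python
import numpy as np

# Hypothetical unit-length embeddings standing in for phi(question),
# phi(context), and phi(response).
q = np.array([1.0, 0.0, 0.0])
c = np.array([0.0, 1.0, 0.0])
r = np.array([0.2, 0.9, 0.1])
r = r / np.linalg.norm(r)

# SGI = dist(response, question) / dist(response, context)
sgi = np.linalg.norm(r - q) / np.linalg.norm(r - c)
print(sgi > 1.0)  # True: the response sits closer to the context than to the question
```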

DGI -- without context

DGI (Directional Grounding Index) detects hallucinations without requiring source context. It checks whether the question-to-response displacement vector aligns with the characteristic direction of verified grounded responses.

from factlens import compute_dgi

result = compute_dgi(
    question="What causes seasons on Earth?",
    response="Seasons are caused by Earth's 23.5-degree axial tilt.",
)

print(result.value)       # 0.42 — cosine similarity to reference direction
print(result.normalized)  # 0.71 — mapped to [0, 1]
print(result.flagged)     # False — above pass threshold (0.30)

Domain calibration improves DGI accuracy from AUROC ~0.76 (generic) to 0.90-0.99:

from factlens import compute_dgi

result = compute_dgi(
    question="What is the statute of limitations for breach of contract in California?",
    response="Four years under California Code of Civil Procedure Section 337.",
    reference_csv="legal_calibration_pairs.csv",
)

evaluate() -- auto-select

The evaluate() function picks the right method automatically: SGI when context is provided, DGI when it is not.

from factlens import evaluate

# With context -> SGI
score = evaluate(
    question="What is X?",
    response="X is Y.",
    context="According to the manual, X is Y.",
)
assert score.method == "sgi"

# Without context -> DGI
score = evaluate(
    question="What is X?",
    response="X is Y.",
)
assert score.method == "dgi"

Batch evaluation

from factlens import evaluate_batch

items = [
    {"question": "Q1?", "response": "A1.", "context": "Source."},
    {"question": "Q2?", "response": "A2."},
    {"question": "Q3?", "response": "A3.", "context": "Reference."},
]

results = evaluate_batch(items)
flagged = [r for r in results if r.flagged]
print(f"{len(flagged)}/{len(results)} flagged for review")

CLI

# Single response check
factlens check \
  --question "What is the capital of France?" \
  --response "The capital of France is Paris." \
  --context "France is in Western Europe. Its capital is Paris."

# Batch CSV evaluation
factlens evaluate input.csv --output results.csv

# Domain calibration
factlens calibrate --pairs domain_pairs.csv --output calibration.json

# Run the confabulation benchmark
factlens benchmark

LLM provider guard

Provider wrappers score each completion automatically and attach the result to the response:

from factlens.providers.openai import OpenAIProvider

provider = OpenAIProvider(model="gpt-4o")
response = provider.complete(
    prompt="Summarize this document.",
    context="The document text here...",
)

if response.factlens_score and response.factlens_score.flagged:
    print("Hallucination risk detected — review recommended.")
else:
    print(response.text)

Architecture

factlens/
├── __init__.py              # Public API: compute_sgi, compute_dgi, evaluate, calibrate
├── sgi.py                   # Semantic Grounding Index (context-required)
├── dgi.py                   # Directional Grounding Index (context-free)
├── evaluate.py              # High-level evaluate() and evaluate_batch()
├── calibrate.py             # Domain-specific DGI calibration
├── score.py                 # Result types: SGIResult, DGIResult, FactlensScore
├── _version.py              # CalVer version (2026.4.28)
├── _internal/               # Private implementation
│   ├── geometry.py          # Euclidean distance, displacement, unit normalize
│   ├── embeddings.py        # Sentence transformer encoding
│   ├── thresholds.py        # Decision boundaries and normalization
│   └── csv_loader.py        # Calibration data loading
├── cli/
│   └── main.py              # CLI: check, evaluate, calibrate, benchmark
├── providers/               # LLM provider wrappers
│   ├── _base.py             # BaseLLMProvider protocol + LLMResponse
│   ├── openai.py            # OpenAI provider
│   ├── anthropic.py         # Anthropic provider
│   └── google.py            # Google Generative AI provider
└── integrations/            # Framework integrations
    ├── langchain/           # LangChain evaluator + callback
    ├── crewai/              # CrewAI tool
    ├── semantic_kernel/     # Semantic Kernel filter
    └── autogen/             # AutoGen checker

The architecture follows a layered design:

┌─────────────────────────────────────────────┐
│            Public API (evaluate)             │
├──────────────────┬──────────────────────────┤
│   SGI (sgi.py)   │      DGI (dgi.py)        │
├──────────────────┴──────────────────────────┤
│        _internal (geometry, embeddings)      │
├─────────────────────────────────────────────┤
│  sentence-transformers (all-mpnet-base-v2)   │
└─────────────────────────────────────────────┘
         ▲                          ▲
         │                          │
   ┌─────┴─────┐            ┌──────┴──────┐
   │ Providers  │            │Integrations │
   │ (OpenAI,   │            │ (LangChain, │
   │  Anthropic,│            │  CrewAI,    │
   │  Google)   │            │  SK, AutoGen│
   └────────────┘            └─────────────┘

Scoring methods

SGI (Semantic Grounding Index)

SGI = dist(phi(response), phi(question)) / dist(phi(response), phi(context))

| Score | Interpretation |
|---|---|
| SGI > 1.20 | Strong context engagement (pass) |
| 0.95 < SGI < 1.20 | Partial engagement (review recommended) |
| SGI < 0.95 | Weak engagement (flagged) |
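The decision boundaries above amount to a small classifier. The function below is an illustrative sketch using the documented cutoffs, not factlens's internal implementation:

```python
def interpret_sgi(sgi: float) -> str:
    # Cutoffs from the SGI scoring table: 1.20 (pass) and 0.95 (flag).
    if sgi > 1.20:
        return "pass"     # strong context engagement
    if sgi > 0.95:
        return "review"   # partial engagement
    return "flagged"      # weak engagement

print(interpret_sgi(1.23))  # pass
```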

DGI (Directional Grounding Index)

delta = phi(response) - phi(question)
DGI = dot(delta / ||delta||, mu_hat)

| Score | Interpretation |
|---|---|
| DGI > 0.30 | Aligns with grounded patterns (pass) |
| 0.00 < DGI < 0.30 | Weak alignment (flagged) |
| DGI < 0.00 | Opposes grounded direction (high risk) |
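The displacement computation can be sketched in numpy. Here mu_hat is a hypothetical calibrated reference direction, and the vectors are toy stand-ins for real embeddings:

```python
import numpy as np

def dgi(q_emb, r_emb, mu_hat):
    """DGI: cosine of the question-to-response displacement against
    the reference direction mu_hat (assumed unit-length)."""
    delta = np.asarray(r_emb) - np.asarray(q_emb)
    return float(np.dot(delta / np.linalg.norm(delta), mu_hat))

mu_hat = np.array([0.0, 1.0, 0.0])  # hypothetical reference direction
q = np.array([1.0, 0.0, 0.0])
r = np.array([0.5, 0.8, 0.0])
print(dgi(q, r, mu_hat) > 0.30)  # True: displacement aligns with mu_hat
```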

Providers and integrations

| Component | Install extra | Description |
|---|---|---|
| OpenAI | openai | Wraps the openai SDK with automatic scoring |
| Anthropic | anthropic | Wraps the anthropic SDK with automatic scoring |
| Google | google | Wraps google-generativeai with automatic scoring |
| LangChain | langchain | Evaluator + callback handler |
| CrewAI | crewai | Tool for agent pipelines |
| Semantic Kernel | semantic-kernel | Function-calling filter |
| AutoGen | autogen | Agent chat checker |

Domain calibration

Generic DGI uses a bundled reference direction that achieves AUROC ~0.76. For production use, calibrate with 20-100 verified question-response pairs from your domain:

from factlens import calibrate

result = calibrate(csv_path="my_domain_pairs.csv")
print(f"Concentration: {result.concentration:.2f}")
result.save("calibration.json")

Domain-specific calibration typically reaches AUROC 0.90-0.99. The confabulation benchmark (arXiv:2603.13259) reports DGI AUROC 0.958 with domain calibration.
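One plausible reading of how such a reference direction is calibrated (an assumption for illustration, not factlens's verified internals): average the unit question-to-response displacements of the verified pairs, and take the mean's norm as the concentration.

```python
import numpy as np

def calibrate_direction(question_embs, response_embs):
    """Mean of unit question-to-response displacements.
    Returns (mu_hat, concentration): mu_hat is the unit reference
    direction; concentration in (0, 1] measures how tightly the
    displacements cluster (1.0 = perfectly aligned)."""
    deltas = np.asarray(response_embs) - np.asarray(question_embs)
    units = deltas / np.linalg.norm(deltas, axis=1, keepdims=True)
    mean = units.mean(axis=0)
    concentration = float(np.linalg.norm(mean))
    return mean / concentration, concentration

# Toy 3-D stand-ins for verified grounded pairs
qs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
rs = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
mu_hat, conc = calibrate_direction(qs, rs)
print(conc)  # 1.0 here: both displacements point the same way
```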

Research

factlens implements the methods described in three research papers:

  1. Semantic Grounding Index (SGI): Marin, J. (2025). Semantic Grounding Index for LLM Hallucination Detection. arXiv:2512.13771

  2. Directional Grounding Index (DGI): Marin, J. (2026). A Geometric Taxonomy of Hallucinations in Large Language Models. arXiv:2602.13224

  3. Confabulation Benchmark: Marin, J. (2026). Rotational Dynamics of Factual Constraint Processing in Large Language Models. arXiv:2603.13259

Contributing

See CONTRIBUTING.md for development setup, code standards, and PR process.

License

MIT -- Javier Marin (javier@jmarin.info)
