Cartograph hallucination risk across an LLM's knowledge space

🗺️ hallucimap


hallucimap doesn't just ask "did the model hallucinate?" It builds a persistent danger map showing where a model confabulates across the domains, time periods, and entity types you probe.


  science/physics      ████████░░  0.82  ◄ high risk zone
  history/wwii         █████░░░░░  0.53
  medicine/anatomy     ███░░░░░░░  0.31
  finance/markets      ██░░░░░░░░  0.19
  factual/mathematics  █░░░░░░░░░  0.08  ◄ well-calibrated
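
The bars in the chart appear to be the risk score rounded to the nearest tenth and drawn on a ten-character track. A minimal sketch of that rendering follows; `risk_bar` is a hypothetical helper for illustration, not part of hallucimap's API.

```python
def risk_bar(score: float, width: int = 10) -> str:
    """Render a risk score in [0, 1] as a fixed-width block bar."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    filled = round(score * width)
    return "█" * filled + "░" * (width - filled)


# Illustrative scores taken from the chart.
cells = {"science/physics": 0.82, "history/wwii": 0.53, "factual/mathematics": 0.08}
for name, score in sorted(cells.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name:<20} {risk_bar(score)}  {score:.2f}")
```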

🧭 What is this?

Most hallucination tools check a single output: one prompt, one verdict. That tells you almost nothing about the model's systematic failure modes.

hallucimap takes a different approach:

  1. Probe the model across hundreds of questions spanning domains, time periods, and entity types
  2. Score each answer using consistency sampling: ask the same question N times and measure how much the model contradicts itself
  3. Map the scores into a persistent RiskAtlas, a 2-D grid of hallucination risk per knowledge cell
  4. Visualize the atlas as an interactive heatmap so you can instantly see the danger zones

The result is a reusable, persistent fingerprint of where a model is unreliable, not just on today's test set but structurally across its knowledge space.
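
To make the mapping step concrete, here is a minimal sketch of how per-question risk scores could be grouped into per-cell values and ranked. It assumes a plain mean per (domain, subdomain) cell; the sample rows and the `build_grid` and `hottest` helpers are hypothetical and only illustrate the idea, not hallucimap's internals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-question results: (domain, subdomain, risk_score)
scored = [
    ("science", "physics", 0.9),
    ("science", "physics", 0.7),
    ("history", "wwii", 0.5),
    ("factual", "mathematics", 0.1),
]


def build_grid(rows):
    """Group per-question risk scores into per-cell means."""
    buckets = defaultdict(list)
    for domain, subdomain, risk in rows:
        buckets[(domain, subdomain)].append(risk)
    return {cell: mean(risks) for cell, risks in buckets.items()}


def hottest(grid, n=2):
    """Return the n cells with the highest mean risk."""
    return sorted(grid.items(), key=lambda kv: kv[1], reverse=True)[:n]


grid = build_grid(scored)
for (domain, subdomain), risk in hottest(grid):
    print(f"{domain}/{subdomain}: {risk:.2f}")
```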


✨ Features

| Feature | Description |
| --- | --- |
| Consistency Sampling | Ask the same question N times at temperature > 0. Low agreement = high risk. |
| Factual Grounding | Cross-check answers against known references to catch confident confabulation. |
| Persistent RiskAtlas | JSON-serializable danger map that accumulates across multiple scan sessions. |
| Multi-Model | OpenAI (GPT-4o), Anthropic (Claude 3.5+), or any local HuggingFace model. |
| Async-First | Fully non-blocking: scans run concurrently with tunable parallelism. |
| Interactive Heatmap | Plotly HTML output. Hover for domain, risk score, confidence, and sample count. |
| Incremental Scans | Load an existing atlas and extend it, probing only what you haven't mapped yet. |
| CLI | hallucimap scan and hallucimap show, batteries included. |
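
"Tunable parallelism" for async scans is commonly implemented with a semaphore that caps the number of in-flight requests. The sketch below shows that general pattern with the standard library; `run_bounded` and `fake_probe` are hypothetical names, and this is not necessarily how hallucimap implements its `concurrency` setting.

```python
import asyncio


async def run_bounded(coros, concurrency: int = 10):
    """Await all coroutines, at most `concurrency` at a time; results keep input order."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))


async def fake_probe(i: int) -> str:
    # Stand-in for one model call; a real adapter would hit an API here.
    await asyncio.sleep(0.01)
    return f"answer-{i}"


async def demo():
    return await run_bounded([fake_probe(i) for i in range(25)], concurrency=5)


answers = asyncio.run(demo())
print(len(answers), answers[0])
```

Lowering `concurrency` trades throughput for gentler pressure on provider rate limits.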

🚀 Quickstart

Install

pip install hallucimap

Scan a model

# OpenAI
export OPENAI_API_KEY=sk-...
hallucimap scan --model gpt-4o --domains science,history,medicine --samples 5

# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
hallucimap scan --model claude-3-5-sonnet-20241022 --domains law,finance --samples 5

Visualize the danger map

# Open interactive heatmap in browser
hallucimap show atlas_gpt-4o.json --browser

# Save as standalone HTML
hallucimap show atlas_gpt-4o.json --save map.html

Print a summary table

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                     Risk Atlas: gpt-4o                      ┃
┡━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━┩
│ Domain        │ Subdomain     │  Risk │ Confidence │ Samples│
├───────────────┼───────────────┼───────┼────────────┼────────┤
│ science       │ physics       │ 0.821 │       0.91 │     25 │
│ temporal      │ post_cutoff   │ 0.764 │       0.88 │     20 │
│ entity        │ person        │ 0.612 │       0.85 │     15 │
│ history       │ wwii          │ 0.534 │       0.83 │     15 │
│ medicine      │ pharmacology  │ 0.487 │       0.82 │     15 │
└───────────────┴───────────────┴───────┴────────────┴────────┘
RiskAtlas(model=gpt-4o, cells=24, mean_risk=0.371, sessions=2)

🐍 Python API

import asyncio
from hallucimap import RiskAtlas, HallucinationScorer
from hallucimap.models import AnthropicAdapter
from hallucimap.probes import DomainProbe, EntityProbe, TemporalProbe
from hallucimap.viz import HeatmapRenderer

async def main():
    # 1. Set up adapter + scorer
    adapter = AnthropicAdapter(model="claude-3-5-sonnet-20241022")
    scorer  = HallucinationScorer(adapter=adapter, n_samples=5, temperature=0.9)
    atlas   = RiskAtlas(model_id="claude-3-5-sonnet-20241022")

    # 2. Run probes across multiple domains
    probes = [
        DomainProbe(domain="science"),
        DomainProbe(domain="medicine"),
        EntityProbe(entity_type="person"),
        TemporalProbe(cutoff_year=2024),
    ]
    for probe in probes:
        results  = await probe.run_all(adapter, concurrency=10)
        questions = [(r.question, r.domain, r.subdomain) for r in results]
        references = [r.reference for r in results]
        scored   = await scorer.score_batch(questions, references=references)
        atlas.update(scored)

    # 3. Inspect the danger map
    print(atlas.summary())
    for cell in atlas.hottest_cells(n=5):
        print(f"  {cell.domain}/{cell.subdomain}  risk={cell.risk_score:.3f}")

    # 4. Persist + visualize
    atlas.save("atlas.json")
    HeatmapRenderer(atlas).save("atlas.html")   # standalone interactive HTML

asyncio.run(main())

🏗️ How It Works

┌─────────────┐    questions     ┌───────────────┐   N completions  ┌─────────────────────┐
│   Probes    │ ───────────────► │  LLM Adapter  │ ───────────────► │ HallucinationScorer │
│             │                  │  (async+retry)│                  │                     │
│ • Temporal  │                  └───────────────┘                  │ consistency score   │
│ • Entity    │                                                     │ + grounding score   │
│ • Domain    │                                                     │ → risk_score [0,1]  │
│ • Factual   │                                                     └──────────┬──────────┘
└─────────────┘                                                                │
                                                                               ▼
                                                                    ┌─────────────────────┐
                                                                    │      RiskAtlas      │
                                                                    │                     │
                                                                    │  domain/subdomain   │
                                                                    │  → AtlasCell        │
                                                                    │  (risk, confidence, │
                                                                    │   sample_count)     │
                                                                    └──────────┬──────────┘
                                                                               │
                                                             ┌─────────────────┴──────────────┐
                                                             │                                │
                                                             ▼                                ▼
                                                  ┌──────────────────┐           ┌──────────────────┐
                                                  │  atlas.save()    │           │  HeatmapRenderer │
                                                  │  atlas.json      │           │  → atlas.html    │
                                                  │  (incremental)   │           │  (Plotly, hover) │
                                                  └──────────────────┘           └──────────────────┘
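
The diagram shows each atlas cell carrying a risk value and a sample count, and the atlas being saved incrementally. One plausible way to accumulate results across sessions is a sample-weighted running mean, sketched below. The `Cell` class and `update` helper are hypothetical stand-ins for AtlasCell and RiskAtlas, the merge rule is an assumption, and the confidence field is left out because the source does not say how it is computed.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for an atlas cell: mean risk plus how many samples back it."""
    risk: float = 0.0
    sample_count: int = 0

    def merge(self, risk: float, n: int) -> None:
        """Fold in n new samples whose mean risk is `risk` (sample-weighted running mean)."""
        if n <= 0:
            raise ValueError("n must be positive")
        total = self.sample_count + n
        self.risk = (self.risk * self.sample_count + risk * n) / total
        self.sample_count = total


atlas: dict[str, Cell] = {}


def update(key: str, risk: float, n: int) -> None:
    """Create the cell on first sight, then merge the new evidence into it."""
    atlas.setdefault(key, Cell()).merge(risk, n)


update("science/physics", 0.9, 10)  # session 1
update("science/physics", 0.7, 30)  # session 2 extends the same cell

# Round-trip through JSON, as a persistent atlas would on save and load.
payload = json.dumps({k: asdict(c) for k, c in atlas.items()})
restored = {k: Cell(**v) for k, v in json.loads(payload).items()}
print(restored["science/physics"])
```

Weighting by sample count keeps a small follow-up scan from overwriting what a larger earlier scan established.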

Scoring algorithm

The risk score for any (question, domain, subdomain) is:

consistency  =  mean pairwise similarity across N samples
grounding    =  token-F1 between response and reference answer (if known)

risk_score   =  α × (1 − consistency)  +  β × (1 − grounding)
             where α = 0.7, β = 0.3  (grounding term omitted if no reference)

Phase 2 will replace the token-F1 heuristic with sentence-transformer embeddings (all-MiniLM-L6-v2) for semantic consistency scoring.
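
As a worked illustration of the formula, the sketch below computes a risk score from a list of sampled answers. It makes several assumptions not stated in the source: whitespace tokenization, token-F1 as the pairwise similarity, and a grounding score averaged over all samples. The function names are hypothetical and do not mirror HallucinationScorer's real methods.

```python
from collections import Counter
from itertools import combinations
from typing import List, Optional

ALPHA, BETA = 0.7, 0.3  # weights from the scoring formula


def token_f1(a: str, b: str) -> float:
    """Token-level F1 overlap between two strings (whitespace tokens, lowercased)."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return float(ta == tb)
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(ta), overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def consistency(samples: List[str]) -> float:
    """Mean pairwise similarity across samples (token-F1 assumed as the similarity)."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0  # a single sample cannot contradict itself
    return sum(token_f1(a, b) for a, b in pairs) / len(pairs)


def risk_score(samples: List[str], reference: Optional[str] = None) -> float:
    """alpha * (1 - consistency) + beta * (1 - grounding); grounding term skipped without a reference."""
    if not samples:
        raise ValueError("need at least one sample")
    risk = ALPHA * (1 - consistency(samples))
    if reference is not None:
        # Assumption: grounding is averaged over all samples.
        grounding = sum(token_f1(s, reference) for s in samples) / len(samples)
        risk += BETA * (1 - grounding)
    return risk


print(risk_score(["Paris", "Paris", "Paris"], reference="Paris"))
print(risk_score(["1867", "1867", "1859"]))
```

Note that without a reference the score tops out at α = 0.7 under this reading, since the grounding term is dropped rather than renormalized.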


🔬 Probe Types

FactualProbe: calibration baseline

Tests unambiguous facts with known answers (capitals, atomic numbers, historical dates). A well-functioning model should score near 0 here; elevated scores flag systemic issues.

from hallucimap.probes import FactualProbe
probe = FactualProbe()
# → "What is the chemical symbol for water?"  ref: "H2O"
# → "How many sides does a hexagon have?"     ref: "6"

EntityProbe: named entity knowledge

Probes biographical facts, founding dates, and key attributes of people, organizations, and places. This is a classic hallucination flashpoint where models invent plausible-but-wrong details.

from hallucimap.probes import EntityProbe
probe = EntityProbe(entity_type="person")
# → "What year was Marie Curie born?"        ref: "1867"
# → "What university did Einstein attend?"   ref: "ETH Zurich"

DomainProbe: deep domain knowledge

Targets specialized fields where overconfident confabulation is dangerous: biomedical, legal, financial, and scientific knowledge.

from hallucimap.probes import DomainProbe
probe = DomainProbe(domain="medicine", subdomain="pharmacology")
# → "What is the antidote for acetaminophen overdose?"  ref: "N-acetylcysteine"
# → "What are SSRIs used to treat?"                     ref: "depression, anxiety"

TemporalProbe: post-cutoff events

Tests knowledge of events after the model's training cutoff. A well-calibrated model should hedge; a hallucinating one will invent confident but fabricated details.

from hallucimap.probes import TemporalProbe
probe = TemporalProbe(cutoff_year=2024, target_years=[2024, 2025])
# → "Who won the Nobel Prize in Physics in 2025?"
# → "What major AI models were released in 2025?"

🤖 Supported Models

| Provider | Models | Adapter |
| --- | --- | --- |
| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | OpenAIAdapter |
| Anthropic | claude-3-5-sonnet-20241022, claude-opus-4-6, claude-3-5-haiku-20241022 | AnthropicAdapter |
| HuggingFace | Any local causal LM (Llama, Mistral, Phi…) | HFAdapter |

All adapters share the same async interface โ€” swap models by changing one line.

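# Import path for OpenAIAdapter and HFAdapter assumed to match AnthropicAdapter's
from hallucimap.models import AnthropicAdapter, HFAdapter, OpenAIAdapter

# Swap providers by changing one line (pick one); nothing else changes
adapter = OpenAIAdapter(model="gpt-4o")
adapter = AnthropicAdapter(model="claude-3-5-sonnet-20241022")
adapter = HFAdapter(model="meta-llama/Meta-Llama-3-8B-Instruct", device="cuda")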
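
A shared async interface usually means an abstract base class with one coroutine that each provider implements. The sketch below shows that shape together with an offline stand-in for tests. `BaseAdapter`, `complete`, `sample`, and `EchoAdapter` are hypothetical names chosen for illustration; the real BaseLLMAdapter and MockAdapter may look different.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class BaseAdapter(ABC):
    """Hypothetical shape of a shared adapter interface; method names are illustrative."""

    @abstractmethod
    async def complete(self, prompt: str, temperature: float = 0.9) -> str:
        """Return one completion for the prompt."""

    async def sample(self, prompt: str, n: int, temperature: float = 0.9) -> List[str]:
        """Draw n completions concurrently, e.g. for consistency scoring."""
        results = await asyncio.gather(
            *(self.complete(prompt, temperature) for _ in range(n))
        )
        return list(results)


class EchoAdapter(BaseAdapter):
    """Offline stand-in, useful for tests: always returns a canned answer."""

    def __init__(self, answer: str) -> None:
        self.answer = answer

    async def complete(self, prompt: str, temperature: float = 0.9) -> str:
        return self.answer


samples = asyncio.run(EchoAdapter("H2O").sample("Chemical formula for water?", n=3))
print(samples)
```

Because callers depend only on the base class, swapping providers stays a one-line change.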

📁 Project Structure

hallucimap/
├── src/hallucimap/
│   ├── core/
│   │   ├── atlas.py        ← RiskAtlas: load / update / save / query
│   │   ├── scorer.py       ← HallucinationScorer: consistency + grounding
│   │   └── topology.py     ← KnowledgeTopology: 2-D PCA/UMAP projection
│   ├── probes/
│   │   ├── base.py         ← BaseProbe (abstract)
│   │   ├── temporal.py     ← post-cutoff date facts
│   │   ├── entity.py       ← named entities (people, orgs, places)
│   │   ├── domain.py       ← domain knowledge (bio, law, finance)
│   │   └── factual.py      ← verifiable factual claims
│   ├── models/
│   │   ├── base.py         ← BaseLLMAdapter (abstract)
│   │   ├── openai_adapter.py
│   │   ├── anthropic_adapter.py
│   │   └── hf_adapter.py
│   ├── viz/
│   │   └── heatmap.py      ← Plotly interactive heatmap renderer
│   ├── testing.py          ← MockAdapter for downstream tests
│   └── cli.py              ← hallucimap scan / hallucimap show
├── tests/                  ← 53 tests, 63% coverage
├── examples/
│   ├── scan_gpt4o.py
│   └── scan_claude.py
└── .github/workflows/ci.yml

🛠️ Development

git clone https://github.com/advait27/hallucimap.git
cd hallucimap
pip install -e ".[dev]"

# Lint
ruff check src tests

# Type-check
mypy src

# Tests with coverage
pytest

# Full CI check (lint + types + tests)
ruff check src tests && mypy src && pytest

🗓️ Roadmap

| Phase | Status | Description |
| --- | --- | --- |
| 0: Scaffold | ✅ Done | Package structure, Pydantic models, adapter stubs, CLI skeleton |
| 1: Adapters | ✅ Done | OpenAI, Anthropic, HuggingFace adapters with retry logic |
| 2: Scorer | 🔧 Next | Embedding-based consistency via all-MiniLM-L6-v2 |
| 3: Topology | ⏳ Planned | UMAP projection of knowledge space; semantic clustering |
| 4: Probe Datasets | ⏳ Planned | TriviaQA, Wikidata, curated post-cutoff corpora |
| 5: Heatmap v2 | ⏳ Planned | Topology-aware heatmap overlay; cluster annotations |
| 6: CLI + Docs | ⏳ Planned | Rich progress bars, hallucimap summary, hosted docs |
| 7: PyPI | ⏳ Planned | Publish to PyPI; versioned releases |

📄 License

MIT © Advait Dharmadhikari. See LICENSE for details.
