hprobe

Hallucination neuron discovery and causal validation for transformer LLMs

Discover and causally validate hallucination-associated FFN neurons (H-Neurons) in transformer LLMs.

Based on arXiv:2512.01797.

Install

pip install hprobe
# or
uv add hprobe

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
from hprobe import HProbe

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# samples: list of dicts with question, options, answer
probe = HProbe(model, tokenizer)
probe.fit(samples, options_key="choices", answer_key="answer")

print(probe.n_neurons_, probe.layer_distribution_)

results = probe.score()
print(f"AUROC {results['auroc']:.3f}  gap {results['auroc_gap']:+.3f}")

probe.causal_validate()
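The samples passed to fit() above are MCQ records whose option and answer fields match the options_key/answer_key arguments. A minimal sketch of that shape, using stdlib only (the example questions are illustrative, not from any bundled dataset):

```python
# Illustrative MCQ samples in the shape fit() consumes: a question,
# a list of answer options under "choices", and the correct answer
# under "answer" — matching options_key="choices", answer_key="answer".
samples = [
    {
        "question": "Which organ produces insulin?",
        "choices": ["Liver", "Pancreas", "Kidney", "Spleen"],
        "answer": "Pancreas",
    },
    {
        "question": "What is normal human body temperature in Celsius?",
        "choices": ["35.0", "36.0", "37.0", "39.0"],
        "answer": "37.0",
    },
]

# Sanity check: every answer must be one of its own options.
for s in samples:
    assert s["answer"] in s["choices"]
```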

CLI

# Fit and score on an MCQ dataset
hprobe run --model google/gemma-3-4b-it --data dataset.jsonl --samples 500

# Transfer: score a saved probe on a different model
hprobe transfer --probe results/probe --model google/gemma-3-4b --data dataset.jsonl

# Fit from pre-generated responses with judge labels
hprobe responses --model google/gemma-3-4b-it --data responses.jsonl

Supported formats

Input files: .jsonl, .json, .parquet

Auto-detected dataset formats: mmlu, medqa, medmcqa. Any other format works by passing options_key and answer_key directly.
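For a custom format, each record only needs question text plus the two fields you name via options_key and answer_key. A sketch of writing and reading such a .jsonl file (the key names my_options and my_answer are made up for illustration):

```python
import json

# One JSON object per line; only the key names need to match what you
# later pass as options_key / answer_key.
records = [
    {"question": "2 + 2 = ?", "my_options": ["3", "4", "5"], "my_answer": "4"},
    {"question": "Capital of France?", "my_options": ["Paris", "Rome"], "my_answer": "Paris"},
]

with open("dataset.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# Read it back the way a JSONL loader would.
with open("dataset.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```

You would then fit with options_key="my_options", answer_key="my_answer".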

Key options

Parameter         Default  Description
l1_C              0.01     Inverse L1 regularization strength; lower values keep fewer neurons
contrastive       True     3-vs-1 labeling at the generated answer token
layer_stride      1        Probe every Nth layer (2 = faster, coarser coverage)
validation_split  0.2      Holdout fraction used for scoring
max_tokens        1024     Input truncation length
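Two of these knobs are easy to picture: layer_stride subsamples which transformer layers get probed, and l1_C controls L1 sparsity, where shrinking C applies a thresholding pressure of roughly 1/C that zeroes out more weights. An illustrative stdlib-only sketch of both effects (this is a caricature, not the library's implementation):

```python
# layer_stride: which layers get probed out of, say, a 32-layer model.
n_layers = 32
layer_stride = 2
probed = list(range(0, n_layers, layer_stride))  # 16 layers instead of 32

# l1_C: model the L1 penalty as a soft threshold of strength 1/C,
# so smaller C discards more weights (fewer selected neurons).
def surviving(weights, C):
    thresh = 1.0 / C
    return [w for w in weights if abs(w) > thresh]

weights = [0.5, 3.0, 120.0, 7.5, 0.02]
few = len(surviving(weights, C=0.01))  # threshold 100 -> 1 weight survives
more = len(surviving(weights, C=1.0))  # threshold 1   -> 3 weights survive
```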

Save & load

probe.save("results/gemma_medqa")          # writes .json + .pkl
probe = HProbe.load("results/gemma_medqa", model, tokenizer)
probe.score_on(new_samples, options_key="choices", answer_key="answer")
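The AUROC that score() and score_on() report can be reproduced from per-sample probe scores and hallucination labels via the rank (Mann-Whitney) formulation. A minimal stdlib sketch with toy data (the helper name is ours, not the library's):

```python
def auroc(scores, labels):
    """AUROC = probability a positive (hallucinated, label 1) sample
    scores higher than a negative (label 0) one, with ties counted 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy probe scores: hallucinated samples tend to score higher.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
labels = [1,   1,   0,   0,   1,   0]
auroc(scores, labels)  # 1.0: every positive outranks every negative
```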
