Hallucination neuron discovery and causal validation for transformer LLMs

hprobes

Discover and causally validate hallucination-associated FFN neurons (H-Neurons) in transformer LLMs.

Based on arXiv:2512.01797.

Install

pip install hprobes
# or
uv add hprobes

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
from hprobes import HProbe

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

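# samples must already be defined: a list of dicts, each holding a question,
# its answer options (under "choices" here), and the correct answer (under "answer")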
probe = HProbe(model, tokenizer)
probe.fit(samples, options_key="choices", answer_key="answer")

print(probe.n_neurons_, probe.layer_distribution_)

results = probe.score()
print(f"AUROC {results['auroc']:.3f}  gap {results['auroc_gap']:+.3f}")

probe.causal_validate()
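
For readers unfamiliar with the AUROC figure printed in the quickstart, here is a minimal standard-library sketch of the metric itself. It is not hprobes' scoring code: the function name `auroc` and the toy labels and scores are illustrative assumptions, and which class hprobes treats as positive is not stated in this README.

```python
def auroc(labels, scores):
    """Probability that a random positive example outscores a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 marks the positive class, scores are probe outputs.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]
print(f"AUROC {auroc(labels, scores):.3f}")  # AUROC 0.833
```

A value of 0.5 means the scores separate the two classes no better than chance, and 1.0 means every positive example outranks every negative one.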

CLI

# Fit and score on an MCQ dataset
hprobes run --model google/gemma-3-4b-it --data dataset.jsonl --samples 500

# Transfer: score a saved probe on a different model
hprobes transfer --probe results/probe --model google/gemma-3-4b --data dataset.jsonl

# Fit from pre-generated responses with judge labels
hprobes responses --model google/gemma-3-4b-it --data responses.jsonl

Supported formats

Input files: .jsonl, .json, .parquet

Auto-detected dataset formats: mmlu, medqa, medmcqa. Any other format works by passing options_key and answer_key directly.
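
As an illustration of a custom format, the sketch below writes and reads back a small .jsonl file whose rows use the `choices` and `answer` keys from the quickstart. The `question` key name, the use of an integer index for `answer`, and the helper names `write_jsonl` and `read_jsonl` are assumptions made for this example, not something this README specifies.

```python
import json
import os
import tempfile

# Hypothetical rows: key names other than "choices" and "answer" are assumed.
samples = [
    {"question": "Which organ produces insulin?",
     "choices": ["Liver", "Pancreas", "Kidney", "Spleen"],
     "answer": 1},
    {"question": "How many chambers does the human heart have?",
     "choices": ["Two", "Three", "Four", "Five"],
     "answer": 2},
]


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")
write_jsonl(path, samples)
loaded = read_jsonl(path)
print(len(loaded), sorted(loaded[0].keys()))
```

A list loaded this way could then be passed to `probe.fit(loaded, options_key="choices", answer_key="answer")` as in the quickstart.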

Key options

| Parameter | Default | Description |
| --- | --- | --- |
| l1_C | 0.01 | Inverse L1 strength; lower = fewer neurons |
| contrastive | True | 3-vs-1 labeling at the generated answer token |
| layer_stride | 1 | Sample every Nth layer (2 = faster) |
| validation_split | 0.2 | Holdout fraction for scoring |
| max_tokens | 1024 | Truncation length |
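
To show why a lower `l1_C` yields fewer neurons, here is a small standard-library sketch of L1-penalised logistic regression fitted by proximal gradient descent on synthetic data. It is not hprobes' fitting code. It assumes that `l1_C` follows the scikit-learn convention, where `C` is the inverse of the regularisation strength, and the random features only stand in for neuron activations.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logreg(X, y, C, lr=0.1, steps=200):
    """Minimise mean log-loss + ||w||_1 / (C * n) by proximal gradient descent."""
    n, d = len(X), len(X[0])
    lam = 1.0 / (C * n)  # smaller C means a larger L1 penalty
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


# Synthetic stand-in for activations: only features 0 and 1 carry signal.
rng = random.Random(0)
n_samples, n_features = 150, 12
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [1 if row[0] + row[1] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0 for row in X]

selected = {}
for C in (0.01, 0.1, 1.0):
    weights = fit_l1_logreg(X, y, C)
    selected[C] = sum(1 for wj in weights if wj != 0.0)
    print(f"C={C}: {selected[C]} non-zero weights")
```

The shrinkage step sets weak weights to exactly zero, so tightening the penalty (lowering `C`) leaves fewer features with a non-zero weight. Under the assumption above, those surviving features play the role of the selected neurons.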

Save & load

probe.save("results/gemma_medqa")          # writes .json + .pkl
probe = HProbe.load("results/gemma_medqa", model, tokenizer)
probe.score_on(new_samples, options_key="choices", answer_key="answer")

Acknowledgements

This research is conducted in collaboration with the Great Ormond Street Hospital DRIVE Unit.

Contributors

  • Huseyin Cavus — Core Contributor
  • Dr. Pavithra Rajendran — Machine Learning Lead, GOSH DRIVE
  • Sebin Sabu — Senior AI Scientist, GOSH DRIVE
  • Jaskaran Singh Kawatra — ML Engineer, GOSH DRIVE
