Load pre-trained lie-detection probes.

Project description

lie_detectors

Load the pre-trained lie-detection probes published at ai-safety-institute/lie-detection.

Install

uv add lie-detectors        # or: pip install lie-detectors

Usage

import torch
from lie_detectors import get_probe

# Downloads the repo's default checkpoint (probe.pt = the best sweep result).
probe = get_probe("ai-safety-institute/dyl-qwen-qwen3.5-122b-a10b-fp8")

# Which model layer the probe reads activations from.
# Note: not every probe reads activations; e.g. the Unrelated Questions probe is trained on logprobs instead.
print(probe.layer)  # e.g. 28

# `activations` must be supplied by the caller: residual-stream activations of shape
# (..., d_model), taken from the layer the probe was trained on (see probe.layer).
scores = probe(activations)        # higher score => more likely deceptive
flagged = scores > probe.threshold # threshold is calibrated to ~1% FPR
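
To make the "calibrated to ~1% FPR" comment concrete, here is a minimal sketch of how such a threshold could be derived from scores on known-honest examples. It is not the library's calibration code: `threshold_at_fpr` and `false_positive_rate` are hypothetical helpers, the scores are made up, and plain Python floats stand in for probe outputs.

```python
import math


def threshold_at_fpr(honest_scores, target_fpr=0.01):
    """Pick a threshold so that at most `target_fpr` of honest scores exceed it.

    Hypothetical helper for illustration; not part of lie_detectors.
    """
    if not honest_scores:
        raise ValueError("need at least one honest score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(honest_scores)
    n = len(ordered)
    # Number of honest examples we are allowed to flag. The small epsilon
    # guards against float round-off in target_fpr * n.
    allowed = min(math.floor(target_fpr * n + 1e-9), n - 1)
    # At most `allowed` scores are strictly greater than this value.
    return ordered[n - allowed - 1]


def false_positive_rate(honest_scores, threshold):
    """Fraction of honest scores that `score > threshold` would flag."""
    return sum(s > threshold for s in honest_scores) / len(honest_scores)


# Made-up honest scores, for illustration only.
honest = [i / 100 for i in range(100)]
threshold = threshold_at_fpr(honest, target_fpr=0.01)
print(threshold, false_positive_rate(honest, threshold))
```

The comparison is strict (`>`) to match `scores > probe.threshold` above, so tied scores at the threshold are not flagged.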

Pick a specific checkpoint from the hyperparameter sweep with filename= (see each repo's sweep.json for the full list and their metrics):

probe = get_probe(
    "ai-safety-institute/dyl-qwen-qwen3.5-122b-a10b-fp8",
    filename="l_40_ar_mlp_wd_0_001_lr_0_0001_ep_100.pt",
)
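
As a rough sketch of choosing a checkpoint by metric: the schema of sweep.json is not documented here, so the code below assumes it parses to a list of entries with a `filename` key and a numeric metric key. Both key names, the `best_checkpoint` helper, and the example values are hypothetical; check the actual file before relying on this.

```python
def best_checkpoint(sweep_entries, metric="auroc"):
    """Return the filename of the entry with the highest `metric`.

    Hypothetical helper: assumes each entry is a dict with "filename" and
    `metric` keys, which may not match the real sweep.json layout.
    """
    scored = [e for e in sweep_entries if "filename" in e and metric in e]
    if not scored:
        raise ValueError(f"no entries with both 'filename' and {metric!r}")
    return max(scored, key=lambda e: e[metric])["filename"]


# Made-up entries standing in for a parsed sweep.json.
example_entries = [
    {"filename": "checkpoint_a.pt", "auroc": 0.91},
    {"filename": "checkpoint_b.pt", "auroc": 0.95},
    {"filename": "checkpoint_c.pt"},  # skipped: no metric recorded
]
chosen = best_checkpoint(example_entries)
print(chosen)
```

The returned name could then be passed as `filename=` to `get_probe`.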

Download files

Source Distribution

lie_detectors-0.1.0.tar.gz (8.6 kB view details)

Uploaded Source

Built Distribution

lie_detectors-0.1.0-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file lie_detectors-0.1.0.tar.gz.

File metadata

  • Download URL: lie_detectors-0.1.0.tar.gz
  • Upload date:
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lie_detectors-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       eaf638b9e16ff51c770a341354f26d9040c963c943a395af46a4913f181357ac
MD5          ddb679bb521982d2c0d7651d2ed528c2
BLAKE2b-256  161628b1f1fbc3fbc7a4ce94336c92b6ada58aa2ec3181bf20733bc54f66c9fa
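
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` and `matches_digest` are hypothetical helper names, and the local path in the commented usage line is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest listed above for lie_detectors-0.1.0.tar.gz.
EXPECTED_SHA256 = "eaf638b9e16ff51c770a341354f26d9040c963c943a395af46a4913f181357ac"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex=EXPECTED_SHA256):
    """True if the file's SHA256 equals `expected_hex` (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


# Usage, assuming the sdist was saved to the current directory:
# print(matches_digest("lie_detectors-0.1.0.tar.gz"))
```

The same helpers work for the wheel by passing its own SHA256 digest as `expected_hex`.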

Provenance

The following attestation bundles were made for lie_detectors-0.1.0.tar.gz:

Publisher: release.yaml on UKGovernmentBEIS/lie_detectors

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lie_detectors-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lie_detectors-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lie_detectors-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       7e97c2f97e35315479a85ad93d1bbd05ba09838c0aa3b68df3bdd801a0d3a175
MD5          dd37075b80f8f25943387606ffcf2224
BLAKE2b-256  f9730363af9b63659be21110e4f5704870cc76a7b01da0d3600f464945b25692

Provenance

The following attestation bundles were made for lie_detectors-0.1.0-py3-none-any.whl:

Publisher: release.yaml on UKGovernmentBEIS/lie_detectors
