Load pre-trained lie-detection probes.
Project description
lie_detectors
Load the pre-trained lie-detection probes published at ai-safety-institute/lie-detection.
Install
uv add lie-detectors # or: pip install lie-detectors
Usage
import torch
from lie_detectors import get_probe
# Downloads the repo's default checkpoint (probe.pt = the best sweep result).
probe = get_probe("ai-safety-institute/dyl-qwen-qwen3.5-122b-a10b-fp8")
# The model layer whose activations the probe reads.
# Note: some probes (e.g. the Unrelated Questions probe) are trained on logprobs instead.
print(probe.layer)  # e.g. 28
# `activations` are residual-stream activations of shape (..., d_model) for the layer the
# probe was trained on.
scores = probe(activations) # higher score => more likely deceptive
flagged = scores > probe.threshold # threshold is calibrated to ~1% FPR
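The usage above says only that `probe.threshold` is calibrated to roughly 1% FPR; it does not say how. As a rough illustration of what such a calibration can look like, the sketch below picks a threshold from scores on known-honest examples so that at most 1% of them are flagged. The function names and the synthetic Gaussian scores are assumptions made for this example and are not part of `lie_detectors`.

```python
import math
import random


def calibrate_threshold(honest_scores, target_fpr=0.01):
    """Return a threshold that flags at most `target_fpr` of `honest_scores`.

    Hypothetical helper, not part of lie_detectors. It assumes the same
    convention as the usage example: an input is flagged when score > threshold.
    """
    if not honest_scores:
        raise ValueError("need at least one honest score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(honest_scores)
    n = len(ordered)
    # Number of honest examples we are allowed to flag by mistake.
    allowed = math.floor(target_fpr * n)
    # Take the (allowed + 1)-th largest score as the threshold, so that at most
    # `allowed` scores are strictly greater than it.
    return ordered[n - allowed - 1]


def false_positive_rate(honest_scores, threshold):
    """Fraction of honest examples whose score exceeds the threshold."""
    return sum(s > threshold for s in honest_scores) / len(honest_scores)


# Synthetic stand-in for probe scores on honest examples (assumption: the real
# score distribution is unknown here, so a standard normal is used).
rng = random.Random(0)
honest = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

threshold = calibrate_threshold(honest, target_fpr=0.01)
fpr = false_positive_rate(honest, threshold)
```

A threshold chosen this way only bounds the false positive rate on data that resembles the calibration set, so the rate on a different distribution of honest inputs may differ.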
Pick a specific checkpoint from the hyperparameter sweep with filename= (see each repo's
sweep.json for the full list and their metrics):
probe = get_probe(
    "ai-safety-institute/dyl-qwen-qwen3.5-122b-a10b-fp8",
    filename="l_40_ar_mlp_wd_0_001_lr_0_0001_ep_100.pt",
)
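The checkpoint filename appears to encode the sweep hyperparameters. The sketch below decodes such a name into a dictionary, assuming that `l` means layer, `ar` architecture, `wd` weight decay, `lr` learning rate, and `ep` epochs, and that an underscore between two digit groups stands for a decimal point. None of this is confirmed by the source, so treat each repo's sweep.json as the authoritative description; `parse_checkpoint_name` is a hypothetical helper, not part of `lie_detectors`.

```python
# ASSUMPTION: these key meanings are guesses based on the example filename.
ASSUMED_KEYS = {
    "l": "layer",
    "ar": "architecture",
    "wd": "weight_decay",
    "lr": "learning_rate",
    "ep": "epochs",
}


def _decode_value(tokens):
    """Turn value tokens into an int, a float, or a string."""
    if all(t.isdigit() for t in tokens):
        if len(tokens) == 1:
            return int(tokens[0])
        if len(tokens) == 2:
            # ASSUMPTION: "0_001" stands for 0.001.
            return float(f"{tokens[0]}.{tokens[1]}")
    return "_".join(tokens)


def parse_checkpoint_name(filename):
    """Decode a name like 'l_40_ar_mlp_..._ep_100.pt' into a dict."""
    tokens = filename.removesuffix(".pt").split("_")
    parsed = {}
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if key not in ASSUMED_KEYS:
            raise ValueError(f"unexpected token {key!r} in {filename!r}")
        i += 1
        start = i
        # Consume value tokens until the next recognised key.
        while i < len(tokens) and tokens[i] not in ASSUMED_KEYS:
            i += 1
        parsed[ASSUMED_KEYS[key]] = _decode_value(tokens[start:i])
    return parsed


config = parse_checkpoint_name("l_40_ar_mlp_wd_0_001_lr_0_0001_ep_100.pt")
```

This can help when comparing several checkpoints by name, but it is no substitute for the metrics recorded in sweep.json.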
File details
Details for the file lie_detectors-0.1.0.tar.gz.
File metadata
- Download URL: lie_detectors-0.1.0.tar.gz
- Upload date:
- Size: 8.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eaf638b9e16ff51c770a341354f26d9040c963c943a395af46a4913f181357ac |
| MD5 | ddb679bb521982d2c0d7651d2ed528c2 |
| BLAKE2b-256 | 161628b1f1fbc3fbc7a4ce94336c92b6ada58aa2ec3181bf20733bc54f66c9fa |
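A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper names are made up for this example, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest listed above for lie_detectors-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "eaf638b9e16ff51c770a341354f26d9040c963c943a395af46a4913f181357ac"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_download("lie_detectors-0.1.0.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an intact copy of the source distribution saved in the current directory.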
Provenance
The following attestation bundles were made for lie_detectors-0.1.0.tar.gz:
Publisher: release.yaml on UKGovernmentBEIS/lie_detectors
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lie_detectors-0.1.0.tar.gz
- Subject digest: eaf638b9e16ff51c770a341354f26d9040c963c943a395af46a4913f181357ac
- Sigstore transparency entry: 1721586508
- Permalink: UKGovernmentBEIS/lie_detectors@88043087688308bc9cefe7cbdbf9b3806a679149
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/UKGovernmentBEIS
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@88043087688308bc9cefe7cbdbf9b3806a679149
- Trigger Event: release
File details
Details for the file lie_detectors-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lie_detectors-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e97c2f97e35315479a85ad93d1bbd05ba09838c0aa3b68df3bdd801a0d3a175 |
| MD5 | dd37075b80f8f25943387606ffcf2224 |
| BLAKE2b-256 | f9730363af9b63659be21110e4f5704870cc76a7b01da0d3600f464945b25692 |
Provenance
The following attestation bundles were made for lie_detectors-0.1.0-py3-none-any.whl:
Publisher: release.yaml on UKGovernmentBEIS/lie_detectors
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lie_detectors-0.1.0-py3-none-any.whl
- Subject digest: 7e97c2f97e35315479a85ad93d1bbd05ba09838c0aa3b68df3bdd801a0d3a175
- Sigstore transparency entry: 1721586628
- Permalink: UKGovernmentBEIS/lie_detectors@88043087688308bc9cefe7cbdbf9b3806a679149
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/UKGovernmentBEIS
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@88043087688308bc9cefe7cbdbf9b3806a679149
- Trigger Event: release