InterpreLens: A Lens for Interpreting Transformer-Based Large Language Models
inteprelens
inteprelens is the extracted core package for transformer architecture analysis.
It keeps the Causal Head Gating (CHG) workflow and adds tracing utilities for
named transformer stages and internal logits.
Overview
Use inteprelens when you want to:
- train CHG masks over attention heads
- inspect necessary, sufficient, and facilitating heads
- trace intermediate transformer states such as attention output projection inputs, block outputs, final norm states, and per-layer logits
- score final-token logits, log-probabilities, and probabilities for calibration-style analysis
- run gradient-based attribution through facilitating head circuits
This repo is the reusable core package only. It does not include downstream task pipelines or visualization workflows.
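Conceptually, head gating attaches a scalar gate to every attention head and multiplies that head's output by it. A gate of 0 ablates the head and a gate of 1 leaves it untouched. The toy function below illustrates only that idea on plain Python lists. The name `gated_head_sum` and the list-based shapes are assumptions for illustration, not part of the inteprelens API, and the sketch omits how CHG actually learns the gates.

```python
from typing import List

# Illustrative sketch only (assumption): not inteprelens code.
# Each head contributes an output vector; its gate g in [0, 1] scales that
# contribution before the per-head outputs are summed.
Vector = List[float]


def gated_head_sum(head_outputs: List[Vector], gates: List[float]) -> Vector:
    """Return sum over heads h of gates[h] * head_outputs[h]."""
    if len(head_outputs) != len(gates):
        raise ValueError("need exactly one gate per head")
    if any(not 0.0 <= g <= 1.0 for g in gates):
        raise ValueError("gates must lie in [0, 1]")
    dim = len(head_outputs[0])
    total = [0.0] * dim
    for out, g in zip(head_outputs, gates):
        for i in range(dim):
            total[i] += g * out[i]
    return total


heads = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(gated_head_sum(heads, [1.0, 1.0, 1.0]))  # [4.0, 5.0]  (all heads on)
print(gated_head_sum(heads, [1.0, 0.0, 0.5]))  # [2.5, 1.5]  (head 1 ablated, head 2 halved)
```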
Installation
Install the runtime package:
pip install inteprelens
For local development:
uv sync --group dev
uv run pytest -q
For build and publish checks:
uv sync --group publish
uv run --group publish python -m build
uv run --group publish twine check dist/*
If you use a gated Hugging Face model such as meta-llama/Llama-3.2-1B, make sure
HF_TOKEN is set in your environment or in a .env file.
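Before loading a gated model, a small pre-flight check can save a failed download. The sketch below looks for HF_TOKEN in the environment first and then in a simple .env file. The helper name `find_hf_token` is hypothetical, the parser handles only plain `KEY=VALUE` lines (optionally prefixed with `export`), and it does not reflect how inteprelens itself loads .env files, which this README does not describe.

```python
import os
from pathlib import Path
from typing import Optional


def find_hf_token(env_file: str = ".env") -> Optional[str]:
    """Return HF_TOKEN from the environment, else from a minimal .env parse.

    Hypothetical helper: skips blank lines and comments, accepts an optional
    'export ' prefix, and strips surrounding quotes. Not a full dotenv parser.
    """
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().removeprefix("export ").strip()
        if key == "HF_TOKEN":
            return value.strip().strip('"').strip("'") or None
    return None


if __name__ == "__main__":
    if find_hf_token() is None:
        print("HF_TOKEN not found; downloading gated models will fail.")
```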
Quick Start
from inteprelens import LensAPI
analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")
# Very small settings keep this example fast; increase them for a real analysis.
results = analyzer.fit(
    texts=["What is the capital of France?"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)
print(results.summary())
print(results.necessary_heads().head())
Usage Examples
CHG analysis
from inteprelens import LensAPI
analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")
results = analyzer.fit(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    targets=[
        "Paris",
        "4",
    ],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)
necessary = results.necessary_heads()
taxonomy = results.head_taxonomy()
print(necessary.head())
print(taxonomy.head())
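The README does not say exactly what num_masks controls or how necessary_heads() and head_taxonomy() combine several fitted masks. If you fit more than one mask and export per-head gate values yourself, one simple way to compare them is to rank heads by their mean gate and report the spread across masks. The sketch below does that on hypothetical data; `summarize_masks` and the `{(layer, head): gate}` layout are assumptions for illustration, not the library's classification rule.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical layout: masks[m][(layer, head)] is the gate in [0, 1] from mask m.
Head = Tuple[int, int]


def summarize_masks(masks: List[Dict[Head, float]]) -> List[Tuple[Head, float, float]]:
    """Return (head, mean gate, population std of gate), highest mean first."""
    if not masks:
        raise ValueError("need at least one mask")
    heads = set(masks[0])
    if any(set(m) != heads for m in masks):
        raise ValueError("all masks must cover the same heads")
    rows = []
    for head in heads:
        values = [m[head] for m in masks]
        rows.append((head, mean(values), pstdev(values)))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


masks = [
    {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.6},
    {(0, 0): 0.8, (0, 1): 0.3, (1, 0): 0.4},
]
for head, avg, spread in summarize_masks(masks):
    print(head, round(avg, 2), round(spread, 2))
```

A large spread flags heads whose gate depends on the particular fit, which is worth checking before reading much into a single mask.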
Trace transformer stages and logits
from inteprelens import LensAPI
analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")
trace = analyzer.trace(
    texts="Paris is the capital of",
    layers=[0],
    sites=["attn_o_proj_pre", "final_norm", "logits"],
)
print(trace.get("attn_o_proj_pre", 0).shape)
print(trace.get("logits", 0).shape)
print(trace.final_logits.shape)
trace.final_logits contains the model's final output logits for the traced batch.
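A common way to obtain per-layer logits, often called the logit lens, is to take a layer's hidden state, apply the model's final normalization, and multiply by the unembedding matrix. Llama-style models use RMSNorm for that final norm. The pure-Python toy below shows the computation for a single position; whether the "logits" site in inteprelens computes exactly this is an assumption, and `rms_norm` and `lens_logits` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per vocabulary entry, one column per hidden dim


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm: x / sqrt(mean(x^2) + eps), scaled elementwise by weight."""
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def lens_logits(hidden: Vector, norm_weight: Vector, unembed: Matrix) -> Vector:
    """Apply the final norm, then the unembedding, to one intermediate hidden state."""
    normed = rms_norm(hidden, norm_weight)
    return [sum(w * h for w, h in zip(row, normed)) for row in unembed]


# Toy sizes: hidden dim 2, vocabulary of 3 tokens.
hidden_at_layer = [3.0, 4.0]
norm_weight = [1.0, 1.0]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
logits = lens_logits(hidden_at_layer, norm_weight, unembed)
print(logits)
print(max(range(len(logits)), key=logits.__getitem__))  # 1
```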
Score final-token logits, log-probabilities, and probabilities
from inteprelens import LensAPI
analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")
scores = analyzer.score(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    temperature=1.0,
)
print(scores.logits.shape)
print(scores.log_probs.shape)
print(scores.probs.shape)
final_logits = scores.final_token_logits()
final_log_probs = scores.final_token_log_probs()
final_probs = scores.final_token_probs()
print(final_logits.shape)
print(final_log_probs.shape)
print(final_probs.shape)
Use logits as the canonical calibration output, use log_probs for numerically
stable token scoring, and use probs when you need confidence-style metrics.
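The usual relationship between the three outputs is log_probs = log_softmax(logits / temperature) and probs = exp(log_probs). The README does not state how score() applies temperature, so treat that formula as an assumption. The sketch below shows why log-probabilities are the stable choice: subtracting the maximum before exponentiating (the log-sum-exp trick) keeps very large logits from overflowing.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # log(sum(exp(s))) = m + log(sum(exp(s - m))), so every exp() argument is <= 0.
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Probabilities derived from the stable log-probabilities."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]


# A naive math.exp(1000.0) raises OverflowError; the stable version does not.
final_token_logits = [1000.0, 999.0, 990.0]
print(log_softmax(final_token_logits))
print(softmax(final_token_logits, temperature=2.0))  # higher temperature, flatter distribution
```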
Gradient-based attribution through facilitating heads
from inteprelens import CausalCircuitAttribution, LensAPI
analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")
results = analyzer.fit(
    texts=["The capital of France is"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)
facilitating_mask = results.get_facilitating_mask()
attribution = CausalCircuitAttribution(analyzer.model, analyzer.tokenizer)
sentence_scores = attribution.compute_sentence_importance(
    document="Paris is the capital of France. It is one of Europe's largest cities.",
    sentences=[
        "Paris is the capital of France.",
        "It is one of Europe's largest cities.",
    ],
    summary="Paris is the capital of France.",
    facilitating_mask=facilitating_mask,
)
print(sentence_scores)
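The README does not describe how compute_sentence_importance turns gradients into sentence scores. A common gradient-based recipe is gradient-times-input, summed over the tokens that belong to each sentence. The toy below shows that recipe on a scalar function of one feature per token, using central finite differences instead of autograd so it has no dependencies. It is not the inteprelens implementation, it ignores the facilitating mask, and real attributions work on embedding vectors (a dot product per token) rather than scalars; `toy_score` and the helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Score = Callable[[List[float]], float]


def finite_diff_grad(f: Score, x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central-difference estimate of df/dx_i, standing in for autograd."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def sentence_importance(f: Score, x: Sequence[float], spans: Dict[str, range]) -> Dict[str, float]:
    """Gradient-times-input, summed over the token positions of each sentence."""
    grads = finite_diff_grad(f, x)
    return {name: sum(grads[i] * x[i] for i in idx) for name, idx in spans.items()}


def toy_score(x: List[float]) -> float:
    """Hypothetical summary score: strong dependence on tokens 0-1, weak on 2-3."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.1 * x[2] + 0.1 * x[3]


features = [1.0, 1.0, 1.0, 1.0]
spans = {
    "Paris is the capital of France.": range(0, 2),
    "It is one of Europe's largest cities.": range(2, 4),
}
print(sentence_importance(toy_score, features, spans))  # roughly 3.0 and 0.2
```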
Public API
- LensAPI: high-level interface for CHG fitting, token scoring, and named-site tracing
- TransformerTracer: lower-level tracer for direct transformer-site collection
- CausalCircuitAttribution: gradient attribution through facilitating CHG heads
- CHGDataset: helper for building CHG-ready datasets from prompt/target pairs
Acknowledgements
inteprelens builds on and adapts code and ideas from the
Causal Head Gating project.
The extracted core package keeps CHG support and extends it with transformer-stage
tracing and internal-logit inspection for architecture analysis workflows.
License
This project is released under the MIT License. See LICENSE.
Download files
File details
Details for the file inteprelens-0.1.0.tar.gz.
File metadata
- Download URL: inteprelens-0.1.0.tar.gz
- Upload date:
- Size: 53.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 590b0b34f7846130f3737e7a8826ac130bfa9e8e7168aedf80ebe7dfe2391399 |
| MD5 | cd7be8ec6f16d51678fb2aabc2c0ef7e |
| BLAKE2b-256 | 3606a777c48588e568728d43dde05842c9dfeed619b9c08264f80ca9c15eb29e |
Provenance
The following attestation bundles were made for inteprelens-0.1.0.tar.gz:
Publisher: publish.yml on Aisuko/inteprelens
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: inteprelens-0.1.0.tar.gz
- Subject digest: 590b0b34f7846130f3737e7a8826ac130bfa9e8e7168aedf80ebe7dfe2391399
- Sigstore transparency entry: 1065751160
- Sigstore integration time:
- Permalink: Aisuko/inteprelens@b8a802257d2bcd17c690da7121de32a11f806043
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Aisuko
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b8a802257d2bcd17c690da7121de32a11f806043
- Trigger Event: release
File details
Details for the file inteprelens-0.1.0-py3-none-any.whl.
File metadata
- Download URL: inteprelens-0.1.0-py3-none-any.whl
- Upload date:
- Size: 59.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb41bee98cf3aac6c7e2033ce6954d4bc48712ae9c78246c9c9d18ae9a40b4db |
| MD5 | 650cbd8bb2211ad4f4c896f18c65b14c |
| BLAKE2b-256 | daed511453e2c0df8e8344af91484a550ebbdd7ad12c47512a13d7e7854c2dd2 |
Provenance
The following attestation bundles were made for inteprelens-0.1.0-py3-none-any.whl:
Publisher: publish.yml on Aisuko/inteprelens
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: inteprelens-0.1.0-py3-none-any.whl
- Subject digest: fb41bee98cf3aac6c7e2033ce6954d4bc48712ae9c78246c9c9d18ae9a40b4db
- Sigstore transparency entry: 1065751164
- Sigstore integration time:
- Permalink: Aisuko/inteprelens@b8a802257d2bcd17c690da7121de32a11f806043
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Aisuko
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b8a802257d2bcd17c690da7121de32a11f806043
- Trigger Event: release