InterpreLens: a lens for interpreting large language models built on the Transformer architecture.
inteprelens
inteprelens is the extracted core package for transformer architecture analysis.
It keeps the Causal Head Gating (CHG) workflow and adds tracing utilities for
named transformer stages and internal logits.
Overview
Use inteprelens when you want to:
- train CHG masks over attention heads
- inspect necessary, sufficient, and facilitating heads
- trace intermediate transformer states such as attention output projection inputs, block outputs, final norm states, and per-layer logits
- score final-token logits, log-probabilities, and probabilities for calibration-style analysis
- run gradient-based attribution through facilitating head circuits
- probe gradients and kwargs at any layer via context-manager hook utilities
This repo is the reusable core package only. It does not include downstream task pipelines or visualization workflows.
Installation
Install the runtime package:

```bash
pip install inteprelens
```

For local development:

```bash
uv sync --group dev
uv run pytest -q
```

For build and publish checks:

```bash
uv sync --group publish
uv run --group publish python -m build
uv run --group publish twine check dist/*
```
If you use a gated Hugging Face model such as meta-llama/Llama-3.2-1B, make sure
HF_TOKEN is available in your environment or .env.
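
Before loading a gated model, it can help to confirm that a token is actually visible. The following is a minimal pre-flight sketch and not part of inteprelens: `find_hf_token` is a hypothetical helper that checks the process environment first and then a plain `KEY=VALUE` file. It assumes a simple `.env` layout (no `export` prefixes, no multi-line values).

```python
import os
from pathlib import Path
from typing import Optional


def find_hf_token(env_file: str = ".env") -> Optional[str]:
    """Return HF_TOKEN from the environment, else from a simple KEY=VALUE file."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "HF_TOKEN":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

Calling `find_hf_token()` returns the token string when one is found and `None` otherwise, so a script can fail early with a clear message instead of a download error.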
Quick Start
```python
from inteprelens import LensAPI

# Load the model through the high-level API.
analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

# Fit CHG masks on a single prompt/target pair.
results = analyzer.fit(
    texts=["What is the capital of France?"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

# Inspect the fitted results.
print(results.summary())
print(results.necessary_heads().head())
```
Usage Examples
CHG analysis
```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    targets=[
        "Paris",
        "4",
    ],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

necessary = results.necessary_heads()
taxonomy = results.head_taxonomy()
print(necessary.head())
print(taxonomy.head())
```
Trace transformer stages and logits
```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

# Capture three named sites at layer 0.
trace = analyzer.trace(
    texts="Paris is the capital of",
    layers=[0],
    sites=["attn_o_proj_pre", "final_norm", "logits"],
)

print(trace.get("attn_o_proj_pre", 0).shape)  # input to the attention output projection
print(trace.get("logits", 0).shape)           # per-layer logits at layer 0
print(trace.final_logits.shape)
```
trace.final_logits contains the model's final output logits for the traced batch.
Score final-token logits, log-probabilities, and probabilities
```python
from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

scores = analyzer.score(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    temperature=1.0,
)

print(scores.logits.shape)
print(scores.log_probs.shape)
print(scores.probs.shape)

final_logits = scores.final_token_logits()
final_log_probs = scores.final_token_log_probs()
final_probs = scores.final_token_probs()
print(final_logits.shape)
print(final_log_probs.shape)
print(final_probs.shape)
```
Use logits as the canonical calibration output, derive log_probs for stable
token scoring, and use probs when you need confidence-style metrics.
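
The relationship between the three outputs can be sketched in plain Python. This is an illustration of the standard temperature-scaled log-softmax, not a copy of what `analyzer.score` does internally; the `log_softmax` helper and the toy logits are assumptions made for the example.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax using the log-sum-exp trick."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


logits = [2.0, 1.0, 0.1]                    # toy final-token logits, 3-token vocabulary
log_probs = log_softmax(logits)             # stable for scoring and for summing over tokens
probs = [math.exp(lp) for lp in log_probs]  # confidence-style values that sum to 1
```

Working in log space first avoids the underflow that comes from multiplying many small probabilities, which is why log-probabilities are the safer quantity for token scoring.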
Gradient-based attribution through facilitating heads
```python
from inteprelens import CausalCircuitAttribution, LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["The capital of France is"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

facilitating_mask = results.get_facilitating_mask()
attribution = CausalCircuitAttribution(analyzer.model, analyzer.tokenizer)

sentence_scores = attribution.compute_sentence_importance(
    document="Paris is the capital of France. It is one of Europe's largest cities.",
    sentences=[
        "Paris is the capital of France.",
        "It is one of Europe's largest cities.",
    ],
    summary="Paris is the capital of France.",
    facilitating_mask=facilitating_mask,
)
print(sentence_scores)
```
Hook utilities
All hooks are context managers, so their handles are removed automatically on exit.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from inteprelens.models.adapters import get_adapter
from inteprelens import capture_gradients, capture_kwargs_hook, tensor_grad_hook

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.bfloat16
).cuda()
adapter = get_adapter(model)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

inputs = tokenizer("The Eiffel Tower is in", return_tensors="pt").to("cuda")
target_id = tokenizer(" Paris", add_special_tokens=False)["input_ids"][0]

# 1. capture_gradients: grad_output flowing through attention layer 0
with capture_gradients(adapter.get_attention_module(0), site="output") as grads:
    loss = -model(**inputs).logits[0, -1, target_id]
    loss.backward()
print(grads[0][0].shape)  # [batch, seq, hidden]

# 2. capture_kwargs_hook: inspect attention_mask kwargs at layer 0
with capture_kwargs_hook(adapter.get_layers()[0]) as kw:
    with torch.no_grad():
        model(**inputs)
print(kw[0].get("attention_mask"))

# 3. tensor_grad_hook: gradient probe on MLP activation
act_ref = []
h = adapter.get_mlp(0).register_forward_hook(lambda m, i, o: act_ref.append(o))
out = model(**inputs)
h.remove()
with tensor_grad_hook(lambda: act_ref[0]) as grads:
    loss = -out.logits[0, -1, target_id]
    loss.backward()
print(grads[0].shape)  # [batch, seq, hidden]
```
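
The automatic cleanup that these context managers provide follows a common pattern: register the hook, yield the capture buffer, and remove the handle in a `finally` clause. The sketch below shows that pattern without PyTorch. `ToyModule`, `ToyHandle`, and `capture_outputs` are hypothetical stand-ins written for this example and are not inteprelens or PyTorch classes; how the package implements its own hooks is not shown here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class ToyHandle:
    """Stand-in for the removable handle returned when a hook is registered."""

    def __init__(self, hooks: List[Callable], fn: Callable) -> None:
        self._hooks, self._fn = hooks, fn

    def remove(self) -> None:
        if self._fn in self._hooks:
            self._hooks.remove(self._fn)


class ToyModule:
    """Stand-in for a module that calls its forward hooks after each call."""

    def __init__(self) -> None:
        self.hooks: List[Callable] = []

    def register_forward_hook(self, fn: Callable) -> ToyHandle:
        self.hooks.append(fn)
        return ToyHandle(self.hooks, fn)

    def __call__(self, x: float) -> float:
        out = 2 * x
        for fn in list(self.hooks):
            fn(self, x, out)
        return out


@contextmanager
def capture_outputs(module: ToyModule) -> Iterator[List[float]]:
    """Collect forward outputs inside the block; always remove the hook afterwards."""
    captured: List[float] = []
    handle = module.register_forward_hook(lambda m, i, o: captured.append(o))
    try:
        yield captured
    finally:
        handle.remove()  # runs on normal exit and when the block raises


layer = ToyModule()
with capture_outputs(layer) as outs:
    layer(1.5)
layer(4.0)  # not captured: the hook was removed when the block exited
```

Putting the removal in `finally` means a failed forward or backward pass cannot leave a stale hook attached to the model.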
Public API
Classes
| Class | Description |
|---|---|
| LensAPI | High-level interface: CHG fitting, token scoring, named-site tracing. |
| TransformerTracer | Lower-level tracer for direct transformer-site activation capture. |
| CausalCircuitAttribution | Gradient attribution through facilitating CHG heads. |
| CHGDataset | Helper for building CHG-ready datasets from prompt/target pairs. |
Hook utilities
All functions are context managers exported from inteprelens directly.
| Function | PyTorch API | Use case |
|---|---|---|
| capture_gradients | register_full_backward_hook | Capture grad_input / grad_output per module during backward |
| capture_kwargs_hook | register_forward_pre_hook(with_kwargs=True) | Inspect/modify kwargs (attention_mask, etc.) before forward (≥ PyTorch 2.0) |
| tensor_grad_hook | Tensor.register_hook | Gradient probe on any activation tensor |
| autograd_node_hook | autograd.graph.Node.register_hook | Low-level autograd graph traversal during backward (≥ PyTorch 2.0) |
| capture_backward_pre | register_module_full_backward_pre_hook | Capture grad_output entering a module before backward executes (≥ PyTorch 2.1) |
Acknowledgements
inteprelens builds on and adapts code and ideas from the
Causal Head Gating project.
The extracted core package keeps CHG support and extends it with transformer-stage
tracing and internal-logit inspection for architecture analysis workflows.
License
This project is released under the MIT License. See LICENSE.
File details
Details for the file inteprelens-0.2.1.tar.gz.
File metadata
- Download URL: inteprelens-0.2.1.tar.gz
- Upload date:
- Size: 59.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c77be51fa4fca0033637f9d9f4ead1f22fccd1e994a6c4f14355e15fdcbf372f |
| MD5 | 5998a20c3fbf4f768fc3b28e2fcdf5d0 |
| BLAKE2b-256 | 320fe86a3c79de65f85de205344305011ccd6b03ef0164ee56cd58c47a8ced28 |
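
A downloaded file can be checked against the SHA256 digest listed above with the standard library alone. This is a minimal sketch; `sha256_of` and `matches_digest` are hypothetical helper names introduced for the example, and the local file path is whatever location the archive was saved to.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_digest("inteprelens-0.2.1.tar.gz", "c77be51fa4fca0033637f9d9f4ead1f22fccd1e994a6c4f14355e15fdcbf372f")` should return `True` for an intact copy of the source distribution.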
Provenance
The following attestation bundles were made for inteprelens-0.2.1.tar.gz:
Publisher: publish.yml on Aisuko/inteprelens
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: inteprelens-0.2.1.tar.gz
  - Subject digest: c77be51fa4fca0033637f9d9f4ead1f22fccd1e994a6c4f14355e15fdcbf372f
- Sigstore transparency entry: 1238660294
- Sigstore integration time:
- Permalink: Aisuko/inteprelens@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/Aisuko
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Trigger Event: release
File details
Details for the file inteprelens-0.2.1-py3-none-any.whl.
File metadata
- Download URL: inteprelens-0.2.1-py3-none-any.whl
- Upload date:
- Size: 64.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9abb800e5f40aaa794c3b056a75eb52e8984a6cfe5b6bfa7f3748e795f10a35 |
| MD5 | f931c7f2601663839918961830c5a260 |
| BLAKE2b-256 | dcb0c1e8fe4f42ccc7501ff0309dcd3588c3b56222d59717795a858f90278eb8 |
Provenance
The following attestation bundles were made for inteprelens-0.2.1-py3-none-any.whl:
Publisher: publish.yml on Aisuko/inteprelens
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: inteprelens-0.2.1-py3-none-any.whl
  - Subject digest: c9abb800e5f40aaa794c3b056a75eb52e8984a6cfe5b6bfa7f3748e795f10a35
- Sigstore transparency entry: 1238660302
- Sigstore integration time:
- Permalink: Aisuko/inteprelens@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/Aisuko
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Trigger Event: release