InterpreLens: A lens for interpreting large language models built on the Transformer architecture.

inteprelens

inteprelens is the extracted core package for transformer architecture analysis. It keeps the Causal Head Gating (CHG) workflow and adds tracing utilities for named transformer stages and internal logits.

Overview

Use inteprelens when you want to:

- train CHG masks over attention heads
- inspect necessary, sufficient, and facilitating heads
- trace intermediate transformer states such as attention output projection inputs, block outputs, final norm states, and per-layer logits
- score final-token logits, log-probabilities, and probabilities for calibration-style analysis
- run gradient-based attribution through facilitating head circuits
- probe gradients and kwargs at any layer via context-manager hook utilities

This repo is the reusable core package only. It does not include downstream task pipelines or visualization workflows.

Installation

Install the runtime package:

pip install inteprelens

For local development:

uv sync --group dev
uv run pytest -q

For build and publish checks:

uv sync --group publish
uv run --group publish python -m build
uv run --group publish twine check dist/*

If you use a gated Hugging Face model such as meta-llama/Llama-3.2-1B, make sure HF_TOKEN is available in your environment or .env.
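
A quick preflight check can confirm the token is visible before loading a gated model. The sketch below is a generic standard-library helper, not part of inteprelens: `find_hf_token` is a hypothetical name, and the simple `KEY=VALUE` parsing is an assumption about how the `.env` file is laid out.

```python
import os
from pathlib import Path
from typing import Optional


def find_hf_token(env_path: str = ".env") -> Optional[str]:
    """Return HF_TOKEN from the environment, else from a KEY=VALUE .env file."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments; tolerate an optional "export " prefix.
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep and key.strip() == "HF_TOKEN":
            # Strip surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    print("HF_TOKEN found" if find_hf_token() else "HF_TOKEN missing")
```

Running this from the project root before `LensAPI.from_pretrained(...)` gives an early, readable failure instead of an authentication error partway through a model download.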

Quick Start

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["What is the capital of France?"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

print(results.summary())
print(results.necessary_heads().head())

Usage Examples

CHG analysis

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    targets=[
        "Paris",
        "4",
    ],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

necessary = results.necessary_heads()
taxonomy = results.head_taxonomy()

print(necessary.head())
print(taxonomy.head())

Trace transformer stages and logits

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

trace = analyzer.trace(
    texts="Paris is the capital of",
    layers=[0],
    sites=["attn_o_proj_pre", "final_norm", "logits"],
)

print(trace.get("attn_o_proj_pre", 0).shape)
print(trace.get("logits", 0).shape)
print(trace.final_logits.shape)

trace.final_logits contains the model's final output logits for the traced batch.
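
This README does not spell out how per-layer logits are computed. One common approach, often called the logit lens, applies the final norm and the unembedding matrix to an intermediate hidden state. The pure-Python sketch below illustrates that idea on toy numbers; `rms_norm`, `logit_lens`, and the choice of RMSNorm are assumptions for illustration, not inteprelens internals.

```python
import math
from typing import List


def rms_norm(hidden: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: rescale the vector to unit root-mean-square, then apply a gain."""
    rms = math.sqrt(sum(h * h for h in hidden) / len(hidden) + eps)
    return [w * h / rms for h, w in zip(hidden, weight)]


def logit_lens(hidden: List[float], norm_weight: List[float],
               unembed: List[List[float]]) -> List[float]:
    """Project one hidden state to vocabulary logits.

    unembed has shape [vocab][hidden]; row v is the output embedding of token v.
    """
    normed = rms_norm(hidden, norm_weight)
    return [sum(w * x for w, x in zip(row, normed)) for row in unembed]


# Toy example: hidden size 2, vocabulary of 3 tokens.
hidden_layer0 = [3.0, 4.0]
norm_weight = [1.0, 1.0]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

logits = logit_lens(hidden_layer0, norm_weight, unembed)
top_token = max(range(len(logits)), key=logits.__getitem__)
print(logits, top_token)
```

Comparing the top token of such a projection across layers is one way to read a trace: it shows roughly where in the stack the model's prediction starts to resemble its final answer.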

Score final-token logits, log-probabilities, and probabilities

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

scores = analyzer.score(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    temperature=1.0,
)

print(scores.logits.shape)
print(scores.log_probs.shape)
print(scores.probs.shape)

final_logits = scores.final_token_logits()
final_log_probs = scores.final_token_log_probs()
final_probs = scores.final_token_probs()

print(final_logits.shape)
print(final_log_probs.shape)
print(final_probs.shape)

Use logits as the canonical calibration output, derive log_probs for stable token scoring, and use probs when you need confidence-style metrics.
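
The relationship between the three outputs can be sketched in plain Python. This is a generic temperature-scaled log-softmax, not the library's implementation; `log_softmax` and the toy logits are illustrative.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax using the log-sum-exp trick."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


final_token_logits = [2.0, 1.0, 0.1]          # toy vocabulary of 3 tokens
log_probs = log_softmax(final_token_logits)   # stable scores for ranking or summing
probs = [math.exp(lp) for lp in log_probs]    # confidence-style values in [0, 1]

print(round(sum(probs), 6))  # 1.0

# A higher temperature flattens the distribution, lowering the top probability.
hot = [math.exp(lp) for lp in log_softmax(final_token_logits, temperature=2.0)]
print(max(hot) < max(probs))  # True
```

Working in log space avoids underflow when many token scores are summed, which is why log-probabilities are usually preferred for sequence scoring and probabilities are reserved for reporting.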

Gradient-based attribution through facilitating heads

from inteprelens import CausalCircuitAttribution, LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["The capital of France is"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

facilitating_mask = results.get_facilitating_mask()
attribution = CausalCircuitAttribution(analyzer.model, analyzer.tokenizer)

sentence_scores = attribution.compute_sentence_importance(
    document="Paris is the capital of France. It is one of Europe's largest cities.",
    sentences=[
        "Paris is the capital of France.",
        "It is one of Europe's largest cities.",
    ],
    summary="Paris is the capital of France.",
    facilitating_mask=facilitating_mask,
)

print(sentence_scores)

Hook utilities

All hooks are context managers — handles are removed automatically on exit.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from inteprelens.models.adapters import get_adapter
from inteprelens import capture_gradients, capture_kwargs_hook, tensor_grad_hook

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.bfloat16
).cuda()
adapter  = get_adapter(model)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("The Eiffel Tower is in", return_tensors="pt").to("cuda")
target_id = tokenizer(" Paris", add_special_tokens=False)["input_ids"][0]

# 1. capture_gradients — grad_output flowing through attention layer 0
with capture_gradients(adapter.get_attention_module(0), site="output") as grads:
    loss = -model(**inputs).logits[0, -1, target_id]
    loss.backward()
print(grads[0][0].shape)   # [batch, seq, hidden]

# 2. capture_kwargs_hook — inspect attention_mask kwargs at layer 0
with capture_kwargs_hook(adapter.get_layers()[0]) as kw:
    with torch.no_grad():
        model(**inputs)
print(kw[0].get("attention_mask"))

# 3. tensor_grad_hook — gradient probe on MLP activation
act_ref = []
h = adapter.get_mlp(0).register_forward_hook(lambda m, i, o: act_ref.append(o))
out = model(**inputs)
h.remove()
with tensor_grad_hook(lambda: act_ref[0]) as grads:
    loss = -out.logits[0, -1, target_id]
    loss.backward()
print(grads[0].shape)   # [batch, seq, hidden]
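
The "removed automatically on exit" behaviour can be sketched with plain PyTorch. This is a minimal illustration and not the inteprelens source: `capture_grad_output` is a hypothetical name, and the sketch assumes the helper wraps `register_full_backward_hook`, as the Public API table states for `capture_gradients`.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def capture_grad_output(module: nn.Module):
    """Collect grad_output tuples for `module` during backward passes.

    Hypothetical stand-in for a capture_gradients-style helper.
    """
    captured = []

    def hook(mod, grad_input, grad_output):
        # Detach so stored tensors never keep an autograd graph alive.
        captured.append(
            tuple(g.detach() if g is not None else None for g in grad_output)
        )

    handle = module.register_full_backward_hook(hook)
    try:
        yield captured
    finally:
        handle.remove()  # runs even if the body raises


layer = nn.Linear(4, 3)
x = torch.randn(2, 4, requires_grad=True)

with capture_grad_output(layer) as grads:
    layer(x).sum().backward()

print(grads[0][0].shape)  # torch.Size([2, 3])
```

Putting `handle.remove()` in a `finally` clause is the design point: a failed forward or backward pass inside the `with` block cannot leave a stale hook attached to the model.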

Public API

Classes

Class	Description
`LensAPI`	High-level interface: CHG fitting, token scoring, named-site tracing.
`TransformerTracer`	Lower-level tracer for direct transformer-site activation capture.
`CausalCircuitAttribution`	Gradient attribution through facilitating CHG heads.
`CHGDataset`	Helper for building CHG-ready datasets from prompt/target pairs.

Hook utilities

All functions are context managers exported from inteprelens directly.

Function	PyTorch API	Use case
`capture_gradients`	`register_full_backward_hook`	Capture `grad_input` / `grad_output` per module during backward
`capture_kwargs_hook`	`register_forward_pre_hook(with_kwargs=True)`	Inspect/modify kwargs (`attention_mask`, etc.) before forward (≥ PyTorch 2.0)
`tensor_grad_hook`	`Tensor.register_hook`	Gradient probe on any activation tensor
`autograd_node_hook`	`autograd.graph.Node.register_hook`	Low-level autograd graph traversal during backward (≥ PyTorch 2.0)
`capture_backward_pre`	`register_module_full_backward_pre_hook`	Capture `grad_output` entering a module before backward executes (≥ PyTorch 2.1)
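
As an illustration of the `register_forward_pre_hook(with_kwargs=True)` row, the sketch below records the keyword arguments passed to a toy module. `ScaledLinear` and `capture_kwargs` are hypothetical names introduced for the example; the real `capture_kwargs_hook` may differ in what it yields.

```python
from contextlib import contextmanager

import torch
from torch import nn


class ScaledLinear(nn.Module):
    """Toy module whose forward takes a keyword argument."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 3)

    def forward(self, x, scale=1.0):
        return self.linear(x) * scale


@contextmanager
def capture_kwargs(module: nn.Module):
    """Record the kwargs of each forward call (hypothetical stand-in helper)."""
    seen = []

    def pre_hook(mod, args, kwargs):
        seen.append(dict(kwargs))  # copy so later mutation does not alter the record
        return None                # returning None leaves args and kwargs unchanged

    handle = module.register_forward_pre_hook(pre_hook, with_kwargs=True)
    try:
        yield seen
    finally:
        handle.remove()


model = ScaledLinear()
with capture_kwargs(model) as seen, torch.no_grad():
    model(torch.randn(2, 4), scale=2.0)

print(seen[0])  # {'scale': 2.0}
```

A pre-hook registered with `with_kwargs=True` may also return an `(args, kwargs)` tuple to modify the call, which is how a helper of this kind could rewrite an `attention_mask` before the layer runs.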

Acknowledgements

inteprelens builds on and adapts code and ideas from the Causal Head Gating project. The extracted core package keeps CHG support and extends it with transformer-stage tracing and internal-logit inspection for architecture analysis workflows.

License

This project is released under the MIT License. See LICENSE.

Project details

These details have been verified by PyPI

Maintainers

Aisuko

Release history

- 0.2.1 (this version), Apr 5, 2026
- 0.1.0, Mar 9, 2026

Download files

Source Distribution

inteprelens-0.2.1.tar.gz (59.6 kB)

Uploaded Apr 5, 2026 Source

Built Distribution

inteprelens-0.2.1-py3-none-any.whl (64.9 kB)

Uploaded Apr 5, 2026 Python 3

File details

Details for the file inteprelens-0.2.1.tar.gz.

File metadata

Download URL: inteprelens-0.2.1.tar.gz
Upload date: Apr 5, 2026
Size: 59.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inteprelens-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`c77be51fa4fca0033637f9d9f4ead1f22fccd1e994a6c4f14355e15fdcbf372f`
MD5	`5998a20c3fbf4f768fc3b28e2fcdf5d0`
BLAKE2b-256	`320fe86a3c79de65f85de205344305011ccd6b03ef0164ee56cd58c47a8ced28`
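
A downloaded archive can be checked against the published digests with the standard library. The sketch below is a generic verification helper; `file_digest`, `verify`, and the local file path are illustrative assumptions, while the SHA256 value is the one listed in the table.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for inteprelens-0.2.1.tar.gz.
EXPECTED_SHA256 = "c77be51fa4fca0033637f9d9f4ead1f22fccd1e994a6c4f14355e15fdcbf372f"


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True when the file's digest matches the published value."""
    return file_digest(path, algorithm) == expected_hex.lower()


if __name__ == "__main__":
    archive = Path("inteprelens-0.2.1.tar.gz")  # assumed local download location
    if archive.exists():
        print("sha256 ok:", verify(str(archive), EXPECTED_SHA256))
```

pip can enforce the same check at install time: with `--require-hashes` and a requirements file whose entries carry `--hash=sha256:...`, installation fails if a digest does not match.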

Provenance

The following attestation bundles were made for inteprelens-0.2.1.tar.gz:

Publisher: publish.yml on Aisuko/inteprelens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: inteprelens-0.2.1.tar.gz
- Subject digest: c77be51fa4fca0033637f9d9f4ead1f22fccd1e994a6c4f14355e15fdcbf372f
- Sigstore transparency entry: 1238660294
- Sigstore integration time: Apr 5, 2026
Source repository:
- Permalink: Aisuko/inteprelens@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/Aisuko
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Trigger Event: release

File details

Details for the file inteprelens-0.2.1-py3-none-any.whl.

File metadata

Download URL: inteprelens-0.2.1-py3-none-any.whl
Upload date: Apr 5, 2026
Size: 64.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inteprelens-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c9abb800e5f40aaa794c3b056a75eb52e8984a6cfe5b6bfa7f3748e795f10a35`
MD5	`f931c7f2601663839918961830c5a260`
BLAKE2b-256	`dcb0c1e8fe4f42ccc7501ff0309dcd3588c3b56222d59717795a858f90278eb8`

Provenance

The following attestation bundles were made for inteprelens-0.2.1-py3-none-any.whl:

Publisher: publish.yml on Aisuko/inteprelens

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: inteprelens-0.2.1-py3-none-any.whl
- Subject digest: c9abb800e5f40aaa794c3b056a75eb52e8984a6cfe5b6bfa7f3748e795f10a35
- Sigstore transparency entry: 1238660302
- Sigstore integration time: Apr 5, 2026
Source repository:
- Permalink: Aisuko/inteprelens@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/Aisuko
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6cc3533a2471497d05f661efc09cb85c1fa50bb
- Trigger Event: release
