InterpreLens: a lens for interpreting large language models based on the Transformer architecture.

inteprelens

inteprelens is the extracted core package for transformer architecture analysis. It keeps the Causal Head Gating (CHG) workflow and adds tracing utilities for named transformer stages and internal logits.

Overview

Use inteprelens when you want to:

  • train CHG masks over attention heads
  • inspect necessary, sufficient, and facilitating heads
  • trace intermediate transformer states such as attention output projection inputs, block outputs, final norm states, and per-layer logits
  • score final-token logits, log-probabilities, and probabilities for calibration-style analysis
  • run gradient-based attribution through facilitating head circuits

This repo is the reusable core package only. It does not include downstream task pipelines or visualization workflows.

Installation

Install the runtime package:

pip install inteprelens

For local development:

uv sync --group dev
uv run pytest -q

For build and publish checks:

uv sync --group publish
uv run --group publish python -m build
uv run --group publish twine check dist/*

If you use a gated Hugging Face model such as meta-llama/Llama-3.2-1B, make sure HF_TOKEN is set in your environment or in a .env file.
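Before loading a gated model, it can help to confirm that the token is actually visible to your process. The snippet below is a minimal standalone sketch, not part of inteprelens: the helper name find_hf_token is hypothetical, and it assumes a plain KEY=VALUE .env file in the working directory. How inteprelens itself reads .env is not described here.

```python
import os
from pathlib import Path


def find_hf_token(env_file=".env"):
    """Return HF_TOKEN from the environment, else from a simple KEY=VALUE file.

    Hypothetical helper for illustration; it is not an inteprelens function.
    """
    token = os.environ.get("HF_TOKEN")
    if token:
        return token

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "HF_TOKEN":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


print("HF_TOKEN found" if find_hf_token() else "HF_TOKEN missing")
```

The environment variable is checked first, so an exported HF_TOKEN takes precedence over the file in this sketch.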

Quick Start

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["What is the capital of France?"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

print(results.summary())
print(results.necessary_heads().head())

Usage Examples

CHG analysis

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    targets=[
        "Paris",
        "4",
    ],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

necessary = results.necessary_heads()
taxonomy = results.head_taxonomy()

print(necessary.head())
print(taxonomy.head())
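As rough intuition for what a mask over attention heads does, the toy sketch below scales each head's output vector by a gate value before the heads are concatenated. It is an assumption-laden illustration, not inteprelens's implementation: the function gate_heads, the gate range of 0.0 to 1.0, and the point at which the gate is applied are all hypothetical.

```python
def gate_heads(head_outputs, gates):
    """Scale each head's output vector by its gate and concatenate the result.

    head_outputs: one list of floats per attention head.
    gates: one float per head; 1.0 keeps the head, 0.0 ablates it.
    Hypothetical helper for illustration only.
    """
    if len(head_outputs) != len(gates):
        raise ValueError("need exactly one gate per head")

    gated = []
    for vec, gate in zip(head_outputs, gates):
        gated.extend(gate * x for x in vec)
    return gated


# Three toy heads with two dimensions each.
heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Keep head 0, ablate head 1, halve head 2.
print(gate_heads(heads, [1.0, 0.0, 0.5]))
# prints [1.0, 2.0, 0.0, 0.0, 2.5, 3.0]
```

Under this picture, setting a gate to 0.0 removes that head's contribution, so comparing model behavior across gate settings is one way to ask which heads a task depends on.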

Trace transformer stages and logits

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

trace = analyzer.trace(
    texts="Paris is the capital of",
    layers=[0],
    sites=["attn_o_proj_pre", "final_norm", "logits"],
)

print(trace.get("attn_o_proj_pre", 0).shape)
print(trace.get("logits", 0).shape)
print(trace.final_logits.shape)

trace.final_logits contains the model's final output logits for the traced batch.

Score final-token logits, log-probabilities, and probabilities

from inteprelens import LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

scores = analyzer.score(
    texts=[
        "The capital of France is",
        "2 + 2 equals",
    ],
    temperature=1.0,
)

print(scores.logits.shape)
print(scores.log_probs.shape)
print(scores.probs.shape)

final_logits = scores.final_token_logits()
final_log_probs = scores.final_token_log_probs()
final_probs = scores.final_token_probs()

print(final_logits.shape)
print(final_log_probs.shape)
print(final_probs.shape)

Use logits as the canonical calibration output, derive log_probs for stable token scoring, and use probs when you need confidence-style metrics.
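The relationship between the three outputs can be worked through by hand. The sketch below uses only the standard library and assumes the common convention that logits are divided by the temperature before a softmax; whether analyzer.score applies temperature in exactly this way is an assumption, and log_softmax here is a hypothetical helper, not an inteprelens function.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits.

    Assumes temperature scaling means dividing logits by the temperature.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating to avoid overflow.
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


logits = [2.0, 1.0, 0.0]
log_probs = log_softmax(logits, temperature=1.0)
probs = [math.exp(lp) for lp in log_probs]

print(log_probs)
print(probs)
```

Working in log space keeps scores stable when they are summed over tokens, and probabilities are recovered only at the end with a single exponentiation. Raising the temperature above 1.0 flattens the distribution, which lowers the top probability.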

Gradient-based attribution through facilitating heads

from inteprelens import CausalCircuitAttribution, LensAPI

analyzer = LensAPI.from_pretrained("meta-llama/Llama-3.2-1B")

results = analyzer.fit(
    texts=["The capital of France is"],
    targets=["Paris"],
    num_masks=1,
    num_updates=1,
    num_reg_updates=1,
    batch_size=1,
    verbose=False,
)

facilitating_mask = results.get_facilitating_mask()
attribution = CausalCircuitAttribution(analyzer.model, analyzer.tokenizer)

sentence_scores = attribution.compute_sentence_importance(
    document="Paris is the capital of France. It is one of Europe's largest cities.",
    sentences=[
        "Paris is the capital of France.",
        "It is one of Europe's largest cities.",
    ],
    summary="Paris is the capital of France.",
    facilitating_mask=facilitating_mask,
)

print(sentence_scores)

Public API

  • LensAPI: high-level interface for CHG fitting, token scoring, and named-site tracing
  • TransformerTracer: lower-level tracer for direct transformer-site collection
  • CausalCircuitAttribution: gradient attribution through facilitating CHG heads
  • CHGDataset: helper for building CHG-ready datasets from prompt/target pairs

Acknowledgements

inteprelens builds on and adapts code and ideas from the Causal Head Gating project. The extracted core package keeps CHG support and extends it with transformer-stage tracing and internal-logit inspection for architecture analysis workflows.

License

This project is released under the MIT License. See LICENSE.
