Implementation of IPE: Isolating Path Effect for Latent Circuit Identification. This module implements an interface for complex patching of transformer nodes, a path-wise backpropagation algorithm, and various graph search strategies to identify task-specific circuits in transformer models.

Quickstart Guide: Isolating Path Effect for Latent Circuit Discovery

Resources

The full documentation and API reference are hosted at: https://ipe-documentation.ceru-sh.site/

A running visualization server is hosted at: https://path-visualizer.ceru-sh.site/

Brief Background

What are we looking for?

This library provides a set of functions for discovering circuits within Large Language Models (LLMs). In the context of mechanistic interpretability, a circuit refers to the specific components and connections in a neural network that are responsible for a particular behavior or computation. Circuit discovery aims to isolate and understand these pathways, helping researchers explain how models arrive at their predictions.

By identifying circuits, we can gain insights into the internal mechanisms of LLMs, improve model transparency, and potentially guide model editing or debugging.

The holy grail of circuit discovery would be to isolate the subnetwork components responsible for harmful behavior, from hallucinations to offensive language. Ideally, this would provide a strong barrier against LLM risks.

Why look for paths?

Recent advances in circuit discovery, such as Edge Attribution Patching and ACDC (Automated Circuit DisCovery), focus on identifying the specific edges—connections between neurons or attention heads—that contribute to a model's behavior. These methods systematically intervene on model components to measure their causal impact, allowing researchers to map out the computational graph underlying a prediction.
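
To make "intervening on a component" concrete, here is a minimal sketch in plain Python. The two-component toy model, the function names, and the inputs are all illustrative assumptions; none of this is the ipe, ACDC, or Edge Attribution Patching API. It contrasts zero ablation with patching in the output obtained from a counterfactual input.

```python
# Toy illustration of causal intervention (patching); not a real library API.
# The "model" is two hand-written components whose outputs are summed.

def component_a(x):
    return 2.0 * x

def component_b(x):
    return x + 1.0

def run(x, patch=None):
    """Run the toy model; `patch` maps a component name to a forced output."""
    patch = patch or {}
    a = patch.get("a", component_a(x))
    b = patch.get("b", component_b(x))
    return a + b

clean_x, cf_x = 3.0, 1.0
clean_out = run(clean_x)

# Zero ablation: force component "a" to output 0 and measure the drop.
zero_effect = clean_out - run(clean_x, patch={"a": 0.0})

# Counterfactual patching: force component "a" to the value it takes
# on the counterfactual input, keeping everything else clean.
cf_effect = clean_out - run(clean_x, patch={"a": component_a(cf_x)})

print(clean_out, zero_effect, cf_effect)
```

The two effects differ because zero ablation removes the component's whole contribution, while counterfactual patching removes only the part that changes between the clean and counterfactual inputs.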

Transformer models, like those used in LLMs, are naturally structured as computational trees: each output can be traced back through a series of operations and connections. By "unrolling" these trees, we can follow individual paths from input to output, attributing behavior to specific sequences of computations.
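
The "unrolling" idea can be sketched on a toy graph. The node names below (embed, attn0, mlp0, out) and the edge list are assumptions chosen to mimic a one-layer residual architecture, where every component can read what earlier components wrote; this is not how ipe represents a model internally.

```python
def enumerate_paths(graph, source, sink):
    """Depth-first enumeration of all source-to-sink paths in a DAG."""
    if source == sink:
        return [[sink]]
    paths = []
    for nxt in graph.get(source, []):
        for tail in enumerate_paths(graph, nxt, sink):
            paths.append([source] + tail)
    return paths

# Hypothetical one-layer graph: each component feeds every later one.
graph = {
    "embed": ["attn0", "mlp0", "out"],
    "attn0": ["mlp0", "out"],
    "mlp0": ["out"],
}

paths = enumerate_paths(graph, "embed", "out")
for path in paths:
    print(" -> ".join(path))
```

Even this tiny graph yields four distinct paths, and the count grows quickly with depth, which is why exhaustive enumeration is replaced by a guided search in practice.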

Empirical observations seem to suggest that paths often correspond to single behaviors, for example a particular reasoning step or token prediction. While superposition (multiple behaviors sharing the same parameters) is a known challenge in neural networks, path-based analysis frequently reveals that certain behaviors are localized to distinct computational routes. This supports the claim that isolating paths can help us understand and manipulate specific model behaviors, even if some degree of superposition remains.

Another advantage of looking for paths is that they highlight the flow of specific information, which is often interpretable with simple decoding methods such as the logit lens.
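
As a rough illustration of logit-lens style decoding, the sketch below projects a vector onto a tiny vocabulary and reads off the top tokens. The three-token vocabulary, the 2-dimensional unembedding directions, and the "message" vector are made-up values, and the final layer norm that a real logit lens usually applies is omitted for brevity.

```python
def logit_lens(resid, unembed, vocab, k=2):
    """Project a residual-stream vector onto the vocabulary, return top-k tokens."""
    # One logit per token: dot product with that token's unembedding direction.
    logits = [sum(r * w for r, w in zip(resid, direction)) for direction in unembed]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

vocab = [" Mary", " John", " bag"]
# Hypothetical unembedding directions, one per token (d_model = 2).
unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical vector carried along a path.
message = [2.0, -1.0]

top = logit_lens(message, unembed, vocab)
print(top)
```

Here the message points toward " Mary" and away from " John", which is the kind of reading one hopes to get from a path that carries the identity of the indirect object.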

Usage

Installation

Via pip

To install the library using pip you can run:

pip install ipe

Via git repository

To install the library from git, first clone the repository over HTTPS:

git clone https://github.com/andreac01/IPE-LatentCircuitIdentification.git

or over SSH:

git clone git@github.com:andreac01/IPE-LatentCircuitIdentification.git

Then enter the cloned repository and install the package in editable mode:

pip install -e .

You should now be able to import the package in any script by writing import ipe at the top of your Python file.
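
If you want to confirm that the installation is visible to the interpreter you are using, a small check like the following can help. The helper name is_installed is an assumption made for this example; it relies only on the standard library and does not import the package itself.

```python
import importlib.util

def is_installed(package_name):
    """Return True if `package_name` can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None

print("ipe importable:", is_installed("ipe"))
```

A False result usually means pip installed the package into a different environment than the one running the script.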

Models

The code in this repository is compatible with models from transformer-lens, which is built on top of PyTorch. Below is a simple example of loading a model with that library. For more details, consult the official transformer-lens documentation.

# IPython/Jupyter magics: keep them in a notebook, omit them in a plain .py script
%load_ext autoreload
%autoreload 2

import torch
from transformer_lens import HookedTransformer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HookedTransformer.from_pretrained('gpt2-small', device=device, torch_dtype=torch.float32, center_unembed=True)

Loaded pretrained model gpt2-small into HookedTransformer

Experiment Setup

The ExperimentManager class can be used to find the most contributing paths, then visualize and analyze them. ExperimentManager has many parameters, but for simplicity most of them have default values. Consult the API reference for a more complete breakdown.

The key parameters are:

model: the transformer-lens model to be studied
prompts: a list of prompts; we suggest using small batches to avoid excessive memory consumption
targets: the list of target completions, one per prompt
cf_prompts / cf_targets: optional parameters to provide when using counterfactual prompts
metric: the string corresponding to the metric function to be used. Options are 'target_logit_percentage', 'target_probability_percentage', 'logit_difference', 'kl_divergence', or 'indirect_effect'. Defaults to 'target_logit_percentage'.
algorithm: the search algorithm to be used. Options are 'PathAttributionPatching' or 'IsolatingPathEffect'. Defaults to 'PathAttributionPatching'.
search_strategy: the search strategy to use within the chosen algorithm. Options are 'Threshold', 'BestFirstSearch', or 'LimitedLevelWidth'. Defaults to 'BestFirstSearch'.
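
To give a feel for what two of the metric names in the list above usually mean, here is a sketch of the textbook definitions of a logit difference and a KL divergence on toy logits. The function names, the three-token logits, and the exact formulas are assumptions for illustration; the library's own metrics may normalize or aggregate differently, so check the API reference before relying on them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_difference(logits, target_idx, cf_target_idx):
    """Logit of the target token minus logit of the counterfactual target."""
    return logits[target_idx] - logits[cf_target_idx]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between the distributions induced by two logit vectors."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 3-token vocabulary, before and after an intervention.
clean = [3.0, 1.0, 0.0]
patched = [2.0, 1.5, 0.0]

print(logit_difference(clean, 0, 1), logit_difference(patched, 0, 1))
print(kl_divergence(clean, patched))
```

A logit difference focuses on two specific tokens, whereas a KL divergence compares the whole output distribution, so the two can rank the same intervention quite differently.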

Furthermore, you can pass custom parameters to the metric and/or algorithm as a dictionary (for example, metric_params). Consult the API reference to see which parameters each one expects.
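
The general idea behind a best-first strategy with a top_n budget can be sketched with a priority queue. This is a generic textbook best-first search on a toy graph, not the library's 'BestFirstSearch' implementation: the graph, the edge weights, the forward search direction, and the choice of scoring a path by the product of its edge weights are all assumptions made for the example.

```python
import heapq

def best_first_paths(graph, source, sink, top_n=2):
    """Return up to `top_n` source-to-sink paths, highest score first.

    `graph` maps node -> list of (next_node, weight) with 0 < weight <= 1,
    and a path's score is the product of its edge weights. Because extending
    a path can never raise its score, complete paths are popped in
    descending score order.
    """
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-1.0, [source])]
    found = []
    while frontier and len(found) < top_n:
        neg_score, path = heapq.heappop(frontier)
        node = path[-1]
        if node == sink:
            found.append((-neg_score, path))
            continue
        for nxt, weight in graph.get(node, []):
            heapq.heappush(frontier, (neg_score * weight, path + [nxt]))
    return found

# Hypothetical weighted graph; weights stand in for per-edge contribution scores.
graph = {
    "embed": [("attn0", 0.9), ("mlp0", 0.5), ("out", 0.2)],
    "attn0": [("mlp0", 0.5), ("out", 0.8)],
    "mlp0": [("out", 0.6)],
}

found = best_first_paths(graph, "embed", "out", top_n=2)
for score, path in found:
    print(round(score, 2), " -> ".join(path))
```

The search stops as soon as top_n complete paths have been collected, so low-scoring branches are never expanded. A time budget such as the max_time value that appears in the log output below could be added as a further stopping condition.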

from ipe.experiment import ExperimentManager

experiment = ExperimentManager(
    model=model,
    prompts=['When John and Mary went to the shops. John gave the bag to'],
    targets=[' Mary'],
    cf_prompts=['When John and Mary went to the shops. Mary gave the bag to'], # Not required, default is []
    cf_targets=[' John'], # Not required, default is []
    positional_search=True, # Not required, default is False
    metric='target_logit_percentage', # Not required, default is 'target_logit_percentage'
    metric_params={}, # Not required, default is {}
    patch_type='zero', # Not required, default is 'auto' which uses the counterfactual prompt if provided. Zero does not use the counterfactual prompt
    # Other params...
)

WARNING: [load_metric] Using ExperimentManager attribute for 'model': (value too long to display)
WARNING: [load_metric] Using ExperimentManager attribute for 'clean_final_resid': (value too long to display)
WARNING: [load_metric] Using ExperimentManager attribute for 'target_tokens': [5335]
WARNING: [load_algorithm] Overriding provided metric with the one specified in the algorithm parameters.
WARNING: [load_algorithm] Overriding provided model with the one specified in the algorithm parameters.
WARNING: [load_algorithm] Overriding provided root with the one specified in the algorithm parameters.
WARNING: [load_algorithm] Using default parameter for 'include_negative': True
WARNING: [load_algorithm] Using default parameter for 'top_n': 100
WARNING: [load_algorithm] Using default parameter for 'max_time': 300

Experiment Run and Visualization

Once initialized, the experiment object offers a fast way to run experiments and a simple way to visualize circuits and messages.

Running The Experiment

experiment.run(return_paths=False)

Completed paths: 100%|██████████| 100/100 [01:14<00:00,  1.33it/s]

Visualizing the Paths

experiment.plot(heads_per_row=6)

[Image: example of the found paths, visualized]

Inspecting a Path

experiment.decode_paths()

[Image: example of the decoding visualization]

Release history

0.0.1 (Nov 15, 2025)
0.0.0 (Sep 29, 2025), this version

Download files

Source Distribution

ipe-0.0.0.tar.gz (51.6 kB)

Uploaded Sep 29, 2025 Source

Built Distribution

ipe-0.0.0-py3-none-any.whl (46.4 kB)

Uploaded Sep 29, 2025 Python 3

File details

Details for the file ipe-0.0.0.tar.gz.

File metadata

Download URL: ipe-0.0.0.tar.gz
Upload date: Sep 29, 2025
Size: 51.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for ipe-0.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ac1151cb198832085fd5e136a863074462401aa779fd2c042fe02273968e0ef0`
MD5	`09e9c2f9a3a543a6b281c8a2ce2a5b28`
BLAKE2b-256	`4413a7ed63cd4280989587d50a6ebe9d7d45c024497cf1702f7e1cda541e3060`

File details

Details for the file ipe-0.0.0-py3-none-any.whl.

File metadata

Download URL: ipe-0.0.0-py3-none-any.whl
Upload date: Sep 29, 2025
Size: 46.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for ipe-0.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cd30a99505d42a948312c73ff994f9808da17f23800951fb969f99e6c19ac9b0`
MD5	`b3d95d1b4f0955a3ded5bb4521b556aa`
BLAKE2b-256	`93869e5deced559dd6575db9dd064c889f54dd18ddbffe0666290f58101ca786`
