Semantic Regex
Auto-Interpreting LLM Features with a Structured Language
Overview
semantic-regex is a Python package for interpreting neural network features using the semantic regex language for automatic interpretability. Given a list of tokens and a parallel list of their activation values, it can (1) build the full prompt that asks a language model for a semantic regex, and/or (2) pass that prompt to dspy to generate the semantic regex itself. The semantic regex language is designed to capture the diverse activation patterns of LLM features, while providing the additional affordances of a structured language.
This package accompanies the research paper:
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Angie Boggust, Donghao Ren, Yannick Assogba, Dominik Moritz, Arvind Satyanarayan, Fred Hohman
arXiv, 2025.
Paper, GitHub, Python package, Viewer
Installation
Install the package using pip:
pip install semantic-regex
Or for development, clone the repository and install with uv:
git clone https://github.com/apple/ml-semantic-regex.git
cd ml-semantic-regex
uv sync
Quick Start
The general flow is: get tokens (batch_tokens) and activations (batch_activations), generate the prompt (generate_semantic_regex_prompt), and then generate the semantic regex (generate_semantic_regex).
To start, you can bring your own tokens and activations, or load them through the optional Neuronpedia API.
from semantic_regex import get_neuronpedia_data, generate_semantic_regex_prompt, generate_semantic_regex
import dspy

# Step 1a: Bring your own tokens and activations
batch_tokens = [
    ["The", "quick", "brown", "fox", "jumps"],
    ["A", "fast", "red", "car", "speeds"],
    ["She", "ran", "quickly", "through", "forest"]
]
batch_activations = [
    [0.1, 0.9, 0.2, 0.1, 0.1],  # "quick" activates strongly
    [0.1, 0.8, 0.2, 0.1, 0.1],  # "fast" activates strongly
    [0.1, 0.1, 0.9, 0.1, 0.1]   # "quickly" activates strongly
]

# Step 1b: Or get them from Neuronpedia
batch_tokens, batch_activations = get_neuronpedia_data(
    model_id="gpt2-small",
    layer="0-res-jb",
    feature_index=21896
)

# Step 2: Generate prompt data with parameters
prompt_data = generate_semantic_regex_prompt(
    batch_tokens=batch_tokens,
    batch_activations=batch_activations,
    activation_threshold=0.3,
    n_data_examples=3,
    show_breaks=True,
    seed=42
)
## Optionally view the prompt
prompt = prompt_data["prompt"]

# Step 3: Use with dspy to generate semantic regex
lm = dspy.LM('openai/gpt-4o-mini')  # or any other supported model
result = generate_semantic_regex(
    prompt_data=prompt_data,
    lm=lm,
    temperature=0.7,
    logging=True  # Print the prompt and generated regex
)
## Output of the form: [:field speed:]
print(f"Generated semantic regex: {result['description']}")
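If your own data is stored as (token, activation) pairs rather than as two parallel lists, it needs to be split before it matches the batch_tokens and batch_activations shape used in the Quick Start. The following is a minimal sketch of that conversion. The helper name split_pairs and the pair-based input layout are assumptions made for illustration and are not part of the package.

```python
from typing import List, Sequence, Tuple


def split_pairs(
    batch_pairs: Sequence[Sequence[Tuple[str, float]]],
) -> Tuple[List[List[str]], List[List[float]]]:
    """Split per-example (token, activation) pairs into two parallel lists.

    Hypothetical helper: not provided by the semantic-regex package.
    """
    batch_tokens = [[tok for tok, _ in example] for example in batch_pairs]
    batch_activations = [[float(act) for _, act in example] for example in batch_pairs]
    return batch_tokens, batch_activations


# Made-up input in the assumed pair layout.
pairs = [
    [("The", 0.1), ("quick", 0.9), ("brown", 0.2)],
    [("A", 0.1), ("fast", 0.8)],
]
tokens_out, activations_out = split_pairs(pairs)
print(tokens_out)       # [['The', 'quick', 'brown'], ['A', 'fast']]
print(activations_out)  # [[0.1, 0.9, 0.2], [0.1, 0.8]]
```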
API Reference
generate_semantic_regex_prompt()
Generate a semantic regex prompt with metadata from tokens and activations.
Parameters:
- batch_tokens (List[List[str]]): List of token sequences
- batch_activations (List[List[float]]): List of corresponding activation sequences
- activation_threshold (float, default=0.3): Minimum activation threshold for highlighting
- n_data_examples (int, default=10): Number of examples to include in prompt
- n_tokens_per_sample (int, default=32): Number of tokens per example snippet
- sampling_method (str, default='top'): Sampling strategy - 'top', 'random', or 'quantile'
- show_breaks (bool, default=True): Whether to show line breaks in examples
- seed (int, default=42): Random seed for reproducibility
Returns:
dict: Dictionary containing:
- prompt (str): Complete prompt string that can be used with any language model
- parameters (dict): All parameters used for generation (for reproducibility)
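To make the role of activation_threshold concrete, here is a minimal sketch of threshold-based highlighting. It is not the package's implementation: the helper name highlight_tokens, the [[...]] marker style, and the choice of an inclusive (>=) comparison are all assumptions chosen for illustration.

```python
from typing import List


def highlight_tokens(
    tokens: List[str],
    activations: List[float],
    activation_threshold: float = 0.3,
) -> str:
    """Wrap tokens whose activation meets the threshold in [[ ]] markers.

    Hypothetical helper: the marker style and the inclusive comparison
    are assumptions, not the format the semantic-regex package emits.
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    marked = [
        f"[[{tok}]]" if act >= activation_threshold else tok
        for tok, act in zip(tokens, activations)
    ]
    return " ".join(marked)


example = highlight_tokens(
    ["The", "quick", "brown", "fox", "jumps"],
    [0.1, 0.9, 0.2, 0.1, 0.1],
)
print(example)  # The [[quick]] brown fox jumps
```

Raising the threshold leaves fewer tokens marked, which is one way to think about how the parameter narrows what the prompt draws attention to.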
generate_semantic_regex()
Generate a semantic regex pattern using DSPy for model-agnostic generation.
Parameters:
- prompt_data (dict): Dictionary from generate_semantic_regex_prompt() with 'prompt' and 'parameters'
- lm (Optional[dspy.LM], default=None): DSPy language model instance
- temperature (float, default=1.0): Sampling temperature for the language model
- logging (bool, default=False): Whether to print the prompt and generated regex
Returns:
dict: Dictionary containing:
- description (str): Generated semantic regex pattern
- prompt (str): The original prompt used
- lm (dspy.LM): The language model used
- parameters (dict): All parameters used for generation (prompt + LM parameters)
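Because the returned dictionary carries the parameters used for generation, it can be logged for later comparison. The sketch below serializes the text fields of such a dictionary to JSON. It uses a stand-in dictionary with made-up values rather than a real call, and the helper name result_to_record is hypothetical. The lm entry is dropped on the assumption that a language model object is not JSON-serializable.

```python
import json
from typing import Any, Dict


def result_to_record(result: Dict[str, Any]) -> str:
    """Serialize the JSON-friendly parts of a result dictionary.

    Hypothetical helper: the 'lm' entry is skipped because a language
    model object is assumed not to be JSON-serializable.
    """
    record = {
        "description": result["description"],
        "prompt": result["prompt"],
        "parameters": result["parameters"],
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Stand-in result with the documented keys; all values are made up.
fake_result = {
    "description": "[:field speed:]",
    "prompt": "example prompt text",
    "lm": object(),
    "parameters": {"activation_threshold": 0.3, "seed": 42, "temperature": 0.7},
}
record_json = result_to_record(fake_result)
print(record_json)
```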
get_neuronpedia_data()
Get tokens and activations from a Neuronpedia feature.
Parameters:
- model_id (str): Model identifier (e.g., 'gpt2-small')
- layer (str): Layer identifier (e.g., '0-res-jb')
- feature_index (int): Feature index number
Returns:
Tuple[List[List[str]], List[List[float]]]: (batch_tokens, batch_activations) ready for prompt generation
Note: Requires the neuronpedia package to be installed separately.
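Whether the data comes from Neuronpedia or from your own pipeline, it can be worth checking that the two lists line up before building a prompt. This is a minimal sketch that assumes each activation list holds exactly one value per token, as in the Quick Start example. The helper name check_aligned is hypothetical and not part of the package.

```python
from typing import Sequence


def check_aligned(
    batch_tokens: Sequence[Sequence[str]],
    batch_activations: Sequence[Sequence[float]],
) -> None:
    """Raise ValueError if tokens and activations do not line up.

    Hypothetical helper: assumes one activation value per token.
    """
    if len(batch_tokens) != len(batch_activations):
        raise ValueError(
            f"{len(batch_tokens)} token sequences but "
            f"{len(batch_activations)} activation sequences"
        )
    for i, (tokens, activations) in enumerate(zip(batch_tokens, batch_activations)):
        if len(tokens) != len(activations):
            raise ValueError(
                f"example {i}: {len(tokens)} tokens but "
                f"{len(activations)} activations"
            )


# Passes silently: both examples have matching lengths.
check_aligned(
    [["The", "quick"], ["A", "fast", "car"]],
    [[0.1, 0.9], [0.1, 0.8, 0.2]],
)
```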
Semantic Regex Language
The package generates prompts that help language models create patterns using a structured language:
- [:symbol X:] - matches exact phrase X
- [:lexeme X:] - matches phrase X and its syntactic variants
- [:field X:] - matches phrase X and its semantic variants
- S1 S2 - matches sequence where S1 is followed by S2
- S1|S2 - matches either S1 or S2
- S? - matches S or nothing (optional)
- @{:context C:}(S) - matches S only in context C
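The constructs listed above can be assembled as plain strings. The following sketch shows one way to build them programmatically. The helper functions are hypothetical and not part of the package, and since grouping and precedence rules are not described here, the helpers only concatenate text and do not validate the result.

```python
def symbol(phrase: str) -> str:
    """Exact phrase."""
    return f"[:symbol {phrase}:]"


def lexeme(phrase: str) -> str:
    """Phrase and its syntactic variants."""
    return f"[:lexeme {phrase}:]"


def field(phrase: str) -> str:
    """Phrase and its semantic variants."""
    return f"[:field {phrase}:]"


def seq(*parts: str) -> str:
    """S1 followed by S2 (space-separated)."""
    return " ".join(parts)


def either(*options: str) -> str:
    """S1 or S2."""
    return "|".join(options)


def optional(part: str) -> str:
    """S or nothing."""
    return f"{part}?"


def in_context(context: str, part: str) -> str:
    """S only in context C."""
    return f"@{{:context {context}:}}({part})"


# Illustrative patterns; the phrases are made up.
print(field("speed"))                               # [:field speed:]
print(seq(symbol("the"), optional(lexeme("run"))))  # [:symbol the:] [:lexeme run:]?
print(in_context("racing", field("speed")))         # @{:context racing:}([:field speed:])
```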
Testing
Run the test suite:
uv run pytest
uv run pytest --capture=no # Show print statements
Run specific test functions:
uv run pytest tests/test_api.py::test_basic_functionality
uv run pytest tests/test_api.py::simple_test
Development
Setup
- Clone the repository:
git clone https://github.com/apple/ml-semantic-regex.git
cd ml-semantic-regex
- Install uv, see https://docs.astral.sh/uv/getting-started/installation/.
- Run the test suite:
uv run pytest
uv run pytest --capture=no # Show print statements