Selective Prompt Anchoring (SPA) for LLMs
Selective Prompt Anchoring (SPA)
Selective Prompt Anchoring (SPA) is a model-agnostic algorithm designed for large language models (LLMs) that provides fine-grained control over text generation.
This is the official repository for the 📄 ICML 2025 paper: Selective Prompt Anchoring for Code Generation.
🤔 Why use SPA?
In human communication, nuanced emphasis and fine-grained implications are often conveyed through variations in volume, tone, or pauses. Conveying such subtleties to an AI through plain-text prompts is difficult.
✨ SPA lets users assign importance, emphasis, or weights to specific parts of the input text when prompting language models. It brings this capability to text-based AI communication by letting users "anchor" certain words or phrases in the prompt (the name is inspired by the anchoring effect in psychology), which causes the model to pay more attention to them during generation. With SPA, users can flexibly steer an LLM's attention through a general, easy-to-use API.
🔍 Note: While we currently focus on text-to-text generation and evaluate on code generation in our paper, the concept can be applied to other tasks (e.g., classification) or other modalities (e.g., images).
💡 How SPA Works
SPA creates two parallel processing paths:
- The original prompt
- A modified prompt with anchored tokens masked
During token generation, SPA compares logits from both paths and adjusts final probabilities based on the anchoring strength, causing the model to emphasize the anchored concepts while maintaining coherent generation.
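The combination step can be illustrated with a minimal sketch. It assumes the final logits are a linear interpolation between the two paths, masked + strength * (original - masked); this formula is our assumption, chosen because it reproduces the documented behavior of strength (1.0 gives normal generation, 0.0 ignores the anchored text, negative values avoid it). It ignores modulated_by_prob, and the package's exact computation may differ. The helper names are hypothetical.

```python
import math


def anchored_logits(original, masked, strength):
    """Combine per-token logits from the two paths (hypothetical helper).

    Assumed rule: masked + strength * (original - masked).
    strength=1.0 returns the original logits; strength=0.0 returns the masked ones.
    """
    return [m + strength * (o - m) for o, m in zip(original, masked)]


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-token vocabulary: token 0 is the token the anchored text supports
original = [2.0, 1.0, 0.5]  # logits computed from the full prompt
masked = [1.0, 1.0, 0.5]    # logits computed with the anchored tokens masked

for strength in (-1.0, 0.0, 1.0, 3.0):
    probs = softmax(anchored_logits(original, masked, strength))
    print(strength, [round(p, 3) for p in probs])
```

Under this assumed rule, raising strength above 1.0 increases the probability of the token the anchored text supports, while a negative strength pushes it below the masked baseline.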
💻 Installation
From PyPI (Recommended)
pip install anchoring
From Source
# Clone the repository
git clone https://github.com/your-username/selective-prompt-anchoring.git
cd selective-prompt-anchoring
# Install the package in editable mode (this also installs its dependencies)
pip install -e .
⚡ Quick Start with Pipeline
The pipeline API provides a simple & general interface for using SPA:
from transformers import pipeline
import anchoring # The pipeline is automatically registered on import
# Create pipeline
spa_pipe = pipeline(
"selective-prompt-anchoring",
model="meta-llama/Llama-3.1-8B-Instruct",
anchoring_strength=3.0,
modulated_by_prob=False,
use_attention_mask=True,
device_map="auto"
)
# Simple text prompt with global anchors
prompt = "How is the weather today?"
global_anchors = ['today']
output = spa_pipe(prompt, anchors=global_anchors, max_new_tokens=512)
print(output["generated_text"])
Or you can stream the output
SPA supports streaming for real-time generation:
# Get streaming output
for token in spa_pipe(prompt, anchors=global_anchors, max_new_tokens=100, stream=True):
print(token, end="", flush=True)
print()
🛠️ Alternative: Direct Usage with model.generate()
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from anchoring import SPALogitsProcessor, spa_tokenize
# Load model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
# Define anchors and prompt
global_anchors = ['today']
prompt = "How is the weather today?"
# Tokenize with SPA
main_inputs, aux_inputs, mask_token = spa_tokenize(
prompt_with_anchors=prompt,
global_anchors=global_anchors,
tokenizer=tokenizer,
device=model.device
)
# Create SPA logits processor
spa_processor = SPALogitsProcessor(
aux_model=model,
aux_input_ids=aux_inputs,
strength=3.0,
modulated_by_prob=False,
use_attention_mask=True,
mask_token=mask_token,
tokenizer=tokenizer
)
# Generate text with SPA
output_sequences = model.generate(
input_ids=main_inputs,
attention_mask=torch.ones_like(main_inputs),
logits_processor=[spa_processor],
max_new_tokens=100,
do_sample=True,
temperature=0.7
)
# Decode and print
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print(generated_text)
Batch Processing Examples
# Define a list of prompts
prompts = ["What's the weather <anchor>today</anchor>?", "What's the weather <anchor>tomorrow</anchor>?"]
# Or with chat format
prompts = [
[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather <anchor>today</anchor>?"}
],
[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather <anchor>tomorrow</anchor>?"}
]
]
# Process all prompts
outputs = spa_pipe(prompts, anchors=['weather'], max_new_tokens=100)
for output in outputs:
print(output["generated_text"])
📝 Input Formats
Our code supports multiple input formats, so developers can conveniently mark anchors in prompts or chat messages. Anchored text can be denoted with inline paired tags (<anchor>...</anchor>), with a global anchor list, or, for chat messages, with a per-message "anchors" list.
The prompt itself can be a plain string or a list of chat messages following the OpenAI API message format.
1️⃣ String with Global Anchors
prompt = "How is the weather today?"
global_anchors = ['today']
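To illustrate what a global anchor list denotes, the sketch below finds every occurrence of each anchor string in the prompt and returns its character span. find_global_anchor_spans is a hypothetical helper, not part of the anchoring package; matching every occurrence with exact, case-sensitive comparison is an assumption about how global anchors are resolved.

```python
import re


def find_global_anchor_spans(prompt, anchors):
    """Return sorted (start, end) character spans of every anchor occurrence.

    Hypothetical helper: exact, case-sensitive substring matching is assumed.
    """
    spans = []
    for anchor in anchors:
        # re.escape makes anchors containing regex metacharacters match literally
        for match in re.finditer(re.escape(anchor), prompt):
            spans.append((match.start(), match.end()))
    return sorted(spans)


prompt = "How is the weather today?"
global_anchors = ['today']
for start, end in find_global_anchor_spans(prompt, global_anchors):
    print(start, end, prompt[start:end])
```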
2️⃣ String with Inline Anchors
prompt = "What's the weather <anchor>today</anchor>? Think <anchor>step by step</anchor>."
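Inline tags carry the same information as a global list, plus positions. The sketch below strips the <anchor>...</anchor> tags and records each anchored string with its character span in the cleaned text. parse_inline_anchors is a hypothetical helper for illustration; the package's own parsing (and whether it works on characters or tokens) may differ.

```python
import re

# Non-greedy match so adjacent tag pairs are parsed separately
ANCHOR_TAG = re.compile(r"<anchor>(.*?)</anchor>", re.DOTALL)


def parse_inline_anchors(text):
    """Strip inline tags; return (clean_text, anchored_strings, spans).

    Hypothetical helper: spans are (start, end) offsets into clean_text.
    """
    pieces, anchors, spans = [], [], []
    cursor = 0      # position in the tagged input
    clean_len = 0   # length of the clean text built so far
    for match in ANCHOR_TAG.finditer(text):
        before = text[cursor:match.start()]
        pieces.append(before)
        clean_len += len(before)
        inner = match.group(1)
        pieces.append(inner)
        anchors.append(inner)
        spans.append((clean_len, clean_len + len(inner)))
        clean_len += len(inner)
        cursor = match.end()
    pieces.append(text[cursor:])  # text after the last closing tag
    return "".join(pieces), anchors, spans


tagged = "What's the weather <anchor>today</anchor>? Think <anchor>step by step</anchor>."
clean, anchors, spans = parse_inline_anchors(tagged)
print(clean)
print(anchors, spans)
```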
3️⃣ Chat Messages with Message-Level Anchors
prompt = [
{
"role": "system",
"content": "You are a helpful assistant.",
"anchors": ["You", "assistant"]
},
{
"role": "user",
"content": "What's the weather today?",
"anchors": ["today"]
},
]
4️⃣ Chat Messages with Inline Anchors
prompt = [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What's the weather <anchor>today</anchor>?"
},
]
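Formats 3 and 4 express the same intent. The sketch below normalizes both into one representation: each message's content with inline tags removed, plus a single list combining its message-level "anchors" with any inline-tagged text. normalize_messages is a hypothetical helper that only illustrates the equivalence; it is not the package's API, and merging both sources into one list is an assumption.

```python
import re

ANCHOR_TAG = re.compile(r"<anchor>(.*?)</anchor>", re.DOTALL)


def normalize_messages(messages):
    """Return new message dicts with inline tags stripped and anchors merged.

    Hypothetical helper: message-level "anchors" come first, then inline-tagged text.
    The input messages are not modified.
    """
    normalized = []
    for message in messages:
        content = message["content"]
        inline = ANCHOR_TAG.findall(content)
        normalized.append({
            "role": message["role"],
            "content": ANCHOR_TAG.sub(r"\1", content),  # keep the tagged text, drop the tags
            "anchors": list(message.get("anchors", [])) + inline,
        })
    return normalized


inline_style = [{"role": "user", "content": "What's the weather <anchor>today</anchor>?"}]
message_level = [{"role": "user", "content": "What's the weather today?", "anchors": ["today"]}]
print(normalize_messages(inline_style))
print(normalize_messages(inline_style) == normalize_messages(message_level))
```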
⚙️ Key Parameters
SPA-Specific Parameters
- strength (default: 1.4): Controls the influence of the anchored text. In the pipeline API, this parameter is passed as anchoring_strength.
  - 1.0: No effect (normal generation)
  - 0.0: Completely ignore the anchored text
  - Between 0.0 and 1.0: Reduce the influence of the anchored text
  - >1.0: Emphasize the anchored text (higher values = stronger emphasis)
  - <0.0: Avoid using the anchored text (more negative values = stronger avoidance)
- modulated_by_prob (default: True): When True, the anchoring strength is modulated by token probability.
  - Enable for more stable results, especially at higher anchoring strengths
  - Disable for more precise control at lower strengths
- use_attention_mask (default: True): When True, anchored tokens are masked through the attention mask, which enhances the anchoring effect.
Standard Generation Parameters
SPA supports all standard Hugging Face generation parameters, such as:
- max_new_tokens: Maximum number of tokens to generate
- do_sample: Whether to use sampling for generation
- temperature: Controls randomness (higher = more random)
- top_p: Top-p sampling parameter (nucleus sampling)
- top_k: Top-k sampling parameter
- min_new_tokens: Minimum number of tokens to generate
For the full list of parameters, see the official Hugging Face Transformers generation documentation.
🧩 Practical Hyperparameter Settings
- strength (anchoring strength):
  - To increase the model's attention to (emphasis on) the anchored text:
    - If modulated_by_prob=True, you can use a relatively high anchoring strength (e.g., 20).
    - If modulated_by_prob=False, we recommend a value below 2.
    - To find an optimal value, tune it with a grid search on your benchmark. In our experiments, performance follows a simple pattern as the value increases (it first improves, then declines), so the value is easy to tune with a few dozen examples.
  - To reduce (0 < anchoring_strength < 1) or reverse (anchoring_strength < 0) the influence of the anchored text, set the value according to your specific needs.
- modulated_by_prob (weight the influence by token probabilities): We recommend modulated_by_prob=True for stable results. Set it to False if you need precise control or have other development needs.
- use_attention_mask (attention masking vs. special-token masking only): Keep the default, True, for more reliable performance. If you observe performance issues, you can set it to False; SPA then falls back to masking with special tokens.
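The grid search recommended for strength can be sketched generically. grid_search_strength is a hypothetical helper, and toy_evaluate is a synthetic stand-in that merely rises and then falls, mimicking the pattern described for this parameter; in practice the evaluation function would run SPA at the given strength on a few dozen benchmark examples and return a score. The candidate values stay below 2, following the recommendation for modulated_by_prob=False.

```python
def grid_search_strength(evaluate, candidates):
    """Evaluate each candidate strength; return (best_strength, scores).

    `evaluate` is a placeholder callable: strength -> score (higher is better).
    """
    scores = {strength: evaluate(strength) for strength in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_evaluate(strength):
    # Synthetic, not real results: peaks at 1.6, then declines.
    # Replace with an actual benchmark evaluation.
    return -(strength - 1.6) ** 2


candidates = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
best, scores = grid_search_strength(toy_evaluate, candidates)
print("best strength:", best)
```

Because the score curve is expected to be unimodal, a coarse grid followed by a finer grid around the best value is usually enough.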
Model Compatibility
SPA is a model-agnostic algorithm. Our implementation builds on the Hugging Face Transformers generation API, so it should work with any LLM available through Hugging Face Transformers. Please follow the corresponding model's documentation for detailed instructions.
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
📚 Citation
If you use SPA in your research, please cite:
@inproceedings{selective-prompt-anchoring,
  author = {Yuan Tian and Tianyi Zhang},
  title = {Selective Prompt Anchoring for Code Generation},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = {2025},
}