Selective Prompt Anchoring (SPA) for LLMs

Selective Prompt Anchoring (SPA)

Selective Prompt Anchoring (SPA) is a model-agnostic algorithm designed for large language models (LLMs) that provides fine-grained control over text generation.

This is the official repository for the 📄 ICML 2025 paper: Selective Prompt Anchoring for Code Generation.

🤔 Why use SPA?

In human communication, nuanced emphasis and fine-grained implications are often conveyed through variations in volume, tone, or pauses. Conveying such subtleties in text-based communication with AI is challenging with plain text prompts.

✨ SPA enables users to assign importance, emphasis, or weights to specific parts of the input text when prompting language models. It brings this capability to text-based AI communication by letting users "anchor" certain words or phrases in the prompt (the name is inspired by the anchoring effect in psychology), causing the model to pay more attention to them during generation. With SPA, users can flexibly steer an LLM's attention through a general, easy-to-use API.

🔍 Note: Our paper focuses on text-to-text generation and evaluates SPA on code generation, but the concept can be applied to other tasks (e.g., classification) or to other modalities (e.g., images).

💡 How SPA Works

SPA creates two parallel processing paths:

  1. The original prompt
  2. A modified prompt with anchored tokens masked
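As a rough illustration of the second path, the sketch below masks anchored text at the string level. The helper `mask_anchors` and the `[MASK]` placeholder are hypothetical and not part of the `anchoring` package; the library itself works on token IDs and, by default, on attention masks rather than on raw strings.

```python
def mask_anchors(prompt: str, anchors: list[str], mask_token: str = "[MASK]") -> str:
    """Replace every occurrence of each anchor with a placeholder (illustrative only)."""
    masked = prompt
    for anchor in anchors:
        masked = masked.replace(anchor, mask_token)
    return masked


original_prompt = "How is the weather today?"
masked_prompt = mask_anchors(original_prompt, ["today"])
print(masked_prompt)
```

The original prompt and the masked prompt are then the two inputs whose next-token predictions are compared.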

During token generation, SPA compares logits from both paths and adjusts final probabilities based on the anchoring strength, causing the model to emphasize the anchored concepts while maintaining coherent generation.
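The comparison can be pictured as a linear extrapolation between the two sets of logits. The sketch below is a minimal illustration under that assumption; it matches the documented meaning of the strength values (1.0 leaves generation unchanged, 0.0 ignores the anchored text), but the function `combine_logits` is hypothetical and the package's actual computation, particularly with `modulated_by_prob=True`, may differ.

```python
import math


def combine_logits(original: list[float], masked: list[float], strength: float) -> list[float]:
    """Move from the masked logits toward (or past) the original logits."""
    return [m + strength * (o - m) for o, m in zip(original, masked)]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits over a three-token vocabulary (made-up numbers).
original = [2.0, 1.0, 0.5]  # full prompt
masked = [1.0, 1.5, 0.5]    # prompt with anchored tokens masked

unchanged = combine_logits(original, masked, 1.0)  # equals the original logits
ignored = combine_logits(original, masked, 0.0)    # equals the masked logits
boosted = combine_logits(original, masked, 2.0)    # amplifies the anchor's contribution

print(softmax(original)[0], softmax(boosted)[0])
```

In this toy case, the first token is the one whose logit rises when the anchor is visible, so a strength above 1.0 increases its probability further.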

💻 Installation

From PyPI (Recommended)

pip install anchoring

From Source

# Clone the repository
git clone https://github.com/your-username/selective-prompt-anchoring.git
cd selective-prompt-anchoring

# Install dependencies
pip install -e .

⚡ Quick Start with Pipeline

The pipeline API provides a simple, general interface for using SPA:

from transformers import pipeline
import anchoring  # The pipeline is automatically registered on import

# Create pipeline
spa_pipe = pipeline(
    "selective-prompt-anchoring",
    model="meta-llama/Llama-3.1-8B-Instruct",
    anchoring_strength=3.0,
    modulated_by_prob=False,
    use_attention_mask=True,
    device_map="auto"
)

# Simple text prompt with global anchors
prompt = "How is the weather today?"
global_anchors = ['today']

output = spa_pipe(prompt, anchors=global_anchors, max_new_tokens=512)
print(output["generated_text"])

Or you can stream the output

SPA supports streaming for real-time generation:

# Get streaming output
for token in spa_pipe(prompt, anchors=global_anchors, max_new_tokens=100, stream=True):
    print(token, end="", flush=True)
print()

🛠️ Alternative: Direct Usage with model.generate()

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from anchoring import SPALogitsProcessor, spa_tokenize

# Load model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define anchors and prompt
global_anchors = ['today']
prompt = "How is the weather today?"

# Tokenize with SPA
main_inputs, aux_inputs, mask_token = spa_tokenize(
    prompt_with_anchors=prompt,
    global_anchors=global_anchors,
    tokenizer=tokenizer,
    device=model.device
)

# Create SPA logits processor
spa_processor = SPALogitsProcessor(
    aux_model=model, 
    aux_input_ids=aux_inputs, 
    strength=3.0,
    modulated_by_prob=False,
    use_attention_mask=True,
    mask_token=mask_token,
    tokenizer=tokenizer
)

# Generate text with SPA
output_sequences = model.generate(
    input_ids=main_inputs,
    attention_mask=torch.ones_like(main_inputs),
    logits_processor=[spa_processor],
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7
)

# Decode and print
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print(generated_text)

Batch Processing Examples

# Define a list of prompts
prompts = ["What's the weather <anchor>today</anchor>?", "What's the weather <anchor>tomorrow</anchor>?"]

# Or with chat format
prompts = [
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather <anchor>today</anchor>?"}
    ],
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather <anchor>tomorrow</anchor>?"}
    ]
]

# Process all prompts
outputs = spa_pipe(prompts, anchors=['weather'], max_new_tokens=100)
for output in outputs:
    print(output["generated_text"])

📝 Input Formats

Our code supports multiple input formats, allowing developers to conveniently represent anchors in prompts or messages. Developers can use inline paired tags, <anchor> </anchor>, or a global anchor list to denote anchored text. They can also work with chat messages in a list, following the OpenAI API standard, or simply use a prompt string.

1️⃣ String with Global Anchors

prompt = "How is the weather today?"
global_anchors = ['today']

2️⃣ String with Inline Anchors

prompt = "What's the weather <anchor>today</anchor>? Think <anchor>step by step</anchor>."

3️⃣ Chat Messages with Message-Level Anchors

prompt = [
    {
        "role": "system", 
        "content": "You are a helpful assistant.", 
        "anchors": ["You", "assistant"]  
    },
    {
        "role": "user",
        "content": "What's the weather today?", 
        "anchors": ["today"]
    },
]

4️⃣ Chat Messages with Inline Anchors

prompt = [
    {
        "role": "system", 
        "content": "You are a helpful assistant."
    },
    {
        "role": "user", 
        "content": "What's the weather <anchor>today</anchor>?"
    },
]
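
The inline and global formats carry the same information. As a minimal sketch of that relationship, the hypothetical helper below (`extract_inline_anchors`, not part of the `anchoring` package) converts an inline-tagged string into a plain prompt plus an anchor list using a regular expression. How the library parses tags internally is not shown here and may differ.

```python
import re

# Non-greedy match so that each <anchor>...</anchor> pair is captured separately.
ANCHOR_TAG = re.compile(r"<anchor>(.*?)</anchor>", re.DOTALL)


def extract_inline_anchors(prompt: str) -> tuple[str, list[str]]:
    """Return the prompt with tags stripped, plus the list of anchored spans."""
    anchors = ANCHOR_TAG.findall(prompt)
    plain = ANCHOR_TAG.sub(lambda match: match.group(1), prompt)
    return plain, anchors


plain_prompt, anchors = extract_inline_anchors(
    "What's the weather <anchor>today</anchor>? Think <anchor>step by step</anchor>."
)
print(plain_prompt)
print(anchors)
```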

⚙️ Key Parameters

SPA-Specific Parameters

  • strength (passed as anchoring_strength in the pipeline API; default: 1.4): Controls the influence of anchored text.

    • 1.0: No effect (normal generation)
    • 0.0: Completely ignore anchored text
    • >1.0: Emphasize anchored text (higher values = stronger emphasis)
    • <0.0: Avoid using anchored text (negative values = stronger avoidance)
  • modulated_by_prob (default: True): When True, the anchoring strength is modulated by token probability.

    • Enable for more stable results, especially with higher anchoring strengths
    • Disable for more precise control at lower strengths
  • use_attention_mask (default: True): When True, uses attention masking for anchor tokens, enhancing the effect of anchoring.

Standard Generation Parameters

SPA supports all standard Hugging Face generation parameters, such as:

  • max_new_tokens: Maximum number of tokens to generate
  • do_sample: Whether to use sampling for generation
  • temperature: Controls randomness (higher = more random)
  • top_p: Top-p sampling parameter (nucleus sampling)
  • top_k: Top-k sampling parameter
  • min_new_tokens: Minimum number of tokens to generate

For more parameters, see the official Hugging Face Transformers generation documentation.

🧩 Practical Hyperparameter Settings

  1. strength (Anchoring Strength):

    • To increase the model's attention to, or emphasis on, the anchored text:
      • If modulated_by_prob = True, you can use a relatively high anchoring strength (e.g., 20).
      • If modulated_by_prob = False, we recommend a value below 2.
      • To find an optimal value, run a grid search on your benchmark. Our experiments show that performance follows a simple pattern (it first improves as the value increases, then declines), so the value can be tuned with a few dozen examples.
    • To reduce the influence of anchored text (0 < anchoring_strength < 1) or reverse it (anchoring_strength < 0), set the value according to your specific needs.
  2. modulated_by_prob (weight influence by token probabilities): We recommend setting modulated_by_prob=True for stable results. Set it to False if you need precise control or have other development needs.

  3. use_attention_mask (whether to use an attention mask or only special-token masking): Keep the default of True for more reliable performance. If you encounter a performance issue, set it to False; SPA then falls back to a masking strategy based on special tokens.
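
The grid search suggested for strength can be as simple as the loop below. This is a sketch under stated assumptions: `grid_search_strength` and `toy_benchmark` are hypothetical names, and the toy scoring function merely imitates the rise-then-fall pattern described above. In practice, `evaluate` would run SPA on your own examples at the given strength and return a metric such as pass rate.

```python
from typing import Callable, Iterable, Optional, Tuple


def grid_search_strength(
    candidates: Iterable[float],
    evaluate: Callable[[float], float],
) -> Tuple[Optional[float], float]:
    """Return the candidate strength with the highest benchmark score."""
    best_strength, best_score = None, float("-inf")
    for strength in candidates:
        score = evaluate(strength)
        if score > best_score:
            best_strength, best_score = strength, score
    return best_strength, best_score


def toy_benchmark(strength: float) -> float:
    """Stand-in for a real benchmark: the score peaks at 1.5 and falls off on either side."""
    return 1.0 - (strength - 1.5) ** 2


candidates = [1.0, 1.25, 1.5, 1.75, 2.0]
best_strength, best_score = grid_search_strength(candidates, toy_benchmark)
print(best_strength, best_score)
```

Because the score curve has a single peak, a coarse grid followed by a finer one around the best value is usually enough.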

Model Compatibility

SPA is a model-agnostic algorithm. Our implementation builds on the Hugging Face Transformers generation API, so it should work with any LLM available in the Hugging Face model collections. Please follow the corresponding model documentation for detailed instructions.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use SPA in your research, please cite:

@misc{selective-prompt-anchoring,
  author = {Yuan Tian and Tianyi Zhang},
  title = {Selective Prompt Anchoring for Code Generation},
  year = {2025},
  conference={ICML'25},
}
