Selective Prompt Anchoring (SPA) for LLMs

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Selective Prompt Anchoring (SPA)

Selective Prompt Anchoring (SPA) is a model-agnostic algorithm designed for large language models (LLMs) that provides fine-grained control over text generation.

This is the official repository for the 📄 ICML 2025 paper: Selective Prompt Anchoring for Code Generation.

🤔 Why use SPA?

In human communication, nuanced emphasis and fine-grained implications are often conveyed through variations in volume, tone, or pauses. Conveying such subtleties in text-based communication with AI is challenging with plain text prompts.

✨ SPA enables users to assign importance, emphasis, or weights to specific parts of the input text when prompting language models. It brings this capability to text-based AI communication by letting users "anchor" certain words or phrases in the prompt (the name is inspired by the anchoring effect in psychology), causing the model to pay more attention to them during generation. With SPA, users can flexibly steer an LLM's attention through a general, easy-to-use API.

🔍 Note: While we currently focus on text-to-text generation and evaluate on code generation in our paper, the concept can be applied to other tasks (e.g., classification) or to other modalities (e.g., images).

💡 How SPA Works

SPA creates two parallel processing paths:

- The original prompt
- A modified prompt with anchored tokens masked

During token generation, SPA compares logits from both paths and adjusts final probabilities based on the anchoring strength, causing the model to emphasize the anchored concepts while maintaining coherent generation.
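The comparison of the two paths can be illustrated with a minimal sketch. It assumes the adjustment is a linear extrapolation from the masked-path logits toward the original-path logits, which matches the strength semantics described under Key Parameters (1.0 leaves generation unchanged, 0.0 ignores the anchored text). The function name `combine_logits` and the toy logit values are hypothetical, and the package's actual implementation may differ.

```python
import math

def combine_logits(original, masked, strength):
    """Extrapolate from the masked-path logits toward the original-path logits.

    ASSUMPTION: a simple linear rule chosen to match the documented strength
    semantics; this is not taken from the anchoring package's source code.
    """
    return [m + strength * (o - m) for o, m in zip(original, masked)]

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token logits over a 3-token vocabulary (made-up numbers).
original = [2.0, 1.0, 0.5]  # path 1: full prompt
masked = [1.0, 1.5, 0.5]    # path 2: anchored tokens masked

no_effect = combine_logits(original, masked, 1.0)   # equals the original logits
ignored = combine_logits(original, masked, 0.0)     # equals the masked logits
emphasized = combine_logits(original, masked, 3.0)  # amplifies the difference

# Token 0 is favored by the anchored text, so its probability grows.
print(softmax(original)[0], softmax(emphasized)[0])
```

Under this assumption, tokens whose logits rise when the anchored text is visible are boosted further as the strength grows beyond 1.0.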

💻 Installation

From PyPI (Recommended)

pip install anchoring

From Source

# Clone the repository
git clone https://github.com/your-username/selective-prompt-anchoring.git
cd selective-prompt-anchoring

# Install the package in editable mode (this also installs its dependencies)
pip install -e .

⚡ Quick Start with Pipeline

The pipeline API provides a simple, general interface for using SPA:

from transformers import pipeline
import anchoring  # The pipeline is automatically registered on import

# Create pipeline
spa_pipe = pipeline(
    "selective-prompt-anchoring",
    model="meta-llama/Llama-3.1-8B-Instruct",
    anchoring_strength=3.0,
    modulated_by_prob=False,
    use_attention_mask=True,
    device_map="auto"
)

# Simple text prompt with global anchors
prompt = "How is the weather today?"
global_anchors = ['today']

output = spa_pipe(prompt, anchors=global_anchors, max_new_tokens=512)
print(output["generated_text"])

Or you can stream the output

SPA supports streaming for real-time generation:

# Get streaming output
for token in spa_pipe(prompt, anchors=global_anchors, max_new_tokens=100, stream=True):
    print(token, end="", flush=True)
print()

🛠️ Alternative: Direct Usage with `model.generate()`

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from anchoring import SPALogitsProcessor, spa_tokenize

# Load model and tokenizer
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define anchors and prompt
global_anchors = ['today']
prompt = "How is the weather today?"

# Tokenize with SPA
main_inputs, aux_inputs, mask_token = spa_tokenize(
    prompt_with_anchors=prompt,
    global_anchors=global_anchors,
    tokenizer=tokenizer,
    device=model.device
)

# Create SPA logits processor
spa_processor = SPALogitsProcessor(
    aux_model=model, 
    aux_input_ids=aux_inputs, 
    strength=3.0,
    modulated_by_prob=False,
    use_attention_mask=True,
    mask_token=mask_token,
    tokenizer=tokenizer
)

# Generate text with SPA
output_sequences = model.generate(
    input_ids=main_inputs,
    attention_mask=torch.ones_like(main_inputs),
    logits_processor=[spa_processor],
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7
)

# Decode and print
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print(generated_text)

Batch Processing Examples

# Define a list of prompts
prompts = ["What's the weather <anchor>today</anchor>?", "What's the weather <anchor>tomorrow</anchor>?"]

# Or with chat format
prompts = [
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather <anchor>today</anchor>?"}
    ],
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather <anchor>tomorrow</anchor>?"}
    ]
]

# Process all prompts
outputs = spa_pipe(prompts, anchors=['weather'], max_new_tokens=100)
for output in outputs:
    print(output["generated_text"])

📝 Input Formats

Our code supports multiple input formats, allowing developers to conveniently represent anchors in prompts or messages. Developers can use inline paired tags, <anchor> </anchor>, or a global anchor list to denote anchored text. They can also work with chat messages in a list, following the OpenAI API standard, or simply use a prompt string.

1️⃣ String with Global Anchors

prompt = "How is the weather today?"
global_anchors = ['today']

2️⃣ String with Inline Anchors

prompt = "What's the weather <anchor>today</anchor>? Think <anchor>step by step</anchor>."

3️⃣ Chat Messages with Message-Level Anchors

prompt = [
    {
        "role": "system", 
        "content": "You are a helpful assistant.", 
        "anchors": ["You", "assistant"]  
    },
    {
        "role": "user",
        "content": "What's the weather today?", 
        "anchors": ["today"]
    },
]

4️⃣ Chat Messages with Inline Anchors

prompt = [
    {
        "role": "system", 
        "content": "You are a helpful assistant."
    },
    {
        "role": "user", 
        "content": "What's the weather <anchor>today</anchor>?"
    },
]
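To make the relationship between inline tags and anchor lists concrete, here is a small sketch that converts an inline-tagged string (formats 2 and 4) into a plain string plus a list of anchors (formats 1 and 3). The helper `extract_inline_anchors` is hypothetical and written only for illustration; it is not part of the anchoring package, which handles these formats internally.

```python
import re

# Matches the paired inline tags described above; non-greedy so that
# several anchors in one string are captured separately.
ANCHOR_TAG = re.compile(r"<anchor>(.*?)</anchor>", re.DOTALL)

def extract_inline_anchors(text):
    """Return (text without tags, list of anchored phrases).

    Hypothetical helper for illustration only.
    """
    anchors = ANCHOR_TAG.findall(text)
    plain = ANCHOR_TAG.sub(r"\1", text)
    return plain, anchors

tagged = "What's the weather <anchor>today</anchor>? Think <anchor>step by step</anchor>."
plain, anchors = extract_inline_anchors(tagged)
print(plain)
print(anchors)
```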

⚙️ Key Parameters

SPA-Specific Parameters

strength (default: 1.4): Controls the influence of anchored text. The pipeline API takes this value as anchoring_strength.
- 1.0: No effect (normal generation)
- 0.0: Completely ignore anchored text
- >1.0: Emphasize anchored text (higher values = stronger emphasis)
- <0.0: Avoid using anchored text (negative values = stronger avoidance)
modulated_by_prob (default: True): When True, the anchoring strength is modulated by token probability.
- Enable for more stable results, especially with higher anchoring strengths
- Disable for more precise control at lower strengths
use_attention_mask (default: True): When True, uses attention masking for anchor tokens, enhancing the effect of anchoring.
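The two masking strategies behind use_attention_mask can be pictured with a toy, word-level sketch. It assumes that attention masking hides the anchor positions from the model, while the alternative replaces the anchor tokens with a mask token. The helper names and the `<mask>` placeholder are hypothetical; the real package operates on tokenizer IDs and may differ in detail.

```python
def find_spans(tokens, anchor):
    """Return the start indices at which the token list `anchor` occurs in `tokens`."""
    n = len(anchor)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == anchor]

def attention_mask_for(tokens, anchor):
    """Strategy A (assumed): keep the tokens, but set attention to 0 at anchor positions."""
    mask = [1] * len(tokens)
    for start in find_spans(tokens, anchor):
        for i in range(start, start + len(anchor)):
            mask[i] = 0
    return mask

def replace_with_mask_token(tokens, anchor, mask_token="<mask>"):
    """Strategy B (assumed): overwrite anchor tokens with a special mask token."""
    out = list(tokens)
    for start in find_spans(tokens, anchor):
        for i in range(start, start + len(anchor)):
            out[i] = mask_token
    return out

tokens = ["How", "is", "the", "weather", "today", "?"]
anchor = ["today"]

attn = attention_mask_for(tokens, anchor)
replaced = replace_with_mask_token(tokens, anchor)
print(attn)
print(replaced)
```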

Standard Generation Parameters

SPA supports all standard Hugging Face generation parameters, such as:

max_new_tokens: Maximum number of tokens to generate
do_sample: Whether to use sampling for generation
temperature: Controls randomness (higher = more random)
top_p: Top-p sampling parameter (nucleus sampling)
top_k: Top-k sampling parameter
min_new_tokens: Minimum number of tokens to generate

For more parameters, see the official Hugging Face Transformers generation documentation.

🧩 Practical Hyperparameter Settings

strength (Anchoring Strength):
- To increase the model's attention to the anchored text:
  - If modulated_by_prob = True, you can use a relatively high anchoring strength (e.g., 20).
  - If modulated_by_prob = False, we recommend a value less than 2.
  - If you want an optimal value, you can tune it through grid search on your benchmark. Our experiments show that performance follows a simple pattern as this value increases (it first improves, then declines), so a few dozen examples are enough to tune it.
- To reduce (0 < anchoring_strength < 1) or reverse (anchoring_strength < 0) the influence of anchored text, set the value based on your specific needs.
modulated_by_prob (weight influence by token probabilities): We recommend setting modulated_by_prob=True for stable results. Set it to False if you need precise control or have other development needs.
use_attention_mask (whether to use an attention mask or only special-token masking): Keep the default of True for more reliable performance. If you detect a performance issue, set it to False; SPA then falls back to a backup masking strategy based on special tokens.
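The grid search suggested for strength can be sketched as follows. Here `toy_score` is a made-up stand-in that only mimics the rise-then-fall pattern described above; in practice you would replace it with a function that runs SPA on your benchmark at the given strength and returns a metric such as a pass rate. All names in this sketch are hypothetical.

```python
from typing import Callable, Iterable, Tuple

def grid_search_strength(
    candidates: Iterable[float],
    score_fn: Callable[[float], float],
) -> Tuple[float, float]:
    """Evaluate each candidate strength and return (best_strength, best_score)."""
    best_strength, best_score = None, float("-inf")
    for strength in candidates:
        score = score_fn(strength)
        if score > best_score:
            best_strength, best_score = strength, score
    if best_strength is None:
        raise ValueError("candidates must not be empty")
    return best_strength, best_score

def toy_score(strength: float) -> float:
    """Stand-in for a benchmark metric: rises, peaks at 1.5, then declines.

    ASSUMPTION: purely illustrative; real scores come from your own evaluation.
    """
    return 1.0 - (strength - 1.5) ** 2

candidates = [1.0, 1.25, 1.5, 1.75, 2.0]
best_strength, best_score = grid_search_strength(candidates, toy_score)
print(best_strength, best_score)
```

Because the pattern has a single peak, a coarse grid followed by a finer grid around the best candidate is usually a reasonable way to keep the number of evaluations small.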

Model Compatibility

SPA is a model-agnostic algorithm. Our implementation builds on the Hugging Face Transformers generation API, so it should work with any LLM in the Hugging Face model collection. Please follow the corresponding model documentation for detailed instructions.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use SPA in your research, please cite:

@misc{selective-prompt-anchoring,
  author = {Yuan Tian and Tianyi Zhang},
  title = {Selective Prompt Anchoring for Code Generation},
  year = {2025},
  conference = {ICML'25},
}

Release history

- 0.1.0 (this version): May 13, 2025

Download files

- Source Distribution: anchoring-0.1.0.tar.gz (19.6 kB), uploaded May 13, 2025
- Built Distribution: anchoring-0.1.0-py3-none-any.whl (15.5 kB), uploaded May 13, 2025, Python 3

Both files were uploaded via twine/6.1.0 CPython/3.10.16, without Trusted Publishing.

Hashes for anchoring-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5fb80e0d2f38cbdbfd262a01c494a7c5360219d91becc32f88114283e26b4ffa`
MD5	`1d4081e144f09d1862a6262ef8d13003`
BLAKE2b-256	`2f1081ebaf351eafba251372aa8e5ab4baab4def30fd8ede59adee3bcdcb7c17`

Hashes for anchoring-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`47805716dc066585b0d1e54243ff5da8d9cf87211a951a8d0ca81c2eee57175b`
MD5	`0bae275fdbee37716485aa3652665912`
BLAKE2b-256	`cb578e7537f977dc48dd73e184ed8825925983fa7e7d475a5ffe3143d1e7e840`