A tool for manipulating the internal neural activations of language models

Neural State Manipulator

A Python library for recording and manipulating the internal neural activations of language models to control their generation behavior.

It offers an alternative approach to representation engineering: you no longer need to fine-tune or explicitly prompt a Large Language Model to control its generation behavior.

Description

Neural State Manipulator allows researchers and developers to:

Identify and access internal layers in transformer-based language models
Record activation patterns associated with specific behaviors or styles
Apply these patterns during generation to influence model outputs
Identify and manipulate concept-specific neurons

This package is useful for research in interpretability, controlled text generation, and model behavior steering.
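The record-then-apply idea can be illustrated with a minimal, dependency-free sketch. `ToyLayer`, `register_hook`, and `make_steering_hook` are hypothetical names and are not part of this package. The design loosely mirrors PyTorch forward hooks, where a hook that returns a value replaces the module's output. How Neural State Manipulator actually attaches to a model is an assumption here.

```python
from typing import Callable, List

Vector = List[float]
Hook = Callable[[Vector], Vector]


class ToyLayer:
    """Hypothetical stand-in for one transformer sub-module."""

    def __init__(self, weight: float) -> None:
        self.weight = weight
        self.hooks: List[Hook] = []

    def register_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def __call__(self, x: Vector) -> Vector:
        out = [self.weight * v for v in x]  # the layer's normal computation
        for hook in self.hooks:             # each hook may rewrite the output
            out = hook(out)
        return out


def make_steering_hook(pattern: Vector, strength: float) -> Hook:
    """Return a hook that adds strength * pattern to the layer output."""
    def hook(hidden: Vector) -> Vector:
        return [h + strength * p for h, p in zip(hidden, pattern)]
    return hook


layer = ToyLayer(weight=2.0)
# 1. Record: capture the activation produced by an example input.
recorded_pattern = layer([1.0, 0.0, -1.0])
# 2. Apply: attach a hook that injects the recorded pattern.
layer.register_hook(make_steering_hook(recorded_pattern, 0.5))
# 3. Generate: later outputs are shifted toward the recorded pattern.
steered = layer([0.0, 1.0, 0.0])
print(steered)  # [1.0, 2.0, -1.0]
```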

Use Cases

Behavioral Steering

Trait Modulation: Adjust abstract traits like honesty, refusal tendencies, or creativity by applying activation patterns derived from contrasting examples.
Role-Playing: Induce specific personas (e.g., a helpful assistant vs. a sarcastic character) by steering hidden states toward predefined behavioral patterns.
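One common way to derive an activation pattern from contrasting examples is the difference of means: average the activations recorded for each side and subtract. The sketch below assumes that technique; whether this package computes its patterns exactly this way is not stated in the documentation, and the activation values are made up for illustration.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def contrastive_direction(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of means: points from the 'negative' trait toward the 'positive' one."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


# Hypothetical activations recorded at one layer for contrasting prompts
honest = [[1.0, 0.5, 0.0], [3.0, 0.5, 2.0]]
dishonest = [[0.0, 0.5, 1.0], [2.0, 0.5, 1.0]]
direction = contrastive_direction(honest, dishonest)
print(direction)  # [1.0, 0.0, 0.0]
```

Dimensions that respond equally to both sets cancel out, so the resulting direction isolates what differs between the two behaviors.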

Factual and Contextual Adjustment

Controlled Fact Editing: Alter memorized facts by manipulating internal neural activations. This changes the model's internal representations at inference time, without updating its weights, which in turn affects downstream responses.
Contextual Consistency: Encourage responses to stay aligned with specific themes (e.g., happiness or urgency) by injecting activation patterns at inference time.

Safety and Alignment

Refusal Behavior Control: Increase or decrease the likelihood of a model refusing requests, useful for content moderation or ethical guardrails.
Trait Validation: Test whether a model distinguishes between genuine and feigned traits (e.g., "honesty" vs. "feigned honesty") by comparing activation patterns, enhancing transparency.
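A simple way to compare two activation patterns is cosine similarity: values near 1.0 mean the patterns point the same way, values near 0.0 mean they are unrelated, and negative values mean they oppose each other. This is a sketch under the assumption that patterns are plain vectors; the package's own comparison method is not documented, and the "honesty" vectors below are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two activation patterns (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical patterns for "honesty" and "feigned honesty" at one layer
honesty = [1.0, 0.0, 1.0]
feigned_honesty = [1.0, 0.0, -1.0]
print(cosine_similarity(honesty, feigned_honesty))  # 0.0
```

A score of 0.0 here would suggest the model represents the two traits along unrelated directions at that layer.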

Efficiency and Flexibility

Low-Cost Adaptation: Avoid resource-intensive fine-tuning by applying lightweight neural state manipulation during inference. This is particularly useful for edge deployments or scenarios requiring rapid iteration.
Coefficient Tuning: Adjust the strength of a trait (e.g., dialing humor from subtle to exaggerated) via scalar coefficients attached to activation patterns.
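The effect of a scalar coefficient can be seen in a toy calculation. Assuming steering adds `coefficient * pattern` to a hidden state (an assumption, not a documented detail of this package), the hidden state's projection onto the pattern's direction grows by exactly `coefficient * |pattern|`. `steer` and `trait_score` are hypothetical helpers.

```python
import math
from typing import List

Vector = List[float]


def steer(hidden: Vector, pattern: Vector, coefficient: float) -> Vector:
    """Add coefficient * pattern to a hidden state."""
    return [h + coefficient * p for h, p in zip(hidden, pattern)]


def trait_score(hidden: Vector, pattern: Vector) -> float:
    """Projection of the hidden state onto the unit-length pattern direction."""
    norm = math.sqrt(sum(p * p for p in pattern))
    return sum(h * p for h, p in zip(hidden, pattern)) / norm


pattern = [3.0, 4.0]  # |pattern| = 5
hidden = [0.0, 0.0]
# From subtle to exaggerated: scores are 0.0, 2.5, 5.0 and 10.0
for coefficient in (0.0, 0.5, 1.0, 2.0):
    print(coefficient, trait_score(steer(hidden, pattern, coefficient), pattern))
```

Because the relationship is linear, doubling the coefficient doubles the shift; in a real model, very large coefficients tend to push hidden states outside their usual range, which is why small values such as 0.5 are a reasonable starting point.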

Installation

pip install neural-state-manipulator

Requirements

Python 3.8+
PyTorch 2.0.0+
Transformers 4.30.0+
NumPy 1.20.0+
BitsAndBytes 0.40.0+

Quick Start

Here's a simple example to get you started:

from neural_state_manipulator import NeuralStateManipulator, list_manipulable_layers

# Optional: List manipulable layers in a model
layers = list_manipulable_layers("meta-llama/Llama-3.2-1B-Instruct")

# Initialize the manipulator with your model
manipulator = NeuralStateManipulator("meta-llama/Llama-3.2-1B-Instruct")

# Select specific layers for manipulation
# Prefer the final layers. Layer indices are zero-based, so the last three layers
# of a 30-layer model are 27, 28 and 29. Llama-3.2-1B-Instruct has 16 layers
# (0-15), so this example targets the last one, layer 15.
target_layers = ['model.layers.15.post_attention_layernorm', 'model.layers.15.mlp.gate_proj']

# Record behavior patterns
formal_text = """
The implementation of artificial intelligence in industrial settings has been
demonstrated to enhance operational efficiency by an average of 27%, according to
recent comprehensive analyses. The methodology employed in these studies involved
rigorous examination of productivity metrics across diverse manufacturing environments.
"""
manipulator.record_behavior_pattern(formal_text, "formal_writing", target_layers)

# Generate text with the formal pattern applied
prompt = "Write about climate change:"
output = manipulator.generate_with_manipulation(
    prompt, "formal_writing", influence_strength=0.5, 
    max_new_tokens=300, temperature=0.7
)
print(output)
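Because layer indices are zero-based, it is easy to be off by one when typing layer names by hand. The hypothetical helper below builds names for the last `k` layers using the `model.layers.{i}.<submodule>` pattern from the example above; other architectures may name their modules differently. If you load the model with Hugging Face Transformers, `model.config.num_hidden_layers` reports the layer count.

```python
from typing import List


def last_layer_names(num_layers: int, k: int, submodules: List[str]) -> List[str]:
    """Build module names for the last k decoder layers (indices are zero-based).

    Hypothetical helper; the 'model.layers.{i}.<submodule>' pattern follows the
    Llama example above and may not match other architectures.
    """
    if not 0 < k <= num_layers:
        raise ValueError("k must be between 1 and num_layers")
    return [
        f"model.layers.{i}.{sub}"
        for i in range(num_layers - k, num_layers)
        for sub in submodules
    ]


# Llama-3.2-1B-Instruct has 16 decoder layers, so the last one is index 15
print(last_layer_names(16, 1, ["post_attention_layernorm", "mlp.gate_proj"]))
# A 30-layer model's last three layers are 27, 28 and 29
print(last_layer_names(30, 3, ["post_attention_layernorm"]))
```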

Advanced Usage

Recording Different Behavior Patterns

# Record a creative writing style
creative_text = """
Sunlight danced through the leaves, painting the forest floor with shimmering gold.
The ancient trees whispered secrets to one another, their roots intertwined beneath
the soil like old friends holding hands. A fox, clever and quick, darted between shadows.
"""
manipulator.record_behavior_pattern(creative_text, "creative_writing", target_layers)

# Generate with the creative pattern
creative_output = manipulator.generate_with_manipulation(
    prompt, "creative_writing", influence_strength=0.5,
    max_new_tokens=300, temperature=0.7
)
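A recorded text produces one activation vector per token at each target layer, so recording a behavior pattern needs some way to reduce those rows to a single pattern. The sketch assumes mean pooling and a store keyed by (behavior name, layer name), loosely mirroring `record_behavior_pattern(text, name, target_layers)`; `pool_tokens`, `record`, and `patterns` are hypothetical and not the package's internals.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical store: (behavior name, layer name) -> pooled pattern
patterns: Dict[Tuple[str, str], Vector] = {}


def pool_tokens(token_activations: List[Vector]) -> Vector:
    """Mean-pool per-token activations (one row per token) into a single pattern."""
    n = len(token_activations)
    return [sum(col) / n for col in zip(*token_activations)]


def record(name: str, per_layer_tokens: Dict[str, List[Vector]]) -> None:
    """Store one pooled pattern per target layer under the behavior name."""
    for layer_name, tokens in per_layer_tokens.items():
        patterns[(name, layer_name)] = pool_tokens(tokens)


# Two made-up tokens with two-dimensional activations at one layer
record("creative_writing", {
    "model.layers.15.post_attention_layernorm": [[1.0, 2.0], [3.0, 0.0]],
})
print(patterns[("creative_writing", "model.layers.15.post_attention_layernorm")])  # [2.0, 1.0]
```

Keying by layer lets several behaviors ("formal_writing", "creative_writing") coexist for the same `target_layers`.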

Identifying and Using Concept Neurons

# Define examples of technical and non-technical text
technical_texts = [
    "The algorithm complexity is O(n log n) in the average case.",
    "Quantum computing utilizes qubits rather than classical bits.",
    "The framework implements dependency injection via constructor parameters."
]
non_technical_texts = [
    "The sunset painted the sky with brilliant colors.",
    "She enjoyed her coffee while reading the morning newspaper.",
    "The dog chased the ball across the grassy field."
]

# Identify neurons associated with technical language
manipulator.capture_concept_neurons(
    technical_texts, non_technical_texts, "technical_concept"
)

# Generate text with technical concept neurons stimulated
technical_output = manipulator.generate_with_concept_neurons(
    "Explain how solar panels work:", "technical_concept", 
    influence_strength=0.5, max_new_tokens=300
)
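Concept neurons can be found by comparing the mean activation of each neuron on concept examples against contrasting examples, then keeping the neurons whose activation rises most. The sketch assumes that selection rule and a fixed top-`k` cutoff; `capture_concept_neurons` may use a different criterion, and `concept_neurons`, `stimulate`, and the activation values are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def concept_neurons(concept: List[Vector], contrast: List[Vector], k: int) -> List[int]:
    """Indices of the k neurons whose mean activation rises most for the concept."""
    diff = [a - b for a, b in zip(mean_vector(concept), mean_vector(contrast))]
    return sorted(range(len(diff)), key=lambda i: diff[i], reverse=True)[:k]


def stimulate(hidden: Vector, neurons: List[int], strength: float) -> Vector:
    """Boost only the selected neurons, leaving the rest of the hidden state unchanged."""
    return [h + strength if i in neurons else h for i, h in enumerate(hidden)]


# Made-up activations for two technical and two non-technical sentences
technical = [[4.0, 1.0, 2.0, 0.0], [2.0, 1.0, 2.0, 2.0]]
non_technical = [[0.0, 1.0, 2.0, 1.0], [2.0, 1.0, 2.0, 1.0]]
neurons = concept_neurons(technical, non_technical, k=1)
print(neurons)  # [0]
print(stimulate([0.0, 0.0, 0.0, 0.0], neurons, 0.5))  # [0.5, 0.0, 0.0, 0.0]
```

Unlike a full pattern, this targets a handful of coordinates, which keeps the intervention narrow.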

Other Manipulation Methods

# Generate without manipulation (baseline)
plain_output = manipulator.generate_plain(prompt, max_new_tokens=300)

# Generate with erasure (subtracting a behavior pattern)
erasure_output = manipulator.generate_with_erasure(
    prompt, "formal_writing", erasure_strength=0.5, 
    target_layers=target_layers, max_new_tokens=300
)

# Generate with amplification (enhancing a behavior pattern)
amplified_output = manipulator.generate_with_amplification(
    prompt, "creative_writing", amplification_factor=0.5,
    target_layers=target_layers, max_new_tokens=300
)

# Generate with interpolation (blending baseline and behavior)
interpolated_output = manipulator.generate_with_interpolation(
    prompt, "formal_writing", interpolation_factor=0.5,
    target_layers=target_layers, max_new_tokens=300
)
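The three methods above can be pictured as simple vector operations on a hidden state. The formulas below are assumptions chosen to match the one-line descriptions (subtracting, enhancing, blending); the package may implement them differently, for example by scaling only the component of the hidden state that lies along the pattern. `erase`, `amplify`, and `interpolate` are hypothetical names.

```python
from typing import List

Vector = List[float]


def erase(hidden: Vector, pattern: Vector, strength: float) -> Vector:
    """Subtract a scaled behavior pattern from the hidden state."""
    return [h - strength * p for h, p in zip(hidden, pattern)]


def amplify(hidden: Vector, pattern: Vector, factor: float) -> Vector:
    """Push the hidden state further along the behavior pattern."""
    return [h + factor * p for h, p in zip(hidden, pattern)]


def interpolate(hidden: Vector, pattern: Vector, t: float) -> Vector:
    """Linear blend: t = 0 keeps the baseline, t = 1 replaces it with the pattern."""
    return [(1.0 - t) * h + t * p for h, p in zip(hidden, pattern)]


baseline = [2.0, 0.0]
formal = [0.0, 2.0]
print(erase(baseline, formal, 0.5))        # [2.0, -1.0]
print(amplify(baseline, formal, 0.5))      # [2.0, 1.0]
print(interpolate(baseline, formal, 0.5))  # [1.0, 1.0]
```

In this sketch erasure and amplification are mirror images, while interpolation also shrinks the baseline, so it is the only one of the three that can fully replace the original hidden state.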

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{neural_state_manipulator,
  author = {Your Name},
  title = {Neural State Manipulator: A Tool for Manipulating Internal Neural Activations of Language Models},
  year = {2025},
  url = {https://github.com/yourusername/neural-state-manipulator}
}

Project details

This version: 0.1.0 (Apr 6, 2025)

Download files

Source Distribution

neural_state_manipulator-0.1.0.tar.gz (13.2 kB)

Uploaded Apr 6, 2025 Source

Built Distribution

neural_state_manipulator-0.1.0-py3-none-any.whl (12.2 kB)

Uploaded Apr 6, 2025 Python 3

File details

Details for the file neural_state_manipulator-0.1.0.tar.gz.

File metadata

Download URL: neural_state_manipulator-0.1.0.tar.gz
Upload date: Apr 6, 2025
Size: 13.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for neural_state_manipulator-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9e84491053ecb0eaf04a277e33699117fb63e52720f2853e090d14a4192efe4d`
MD5	`18ef68f40cb15c42078bf5c9e0e24c1d`
BLAKE2b-256	`8a9e0c092d42f6dfeb54c147bd8fc37e097454f948e6c25810608ac539b5745e`

File details

Details for the file neural_state_manipulator-0.1.0-py3-none-any.whl.

File metadata

Download URL: neural_state_manipulator-0.1.0-py3-none-any.whl
Upload date: Apr 6, 2025
Size: 12.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for neural_state_manipulator-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`60b8258829fe606b91dffb131d0a5e952c7aa5cb82e5589b74bce44f12fa5110`
MD5	`49d31e365351adf3cc8f1060d7478525`
BLAKE2b-256	`17aba7c457a68bb5724b5ab3eed731875ba8fada36fd1c80b6fdcdba176298bf`
