
Python activation-steering library for PyTorch and Hugging Face-style language models.

This project has been archived: the maintainers have marked it as archived, and no new releases are expected.

Project description


Python activation steering for LLMs and transformer language models.


pysteer: Python Activation Steering for LLMs

pysteer is a lightweight Python library for activation steering, representation engineering, and inference-time model steering in PyTorch transformer language models. It learns steering artifacts from labeled prompt/response examples, then applies interventions to intermediate activations without fine-tuning or modifying model weights.

The package is designed for researchers and developers working on LLM control, mechanistic interpretability, AI safety experiments, and activation engineering workflows with Hugging Face-style models.

Why Use pysteer

  • Steer LLM behavior at inference time without retraining the model.
  • Compare multiple activation-steering methods behind one Executor API.
  • Build prompt-routed, adaptive, or gradient-derived steering workflows.
  • Extend the steering engine with custom derivation and runtime strategies.
  • Keep activation hooks scoped with a context-managed runtime wrapper.

Features

  • Training-time activation extraction from selected transformer layers.
  • Built-in steering methods: CMD, CPCA, ACTS-CMD, ACTS-CPCA, MBS-CMD, Angular Steering, Adaptive Activation Steering, COLD-Kernel, and COLD-Steer.
  • A registry-based extension layer for adding new derivation/runtime methods without editing Executor.
  • A context-managed runtime wrapper that keeps steering hooks scoped to the calls where they are intended.
  • Sphinx documentation with autodoc, Napoleon docstrings, API reference pages, and a make open target that builds the HTML docs and opens them in your browser.

Use Cases

  • LLM activation steering and behavior control from labeled examples.
  • Representation engineering experiments on residual stream activations.
  • Mechanistic interpretability prototypes that compare steering directions.
  • Inference-time intervention workflows where model weights should stay frozen.
  • Custom activation-engineering methods for PyTorch transformer models.

Installation

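Install from PyPI (note that the release files listed on this page are published under the distribution name pysteer-adaptation; the import name is pysteer):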

python -m pip install pysteer

Install from a local checkout for development:

python -m pip install -e ".[dev,docs]"

Install only the runtime dependencies when working from source without an editable install:

python -m pip install -r REQUIREMENTS.txt

Install documentation dependencies only when building the docs:

python -m pip install -r docs/requirements.txt

Minimal Example

The core entry point is pysteer.Executor. Training data uses prompt, response, and reference columns, where reference identifies the desired or positive response class for contrastive steering methods.

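import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer

from pysteer import Executor

# Illustrative checkpoint (an assumption): any causal LM with more than
# 20 decoder layers should work with layers_to_extract=[12, 16, 20].
model_name = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# reference=1 marks the desired (positive) response, 0 the negative one.
train_df = pd.DataFrame(
    [
        {"prompt": "Question", "response": "Helpful answer", "reference": 1},
        {"prompt": "Question", "response": "Unhelpful answer", "reference": 0},
    ]
)

executor = Executor(
    model=model,
    tokenizer=tokenizer,
    train_df=train_df,
    method="cmd",
    layers_to_extract=[12, 16, 20],
    alpha=0.5,  # assumed to scale the strength of the intervention
)

wrapper = executor.representation_extractor()

inputs = tokenizer("Question", return_tensors="pt")

# Steering hooks are active only inside this with-block.
with wrapper as steered_model:
    output = steered_model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output[0], skip_special_tokens=True))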

The non-routed built-in methods expect only prompt, response, and reference. Routed methods add their own grouping columns, such as task_id (ACTS), mbs_layer (MBS-CMD), or ACT grouping identifiers.

Training rows are validated before hooks are attached: reference must contain only 0 and 1, and every contrastive training scope needs at least one positive and one negative row. For standard methods the scope is the full dataframe; for ACTS it is each group of rows sharing an integer-like task_id; for MBS-CMD, each selected mbs_layer; and for ACT, each normalized ACT group.
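The validation rule above can be expressed as a small pandas check, which is handy for catching bad training data before building an Executor. This is a minimal sketch under stated assumptions: check_contrastive_scopes is a hypothetical helper, not part of pysteer's API, and it treats each distinct value of the grouping column (for example task_id or mbs_layer) as one scope. It does not reproduce ACTS's integer-like coercion or ACT's group normalization.

```python
from typing import Optional

import pandas as pd


def check_contrastive_scopes(df: pd.DataFrame, group_col: Optional[str] = None) -> None:
    """Hypothetical helper mirroring the documented rule; not pysteer's own code.

    Raises ValueError if `reference` holds anything other than 0/1, or if any
    scope lacks a positive (1) or a negative (0) row.
    """
    if not set(df["reference"].unique()) <= {0, 1}:
        raise ValueError("reference must contain only 0 and 1")

    # One scope for standard methods; one scope per group for routed methods.
    if group_col is None:
        scopes = [("<all rows>", df)]
    else:
        scopes = list(df.groupby(group_col))

    for key, scope in scopes:
        if set(scope["reference"]) != {0, 1}:
            raise ValueError(
                f"scope {key!r} needs at least one positive and one negative row"
            )


rows = pd.DataFrame(
    {
        "prompt": ["q1", "q1", "q2"],
        "response": ["good", "bad", "good"],
        "reference": [1, 0, 1],
        "task_id": [0, 0, 1],
    }
)
check_contrastive_scopes(rows)  # passes: the frame as a whole has both labels
```

Calling check_contrastive_scopes(rows, "task_id") on the same frame raises, because task 1 only has a positive row. That is the per-task situation the ACTS validation is described as rejecting.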

Supported Steering Methods

pysteer ships with a default registry of activation-steering methods:

  • cmd: Contrastive Mean Difference steering vectors.
  • cpca: Contrastive PCA steering directions.
  • acts_cmd: ACTS prompt-routed CMD steering.
  • acts_cpca: ACTS prompt-routed CPCA steering.
  • mbs_cmd: layer-balanced CMD steering.
  • angular: Angular Steering with plane rotations.
  • act: Adaptive Activation Steering with prompt clustering and probes.
  • cold_kernel: gradient-derived COLD-Kernel steering directions.
  • cold_steer: inference-efficient COLD-Steer alias.
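For intuition, here is the textbook form of two ideas from this list: a contrastive mean difference (CMD) direction applied additively, and a rotation inside a 2D plane of the kind Angular Steering is built on. This is a generic NumPy sketch, not pysteer's implementation. The function names, the unit normalization of the CMD vector, and the exact rotation rule are assumptions, and pysteer's cmd and angular methods may differ (for example in per-layer handling or in how alpha is applied).

```python
import numpy as np


def cmd_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Contrastive mean difference: mean(positive) - mean(negative).

    pos_acts, neg_acts: shape (n_examples, hidden_dim), taken from one layer.
    Unit-normalizing the result is an assumption of this sketch.
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def add_steering(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Additive intervention: shift hidden state(s) by alpha * direction."""
    return hidden + alpha * direction


def rotate_in_plane(hidden: np.ndarray, b1: np.ndarray, b2: np.ndarray, theta: float) -> np.ndarray:
    """Rotate the part of a single hidden vector lying in span(b1, b2) by theta.

    b1 and b2 must be orthonormal. The component orthogonal to the plane is left
    unchanged, so the vector's norm is preserved.
    """
    x, y = hidden @ b1, hidden @ b2        # coordinates inside the plane
    rest = hidden - x * b1 - y * b2        # component outside the plane
    c, s = np.cos(theta), np.sin(theta)
    return rest + (c * x - s * y) * b1 + (s * x + c * y) * b2


rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.1, size=(8, 4))   # toy "positive" activations
neg = rng.normal(0.0, 0.1, size=(8, 4))   # toy "negative" activations
v = cmd_direction(pos, neg)

h = np.array([1.0, 0.0, 0.0, 0.0])
steered = add_steering(h, v, alpha=0.5)
e1, e2 = np.eye(4)[:2]
rotated = rotate_in_plane(h, e1, e2, np.pi / 2)  # approximately [0, 1, 0, 0]
```

The additive form moves every activation by the same offset, while the rotation keeps the activation's norm fixed and only changes its angle within the chosen plane. That difference is one reason a library might offer both.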

Architecture

The library separates steering into four concerns:

  • Derivation: how an artifact is learned from activations.
  • Artifact: the vector, plane, routing table, probe, or richer object produced.
  • Site: where the artifact reads or writes model state.
  • Runtime policy: when and how the intervention is applied.
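To make the four concerns concrete, here is a toy PyTorch sketch in which the artifact is a fixed vector, the site is the output of one module, and the runtime policy adds the vector only while a with-block is active. It relies only on the standard torch.nn.Module.register_forward_hook. The additive_steering helper and the toy model are hypothetical and do not reflect pysteer's internals. Real transformer blocks often return tuples, so the hook steers only the first element in that case.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def additive_steering(module: nn.Module, vector: torch.Tensor, alpha: float):
    """Add alpha * vector to `module`'s output while the context is active (hypothetical helper)."""

    def hook(_module, _inputs, output):
        # Returning a value from a forward hook replaces the module's output.
        if isinstance(output, tuple):
            return (output[0] + alpha * vector,) + output[1:]
        return output + alpha * vector

    handle = module.register_forward_hook(hook)
    try:
        yield module
    finally:
        handle.remove()  # the hook never outlives the with-block


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
x = torch.randn(2, 4)
direction = torch.ones(4)  # stand-in for a learned steering artifact

with torch.no_grad():
    baseline = model(x)
    with additive_steering(model[2], direction, alpha=0.5):  # site: last Linear layer
        steered = model(x)
    after = model(x)  # hook removed, so this matches the baseline again
```

The try/finally is the design choice that matters. Even if generation raises inside the block, the hook is removed, which is the scoping guarantee the runtime wrapper is meant to provide.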

The steering_engine package contains the extension API:

  • domain.py defines declarative data structures such as ActivationSite, InterventionSpec, SteeringArtifact, and SteeringMethodSpec.
  • components.py defines protocols for readers, derivers, runtime strategies, schedules, controllers, and compilers.
  • registry.py provides SteeringMethodRegistry and MethodDefinition.
  • defaults.py registers the built-in methods.

See docs/activation_steering_architecture.md for the design rationale and taxonomy.

Extending Methods

Register a new method with a vector factory and a runtime strategy builder:

from steering_engine import MethodDefinition, SteeringMethodRegistry
from steering_engine.domain import DerivationFamily, InterventionKind
from steering_engine.domain import RuntimeFamily, SteeringMethodSpec

registry = SteeringMethodRegistry()
registry.register(
    MethodDefinition(
        spec=SteeringMethodSpec(
            method_id="my_method",
            label="My Method",
            derivation_family=DerivationFamily.CUSTOM,
            runtime_family=RuntimeFamily.STATIC,
            intervention_kind=InterventionKind.ADD,
        ),
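        # MyVectorDeriver and MyRuntimeStrategy are placeholders for your own
        # deriver and runtime-strategy classes.
        vector_factory=lambda ctx: MyVectorDeriver(...),
        strategy_builder=lambda deriver, ctx: MyRuntimeStrategy(...),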
    )
)
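The registry is what lets new methods plug in without editing Executor: the caller looks a method up by its id and calls its factories, instead of branching on method names. The toy version below illustrates only that pattern. ToyMethod and ToyRegistry are hypothetical names, not steering_engine's classes. The deriver-then-strategy ordering mirrors the two lambdas above, but everything else is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToyMethod:
    """Hypothetical stand-in for a method definition: an id plus two factories."""
    method_id: str
    make_deriver: Callable[[dict], object]
    make_strategy: Callable[[object, dict], object]


class ToyRegistry:
    """Minimal registry: methods are looked up by id, so callers never branch on names."""

    def __init__(self) -> None:
        self._methods: Dict[str, ToyMethod] = {}

    def register(self, method: ToyMethod) -> None:
        if method.method_id in self._methods:
            raise ValueError(f"method {method.method_id!r} is already registered")
        self._methods[method.method_id] = method

    def build(self, method_id: str, ctx: dict) -> object:
        """Create the deriver first, then the runtime strategy that consumes it."""
        try:
            method = self._methods[method_id]
        except KeyError:
            raise KeyError(f"unknown steering method {method_id!r}") from None
        deriver = method.make_deriver(ctx)
        return method.make_strategy(deriver, ctx)


registry = ToyRegistry()
registry.register(
    ToyMethod(
        method_id="my_method",
        make_deriver=lambda ctx: {"alpha": ctx["alpha"]},
        make_strategy=lambda deriver, ctx: ("add", deriver["alpha"]),
    )
)
strategy = registry.build("my_method", {"alpha": 0.5})
```

Rejecting duplicate ids at registration time keeps a typo from silently replacing a built-in method.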

Documentation

Build the Sphinx HTML documentation:

make -C docs html

Build and open it in your default browser:

make -C docs open

On Windows without make:

docs\make.bat html
docs\make.bat open

The generated site is written to docs/_build/html/index.html.

Contributing

See CONTRIBUTING.md for development setup, local checks, and the preferred extension path for new steering methods. Security reports should follow SECURITY.md.

Evaluation Data

pysteer focuses on the generic steering engine and expects callers to provide their own training dataframes for application-specific evaluations.
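One convenient way to maintain such data is to keep contrastive pairs as (prompt, preferred response, dispreferred response) triples and expand them into the prompt/response/reference layout used in the minimal example above, with reference 1 for the preferred response and 0 for the other. pairs_to_train_df is a hypothetical helper, not part of pysteer:

```python
from typing import Iterable, Tuple

import pandas as pd


def pairs_to_train_df(pairs: Iterable[Tuple[str, str, str]]) -> pd.DataFrame:
    """Expand (prompt, positive, negative) triples into prompt/response/reference rows."""
    rows = []
    for prompt, positive, negative in pairs:
        rows.append({"prompt": prompt, "response": positive, "reference": 1})
        rows.append({"prompt": prompt, "response": negative, "reference": 0})
    return pd.DataFrame(rows, columns=["prompt", "response", "reference"])


train_df = pairs_to_train_df(
    [
        ("Question A", "Helpful answer", "Unhelpful answer"),
        ("Question B", "Helpful answer", "Unhelpful answer"),
    ]
)
```

Because every prompt contributes one positive and one negative row, the resulting frame satisfies the "at least one positive and one negative row" requirement for the full-dataframe scope by construction.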

License

This project is licensed under the Mozilla Public License 2.0. See LICENSE.txt.


Download files


Source Distribution

pysteer_adaptation-0.1.1.tar.gz (372.7 kB)

Uploaded Source

Built Distribution


pysteer_adaptation-0.1.1-py3-none-any.whl (100.0 kB)

Uploaded Python 3

File details

Details for the file pysteer_adaptation-0.1.1.tar.gz.

File metadata

  • Download URL: pysteer_adaptation-0.1.1.tar.gz
  • Size: 372.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pysteer_adaptation-0.1.1.tar.gz
SHA256: a257611a6850189da6d8a12e3076eda1a09bb0dd139b3cfd294319033fedc91d
MD5: 595b8ecd4bd7f449ece9558fc5a72415
BLAKE2b-256: 39b86cdcb70d6a7993d183ab169fb83ea6962b55e75c088f0c98d2f95657f08d


Provenance

The following attestation bundles were made for pysteer_adaptation-0.1.1.tar.gz:

Publisher: publish.yml on mattiapiazzalunga/pysteer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pysteer_adaptation-0.1.1-py3-none-any.whl.

File hashes

Hashes for pysteer_adaptation-0.1.1-py3-none-any.whl
SHA256: 47d707bea66b1ef78088f9916644d449355670752710e498e1ef37aa47790f7a
MD5: bcd6bde438cd225104af77c0ab7d056e
BLAKE2b-256: 8d25d1b8906c6465c5b4f2ea24e41eb913c5c816863b0539eca4e3f25fe1e9ec


Provenance

The following attestation bundles were made for pysteer_adaptation-0.1.1-py3-none-any.whl:

Publisher: publish.yml on mattiapiazzalunga/pysteer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
