Local activation steering + trait monitoring TUI for HuggingFace transformers

saklas

PyPI · License: AGPL v3 · Python 3.11+

Activation steering and trait monitoring for HuggingFace transformer models. Extract steering vectors, apply them during generation with per-call alpha control, and monitor how activations shift across behavioral probes.

Three interfaces: a Python API for scripted experiments and batch sweeps, an OpenAI-compatible API server for drop-in use with any OpenAI SDK client, and a terminal UI for interactive exploration.

Python API

from saklas import SaklasSession, DataSource, ResultCollector

with SaklasSession("google/gemma-2-2b-it", device="cuda") as session:
    # Extract a steering vector
    happy_profile = session.extract("happy")       # uses curated dataset
    session.steer("happy", happy_profile)           # register (no alpha yet)

    # Generate with steering
    result = session.generate(
        "What makes a good day?",
        alphas={"happy": 0.2},
    )
    print(result.text)
    print(result.readings)  # probe monitor data

    # A/B comparison — just omit alphas
    unsteered = session.generate("What makes a good day?")

    # Sweep alphas
    collector = ResultCollector()
    for alpha in [0, 0.05, 0.1, 0.15, 0.2, 0.25]:
        session.clear_history()
        result = session.generate(
            "Describe a sunset.",
            alphas={"happy": alpha},
        )
        collector.add(result, alpha=alpha)
    collector.to_csv("sweep_results.csv")

Key concepts

Vectors are registered without alphas. session.steer(name, profile) stores the vector. session.generate(input, alphas={"name": 0.5}) applies it for that generation only. Alpha directly represents the fraction of mean hidden-state norm (e.g. 0.5 = 50% perturbation at high-signal layers). No persistent hooks live on the model between calls.
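
The alpha semantics can be sketched with numpy (illustrative only, not saklas internals): the registered direction is unit-length, and alpha scales it by the mean hidden-state norm.

```python
import numpy as np

# Illustrative sketch of the alpha semantics, not saklas internals:
# a unit steering vector scaled by alpha times the mean hidden-state norm.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 16))     # (tokens, hidden_dim)
v = rng.normal(size=16)
v /= np.linalg.norm(v)                # unit-length steering direction

alpha = 0.5
mean_norm = np.linalg.norm(hidden, axis=1).mean()
steered = hidden + alpha * mean_norm * v

# The perturbation is exactly alpha (50%) of the mean hidden-state norm.
delta = np.linalg.norm(steered - hidden, axis=1).mean()
print(round(delta / mean_norm, 2))  # 0.5
```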

Orthogonalization is per-call. session.generate(input, alphas={...}, orthogonalize=True) applies Gram-Schmidt to the active vectors for that generation only.
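
What that Gram-Schmidt step does, sketched in numpy (illustrative; the real implementation operates on per-layer profile vectors):

```python
import numpy as np

def gram_schmidt(vectors):
    """Sketch of per-call orthogonalization: each vector loses its
    projection onto the vectors that came before it."""
    basis = []
    for v in vectors:
        u = np.asarray(v, dtype=float).copy()
        for q in basis:
            u -= np.dot(u, q) * q
        norm = np.linalg.norm(u)
        if norm > 1e-12:          # skip (near-)linearly-dependent vectors
            basis.append(u / norm)
    return basis

happy = np.array([1.0, 0.0, 0.0])
formal = np.array([1.0, 1.0, 0.0])   # overlaps with happy
qa, qb = gram_schmidt([happy, formal])
print(round(float(np.dot(qa, qb)), 6))  # 0.0 — no shared direction left
```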

Thinking mode is per-call. For models that support it (Qwen 3.5, QwQ, Gemma 4, etc.), session.generate(input, thinking=True) enables the model's built-in reasoning trace. Thinking delimiters are detected automatically from the chat template — no hardcoded tokens. Thinking tokens are separated from the response — result.text contains only the final answer, while streaming via generate_stream yields TokenEvent objects with thinking=True for the reasoning trace.

Multiple vectors compose naturally:

session.steer("happy", happy_profile)
session.steer("formal", formal_profile)

# Apply both
result = session.generate("Hello.", alphas={"happy": 0.2, "formal": 0.1})

# Apply only one
result = session.generate("Hello.", alphas={"happy": 0.2})

# Apply none
result = session.generate("Hello.")

SaklasSession reference

session = SaklasSession(
    model_id,                        # HuggingFace model ID or local path
    device="auto",                   # "auto", "cuda", "mps", "cpu"
    quantize=None,                   # "4bit", "8bit", or None
    probes=None,                     # list of categories, or None for all
    system_prompt=None,              # default system prompt
    max_tokens=1024,                 # max tokens per generation
    cache_dir=None,                  # vector cache directory
)

# Vector extraction
profile = session.extract("happy")                  # curated dataset
profile = session.extract("empathy", baseline="apathy")  # contrastive
profile = session.extract([("pos", "neg"), ...])     # raw pairs
profile = session.extract(DataSource.csv("pairs.csv"))
profile = session.load_profile("saved.safetensors")
session.save_profile(profile, "output.safetensors")

# Model-generated contrastive pairs
pairs = session.generate_pairs("curiosity")  # list[(str, str)]

# Vector registry
session.steer("name", profile)     # register
session.unsteer("name")            # remove
session.vectors                    # dict of registered profiles

# Generation
result = session.generate("prompt", alphas={"name": 0.5}, orthogonalize=False)
result = session.generate("prompt", thinking=True)  # enable reasoning trace
for token in session.generate_stream("prompt", alphas={"name": 0.5}):
    if token.thinking:
        print(f"[think] {token.text}", end="", flush=True)
    else:
        print(token.text, end="", flush=True)

# Monitoring
session.monitor("honest")                   # curated probe
session.monitor("custom", custom_profile)    # from profile
session.unmonitor("honest")

# State
session.config.temperature = 0.8   # also: top_p, max_new_tokens, system_prompt
session.history                    # conversation messages
session.last_result                # most recent GenerationResult
session.model_info                 # model metadata
session.stop()                     # interrupt generation
session.rewind()                   # drop last exchange
session.clear_history()            # clear conversation

Structured output

result = session.generate("prompt", alphas={"happy": 0.2})
result.text              # decoded output
result.tokens            # token IDs
result.token_count       # number of tokens
result.tok_per_sec       # generation speed
result.elapsed           # seconds
result.vectors           # {"happy": 0.2} — snapshot of alphas used
result.readings          # {"probe_name": ProbeReadings} if probes active
result.to_dict()         # plain Python types, JSON-serializable

DataSource formats

from saklas import DataSource

ds = DataSource.curated("happy")                                   # bundled
ds = DataSource.json("pairs.json")                                 # saklas schema
ds = DataSource.csv("pairs.csv", positive_col="pos", negative_col="neg")
ds = DataSource.huggingface("user/dataset", split="train[:100]")   # requires datasets
ds = DataSource(pairs=[("positive text", "negative text")])
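
For reference, a minimal pairs.csv that the DataSource.csv call above could consume (hypothetical contents; only the column names are taken from the example):

```python
import csv, io

# Hypothetical pairs.csv; column names match the DataSource.csv call above.
pairs_csv = """pos,neg
I feel wonderful today.,I feel miserable today.
What a delightful surprise!,What a dreadful disappointment.
"""

rows = list(csv.DictReader(io.StringIO(pairs_csv)))
pairs = [(r["pos"], r["neg"]) for r in rows]
print(len(pairs))  # 2
```

The same pairs could equally be passed in memory as DataSource(pairs=pairs).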

ResultCollector

collector = ResultCollector()
collector.add(result, concept="happy", alpha=0.2, run_id=1)

collector.to_dicts()               # list of flat dicts
collector.to_jsonl("results.jsonl")
collector.to_csv("results.csv")
collector.to_dataframe()           # requires pandas

Probe readings flatten to columns: probe_honest_mean, probe_honest_std, etc. Vector alphas flatten to vector_happy_alpha.
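
A sketch of consuming the flattened output downstream (hypothetical values; the column names follow the flattening scheme above):

```python
import csv, io

# Hypothetical sweep output; columns follow the flattening scheme above.
sweep_csv = """alpha,probe_honest_mean,vector_happy_alpha
0.0,0.12,0.0
0.1,0.18,0.1
0.2,0.31,0.2
"""

# Map each swept alpha to its mean honesty-probe reading.
readings = {float(r["alpha"]): float(r["probe_honest_mean"])
            for r in csv.DictReader(io.StringIO(sweep_csv))}
print(readings[0.2])  # 0.31
```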

API Server

Serve a steered model as an OpenAI-compatible HTTP endpoint. Works with the OpenAI Python/JS SDK, LangChain, curl, or anything that speaks the OpenAI API.

pip install -e ".[serve]"
saklas serve google/gemma-2-9b-it --steer cheerful:0.2 --port 8000

Usage with OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Uses server-default steering (cheerful=0.2 from --steer flag)
resp = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

# Override steering per-request
resp = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"steer": {"alphas": {"cheerful": 0.4}, "orthogonalize": True, "thinking": True}},
)

# Streaming
for chunk in client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
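
The same per-request override with plain curl (assumes the server started above is running on port 8000; the steer field is the top-level form of the SDK's extra_body):

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-2-9b-it",
    "messages": [{"role": "user", "content": "Hello!"}],
    "steer": {"alphas": {"cheerful": 0.4}}
  }'
```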

Serve CLI options

| Flag | Default | Description |
|------|---------|-------------|
| model | required | HuggingFace model ID or local path |
| --host | 0.0.0.0 | Bind address |
| --port | 8000 | Bind port |
| -q, --quantize | None | 4bit or 8bit |
| -d, --device | auto | auto, cuda, mps, cpu |
| -p, --probes | all | Probe categories to bootstrap |
| -s, --system-prompt | None | Default system prompt |
| -m, --max-tokens | 1024 | Max tokens per generation |
| --steer | None | Pre-load vector, repeatable; name:alpha or name |
| --cors | None | CORS origin, repeatable |
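
For illustration, several of these flags combined in one invocation (model ID and CORS origin are placeholders):

```shell
saklas serve google/gemma-2-9b-it \
  --port 8000 \
  -q 4bit \
  --steer cheerful:0.2 --steer formal \
  --cors http://localhost:3000
```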

Endpoints

OpenAI-compatible:

  • GET /v1/models — list loaded model
  • GET /v1/models/{model_id} — model details
  • POST /v1/chat/completions — chat (streaming + non-streaming)
  • POST /v1/completions — text completion (streaming + non-streaming)

Vector management:

  • GET /v1/saklas/vectors — list registered vectors
  • POST /v1/saklas/vectors/extract — extract a new vector (streams progress via SSE)
  • POST /v1/saklas/vectors/load — load from .safetensors file
  • DELETE /v1/saklas/vectors/{name} — remove a vector

Probe management:

  • GET /v1/saklas/probes — list active probes + last readings
  • GET /v1/saklas/probes/defaults — available default probes by category
  • POST /v1/saklas/probes/{name} — activate a probe
  • DELETE /v1/saklas/probes/{name} — deactivate a probe

Session management:

  • GET /v1/saklas/session — current config, model info, default alphas
  • PATCH /v1/saklas/session — update temperature, top_p, max_tokens, system_prompt
  • POST /v1/saklas/session/clear — clear conversation history
  • POST /v1/saklas/session/rewind — undo last exchange

Full API docs available at http://localhost:8000/docs when the server is running.

Probe readings are returned as an extra probe_readings field in generation responses — standard clients ignore it, aware clients get inline monitoring data.

Terminal UI

saklas google/gemma-2-9b-it
saklas mistralai/Mistral-7B-Instruct-v0.3 -q 4bit
saklas meta-llama/Llama-3.1-8B-Instruct --probes emotion personality

CLI options

| Flag | Description |
|------|-------------|
| model | HuggingFace model ID or local path |
| -q, --quantize | 4bit or 8bit (CUDA only) |
| -d, --device | auto (default), cuda, mps, or cpu |
| -p, --probes | Categories: all, none, emotion, personality, safety, cultural, gender |
| -s, --system-prompt | System prompt |
| -m, --max-tokens | Max tokens per generation (default: 1024) |
| -c, --cache-dir | Vector cache directory |
| -x, --clear-custom | Clear user-extracted vectors and generated statements, then exit |
| -X, --clear-all | Clear all cached artifacts (curated probes, layer means, everything), then exit |

Layout

+--------------------+----------------------------------+------------------+
|  VECTORS           |                                  |  TRAIT MONITOR   |
|  > happy  +0.10    |          Chat                    |  Emotion         |
|    formal +0.06    |                                  |    happy #### .42|
|                    |                                  |    sad   ##- -.15|
|  CONFIG            |                                  |  Personality     |
|  temp ####- 0.7    |                                  |    honest ### .31|
|  top-p #### 0.9    |                                  |                  |
|                    |  Type a message...               |                  |
+--------------------+----------------------------------+------------------+

Keybindings

| Key | Action |
|-----|--------|
| Tab / Shift+Tab | Cycle panel focus |
| Left / Right | Adjust alpha |
| Up / Down | Navigate vectors / probes |
| Enter | Toggle vector on/off |
| Backspace / Delete | Remove selected vector or probe |
| Ctrl+O | Toggle orthogonalization |
| Ctrl+T | Toggle thinking mode (models that support it) |
| Ctrl+A | A/B compare (steered vs unsteered) |
| Ctrl+R | Regenerate last response (interrupts if generating) |
| Ctrl+S | Cycle trait sort mode |
| [ / ] | Adjust temperature |
| { / } | Adjust top-p |
| Escape | Stop generation |
| Ctrl+Q | Quit |

Chat commands

| Command | Description |
|---------|-------------|
| /steer "concept" [alpha] | Extract and register steering vector |
| /steer "concept" - "baseline" [alpha] | Contrastive steering |
| /probe "concept" | Add monitoring probe |
| /probe "concept" - "baseline" | Contrastive probe |
| /clear | Clear history and reset probes |
| /rewind | Undo last exchange |
| /sys <prompt> | Set system prompt |
| /temp <value> | Set temperature |
| /top-p <value> | Set top-p |
| /max <value> | Set max tokens |

All commands that touch the model (/steer, /probe) or modify history (/clear, /rewind) interrupt any in-progress generation and execute once it stops. Sending a new message mid-generation also stops the current response and submits immediately after.

Concepts matching built-in probe names use curated datasets automatically. Otherwise, pairs are generated by the loaded model and cached under saklas/datasets/cache/ — subsequent extractions of the same concept (even with a different model) reuse the cached statements.

Probe library

28 probes across 5 categories, each backed by ~60 curated contrastive pairs:

| Category | Probes |
|----------|--------|
| Emotion | happy, angry, fearful, surprised, disgusted, excited, sad, calm |
| Personality | honest, creative, formal, verbose, authoritative, confident, uncertain |
| Safety | sycophantic, refusal, deceptive, hallucinating |
| Cultural | western, hierarchical, direct, contextual, religious, traditional |
| Gender | masculine, agentic, paternal |

Probes are extracted on first run and cached per model under saklas/probes/cache/.

Install

pip install saklas             # base
pip install saklas[serve]      # + fastapi + uvicorn (for API server)
pip install saklas[research]   # + datasets + pandas (for research workflows)

Requires Python 3.11+, PyTorch 2.2+. Works on Linux, macOS, and Windows.

From source

pip install -e .                   # base install
pip install -e ".[dev]"            # + pytest
pip install -e ".[serve]"          # + fastapi + uvicorn
pip install -e ".[research]"       # + datasets + pandas

Quantization and flash-attn (experimental)

The cuda and bnb extras install bitsandbytes and/or flash-attn for 4-bit/8-bit quantization and fused attention. These depend on platform-specific CUDA toolchains and may not build cleanly on all systems. Support is only guaranteed for the vanilla (unquantized) install.

pip install saklas[bnb]       # bitsandbytes only
pip install saklas[cuda]      # bitsandbytes + flash-attn (Linux only, needs CUDA_HOME)

From source, flash-attn requires build isolation disabled:

pip install torch psutil setuptools wheel && pip install -e ".[cuda]" --no-build-isolation

Supported architectures

53 architectures via model.py:_LAYER_ACCESSORS. Adding a new architecture takes a single accessor-function entry.

Llama (1-4), Mistral (1, 4), Ministral (1, 3), Mixtral, Gemma (1-4), Phi (1-3), PhiMoE, Qwen (1-3.5), Qwen2-MoE, Qwen3-MoE, Qwen3.5-MoE, Cohere (1-2), DeepSeek (V2-V3), StarCoder2, OLMo (1-3), OLMoE, GLM (3-4), Granite, GraniteMoE, Nemotron, StableLM, GPT-2, GPT-Neo, GPT-J, GPT-BigCode, GPT-NeoX, GPT-OSS, Bloom, Falcon, Falcon-H1, MPT, DBRX, OPT, RecurrentGemma.

How it works

Steering vectors

Based on Representation Engineering (Zou et al., 2023). For each contrastive pair, saklas captures attention-weighted hidden states at every layer, computes pos-neg differences, and extracts the first principal component per layer via batched SVD. Each layer is scored by its explained variance ratio. The result is a multi-layer profile with no manual layer selection; the scores weight each layer's contribution during generation.
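
The per-layer extraction can be sketched with numpy on random stand-in activations (illustrative only; shapes and pooling are simplified):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_layers, dim = 32, 4, 16

# Stand-ins for attention-weighted hidden states of each pair.
pos = rng.normal(size=(n_pairs, n_layers, dim))
neg = rng.normal(size=(n_pairs, n_layers, dim))
diffs = pos - neg

directions, scores = [], []
for layer in range(n_layers):
    X = diffs[:, layer, :]
    X = X - X.mean(axis=0)                     # center before SVD
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    directions.append(vt[0])                   # first principal component
    scores.append(s[0] ** 2 / np.sum(s ** 2))  # explained variance ratio

profile = np.stack(directions)   # (n_layers, dim): one direction per layer
print(profile.shape)             # (4, 16)
```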

Custom steering vectors

When you steer on a concept that isn't in the curated probe library, saklas generates its own contrastive pairs using the loaded model, then extracts a vector from them. The pipeline:

  1. Statement generation — the model writes contrastive statement pairs in batches, each batch seeded by a different specificity lens (unique facts/lore, physical traits, social dynamics, inner life, concrete routines). The prompt forces concept-specific detail — names, terminology, sensory descriptions that only apply to the target concept — and explicitly rejects generic statements that could work for anything similar.
  2. Caching — generated pairs are saved under saklas/datasets/cache/ keyed by concept name. Pairs are model-independent, so a different model reuses the same cached statements.
  3. Extraction — pairs feed into the standard contrastive PCA pipeline (per-layer SVD, explained variance scoring).

This means /steer "anything" works — personality traits, religions, animals, emotions, fictional characters, "man who ate too much spaghetti." The vector captures what's distinctive about the concept, not generic associations.

To regenerate cached statements (e.g. after a prompt update), use saklas -x to clear user-extracted vectors and statement caches.

Monitor

After generation, a separate forward pass over the generated text produces attention-weighted hidden states at every layer — the same pooling used during probe extraction. This ensures probe scores are computed against the same kind of representation the probes were trained on.

Each layer's hidden state is mean-centered — subtracting the per-layer mean computed from 45 neutral prompts — to remove baseline projection bias that otherwise makes raw cosine similarities uninformative. Score-weighted cosine similarities against probe vectors produce one value per probe per generation. Probe history accumulates across generations, enabling sparklines and running statistics.
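
The scoring step, sketched in numpy (illustrative shapes; probe vectors are unit-length per layer and the weights stand in for the extraction scores):

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, dim = 4, 16

hidden = rng.normal(size=(n_layers, dim))       # pooled states per layer
layer_means = rng.normal(size=(n_layers, dim))  # from neutral prompts
probe = rng.normal(size=(n_layers, dim))
probe /= np.linalg.norm(probe, axis=1, keepdims=True)
weights = np.array([0.4, 0.3, 0.2, 0.1])        # explained-variance scores

centered = hidden - layer_means                 # remove baseline bias
cos = (centered * probe).sum(axis=1) / np.linalg.norm(centered, axis=1)
reading = float((weights * cos).sum() / weights.sum())
print(-1.0 <= reading <= 1.0)  # True
```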

Layer means are computed once per model and cached as _LAYERMEANS.safetensors alongside probe vectors.

Tests

pytest tests/ -v                   # all tests
pytest tests/test_results.py tests/test_datasource.py tests/test_server.py -v  # no GPU needed
pytest tests/test_smoke.py -v      # CUDA required

CUDA tests download google/gemma-2-2b-it (~5 GB) on first run. Non-CUDA tests (test_results.py, test_datasource.py, test_server.py) run anywhere.
