Activation steering and trait monitoring for HuggingFace transformers — Python library, OpenAI-compatible server, and terminal UI

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

a9lim

These details have not been verified by PyPI

Project description

saklas

Saklas does activation steering on local HuggingFace models — extract a direction from contrastive pairs (angry vs. calm, formal vs. casual, whatever), add it to the hidden states at generation time, dial the strength with one number. Weights never change; nothing persists between calls. The core idea is Representation Engineering (Zou et al., 2023), and repeng got there first as a library. If you want a clean minimal steering library, go use repeng.

What saklas adds on top of the steering itself:

A trait monitor — 21 probes that score every generated token on affect, epistemic stance, register, alignment, and social/cultural axes, with live sparklines and per-token highlighting so you can see where in a response the model's register shifted
A terminal UI with live alpha knobs, A/B comparison against unsteered baselines, and the monitor built in — the whole thing runs on a MacBook with MPS
A dual-protocol HTTP server that speaks both OpenAI /v1/* and Ollama /api/* on the same port, so you can point Open WebUI, Enchanted, or any Ollama/OpenAI client at it and get steered completions with probe readings piggybacked on the response
Persona cloning from a text sample — saklas vector clone transcripts.txt -N hunter extracts a steering vector for that voice, no contrastive pairs needed
Vector comparison — saklas vector compare angry.calm happy.sad -m MODEL gives you cosine similarity between any two steering profiles, or a full N×N entanglement matrix across your probe library
A concept pack system on HuggingFace model repos, with GGUF import/export for interchange with repeng/llama.cpp tooling

Three ways to use it:

saklas tui <model> — terminal UI with live alpha knobs, probe readings, and A/B comparison
saklas serve <model> — HTTP server speaking OpenAI + Ollama wire formats on the same port
SaklasSession — Python API for scripted experiments, batch sweeps, and embedding in your own pipelines

Runs on CUDA and Apple Silicon MPS (the full TUI runs interactively on a MacBook). CPU works but is slow. Tested on Qwen, Gemma, Ministral, gpt-oss, Llama, and GLM. Many more architectures are wired up in model.py:_LAYER_ACCESSORS but untested — they may work, may need a tweak, or may explode. Reports welcome.

Credits and prior art

Saklas implements the contrastive-PCA extraction procedure from the Representation Engineering paper (Zou et al., 2023). It also owes a large debt to repeng by Theia Vogel, which was the first widely available practical implementation and has become the reference point for the community. I wrote the first version of saklas without knowing repeng existed, which is slightly embarrassing, but it does mean the two projects come at the problem from different angles: repeng is library-first and lean, saklas is TUI-first with a monitoring/probing layer on top. If you care about raw steering performance and clean composability, repeng is probably what you want. If you want something you can poke at interactively, see per-token probe readings, or drop in front of an existing chat UI, read on.

Quick start

pip install saklas
saklas tui google/gemma-3-4b-it

First run downloads the model, extracts the 21 bundled probes (one-time, cached to disk), and drops you into the TUI. Try /steer angry 0.3 — saklas resolves that to the bundled angry.calm axis with α = +0.3 and the model leans angry. Type /steer calm 0.3 and you get the same vector at α = −0.3. Ctrl+Y paints each token by how strongly any probe lit up on it. Ctrl+A does A/B comparison against the unsteered baseline.

Want it as an API server instead?

pip install saklas[serve]
saklas serve google/gemma-3-4b-it --steer cheerful:0.2

Or from Python:

from saklas import SaklasSession

with SaklasSession.from_pretrained("google/gemma-3-4b-it") as s:
    name, profile = s.extract("angry.calm")          # bundled bipolar pack
    s.steer(name, profile)                           # register (no alpha yet)
    print(s.generate("What makes a good day?", steering={name: 0.3}).text)

Install

pip install saklas             # library + TUI
pip install saklas[serve]      # + FastAPI/uvicorn for the API server
pip install saklas[gguf]       # + gguf package for llama.cpp interchange
pip install saklas[research]   # + datasets/pandas for dataset loading and DataFrames

Requires Python 3.11+ and PyTorch 2.2+. Runs on Linux, macOS, and Windows. CPU works but is slow — CUDA or Apple Silicon MPS is recommended for anything interactive. The full TUI with a 4B parameter model runs fine on a MacBook Pro with MPS.

From source:

git clone https://github.com/a9lim/saklas
cd saklas
pip install -e ".[dev]"        # + pytest

How it works

Steering vectors

Give saklas paired examples of a concept (angry sentences on one side, calm on the other, in similar situations). It runs each example through the model, captures the hidden state at the last content token in every layer, and diffs the two sides. The leading principal component of those diffs, computed separately at each layer, is the direction in hidden-state space that "points toward angry." The set of per-layer directions is one steering vector.
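The extraction step can be sketched without a model. This is a rough pure-Python illustration for a single layer, not saklas's actual code: `leading_direction` is a hypothetical helper, the hidden states are made-up three-dimensional lists, and it runs power iteration on the uncentered pair differences where a real implementation would run PCA on tensors.

```python
import math


def leading_direction(pos, neg, iters=50):
    """Return the unit leading principal direction of (pos - neg) differences.

    pos and neg are equally long lists of hidden-state vectors (lists of floats)
    for one layer. Assumption: no mean-centering, and the sign is chosen so the
    average difference projects positively (toward the "pos" pole).
    """
    diffs = [[p - n for p, n in zip(a, b)] for a, b in zip(pos, neg)]
    dim = len(diffs[0])
    v = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit starting vector
    for _ in range(iters):
        # One power-iteration step: w = D^T (D v), then renormalize.
        proj = [sum(x * y for x, y in zip(d, v)) for d in diffs]
        w = [sum(p * d[j] for p, d in zip(proj, diffs)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("all differences are zero; no direction to extract")
        v = [x / norm for x in w]
    mean = [sum(d[j] for d in diffs) / len(diffs) for j in range(dim)]
    if sum(m * x for m, x in zip(mean, v)) < 0:
        v = [-x for x in v]
    return v


# Toy data: the two sides differ almost entirely along the first coordinate.
angry = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.1], [2.2, 0.0, -0.1]]
calm = [[0.0, 0.0, 0.0], [0.1, 0.1, 0.0], [-0.1, -0.1, 0.1]]
direction = leading_direction(angry, calm)
```

On this toy data the recovered direction is dominated by the first coordinate, which is the only axis the two sides really disagree on.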

At generation time, saklas hooks every relevant layer and adds alpha × direction to the hidden state, then immediately rescales each position back to its original magnitude. Norm preservation keeps the residual stream on its natural trajectory — high-α rotations land cleanly instead of being attenuated by downstream layers reacting to inflated norms. The hook is removed once generation finishes.
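The add-then-rescale step is easy to show on a single position. This is a minimal sketch assuming plain Python lists in place of tensors; `steer_position` is a hypothetical name, and the real hook works in place on whole layers.

```python
import math


def steer_position(hidden, direction, alpha):
    """Add alpha * direction to one hidden state, then restore its original norm."""
    original_norm = math.sqrt(sum(x * x for x in hidden))
    shifted = [h + alpha * d for h, d in zip(hidden, direction)]
    shifted_norm = math.sqrt(sum(x * x for x in shifted))
    if shifted_norm == 0.0:
        return list(hidden)  # degenerate case: nothing sensible to rescale
    scale = original_norm / shifted_norm
    return [x * scale for x in shifted]


# A state of norm 5 nudged along the first axis keeps norm 5 but rotates toward it.
steered = steer_position([3.0, 4.0], [1.0, 0.0], alpha=0.5)
```

The result has the same length as the input state but points more along the steering direction, which is why the step behaves like a rotation and not a push.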

Alphas are backbone-normalized — per-layer PCA shares are baked into the stored tensor magnitudes at extraction time, so the same numeric α means roughly the same intensity across architectures. Rule of thumb: α ≈ 0.1–0.3 is a subtle nudge, 0.3–0.6 is clearly visible, past 0.6 is a coherence experiment, ~0.75 is the cliff.

Multiple vectors compose naturally — register them all, pass whatever alpha map you want per call. Co-layer directions sum into a single in-place hook per layer.
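Summing co-layer directions amounts to a small reduction over the alpha map. The sketch below assumes each registered vector is a plain dict from layer index to a list of floats; `combine_deltas` is a hypothetical helper, not part of the saklas API.

```python
def combine_deltas(profiles, alphas):
    """Sum alpha-scaled directions per layer.

    profiles: name -> {layer: direction}, alphas: name -> alpha.
    Returns {layer: combined delta}, i.e. one vector to add per hooked layer.
    """
    combined = {}
    for name, alpha in alphas.items():
        for layer, direction in profiles[name].items():
            acc = combined.setdefault(layer, [0.0] * len(direction))
            for i, d in enumerate(direction):
                acc[i] += alpha * d
    return combined


profiles = {
    "angry.calm": {10: [1.0, 0.0], 11: [0.0, 1.0]},
    "formal.casual": {11: [1.0, 1.0]},
}
deltas = combine_deltas(profiles, {"angry.calm": 0.5, "formal.casual": -0.5})
```

Layer 10 only sees the first vector, while layer 11 gets the sum of both contributions, so each layer still needs just one hook.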

Custom concepts

When you steer on something not in the library, the loaded model writes its own contrastive pairs. It first generates 9 broad situational domains for the axis (for deer.wolf: "predation and threat assessment", "territorial defense", etc.), then samples 5 first-person contrastive pairs per domain. An anti-allegory clause keeps non-human axes literal — deer.wolf yields sensory-animal POV, not timid-person-vs-aggressive-person. Human-register axes still land in human-register domains because the framework is concept-adaptive.

This means /steer "anything" works — religions, animals, fictional characters, whatever you can name.

Trait monitor

Alongside generation, saklas captures the hidden state at every probe layer, every step — via a hook attached before generation and detached after. No second forward pass. Those captures are mean-centered against a neutral baseline and scored via magnitude-weighted cosine similarity against every active probe. History accumulates across generations in the TUI as sparklines. In the library you get result.readings as a dict of ProbeReadings.
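One plausible reading of that scoring step, as a sketch: subtract the neutral baseline per layer, take the cosine against the probe direction, and average the layers weighted by the probe's magnitude at each layer. The weighting scheme and the `score_probe` helper are my assumptions for illustration; the source only says "magnitude-weighted".

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def score_probe(captures, baseline, probe):
    """Score captured hidden states against one probe.

    captures, baseline, probe: {layer: vector}. Returns a value in [-1, 1].
    Assumption: each layer's cosine is weighted by the probe vector's norm there.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for layer, direction in probe.items():
        centered = [h - b for h, b in zip(captures[layer], baseline[layer])]
        denom = _norm(centered) * _norm(direction)
        if denom == 0.0:
            continue  # no signal at this layer
        weight = _norm(direction)
        weighted_sum += weight * _dot(centered, direction) / denom
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


score = score_probe(
    captures={0: [2.0, 1.0], 1: [1.0, 3.0]},
    baseline={0: [1.0, 1.0], 1: [1.0, 1.0]},
    probe={0: [2.0, 0.0], 1: [0.0, -1.0]},
)
```

Here layer 0 agrees fully with the probe and carries twice the weight of layer 1, which disagrees fully, so the score lands at one third.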

Vector comparison

Profile.cosine_similarity(other) computes magnitude-weighted cosine similarity between two steering profiles over their shared layers. The CLI exposes this as saklas vector compare with three modes: single-target ranked comparison against all installed profiles, pairwise comparison, and N×N similarity matrices. The TUI has /compare for interactive use.

This is how you spot axis entanglement — e.g. creative.conventional and hallucinating.grounded extract near-identical directions on some models (weighted cosine +0.78 on gemma-4-e4b-it). That's a model-level property, not a probe design error.
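To make the N×N mode concrete, here is a small sketch of a similarity matrix over shared layers. It assumes profiles are dicts from layer index to vector and weights each layer by the product of the two norms, which reduces to summed dot products over summed norm products; `weighted_cosine` and `similarity_matrix` are hypothetical names, and the exact weighting saklas uses may differ.

```python
import math


def weighted_cosine(a, b):
    """Magnitude-weighted cosine between two {layer: vector} profiles."""
    num = 0.0
    den = 0.0
    for layer in sorted(set(a) & set(b)):  # only layers both profiles have
        num += sum(x * y for x, y in zip(a[layer], b[layer]))
        norm_a = math.sqrt(sum(x * x for x in a[layer]))
        norm_b = math.sqrt(sum(y * y for y in b[layer]))
        den += norm_a * norm_b
    return num / den if den else 0.0


def similarity_matrix(profiles):
    """Return (names, matrix) with matrix[i][j] comparing names[i] to names[j]."""
    names = sorted(profiles)
    matrix = [
        [weighted_cosine(profiles[r], profiles[c]) for c in names] for r in names
    ]
    return names, matrix


names, matrix = similarity_matrix({
    "a": {0: [1.0, 0.0], 1: [0.0, 2.0]},
    "b": {1: [0.0, 1.0], 2: [5.0, 5.0]},
    "c": {0: [0.0, 3.0], 1: [0.0, -2.0]},
})
```

Off-diagonal entries near +1 or -1 are the entangled pairs worth a closer look.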

The probe library

21 probes across 6 categories, each backed by 45 curated contrastive pairs. Most are bipolar (angry.calm, masculine.feminine); two are monopolar (agentic, manipulative).

Category	Probes
Affect	angry.calm, happy.sad
Epistemic	confident.uncertain, honest.deceptive, hallucinating.grounded
Alignment	agentic, refusal.compliant, sycophantic.blunt, manipulative
Register	formal.casual, direct.indirect, verbose.concise, creative.conventional, humorous.serious, warm.clinical, technical.accessible
Social stance	authoritative.submissive, high_context.low_context
Cultural	masculine.feminine, religious.secular, traditional.progressive

Pole aliasing: /steer angry 0.5 → angry.calm at α = +0.5. /steer calm 0.5 → angry.calm at α = −0.5. Works for any installed bipolar pack.
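The aliasing rule fits in a few lines. This is a sketch of the behavior described above, not the real resolver: `resolve_pole` is a hypothetical helper, and it assumes bipolar pack names are exactly two poles joined by a dot.

```python
def resolve_pole(name, alpha, packs):
    """Map a bare pole name to (pack_name, signed_alpha).

    packs is an iterable of installed pack names such as "angry.calm".
    The first pole keeps the sign of alpha; the second pole flips it.
    """
    packs = list(packs)
    if name in packs:
        return name, alpha  # already a full pack name
    matches = []
    for pack in packs:
        poles = pack.split(".")
        if len(poles) != 2:
            continue  # monopolar packs have no aliases
        positive, negative = poles
        if name == positive:
            matches.append((pack, alpha))
        elif name == negative:
            matches.append((pack, -alpha))
    if len(matches) != 1:
        raise KeyError(f"cannot resolve {name!r}: {len(matches)} matching packs")
    return matches[0]


installed = ["angry.calm", "happy.sad", "agentic"]
print(resolve_pole("angry", 0.5, installed))  # ('angry.calm', 0.5)
print(resolve_pole("calm", 0.5, installed))   # ('angry.calm', -0.5)
```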

Probes extract on first run per model and cache to ~/.saklas/vectors/default/<concept>/<safe_model_id>.safetensors.

Terminal UI

saklas tui google/gemma-2-9b-it
saklas tui mistralai/Mistral-7B-Instruct-v0.3 -q 4bit
saklas tui meta-llama/Llama-3.1-8B-Instruct -p affect register

Three panels: vector registry on the left (live alpha knobs), chat in the center, trait monitor on the right (sparklines per probe). Tab cycles focus; arrow keys navigate and adjust.

Flags

Flag	Description
`model`	HuggingFace ID or local path (optional if supplied by `-c`)
`-q`, `--quantize`	`4bit` or `8bit` (CUDA only)
`-d`, `--device`	`auto` (default), `cuda`, `mps`, `cpu`
`-p`, `--probes`	Categories: `all`, `none`, `affect`, `epistemic`, `alignment`, `register`, `social_stance`, `cultural`
`-c`, `--config`	Load setup YAML (repeatable; later files override earlier)
`-s`, `--strict`	With `-c`: fail on missing vectors

Keybindings

Key	Action
`Tab` / `Shift+Tab`	Cycle panel focus
`Left` / `Right`	Adjust alpha
`Up` / `Down`	Navigate vectors / probes
`Enter`	Toggle vector on/off
`Backspace` / `Delete`	Remove selected vector or probe
`Ctrl+T`	Toggle thinking mode
`Ctrl+A`	A/B compare (steered vs. unsteered)
`Ctrl+R`	Regenerate last response
`Ctrl+S`	Cycle trait sort mode
`Ctrl+Y`	Per-token probe highlighting
`[` / `]`	Adjust temperature
`{` / `}`	Adjust top-p
`Escape`	Stop generation
`Ctrl+Q`	Quit

Chat commands

Command	Description
`/steer <name> [alpha]`	Extract and register a steering vector
`/alpha <name> <val>`	Adjust an already-registered vector's alpha
`/unsteer <name>`	Remove a registered vector
`/probe <name>`	Add a monitoring probe (seeds per-token highlight)
`/unprobe <name>`	Remove a monitoring probe
`/compare <a> [b]`	Cosine similarity (1-arg: ranked vs all; 2-arg: pairwise)
`/extract <name>`	Extract to disk without wiring
`/regen`	Regenerate the last assistant turn
`/clear`	Clear conversation history
`/rewind`	Undo last exchange
`/sys <prompt>`	Set system prompt
`/temp <v>` / `/top-p <v>` / `/max <n>`	Sampling defaults
`/seed [n|clear]`	Default sampling seed
`/save <name>` / `/load <name>`	Snapshot/restore conversation + alphas
`/export <path>`	JSONL with per-token probe readings
`/model`	Model + device + active state
`/why`	Top layers + tokens for selected probe
`/help`	List commands and keybindings

Python API

from saklas import SaklasSession, SamplingConfig, Steering, Profile, DataSource, ResultCollector

with SaklasSession.from_pretrained("google/gemma-3-4b-it", device="auto") as session:
    name, profile = session.extract("angry.calm")   # bundled bipolar pack; returns Profile
    session.steer(name, profile)                    # register (no alpha yet)

    result = session.generate(
        "What makes a good day?",
        steering={name: 0.2},
        sampling=SamplingConfig(temperature=0.7, max_tokens=256, seed=42),
    )
    print(result.text)
    print(result.readings)                          # live probe readings

    # Scoped steering with pole resolution
    with session.steering({"calm": 0.4}):           # bare pole → angry.calm @ -0.4
        print(session.generate("Describe a rainy afternoon.").text)

    # Compare vectors
    other_name, other_profile = session.extract("happy.sad")
    print(profile.cosine_similarity(other_profile))                  # aggregate
    print(profile.cosine_similarity(other_profile, per_layer=True))  # per-layer

    # Alpha sweep
    collector = ResultCollector()
    for alpha in [-0.2, -0.1, 0, 0.1, 0.2]:
        session.clear_history()
        r = session.generate("Describe a sunset.", steering={name: alpha})
        collector.add(r, alpha=alpha)
    collector.to_csv("sweep.csv")

Registration is state, steering is per-call. session.steer("name", profile) stores the vector. session.generate(input, steering={"name": 0.5}) applies it for that generation only. No persistent hooks. Omit steering for a clean baseline.

Composition is native. Pass multiple names in steering={}. Nested with session.steering(...) blocks flatten into one alpha map, and the innermost block wins when the same name is set more than once.
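Inner-wins flattening is just a stack of dicts merged outermost first. The class below is a toy stand-in written for illustration, assuming nothing about how SaklasSession stores its scopes; `SteeringScopes` and `effective` are hypothetical names.

```python
from contextlib import contextmanager


class SteeringScopes:
    """Toy model of nested steering scopes with inner-wins semantics."""

    def __init__(self):
        self._stack = []

    @contextmanager
    def steering(self, alphas):
        # Push a copy so later mutation of the caller's dict has no effect.
        self._stack.append(dict(alphas))
        try:
            yield self
        finally:
            self._stack.pop()

    def effective(self):
        """Flatten all active scopes; later (inner) scopes overwrite earlier ones."""
        merged = {}
        for scope in self._stack:
            merged.update(scope)
        return merged


scopes = SteeringScopes()
with scopes.steering({"angry.calm": 0.2, "formal.casual": 0.1}):
    with scopes.steering({"angry.calm": -0.4}):
        inner = scopes.effective()
    outer = scopes.effective()
```

Inside the inner block `angry.calm` is overridden while `formal.casual` is inherited; once the block exits, the outer values are back.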

Sampling is per-call via SamplingConfig: temperature, top_p, top_k, max_tokens, seed, stop, logit_bias, presence_penalty, frequency_penalty, logprobs.

Thinking mode is auto-detected for models that support it (Qwen 3.5, QwQ, Gemma 4, gpt-oss). Delimiters are read from the chat template, so no tokens are hardcoded.

Events. session.events is a synchronous EventBus. Subscribe to VectorExtracted, SteeringApplied, SteeringCleared, ProbeScored, GenerationStarted, GenerationFinished.

SaklasSession reference

session = SaklasSession.from_pretrained(
    model_id, device="auto", quantize=None, probes=None,
    system_prompt=None, max_tokens=1024,
)

# Extraction
name, profile = session.extract("curiosity")                # fresh monopolar
name, profile = session.extract("angry.calm")               # bundled bipolar
name, profile = session.extract("happy", baseline="sad")    # explicit
name, profile = session.extract(DataSource.csv("pairs.csv"))

# Persona cloning
name, profile = session.clone_from_corpus("transcripts.txt", "hunter", n_pairs=90)

# Registry
session.steer("name", profile)
session.unsteer("name")

# Generation
result = session.generate("prompt", steering={"name": 0.5},
                          sampling=SamplingConfig(temperature=0.8))
for tok in session.generate_stream("prompt", steering={"name": 0.5}):
    print(tok.text, end="", flush=True)

# Scoped steering
with session.steering({"wolf": 0.5}):     # -> deer.wolf @ -0.5
    session.generate("prompt")

# Vector comparison
similarity = profile.cosine_similarity(other_profile)
per_layer = profile.cosine_similarity(other_profile, per_layer=True)

# Monitor
session.probe("honest")
session.unprobe("honest")

# State
session.history; session.last_result; session.last_per_token_scores
session.stop(); session.rewind(); session.clear_history()

GenerationResult

result.text              # decoded output (thinking is separate)
result.tokens            # token IDs
result.token_count; result.tok_per_sec; result.elapsed
result.finish_reason     # "stop" | "length" | "stop_sequence"
result.vectors           # {"angry.calm": 0.2} — alphas snapshot
result.readings          # {"probe_name": ProbeReadings}
result.to_dict()

API server

saklas serve speaks both OpenAI /v1/* and Ollama /api/* on the same port. Works with the OpenAI Python/JS SDKs, LangChain, Open WebUI, Enchanted, Msty, ollama-python, or anything that talks either wire format.

pip install saklas[serve]
saklas serve google/gemma-2-9b-it --steer cheerful:0.2 --port 8000

OpenAI SDK

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"steering": {"cheerful": 0.4}},    # per-request override
)

Ollama

Point any Ollama client at http://localhost:8000 and it works. Steering goes through the steer field in options:

curl -N http://localhost:8000/api/chat -d '{
  "model": "gemma2",
  "messages": [{"role": "user", "content": "Write me a haiku."}],
  "options": {"steer": {"cheerful": 0.3, "formal.casual": -0.2}}
}'
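The same request can be built from Python with only the standard library. This sketch mirrors the curl call above; `build_chat_request` is a hypothetical helper, and it only constructs the request so it can be inspected without a running server.

```python
import json
import urllib.request


def build_chat_request(prompt, steer, base_url="http://localhost:8000", model="gemma2"):
    """Build a POST to the Ollama-style /api/chat route with steering in options."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": {"steer": steer},
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "Write me a haiku.", {"cheerful": 0.3, "formal.casual": -0.2}
)
# With a server running, send it with: urllib.request.urlopen(request)
```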

Saklas-native routes

/saklas/v1/* resource tree with sessions, vector/probe management, one-shot probe scoring, and a bidirectional WebSocket for token+probe co-streaming. Full interactive docs at http://localhost:8000/docs.

Flags

Flag	Default	Description
`model`	required	HuggingFace ID or local path
`-H`, `--host`	`0.0.0.0`	Bind address
`-P`, `--port`	`8000`	Bind port
`-S`, `--steer`	—	Pre-load a vector, repeatable. `name:alpha`
`-C`, `--cors`	—	CORS origin, repeatable
`-k`, `--api-key`	None	Bearer auth. Falls back to `$SAKLAS_API_KEY`.

Not supported: tool calling, strict JSON mode, embeddings. Designed for trusted networks — see SECURITY.md.
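If the server was started with --api-key, clients need to send the key as a bearer token. Here is a sketch of a raw request to the OpenAI-style route, assuming the client reads the same SAKLAS_API_KEY variable for convenience (that part is my choice, not documented behavior). The OpenAI SDK merges `extra_body` into the top level of the JSON body, so `steering` sits next to `messages`. `build_openai_request` is a hypothetical helper.

```python
import json
import os
import urllib.request


def build_openai_request(prompt, steering, api_key=None,
                         base_url="http://localhost:8000/v1",
                         model="google/gemma-2-9b-it"):
    """Build a POST to /v1/chat/completions with optional bearer auth."""
    key = api_key or os.environ.get("SAKLAS_API_KEY")
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "steering": steering,  # per-request override, as in the SDK example
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_openai_request("Hello!", {"cheerful": 0.4}, api_key="secret")
```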

Concept packs

All state lives under ~/.saklas/ (override via SAKLAS_HOME). Each concept is a folder with pack.json, statements.json, and per-model tensors (safetensors or GGUF). Packs are distributed as HuggingFace model repos.
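For scripts that need to find cached tensors, the layout can be resolved like this. It is a sketch built from the cache path shown in the probe library section; it assumes the `default` path segment is the namespace, and `saklas_home` and `tensor_path` are hypothetical helpers. How `safe_model_id` is derived from a model ID is not covered here, so the caller passes it in.

```python
import os
from pathlib import Path


def saklas_home(env=None):
    """Return the state directory, honoring the SAKLAS_HOME override."""
    env = os.environ if env is None else env
    return Path(env.get("SAKLAS_HOME", "~/.saklas")).expanduser()


def tensor_path(concept, safe_model_id, namespace="default", env=None):
    """Path of a cached per-model safetensors file for one concept."""
    return (
        saklas_home(env)
        / "vectors"
        / namespace
        / concept
        / f"{safe_model_id}.safetensors"
    )


example = tensor_path("angry.calm", "MODEL", env={"SAKLAS_HOME": "/tmp/saklas-demo"})
```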

Pack-less install handles repos with no pack.json — repeng-style GGUF-only control-vector repos install with zero prep: saklas pack install jukofyork/creative-writing-control-vectors-v3.0.

Pack management

saklas pack install <target> [-s] [-a NS/NAME] [-f]
saklas pack refresh <selector> [-m MODEL]
saklas pack clear <selector> [-m MODEL] [-y]
saklas pack rm <selector> [-y]
saklas pack ls [selector] [-j] [-v]
saklas pack search <query> [-j] [-v]
saklas pack push <selector> [-a OWNER/NAME] [-pm MODEL] [-snt] [-d] [-f]
saklas pack export gguf <selector> [-m MODEL] [-o PATH] [--model-hint HINT]

Vector operations

saklas vector extract <concept> | <pos> <neg> [-m MODEL] [-f]
saklas vector merge <name> <components> [-m] [-f] [-s]
saklas vector clone <corpus-file> -N NAME [-m MODEL] [-n N_PAIRS] [--seed S] [-f]
saklas vector compare <concepts...> -m MODEL [-v] [-j]

Selectors: <name>, <ns>/<name>, tag:<tag>, namespace:<ns>, default, all. Bare names resolve cross-namespace and error on ambiguity.
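The selector grammar is small enough to classify with a handful of string checks. A sketch, assuming nothing beyond the forms listed above; `parse_selector` and the returned dict shape are made up for illustration, and cross-namespace resolution of bare names is left out.

```python
def parse_selector(selector):
    """Classify a pack selector string into its kind and parts."""
    if selector in ("default", "all"):
        return {"kind": selector}
    if selector.startswith("tag:"):
        return {"kind": "tag", "tag": selector[len("tag:"):]}
    if selector.startswith("namespace:"):
        return {"kind": "namespace", "namespace": selector[len("namespace:"):]}
    if "/" in selector:
        namespace, name = selector.split("/", 1)
        return {"kind": "qualified", "namespace": namespace, "name": name}
    return {"kind": "bare", "name": selector}


print(parse_selector("tag:affect"))      # {'kind': 'tag', 'tag': 'affect'}
print(parse_selector("a9lim/deer.wolf"))
```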

Supported architectures

Tested: Qwen, Gemma, Ministral, gpt-oss, Llama, GLM.

Wired up but untested: Mistral, Mixtral, Phi 1–3, PhiMoE, Cohere 1–2, DeepSeek V2–V3, StarCoder2, OLMo 1–3 + OLMoE, Granite + GraniteMoE, Nemotron, StableLM, GPT-2 / Neo / J / BigCode / NeoX, Bloom, Falcon / Falcon-H1, MPT, DBRX, OPT, Recurrent Gemma.

Adding a new architecture is one function entry. See CONTRIBUTING.md.

Tests

pytest tests/                      # everything
pytest tests/test_server.py        # CPU-only
pytest tests/test_smoke.py         # GPU required

GPU tests download google/gemma-3-4b-it (~8 GB) on first run. Works on CUDA and Apple Silicon MPS.

Contributing and security

See CONTRIBUTING.md for dev setup and the walkthrough for adding architectures. Security: SECURITY.md.

License

AGPL-3.0-or-later. See LICENSE.

If you use saklas in published research, please cite the Representation Engineering paper (Zou et al., 2023) and — if you want to be thorough about prior art — repeng.


Release history

1.4.5

Apr 18, 2026

1.4.4

Apr 18, 2026

1.4.3

Apr 17, 2026

1.4.2

Apr 17, 2026

This version

1.4.1

Apr 16, 2026

1.4.0

Apr 16, 2026

1.3.1

Apr 14, 2026

1.3.0

Apr 14, 2026

1.2.0

Apr 13, 2026

1.1.2

Apr 13, 2026

1.1.1

Apr 13, 2026

1.1.0

Apr 13, 2026

1.0.0

Apr 13, 2026

Download files


Source Distribution

saklas-1.4.1.tar.gz (292.1 kB)

Uploaded Apr 16, 2026 Source

Built Distribution


saklas-1.4.1-py3-none-any.whl (283.4 kB)

Uploaded Apr 16, 2026 Python 3

File details

Details for the file saklas-1.4.1.tar.gz.

File metadata

Download URL: saklas-1.4.1.tar.gz
Upload date: Apr 16, 2026
Size: 292.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for saklas-1.4.1.tar.gz
Algorithm	Hash digest
SHA256	`a7af29d0db0410cb8bf982fd3268f2767167cc711e0ca93fd754c60ac91285d1`
MD5	`7aa2442343712e12a9a05947c3b9f1e6`
BLAKE2b-256	`35ca877eff43b8229d98762c3709a51fe22e4db54a726dce63e7cad036998285`


Provenance

The following attestation bundles were made for saklas-1.4.1.tar.gz:

Publisher: release.yml on a9lim/saklas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: saklas-1.4.1.tar.gz
- Subject digest: a7af29d0db0410cb8bf982fd3268f2767167cc711e0ca93fd754c60ac91285d1
- Sigstore transparency entry: 1316915106
- Sigstore integration time: Apr 16, 2026
Source repository:
- Permalink: a9lim/saklas@1b802fc8a6d2cf3699573732a5c4a462f12c3367
- Branch / Tag: refs/heads/main
- Owner: https://github.com/a9lim
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1b802fc8a6d2cf3699573732a5c4a462f12c3367
- Trigger Event: push

File details

Details for the file saklas-1.4.1-py3-none-any.whl.

File metadata

Download URL: saklas-1.4.1-py3-none-any.whl
Upload date: Apr 16, 2026
Size: 283.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for saklas-1.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b3b843fe8e1d8d65274126a0e40bc20dc479076e1a19641447f137911c9c137f`
MD5	`9f378ee3753cbc327f8f6a41b3461d13`
BLAKE2b-256	`b60cf75f977e0bfe3e47bc0c3f95c3e3d2b1613836e1e77060cbb36ad483ac2b`


Provenance

The following attestation bundles were made for saklas-1.4.1-py3-none-any.whl:

Publisher: release.yml on a9lim/saklas

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: saklas-1.4.1-py3-none-any.whl
- Subject digest: b3b843fe8e1d8d65274126a0e40bc20dc479076e1a19641447f137911c9c137f
- Sigstore transparency entry: 1316915115
- Sigstore integration time: Apr 16, 2026
Source repository:
- Permalink: a9lim/saklas@1b802fc8a6d2cf3699573732a5c4a462f12c3367
- Branch / Tag: refs/heads/main
- Owner: https://github.com/a9lim
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@1b802fc8a6d2cf3699573732a5c4a462f12c3367
- Trigger Event: push
