
Activation steering and trait monitoring for HuggingFace transformers — Python library, OpenAI-compatible server, and terminal UI


saklas

CI PyPI Downloads License: AGPL v3 Python 3.11+

Activation steering and trait monitoring for HuggingFace transformer models. Extract steering vectors from contrastive pairs, apply them during generation with per-call alpha control, and watch how activations shift across behavioral probes — all without touching model weights.

Three frontends over one engine:

  • saklas <model> — interactive terminal UI for exploring vectors, live probe readings, and A/B comparison
  • saklas serve <model> — OpenAI-compatible HTTP server; drop-in for the OpenAI SDK, LangChain, curl
  • SaklasSession — Python library for scripted experiments, batch sweeps, and embedding steering into your pipelines

53 architectures supported out of the box. Steering vectors compose. Alphas are per-call — no persistent hooks, no model mutation. Probe history accumulates across generations.


Quick start

pip install saklas
saklas google/gemma-3-4b-it

That's the whole thing. The first run downloads the model, extracts the 21 bundled probes against it (a one-time cost, cached to disk), and drops you into the TUI. Hit /steer angry 0.3 — saklas resolves that to the bundled angry.calm axis with α=+0.3 so the model leans toward the angry pole. Type /steer calm 0.3 and you get the same vector at α=−0.3. Or [ / ] to nudge temperature, or Ctrl+A to A/B compare the steered output against the unsteered baseline.

Want to try it as an API server instead?

pip install saklas[serve]
saklas serve google/gemma-3-4b-it --steer cheerful:0.2

Or from Python:

from saklas import SaklasSession

with SaklasSession("google/gemma-3-4b-it") as s:
    name, profile = s.extract("angry.calm")          # bundled bipolar pack
    s.steer(name, profile)
    print(s.generate("What makes a good day?", alphas={name: 0.3}).text)

Install

pip install saklas             # library + TUI
pip install saklas[serve]      # + FastAPI/uvicorn for the API server
pip install saklas[research]   # + datasets/pandas for dataset loading and DataFrame export

Requires Python 3.11+ and PyTorch 2.2+. Runs on Linux, macOS, and Windows. CPU works but is slow — CUDA or Apple Silicon MPS is recommended for anything interactive.

Quantization (experimental). The bnb and cuda extras pull in bitsandbytes and flash-attn for 4-bit/8-bit loading and fused attention. These depend on platform-specific CUDA toolchains and don't build cleanly everywhere; only the vanilla install is officially supported.

pip install saklas[bnb]        # bitsandbytes only
pip install saklas[cuda]       # bitsandbytes + flash-attn (Linux + CUDA_HOME required)

From source.

git clone https://github.com/a9lim/saklas
cd saklas
pip install -e ".[dev]"        # + pytest

How it works

Steering vectors

Saklas uses Representation Engineering (Zou et al., 2023): for each contrastive pair, capture the last-content-token hidden state at every layer, diff the positive and negative sides, and take the first principal component per layer via SVD. Every layer gets a direction and a score (explained variance ratio); there is no manual layer selection.
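The per-layer step can be sketched in a few lines. This is a minimal NumPy illustration of the recipe described above, not saklas's actual extraction code; the function and argument names are made up for the example:

```python
import numpy as np

def extract_layer_direction(pos_states, neg_states):
    """One layer's steering direction from contrastive pairs.

    pos_states, neg_states: (n_pairs, hidden_dim) arrays of
    last-content-token hidden states for the two sides of the pairs.
    Returns the first principal component of the pairwise diffs and
    its explained-variance ratio as the layer's score.
    """
    diffs = pos_states - neg_states                  # (n_pairs, hidden_dim)
    _, s, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]                                # unit-norm top direction
    score = (s[0] ** 2) / (s ** 2).sum()             # explained variance ratio
    return direction, score
```

Running this over every layer's hidden states yields the per-layer (direction, score) profile; no layer is hand-picked because the score carries the selection information.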

Alpha is normalized per-profile so the same numeric value means the same intensity across backbones: α≈0.5 sits in the coherent-nuanced band on every bundled architecture, α≈1.0 is past the collapse cliff. Vectors are registered without alphas and applied per-call, so nothing persists on the model between generations.

Multiple vectors compose naturally — they register into a single manager that, per generation, installs a single in-place hidden-state hook per active layer (co-layer directions sum). Hooks are transient: composed before generation, removed after.
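The composition rule is simple enough to state in code. A pure-Python sketch of the "co-layer directions sum" behavior (illustrative names, not the hook implementation itself):

```python
def compose_layer_deltas(active, hidden_dim):
    """Sum co-layer steering directions into one additive delta per layer.

    active: list of (profile, alpha) pairs, where a profile maps
    layer_idx -> direction (list of floats). One hook per layer in the
    resulting dict then adds its delta to the hidden state in place.
    """
    deltas = {}
    for profile, alpha in active:
        for layer, direction in profile.items():
            acc = deltas.setdefault(layer, [0.0] * hidden_dim)
            for i, d in enumerate(direction):
                acc[i] += alpha * d
    return deltas
```

Because the deltas are recomputed from the per-call alphas each generation, removing the hooks afterward restores the model exactly.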

Custom concepts

When you steer on a concept that isn't in the curated probe library, the loaded model writes its own contrastive pairs. Generation is seeded across multiple "specificity lenses" (unique facts, physical traits, social dynamics, inner life, routines) and the prompt explicitly rejects generic pairs that could apply to anything similar. Pairs cache at ~/.saklas/vectors/local/<concept>/statements.json and are model-independent, so they're reused across models.

This means /steer "anything" works — religions, animals, fictional characters, "man who ate too much spaghetti." The vector captures what's distinctive about the concept, not generic associations.

Trait monitor

After each generation, saklas runs a separate forward pass over the generated text, pools hidden states from the last content token (matching probe extraction), mean-centers them against a cached per-layer baseline (computed from 45 neutral prompts), and scores against each probe via score-weighted cosine similarity. History accumulates across generations, enabling sparklines and running statistics.

Layer means are cached at ~/.saklas/models/<safe_model_id>/layer_means.safetensors and auto-invalidate when ~/.saklas/neutral_statements.json changes hash.
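The scoring step per probe and layer reduces to a weighted cosine. A hedged sketch of the arithmetic described above (names are illustrative; the real code operates on tensors, not lists):

```python
import math

def probe_layer_score(hidden, baseline, direction, weight):
    """Mean-center a pooled hidden state against the cached neutral
    baseline, then weight its cosine similarity with the probe
    direction by that layer's explained-variance score."""
    centered = [h - b for h, b in zip(hidden, baseline)]
    dot = sum(c * d for c, d in zip(centered, direction))
    norm = (math.sqrt(sum(c * c for c in centered))
            * math.sqrt(sum(d * d for d in direction)))
    return weight * dot / norm if norm else 0.0
```

The baseline subtraction is what makes the reading a *shift* relative to neutral text rather than an absolute activation level.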

Probe library

21 probes across 6 categories, each backed by 45 curated contrastive pairs (topically disjoint, not minimal-word-swap — see CLAUDE.md for the generation discipline). Most probes are bipolar: the name carries both poles (angry.calm, masculine.feminine), the positive pole activates on α>0 and the negative pole on α<0. Monopolar probes have no named opposite.

Category Probes
Affect angry.calm, fearful.brave, happy.sad
Epistemic confident.uncertain, honest.deceptive, hallucinating.grounded
Alignment agentic, refusal.compliant, sycophantic.blunt, manipulative
Register formal.casual, direct.indirect, verbose.concise, creative.conventional
Social stance authoritative.submissive, hierarchical.egalitarian, high_context.low_context
Cultural masculine.feminine, western.eastern, religious.secular, traditional.progressive

Bipolar probes are extracted from Speaker A IS X / Speaker B IS Y contrastive pairs, so the negative direction is a real coherent pole rather than "absence of X." /steer angry - calm and /steer angry.calm resolve to the same vector — each pole is slugged independently (non-alphanumerics collapse to _), joined by the bipolar separator ..
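The slugging rule above can be sketched as (an assumption-laden illustration of the naming convention, not saklas's internal function):

```python
import re

def slug(pole):
    # runs of non-alphanumerics collapse to a single underscore
    return re.sub(r"[^a-z0-9]+", "_", pole.lower()).strip("_")

def bipolar_name(pos, neg):
    # each pole is slugged independently, joined by the separator "."
    return f"{slug(pos)}.{slug(neg)}"
```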

The bundled pairs are generated by saklas itself. scripts/regenerate_bundled_statements.py loads a capable instruct model (gemma-4-31b-it by default) and calls the same SaklasSession.generate_pairs pipeline the TUI uses when you /steer a novel concept — same system prompt, same five-domain seeds, same parser. Shipping the pack this way is both a calibration target and an end-to-end demonstration: the on-model generation path is robust enough that it's what populates saklas/data/vectors/ in the first place. Regenerate with python scripts/regenerate_bundled_statements.py --purge and the whole bundled pack is rebuilt from scratch by the same code you run every day.

Pole aliasing. Typing a single-pole name resolves to the full composite with the correct sign flip: /steer angry 0.5 is an alias for /steer angry.calm 0.5, and /steer calm 0.5 is an alias for /steer angry.calm -0.5. This works for any installed bipolar pack — bundled, HF-pulled, or user-authored — so /steer bob/wolf 0.4 resolves to bob/deer.wolf at α=-0.4 if that's what's installed. Collisions (e.g. alice/angry exists alongside default/angry.calm) raise the same ambiguity error as any namespace collision; disambiguate with ns/name.
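The resolution logic amounts to a lookup with a sign flip. A simplified sketch of the behavior described above (illustrative; the real resolver also handles namespaced queries and richer collision reporting):

```python
def resolve_pole(query, installed):
    """Resolve a single-pole name against installed bipolar packs.

    installed: composite names like "angry.calm" or "bob/deer.wolf".
    Returns (composite_name, sign): +1 for the positive pole,
    -1 for the negative pole. Cross-pack ambiguity raises.
    """
    hits = []
    for name in installed:
        pos, _, neg = name.rpartition("/")[2].partition(".")
        if query == pos:
            hits.append((name, +1))
        elif query == neg:
            hits.append((name, -1))
    if not hits:
        raise KeyError(query)
    if len(hits) > 1:
        raise ValueError(f"ambiguous pole {query!r}; disambiguate with ns/name")
    return hits[0]
```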

Probes extract on first run against a new model and cache to ~/.saklas/vectors/default/<concept>/<safe_model_id>.safetensors.

Supported architectures

53 families via model.py:_LAYER_ACCESSORS. Adding a new one = one function entry. See CONTRIBUTING.md.

Llama (1–4), Mistral, Ministral, Mixtral, Gemma (1–4), Phi (1–3), PhiMoE, Qwen (1–3.5), Qwen-MoE variants, Cohere (1–2), DeepSeek (V2–V3), StarCoder2, OLMo (1–3), OLMoE, GLM (3–4), Granite, GraniteMoE, Nemotron, StableLM, GPT-2/Neo/J/BigCode/NeoX/OSS, Bloom, Falcon, Falcon-H1, MPT, DBRX, OPT, RecurrentGemma.


Terminal UI

saklas google/gemma-2-9b-it
saklas mistralai/Mistral-7B-Instruct-v0.3 -q 4bit
saklas meta-llama/Llama-3.1-8B-Instruct -p affect register

Layout

+-------------------------+----------------------------+------------------------+
|  VECTORS                |                            |  TRAIT MONITOR         |
|  > angry.calm  +0.30    |          Chat              |  Affect                |
|    formal_cas  +0.10    |                            |    angry.calm #### .42 |
|                         |                            |    happy.sad  ##- -.15 |
|  CONFIG                 |                            |  Epistemic             |
|  temp ####- 0.7         |                            |    honest_dec ### .31  |
|  top-p #### 0.9         |                            |                        |
|                         |  Type a message...         |                        |
+-------------------------+----------------------------+------------------------+

Three panels: the vector registry on the left (with live alpha knobs), the chat in the center, and the trait monitor on the right (sparklines per probe, sorted by current magnitude or delta). Tab cycles focus; arrow keys navigate and adjust.

TUI flags

Flag Description
model HuggingFace ID or local path (optional if supplied by -c)
-q, --quantize 4bit or 8bit (CUDA only)
-d, --device auto (default), cuda, mps, cpu
-p, --probes Categories: all, none, affect, epistemic, alignment, register, social_stance, cultural
-c, --config Load setup YAML (repeatable; later files override earlier)
-s, --strict With -c: fail on missing vectors instead of warning

System prompt, temperature, top-p, and max tokens are set interactively via slash commands — see below.

Keybindings

Key Action
Tab / Shift+Tab Cycle panel focus
Left / Right Adjust alpha
Up / Down Navigate vectors / probes
Enter Toggle vector on/off
Backspace / Delete Remove selected vector or probe
Ctrl+T Toggle thinking mode (for models that support it)
Ctrl+A A/B compare (steered vs unsteered)
Ctrl+R Regenerate last response
Ctrl+S Cycle trait sort mode
Ctrl+Y Toggle per-token probe highlighting (uses current trait selection)
[ / ] Adjust temperature
{ / } Adjust top-p
Escape Stop generation
Ctrl+Q Quit

Chat commands

Command Description
/steer "concept" [alpha] Extract and register a steering vector
/steer "concept" - "baseline" [alpha] Contrastive steering against a baseline concept
/probe "concept" Add a monitoring probe
/probe "concept" - "baseline" Contrastive probe
/clear Clear conversation history
/rewind Undo last exchange
/sys <prompt> Set system prompt
/temp <value> Set temperature
/top-p <value> Set top-p
/max <value> Set max tokens per generation

Commands that touch the model or modify history (/steer, /probe, /clear, /rewind) interrupt any in-progress generation and execute once it stops. Sending a new message mid-generation also interrupts and submits immediately.


Python API

from saklas import SaklasSession, DataSource, ResultCollector

with SaklasSession("google/gemma-3-4b-it", device="auto") as session:
    # Load the bundled angry.calm bipolar pack
    name, profile = session.extract("angry.calm")
    session.steer(name, profile)             # register (no alpha yet)

    # Generate with steering (positive α = angry pole, negative α = calm pole)
    result = session.generate(
        "What makes a good day?",
        alphas={name: 0.2},
    )
    print(result.text)
    print(result.readings)                   # probe monitor data

    # A/B comparison — omit alphas to get the unsteered baseline
    baseline = session.generate("What makes a good day?")

    # Alpha sweep across both poles
    collector = ResultCollector()
    for alpha in [-0.2, -0.1, 0, 0.1, 0.2]:
        session.clear_history()
        result = session.generate("Describe a sunset.", alphas={name: alpha})
        collector.add(result, alpha=alpha)
    collector.to_csv("sweep.csv")

Runnable examples live in examples/.

Key concepts

Registration is state, alphas are per-call. session.steer("name", profile) stores the vector in the registry. session.generate(input, alphas={"name": 0.5}) applies it for that generation only. No persistent hooks live on the model between calls.

Composition is native. Pass multiple names in alphas={}; co-layer directions sum into a single in-place hook per layer.

Thinking mode is per-call. For models that support it (Qwen 3.5, QwQ, Gemma 4, gpt-oss, etc.), session.generate(input, thinking=True) enables the reasoning trace. Delimiters are detected automatically from the chat template — no hardcoded tokens. result.text contains only the final answer; streaming yields TokenEvent objects with thinking=True for the reasoning trace.

Alphas are backbone-normalized. The same numeric value means the same intensity across architectures. Start at 0.1–0.3 for subtle nudges, 0.4–0.6 for clear shifts, and treat anything past 0.8 as a coherence experiment.

_, ac = session.extract("angry.calm")
_, fc = session.extract("formal.casual")
session.steer("angry.calm", ac)
session.steer("formal.casual", fc)

session.generate("Hello.", alphas={"angry.calm": 0.2, "formal.casual": 0.1})  # both
session.generate("Hello.", alphas={"angry.calm": -0.2})                        # steer toward calm
session.generate("Hello.")                                                    # neither

SaklasSession reference

session = SaklasSession(
    model_id,                # HuggingFace ID or local path
    device="auto",           # "auto", "cuda", "mps", "cpu"
    quantize=None,           # "4bit", "8bit", or None
    probes=None,             # list of categories, or None for all
    system_prompt=None,
    max_tokens=1024,
)

# Vector extraction — returns (canonical_name, profile). For bipolar
# extraction the canonical name is f"{pos}.{neg}"; each pole is slugged
# (hyphens and whitespace collapsed to underscores) and joined with the
# bipolar separator `.`.
name, profile = session.extract("curiosity")                     # fresh monopolar (generates pairs)
name, profile = session.extract("angry.calm")                    # bundled bipolar pack
name, profile = session.extract("happy", baseline="sad")         # explicit bipolar → "happy.sad"
name, profile = session.extract([("pos", "neg"), ...])           # raw pairs
name, profile = session.extract(DataSource.csv("pairs.csv"))
session.save_profile(profile, "out.safetensors")
profile = session.load_profile("out.safetensors")

pairs = session.generate_pairs("curiosity")                # list[(str, str)]

# Registry
session.steer("name", profile)
session.unsteer("name")
session.vectors                                            # dict of registered profiles

# Generation (blocking)
result = session.generate(
    "prompt",
    alphas={"name": 0.5},
    thinking=False,
    seed=None,
    stop=None,
    logprobs=None,
)

# Streaming
for tok in session.generate_stream("prompt", alphas={"name": 0.5}):
    print(f"[think] {tok.text}" if tok.thinking else tok.text, end="", flush=True)

# Monitor
session.monitor("honest")
session.monitor("custom", custom_profile)
session.unmonitor("honest")

# State
session.config.temperature = 0.8    # also top_p, max_new_tokens, system_prompt
session.history                     # conversation messages
session.last_result                 # most recent GenerationResult
session.stop()                      # interrupt generation
session.rewind()                    # drop last exchange
session.clear_history()

GenerationResult

result.text              # decoded output (response only — thinking is separate)
result.tokens            # token IDs
result.token_count
result.tok_per_sec
result.elapsed
result.finish_reason     # "stop" | "length" | "stop_sequence"
result.vectors           # {"angry.calm": 0.2} — snapshot of alphas used
result.readings          # {"probe_name": ProbeReadings} if probes active
result.to_dict()         # JSON-serializable

DataSource formats

from saklas import DataSource

DataSource.curated("angry.calm")                             # bundled
DataSource.json("pairs.json")                                # saklas schema
DataSource.csv("pairs.csv", positive_col="pos", negative_col="neg")
DataSource.huggingface("user/dataset", split="train[:100]")  # needs datasets
DataSource(pairs=[("positive", "negative")])
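For reference, the file DataSource.csv expects is just two text columns, with the column names supplied via positive_col / negative_col. A quick sketch of the shape (the example row is illustrative, not from the bundled packs):

```python
import csv
import io

# Two columns of contrastive text; header names match whatever you
# pass as positive_col / negative_col.
text = "pos,neg\nSpeaker A is furious.,Speaker A is serene.\n"
pairs = [(row["pos"], row["neg"]) for row in csv.DictReader(io.StringIO(text))]
```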

ResultCollector

collector = ResultCollector()
collector.add(result, concept="angry.calm", alpha=0.2, run_id=1)

collector.to_dicts()
collector.to_jsonl("results.jsonl")
collector.to_csv("results.csv")
collector.to_dataframe()           # needs pandas

Probe readings flatten to columns: probe_honest.deceptive_mean, probe_honest.deceptive_std, etc. Vector alphas flatten to vector_angry.calm_alpha.
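The flattening scheme is mechanical. A minimal sketch of the naming convention above (the function is illustrative, not ResultCollector's internals):

```python
def flatten_row(alphas, readings):
    """Flatten one generation's alphas and probe statistics into a
    flat dict of CSV-ready columns, following the naming scheme
    vector_<name>_alpha and probe_<name>_<stat>."""
    row = {f"vector_{name}_alpha": a for name, a in alphas.items()}
    for probe, stats in readings.items():
        for stat, value in stats.items():
            row[f"probe_{probe}_{stat}"] = value
    return row
```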


OpenAI- and Ollama-compatible API server

Serve a steered model as an HTTP endpoint speaking both the OpenAI /v1/* protocol and the Ollama /api/* protocol on the same port. Works with the OpenAI Python/JS SDKs, LangChain, LlamaIndex, curl, Open WebUI, Enchanted, Msty, ollama-python, LangChain's ChatOllama, or anything that speaks either wire format.

pip install saklas[serve]
saklas serve google/gemma-2-9b-it --steer cheerful:0.2 --port 8000

With the OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Server-default steering (--steer cheerful:0.2)
resp = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

# Override steering per-request via extra_body
resp = client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "steer": {
            "alphas": {"cheerful": 0.4},
            "thinking": True,
        }
    },
)

# Streaming
for chunk in client.chat.completions.create(
    model="google/gemma-2-9b-it",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Serve flags

Flag Default Description
model required HuggingFace ID or local path
-H, --host 0.0.0.0 Bind address
-P, --port 8000 Bind port
-q, --quantize None 4bit or 8bit
-d, --device auto auto, cuda, mps, cpu
-p, --probes all Probe categories to bootstrap
-S, --steer None Pre-load a vector, repeatable. name:alpha or name
-C, --cors None CORS origin, repeatable
-k, --api-key None Bearer auth token. Falls back to $SAKLAS_API_KEY. Unset = open.

Endpoints

OpenAI-compatible

  • GET /v1/models, GET /v1/models/{id}
  • POST /v1/chat/completions (streaming + non-streaming)
  • POST /v1/completions (streaming + non-streaming)

Vector management

  • GET /v1/saklas/vectors
  • POST /v1/saklas/vectors/extract (streams progress via SSE)
  • POST /v1/saklas/vectors/load
  • DELETE /v1/saklas/vectors/{name}

Probe management

  • GET /v1/saklas/probes
  • GET /v1/saklas/probes/defaults
  • POST /v1/saklas/probes/{name}
  • DELETE /v1/saklas/probes/{name}

Session management

  • GET /v1/saklas/session
  • PATCH /v1/saklas/session — update temperature, top_p, max_tokens, system_prompt
  • POST /v1/saklas/session/clear
  • POST /v1/saklas/session/rewind

Full interactive docs at http://localhost:8000/docs while the server is running.

OpenAI parity and limits

Chat/completions accept the full standard parameter surface: stop (string or list), seed, logit_bias, presence_penalty, frequency_penalty, logprobs + top_logprobs, stream_options.include_usage, max_completion_tokens, plus accept-and-ignore for user, n, response_format, and messages[].name. Responses include real usage counts, accurate finish_reason, and the first streaming chunk emits {role: "assistant"} per OpenAI convention. Error responses follow the OpenAI shape with type/param/code fields.

Probe readings piggyback as an extra probe_readings field in generation responses — standard clients ignore it, aware clients get inline monitoring data.

The server is stateless by default — each request carries its full message list, and neither conversation history nor probe accumulators persist across requests. The /v1/saklas/session/* routes are stateful by design for single-user workflows. Concurrent requests queue FIFO against a single generation lock.

Not supported: tool calling, strict JSON/json_schema mode, /v1/embeddings.

Ollama protocol (/api/*)

Point any Ollama client at http://localhost:8000 and it just works — no config shim, no proxy. saklas serve mounts the full Ollama route surface alongside the OpenAI routes on the same port, sharing one generation lock, one bearer-auth dependency, and one underlying session.

saklas serve google/gemma-2-9b-it --steer cheerful:0.2 --port 8000

# Open WebUI / Enchanted / any Ollama client: point at http://localhost:8000
# The loaded model appears under both its HF id (google/gemma-2-9b-it) and
# its Ollama alias (gemma2, gemma2:latest, gemma2:9b) in /api/tags.
# Raw curl — NDJSON streaming, matches Ollama wire format exactly
curl -N http://localhost:8000/api/chat -d '{
  "model": "gemma2",
  "messages": [{"role": "user", "content": "Write me a haiku."}],
  "options": {
    "temperature": 0.8,
    "top_k": 50,
    "repeat_penalty": 1.1,
    "steer": {"cheerful": 0.3, "formal.casual": -0.2}
  }
}'

Steering through Ollama clients. The non-standard steer field inside options carries saklas alphas — clients that don't know about it leave it alone, clients that want it get per-request control. Both flat ({"steer": {"name": alpha}}) and nested ({"steer": {"alphas": {...}, "thinking": true}}) forms are accepted. Merged over any server-side --steer defaults; zero-alphas are stripped.

Advertised endpoints: /api/version, /api/tags, /api/ps, /api/show, /api/chat, /api/generate, /api/pull (no-op success for the loaded model, 404 otherwise), HEAD / for liveness.

Option translation. temperature, top_p, top_k, seed, num_predict, stop, presence_penalty, frequency_penalty, repeat_penalty, and think all pipe through to the underlying session. repeat_penalty maps to saklas's presence_penalty via ln(repeat_penalty) — exact for positive logits, matching Ollama's "divide by penalty" semantics without the unbounded count weighting that plain frequency_penalty would introduce. Unrecognized options (min_p, mirostat*, num_ctx, typical_p, etc.) are logged at debug level and silently dropped.
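The identity behind that mapping: subtracting ln(r) from a logit divides the token's unnormalized probability by exactly r, since exp(x - ln r) = exp(x) / r. A one-line sketch (the function name is illustrative):

```python
import math

def repeat_to_presence(repeat_penalty):
    """Map Ollama's multiplicative repeat_penalty to an additive
    presence penalty: subtracting ln(r) from a logit scales that
    token's unnormalized probability down by a factor of r, without
    the per-occurrence count weighting a frequency penalty adds."""
    return math.log(repeat_penalty)
```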

Model aliasing. A saklas server hosts exactly one model. /api/tags advertises it under its HF id plus a hybrid alias set: an authoritative override table for popular families (where Ollama's catalogue rounds sizes differently — Gemma-2-2b is 2.6B params but Ollama calls it gemma2:2b), with <family>:<size> inference from model_info as a fallback for new architectures. By default the model field on incoming requests is accepted regardless of match, so clients with stale dropdowns don't 404 — set SAKLAS_OLLAMA_STRICT=1 to reject mismatches with a 404 instead.

Thinking. Streams as message.thinking on /api/chat and top-level thinking on /api/generate, matching Ollama's current schema. Open WebUI renders it as a collapsible reasoning panel automatically.

Not supported (Ollama protocol): /api/push, /api/create, /api/copy, /api/delete, /api/embeddings, /api/embed (all return 501). Saklas doesn't manage models the Ollama way — it loads one HF model at startup and serves it. The context field on /api/generate responses is omitted (not an empty list) because saklas can't round-trip Ollama's tokenized continuation state honestly.

The server is designed for trusted networks — see SECURITY.md for the threat model before exposing it beyond your local machine.


Managing concept packs

Saklas stores all state under ~/.saklas/ (override via SAKLAS_HOME):

~/.saklas/
  neutral_statements.json                  # user-editable (copy-on-miss from package)
  vectors/
    default/<concept>/                     # bundled probes
    local/<concept>/                       # user-authored + merged
    <hf_owner>/<concept>/                  # HF-pulled
  models/<safe_model_id>/layer_means.safetensors

Each concept is a folder with pack.json (metadata + file hashes), statements.json (the contrastive pairs), and zero or more <safe_model_id>.safetensors tensor files (one per model the concept has been extracted against). Tensors are extracted lazily — a pack without tensors is fine; it'll extract on first use.

Packs are distributed as HuggingFace model repos (not datasets — safetensors is model-hub-native, and base_model frontmatter gives reverse-link discoverability from the base model's hub page). Pin any install to a git tag, branch, or commit SHA with @revision; pinned installs are preserved on refresh — pinning means pinning.

Commands

saklas install <target> [-s] [-a NS/NAME] [-f]   # from HF coord (ns/name[@rev]) or folder path
saklas refresh <selector> [-m MODEL]              # re-pull from source
saklas refresh neutrals                           # reserved: rewrite neutral_statements.json
saklas clear <selector> [-m MODEL] [-y]           # delete per-model tensors, keep statements
saklas uninstall <selector> [-y]                  # fully remove concept folder
saklas list [selector] [-i] [-j] [-v]             # includes HF hub by default
saklas merge <name> <components> [-m] [-f] [-s]   # merge: saklas merge bard default/angry.calm:0.3,user/arch:0.4

Selectors (shared grammar): <name>, <ns>/<name>, tag:<tag>, namespace:<ns>, default, all. Bare names resolve across namespaces and error on ambiguity.

install -s / --statements-only keeps only statements.json and drops any tensors that arrived with the pack. The concept folder stays a legitimate standalone pack — tensors re-extract on first use against whatever model you load. Useful when you want the pairs but prefer to extract locally.

refresh neutrals is a reserved form that overwrites ~/.saklas/neutral_statements.json with the bundled package copy. Run this after upgrading across a release that changes the bundled neutrals — materialize_bundled is copy-on-miss so existing users keep their old file by default. Layer means auto-recompute on next session init via the hash check.

clear vs uninstall: clear deletes tensors but keeps statements.json and pack.json (so the concept remains selectable and will re-extract on demand). uninstall removes the whole folder. Uninstalling a bundled concept is allowed — it respawns on the next session init via materialize_bundled. Broad selectors (all, namespace:) require -y on both commands.

list queries the HF hub by default and merges results with local installs. Pass -i for installed-only, -j for JSON output, -v to include descriptions inline.

Python library

All of the above is also available programmatically:

from saklas import cache_ops
from saklas.cli_selectors import parse as sel_parse

cache_ops.install("a9lim/angry.calm@v1.2", as_=None, force=False, statements_only=False)
cache_ops.refresh(sel_parse("tag:affect"), model_scope="google/gemma-2-9b-it")
cache_ops.delete_tensors(sel_parse("angry.calm"), model_scope=None)
cache_ops.uninstall(sel_parse("angry.calm"), yes=False)
cache_ops.list_concepts(sel_parse("tag:affect"), hf=True, installed_only=False)

Tests

pytest tests/                      # everything
pytest tests/test_server.py tests/test_results.py tests/test_datasource.py  # CPU-only
pytest tests/test_smoke.py         # GPU required

GPU tests (test_smoke.py, test_session.py) download google/gemma-3-4b-it (~8 GB) on first run and accept either CUDA or Apple Silicon MPS. Everything else runs anywhere.


Contributing

See CONTRIBUTING.md for dev setup, test layout, and the walkthrough for adding a new architecture. Security issues: SECURITY.md.
