A coding agent with a new context-engineering framework: bounded, deterministic, reconstructed context (slice architecture) — built for long-horizon work.
sliceagent
A coding agent with a new context-engineering framework, built for long-horizon work. Its core bet is a different memory model from every mainstream agent:
Don't accumulate the transcript — reconstruct a small, deterministic working state every turn.
Mainstream agents accumulate a growing message history and LLM-summarize it when it nears the context window ("transcript + compaction"). sliceagent never accumulates: each turn it rebuilds a bounded Active Memory Slice from ground truth — the live files, the last error (verbatim), a counted action tally, recent actions, and retrieved context — and sends only that.
Contents: Why · What it can do · How it works · Install · Quickstart · Usage · Benchmarks · Under the hood · License
Why
- Bounded by construction — per-turn context stays flat regardless of session length (no grow-to-window sawtooth).
- Faithful — context is re-read from ground truth, not a lossy summary of the conversation.
- Auditable — you can print the exact, small input the model saw each turn and know why it decided.
- Cheap at scale — validated: on long, iterative tasks the slice cut tokens by up to ~60–80% and wall-clock time by ~70% versus a transcript loop, with identical test pass rates.
This is the opposite of the field's default ("bigger windows + summarize"): remember less, reconstruct precisely.
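As a rough illustration of the "bounded by construction" point above, the toy simulation below compares the input size of a transcript loop with that of a rebuilt slice. Everything in it is hypothetical (`World`, `slice_context`, the `SLICE_BUDGET` cap of five recent actions); it is a sketch of the idea under those assumptions, not sliceagent's actual code.

```python
from dataclasses import dataclass, field

SLICE_BUDGET = 5  # hypothetical cap on recent actions kept in the slice


@dataclass
class World:
    """Stand-in for ground truth: live files, the latest error, all actions so far."""
    files: dict[str, str] = field(default_factory=dict)
    last_error: str = ""
    actions: list[str] = field(default_factory=list)


def transcript_context(history: list[str]) -> str:
    # Transcript loop: every past message is replayed, so input grows each turn.
    return "\n".join(history)


def slice_context(world: World) -> str:
    # Slice loop: rebuilt from current state; only a bounded tail of actions is kept.
    recent = world.actions[-SLICE_BUDGET:]
    return "\n".join([
        f"FILES: {sorted(world.files)}",
        f"LAST ERROR: {world.last_error}",
        f"ACTION COUNT: {len(world.actions)}",
        f"RECENT: {recent}",
    ])


def simulate(turns: int) -> tuple[list[int], list[int]]:
    """Return per-turn context sizes (in characters) for both strategies."""
    world, history = World(files={"app.py": "..."}), []
    transcript_sizes, slice_sizes = [], []
    for t in range(turns):
        action = f"edit app.py (attempt {t % 10})"
        world.actions.append(action)
        world.last_error = f"AssertionError in test_{t % 10}"
        history.append(action + " -> " + world.last_error)
        transcript_sizes.append(len(transcript_context(history)))
        slice_sizes.append(len(slice_context(world)))
    return transcript_sizes, slice_sizes


if __name__ == "__main__":
    t_sizes, s_sizes = simulate(50)
    print("transcript chars:", t_sizes[0], "->", t_sizes[-1])
    print("slice chars:     ", s_sizes[0], "->", s_sizes[-1])
```

In this toy, the transcript grows with every turn while the slice size settles once the recent-action tail is full; only the width of the action counter still changes.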
What it can do
sliceagent is an interactive terminal coding agent. Point it at a repo, describe the task in plain language, and it investigates, edits, and verifies.
- Edit code — create, modify, and refactor files. Edits are workspace-confined and reversible with `/undo`.
- Run commands — execute shell commands, launch background processes, and drive interactive terminals (REPLs, servers, `ssh`) through a sandbox — `local` by default, `docker` for full isolation.
- Investigate — grep and search the tree, read line-numbered context, and trace a bug from its live error. Deterministic; no embeddings, no index to go stale.
- Search the web — fetch a page or run a keyless search when a task needs current information.
- Delegate — fan out large, decomposable work to subagents, each on its own bounded slice, returning a summary instead of a transcript.
- Extend — add tools via MCP servers, prompt-packs via skills (`SKILL.md`), or full plugins — all through one registry.
- Remember across sessions — durable lessons are distilled and auto-surfaced when relevant (via memem); park a topic and `/resume` it later.
- Stay in control — three permission modes with a hard floor on catastrophic commands; secrets are scrubbed from anything it runs or logs.
How it works — the brain model
sliceagent's memory is organized like a brain: fast, lossy perception of the live world; a small working memory for the current task; a hippocampus that records what just happened; and a neocortex that distills durable lessons. Every turn reconstructs a bounded working set from these — it never replays a growing transcript.
| Region | Module | Role |
|---|---|---|
| Sensory cortex — live perception | `sensory_cortex.py` | Re-derives the world each turn: git state, project facts, repo map. Never stored or recalled. |
| Prefrontal cortex — working memory | `pfc.py` | The carried Slice: bounded, provenance-tagged state (findings, plan, change-set), sealed at each turn boundary. |
| Hippocampus — episodic memory | `hippocampus.py` | Losslessly records each turn; `recall_history` pages a specific past turn back in on demand. |
| Neocortex — long-term memory | `neocortex.py` | Distills successful episodes into durable cross-session lessons, auto-surfaced when relevant. |
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ PFC │ │ Sensory Cortex │ │ Hippocampus │ │ Neocortex │
│ pfc.py │ │ sensory_cortex.py │ │ hippocampus.py │ │ neocortex.py │
│ working memory │ │ live perception │ │ episodic memory │ │ durable lessons │
└─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘
│ │ │ │
└─────────────────────┴──────────┬──────────┴─────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────────┐
│ GLOBAL WORKSPACE — this turn's seed │
│ seed.py make_build_slice() / build() │
│ + prompt.py (SYSTEM_PROMPT, stable cache prefix) │
└────────────────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────┐
│ LLM turn │
│ tool calls accumulate within-turn │
└───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ PFC updated │
│ pfc.py slice_sink() folds events back │
└─────────────────────────────────────────┘
↻ next turn: the PFC slice carries forward —
everything else re-derives live from disk.
Each turn, seed.py faults in exactly what the turn references — the carried PFC slice, live sensory-cortex views, and any relevant neocortex lessons — and hands the model that bounded Seed. The model acts; observations fold back into working memory; at the turn boundary the episode is sealed into the hippocampus; on success, the neocortex consolidates it into a durable lesson. Net effect: per-turn context stays flat no matter how long the session runs.
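The turn cycle described above can be sketched in a few lines. This is a minimal model under stated assumptions: `ToySlice`, `perceive`, `recall`, and `run_session` are hypothetical stand-ins for the four regions, and the caps (three findings, two recalled lessons) are invented for illustration rather than taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class ToySlice:
    """Hypothetical stand-in for the carried PFC slice."""
    findings: list[str] = field(default_factory=list)
    max_findings: int = 3

    def fold(self, observation: str) -> None:
        # Keep only the newest findings so the carried state stays bounded.
        self.findings = (self.findings + [observation])[-self.max_findings:]


def perceive(workspace: dict[str, str]) -> str:
    # Sensory cortex: derived from the live workspace on every call, never cached.
    return f"files={sorted(workspace)}"


def recall(lessons: list[str], task: str, limit: int = 2) -> list[str]:
    # Neocortex read path: at most `limit` lessons that share a word with the task.
    words = set(task.lower().split())
    return [l for l in lessons if words & set(l.lower().split())][:limit]


def run_session(tasks, workspace, model):
    """Run one turn per task; return final state plus the seed size of each turn."""
    pfc, hippocampus, neocortex, seed_sizes = ToySlice(), [], [], []
    for task in tasks:
        # Global workspace: carried slice + live perception + relevant lessons.
        seed = " | ".join([
            perceive(workspace),
            f"findings={pfc.findings}",
            f"lessons={recall(neocortex, task)}",
        ])
        seed_sizes.append(len(seed))
        observation, success = model(task, seed)
        pfc.fold(observation)                    # observations fold back into working memory
        hippocampus.append((task, observation))  # episode sealed at the turn boundary
        if success:
            neocortex.append(f"lesson: {task} worked")  # consolidation on success
    return pfc, hippocampus, neocortex, seed_sizes


def fake_model(task: str, seed: str):
    # Offline stand-in for the LLM call: always "succeeds".
    return f"saw {task}", True
```

The episodic log and the lesson store both grow with the session, but the seed only ever includes a capped view of each, which is what keeps its size flat in this sketch.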
Status
Early, but the core bet is validated — see the measured head-to-head benchmarks below. The production build is Python and aligns with memem.
Install
One command — Linux, macOS, WSL2:
curl -fsSL https://raw.githubusercontent.com/TT-Wang/sliceagent/main/install.sh | sh
The installer handles everything: uv, its own Python 3.12, ripgrep, and sliceagent — in an isolated tool env, no sudo, no prerequisites, no conflicts with any Python you already have (conda base at 3.10? Rosetta-Intel conda on an M-series Mac? Doesn't matter). Then:
sliceagent # first run drops you straight into guided setup, then start chatting
Alternative: install from PyPI yourself (you manage the Python — needs ≥ 3.11)
uv tool install --python 3.12 "sliceagent[tui]" # uv — fetches Python itself
pipx install "sliceagent[tui]" # pipx
pip install "sliceagent[tui]" # plain pip (use a venv)
If pip refuses with Requires-Python >=3.11: conda create -n sliceagent python=3.12 -y && conda activate sliceagent, then pip install. ripgrep is recommended (code search degrades gracefully without it).
Footprint is light (no torch). pip install -e . works for a clone too. Homebrew / Docker arrive in v0.2.
Quickstart
sliceagent # that's it — first run walks you through setup (provider, API key, live-tested), then you're in
Setup happens once, in-process, and writes ~/.sliceagent/config.toml (0600) so every later run just starts. Re-configure anytime with sliceagent init (add/switch providers). Prefer env vars? Export both LLM_API_KEY and AGENT_MODEL (plus LLM_BASE_URL for non-OpenAI endpoints) — there is no default model; sliceagent never picks one for you. Discover every setting with sliceagent config --list.
→ Full walkthrough in QUICKSTART.md · CONTRIBUTING.md · CHANGELOG.md
Usage
Run sliceagent in your project and type what you want in plain language. It rebuilds its working context, investigates, edits (auto-applied or confirmed, per your mode), and can run your tests to verify. A turn looks like:
❯ why does retry_with_backoff drop the last attempt? fix it
🔍 grep "retry_with_backoff" 📖 read errors.py:40-72 ✎ edit errors.py
┌─ assistant ─────────────────────────────────────────────┐
│ The loop exits on `attempt == max` before the final │
│ sleep+retry, so the last attempt never runs. Changed the │
│ bound to `attempt <= max` and added a regression test. │
└──────────────────────────────────────────────────────────┘
✓ done · 4 steps · 6.1k tokens
Attach a file or path to your message with @: @src/errors.py explain the backoff.
In-session commands (type /help for the full list):
| Command | What it does |
|---|---|
| `/model` · `/reasoning` | switch model / reasoning effort (persists) |
| `/mode` | permission mode: baby-sitter (confirm each edit + command) · teenager (default; confirm risky ones) · let-it-go (auto-run all but catastrophic) |
| `/undo` | revert the last edit(s) |
| `/cwd <path>` | change the workspace root mid-session |
| `/cost` | tokens and estimated $ spent this session |
| `/skills` · `/tools` · `/mcp` · `/plugins` · `/agents` | list what's available to the agent |
| `/threads` · `/resume` | switch between, or resume, parked topics |
| `/learn <note>` | save a durable lesson yourself |
| `/plan` | draft a plan before it starts editing |
| `Ctrl-C` · `exit` | interrupt the turn · quit |
Configuration. sliceagent config --list prints every setting. Set them persistently in ~/.sliceagent/config.toml (written by init), or override any one via an environment variable:
| Setting | Default | Purpose |
|---|---|---|
| `AGENT_MODEL` | (required) | the model id to run |
| `AGENT_POLICY` | `teenager` | permission mode |
| `AGENT_SANDBOX` | `local` | `local` or `docker` (isolated) |
| `AGENT_MAX_STEPS` | `60` | per-turn step ceiling |
| `SLICEAGENT_VAULT` | `~/.sliceagent/vault` | where episodic memory + task state persist (cross-session memory is on by default) |
| `AGENT_VERIFY_CMD` | (unset) | test command used as the verification oracle |
Benchmarks
The bet — flat per-turn cost from reconstruction, at capability parity — is measured, not asserted. All runs use gpt-5.5.
The moat: per-turn input stays flat while a transcript grows. Head-to-head vs Kimi Code (a strong transcript-based agent) on hard multi-turn tasks:
| Scenario | sliceagent peak input | Kimi Code peak input | ratio |
|---|---|---|---|
| long-horizon debug | 7.5k | 64.5k | 8.6× |
| large-file bug | 7.7k | 37.0k | 4.8× |
| multi-file refactor | 5.9k | 28.2k | 4.8× |
Across a broader 22-scenario set: median peak input 10k (sliceagent) vs 23k (Kimi Code) — and sliceagent's per-turn input barely moves (2.6k → 7.5k over 50 steps) while the transcript climbs 16k → 64k.
Capability is at parity on these samples. 22/22 vs 21/22 passed on the parity set; on 3 SWE-bench Verified instances sliceagent resolved 1/3 (scored by the official harness); TerminalBench-core standalone accuracy 0.625 (N=16).
Same work, far fewer tokens. On SWE-bench Lite vs a transcript agent, same instances: 26 steps / 284k tokens vs 63 steps / 838k — ~2.4× fewer steps, ~3× fewer tokens (both resolved 0/3 — underdetermined instances, equal capability).
Numbers are small-N and honestly reported: the consistent, reproducible signal is the flat per-turn cost, not a capability leap. The win shows up in multi-turn real use (where a transcript grows), not single-turn SWE-bench (which structurally can't show it).
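The ratios quoted in this section follow directly from the reported figures. The snippet below recomputes them from the numbers in the tables and text above; it adds no new measurements.

```python
# Peak input per scenario, in thousands of tokens: (sliceagent, Kimi Code).
peak_input_k = {
    "long-horizon debug": (7.5, 64.5),
    "large-file bug": (7.7, 37.0),
    "multi-file refactor": (5.9, 28.2),
}

# Ratio column of the head-to-head table.
ratios = {name: round(other / ours, 1) for name, (ours, other) in peak_input_k.items()}

# SWE-bench Lite comparison: steps and tokens, transcript agent vs sliceagent.
step_ratio = round(63 / 26, 1)
token_ratio = round(838 / 284, 1)

# Growth over 50 steps, in thousands of tokens.
slice_growth_k = round(7.5 - 2.6, 1)
transcript_growth_k = 64 - 16

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio}x")
    print(f"steps: {step_ratio}x, tokens: {token_ratio}x")
    print(f"growth: slice +{slice_growth_k}k, transcript +{transcript_growth_k}k")
```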
Under the hood
The core is openai-free (only llm.py/cli.py import the SDK), so the whole loop is testable offline with a fake LLM. Layout under src/sliceagent/:
- moat: `pfc.py` (the `Slice` dataclass, typed tiers, `slice_sink`) + `seed.py` (the reconstruction seam `make_build_slice`) + `prompt.py` (`SYSTEM_PROMPT`), `loop.py` (`run_turn`/`run_step` — stateless core over contracts).
- contracts: `interfaces.py` (`LLMClient`/`ToolHost`/`Retriever`/`Oracle`), `events.py` (the loop's only output path), `hooks.py` (policy seam: `OracleHook`/`PermissionHook`/`BudgetHook`).
- engineering: `access.py` + `scheduler.py` (resource-conflict model → safe parallel tools), `errors.py` (error classification + retry/backoff), `sandbox.py` (execution backend), `policy.py` (permission chain).
- default impls: `tools.py` (`LocalToolHost`), `llm.py` (`OpenAILLM`), `code_index.py` (`RipgrepCodeIndex`) + `retriever.py` (`NullRetriever`), `oracle.py`, `cli.py` (event-sink host).
The loop dispatches events; the host composes sinks (slice-updater, durable log, terminal). Ships a local ToolHost (workspace-confined file ops + sandboxed shell) and a ripgrep-backed CodeIndex (falls back to NullRetriever when rg isn't on PATH).
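The "loop dispatches events, host composes sinks" arrangement can be pictured as follows. This is a sketch only: `Event`, `compose`, and the three sink functions are hypothetical names chosen for illustration, and the real `events.py` types may look quite different.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "error", "edit", "tool_result" (illustrative kinds)
    payload: str


Sink = Callable[[Event], None]


def compose(*sinks: Sink) -> Sink:
    """Return one sink that forwards each event to every sink, in order."""
    def dispatch(event: Event) -> None:
        for sink in sinks:
            sink(event)
    return dispatch


slice_state: dict[str, str] = {}
durable_log: list[str] = []
terminal_lines: list[str] = []


def slice_updater(event: Event) -> None:
    # Only the latest error is kept, so this sink's state stays bounded.
    if event.kind == "error":
        slice_state["last_error"] = event.payload


def durable_logger(event: Event) -> None:
    # Append-only record of everything the loop emitted.
    durable_log.append(f"{event.kind}\t{event.payload}")


def terminal(event: Event) -> None:
    terminal_lines.append(f"[{event.kind}] {event.payload}")


# The host wires the sinks together; the loop only ever calls `emit`.
emit = compose(slice_updater, durable_logger, terminal)
for ev in (
    Event("error", "NameError: foo"),
    Event("edit", "errors.py"),
    Event("error", "TypeError: bar"),
):
    emit(ev)
```

Because the loop's only output is the event stream, swapping the terminal sink for a test recorder needs no change to the loop itself.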
Safety (P1.5). Two independent layers:
- Safe execution (`tools.py` + `sandbox.py`): file ops are confined to the workspace root — path traversal out of it is rejected — and shell runs through a `Sandbox` backend. `BaseSandbox` owns output capping; backends implement `_exec()`: `LocalSandbox` (subprocess, cwd-confined, timeout, secret-env scrubbing so model-run commands can't read your API keys) and `DockerSandbox` (container — workspace bind-mounted same-path, network off by default, only configured env enters). Pick via `AGENT_SANDBOX` / `[sandbox]`. Code-as-action stays backend-portable via `sandbox.python_cmd`.
- Authorization (`policy.py`): an ordered `PolicyChain` behind the `PermissionHook`. Three modes via `AGENT_POLICY`, all of which block catastrophic commands (`rm -rf /`, `sudo`, `curl … | sh`, writes to `/etc`, key/cred reads, force-push): `teenager` (default — auto-applies file edits, asks before shell commands), `baby-sitter` (asks before every edit and command; "always" memorizes for the session), `let-it-go` (runs everything except the catastrophic floor). A non-interactive/headless run auto-proceeds on a confirm-mode (still catastrophic-gated); legacy `guard`/`ask`/`readonly`/`allow` still resolve. Hooks can also mutate via `prepare_messages` (inject context before the LLM call) and `transform_tool_result` (rewrite/redact output before it enters the slice).
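The two layers can be sketched independently. The code below is an illustration under assumptions, not the contents of `tools.py` or `policy.py`: `confine` and `decide` are hypothetical helpers, and the three regexes cover only a small part of the catastrophic list named above.

```python
import re
from pathlib import Path

# Illustrative patterns only; the documented catastrophic list is broader.
CATASTROPHIC = (
    r"\brm\s+-rf\s+/(\s|$)",
    r"\bsudo\b",
    r"\bcurl\b.*\|\s*sh\b",
)


def confine(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{requested!r} escapes the workspace root")
    return target


def decide(mode: str, kind: str, command: str = "") -> str:
    """Return "deny", "ask", or "allow" for an action of kind "edit" or "shell"."""
    # The catastrophic floor is checked first, so no mode can bypass it.
    if kind == "shell" and any(re.search(p, command) for p in CATASTROPHIC):
        return "deny"
    if mode == "let-it-go":
        return "allow"
    if mode == "baby-sitter":
        return "ask"
    if mode == "teenager":
        # Edits are auto-applied; shell commands need confirmation.
        return "allow" if kind == "edit" else "ask"
    raise ValueError(f"unknown mode: {mode}")
```

Checking the floor before the mode is the design point: the mode only decides between asking and allowing, and never gets a chance to allow a denied command.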
Subagents (spawn_subagent). The slice thesis applied recursively: for large, decomposable work the model delegates a self-contained sub-task to a child agent. The child runs its OWN loop with a fresh slice in the SAME workspace, then returns only a compact summary — the parent's slice never sees the child's transcript, so parent context stays bounded no matter how much the child did. It's a ToolHost wrapper (subagent.py), so the loop is unchanged (one tool call → a summary string); depth-capped (AGENT_SUBAGENT_DEPTH, default 1) against runaway recursion, and the child runs under the same permission policy. Verified live: the model delegated two modules to two children that produced correct code, with the parent slice holding only the two spawn_subagent summaries.
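A toy model of depth-capped delegation, assuming a hypothetical `run_agent` function: each agent keeps its own context list, and a parent records only the child's one-line summary. The three "local steps" per agent and the summary format are invented for the sketch.

```python
def run_agent(task: str, subtasks: list[str], depth: int = 0, max_depth: int = 1):
    """Return (summary, context); `context` is what this agent itself holds."""
    context = [f"task: {task}"]
    for sub in subtasks:
        if depth >= max_depth:
            # Depth cap reached: refuse to spawn instead of recursing further.
            context.append(f"refused to delegate {sub!r}: depth cap reached")
            continue
        # The child's context is discarded; only its summary crosses the boundary.
        child_summary, _child_context = run_agent(sub, [], depth + 1, max_depth)
        context.append(f"spawn_subagent -> {child_summary}")
    # Simulated local work, visible only in this agent's own context.
    context.extend(f"[{task}] step {i}" for i in range(3))
    return f"{task}: done in {len(context)} entries", context


if __name__ == "__main__":
    summary, parent_context = run_agent("refactor", ["module a", "module b"])
    print(summary)
    for entry in parent_context:
        print("  ", entry)
```

However many steps a child takes, the parent's context grows by exactly one entry per delegation.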
Code-as-action (execute_code). Beyond one-call-per-tool, the model can write a single Python script that performs many file/shell actions and prints one short result — collapsing N tool round-trips into one turn (the strongest context reducer). The script runs in the LocalSandbox (cwd-confined, secret-scrubbed, timed-out) with a no-import helper API (read_file/write_file/append_file/str_replace/list_files/run); the workspace is on sys.path so freshly-written modules import cleanly. Only stdout returns. Files it reads/edits via the helpers are folded back into the OPEN FILES working set (paths parsed from the script), so code-as-action coheres with the slice instead of bypassing it — the agent doesn't re-read what a script already touched. It carries the same trust level as run_command (arbitrary execution) and is gated by the same policy (readonly blocks it). RPC-back-to-parent for parent-only tools (memem/MCP) is the documented upgrade.
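One way the "paths parsed from the script" step could work is a static pass over the script's syntax tree. The sketch below is a guess at the technique, not the project's parser: `touched_paths` is a hypothetical name, and it assumes each file helper takes the path as its first positional string literal.

```python
import ast

# File helpers named in the text; `run` and `list_files` touch no single file.
FILE_HELPERS = {"read_file", "write_file", "append_file", "str_replace"}


def touched_paths(script: str) -> list[str]:
    """Collect literal path arguments passed to the file helpers in `script`."""
    paths = set()
    for node in ast.walk(ast.parse(script)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FILE_HELPERS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            paths.add(node.args[0].value)
    return sorted(paths)  # sorted so the result is deterministic


# Example script of the kind the model might write (helper signatures assumed).
script = '''
text = read_file("src/errors.py")
write_file("tests/test_errors.py", "def test_ok(): pass")
str_replace("src/errors.py", "attempt == max", "attempt <= max")
print(run("pytest -q"))
'''
```

A static pass like this misses paths built at runtime (for example, from a loop variable), so a real implementation would likely need a fallback for those.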
Extensions (MCP · skills · plugins). sliceagent extends through one tool registry that every source feeds:
- MCP (`mcp_client.py`): declare servers in `[mcp_servers.*]`; their tools appear as `mcp__server__tool` (official MCP SDK, stdio).
- Skills (`skills.py`): `SKILL.md` prompt-packs (see above) discovered from `.sliceagent/skills`.
- Plugins (`plugins.py`): a directory with `plugin.toml` + an `__init__.py` exposing `register(ctx)`. Through `ctx` a plugin contributes tools/skills/MCP-servers/hooks into the existing seams — no privileged surface; plugin tools run through the same sandbox + policy + scheduler. Discovered from `.sliceagent/plugins` (+ `[plugins].dirs`). See `examples/plugins/hello`.
Code-discovery tier (CodeIndex). code_index.py fills the RELATED CODE tier from a real repo: each turn it ripgreps the working tree for the identifiers in the task plus the current error (which usually names the missing symbol), ranks files by how many distinct query terms they hit, and returns line-numbered context windows — deterministic, no embeddings, no network. repo_map() gives a compact file→definitions skeleton for orientation (not folded into every turn, to keep context bounded). tree-sitter is the precision upgrade for definition extraction (drop-in at _defs_in()); v1 uses ripgrep + regex.
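The ranking rule (files ordered by how many distinct query terms they hit, with line-numbered windows) can be shown without ripgrep. The sketch below swaps in a plain in-memory regex scan over a dict of file contents; `rank_files` and its one-line context window are assumptions made for the example.

```python
import re


def rank_files(files: dict[str, str], terms: list[str], context: int = 1):
    """Rank files by distinct terms matched; return (path, score, window) tuples."""
    ranked = []
    for path, text in files.items():
        lines = text.splitlines()
        matched, first_hit = set(), None
        for term in set(terms):
            pattern = re.compile(rf"\b{re.escape(term)}\b")
            for i, line in enumerate(lines):
                if pattern.search(line):
                    matched.add(term)
                    first_hit = i if first_hit is None else min(first_hit, i)
                    break  # one hit per term is enough for scoring
        if matched:
            lo = max(0, first_hit - context)
            hi = min(len(lines), first_hit + context + 1)
            window = [f"{n + 1}: {lines[n]}" for n in range(lo, hi)]
            ranked.append((path, len(matched), window))
    # Sort by score (descending), then path, so the order is deterministic.
    ranked.sort(key=lambda r: (-r[1], r[0]))
    return ranked


files = {
    "a.py": "import time\n\ndef retry_with_backoff(fn):\n    raise TimeoutError('gave up')\n",
    "b.py": "from a import retry_with_backoff\n",
    "c.py": "x = 1\n",
}
ranked = rank_files(files, ["retry_with_backoff", "TimeoutError"])
```

Here `a.py` matches both terms and ranks first, `b.py` matches one, and `c.py` is dropped. Counting distinct terms rather than total hits keeps a file that repeats one identifier many times from outranking a file that contains both the symbol and the error.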
Memory tier (memem) — a closed read/write loop. memory.py plugs memem in as the cross-session Memory (the RELEVANT MEMORY tier). It's behind the Memory interface; memem indexes a curated lesson vault, not source code (code discovery is the separate CodeIndex above).
- Read: each task recalls relevant lessons via memem's hybrid retrieval into the slice.
- Write (`neocortex.py`): after a task succeeds, consolidation distills a durable lesson from what happened and `remember()`s it — so a future similar task recalls it. This is what makes sliceagent memory-native. It's an event sink, signal-dense by construction: it mines only a validated episode (a successful turn in which an error was hit and then cleared; no error or no success means no lesson), dedups within a session, and prints `💡 learned: …`. `AGENT_MINE=deterministic` (default — cheap, no extra LLM call) | `llm` (one-shot distillation for a crisper lesson) | `off`.
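The "validated episode" filter on the write path amounts to a small predicate plus deduplication. The sketch below assumes a hypothetical `Episode` record and an invented lesson format; it mirrors the deterministic mode (no LLM call) and is not the project's `neocortex.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Episode:
    task: str
    errors_seen: tuple          # errors hit during the turn, in order
    final_error: Optional[str]  # None once the last error has been cleared
    succeeded: bool


def is_validated(ep: Episode) -> bool:
    # Lesson-worthy only if an error was hit, then cleared, and the turn succeeded.
    return ep.succeeded and bool(ep.errors_seen) and ep.final_error is None


def consolidate(episodes: list) -> list:
    """Distill one lesson per validated episode, deduplicated within the session."""
    lessons, seen = [], set()
    for ep in episodes:
        if not is_validated(ep):
            continue
        lesson = f"'{ep.errors_seen[-1]}' during '{ep.task}' was cleared by the recorded fix"
        if lesson not in seen:
            seen.add(lesson)
            lessons.append(lesson)
    return lessons


session = [
    Episode("fix retry", ("IndexError",), None, True),            # validated
    Episode("fix retry", ("IndexError",), None, True),            # duplicate, dropped
    Episode("rename var", (), None, True),                        # no error, no lesson
    Episode("fix parser", ("ValueError",), "ValueError", False),  # not cleared
]
lessons = consolidate(session)
```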
Configure via sliceagent.toml (persistent; see sliceagent.toml.example) or env vars (one-off overrides). Precedence: env > project sliceagent.toml > user ~/.sliceagent/config.toml > default. Keys: AGENT_POLICY (baby-sitter/teenager/let-it-go), AGENT_MINE, AGENT_SUBAGENT_DEPTH, AGENT_MODEL, SLICEAGENT_VAULT (memory location), AGENT_VERIFY_CMD (tests as the Oracle), AGENT_MAX_TOKENS, SHOW_SLICE=1; plus [skills], [mcp_servers], [plugins] sections.
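The precedence order reads as a first-match lookup across four layers. A minimal sketch, assuming each layer has already been loaded into a plain dict; `resolve` is a hypothetical helper, and the defaults shown are the ones listed in the configuration table.

```python
# Defaults from the configuration table; AGENT_MODEL deliberately has none.
DEFAULTS = {"AGENT_POLICY": "teenager", "AGENT_SANDBOX": "local", "AGENT_MAX_STEPS": "60"}


def resolve(key: str, env: dict, project: dict, user: dict, defaults: dict = DEFAULTS) -> str:
    """Return the first value found: env > project file > user file > default."""
    for layer in (env, project, user, defaults):
        if key in layer:
            return layer[key]
    raise KeyError(f"{key} is not set in any layer and has no default")


env = {"AGENT_POLICY": "let-it-go"}          # one-off override
project = {"AGENT_POLICY": "baby-sitter", "AGENT_MAX_STEPS": "40"}
user = {"AGENT_MODEL": "some-model-id", "AGENT_MAX_STEPS": "80"}

policy = resolve("AGENT_POLICY", env, project, user)    # env wins
max_steps = resolve("AGENT_MAX_STEPS", env, project, user)  # project beats user
sandbox = resolve("AGENT_SANDBOX", env, project, user)  # falls through to default
```

With no layer providing `AGENT_MODEL`, the lookup raises instead of guessing, which matches the rule that there is no default model.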
Architecture (build / plug / integrate)
The discipline: own the thin differentiated core, keep the thick commodity periphery on well-known building blocks.
- Build (the moat): the slice loop, the typed memory tiers + per-tier compaction, the reconstruction. Plus thin glue: permission gate, verification orchestration, subagents, resume.
- Plug: memem as the retrieval + cross-session memory engine (behind a `Retriever` interface).
- Integrate: LLM SDKs, tree-sitter (repo map), ripgrep (search), a container sandbox, MCP (tool breadth), a TUI lib, SWE-bench (evals).
The differentiator, in one line
Deterministic reconstruction from ground truth — vs the incumbents' accumulate-then-LLM-summarize.
License
MIT — see LICENSE. Third-party components and their licenses are listed in NOTICE.
Security policy + threat model: SECURITY.md.
Acknowledgments
sliceagent's design was informed by two excellent open-source agents: Hermes (MIT) and Kimi Code. A few peripheral utilities are ported from Hermes (see NOTICE); most of the rest are patterns we studied and reimplemented on our own terms. With thanks to their authors.