A coding agent with a new context-engineering framework: bounded, deterministic, reconstructed context (slice architecture) — built for long-horizon work.

These details have not been verified by PyPI

Project description

sliceagent

A coding agent that reconstructs a small, exact working context every turn — instead of accumulating a chat transcript and summarizing it when it overflows.

That one change is the whole product. Because context is rebuilt from ground truth each turn rather than piled up:

Cheap at scale — per-turn input stays flat no matter how long the session runs; no grow-to-window sawtooth, so tokens (and cost) don't balloon on long tasks.
Less wall-clock — a small, stable prompt each turn means less to send and less to reason over; long, iterative work finishes faster.
No context rot, no compaction — every turn re-reads the live files and the last error verbatim, so nothing drifts into a lossy summary and there is no history to compact.

The field's default is bigger windows + summarize. sliceagent does the opposite: remember less, reconstruct precisely.

Pre-1.0: on 0.x, CLI flags, config keys, and APIs may change between releases; breaking changes are noted in the CHANGELOG.

Contents: How it works · Benchmark · Install & quickstart · Usage · License · Acknowledgements · Contact

How it works

sliceagent's memory is organized like a brain: fast, lossy perception of the live world; a small working memory for the current task; a hippocampus that records what just happened; and a neocortex that distills durable lessons. Every turn reconstructs a bounded working set from these — it never replays a growing transcript.

Region	Role
Sensory cortex — live perception	Re-derives the world each turn: git state, project facts, repo map. Never stored or recalled.
Prefrontal cortex — working memory	The carried Slice: bounded, provenance-tagged state (findings, plan, change-set), sealed at each turn boundary.
Hippocampus — episodic memory	Losslessly records each turn; pages a specific past turn back in on demand.
Neocortex — long-term memory	Distills successful episodes into durable cross-session lessons, auto-surfaced when relevant.

┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│        PFC        │ │  Sensory Cortex   │ │    Hippocampus    │ │     Neocortex     │
│  working memory   │ │  live perception  │ │  episodic memory  │ │  durable lessons  │
└─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘
          │                     │                     │                     │
          └─────────────────────┴──────────┬──────────┴─────────────────────┘
                                           ▼
      ┌────────────────────────────────────────────────────────────────────────┐
      │            GLOBAL WORKSPACE  —  this turn's reconstructed seed           │
      │        (carried slice + live views + relevant lessons + prompt)         │
      └────────────────────────────────────────────────────────────────────────┘
                                           ▼
                         ┌───────────────────────────────────┐
                         │             LLM turn              │
                         │ tool calls accumulate within-turn │
                         └───────────────────────────────────┘
                                           ▼
                      ┌─────────────────────────────────────────┐
                      │   PFC updated · turn sealed to memory   │
                      └─────────────────────────────────────────┘

  ↻  next turn: only the PFC slice carries forward —
     everything else re-derives live from disk.

Each turn faults in exactly what the turn references — the carried slice, live views, and any relevant lessons — and hands the model that bounded Seed. The model acts; observations fold back into working memory; at the turn boundary the episode is sealed into the hippocampus; on success the neocortex distills a durable lesson. Net effect: per-turn context stays flat no matter how long the session runs.

Benchmark

On public benchmarks, sliceagent matches Codex's solve rate while using 2.5× fewer tokens and 1.3× less cost on ColBench, and up to 149× smaller peak input on long sessions.

Two questions decide whether reconstructing context every turn actually works: does it stay as capable as a transcript agent, and does it keep per-turn cost flat as a session grows? All three benchmarks are head-to-head vs OpenAI Codex on the same model (gpt-5.5).

1. In-turn capability — Terminal-Bench 2.0 (public)

A TB2.0 task is a single turn, so it's a clean test of raw within-turn ability. On the 32 tasks both agents completed cleanly:

metric	sliceagent	OpenAI Codex
pass rate	18 / 32 (56%)	18 / 32 (56%)
wins (exclusive)	4	4
median steps / task	10	27

Dead even, 4 wins each — reconstruct-every-turn matches a state-of-the-art agent on in-turn tasks with no capability tax.

2. Multi-turn — ColBench (public: Meta SWEET-RL)

Collaborative coding over multiple rounds with a simulated human — the memory model does matter here. 20 backend tasks, both gpt-5.5 at high:

metric	sliceagent	OpenAI Codex	% of Codex
solved	20 / 20	20 / 20	parity
peak input · median	5,191	13,415	39%
peak input · mean	5,188	13,424	39%
input tokens · total	284k	760k	37%
↳ served from cache	50%	57%	—
output tokens · total	24.3k	8.8k	276%
total tokens	308k	769k	40%
rounds · median	3	3	—
wall · median/turn	26s	26s	100%
cost (cache-aware $)	$0.44	$0.55	80%

Same capability — 2.6× smaller per-turn context, 2.5× fewer tokens, 1.3× cheaper, at parity wall.

3. Long-horizon multi-turn — self-designed coding scenarios

Iterative coding sessions where a transcript really piles up. Each scenario is a fixed sequence of 6–10 dependent, pre-written turns revealed one at a time — later turns build on (and regress) earlier work, so no agent can one-shot them and byte-identical turns go to both agents. Both gpt-5.5 at high. Reproduce with benchmarks/run.py (the scenarios are in benchmarks/multiturn_coding/):

metric	sliceagent	OpenAI Codex	% of Codex
solved	3 / 3	3 / 3	parity
peak input · median	16,330	2,075,987	0.8%
peak input · mean	16,734	2,056,732	0.8%
input tokens · total	2.21M	26.4M	8%
↳ served from cache	84%	91%	—
output tokens · total	63.4k	355.4k	18%
total tokens	2.28M	26.7M	9%
wall · total	1,069s	1,761s	61%
cost (cache-aware $)	$1.30	$9.43	14%

Per task — note how the transcript agent's peak input scales with the session while the slice stays flat:

scenario	agent	solved	peak input	total tokens	wall
s1 long-horizon debug (6 turns)	sliceagent	✓	14,769	499k	257s
	Codex	✓	1,655,714	5.17M	465s
s2 dependency-DAG scheduler (10 turns)	sliceagent	✓	19,104	924k	461s
	Codex	✓	2,075,987	10.0M	656s
s3 interval-set algebra (10 turns)	sliceagent	✓	16,330	854k	351s
	Codex	✓	2,438,496	11.5M	641s

Same capability — 109–149× smaller peak context, 11.7× fewer tokens, 7.3× cheaper, 1.6× faster. On s3, Codex's transcript reached a 2.44M-token single-request peak while sliceagent held 16k — a 149× gap that widens the longer the session runs.

How the cost numbers are calculated (exact token counts × published rates)

Cache-aware, at gpt-5.5 list rates — $1.25 / 1M fresh input, $0.125 / 1M cached input (a 10× discount on the prompt prefix the provider serves from its cache), $10 / 1M output:

cost = fresh_in × $1.25/M  +  cached_in × $0.125/M  +  output × $10/M
       where  fresh_in = total input − cached input

Applied to the summed token counts from the runs above:

ColBench (N = 20, all tasks summed)

line item	sliceagent	OpenAI Codex
fresh input	140,403 × $1.25/M = $0.176	330,719 × $1.25/M = $0.413
cached input	143,125 × $0.125/M = $0.018	429,719 × $0.125/M = $0.054
output	24,265 × $10/M = $0.243	8,786 × $10/M = $0.088
total	$0.436	$0.555

Self-designed long-horizon (N = 3, summed)

line item	sliceagent	OpenAI Codex
fresh input	347,238 × $1.25/M = $0.434	2,293,552 × $1.25/M = $2.867
cached input	1,866,752 × $0.125/M = $0.233	24,075,264 × $0.125/M = $3.009
output	63,372 × $10/M = $0.634	355,365 × $10/M = $3.554
total	$1.301	$9.430

One honest wrinkle worth naming: Codex's append-only transcript actually earns a higher cache-hit rate (57% vs 50% on ColBench, 91% vs 84% here) — a long stable prefix caches well. It still costs more, because its raw input volume is an order of magnitude larger; a cheaper per-token rate can't outrun many more tokens. That's the whole point of the slice: fewer tokens to bill in the first place.

Line items are rounded to the nearest $0.001; each total is the exact sum of unrounded per-token costs, and matches the cost row in the tables above.

The pattern across all three: capability holds, and the cost gap grows with session length — exactly the flat-per-turn-cost thesis. "Solved" is solution correctness, scored identically for both agents. ColBench is public; the long-horizon scenarios are reproducible under benchmarks/.

These are early, small-scale results — modest task counts (N = 32 / 20 / 3), single trial per task, one model, one opponent. Treat them as a directional signal, not a settled claim. We're actively expanding to larger and more varied test sets, more trials, and more baselines, and will update these numbers as that work lands.

Install & quickstart

One command — Linux, macOS, WSL2:

curl -fsSL https://raw.githubusercontent.com/TT-Wang/sliceagent/main/install.sh | sh

Windows — one command in PowerShell, fully native (no WSL, no admin):

irm https://raw.githubusercontent.com/TT-Wang/sliceagent/main/install.ps1 | iex

(Installs uv + sliceagent, and Git Bash + ripgrep if you don't have them — the agent's shell commands run under Git Bash, same as other coding agents. Interactive PTY sessions (terminal_open) aren't available natively yet; everything else works. Prefer WSL2? The Linux one-liner above works there as-is.)

The installer handles everything: uv, its own Python 3.12, ripgrep, and sliceagent — in an isolated tool env, no sudo, no prerequisites, no conflict with any Python you already have (conda base at 3.10? Rosetta-Intel conda on an M-series Mac? Doesn't matter). Then just:

sliceagent          # first run drops you straight into guided setup, then start chatting

Setup happens once, in-process: pick a provider, paste your API key (shown as ******, live-tested), and it writes ~/.sliceagent/config.toml (0600) so every later run just starts. Re-configure anytime with /config in-session or sliceagent init. There is no default model — sliceagent never picks one for you.

Alternative: install from PyPI yourself (you manage the Python — needs ≥ 3.11)

uv tool install --python 3.12 "sliceagent[tui]"     # uv — fetches Python itself
pipx install "sliceagent[tui]"                      # pipx
pip install "sliceagent[tui]"                       # plain pip (use a venv)

If pip refuses with Requires-Python >=3.11: conda create -n sliceagent python=3.12 -y && conda activate sliceagent, then pip install. Prefer env vars over the wizard? Export both LLM_API_KEY and AGENT_MODEL (plus LLM_BASE_URL for non-OpenAI endpoints). ripgrep is recommended (code search degrades gracefully without it).

Footprint is light (no torch). pip install -e . works for a clone. Homebrew / Docker arrive in v0.2. → Full walkthrough in QUICKSTART.md.

Usage

Run sliceagent in your project and type what you want in plain language. It rebuilds its working context, investigates, edits (auto-applied or confirmed, per your mode), and can run your tests to verify. A turn looks like:

❯ why does retry_with_backoff drop the last attempt? fix it

  🔍 grep "retry_with_backoff"   📖 read errors.py:40-72   ✎ edit errors.py
  ┌─ assistant ─────────────────────────────────────────────┐
  │ The loop exits on `attempt == max` before the final      │
  │ sleep+retry, so the last attempt never runs. Changed the  │
  │ bound to `attempt <= max` and added a regression test.    │
  └──────────────────────────────────────────────────────────┘
  ✓ done · 4 steps · 6.1k tokens

Attach a file or path to your message with @: @src/errors.py explain the backoff.

In-session commands (type /help for the full list):

Command	What it does
`/config` · `/model` · `/reasoning`	add/switch providers · switch model / reasoning effort (persists)
`/mode`	permission mode: baby-sitter (confirm each edit + command) · teenager (default; confirm risky ones) · let-it-go (auto-run all but catastrophic)
`/undo`	revert the last edit(s)
`/cwd <path>`	change the workspace root mid-session
`/cost`	tokens and estimated $ spent this session
`/skills` · `/tools` · `/mcp` · `/plugins` · `/agents`	list what's available to the agent
`/threads` · `/resume`	switch between, or resume, parked topics
`/learn <note>`	save a durable lesson yourself
`/plan`	draft a plan before it starts editing
`Ctrl-C` · `exit`	interrupt the turn · quit

It can edit code (workspace-confined, reversible with /undo), run shell commands and interactive processes through a sandbox (local by default, docker for full isolation), search the tree and the web, delegate decomposable work to subagents (each on its own bounded slice), and remember lessons across sessions. Three permission modes gate it, all with a hard floor on catastrophic commands; secrets are scrubbed from anything it runs or logs.

Configuration. sliceagent config --list prints every setting. Set them persistently in ~/.sliceagent/config.toml (written by init), or override any one via an environment variable:

Setting	Default	Purpose
`AGENT_MODEL`	(required)	the model id to run
`AGENT_POLICY`	`teenager`	permission mode
`AGENT_SANDBOX`	`local`	`local` or `docker` (isolated)
`AGENT_MAX_STEPS`	`60`	per-turn step ceiling
`SLICEAGENT_VAULT`	`~/.sliceagent/vault`	where episodic memory + task state persist (cross-session memory is on by default)
`AGENT_VERIFY_CMD`	(unset)	test command used as the verification oracle

License

MIT — see LICENSE. Third-party components and their licenses are listed in NOTICE. Security policy + threat model: SECURITY.md.

Acknowledgements

sliceagent's design was informed by two excellent open-source agents: Hermes (MIT) and Kimi Code. A few peripheral utilities are ported from Hermes (see NOTICE); most of the rest are patterns we studied and reimplemented on our own terms. Cross-session memory is powered by memem. With thanks to their authors.

Contact

Questions, feedback, or ideas — open an issue or reach out: tongtao.wang@gmail.com. (Security reports: please follow SECURITY.md instead.)

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.12

Jul 3, 2026

0.1.11

Jul 2, 2026

0.1.10

Jul 2, 2026

0.1.9

Jul 2, 2026

0.1.8

Jul 2, 2026

0.1.7

Jul 2, 2026

0.1.6

Jul 2, 2026

0.1.5

Jul 2, 2026

0.1.4

Jul 2, 2026

0.1.3

Jul 2, 2026

0.1.2

Jul 2, 2026

0.1.1

Jul 2, 2026

0.1.0

Jul 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sliceagent-0.1.12.tar.gz (1.7 MB view details)

Uploaded Jul 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sliceagent-0.1.12-py3-none-any.whl (381.8 kB view details)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file sliceagent-0.1.12.tar.gz.

File metadata

Download URL: sliceagent-0.1.12.tar.gz
Upload date: Jul 3, 2026
Size: 1.7 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sliceagent-0.1.12.tar.gz
Algorithm	Hash digest
SHA256	`dbd39e139a28f2d62d1034f829bbd5f9d101353067906c6d84c3decf6b70cfb5`
MD5	`59acdf17169637aecedc8f9f2f8cf056`
BLAKE2b-256	`ee1218410a01ce7a2deb3549932e6a8afa7dc960efffddd4f3e2340f50535097`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sliceagent-0.1.12.tar.gz:

Publisher: publish.yml on TT-Wang/sliceagent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sliceagent-0.1.12.tar.gz
- Subject digest: dbd39e139a28f2d62d1034f829bbd5f9d101353067906c6d84c3decf6b70cfb5
- Sigstore transparency entry: 2060218356
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: TT-Wang/sliceagent@c82093be320cbbaa92068a918f4a933466029557
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/TT-Wang
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c82093be320cbbaa92068a918f4a933466029557
- Trigger Event: release

File details

Details for the file sliceagent-0.1.12-py3-none-any.whl.

File metadata

Download URL: sliceagent-0.1.12-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 381.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sliceagent-0.1.12-py3-none-any.whl
Algorithm	Hash digest
SHA256	`faccb166cf9678cdeecb5017f46f1afb7b8357aca949281ed8423fa435574abe`
MD5	`8396509bee930a116ee69841adeae255`
BLAKE2b-256	`a98809789e0346836a25dca2ac4c220104f3a4cc305f4f4b6de946c9816b0424`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sliceagent-0.1.12-py3-none-any.whl:

Publisher: publish.yml on TT-Wang/sliceagent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sliceagent-0.1.12-py3-none-any.whl
- Subject digest: faccb166cf9678cdeecb5017f46f1afb7b8357aca949281ed8423fa435574abe
- Sigstore transparency entry: 2060218450
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: TT-Wang/sliceagent@c82093be320cbbaa92068a918f4a933466029557
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/TT-Wang
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c82093be320cbbaa92068a918f4a933466029557
- Trigger Event: release

sliceagent 0.1.12

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

sliceagent

How it works

Benchmark

1. In-turn capability — Terminal-Bench 2.0 (public)

2. Multi-turn — ColBench (public: Meta SWEET-RL)

3. Long-horizon multi-turn — self-designed coding scenarios

Install & quickstart

Usage

License

Acknowledgements

Contact

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance