Drop-in CLI (and MCP server) for AI coding agents. Structural codebase navigation + persistent memory engine — 97% token savings.

⚡ Token Savior Recall

The MCP server that turns Claude into the only coding agent hitting 100% on a real benchmark. Structural code navigation + persistent memory. −77% active tokens. −76% wall time. Zero losses.

📖 mibayy.github.io/token-savior: project site + benchmark landing
🧪 github.com/Mibayy/tsbench: benchmark source + fixtures

Benchmark — 96 real coding tasks (tiny+v2 default)

Metric	Plain Claude Code	With Token Savior
Score	141 / 180 (78.3%)	192 / 192 (100.0%)
Active tokens / task	17 221	3 929 (−77%)
Wall time / task	110.6 s	26.6 s (−76%)
Wins / Ties / Losses	—	25 / 65 / 0 (90 paired)

Perfect (100%) across all 12 categories: audit, bug_fixing, code_generation, code_review, config_infra, data_analysis, documentation, explanation, git, navigation, refactoring, writing_tests. Zero losses against plain Claude: every task is a win or a tie.

The default config (TS_PROFILE=tiny_plus with 15 tools and a ~2.5 KT manifest, plus TS_CAPTURE_DISABLED=1 and the v2 system prompt that bans Agent sub-agent delegation) reproduces 100% on Opus 4.7 with −54% active tokens vs the legacy lean profile.

Also validated on Sonnet 4.6 (ts 170/180 = 94.4% vs base 156/180 = 86.7%).

Model: Claude Opus 4.7 · Methodology + per-task breakdown: mibayy.github.io/token-savior.

What it does

Claude Code reads whole files to answer questions about three lines, and forgets everything the moment a session ends. Token Savior Recall fixes both. It indexes your codebase by symbol — functions, classes, imports, call graph — so the model navigates by pointer instead of by cat. Measured reduction: 97% fewer chars injected across 170+ real sessions.
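To make "navigating by pointer" concrete, here is a minimal sketch that uses Python's standard `ast` module to map symbol names to line ranges, so a lookup returns a few lines instead of a whole file. It is an illustration only, not Token Savior's indexer; `index_symbols` and the sample source are hypothetical.

```python
import ast

def index_symbols(source: str) -> dict[str, tuple[str, int, int]]:
    """Map each function/class name to (kind, first_line, last_line)."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index[node.name] = ("function", node.lineno, node.end_lineno)
        elif isinstance(node, ast.ClassDef):
            index[node.name] = ("class", node.lineno, node.end_lineno)
    return index

# Hypothetical file content used only for this example.
SAMPLE = '''\
import os

class Mailer:
    def send_message(self, to, body):
        return f"{to}: {body}"

def compile(src):
    return src.strip()
'''

symbols = index_symbols(SAMPLE)
kind, start, end = symbols["send_message"]
# Return only the lines that belong to the symbol, not the whole file.
snippet = "\n".join(SAMPLE.splitlines()[start - 1:end])
```

A real index would also need imports, a call graph, and support for more than one language; the point here is only that a name resolves to a small span of text.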

On top of that sits a persistent memory engine. Every decision, bugfix, convention, guardrail and session rollup is stored in SQLite WAL + FTS5 + vector embeddings, ranked by Bayesian validity and ROI, and re-injected as a compact delta at the start of the next session. Contradictions are detected at save time; observations decay with explicit TTLs; a 3-layer progressive-disclosure contract keeps lookup cost bounded.
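The decay behaviour can be sketched with the per-type TTLs and the weighted score `0.4·recency + 0.3·access + 0.3·type` quoted in the Memory engine section. The README does not say how each term is normalized, so everything below is an assumption: recency falls linearly over the TTL, access is capped at 10, and the `TYPE_WEIGHT` values are invented for illustration.

```python
from dataclasses import dataclass

# Per-type TTLs in days, as listed in the README's Decay + TTL row.
TTL_DAYS = {"command": 60, "research": 90, "note": 60}
# Hypothetical per-type weights in [0, 1]; the README does not list them.
TYPE_WEIGHT = {"command": 0.3, "research": 0.6, "note": 0.5}

@dataclass
class Observation:
    kind: str
    age_days: float
    access_count: int

def is_expired(obs: Observation) -> bool:
    """An observation is expired once it is older than its type's TTL."""
    return obs.age_days > TTL_DAYS[obs.kind]

def retention_score(obs: Observation, access_cap: int = 10) -> float:
    """0.4*recency + 0.3*access + 0.3*type, each term scaled to [0, 1]."""
    recency = max(0.0, 1.0 - obs.age_days / TTL_DAYS[obs.kind])
    access = min(obs.access_count, access_cap) / access_cap
    return 0.4 * recency + 0.3 * access + 0.3 * TYPE_WEIGHT[obs.kind]

fresh = Observation("note", age_days=6, access_count=5)
stale = Observation("command", age_days=75, access_count=0)
```

Under these assumptions a recent, frequently read note outranks an old, unread command, which is the ordering the decay rule is meant to produce.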

Token savings

Operation	Plain Claude	Token Savior	Reduction
`find_symbol("send_message")`	41M chars (full read)	67 chars	−99.9%
`get_function_source("compile")`	grep + cat chain	4.5K chars	direct
`get_change_impact("LLMClient")`	impossible	16K chars (154 direct + 492 transitive)	new capability
`get_backward_slice(var, line)`	130 lines	12 lines	−92%
`memory_index` (Layer 1)	n/a	~15 tokens/result	Layer 1 shortlist
90-task tsbench (Opus base→ts)	17.2 KT active/task	3.9 KT active/task	−77%
tsbench score (Opus, 96 tasks)	141/180 (78.3%)	192/192 (100.0%)	+22 pts

Full benchmark methodology and per-task results: tsbench.

Memory engine

Capability	How it works
Storage	SQLite WAL + FTS5 + `sqlite-vec` (optional), 12 observation types
Hybrid search	BM25 + vector (`all-MiniLM-L6-v2`, 384d) fused via RRF, with graceful fallback to FTS
Progressive disclosure	3-layer contract: `memory_index` → `memory_search` → `memory_get`
Citation URIs	`ts://obs/{id}` — reusable across layers, agent-native pointers
Bayesian validity	Each obs carries a validity prior + update rule; stale obs are surfaced, not silently trusted
Contradiction detection	Triggered at save time against existing index; flagged in hook output
Decay + TTL	Per-type TTL (command 60d, research 90d, note 60d) + LRU scoring `0.4·recency + 0.3·access + 0.3·type`
Symbol staleness	Obs linked to symbols are invalidated when the symbol's content hash changes
ROI tracking	Access count × context weight — unused obs age out, high-ROI obs are promoted
MDL distillation	Minimum Description Length grouping compresses redundant observations into conventions
Auto-promotion	note ×5 accesses → convention; warning ×5 → guardrail
Hooks	8 Claude Code lifecycle hooks (SessionStart/Stop/End, PreCompact, PreToolUse ×2, UserPromptSubmit, PostToolUse)
Web viewer	`127.0.0.1:$TS_VIEWER_PORT` — htmx + SSE, opt-in
LLM auto-extraction	Opt-in `TS_AUTO_EXTRACT=1`: each PostToolUse tool call is distilled into 0-3 observations via a small-model call
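The hybrid search row fuses a BM25 ranking and a vector ranking with Reciprocal Rank Fusion (RRF). A minimal sketch of that fusion step follows; the constant `k = 60` is the value commonly used with RRF and is an assumption here, as are the observation IDs and the `rrf_fuse` helper.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical result lists from the two retrievers, best hit first.
bm25_hits = ["obs-12", "obs-7", "obs-3"]
vector_hits = ["obs-7", "obs-40", "obs-12"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

Because RRF uses only ranks, BM25 scores and cosine similarities never need to be put on a common scale, and a missing vector ranking simply drops out of the sum.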

vs claude-mem

Two projects share the goal — persistent memory for Claude Code. The axes below are measured, not marketing.

Axis	claude-mem	Token Savior Recall
Bayesian validity	no	yes
Contradiction detection at save	no	yes
Per-type decay + TTL	no	yes
Symbol staleness (content-hash linked obs)	no	yes
ROI tracking + auto-promotion	no	yes
MDL distillation into conventions	no	yes
Code graph / AST navigation	no	yes (90 tools, cross-language)
Progressive disclosure contract	no	yes (3 layers, ~15/60/200 tokens)
Hybrid FTS + vector search (RRF)	no	yes

Token Savior Recall is a superset: it ships the memory engine plus the structural codebase server that gave the project its name.

Install

uvx (no venv, no clone)

uvx token-savior-recall

pip

pip install "token-savior-recall[mcp]"
# Optional hybrid vector search:
pip install "token-savior-recall[mcp,memory-vector]"

Claude Code one-liner

claude mcp add token-savior -- /path/to/venv/bin/token-savior

Development

git clone https://github.com/Mibayy/token-savior
cd token-savior
python3 -m venv .venv
.venv/bin/pip install -e ".[mcp,dev]"
pytest tests/ -q

Configure

{
  "mcpServers": {
    "token-savior-recall": {
      "command": "/path/to/venv/bin/token-savior",
      "env": {
        "WORKSPACE_ROOTS": "/path/to/project1,/path/to/project2",
        "TOKEN_SAVIOR_CLIENT": "claude-code"
      }
    }
  }
}

Optional env: TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID (critical-observation feed), TS_VIEWER_PORT (web viewer), TS_AUTO_EXTRACT=1 + TS_API_KEY (LLM auto-extraction), TOKEN_SAVIOR_PROFILE (full / core / nav / lean / ultra — filters advertised tool set to shrink the per-turn MCP manifest).

Tools (90)

Category counts — full catalog is served via MCP tools/list.

Category	Count
Core navigation	14
Dependencies & graph	9
Git & diffs	5
Safe editing	8
Checkpoints	6
Test & run	6
Config & quality	8
Docker & multi-project	2
Advanced context (slicing, packing, RWR, prefetch, verify)	6
Memory engine	21
Reasoning (plan/decision traces)	3
Stats, budget, health	10
Project management	7

Profiles

TOKEN_SAVIOR_PROFILE filters the advertised tools/list payload while keeping handlers live.

Profile	Advertised	~Tokens	Use case
`auto` (v3.4 — recommended)	15-18	~2 500	Adaptive manifest sized from your real telemetry
`full` (current default)	68	~8 770	All capabilities, debug, power users
`code_mode` (v3.2)	4	~1 500	Multi-tool chains in one `ts_execute` JS sandbox
~~`core`, `nav`, `lean`, `ultra`, `tiny`, `tiny_plus`~~	—	—	Deprecated in v3.4, removed in v4.0 — use `auto`
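The idea of filtering the advertised list while keeping handlers live can be sketched as two separate lookups: one that decides what appears in a tools/list reply, and one that dispatches calls regardless of profile. The registry, the `nav_only` profile name, and both helper functions are hypothetical; real tool names are borrowed only to keep the example readable.

```python
from typing import Callable

# Hypothetical handler registry standing in for the server's real one.
HANDLERS: dict[str, Callable[..., str]] = {
    "find_symbol": lambda name: f"pointer to {name}",
    "get_dependents": lambda name: f"dependents of {name}",
    "memory_index": lambda query: f"memory rows for {query}",
}

# Hypothetical mapping from profile name to advertised tool names.
PROFILES = {
    "full": set(HANDLERS),
    "nav_only": {"find_symbol"},
}

def list_tools(profile: str) -> list[str]:
    """Advertise only the profile's tools (what a tools/list reply would carry)."""
    return sorted(name for name in HANDLERS if name in PROFILES[profile])

def call_tool(name: str, *args: str) -> str:
    """Dispatch ignores the profile, so unadvertised handlers stay callable."""
    return HANDLERS[name](*args)

advertised = list_tools("nav_only")
hidden_result = call_tool("get_dependents", "update_user")
```

The manifest shrinks because only `advertised` is sent to the model each turn, while a tool discovered later can still be invoked by name.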

Bench-mode env vars

For benchmark / cold-start workloads where memory and capture sandboxing add no value, pair the profile with these env vars:

export TOKEN_SAVIOR_PROFILE=lean      # or 'tiny' for max trim
export TS_MEMORY_DISABLE=1            # hide memory_* (-300 t)
export TS_CAPTURE_DISABLED=1          # hide capture_*, skip PostToolUse hook
export TS_HOOK_MINIMAL=1              # SessionStart emits Memory Index only
export TS_NO_HINTS=1                  # drop _hints / _suggestion (~30-50 t/call)

Measured on tsbench (90 tasks, Claude Opus 4.7):

Configuration	Active tokens / task	Score
Plain agent (Read/Grep/Bash, no Token Savior)	17 221	78.3 %
`lean` profile (default since v2.9)	8 928	100.0 %
`lean` + the other 4 env vars above	~5 500	100.0 %

Defer-loading via `ts_search`

When the manifest budget is the bottleneck, the new tiny profile exposes only 6 tools (switch_project, find_symbol, get_function_source, get_full_context, search_codebase, ts_search). The other ~60 tools are reachable just-in-time via:

ts_search(query="find dependents of update_user", top_k=5)
# → {"matched_tools": [{"name": "get_dependents", "score": 0.68, ...}, ...]}

Embeddings (Nomic 768d) score every tool description against the query; top-K candidates come back with their full inputSchema so the next turn can call them directly. Mirrors the Tool Attention paper (47.3k → 2.4k tokens / turn at 120 tools, −95 % prefix).
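The scoring step can be sketched as a cosine-similarity top-K over tool description vectors. The 3-dimensional vectors below are toy stand-ins for real 768-d embeddings, and `search_tools` is a hypothetical helper rather than the actual `ts_search` implementation.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy 3-d vectors standing in for real description embeddings (assumption).
TOOL_VECTORS = {
    "get_dependents": [0.9, 0.1, 0.0],
    "get_change_impact": [0.7, 0.3, 0.1],
    "memory_search": [0.0, 0.2, 0.9],
}

def search_tools(query_vec: list[float], top_k: int = 2) -> list[dict]:
    """Score every tool against the query and keep the top_k best matches."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in TOOL_VECTORS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [{"name": name, "score": round(score, 2)} for name, score in scored[:top_k]]

matches = search_tools([1.0, 0.0, 0.0])
```

A real response would also attach each tool's inputSchema so the model can call the match on the next turn without another lookup.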

Code Mode — collapse multi-tool chains into one JS sandbox

TOKEN_SAVIOR_PROFILE=code_mode exposes just 4 tools (ts_execute, ts_search, switch_project, list_projects) and lets the model write a JS body that calls 34 internal Token Savior tools through a typed facade. This replaces the standard find_symbol → get_function_source → get_dependents chain (3 round trips) with a single tool call.

# Step 1: discover signatures on demand
ts_search(query="locate symbol and find callers", format="ts")
# → matched_tools: [
#     {"name":"find_symbol", "signature":"find_symbol: (args?: { name?: string; ... }) => Promise<unknown>"},
#     {"name":"get_dependents", "signature":"get_dependents: (args: { name: string; ... }) => Promise<unknown>"},
#   ]

# Step 2: chain them in one round-trip
ts_execute(script="""
  const sym = await tools.find_symbol({ name: "process_payment" });
  const callers = await tools.get_dependents({ name: sym.symbol });
  return { sym, callers };
""")
# → {"value": {...}, "logs": [...], "tool_calls": 2, "duration_ms": 52}

Adapted from Cloudflare's Code Mode for MCP. Sandbox is a Node subprocess with stdio IPC. Each script runs in an isolated context, ~50 ms cold spawn, configurable timeout. Disable entirely with TS_CODE_MODE_DISABLE=1.

Anthropic API users — pair with native context management

For long agent loops, combine Token Savior with Anthropic's native context primitives (Claude API ≥ 2025-09-19):

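import anthropic

# Opt in to the context-management beta via request headers.
# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic(default_headers={
    "anthropic-beta": "context-management-2025-06-27,clear-tool-uses-2025-09-19",
})
# context_management is a beta parameter, so the call goes through client.beta.messages.
resp = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,  # required by the Messages API; pick a limit that suits your loop
    context_management={"edits": [{
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": 30_000},  # start clearing past 30k input tokens
        "keep": {"type": "tool_uses", "value": 4},              # always keep the 4 most recent tool uses
        "exclude_tools": ["replace_symbol_source", "edit_lines_in_symbol"],  # never clear edit results
    }]},
    tools=[...],     # your tool definitions
    messages=[...],  # the conversation so far
)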

Anthropic's cookbook measures −48 % peak context with clearing alone on long agent loops.

Progressive disclosure — memory search

Three layers, increasing cost. Always start at Layer 1. Escalate only if the previous layer paid off. Full contract: docs/progressive-disclosure.md.

Layer	Tool	Tokens/result	When
1	`memory_index`	~15	Always first
2	`memory_search`	~60	If Layer 1 matched
3	`memory_get`	~200	If Layer 2 confirmed

Each Layer 1 row ends with [ts://obs/{id}] — pass it straight to Layer 3.
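The escalation rule can be sketched as a function that stops at the first layer that fails to pay off. The in-memory `STORE` and the three stub functions below are hypothetical stand-ins that only share names with the real tools; the one detail taken from the contract is that a Layer 1 row ends with `[ts://obs/{id}]` and that URI is handed to Layer 3.

```python
import re

# Hypothetical in-memory store standing in for the three memory tools.
STORE = {
    41: {"title": "Use WAL mode for SQLite", "summary": "Decision: enable WAL.",
         "body": "Full rationale, links and follow-ups would live here."},
}

def memory_index(query: str) -> list[str]:
    """Layer 1: one short row per hit, ending with a citation URI."""
    return [f"{o['title']} [ts://obs/{i}]" for i, o in STORE.items()
            if query.lower() in o["title"].lower()]

def memory_search(query: str) -> list[dict]:
    """Layer 2: hits with a summary; costs more tokens than Layer 1."""
    return [{"id": i, "summary": o["summary"]} for i, o in STORE.items()
            if query.lower() in o["title"].lower()]

def memory_get(uri: str) -> dict:
    """Layer 3: the full observation, addressed by its ts://obs/{id} URI."""
    obs_id = int(re.fullmatch(r"ts://obs/(\d+)", uri).group(1))
    return STORE[obs_id]

def recall(query: str):
    rows = memory_index(query)
    if not rows:                    # Layer 1 missed: stop and pay nothing more
        return None
    if not memory_search(query):    # Layer 2 did not confirm: stop
        return None
    uri = re.search(r"\[(ts://obs/\d+)\]$", rows[0]).group(1)
    return memory_get(uri)

hit = recall("wal")
miss = recall("kubernetes")
```

With the per-result costs from the table, a miss costs nothing beyond the Layer 1 call, while a confirmed hit pays roughly 15 + 60 + 200 tokens for one observation.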

License

MIT — see LICENSE.

Works with any MCP-compatible AI coding tool. Claude Code · Cursor · Codex CLI · Antigravity · Cline · Continue · Windsurf · Aider · Gemini CLI · Copilot CLI · Zed · any custom MCP client
