
Structural codebase MCP server with persistent memory engine — 97% token savings + cross-session recall for Claude Code


⚡ Token Savior Recall

The MCP server that turns Claude into the only coding agent hitting 100% on a real benchmark. Structural code navigation + persistent memory. −77% active tokens. −76% wall time. Zero losses.


📖 mibayy.github.io/token-savior — project site + benchmark landing 🧪 github.com/Mibayy/tsbench — benchmark source + fixtures


Benchmark — 96 real coding tasks (tiny+v2 default)

| | Plain Claude Code | With Token Savior |
| --- | --- | --- |
| Score | 141 / 180 (78.3%) | 192 / 192 (100.0%) |
| Active tokens / task | 17 221 | 3 929 (−77%) |
| Wall time / task | 110.6 s | 26.6 s (−76%) |
| Wins / Ties / Losses | 25 / 65 / 0 (90 paired) | |

Perfect (100%) across all 12 categories: audit, bug_fixing, code_generation, code_review, config_infra, data_analysis, documentation, explanation, git, navigation, refactoring, writing_tests. Zero losses against plain Claude — every task is a win or a tie.

The default config — TS_PROFILE=tiny_plus (15 tools, ~2.5 KT manifest)

  • TS_CAPTURE_DISABLED=1 + the v2 system prompt that bans Agent sub-agent delegation — reproduces 100% on Opus 4.7 with −54% active tokens vs the legacy lean profile.

Also validated on Sonnet 4.6 (ts 170/180 = 94.4% vs base 156/180 = 86.7%).

Model: Claude Opus 4.7 · Methodology + per-task breakdown: mibayy.github.io/token-savior.


What it does

Claude Code reads whole files to answer questions about three lines, and forgets everything the moment a session ends. Token Savior Recall fixes both. It indexes your codebase by symbol — functions, classes, imports, call graph — so the model navigates by pointer instead of by cat. Measured reduction: 97% fewer chars injected across 170+ real sessions.

On top of that sits a persistent memory engine. Every decision, bugfix, convention, guardrail and session rollup is stored in SQLite WAL + FTS5 + vector embeddings, ranked by Bayesian validity and ROI, and re-injected as a compact delta at the start of the next session. Contradictions are detected at save time; observations decay with explicit TTLs; a 3-layer progressive-disclosure contract keeps lookup cost bounded.
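The hybrid ranking step can be sketched with Reciprocal Rank Fusion — a minimal illustration under the stated design (BM25 list + vector list fused via RRF), not the server's actual implementation; `rrf_fuse` and its inputs are hypothetical names:

```python
def rrf_fuse(bm25_ranked, vector_ranked, k=60):
    """Reciprocal Rank Fusion over two ranked lists of observation ids.

    Each id scores sum(1 / (k + rank)) across the lists it appears in,
    so items ranked well by both retrievers float to the top.
    """
    scores = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, obs_id in enumerate(ranked, start=1):
            scores[obs_id] = scores.get(obs_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks near the top of both lists, so it edges out "a":
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])  # → ["b", "a", "d", "c"]
```

If the vector side is unavailable (no sqlite-vec), passing an empty second list degrades gracefully to the BM25 ordering.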


Token savings

| Operation | Plain Claude | Token Savior | Reduction |
| --- | --- | --- | --- |
| `find_symbol("send_message")` | 41M chars (full read) | 67 chars | −99.9% |
| `get_function_source("compile")` | grep + cat chain | 4.5K chars | direct |
| `get_change_impact("LLMClient")` | impossible | 16K chars (154 direct + 492 transitive) | new capability |
| `get_backward_slice(var, line)` | 130 lines | 12 lines | −92% |
| `memory_index` (Layer 1) | n/a | ~15 tokens/result | Layer 1 shortlist |
| 90-task tsbench (Opus base→ts) | 17.2 KT active/task | 3.9 KT active/task | −77% |
| tsbench score (Opus, 96 tasks) | 141/180 (78.3%) | 192/192 (100.0%) | +22 pts |

Full benchmark methodology and per-task results: tsbench.


Memory engine

| Capability | How it works |
| --- | --- |
| Storage | SQLite WAL + FTS5 + sqlite-vec (optional), 12 observation types |
| Hybrid search | BM25 + vector (all-MiniLM-L6-v2, 384d) fused via RRF; graceful FTS fallback |
| Progressive disclosure | 3-layer contract: `memory_index` → `memory_search` → `memory_get` |
| Citation URIs | `ts://obs/{id}` — reusable across layers, agent-native pointers |
| Bayesian validity | Each observation carries a validity prior + update rule; stale observations are surfaced, not silently trusted |
| Contradiction detection | Triggered at save time against the existing index; flagged in hook output |
| Decay + TTL | Per-type TTL (command 60d, research 90d, note 60d) + LRU scoring 0.4·recency + 0.3·access + 0.3·type |
| Symbol staleness | Observations linked to symbols are invalidated when the symbol's content hash changes |
| ROI tracking | Access count × context weight — unused observations age out, high-ROI observations are promoted |
| MDL distillation | Minimum Description Length grouping compresses redundant observations into conventions |
| Auto-promotion | note ×5 accesses → convention; warning ×5 → guardrail |
| Hooks | 8 Claude Code lifecycle hooks (SessionStart/Stop/End, PreCompact, PreToolUse ×2, UserPromptSubmit, PostToolUse) |
| Web viewer | 127.0.0.1:$TS_VIEWER_PORT — htmx + SSE, opt-in |
| LLM auto-extraction | Opt-in `TS_AUTO_EXTRACT=1` — PostToolUse tool uses are extracted into 0–3 observations via a small-model call |
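The LRU scoring rule above (0.4·recency + 0.3·access + 0.3·type) can be sketched as follows. The weights and per-type TTLs come from the table; the normalisation choices and type weights are illustrative assumptions, not the engine's exact code:

```python
# Per-type TTLs (days) from the table above; type weights are assumed.
TTL_DAYS = {"command": 60, "research": 90, "note": 60}
TYPE_WEIGHT = {"guardrail": 1.0, "convention": 0.9, "note": 0.5}

def retention_score(obs_type, age_days, access_count, max_access=50):
    """0.4·recency + 0.3·access + 0.3·type, each term normalised to [0, 1]."""
    ttl = TTL_DAYS.get(obs_type, 60)
    recency = max(0.0, 1.0 - age_days / ttl)      # 1.0 when fresh, 0.0 at TTL
    access = min(access_count / max_access, 1.0)  # capped access frequency
    type_w = TYPE_WEIGHT.get(obs_type, 0.5)
    return 0.4 * recency + 0.3 * access + 0.3 * type_w

# A fresh, frequently-read note outscores a stale, unread one:
fresh = retention_score("note", age_days=1, access_count=20)
stale = retention_score("note", age_days=55, access_count=0)
```

Low-scoring observations are the ones that age out; high scorers are candidates for promotion.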

vs claude-mem

Two projects share the goal — persistent memory for Claude Code. The axes below are measured, not marketing.

| Axis | claude-mem | Token Savior Recall |
| --- | --- | --- |
| Bayesian validity | no | yes |
| Contradiction detection at save | no | yes |
| Per-type decay + TTL | no | yes |
| Symbol staleness (content-hash linked obs) | no | yes |
| ROI tracking + auto-promotion | no | yes |
| MDL distillation into conventions | no | yes |
| Code graph / AST navigation | no | yes (90 tools, cross-language) |
| Progressive disclosure contract | no | yes (3 layers, ~15/60/200 tokens) |
| Hybrid FTS + vector search (RRF) | no | yes |

Token Savior Recall is a superset: it ships the memory engine plus the structural codebase server that gave the project its name.


Install

uvx (no venv, no clone)

```shell
uvx token-savior-recall
```

pip

```shell
pip install "token-savior-recall[mcp]"
# Optional hybrid vector search:
pip install "token-savior-recall[mcp,memory-vector]"
```

Claude Code one-liner

```shell
claude mcp add token-savior -- /path/to/venv/bin/token-savior
```

Development

```shell
git clone https://github.com/Mibayy/token-savior
cd token-savior
python3 -m venv .venv
.venv/bin/pip install -e ".[mcp,dev]"
pytest tests/ -q
```

Configure

```json
{
  "mcpServers": {
    "token-savior-recall": {
      "command": "/path/to/venv/bin/token-savior",
      "env": {
        "WORKSPACE_ROOTS": "/path/to/project1,/path/to/project2",
        "TOKEN_SAVIOR_CLIENT": "claude-code"
      }
    }
  }
}
```

Optional env:

  • TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID — critical-observation feed
  • TS_VIEWER_PORT — web viewer
  • TS_AUTO_EXTRACT=1 + TS_API_KEY — LLM auto-extraction
  • TOKEN_SAVIOR_PROFILE — full / core / nav / lean / ultra; filters the advertised tool set to shrink the per-turn MCP manifest


Tools (90)

Category counts — full catalog is served via MCP tools/list.

| Category | Count |
| --- | --- |
| Core navigation | 14 |
| Dependencies & graph | 9 |
| Git & diffs | 5 |
| Safe editing | 8 |
| Checkpoints | 6 |
| Test & run | 6 |
| Config & quality | 8 |
| Docker & multi-project | 2 |
| Advanced context (slicing, packing, RWR, prefetch, verify) | 6 |
| Memory engine | 21 |
| Reasoning (plan/decision traces) | 3 |
| Stats, budget, health | 10 |
| Project management | 7 |

Profiles

TOKEN_SAVIOR_PROFILE filters the advertised tools/list payload while keeping handlers live.

| Profile | Advertised | ~Tokens | Use case |
| --- | --- | --- | --- |
| full (default) | 67 | ~8 770 | All capabilities |
| core | 54 | ~5 800 | Daily coding, no memory engine |
| nav | 28 | ~3 100 | Read-only exploration |
| lean | 52 | ~6 940 | Memory engine off — used in tsbench v2 |
| ultra | 31 | ~4 250 | Hot tools + ts_extended meta-tool |
| tiny (new) | 6 | ~1 070 | Defer-loading via ts_search (Tool Attention pattern) |

Bench-mode env vars

For benchmark / cold-start workloads where memory and capture sandboxing add no value, pair the profile with these env vars:

```shell
export TOKEN_SAVIOR_PROFILE=lean      # or 'tiny' for max trim
export TS_MEMORY_DISABLE=1            # hide memory_* (-300 t)
export TS_CAPTURE_DISABLED=1          # hide capture_*, skip PostToolUse hook
export TS_HOOK_MINIMAL=1              # SessionStart emits Memory Index only
export TS_NO_HINTS=1                  # drop _hints / _suggestion (~30-50 t/call)
```

Measured on tsbench (90 tasks, Claude Opus 4.7):

| Configuration | Active tokens / task | Score |
| --- | --- | --- |
| Plain agent (Read/Grep/Bash, no Token Savior) | 17 221 | 78.3% |
| lean profile (default since v2.9) | 8 928 | 100.0% |
| lean + the 5 env vars above | ~5 500 | 100.0% |

Defer-loading via ts_search

When the manifest budget is the bottleneck, the new tiny profile exposes only 6 tools (switch_project, find_symbol, get_function_source, get_full_context, search_codebase, ts_search). The remaining ~60 tools are reachable just-in-time via:

```python
ts_search(query="find dependents of update_user", top_k=5)
# → {"matched_tools": [{"name": "get_dependents", "score": 0.68, ...}, ...]}
```

Embeddings (Nomic 768d) score every tool description against the query; the top-K candidates come back with their full inputSchema so the next turn can call them directly. This mirrors the Tool Attention paper (47.3k → 2.4k tokens/turn at 120 tools, −95% prefix).
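The defer-loading pattern itself is simple to sketch: embed every tool description once, then rank by cosine similarity at query time. This toy version uses hand-made 3-d vectors in place of the real 768-d Nomic embeddings; the tool names and schemas are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search_tools(query_vec, tool_index, top_k=5):
    """Rank tool descriptions by similarity to the query and return the
    top-k with their full inputSchema, ready to call on the next turn."""
    scored = sorted(
        ((cosine(query_vec, vec), name, schema)
         for name, (vec, schema) in tool_index.items()),
        key=lambda t: t[0], reverse=True)
    return [{"name": n, "score": round(s, 2), "inputSchema": sch}
            for s, n, sch in scored[:top_k]]

# Toy "embeddings" standing in for real ones:
index = {
    "get_dependents": ([0.9, 0.1, 0.0], {"symbol": "string"}),
    "run_tests":      ([0.0, 0.2, 0.9], {"path": "string"}),
}
hits = search_tools([0.8, 0.2, 0.1], index, top_k=1)
```

Only the matched schemas enter the context, which is where the manifest savings come from.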

Anthropic API users — pair with native context management

For long agent loops, combine Token Savior with Anthropic's native context primitives (Claude API ≥ 2025-09-19):

```python
import anthropic

client = anthropic.Anthropic(default_headers={
    "anthropic-beta": "context-management-2025-06-27,clear-tool-uses-2025-09-19",
})
resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    context_management={"edits": [{
        "type": "clear_tool_uses_20250919",
        # Clear old tool results once the input crosses 30K tokens…
        "trigger": {"type": "input_tokens", "value": 30_000},
        # …keeping the 4 most recent tool uses,
        "keep": {"type": "tool_uses", "value": 4},
        # and never clearing results from the editing tools.
        "exclude_tools": ["replace_symbol_source", "edit_lines_in_symbol"],
    }]},
    tools=[...],
    messages=[...],
)
```

Anthropic's cookbook measures −48 % peak context with clearing alone on long agent loops.


Progressive disclosure — memory search

Three layers, increasing cost. Always start at Layer 1. Escalate only if the previous layer paid off. Full contract: docs/progressive-disclosure.md.

| Layer | Tool | Tokens/result | When |
| --- | --- | --- | --- |
| 1 | `memory_index` | ~15 | Always first |
| 2 | `memory_search` | ~60 | If Layer 1 matched |
| 3 | `memory_get` | ~200 | If Layer 2 confirmed |

Each Layer 1 row ends with [ts://obs/{id}] — pass it straight to Layer 3.
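A client-side escalation loop over the three layers might look like this. The three callables stand in for the MCP tools; their signatures and return shapes are assumptions for illustration, not the server's actual API:

```python
def progressive_lookup(query, memory_index, memory_search, memory_get):
    """Escalate Layer 1 → 2 → 3 only while the cheaper layer keeps paying off."""
    # Layer 1 (~15 tokens/result): cheap shortlist, always consulted first.
    shortlist = [row for row in memory_index() if query in row["summary"]]
    if not shortlist:
        return []  # nothing matched — stop here, spend no more tokens
    # Layer 2 (~60 tokens/result): only because Layer 1 matched.
    hits = memory_search(query)
    # Layer 3 (~200 tokens/result): fetch full observations by citation URI.
    return [memory_get(h["uri"]) for h in hits if h["confirmed"]]
```

The ts://obs/{id} URIs from Layer 1 are what flow into the memory_get calls, so no intermediate lookup is needed.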



License

MIT — see LICENSE.

Works with any MCP-compatible AI coding tool. Claude Code · Cursor · Codex CLI · Antigravity · Cline · Continue · Windsurf · Aider · Gemini CLI · Copilot CLI · Zed · any custom MCP client

