Bridge-first CLI and runtime for Claude Code that compresses conversations and exposes savings and fallback health
Tok
Tok is designed to reduce LLM token costs on longer sessions, without changing how you use Claude Code once it's routed through the local bridge.
Savings come primarily from input token compression (prompt/context optimization) with additional savings from response compression. Short sessions (< 8 turns) default to baseline mode since compression overhead exceeds savings. Since most providers charge different rates for input vs output tokens, your actual cost reduction depends on your provider's pricing structure and session length.
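As a rough illustration of why the input/output split matters, the sketch below estimates session cost with and without compression. The per-million-token rates, token counts, and compression ratios are hypothetical placeholders. They are not Tok measurements or any provider's actual prices, and the function names are illustrative only.

```python
def session_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD, with rates expressed in USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def savings_percent(baseline_cost, compressed_cost):
    """Percentage of the baseline cost that compression removed."""
    if baseline_cost == 0:
        return 0.0
    return 100.0 * (baseline_cost - compressed_cost) / baseline_cost


# Hypothetical numbers for illustration only.
INPUT_RATE, OUTPUT_RATE = 3.0, 15.0  # USD per million tokens (assumed)
baseline = session_cost(1_000_000, 100_000, INPUT_RATE, OUTPUT_RATE)
# Assume compression halves input tokens and trims output tokens by 10%.
compressed = session_cost(500_000, 90_000, INPUT_RATE, OUTPUT_RATE)
print(f"baseline ${baseline:.2f}, compressed ${compressed:.2f}, "
      f"saved {savings_percent(baseline, compressed):.1f}%")
```

Because output tokens are priced higher in this made-up example, halving the input volume reduces the total bill by less than half.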
Tok is an invisible bridge that sits between Claude Code and the model API. It compresses conversations on the way out and re-hydrates them on the way back. The focused 0.1.x public release story is Claude Code first: you keep using Claude exactly as before while Tok runs underneath and saves tokens automatically.
Who Is Tok For?
- Individual developers using Claude Code who want to reduce token costs
- Teams with shared API budgets looking to stretch their token allowances
- Power users who work on long-running sessions where context accumulates
If you already use Claude Code, Tok is a small add-on: start the bridge and point Claude
at it via ANTHROPIC_BASE_URL=http://localhost:9090.
What Tok Does
Tok intercepts LLM traffic and applies deterministic compression:
- Semantic deduplication: Repeated file reads, search results, and tool outputs are cached and stubbed
- Delta compression: Changed content shows only the diff, not the full payload
- Rolling state: Conversation history is capped at a fixed memory footprint — entries only drop when the cap is reached after very long sessions (practical conversations are effectively unlimited)
- Lossless round-trip: Everything re-hydrates perfectly on the way back
The result is typically lower token volume on sustained sessions, while preserving the bridge-first Claude workflow.
Support Tok
Tok exists because I ran into a real problem and wanted to solve it: getting the same AI results with less wasted context and lower token spend. The goal is to keep Tok open source and useful first.
If Tok helps you, the most helpful support is:
- Star the repo and share it with people who would benefit
- File issues, report regressions, and share benchmark results
- Contribute docs, tests, or fixes
- Use any sponsorship links listed here in the future if you want to help fund ongoing maintenance
Support is appreciated, but not expected. If Tok saves you money or makes your workflow less frustrating, that's why it's here.
Supported Workflow
The first open-source release supports exactly this path:
pip install tok-protocol
tok init # optional: create .tok/ workspace and .env
tok install # setup/migration helper (no wrapper by default)
tok bridge start # starts the bridge on port 9090
ANTHROPIC_BASE_URL=http://localhost:9090 claude
tok bridge status # check bridge health
tok doctor # session diagnostics
tok bridge stop # stop cleanly
tok stats # view savings
Default behavior is explicit. Tok does not override claude unless you opt in with
tok install --wrap-claude.
The main CLI commands for 0.1.x are: tok init, tok install,
tok bridge start|status|logs|stop, tok doctor, and tok stats.
Validation-Only Provider Paths
Tok can be pointed at OpenAI-compatible APIs, but for the focused 0.1.x release those
paths are validation-only and explicitly outside the supported default story. The public
contract is still the Claude Code bridge workflow.
Experimental validation may be useful for:
- OpenRouter and other OpenAI-compatible endpoints
- DeepSeek or Qwen endpoints you already operate
- local inference servers that mimic the Anthropic/OpenAI-style request shape
These paths are not part of the supported 0.1.x onboarding flow, are not surfaced in
the default CLI help, and may change without compatibility guarantees.
What Tok Is / Is Not
Tok is:
- A deterministic compression layer (no lossy LLM summarization)
- A bridge-first CLI optimized for Claude Code
- A safety-first workflow with visible fallback and degradation signals
Tok is not (yet):
- A broad multi-agent framework
- A general-purpose SDK for arbitrary Python applications
- A replacement for your existing tools (it runs invisibly underneath them)
The bridge is the supported public workflow. A Python SDK path exists but is experimental.
Demonstrated Savings
The upper-bound reference is tok stats output from a long session with heavy tool-result repetition (207 API calls). This is not typical: it comes from a highly repetitive workload. Your actual savings depend on session length, tool usage patterns, and provider pricing:
- Typical sessions (8+ turns): meaningful input-token savings on sustained work with repeated file reads and search operations
- Short sessions (< 8 turns): Tok defaults to baseline since compression overhead exceeds savings
- Fail-open safety: if compression risks fidelity, Tok falls back to uncompressed requests
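The fail-open behavior described above can be sketched as a wrapper that only uses a compressed payload when it round-trips exactly. This is a minimal sketch, not Tok's implementation: fail_open is a hypothetical name, and zlib is a stand-in codec used purely so the example runs.

```python
import zlib  # stand-in codec for illustration, not Tok's compression


def fail_open(payload, compress, decompress):
    """Return (body, degraded).

    Falls back to the original payload when compression raises or when the
    compressed form does not decompress back to the exact input.
    """
    try:
        packed = compress(payload)
        if decompress(packed) != payload:
            return payload, True   # fidelity at risk: serve uncompressed
        return packed, False
    except Exception:
        return payload, True       # any failure also degrades to baseline


original = b"x" * 100
ok_body, ok_degraded = fail_open(original, zlib.compress, zlib.decompress)
# A deliberately lossy "compressor" triggers the fallback path.
bad_body, bad_degraded = fail_open(original, lambda b: b[:10], lambda b: b)
print(ok_degraded, bad_degraded)
```

The design choice is that a failed check never blocks the request; it only costs the savings for that request.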
See:
- docs/claims_matrix.md: detailed claim evidence and status
- docs/pricing_verification.md: pricing methodology
- docs/live_smoke_matrix.md: automated smoke test results
Technical Overview
Tok achieves its compression through several deterministic techniques:
Semantic Deduplication
- Content hashing: Identical tool results are detected via SHA-256 hashes and replaced with >>> tool:name|unchanged|cached stubs
- Delta compression: Changed results show only the diff: >>> tool:name|delta|changed_lines:5
- Error normalization: Similar errors collapse to canonical forms like |err:enoent|
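The content-hashing step can be illustrated with a small cache keyed by tool and target. This is a sketch under stated assumptions: ToolResultCache and its encode method are hypothetical names, and only the SHA-256 hashing and the stub format come from the description above.

```python
import hashlib


class ToolResultCache:
    """Hypothetical cache that stubs tool results already seen verbatim."""

    def __init__(self):
        self._seen = {}  # (tool, key) -> SHA-256 hex digest of last result

    def encode(self, tool, key, result):
        """Return a stub if `result` is unchanged, else the full result."""
        digest = hashlib.sha256(result.encode("utf-8")).hexdigest()
        if self._seen.get((tool, key)) == digest:
            return f">>> tool:{tool}|unchanged|cached"
        self._seen[(tool, key)] = digest
        return result


cache = ToolResultCache()
first = cache.encode("view_file", "src/utils.py", "def helper(): ...\n")
second = cache.encode("view_file", "src/utils.py", "def helper(): ...\n")
third = cache.encode("view_file", "src/utils.py", "def helper(): pass\n")
print(second)
```

The first and third calls return the full text because the content is new or changed; only the identical second read collapses to a stub.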
Macro System (Experimental)
- Pattern recognition: Repeated command sequences are automatically learned as macros
- Cross-session persistence: High-value macros survive bridge restarts
- ROI tracking: Macros with lifetime savings above a threshold are preserved
Note: The macro system is active in the runtime pipeline but not part of the supported 0.1.x surface. Its behavior may change.
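One simple way to picture pattern recognition over command history is counting repeated command sequences. The sketch below is illustrative only: repeated_sequences is a hypothetical helper, and Tok's actual learning rules and ROI thresholds are not described here.

```python
from collections import Counter


def repeated_sequences(commands, length=2, min_count=2):
    """Return command n-grams of `length` seen at least `min_count` times."""
    windows = (
        tuple(commands[i:i + length])
        for i in range(len(commands) - length + 1)
    )
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}


history = ["pytest src/", "git diff", "pytest src/", "git diff", "ruff check"]
candidates = repeated_sequences(history)
print(candidates)
```

Sequences that recur often enough become macro candidates; a real system would also weigh how many tokens each candidate saves.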
Wire Protocol
- BPE-aligned sigils: Single-character fields (t:, g:, f:) minimize token cost
- Structured state: >>> t:2|g:refactor|f:src/main.py|cmds:pytest encodes context efficiently
- Lossless round-trip: Tok state perfectly re-hydrates to original JSON/Markdown
Memory Architecture
- Hot/durable buckets: Recent context vs. long-term knowledge with different decay rates
- Bounded rolling state: Updates are constant-time; memory caps at ~600 hot + ~2000 durable entries — practical sessions never reach the cap
- Fail-open safety: Automatic fallback to baseline if compression risks fidelity
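A bounded store with constant-time updates can be sketched with an ordered mapping that evicts its oldest entry once full. The capacities mirror the approximate figures above; the BoundedBucket class and its oldest-first eviction policy are assumptions for illustration, and decay rates are not modeled.

```python
from collections import OrderedDict


class BoundedBucket:
    """Fixed-capacity store that evicts the oldest entry once full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)     # refresh recency, O(1)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry, O(1)

    def __len__(self):
        return len(self._items)

    def __contains__(self, key):
        return key in self._items


hot = BoundedBucket(capacity=600)       # approximate caps from the text
durable = BoundedBucket(capacity=2000)
for turn in range(1000):
    hot.put(f"turn-{turn}", {"turn": turn})
print(len(hot))
```

Memory stays flat no matter how long the loop runs, which is the property the rolling state relies on.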
Pointer System (Experimental)
Internal cross-reference tracking for files, functions, and concepts. Not part of the supported 0.1.x surface.
Code Analysis (Sifter)
Internal AST-based extraction for Python code structure. Used by the compression engine but not part of the supported 0.1.x public API.
Tok Syntax Examples
Wire Protocol State
>>> t:3|g:refactor|f:src/main.py|cmds:pytest|e:import_error
- Turn 3, goal is refactor, working on src/main.py, ran pytest, encountered import error
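To make the field layout concrete, here is a minimal parser and encoder for a state line of this shape. It is a sketch under stated assumptions: decode_state and encode_state are hypothetical names, and escaping of | or : inside values is not handled because the real rules are not documented here.

```python
PREFIX = ">>> "


def decode_state(line):
    """Split a state line into a dict of field -> value, in field order."""
    if not line.startswith(PREFIX):
        raise ValueError("not a Tok state line")
    fields = {}
    for part in line[len(PREFIX):].split("|"):
        key, _, value = part.partition(":")  # split on the first colon only
        fields[key] = value
    return fields


def encode_state(fields):
    """Inverse of decode_state for values that contain no '|' characters."""
    return PREFIX + "|".join(f"{k}:{v}" for k, v in fields.items())


line = ">>> t:3|g:refactor|f:src/main.py|cmds:pytest|e:import_error"
state = decode_state(line)
print(state)
print(encode_state(state) == line)
```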
Semantic Deduplication
# Unchanged result replaced by a cached stub:
>>> tool:view_file|path:src/utils.py|unchanged|cached
# Delta compression:
>>> tool:edit_file|path:src/main.py|delta|changed_lines:5
--- a/src/main.py
+++ b/src/main.py
@@ -10,2 +10,2 @@
-def old_function():
+def new_function():
     return True
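A delta stub of this shape can be produced with Python's standard difflib module. This is an illustrative sketch, not Tok's encoder: delta_stub is a hypothetical name, and counting changed_lines as added plus removed lines is an assumption about what that field means.

```python
import difflib


def delta_stub(tool, path, old, new):
    """Return a stub header followed by a unified diff of old -> new."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"a/{path}", tofile=f"b/{path}", lineterm="", n=1,
    ))
    # Skip the two file-header lines, then count added and removed lines.
    changed = sum(1 for line in diff[2:] if line.startswith(("+", "-")))
    header = f">>> tool:{tool}|path:{path}|delta|changed_lines:{changed}"
    return "\n".join([header, *diff])


old_src = "def old_function():\n    return True\n"
new_src = "def new_function():\n    return True\n"
stub = delta_stub("edit_file", "src/main.py", old_src, new_src)
print(stub)
```

For a large file with a small edit, the stub carries only the header and the changed hunk rather than the whole file.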
Macro Usage
# Learned macro for testing workflow:
@run_tests(src="src/", coverage=True)
# Expands to: pytest src/ --cov=src --cov-report=html
These examples illustrate the internal wire protocol. Users do not write Tok syntax directly — the bridge handles all encoding and decoding transparently.
Prerequisites
- Python 3.10-3.12 (tested for 0.1.x)
- macOS or Linux
- Claude Code installed and available as claude
- An Anthropic API key (ANTHROPIC_API_KEY) already configured for Claude Code
Tok is a proxy — it does not manage API keys. It forwards whatever credentials Claude
Code already uses. If claude works without Tok, it will work with Tok.
Provider Posture
The supported 0.1.x product path is Claude Code routed through the local Tok bridge.
Validation-only evidence also exists for some OpenAI-compatible providers, but those paths are not the public contract for this release. Treat them as experimental unless a future release promotes them into the supported surface.
tok install is now a setup/migration helper and does not modify claude by default.
If you want legacy auto-routing behavior, run tok install --wrap-claude.
Install
Public install target:
pip install tok-protocol
If you are working from a local checkout instead of PyPI:
pip install .
Quickstart
Run this exact bridge-first flow:
tok init # optional: create project workspace
tok install
tok bridge start
ANTHROPIC_BASE_URL=http://localhost:9090 claude
tok bridge status
tok doctor
tok bridge stop
tok stats
Optional wrapper mode:
tok install --wrap-claude
source ~/.zshrc # or source ~/.bashrc
claude
The normal happy path is:
- tok bridge status says the bridge is running and Tok is active
- tok doctor ends with Recommendation: keep Tok on
- tok stats shows saved dollars, saved percent, and With Tok vs without Tok
Representative output:
Bridge running on :9090 (PID 12345)
Saved $0.0123 • 48.1% saved
Verdict Tok active and helping
Tok active yes
Degraded to baseline no
Fallbacks 0
If you see Degraded to baseline: yes or fallback counts rising, Tok protected the
session by serving requests without compression.
If you enabled wrapper mode and claude is still not found, reload your shell with
source ~/.zshrc or source ~/.bashrc before debugging Tok itself.
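If you script around the bridge, a quick pre-flight check can confirm that something is accepting connections on the bridge port before launching claude. This is a hedged sketch using only the standard library: bridge_is_listening is a hypothetical helper, it only tests that a TCP port is open, and it does not replace tok bridge status or confirm that the listener is actually Tok.

```python
import socket


def bridge_is_listening(host="localhost", port=9090, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if bridge_is_listening():
        print("Port 9090 is accepting connections")
    else:
        print("Nothing is listening on port 9090; try: tok bridge start")
```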
First 10 Minutes Troubleshooting
| If you see this | Check this first | Likely fix |
|---|---|---|
| tok: command not found | Was the package installed into the active Python environment? | Re-activate the environment and run pip install tok-protocol again. |
| claude: command not found after tok install --wrap-claude | Was your shell reloaded? | Run source ~/.zshrc or source ~/.bashrc, or open a new shell. |
| Bridge not running | Did tok bridge start succeed? | Restart with tok bridge start --foreground and inspect tok bridge logs. |
| No savings visible yet | Is the session still very short? | Keep working for a few turns, then run tok doctor and tok stats --last-session, or tok stats for a lifetime view. |
| Degraded to baseline: yes | Did the session fall back for safety? | Run tok doctor first, then follow the steps in docs/troubleshooting.md. |
Clean-Room Install Verification
Use this when validating the package from scratch:
python -m venv .venv
source .venv/bin/activate
pip install tok-protocol
tok --version
tok --help
tok install
tok bridge start --help
tok bridge status --help
tok stats --help
If you are validating a local release artifact instead of PyPI, build and install the
wheel from dist/:
python -m build
python -m venv .venv
source .venv/bin/activate
pip install dist/tok_protocol-0.1.3-py3-none-any.whl
tok --version
tok --help
tok install
tok bridge start --help
tok bridge status --help
tok stats --help
In restricted or offline environments, a local wheel install still requires the published dependencies to be available in the environment or via an internal package mirror.
This is the minimum supported install bar for the first public release.
Bridge Workflow
flowchart LR
C["Claude Code"] --> B["Tok Bridge (:9090)"]
B --> R["Tok Runtime"]
R --> U["Model API"]
S["tok bridge status"] --> B
D["tok doctor"] --> B
T["tok stats"] --> R
To compare the same workflow with no compression:
TOK_MODE=baseline tok bridge start
ANTHROPIC_BASE_URL=http://localhost:9090 claude
tok stats
Baseline prices are calculated using current OpenRouter USD rates.
Mode Selection
Tok supports two modes via the TOK_MODE environment variable:
- tool-compatible (default): Applies compression with a natural_first request policy. This is the recommended mode and the only supported mode for 0.1.x.
- baseline: No compression. All requests pass through unchanged. Use for debugging, measuring Tok's impact, or short sessions where compression overhead exceeds savings.
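The mode lookup can be pictured as reading one environment variable with a default. This is a minimal sketch: resolve_mode is a hypothetical function, and raising on an unknown value is an assumption, since how Tok treats unrecognized TOK_MODE values is not stated here.

```python
import os

SUPPORTED_MODES = ("tool-compatible", "baseline")


def resolve_mode(environ=os.environ):
    """Read TOK_MODE, defaulting to tool-compatible when unset or empty."""
    mode = environ.get("TOK_MODE") or "tool-compatible"
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported TOK_MODE: {mode!r}")
    return mode


print(resolve_mode({}))
print(resolve_mode({"TOK_MODE": "baseline"}))
```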
When to Use Baseline
Set TOK_MODE=baseline if:
- You're debugging Tok itself
- You need exact token counts for pricing estimates
- The session is very short (< 5 turns)
- You're testing a new model provider
TOK_MODE=baseline tok bridge start
Switching Modes Mid-Session
You can restart the bridge with a different mode at any time:
tok bridge stop
TOK_MODE=baseline tok bridge start # omit TOK_MODE to return to the default tool-compatible mode
The new mode applies to subsequent requests. Existing session state is preserved.
Experimental: Python Submodule APIs
Note: These APIs are experimental. They are not part of the supported 0.1.x contract, are intentionally absent from the root tok namespace, and may change without compatibility guarantees.
For advanced evaluation work outside the bridge-first CLI, use explicit submodule imports such as:
- tok.runtime.core.RuntimeSession
- tok.runtime.types.RuntimeRequest
- tok.universal_runtime.UniversalTokRuntime
See examples/tok_wrap_example.py and
examples/README.md for the current experimental examples.
Docs Map
Start here, then go deeper only if you need it:
- docs/bridge.md: full bridge tutorial
- docs/cli-reference.md: command reference
- docs/troubleshooting.md: fallback, degraded sessions, logs, savings interpretation
- docs/production-readiness.md: advanced runtime defaults and release posture
- docs/release-checklist.md: maintainer release checklist
- docs/public-release-decision.md: supported workflows, limitations, and release bar
- docs/maintainers/README.md: roadmap and internal planning docs
Repo Map
The repository is intentionally split by audience and lifecycle:
- src/tok/: runtime, bridge, CLI, and library code
- docs/: public product docs plus release/reference docs
- docs/maintainers/: roadmap, refactoring notes, and maintainer-only planning
- examples/: experimental wrapper/API examples outside the default bridge-first path
- tests/: unit, integration, replay, and stability coverage
Validation Workflow
After working on the codebase, run the full validation flow using uv run to execute
the core regression suite, lint, and type checks:
uv run pre-commit run --all-files
uv run python -m pytest tests/unit/test_architecture.py tests/unit/validation_metrics.py tests/unit/test_adversarial.py tests/unit/test_memory_growth.py tests/unit/test_bridge_fidelity.py tests/unit/test_encoder_transformer.py tests/unit/test_schema_validation.py tests/unit/test_sifter.py tests/unit/test_error_handling.py -v
uv run ruff check src/tok/ tests/unit
uv run mypy src/tok/
Privacy
Tok runs locally. No data leaves your machine except the model/API calls you would already make.
License
Apache License, Version 2.0