Self-hosted MCP server for gotcontext.ai — semantic compression, AST-aware code understanding, and context engineering for LLM agents.
Token Saver 5000
Token Saver 5000 is a local semantic compression system for AI context.
In plain terms: it takes large text/code context, keeps the important parts, and gives you a smaller context that is cheaper to send to models.
What This Project Actually Does
You give it a long document or codebase context.
It builds a semantic graph, ranks importance, and outputs a compressed "skeleton" you can query and expand.
Core outcomes:
- Lower token usage.
- Faster context handling.
- Better control over what information is kept vs omitted.
This is not tied to GitHub workflows. It works for any large context source (files, notes, transcripts, docs, code, or generated text).
Who This Is For
Use this if you:
- Work with large prompts/documents.
- Need to cut token cost.
- Want retrieval-oriented compression (not just naive summarization).
Common use cases:
- RAG context compression before answer generation.
- Long internal docs and wiki pages.
- Customer support transcripts and call notes.
- Legal/policy/contract text review prep.
- Large code and architecture context for agents.
- Multi-turn assistant memory compression.
Do not use this if you only have short prompts and token cost is irrelevant.
Proven Results: 12 Real-World User Journeys
Every journey runs locally. No API keys needed. Verified with 40+ tests.
python scripts/benchmark_cujs.py --verbose
| # | Journey | What Happens | Before | After | Savings |
|---|---|---|---|---|---|
| 1 | Solo Dev: Codebase Compression | Search 13-file project for "auth" → compress 6 matched files | 20,406 tokens | 767 tokens | 96.2% |
| 2 | Long Document Compression | Compress 2,206-line API reference doc | 16,461 tokens | 1,269 tokens | 92.3% |
| 3 | CLI Output Filtering | Filter git diff (320 lines) + pytest (86 lines) + npm install (28 lines) | 4,888 tokens | 777 tokens | 84.1% |
| 4 | Query-Focused Code Search | Ask "how does caching work?" → find 3 relevant files → compress | 20,406 tokens | 439 tokens | 97.8% |
| 5 | Session Recovery | Recover 7 events after conversation compaction | 26,600 tokens | 138 tokens | 99.5% |
| 6 | ROI Justification | 10 compressions on Claude Opus → show savings report | $1.27 saved | 4.4x ROI | $127/mo projected |
| 7 | Tool Schema Compression | 50 MCP tools → 3 meta-tools via SchemaCompressor | 10,376 tokens | 287 tokens | 97.2% |
| 8 | Code-Aware Compression | Compress 10 Python source files for code review | 15,547 tokens | 1,294 tokens | 91.7% |
| 9 | Dialogue Memory (AFM) | 22-message conversation → budget-aware context packing | 968 tokens | 481 tokens | 50.3% |
| 10 | Budget Governance | 10 sessions tracked against per-session/daily/monthly limits | 415,000 tokens | — | alerts |
| 11 | Tee/Recovery | Compress 3 CLI outputs + recover originals on demand | 4,888 tokens | 777 tokens | 84.1% |
| 12 | Team Dashboard Export | 5-member team aggregate → JSON/CSV/Prometheus export | 3,860,000 tokens | 579,000 tokens | 85.0% |
Aggregate: 4,485,940 input tokens → 590,849 output tokens (86.8% savings)
Journey Details
CUJ 1: Solo Developer with a Codebase. You have 13 Python files. You ask about authentication. Token Saver searches for relevant code (finds auth.py, middleware.py, tests), compresses only those files, skips everything irrelevant. 96.2% savings vs reading every file.
CUJ 2: Compressing Architecture Docs. You have a 2,206-line API reference. Token Saver builds a semantic graph, ranks importance via PageRank, generates a 13x compressed skeleton preserving endpoints, parameters, and error codes. 92.3% savings.
CUJ 3: Cleaning Up CLI Noise. Your AI agent runs git diff, pytest, and npm install. Token Saver auto-detects each command type and applies the right filter: stats extraction for git, failure focus for pytest, summary for npm. 84.1% savings (434 lines → 67 lines).
CUJ 4: "How Does Caching Work?" You ask a question about a codebase. Token Saver searches first (finds cache.py, config.py, middleware.py), then compresses only those 3 files instead of all 13. 97.8% savings -- 75% better than compressing everything.
CUJ 5: Surviving Conversation Compaction. After a long Claude Code session, the conversation gets compacted and you lose context. Token Saver's session journal recovers all your prior work (5 ingested files, model config, compression profile, 26,600 tokens saved) in just 138 tokens.
CUJ 6: Proving ROI to Your Manager. After 10 compressions on Claude Opus ($15/MTok), the savings tracker shows: $1.27 saved, 4.4x ROI vs the $29/mo Pro plan, breakeven at 228 operations. Projected $127/month savings. The tool pays for itself on day 1.
CUJ 7: Tool Schema Compression (Proxy Mode). Your MCP server exposes 50 tools. The proxy replaces all individual schemas with 3 meta-tools (search_tools, get_tool_schema, invoke_tool) — agents discover tools on demand instead of loading all schemas upfront. 97.2% savings on tool context.
CUJ 8: Code-Aware Compression. You need AI to review 10 Python files. Token Saver compresses each file using semantic graph + PageRank, preserving function signatures, class structure, and key logic. 91.7% savings — the AI sees the important parts without reading every line.
CUJ 9: Dialogue Memory (AFM). After 22 back-and-forth messages, your context window is filling up. Adaptive Focus Memory ranks each message by importance and recency, keeps critical messages in full, compresses the rest to placeholders. 50.3% savings while preserving conversation coherence.
CUJ 10: Budget Governance. Your team has token budgets per session (100K), per day (500K), and per month (10M). After 10 coding sessions consuming 415K tokens, the budget monitor alerts: session CRITICAL, daily WARNING, monthly OK. No surprise bills.
CUJ 11: Tee/Recovery. The CLI optimizer compresses git diff, pytest, and npm output aggressively. But you need the full pytest output to debug a failure. Tee/recovery saved the original — retrieve it by ID. 84.1% savings with a safety net.
CUJ 12: Team Dashboard Export (Enterprise). Your 5-person team ran 51 sessions, consuming 3.86M tokens. Token Saver compressed to 579K (85% savings). Export the data as JSON for your dashboard, CSV for spreadsheets, or Prometheus metrics for Grafana. Justify the tool spend to your VP of Engineering.
Two Ways To Use It
There are two product surfaces in this repo:
- MCP server (`src.server`) for Claude/Desktop and agent workflows.
- Self-contained skill scripts (`skills/token-saver-context-compression`) that run locally without MCP.
Multi-Tenant SaaS Deployment
Token Saver 5000 can also be used as a multi-tenant context service, not just a local MCP helper.
The core scope fields are:
- `workspace_id`: isolates one customer or team workspace.
- `user_id`: isolates a person within that workspace.
- `agent_id`: isolates one automated agent or role.
- `session_id`: isolates one short-lived interaction thread.
Use those fields consistently across memory, prompts, connector feeds, temporal exports, and handoff bundles when you expose the system behind a shared API gateway or multi-tenant worker.
If you are deploying for multiple customers, read docs/deployment/SAAS_MULTI_TENANT.md.
Local vs Docker
You do not need Docker. Docker is optional.
Choose your runtime:
- Local Python:
  - Best for development and quick usage.
  - Direct access to scripts and source.
  - Command: `token-saver-mcp`
- Docker:
  - Best for reproducible deployment and team environments.
  - Avoids local dependency drift.
  - Command: `docker-compose up -d`
First 10 Minutes (Recommended Path)
- Get the code:
git clone https://github.com/oimiragieo/token-saver-5000.git
cd token-saver-5000
- Install it like a tool:
Option A: uv (recommended)
uv tool install -e .
Option B: pipx
pipx install .
Option C: developer/editable install
pip install -r requirements.txt
pip install -e .
- Run guided setup:
token-saver-setup --auto
That command picks the most likely target for your environment:
- `desktop` for Claude Desktop-centric local use.
- `portable-project` when you run it inside a repo/workspace that looks project-scoped.
If you want the low-level status report only:
token-saver-install-mcp --doctor --human
For a deeper, network-using verification pass that downloads the embedding model and runs a smoke test:
python scripts/check_setup.py
- Run a local example:
python examples/example_usage.py
- Try the self-contained skill scripts:
python skills/token-saver-context-compression/scripts/profile_tokens.py --file tests/fixtures/skill_context_sample.txt --output-format auto
python skills/token-saver-context-compression/scripts/compress_context.py --file tests/fixtures/skill_context_sample.txt --mode query_guided --query "what are the retry rules?" --output-format auto
python skills/token-saver-context-compression/scripts/validate_evidence.py --file tests/fixtures/skill_context_sample.txt --query "what are the retry rules?" --min-similarity 0.4
How The Compression Flow Works
At a high level:
- Ingest text (`ingest_context`).
- Chunk and embed the text.
- Build a semantic graph (nodes = chunks, edges = semantic similarity).
- Rank nodes by importance.
- Return a compressed skeleton (`read_skeleton`).
- Search/extract relevant regions (`search_semantic`, `modulate_region`).
If query-aware mode is used, scoring is biased toward the query.
If evidence-aware mode is used, it checks whether selected context likely contains enough answer-supporting evidence.
`read_skeleton` now also returns a `pipeline` object so you can inspect which passes ran:
- `baseline`
- `query_guided`
- `evidence_aware`
That makes it easier to debug why a document was compressed a certain way and to verify when evidence-aware retrieval expanded or changed the final anchor set.
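The flow above can be sketched in miniature. This is an illustrative toy, not the shipped implementation: it substitutes bag-of-words vectors for the real embedding model, uses a plain power-iteration PageRank, and keeps the top-ranked chunks in document order. Function names here are invented for the sketch.

```python
import math
import re
from collections import Counter

def chunk(text, size=40):
    # Naive fixed-size word chunks; the real pipeline chunks semantically.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(chunk_text):
    # Bag-of-words stand-in for a sentence-embedding model.
    return Counter(re.findall(r"\w+", chunk_text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pagerank(adj, damping=0.85, iters=50):
    # Plain power iteration over a weighted adjacency matrix.
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [
            (1 - damping) / n + damping * sum(
                rank[j] * adj[j][i] / s
                for j in range(n)
                if (s := sum(adj[j])) > 0
            )
            for i in range(n)
        ]
    return rank

def skeleton(text, ratio=0.3, threshold=0.2):
    chunks = chunk(text)
    vecs = [embed(c) for c in chunks]
    # Edge weight = similarity above a floor; no self-edges.
    adj = [
        [cosine(vecs[i], vecs[j])
         if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
         for j in range(len(chunks))]
        for i in range(len(chunks))
    ]
    ranks = pagerank(adj)
    keep = max(1, int(len(chunks) * ratio))
    top = sorted(range(len(chunks)), key=lambda i: -ranks[i])[:keep]
    return [chunks[i] for i in sorted(top)]  # preserve document order
```

Query-guided mode would simply bias the ranks toward chunks similar to the query before selection.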
Core MCP Tools (The Ones Most Users Need)
If you are new, start with these 7:
- `ingest_context`: add a document.
- `read_skeleton`: view compressed structure.
- `search_semantic`: find relevant nodes by query.
- `modulate_region`: expand selected nodes at chosen fidelity.
- `get_stats`: view compression stats.
- `list_documents`: list ingested docs.
- `delete_document`: remove a doc.
You can force this minimal surface with:
MCP_TOOL_PROFILE=core_stable python -m src.server
Or, after installing the tool:
MCP_TOOL_PROFILE=core_stable token-saver-mcp
Client-Aware Token Optimization Tools
If you use Token Saver with a specific LLM client (Claude Code, Gemini CLI, etc.), these tools auto-tune compression for your model's context window and behavior:
- `configure_for_client`: set model ID or explicit context window size. Auto-tunes skeleton ratio based on window size and how aggressively the client compresses. Supports Claude, Gemini, GPT, and explicit overrides.
- `estimate_tokens`: multi-method token estimation (tiktoken, fast len/4, Gemini-compatible, JSON-density, raw bytes). Use it to budget context before ingestion.
- `set_compression_profile`: named presets (minimal/summary/balanced/detailed/full) that bundle skeleton_ratio, fidelity, and chunk_size into one setting.
- `get_compression_profile`: view the active profile and available profiles.
Example: configure for Gemini CLI (1M context, aggressive compression at 50%):
# Via MCP tool call
configure_for_client(model_id="gemini-2.5-pro")
# -> skeleton_ratio ~0.31 (vs ~0.50 for Claude with same window)
For details, see docs/claude-code-token-optimization-enhancements.md,
docs/gemini-cli-token-optimization-enhancements.md, and
docs/codex-cli-token-optimization-enhancements.md.
Proven Benchmark Results
Document compression (locally measured, reproducible): Token Saver achieves 13x document compression (16,461 tokens -> 1,269 tokens) via semantic graph + PageRank + token-level refinement + lossless meta-tokens.
API input token savings (real measurements, total content tokens):
| Provider | Baseline | Compressed | Savings | Notes |
|---|---|---|---|---|
| Codex (gpt-5.1-codex) | 37,514 | 23,189 | ~38% | Most stable (consistent cache) |
| Gemini CLI (Flash) | 69,172 | 30,672 | ~56% | Measured via prompt field (cache-independent) |
| Claude Code (Opus 4.6) | ~61,500 | ~45,300 | ~26% | Approximate (cache state varies +/-3%) |
Methodology:
- Validated across 3 independent runs with zero variance (min = max = median)
- "Total content tokens" used (not billed tokens) to remove cache hit/miss variance
- "Doc Compression" = document-only reduction (92.3%) vs "Total API Savings" = including system prompt
- System prompt overhead varies by provider (~42K for Claude, ~16K for Codex, ~28K for Gemini)
- Cost savings NOT reported as headline (volatile due to cache pricing effects)
- Answer quality NOT formally evaluated (compression may affect response quality)
Savings by corpus size:
| Corpus | Claude | Codex | Gemini |
|---|---|---|---|
| small (156 lines) | ~2% | ~4% | ~8% |
| medium (479 lines) | ~9% | ~14% | ~26% |
| large (2,206 lines) | ~26% | ~38% | ~56% |
Real-world savings depend on system prompt size: smaller system prompts (Codex, Gemini) see proportionally larger savings from document compression.
Run benchmarks yourself:
# Dry run (no API calls, validates setup)
python scripts/benchmark_token_savings.py --dry-run --verbose
# Full benchmark across all providers
python scripts/benchmark_token_savings.py --mode skill --verbose --output results.json
# Single provider, single corpus size
python scripts/benchmark_token_savings.py --providers claude --sizes large --verbose
Token Savings Tracker (NEW)
Every compression operation is tracked with exact dollar savings. See your ROI in real-time:
get_savings_report(session_id="my-session")
# -> {
# "total_tokens_saved": 142,500,
# "total_dollars_saved": 2.14,
# "avg_compression_ratio": 13.0,
# "monthly_projected_savings": 64.20,
# "roi_vs_pro_plan": 2.2,
# "breakeven_operations": 14,
# "by_tool": {"ingest_context": {"operations": 8, "dollars_saved": 1.87}, ...}
# }
The tracker computes: tokens saved, dollar savings (model-aware pricing), compression ratios, monthly projections, ROI vs the $29/mo Pro plan, and breakeven analysis. Persists to SQLite so savings accumulate across sessions.
CLI Output Optimizer (NEW -- RTK-Inspired)
Coding agent CLI output is the #1 token waster. Token Saver now auto-detects 10 command types and applies optimal filtering:
| Command | Strategy | Typical Savings |
|---|---|---|
| `git diff` | Extract file list + stats summary | 90-99% |
| `pytest` / `jest` | Show only failures + summary | 94-99% |
| `npm install` / `pip install` | Keep summary, strip progress | 85-95% |
| Lint (ruff, eslint) | Group by rule, count occurrences | 80-90% |
| JSON output | Extract keys + types, first item | 80-95% |
| Logs | Deduplicate repeated lines | 70-85% |
| Any colored output | Strip ANSI escape codes | 10-30% |
Use directly via MCP tool:
filter_cli_output(text="<raw CLI output>")
Or automatically via the proxy (applies to all upstream tool responses):
token-saver-proxy npx some-server --provider anthropic
Falls back to RTK when installed for maximum quality.
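Two of the strategies above (ANSI stripping and log deduplication) are easy to picture in miniature. This is a hedged sketch, not the project's filter code, and it handles only SGR color codes and consecutive duplicates:

```python
import re

ANSI_SGR_RE = re.compile(r"\x1b\[[0-9;]*m")  # SGR color/style codes only

def strip_ansi(text: str) -> str:
    """Remove ANSI color escape codes from CLI output."""
    return ANSI_SGR_RE.sub("", text)

def dedup_lines(text: str) -> str:
    """Collapse consecutive duplicate lines, annotating the repeat count."""
    out, prev, count = [], None, 0
    for line in text.splitlines():
        if line == prev:
            count += 1
            continue
        if count > 1:
            out.append(f"  [repeated {count}x]")
        out.append(line)
        prev, count = line, 1
    if count > 1:
        out.append(f"  [repeated {count}x]")
    return "\n".join(out)
```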
MCP Proxy Mode (NEW)
Wrap ANY MCP server transparently -- compress tool responses automatically with zero code changes:
# Compress any MCP server's output
token-saver-proxy npx some-mcp-server --provider anthropic
# Enable schema compression (N tools -> 3 meta-tools, ~96% token reduction)
token-saver-proxy python -m my_server --schema-compression
In Claude Desktop config:
{
"mcpServers": {
"my-server-compressed": {
"command": "token-saver-proxy",
"args": ["npx", "some-mcp-server", "--provider", "anthropic"]
}
}
}
The proxy applies TokenRefiner + MetaTokenCompressor to every tool response. Optional --schema-compression replaces all upstream tools with 3 meta-tools (search_tools, get_tool_schema, invoke_tool).
Session Continuity (NEW)
Token Saver now survives conversation compaction. A SQLite-backed journal records all ingestions, configurations, and tool calls. After the CLI compacts your conversation:
# Call recover_session to get a compact summary of everything that happened
recover_session(session_id="my-session")
# -> {ingested_files: [...], client_config: {...}, active_profile: "balanced", total_tokens_saved: 14500}
Cache Strategy Advisor (NEW)
Every LLM provider handles caching differently. The advisor tells you exactly what to do:
advise_cache_strategy(model_id="claude-4-sonnet")
# -> Anthropic: explicit cache, 90% discount, add ephemeral markers, 5min TTL
advise_cache_strategy(model_id="gpt-4.1")
# -> OpenAI: automatic cache, 50% discount, keep 1024+ token prefix stable
advise_cache_strategy(model_id="gemini-2.5-flash")
# -> Google: implicit cache, 90% discount, no client action needed
advise_cache_strategy(model_id="groq-llama-4-scout")
# -> Groq: no caching, focus on small prompts for fastest inference
Supports: Anthropic, OpenAI, Google Gemini, Groq, XAI (Grok), Azure, Bedrock, local/Ollama.
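At its core this kind of advisor is a provider lookup keyed off the model ID prefix. A hypothetical reduction (the table entries below mirror the example outputs above; the real tool returns richer structured guidance and covers more providers):

```python
# Hypothetical prefix -> (provider, advice) table mirroring the examples above.
CACHE_ADVICE = {
    "claude": ("anthropic", "explicit cache, 90% discount, add ephemeral markers, 5min TTL"),
    "gpt":    ("openai",    "automatic cache, 50% discount, keep 1024+ token prefix stable"),
    "gemini": ("google",    "implicit cache, 90% discount, no client action needed"),
    "groq":   ("groq",      "no caching, focus on small prompts for fastest inference"),
}

def advise(model_id: str) -> dict:
    """Return cache advice for the provider implied by a model ID."""
    for prefix, (provider, advice) in CACHE_ADVICE.items():
        if model_id.lower().startswith(prefix):
            return {"provider": provider, "advice": advice}
    return {"provider": "unknown", "advice": "treat as uncached; minimize prompt size"}
```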
Multi-Agent Setup (NEW)
Install Token Saver for any AI coding agent with a single command:
token-saver-setup --auto # Auto-detect your environment
token-saver-install-mcp --agent cursor # Cursor
token-saver-install-mcp --agent windsurf # Windsurf
token-saver-install-mcp --agent cline # Cline / Roo Code
token-saver-install-mcp --agent codex # OpenAI Codex CLI
token-saver-install-mcp --agent gemini # Gemini CLI
token-saver-install-mcp --agent copilot # VS Code Copilot
token-saver-install-mcp --doctor-all # Check all agent configs
8 agents supported: Claude Desktop, Claude Code (project), Cursor, Windsurf, Cline, VS Code Copilot, Codex, and Gemini CLI.
Savings Dashboard (NEW)
Track your token savings across sessions:
token-saver-stats # All-time summary
token-saver-stats --daily # Day-by-day breakdown
token-saver-stats --weekly # Weekly summary
token-saver-stats --by-tool # Per-tool breakdown
token-saver-stats --cost # Cost savings with model pricing
token-saver-stats --json # Machine-readable output
token-saver-stats --csv # Spreadsheet export
ROI Calculator (NEW)
Calculate your return on investment via the calculate_roi MCP tool:
Input: model=claude-opus-4-6, tokens_per_day=500000, team_size=10
Output:
Without gotcontext: $1,650.00/mo
With gotcontext: $247.50/mo (85% savings)
Pro plan cost: $290.00/mo ($29/user × 10 users)
Net savings: $1,112.50/mo (5.7x ROI)
Supports 20+ models with real pricing data.
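The numbers in the example reconcile under one plausible reading: "ROI" is baseline spend divided by plan cost. As arithmetic:

```python
baseline_monthly = 1650.00                    # spend without gotcontext
compressed_monthly = baseline_monthly * 0.15  # 85% savings -> $247.50
plan_cost = 29.00 * 10                        # $29/user x 10 users -> $290.00
net_savings = baseline_monthly - compressed_monthly - plan_cost  # -> $1,112.50
roi = baseline_monthly / plan_cost            # -> ~5.7x
```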
Token Budget Monitoring (NEW)
Set per-session, daily, or monthly token budgets via check_budget MCP tool or environment variables:
TOKEN_BUDGET_SESSION=500000 TOKEN_BUDGET_DAILY=2000000 token-saver-mcp
Returns usage status, alert levels (ok/info/warning/critical), and projected usage.
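The four alert levels map naturally onto usage thresholds. A hypothetical sketch (the actual cutoffs used by `check_budget` are not documented here, so these fractions are illustrative):

```python
def alert_level(used: int, budget: int) -> str:
    """Map a usage fraction to an alert level (hypothetical thresholds)."""
    frac = used / budget
    if frac >= 0.95:
        return "critical"
    if frac >= 0.80:
        return "warning"
    if frac >= 0.50:
        return "info"
    return "ok"
```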
Team Dashboard Export (NEW)
Export aggregated team savings data via export_team_data MCP tool:
- JSON: For API consumption and custom dashboards.
- CSV: For spreadsheet analysis.
- Prometheus: For Grafana/Datadog monitoring.
Tee/Recovery System (NEW)
When compression drops information, the original is saved for recovery:
TEE_MODE=failures token-saver-mcp # Tee on high compression (default)
TEE_MODE=always token-saver-mcp # Tee everything
MCP tools: get_original_output, list_tee_entries, tee_store_stats.
Custom Filter Rules (NEW)
Define project-specific output filtering rules in .gotcontext.toml:
[filters.my_build_output]
match_command = "my-build-tool"
strip_ansi = true
strip_lines_matching = ["^Progress:", "^\\s*$"]
keep_lines_matching = ["^ERROR", "^WARNING"]
head_lines = 50
tail_lines = 20
max_lines = 100
Supports 8-stage pipeline, inline tests, project + user-global precedence.
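A reduced version of one rule's pipeline (strip ANSI → drop `strip_lines_matching` → keep `keep_lines_matching` → head/tail/max caps) might look like this. It is a sketch of the stage ordering only, not the shipped 8-stage implementation:

```python
import re

def apply_rule(text, strip_ansi=True, strip_lines=(), keep_lines=(),
               head_lines=None, tail_lines=None, max_lines=None):
    """Apply a simplified filter rule in stage order."""
    if strip_ansi:
        text = re.sub(r"\x1b\[[0-9;]*m", "", text)
    lines = text.splitlines()
    if strip_lines:
        lines = [ln for ln in lines if not any(re.search(p, ln) for p in strip_lines)]
    if keep_lines:
        lines = [ln for ln in lines if any(re.search(p, ln) for p in keep_lines)]
    # Head/tail truncation with an elision marker when both caps are set.
    if head_lines is not None and tail_lines is not None \
            and len(lines) > head_lines + tail_lines:
        lines = lines[:head_lines] + ["..."] + lines[-tail_lines:]
    if max_lines is not None:
        lines = lines[:max_lines]
    return "\n".join(lines)
```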
Missed Savings Discovery (NEW)
The discover_savings MCP tool scans directories to find files that would benefit from compression:
discover_savings(directory="/path/to/project")
→ README.md: ~2,400 tokens → ~300 compressed (87% savings)
→ src/main.py: ~800 tokens → ~200 compressed (75% savings)
→ Total opportunity: ~12,000 tokens saveable
Research-Backed Compression Techniques
Token Saver integrates techniques from recent AI research papers:
- `compress_meta_tokens`: lossless LZ77-inspired compression (arXiv 2506.00307). Replaces repeated token subsequences with §N symbols + dictionary header. Fully reversible.
- `recommend_compression`: quality-floor-based profile selection (arXiv 2603.19733). Specify a minimum acceptable quality (e.g. 0.85) instead of manually choosing a compression profile. Auto-selects the most aggressive profile that meets your quality target.
- COMI MIG scoring (arXiv 2602.01719): query-aware token refinement. When a query is provided, the token refiner uses Marginal Information Gain to keep relevant tokens and remove redundant ones, instead of simple filler-word removal.
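The meta-token idea can be illustrated with a toy round-trip: move a repeated phrase into a dictionary header and replace its occurrences with a §1 symbol. This single-entry sketch works over raw substrings, whereas the shipped compressor works over token subsequences and builds its dictionary automatically:

```python
def compress(text: str, phrase: str) -> str:
    """Toy single-entry meta-token compression: header line, then body with §1."""
    assert "§" not in text  # reserve the meta-token symbol
    return f"§1={phrase}\n" + text.replace(phrase, "§1")

def decompress(packed: str) -> str:
    """Reverse the substitution using the dictionary header. Fully lossless."""
    header, body = packed.split("\n", 1)
    symbol, phrase = header.split("=", 1)
    return body.replace(symbol, phrase)
```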
Cache Optimization Features
Token Saver automatically optimizes for each provider's caching behavior:
- Cache-stable response ordering: tool responses are key-ordered so stable metadata (status, file_id) sits at the prefix for Claude/Gemini cache hits, and is mirrored at the tail for Codex's middle-truncation pattern.
- Token-level refinement: LLMLingua-inspired post-processing removes articles, fillers, and hedges from compressed skeletons (20-40% additional reduction). Preserves numbers, code identifiers, URLs, and sentence boundaries.
- TurboQuant-inspired embedding quantization: 384-dim float32 embeddings compressed to 96-dim int8 (13x memory reduction) using random orthogonal rotation + int8 quantization + 1-bit residual error correction. >0.99 fidelity in the compressed subspace.
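The int8 quantization step can be pictured in miniature: scale floats into the int8 range, round, and reconstruct, then check fidelity. This sketch deliberately omits the rotation, the 384→96 dimensionality reduction, and the 1-bit residual correction:

```python
def quantize(vec):
    """Symmetric per-vector int8 quantization: returns (codes, scale)."""
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)
```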
Prompt Cache Observability Tools
If you are optimizing for prompt caching, the most relevant MCP tools are:
- `audit_prompt_cacheability`: checks section ordering and volatility before provider calls.
- `render_prompt_template`: produces a canonical cache-friendly prompt plus a `prompt_id`.
- `assess_cache_compatibility`: checks whether Gemini CLI, Claude Code, Codex, or raw provider APIs expose enough cache telemetry to validate real reuse.
- `capture_cache_telemetry`: normalizes provider cache-hit telemetry from Claude, OpenAI, and Gemini responses.
- `diagnose_cache_miss`: explains likely causes of unexpected misses, partial reuse, section drift, and cache-creation churn.
The model-optimization layer now also exposes:
- provider-specific cache threshold guidance via `optimize_for_model`
- deterministic `prompt_cache_key` guidance for OpenAI/Codex-style routing stickiness
- local extractive compression and history-compaction primitives for lower-latency context trimming
- benchmark method comparisons between semantic and extractive baselines
For usage guidance, see docs/guides/PROMPT_CACHING.md.
For Gemini CLI, Claude, and Codex compatibility guidance, see docs/guides/PROVIDER_CACHE_COMPATIBILITY.md.
Skill Scripts (No MCP Required)
Path: skills/token-saver-context-compression/scripts/
Main scripts:
- `profile_tokens.py`: raw vs compressed token profile.
- `compress_context.py`: baseline/query-guided/evidence-aware compression.
- `validate_evidence.py`: checks if compressed output has enough evidence.
- `run_skill_workflow.py`: profile + compress + evidence in one command.
- `benchmark_toon_vs_json.py`: TOON/JSON token + quality guard checks.
All support local execution with no dependency on external MCP wrappers.
Data Source Flexibility
You can feed Token Saver from any source as long as you provide text input:
- Local files (`--file`).
- Pasted text (`--text`).
- Piped stdin from another command.
- Upstream connectors that export text payloads.
The compressor itself is source-agnostic; GitHub is just one possible integration path, not a requirement.
Output Formats (JSON vs TOON)
Skill scripts support:
- `--output-format json`
- `--output-format toon`
- `--output-format auto`
`auto` behavior:
- Select TOON only when data shape is TOON-friendly (uniform object arrays) and token-efficient.
- Fall back to JSON otherwise.
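The `auto` decision can be approximated as: pick the tabular format only when the payload is a uniform array of flat objects. A hypothetical version of that check (the shipped scripts also weigh token efficiency, which this sketch skips):

```python
def is_toon_friendly(data) -> bool:
    """True when data is a non-empty list of flat dicts sharing one key set."""
    if not isinstance(data, list) or not data:
        return False
    if not all(isinstance(row, dict) for row in data):
        return False
    keys = set(data[0])
    return all(
        set(row) == keys
        and all(not isinstance(v, (dict, list)) for v in row.values())
        for row in data
    )

def choose_format(data) -> str:
    return "toon" if is_toon_friendly(data) else "json"
```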
Repo Structure (Practical Map)
- `src/`: core implementation.
- `src/handlers/`: MCP tool handlers.
- `src/semantic_modulator/`: app/api/service-layer architecture.
- `skills/`: portable no-MCP skill package.
- `scripts/`: benchmark/setup/dev scripts.
- `tests/`: unit/integration/regression tests.
- `docs/`: detailed guides and reference docs.
Present but not wired into the default runtime:
- `src/reliability.py`: timeout, circuit breaker, and retry primitives. Fully tested but not yet integrated into the server hot path. Best first integration target: async batch compression in `compression_handlers.py`.
- `src/multimodal_compressor.py`, `src/training_utils.py`: experimental research modules. Gated behind `"experimental": true` in handler responses. Not part of the core MCP tool surface.
Run The Server (MCP Mode)
token-saver-mcp
For web/API deployments, set HTTP_ENABLED=true to start an HTTP server with health/metrics endpoints alongside the MCP server:
HTTP_ENABLED=true HTTP_PORT=8080 token-saver-mcp
Endpoints: /health/liveness, /health/readiness, /health/diagnostics, /metrics (Prometheus).
See docs/deployment/DOCKER.md and docs/deployment/SAAS_MULTI_TENANT.md for reverse proxy and API gateway patterns.
Claude Desktop config example:
{
"mcpServers": {
"token-saver": {
"command": "token-saver-mcp",
"args": [],
"cwd": "/path/to/token-saver-5000"
}
}
}
The simplest setup path is:
token-saver-setup --auto
To install that entry automatically into Claude Desktop with the low-level installer:
token-saver-install-mcp
To generate a project-scoped `.mcp.json` for Claude Code or another MCP-aware workspace:
token-saver-install-mcp --project-config
To generate a portable project-scoped config for a shared repo using ${workspaceFolder}:
token-saver-install-mcp --portable-project-config
If you want raw JSON instead of writing the project config file:
token-saver-install-mcp --print-config > .mcp.json
To inspect whether the command, Claude Desktop config, and project config are installed correctly:
token-saver-install-mcp --doctor --human
To uninstall cleanly:
token-saver-setup --uninstall-all
Or target just one surface:
token-saver-setup --uninstall --desktop
token-saver-setup --uninstall --portable-project
The MCP server now also exposes first-class prompts and resources:
- Prompts for document compression, prompt-cache review, and MCP setup guidance.
- Resources for tool catalogs, workflow instructions, install modes, and live install status.
- A resource template at `token-saver://tool/{name}/help` for canonical per-tool help payloads.
Test and Quality Commands
Run tests:
pytest tests/ -v
Run benchmark guard:
python scripts/benchmarks/run_benchmarks.py --compare baseline,query_guided,evidence_aware
python scripts/benchmarks/check_benchmark_guard.py --strict-case-set --summary-file artifacts/benchmarks/guard_summary.md
Lint/format:
python -m ruff check src tests scripts skills
python -m black src tests scripts skills
Version and Requirements
- Version: `0.11.0`
- Python: `3.10-3.14` (chromadb requires 3.10-3.12)
- Suggested RAM: `~4GB` for embedding workloads
Version source-of-truth: pyproject.toml (all other files derive from it).
Roadmap
This repo is the open-source local MCP tool. gotcontext.ai will be the SaaS platform built on top of it.
Planned for gotcontext.ai:
- Context as a Service (CaaS) — cloud API for semantic compression with team dashboards, session history, and usage metering
- Knowledge Hub — model-agnostic RAG notebooks with compressed retrieval. Upload docs, chat with compressed context. Like NotebookLM but open, self-hostable, and 85% more token-efficient. Built on open-notebook (21.8K★).
- Agent Context Hub — always-current framework docs for AI coding agents with compressed retrieval. Like Context7/ref.tools but local-first, open source, and 85% fewer tokens per response. Built on docs-mcp-server (1.2K★).
- Global AI Benchmark Repository — crowd-sourced database of model inference performance across hardware, quantization formats, and providers. Like UserBenchmark but for AI.
- AI News Center — curated AI infrastructure intelligence hub with data-driven reports from benchmark data
- Model-aware routing — combine context compression with benchmark data to recommend optimal model + quant for each request
See docs/GO_TO_MARKET_PLAN.md for the full 5-product platform strategy.
Documentation
Start here:
- docs/getting-started/GETTING_STARTED.md
- docs/guides/HOW_IT_WORKS.md
- docs/reference/ARCHITECTURE.md
- docs/guides/MCP_TOOLS_GUIDE.md
- docs/guides/WORKFLOW_ORCHESTRATION.md
- docs/deployment/SAAS_MULTI_TENANT.md
- CHANGELOG.md
License
MIT (LICENSE).
File details
Details for the file gotcontext_server-0.11.0.tar.gz.
File metadata
- Download URL: gotcontext_server-0.11.0.tar.gz
- Upload date:
- Size: 909.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a89150138dcab21145f343a0fb09ae24a86c5471fd589f16deba7ed08a1c374 |
| MD5 | b142b3e94f5bb43cce9c0b3abed4b7d3 |
| BLAKE2b-256 | 914db1b627364544b8759b4972dc7e8eac6f90f260433be83b6f92687ba90133 |
Provenance
The following attestation bundles were made for gotcontext_server-0.11.0.tar.gz:
Publisher: publish.yml on oimiragieo/token-saver-5000
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gotcontext_server-0.11.0.tar.gz
- Subject digest: 9a89150138dcab21145f343a0fb09ae24a86c5471fd589f16deba7ed08a1c374
- Sigstore transparency entry: 1364840466
- Sigstore integration time:
- Permalink: oimiragieo/token-saver-5000@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Branch / Tag: refs/tags/v0.11.0
- Owner: https://github.com/oimiragieo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Trigger Event: push
File details
Details for the file gotcontext_server-0.11.0-py3-none-any.whl.
File metadata
- Download URL: gotcontext_server-0.11.0-py3-none-any.whl
- Upload date:
- Size: 552.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e9f1ca4771bc5d2cac9dfcf0179ae16828e2546fbe0b82d080ca3959d61e0181 |
| MD5 | 810904d8cc532b3dce643f7af4148a24 |
| BLAKE2b-256 | d120768fdec3f2c9526a78a882b21c69281d497b59648db84d49afbc78a178b9 |
Provenance
The following attestation bundles were made for gotcontext_server-0.11.0-py3-none-any.whl:
Publisher: publish.yml on oimiragieo/token-saver-5000
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gotcontext_server-0.11.0-py3-none-any.whl
- Subject digest: e9f1ca4771bc5d2cac9dfcf0179ae16828e2546fbe0b82d080ca3959d61e0181
- Sigstore transparency entry: 1364840483
- Sigstore integration time:
- Permalink: oimiragieo/token-saver-5000@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Branch / Tag: refs/tags/v0.11.0
- Owner: https://github.com/oimiragieo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Trigger Event: push