Self-hosted MCP server for gotcontext.ai — semantic compression, AST-aware code understanding, and context engineering for LLM agents.
Token Saver 5000
Token Saver 5000 is a local semantic compression system for AI context.
In plain terms: it takes large text/code context, keeps the important parts, and gives you a smaller context that is cheaper to send to models.
What This Project Actually Does
You give it a long document or codebase context.
It builds a semantic graph, ranks importance, and outputs a compressed "skeleton" you can query and expand.
Core outcomes:
- Lower token usage.
- Faster context handling.
- Better control over what information is kept vs omitted.
This is not tied to GitHub workflows. It works for any large context source (files, notes, transcripts, docs, code, or generated text).
Who This Is For
Use this if you:
- Work with large prompts/documents.
- Need to cut token cost.
- Want retrieval-oriented compression (not just naive summarization).
Common use cases:
- RAG context compression before answer generation.
- Long internal docs and wiki pages.
- Customer support transcripts and call notes.
- Legal/policy/contract text review prep.
- Large code and architecture context for agents.
- Multi-turn assistant memory compression.
Do not use this if you only have short prompts and token cost is irrelevant.
Proven Results: 12 Real-World User Journeys
Every journey runs locally. No API keys needed. Verified with 40+ tests.
python scripts/benchmark_cujs.py --verbose
| # | Journey | What Happens | Before | After | Savings |
|---|---|---|---|---|---|
| 1 | Solo Dev: Codebase Compression | Search 13-file project for "auth" → compress 6 matched files | 20,406 tokens | 767 tokens | 96.2% |
| 2 | Long Document Compression | Compress 2,206-line API reference doc | 16,461 tokens | 1,269 tokens | 92.3% |
| 3 | CLI Output Filtering | Filter git diff (320 lines) + pytest (86 lines) + npm install (28 lines) | 4,888 tokens | 777 tokens | 84.1% |
| 4 | Query-Focused Code Search | Ask "how does caching work?" → find 3 relevant files → compress | 20,406 tokens | 439 tokens | 97.8% |
| 5 | Session Recovery | Recover 7 events after conversation compaction | 26,600 tokens | 138 tokens | 99.5% |
| 6 | ROI Justification | 10 compressions on Claude Opus → show savings report | $1.27 saved | 4.4x ROI | $127/mo projected |
| 7 | Tool Schema Compression | 50 MCP tools → 3 meta-tools via SchemaCompressor | 10,376 tokens | 287 tokens | 97.2% |
| 8 | Code-Aware Compression | Compress 10 Python source files for code review | 15,547 tokens | 1,294 tokens | 91.7% |
| 9 | Dialogue Memory (AFM) | 22-message conversation → budget-aware context packing | 968 tokens | 481 tokens | 50.3% |
| 10 | Budget Governance | 10 sessions tracked against per-session/daily/monthly limits | 415,000 tokens | — | alerts |
| 11 | Tee/Recovery | Compress 3 CLI outputs + recover originals on demand | 4,888 tokens | 777 tokens | 84.1% |
| 12 | Team Dashboard Export | 5-member team aggregate → JSON/CSV/Prometheus export | 3,860,000 tokens | 579,000 tokens | 85.0% |
Aggregate: 4,485,940 input tokens → 590,849 output tokens (86.8% savings)
Journey Details
CUJ 1: Solo Developer with a Codebase. You have 13 Python files. You ask about authentication. Token Saver searches for relevant code (finds auth.py, middleware.py, tests), compresses only those files, skips everything irrelevant. 96.2% savings vs reading every file.
CUJ 2: Compressing Architecture Docs. You have a 2,206-line API reference. Token Saver builds a semantic graph, ranks importance via PageRank, generates a 13x compressed skeleton preserving endpoints, parameters, and error codes. 92.3% savings.
CUJ 3: Cleaning Up CLI Noise. Your AI agent runs git diff, pytest, and npm install. Token Saver auto-detects each command type and applies the right filter: stats extraction for git, failure focus for pytest, summary for npm. 84.1% savings (434 lines → 67 lines).
CUJ 4: "How Does Caching Work?" You ask a question about a codebase. Token Saver searches first (finds cache.py, config.py, middleware.py), then compresses only those 3 files instead of all 13. 97.8% savings -- 75% better than compressing everything.
CUJ 5: Surviving Conversation Compaction. After a long Claude Code session, the conversation gets compacted and you lose context. Token Saver's session journal recovers all your prior work (5 ingested files, model config, compression profile, 26,600 tokens saved) in just 138 tokens.
CUJ 6: Proving ROI to Your Manager. After 10 compressions on Claude Opus ($15/MTok), the savings tracker shows: $1.27 saved, 4.4x ROI vs the $29/mo Pro plan, breakeven at 228 operations. Projected $127/month savings. The tool pays for itself on day 1.
CUJ 7: Tool Schema Compression (Proxy Mode). Your MCP server exposes 50 tools. The proxy replaces all individual schemas with 3 meta-tools (search_tools, get_tool_schema, invoke_tool) — agents discover tools on demand instead of loading all schemas upfront. 97.2% savings on tool context.
CUJ 8: Code-Aware Compression. You need AI to review 10 Python files. Token Saver compresses each file using semantic graph + PageRank, preserving function signatures, class structure, and key logic. 91.7% savings — the AI sees the important parts without reading every line.
CUJ 9: Dialogue Memory (AFM). After 22 back-and-forth messages, your context window is filling up. Adaptive Focus Memory ranks each message by importance and recency, keeps critical messages in full, compresses the rest to placeholders. 50.3% savings while preserving conversation coherence.
CUJ 10: Budget Governance. Your team has token budgets per session (100K), per day (500K), and per month (10M). After 10 coding sessions consuming 415K tokens, the budget monitor alerts: session CRITICAL, daily WARNING, monthly OK. No surprise bills.
CUJ 11: Tee/Recovery. The CLI optimizer compresses git diff, pytest, and npm output aggressively. But you need the full pytest output to debug a failure. Tee/recovery saved the original — retrieve it by ID. 84.1% savings with a safety net.
CUJ 12: Team Dashboard Export (Enterprise). Your 5-person team ran 51 sessions, consuming 3.86M tokens. Token Saver compressed to 579K (85% savings). Export the data as JSON for your dashboard, CSV for spreadsheets, or Prometheus metrics for Grafana. Justify the tool spend to your VP of Engineering.
Two Ways To Use It
There are two product surfaces in this repo:
- MCP server (`src.server`) for Claude/Desktop and agent workflows.
- Self-contained skill scripts (`skills/token-saver-context-compression`) that run locally without MCP.
Multi-Tenant SaaS Deployment
Token Saver 5000 can also be used as a multi-tenant context service, not just a local MCP helper.
The core scope fields are:
- `workspace_id`: isolates one customer or team workspace.
- `user_id`: isolates a person within that workspace.
- `agent_id`: isolates one automated agent or role.
- `session_id`: isolates one short-lived interaction thread.
Use those fields consistently across memory, prompts, connector feeds, temporal exports, and handoff bundles when you expose the system behind a shared API gateway or multi-tenant worker.
If you are deploying for multiple customers, read docs/deployment/SAAS_MULTI_TENANT.md.
Local vs Docker
You do not need Docker. Docker is optional.
Choose your runtime:
- Local Python:
  - Best for development and quick usage.
  - Direct access to scripts and source.
  - Command: `token-saver-mcp`
- Docker:
  - Best for reproducible deployment and team environments.
  - Avoids local dependency drift.
  - Command: `docker-compose up -d`
First 10 Minutes (Recommended Path)
- Get the code:
git clone https://github.com/oimiragieo/token-saver-5000.git
cd token-saver-5000
- Install it like a tool:
Option A: uv (recommended)
uv tool install -e .
Option B: pipx
pipx install .
Option C: developer/editable install
pip install -r requirements.txt
pip install -e .
- Run guided setup:
token-saver-setup --auto
That command picks the most likely target for your environment:
- `desktop` for Claude Desktop-centric local use.
- `portable-project` when you run it inside a repo/workspace that looks project-scoped.
If you want the low-level status report only:
token-saver-install-mcp --doctor --human
For a deeper, network-using verification pass that downloads the embedding model and runs a smoke test:
python scripts/check_setup.py
- Run a local example:
python examples/example_usage.py
- Try the self-contained skill scripts:
python skills/token-saver-context-compression/scripts/profile_tokens.py --file tests/fixtures/skill_context_sample.txt --output-format auto
python skills/token-saver-context-compression/scripts/compress_context.py --file tests/fixtures/skill_context_sample.txt --mode query_guided --query "what are the retry rules?" --output-format auto
python skills/token-saver-context-compression/scripts/validate_evidence.py --file tests/fixtures/skill_context_sample.txt --query "what are the retry rules?" --min-similarity 0.4
How The Compression Flow Works
At a high level:
- Ingest text (`ingest_context`).
- Chunk and embed the text.
- Build a semantic graph (nodes = chunks, edges = semantic similarity).
- Rank nodes by importance.
- Return a compressed skeleton (`read_skeleton`).
- Search/extract relevant regions (`search_semantic`, `modulate_region`).
If query-aware mode is used, scoring is biased toward the query.
If evidence-aware mode is used, it checks whether selected context likely contains enough answer-supporting evidence.
`read_skeleton` now also returns a `pipeline` object so you can inspect which passes ran:
- `baseline`
- `query_guided`
- `evidence_aware`
That makes it easier to debug why a document was compressed a certain way and to verify when evidence-aware retrieval expanded or changed the final anchor set.
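The flow above can be sketched in miniature. This is an illustrative toy, not the shipped implementation: it substitutes bag-of-words vectors for the real embedding model, uses a plain power-iteration PageRank, and keeps the top-ranked chunks in document order. Function names here are invented for the sketch.

```python
import math
import re
from collections import Counter

def chunk(text, size=40):
    # Naive fixed-size word chunks; the real pipeline chunks semantically.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(chunk_text):
    # Bag-of-words stand-in for a sentence-embedding model.
    return Counter(re.findall(r"\w+", chunk_text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pagerank(adj, damping=0.85, iters=50):
    # Plain power iteration over a weighted adjacency matrix.
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [
            (1 - damping) / n + damping * sum(
                rank[j] * adj[j][i] / s
                for j in range(n)
                if (s := sum(adj[j])) > 0
            )
            for i in range(n)
        ]
    return rank

def skeleton(text, ratio=0.3, threshold=0.2):
    chunks = chunk(text)
    vecs = [embed(c) for c in chunks]
    # Edge weight = similarity above a floor; no self-edges.
    adj = [
        [cosine(vecs[i], vecs[j])
         if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
         for j in range(len(chunks))]
        for i in range(len(chunks))
    ]
    ranks = pagerank(adj)
    keep = max(1, int(len(chunks) * ratio))
    top = sorted(range(len(chunks)), key=lambda i: -ranks[i])[:keep]
    return [chunks[i] for i in sorted(top)]  # preserve document order
```

Query-guided mode would simply bias the ranks toward chunks similar to the query before selection.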
Core MCP Tools (The Ones Most Users Need)
If you are new, start with these 7:
- `ingest_context`: add a document.
- `read_skeleton`: view compressed structure.
- `search_semantic`: find relevant nodes by query.
- `modulate_region`: expand selected nodes at chosen fidelity.
- `get_stats`: view compression stats.
- `list_documents`: list ingested docs.
- `delete_document`: remove a doc.
You can force this minimal surface with:
MCP_TOOL_PROFILE=core_stable python -m src.server
Or, after installing the tool:
MCP_TOOL_PROFILE=core_stable token-saver-mcp
Client-Aware Token Optimization Tools
If you use Token Saver with a specific LLM client (Claude Code, Gemini CLI, etc.), these tools auto-tune compression for your model's context window and behavior:
- `configure_for_client`: set model ID or explicit context window size. Auto-tunes skeleton ratio based on window size and how aggressively the client compresses. Supports Claude, Gemini, GPT, and explicit overrides.
- `estimate_tokens`: multi-method token estimation (tiktoken, fast len/4, Gemini-compatible, JSON-density, raw bytes). Use it to budget context before ingestion.
- `set_compression_profile`: named presets (minimal/summary/balanced/detailed/full) that bundle skeleton_ratio, fidelity, and chunk_size into one setting.
- `get_compression_profile`: view the active profile and available profiles.
Example: configure for Gemini CLI (1M context, aggressive compression at 50%):
# Via MCP tool call
configure_for_client(model_id="gemini-2.5-pro")
# -> skeleton_ratio ~0.31 (vs ~0.50 for Claude with same window)
For details, see docs/claude-code-token-optimization-enhancements.md,
docs/gemini-cli-token-optimization-enhancements.md, and
docs/codex-cli-token-optimization-enhancements.md.
Proven Benchmark Results
Document compression (locally measured, reproducible): Token Saver achieves 13x document compression (16,461 tokens -> 1,269 tokens) via semantic graph + PageRank + token-level refinement + lossless meta-tokens.
API input token savings (real measurements, total content tokens):
| Provider | Baseline | Compressed | Savings | Notes |
|---|---|---|---|---|
| Codex (gpt-5.1-codex) | 37,514 | 23,189 | ~38% | Most stable (consistent cache) |
| Gemini CLI (Flash) | 69,172 | 30,672 | ~56% | Measured via prompt field (cache-independent) |
| Claude Code (Opus 4.6) | ~61,500 | ~45,300 | ~26% | Approximate (cache state varies +/-3%) |
Methodology:
- Validated across 3 independent runs with zero variance (min = max = median)
- "Total content tokens" used (not billed tokens) to remove cache hit/miss variance
- "Doc Compression" = document-only reduction (92.3%) vs "Total API Savings" = including system prompt
- System prompt overhead varies by provider (~42K for Claude, ~16K for Codex, ~28K for Gemini)
- Cost savings NOT reported as headline (volatile due to cache pricing effects)
- Answer quality NOT formally evaluated (compression may affect response quality)
Savings by corpus size:
| Corpus | Claude | Codex | Gemini |
|---|---|---|---|
| small (156 lines) | ~2% | ~4% | ~8% |
| medium (479 lines) | ~9% | ~14% | ~26% |
| large (2,206 lines) | ~26% | ~38% | ~56% |
Real-world savings depend on system prompt size: smaller system prompts (Codex, Gemini) see proportionally larger savings from document compression.
Run benchmarks yourself:
# Dry run (no API calls, validates setup)
python scripts/benchmark_token_savings.py --dry-run --verbose
# Full benchmark across all providers
python scripts/benchmark_token_savings.py --mode skill --verbose --output results.json
# Single provider, single corpus size
python scripts/benchmark_token_savings.py --providers claude --sizes large --verbose
Token Savings Tracker (NEW)
Every compression operation is tracked with exact dollar savings. See your ROI in real-time:
get_savings_report(session_id="my-session")
# -> {
# "total_tokens_saved": 142,500,
# "total_dollars_saved": 2.14,
# "avg_compression_ratio": 13.0,
# "monthly_projected_savings": 64.20,
# "roi_vs_pro_plan": 2.2,
# "breakeven_operations": 14,
# "by_tool": {"ingest_context": {"operations": 8, "dollars_saved": 1.87}, ...}
# }
The tracker computes: tokens saved, dollar savings (model-aware pricing), compression ratios, monthly projections, ROI vs the $29/mo Pro plan, and breakeven analysis. Persists to SQLite so savings accumulate across sessions.
CLI Output Optimizer (NEW -- RTK-Inspired)
Coding agent CLI output is the #1 token waster. Token Saver now auto-detects 10 command types and applies optimal filtering:
| Command | Strategy | Typical Savings |
|---|---|---|
| `git diff` | Extract file list + stats summary | 90-99% |
| `pytest` / `jest` | Show only failures + summary | 94-99% |
| `npm install` / `pip install` | Keep summary, strip progress | 85-95% |
| Lint (ruff, eslint) | Group by rule, count occurrences | 80-90% |
| JSON output | Extract keys + types, first item | 80-95% |
| Logs | Deduplicate repeated lines | 70-85% |
| Any colored output | Strip ANSI escape codes | 10-30% |
Use directly via MCP tool:
filter_cli_output(text="<raw CLI output>")
Or automatically via the proxy (applies to all upstream tool responses):
token-saver-proxy npx some-server --provider anthropic
Falls back to RTK when installed for maximum quality.
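Two of the strategies above (ANSI stripping and log deduplication) are easy to picture in miniature. This is a hedged sketch, not the project's filter code, and it handles only SGR color codes and consecutive duplicates:

```python
import re

ANSI_SGR_RE = re.compile(r"\x1b\[[0-9;]*m")  # SGR color/style codes only

def strip_ansi(text: str) -> str:
    """Remove ANSI color escape codes from CLI output."""
    return ANSI_SGR_RE.sub("", text)

def dedup_lines(text: str) -> str:
    """Collapse consecutive duplicate lines, annotating the repeat count."""
    out, prev, count = [], None, 0
    for line in text.splitlines():
        if line == prev:
            count += 1
            continue
        if count > 1:
            out.append(f"  [repeated {count}x]")
        out.append(line)
        prev, count = line, 1
    if count > 1:
        out.append(f"  [repeated {count}x]")
    return "\n".join(out)
```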
MCP Proxy Mode (NEW)
Wrap ANY MCP server transparently -- compress tool responses automatically with zero code changes:
# Compress any MCP server's output
token-saver-proxy npx some-mcp-server --provider anthropic
# Enable schema compression (N tools -> 3 meta-tools, ~96% token reduction)
token-saver-proxy python -m my_server --schema-compression
In Claude Desktop config:
{
"mcpServers": {
"my-server-compressed": {
"command": "token-saver-proxy",
"args": ["npx", "some-mcp-server", "--provider", "anthropic"]
}
}
}
The proxy applies TokenRefiner + MetaTokenCompressor to every tool response. Optional --schema-compression replaces all upstream tools with 3 meta-tools (search_tools, get_tool_schema, invoke_tool).
Session Continuity (NEW)
Token Saver now survives conversation compaction. A SQLite-backed journal records all ingestions, configurations, and tool calls. After the CLI compacts your conversation:
# Call recover_session to get a compact summary of everything that happened
recover_session(session_id="my-session")
# -> {ingested_files: [...], client_config: {...}, active_profile: "balanced", total_tokens_saved: 14500}
Cache Strategy Advisor (NEW)
Every LLM provider handles caching differently. The advisor tells you exactly what to do:
advise_cache_strategy(model_id="claude-4-sonnet")
# -> Anthropic: explicit cache, 90% discount, add ephemeral markers, 5min TTL
advise_cache_strategy(model_id="gpt-4.1")
# -> OpenAI: automatic cache, 50% discount, keep 1024+ token prefix stable
advise_cache_strategy(model_id="gemini-2.5-flash")
# -> Google: implicit cache, 90% discount, no client action needed
advise_cache_strategy(model_id="groq-llama-4-scout")
# -> Groq: no caching, focus on small prompts for fastest inference
Supports: Anthropic, OpenAI, Google Gemini, Groq, XAI (Grok), Azure, Bedrock, local/Ollama.
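At its core this kind of advisor is a provider lookup keyed off the model ID prefix. A hypothetical reduction (the table entries below mirror the example outputs above; the real tool returns richer structured guidance and covers more providers):

```python
# Hypothetical prefix -> (provider, advice) table mirroring the examples above.
CACHE_ADVICE = {
    "claude": ("anthropic", "explicit cache, 90% discount, add ephemeral markers, 5min TTL"),
    "gpt":    ("openai",    "automatic cache, 50% discount, keep 1024+ token prefix stable"),
    "gemini": ("google",    "implicit cache, 90% discount, no client action needed"),
    "groq":   ("groq",      "no caching, focus on small prompts for fastest inference"),
}

def advise(model_id: str) -> dict:
    """Return cache advice for the provider implied by a model ID."""
    for prefix, (provider, advice) in CACHE_ADVICE.items():
        if model_id.lower().startswith(prefix):
            return {"provider": provider, "advice": advice}
    return {"provider": "unknown", "advice": "treat as uncached; minimize prompt size"}
```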
Multi-Agent Setup (NEW)
Install Token Saver for any AI coding agent with a single command:
token-saver-setup --auto # Auto-detect your environment
token-saver-install-mcp --agent cursor # Cursor
token-saver-install-mcp --agent windsurf # Windsurf
token-saver-install-mcp --agent cline # Cline / Roo Code
token-saver-install-mcp --agent codex # OpenAI Codex CLI
token-saver-install-mcp --agent gemini # Gemini CLI
token-saver-install-mcp --agent copilot # VS Code Copilot
token-saver-install-mcp --doctor-all # Check all agent configs
8 agents supported: Claude Desktop, Claude Code (project), Cursor, Windsurf, Cline, VS Code Copilot, Codex, and Gemini CLI.
Savings Dashboard (NEW)
Track your token savings across sessions:
token-saver-stats # All-time summary
token-saver-stats --daily # Day-by-day breakdown
token-saver-stats --weekly # Weekly summary
token-saver-stats --by-tool # Per-tool breakdown
token-saver-stats --cost # Cost savings with model pricing
token-saver-stats --json # Machine-readable output
token-saver-stats --csv # Spreadsheet export
ROI Calculator (NEW)
Calculate your return on investment via the calculate_roi MCP tool:
Input: model=claude-opus-4-6, tokens_per_day=500000, team_size=10
Output:
Without gotcontext: $1,650.00/mo
With gotcontext: $247.50/mo (85% savings)
Pro plan cost: $290.00/mo ($29/user × 10 users)
Net savings: $1,112.50/mo (5.7x ROI)
Supports 20+ models with real pricing data.
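The numbers in the example reconcile under one plausible reading: "ROI" is baseline spend divided by plan cost. As arithmetic:

```python
baseline_monthly = 1650.00                    # spend without gotcontext
compressed_monthly = baseline_monthly * 0.15  # 85% savings -> $247.50
plan_cost = 29.00 * 10                        # $29/user x 10 users -> $290.00
net_savings = baseline_monthly - compressed_monthly - plan_cost  # -> $1,112.50
roi = baseline_monthly / plan_cost            # -> ~5.7x
```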
Token Budget Monitoring (NEW)
Set per-session, daily, or monthly token budgets via check_budget MCP tool or environment variables:
TOKEN_BUDGET_SESSION=500000 TOKEN_BUDGET_DAILY=2000000 token-saver-mcp
Returns usage status, alert levels (ok/info/warning/critical), and projected usage.
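The four alert levels map naturally onto usage thresholds. A hypothetical sketch (the actual cutoffs used by `check_budget` are not documented here, so these fractions are illustrative):

```python
def alert_level(used: int, budget: int) -> str:
    """Map a usage fraction to an alert level (hypothetical thresholds)."""
    frac = used / budget
    if frac >= 0.95:
        return "critical"
    if frac >= 0.80:
        return "warning"
    if frac >= 0.50:
        return "info"
    return "ok"
```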
Team Dashboard Export (NEW)
Export aggregated team savings data via export_team_data MCP tool:
- JSON: For API consumption and custom dashboards.
- CSV: For spreadsheet analysis.
- Prometheus: For Grafana/Datadog monitoring.
Tee/Recovery System (NEW)
When compression drops information, the original is saved for recovery:
TEE_MODE=failures token-saver-mcp # Tee on high compression (default)
TEE_MODE=always token-saver-mcp # Tee everything
MCP tools: get_original_output, list_tee_entries, tee_store_stats.
Custom Filter Rules (NEW)
Define project-specific output filtering rules in .gotcontext.toml:
[filters.my_build_output]
match_command = "my-build-tool"
strip_ansi = true
strip_lines_matching = ["^Progress:", "^\\s*$"]
keep_lines_matching = ["^ERROR", "^WARNING"]
head_lines = 50
tail_lines = 20
max_lines = 100
Supports 8-stage pipeline, inline tests, project + user-global precedence.
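A reduced version of one rule's pipeline (strip ANSI → drop `strip_lines_matching` → keep `keep_lines_matching` → head/tail/max caps) might look like this. It is a sketch of the stage ordering only, not the shipped 8-stage implementation:

```python
import re

def apply_rule(text, strip_ansi=True, strip_lines=(), keep_lines=(),
               head_lines=None, tail_lines=None, max_lines=None):
    """Apply a simplified filter rule in stage order."""
    if strip_ansi:
        text = re.sub(r"\x1b\[[0-9;]*m", "", text)
    lines = text.splitlines()
    if strip_lines:
        lines = [ln for ln in lines if not any(re.search(p, ln) for p in strip_lines)]
    if keep_lines:
        lines = [ln for ln in lines if any(re.search(p, ln) for p in keep_lines)]
    # Head/tail truncation with an elision marker when both caps are set.
    if head_lines is not None and tail_lines is not None \
            and len(lines) > head_lines + tail_lines:
        lines = lines[:head_lines] + ["..."] + lines[-tail_lines:]
    if max_lines is not None:
        lines = lines[:max_lines]
    return "\n".join(lines)
```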
Missed Savings Discovery (NEW)
The discover_savings MCP tool scans directories to find files that would benefit from compression:
discover_savings(directory="/path/to/project")
→ README.md: ~2,400 tokens → ~300 compressed (87% savings)
→ src/main.py: ~800 tokens → ~200 compressed (75% savings)
→ Total opportunity: ~12,000 tokens saveable
Research-Backed Compression Techniques
Token Saver integrates techniques from recent AI research papers:
- `compress_meta_tokens`: lossless LZ77-inspired compression (arXiv 2506.00307). Replaces repeated token subsequences with §N symbols + dictionary header. Fully reversible.
- `recommend_compression`: quality-floor-based profile selection (arXiv 2603.19733). Specify a minimum acceptable quality (e.g. 0.85) instead of manually choosing a compression profile. Auto-selects the most aggressive profile that meets your quality target.
- COMI MIG scoring (arXiv 2602.01719): query-aware token refinement. When a query is provided, the token refiner uses Marginal Information Gain to keep relevant tokens and remove redundant ones, instead of simple filler-word removal.
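The meta-token idea can be illustrated with a toy round-trip: move a repeated phrase into a dictionary header and replace its occurrences with a §1 symbol. This single-entry sketch works over raw substrings, whereas the shipped compressor works over token subsequences and builds its dictionary automatically:

```python
def compress(text: str, phrase: str) -> str:
    """Toy single-entry meta-token compression: header line, then body with §1."""
    assert "§" not in text  # reserve the meta-token symbol
    return f"§1={phrase}\n" + text.replace(phrase, "§1")

def decompress(packed: str) -> str:
    """Reverse the substitution using the dictionary header. Fully lossless."""
    header, body = packed.split("\n", 1)
    symbol, phrase = header.split("=", 1)
    return body.replace(symbol, phrase)
```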
Cache Optimization Features
Token Saver automatically optimizes for each provider's caching behavior:
- Cache-stable response ordering: tool responses are key-ordered so stable metadata (status, file_id) sits at the prefix for Claude/Gemini cache hits, and is mirrored at the tail for Codex's middle-truncation pattern.
- Token-level refinement: LLMLingua-inspired post-processing removes articles, fillers, and hedges from compressed skeletons (20-40% additional reduction). Preserves numbers, code identifiers, URLs, and sentence boundaries.
- TurboQuant-inspired embedding quantization: 384-dim float32 embeddings compressed to 96-dim int8 (13x memory reduction) using random orthogonal rotation + int8 quantization + 1-bit residual error correction. >0.99 fidelity in the compressed subspace.
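The int8 quantization step can be pictured in miniature: scale floats into the int8 range, round, and reconstruct, then check fidelity. This sketch deliberately omits the rotation, the 384→96 dimensionality reduction, and the 1-bit residual correction:

```python
def quantize(vec):
    """Symmetric per-vector int8 quantization: returns (codes, scale)."""
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)
```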
Prompt Cache Observability Tools
If you are optimizing for prompt caching, the most relevant MCP tools are:
- `audit_prompt_cacheability`: checks section ordering and volatility before provider calls.
- `render_prompt_template`: produces a canonical cache-friendly prompt plus a `prompt_id`.
- `assess_cache_compatibility`: checks whether Gemini CLI, Claude Code, Codex, or raw provider APIs expose enough cache telemetry to validate real reuse.
- `capture_cache_telemetry`: normalizes provider cache-hit telemetry from Claude, OpenAI, and Gemini responses.
- `diagnose_cache_miss`: explains likely causes of unexpected misses, partial reuse, section drift, and cache-creation churn.
The model-optimization layer now also exposes:
- provider-specific cache threshold guidance via `optimize_for_model`
- deterministic `prompt_cache_key` guidance for OpenAI/Codex-style routing stickiness
- local extractive compression and history-compaction primitives for lower-latency context trimming
- benchmark method comparisons between semantic and extractive baselines
For usage guidance, see docs/guides/PROMPT_CACHING.md.
For Gemini CLI, Claude, and Codex compatibility guidance, see docs/guides/PROVIDER_CACHE_COMPATIBILITY.md.
Skill Scripts (No MCP Required)
Path: skills/token-saver-context-compression/scripts/
Main scripts:
- `profile_tokens.py`: raw vs compressed token profile.
- `compress_context.py`: baseline/query-guided/evidence-aware compression.
- `validate_evidence.py`: checks if compressed output has enough evidence.
- `run_skill_workflow.py`: profile + compress + evidence in one command.
- `benchmark_toon_vs_json.py`: TOON/JSON token + quality guard checks.
All support local execution with no dependency on external MCP wrappers.
Data Source Flexibility
You can feed Token Saver from any source as long as you provide text input:
- Local files (`--file`).
- Pasted text (`--text`).
- Piped stdin from another command.
- Upstream connectors that export text payloads.
The compressor itself is source-agnostic; GitHub is just one possible integration path, not a requirement.
Output Formats (JSON vs TOON)
Skill scripts support:
- `--output-format json`
- `--output-format toon`
- `--output-format auto`
`auto` behavior:
- Select TOON only when data shape is TOON-friendly (uniform object arrays) and token-efficient.
- Fall back to JSON otherwise.
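The `auto` decision can be approximated as: pick the tabular format only when the payload is a uniform array of flat objects. A hypothetical version of that check (the shipped scripts also weigh token efficiency, which this sketch skips):

```python
def is_toon_friendly(data) -> bool:
    """True when data is a non-empty list of flat dicts sharing one key set."""
    if not isinstance(data, list) or not data:
        return False
    if not all(isinstance(row, dict) for row in data):
        return False
    keys = set(data[0])
    return all(
        set(row) == keys
        and all(not isinstance(v, (dict, list)) for v in row.values())
        for row in data
    )

def choose_format(data) -> str:
    return "toon" if is_toon_friendly(data) else "json"
```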
Repo Structure (Practical Map)
- `src/`: core implementation.
- `src/handlers/`: MCP tool handlers.
- `src/semantic_modulator/`: app/api/service-layer architecture.
- `skills/`: portable no-MCP skill package.
- `scripts/`: benchmark/setup/dev scripts.
- `tests/`: unit/integration/regression tests.
- `docs/`: detailed guides and reference docs.
Present but not wired into the default runtime:
- `src/reliability.py`: timeout, circuit breaker, and retry primitives. Fully tested but not yet integrated into the server hot path. Best first integration target: async batch compression in `compression_handlers.py`.
- `src/multimodal_compressor.py`, `src/training_utils.py`: experimental research modules. Gated behind `"experimental": true` in handler responses. Not part of the core MCP tool surface.
Run The Server (MCP Mode)
token-saver-mcp
For web/API deployments, set HTTP_ENABLED=true to start an HTTP server with health/metrics endpoints alongside the MCP server:
HTTP_ENABLED=true HTTP_PORT=8080 token-saver-mcp
Endpoints: /health/liveness, /health/readiness, /health/diagnostics, /metrics (Prometheus).
See docs/deployment/DOCKER.md and docs/deployment/SAAS_MULTI_TENANT.md for reverse proxy and API gateway patterns.
Claude Desktop config example:
{
"mcpServers": {
"token-saver": {
"command": "token-saver-mcp",
"args": [],
"cwd": "/path/to/token-saver-5000"
}
}
}
The simplest setup path is:
token-saver-setup --auto
To install that entry automatically into Claude Desktop with the low-level installer:
token-saver-install-mcp
To generate a project-scoped `.mcp.json` for Claude Code or another MCP-aware workspace:
token-saver-install-mcp --project-config
To generate a portable project-scoped config for a shared repo using ${workspaceFolder}:
token-saver-install-mcp --portable-project-config
If you want raw JSON instead of writing the project config file:
token-saver-install-mcp --print-config > .mcp.json
To inspect whether the command, Claude Desktop config, and project config are installed correctly:
token-saver-install-mcp --doctor --human
To uninstall cleanly:
token-saver-setup --uninstall-all
Or target just one surface:
token-saver-setup --uninstall --desktop
token-saver-setup --uninstall --portable-project
The MCP server now also exposes first-class prompts and resources:
- Prompts for document compression, prompt-cache review, and MCP setup guidance.
- Resources for tool catalogs, workflow instructions, install modes, and live install status.
- A resource template at `token-saver://tool/{name}/help` for canonical per-tool help payloads.
Test and Quality Commands
Run tests:
pytest tests/ -v
Run benchmark guard:
python scripts/benchmarks/run_benchmarks.py --compare baseline,query_guided,evidence_aware
python scripts/benchmarks/check_benchmark_guard.py --strict-case-set --summary-file artifacts/benchmarks/guard_summary.md
Lint/format:
python -m ruff check src tests scripts skills
python -m black src tests scripts skills
Version and Requirements
- Version: `0.11.0`
- Python: `3.10-3.14` (chromadb requires 3.10-3.12)
- Suggested RAM: `~4GB` for embedding workloads
Version source-of-truth: pyproject.toml (all other files derive from it).
Roadmap
This repo is the open-source local MCP tool. gotcontext.ai will be the SaaS platform built on top of it.
Planned for gotcontext.ai:
- Context as a Service (CaaS) — cloud API for semantic compression with team dashboards, session history, and usage metering
- Knowledge Hub — model-agnostic RAG notebooks with compressed retrieval. Upload docs, chat with compressed context. Like NotebookLM but open, self-hostable, and 85% more token-efficient. Built on open-notebook (21.8K★).
- Agent Context Hub — always-current framework docs for AI coding agents with compressed retrieval. Like Context7/ref.tools but local-first, open source, and 85% fewer tokens per response. Built on docs-mcp-server (1.2K★).
- Global AI Benchmark Repository — crowd-sourced database of model inference performance across hardware, quantization formats, and providers. Like UserBenchmark but for AI.
- AI News Center — curated AI infrastructure intelligence hub with data-driven reports from benchmark data
- Model-aware routing — combine context compression with benchmark data to recommend optimal model + quant for each request
See docs/GO_TO_MARKET_PLAN.md for the full 5-product platform strategy.
Documentation
Start here:
- docs/getting-started/GETTING_STARTED.md
- docs/guides/HOW_IT_WORKS.md
- docs/reference/ARCHITECTURE.md
- docs/guides/MCP_TOOLS_GUIDE.md
- docs/guides/WORKFLOW_ORCHESTRATION.md
- docs/deployment/SAAS_MULTI_TENANT.md
- CHANGELOG.md
License
MIT (LICENSE).
File details
Details for the file gotcontext_server-0.11.0.tar.gz.
File metadata
- Download URL: gotcontext_server-0.11.0.tar.gz
- Upload date:
- Size: 909.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a89150138dcab21145f343a0fb09ae24a86c5471fd589f16deba7ed08a1c374 |
| MD5 | b142b3e94f5bb43cce9c0b3abed4b7d3 |
| BLAKE2b-256 | 914db1b627364544b8759b4972dc7e8eac6f90f260433be83b6f92687ba90133 |
Provenance
The following attestation bundles were made for gotcontext_server-0.11.0.tar.gz:
Publisher: publish.yml on oimiragieo/token-saver-5000
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gotcontext_server-0.11.0.tar.gz
- Subject digest: 9a89150138dcab21145f343a0fb09ae24a86c5471fd589f16deba7ed08a1c374
- Sigstore transparency entry: 1364840466
- Sigstore integration time:
- Permalink: oimiragieo/token-saver-5000@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Branch / Tag: refs/tags/v0.11.0
- Owner: https://github.com/oimiragieo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Trigger Event: push
File details
Details for the file gotcontext_server-0.11.0-py3-none-any.whl.
File metadata
- Download URL: gotcontext_server-0.11.0-py3-none-any.whl
- Upload date:
- Size: 552.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e9f1ca4771bc5d2cac9dfcf0179ae16828e2546fbe0b82d080ca3959d61e0181 |
| MD5 | 810904d8cc532b3dce643f7af4148a24 |
| BLAKE2b-256 | d120768fdec3f2c9526a78a882b21c69281d497b59648db84d49afbc78a178b9 |
Provenance
The following attestation bundles were made for gotcontext_server-0.11.0-py3-none-any.whl:
Publisher: publish.yml on oimiragieo/token-saver-5000
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gotcontext_server-0.11.0-py3-none-any.whl
- Subject digest: e9f1ca4771bc5d2cac9dfcf0179ae16828e2546fbe0b82d080ca3959d61e0181
- Sigstore transparency entry: 1364840483
- Sigstore integration time:
- Permalink: oimiragieo/token-saver-5000@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Branch / Tag: refs/tags/v0.11.0
- Owner: https://github.com/oimiragieo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0c0ffd4de8a1fd2a5ea53b1d91b8ec3779c29cad
- Trigger Event: push