Cut AI coding token costs by 96-98% — curated context, MCP server, and tray app for AI coding tools
cram-ai
Slash AI coding token costs by injecting only what the model needs — nothing more.
AI coding tools auto-index your entire repo at session start. That indexing generates cache writes, the most expensive kind of input token (on Claude models, a 5-minute cache write is billed at 1.25× the base input price, versus 0.1× for a cache read). cram-ai replaces auto-indexing with a small set of curated files that give the model exactly what it needs: repo structure, key decisions, and focused excerpts of only the files relevant to your current task.
Benchmarks
cram-ai itself (31 source files, Python CLI tool)
| Scenario | Tokens | Sonnet cost/session | Opus cost/session |
|---|---|---|---|
| Without cram — full repo auto-indexed | 49,683 | $0.186 | $0.932 |
| Without cram — orientation set only¹ | 19,687 | $0.074 | $0.369 |
| With cram — ARCHITECTURE + SYMBOLS + task context | 1,898 | $0.007 | $0.036 |
96% token reduction. $0.18 saved per session. $18 over 100 sessions (Sonnet).
pallets/flask (118 source files, Python web framework)
| Scenario | Tokens | Sonnet cost/session | Opus cost/session |
|---|---|---|---|
| Without cram — full repo auto-indexed | 171,641 | $0.644 | $3.22 |
| Without cram — orientation set only¹ | 60,863 | $0.228 | $1.14 |
| With cram — ARCHITECTURE + SYMBOLS + task context | 5,929 | $0.022 | $0.111 |
96.5% token reduction. $0.62 saved per session. $62 over 100 sessions (Sonnet).
hoppscotch/hoppscotch (2,151 source files, TypeScript monorepo)
| Scenario | Tokens | Sonnet cost/session |
|---|---|---|
| Without cram — full repo auto-indexed | 418,697 | $1.57 |
| With cram | 7,239 | $0.027 |
98.3% reduction. $154 saved over 100 sessions (Sonnet).
¹ Orientation set = file tree + README + pyproject.toml/package.json + 5 largest source files. A realistic estimate for tools that don't index everything.
Pricing: Claude Sonnet 4.6 cache write $3.75/M, Opus 4.8 $18.75/M. Savings scale with team size and session frequency.
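The per-session figures above are token count × cache-write price ÷ 1,000,000, assuming the whole context is written to the cache once per session. A small sketch reproduces the hoppscotch row; the function names are illustrative and not part of cram:

```python
# Recompute benchmark figures from token counts and the per-million
# cache-write prices quoted above (Sonnet $3.75/M, Opus $18.75/M).
SONNET_WRITE_PER_M = 3.75
OPUS_WRITE_PER_M = 18.75


def session_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of writing `tokens` into the prompt cache once."""
    return tokens * price_per_million / 1_000_000


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed by going from `before` to `after`."""
    return (1 - after / before) * 100


def savings(before: int, after: int, price_per_million: float, sessions: int = 100) -> float:
    """Dollars saved over `sessions` sessions at a given price."""
    per_session = session_cost(before, price_per_million) - session_cost(after, price_per_million)
    return per_session * sessions


# hoppscotch/hoppscotch row: full auto-index vs. cram context
full, cram = 418_697, 7_239
print(f"{session_cost(full, SONNET_WRITE_PER_M):.2f}")    # 1.57
print(f"{session_cost(cram, SONNET_WRITE_PER_M):.3f}")    # 0.027
print(f"{reduction_pct(full, cram):.1f}%")                # 98.3%
print(f"${savings(full, cram, SONNET_WRITE_PER_M):.0f}")  # $154
```

The same three helpers reproduce the cram-ai and flask rows; only the token counts change.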
How it works
AI agents spend most tokens on orientation — finding relevant files, understanding structure, reading configs. cram-ai replaces that with a curated map the model reads instead of building itself.
```text
your-repo/
└── .cram-ai-context/
    ├── ARCHITECTURE.md   ← repo structure, tech stack, key files (auto-generated by Haiku)
    ├── DECISIONS.md      ← architectural decisions you want the AI to respect
    ├── SYMBOLS.md        ← public function/class index across all source files (auto-generated)
    └── CURRENT_TASK.md   ← per-session: task + focused excerpts of relevant files
```
SYMBOLS.md is the key accuracy improvement. Rather than asking a model to guess which files matter based on filenames alone, cram maps every source file to its public identifiers (api/routes.py: handle_rate_limit, check_throttle, apply_backoff). The model uses that map to select files and identify the exact functions to excerpt — so "fix the rate limiter" finds check_throttle even if the words don't match.
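As a rough illustration of what a symbol map contains, the sketch below builds `path: name, name` lines for Python files with the standard-library `ast` module. It assumes "public" means a top-level function or class whose name does not start with an underscore, and it ignores non-Python sources. cram's actual extractor (which also indexes TypeScript repos such as hoppscotch) may work differently, and `public_symbols` / `build_symbols_md` are hypothetical names.

```python
import ast
from pathlib import Path


def public_symbols(source: str) -> list[str]:
    """Return top-level function and class names that don't start with '_'."""
    tree = ast.parse(source)
    names = []
    for node in tree.body:  # top level only: methods and nested defs are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if not node.name.startswith("_"):
                names.append(node.name)
    return names


def build_symbols_md(root: Path) -> str:
    """Render one `relative/path.py: name, name` line per Python file under `root`."""
    lines = []
    for path in sorted(root.rglob("*.py")):
        try:
            names = public_symbols(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError, UnicodeDecodeError):
            continue  # skip files that can't be decoded or parsed
        if names:
            lines.append(f"{path.relative_to(root).as_posix()}: {', '.join(names)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_symbols_md(Path(".")), end="")
```

Run from a repo root, this prints lines such as `api/routes.py: handle_rate_limit, check_throttle, apply_backoff`. Because no model is involved, the index is cheap to rebuild after every commit. Note that `rglob` also walks virtualenvs and build directories, so a real indexer would need an ignore list.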
`cram task "..."` runs before every session:
- [1/4] Loads `SYMBOLS.md`: 455 identifiers across 65 files, zero LLM calls.
- [2/4] Sends the architecture and symbol index to Haiku; the model returns lines of the form `path | RelevantFunc, OtherClass`.
- [3/4] Extracts identifier-focused excerpts: only the lines that contain those functions, plus a context window.
- [4/4] Writes to your tool's instruction file and warns if the result is below the cache minimum for your model.
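Steps 2 and 3 can be sketched as two small functions: one parses a `path | RelevantFunc, OtherClass` reply, the other keeps only the lines that mention a selected identifier plus a few lines of surrounding context. This is an assumption about the mechanics, not cram's code: the function names are hypothetical, matching is a plain whole-word regex, and `max_lines=80` simply mirrors the `AICONTEXT_MAX_EXCERPT_LINES` default.

```python
import re


def parse_selection(reply: str) -> dict[str, list[str]]:
    """Parse lines shaped like `path | FuncA, ClassB` into {path: [identifiers]}."""
    selection: dict[str, list[str]] = {}
    for line in reply.splitlines():
        if "|" not in line:
            continue  # ignore prose or blank lines around the list
        path, _, idents = line.partition("|")
        selection[path.strip()] = [n.strip() for n in idents.split(",") if n.strip()]
    return selection


def excerpt(source: str, identifiers: list[str], context: int = 3, max_lines: int = 80) -> str:
    """Keep lines that mention any identifier, plus `context` lines on each side."""
    lines = source.splitlines()
    patterns = [re.compile(rf"\b{re.escape(name)}\b") for name in identifiers]
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if any(p.search(line) for p in patterns):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    out: list[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("...")  # mark the skipped stretch between two windows
        out.append(lines[i])
        prev = i
    return "\n".join(out[:max_lines])  # cap includes the "..." markers
```

For example, `excerpt(source, parse_selection(reply)["api/routes.py"])` would return every line mentioning the selected identifiers with three lines of context on either side. Excerpting whole function bodies using `ast` line ranges would be a more precise alternative.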
All stages stream live to the tray app's popup, so you can see exactly what is happening.
Quick start
```bash
pip install cram-ai
cd your-repo
cram init                          # one-time setup: scans repo, generates docs, indexes symbols
cram task "add login validation"   # run before every session
# → context pre-loaded into your AI tool
cram sync                          # run after every commit (or fires automatically via git hook)
```
First command to context ready: under 60 seconds.
CLI commands
| Command | When to run | What it does |
|---|---|---|
| `cram init` | Once per repo | Scans structure, generates `ARCHITECTURE.md` + `SYMBOLS.md` via Haiku |
| `cram task "..."` | Before every session | Identifies relevant files by symbol, inlines focused excerpts |
| `cram continue` | Mid-session, before committing | Extends the grace period, preventing a context reset on mid-task commits |
| `cram sync` | After every commit | Updates `ARCHITECTURE.md` + `SYMBOLS.md` from the git diff |
| `cram decide "..."` | When making architectural choices | Appends a dated decision entry to `DECISIONS.md` |
| `cram status` | Anytime | Shows `.cram-ai-context/` files and their freshness |
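`cram decide` records a choice without you opening `DECISIONS.md`. A minimal sketch of that behaviour, assuming a hypothetical `- YYYY-MM-DD: decision` entry format and a hypothetical `append_decision` helper (cram's real layout may differ):

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def append_decision(path: Path, text: str, today: date | None = None) -> str:
    """Append a dated bullet to a DECISIONS.md-style file and return the entry.

    The `- YYYY-MM-DD: text` format is illustrative, not cram's own format.
    """
    stamp = (today or date.today()).isoformat()
    entry = f"- {stamp}: {text.strip()}\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # Create the file with a heading on first use.
        path.write_text("# Decisions\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as f:  # append, never rewrite history
        f.write(entry)
    return entry
```

Appending rather than regenerating keeps earlier decisions intact, which is the point of a decision log.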
Provider support
cram is model-agnostic. Set `AICONTEXT_MODEL` to choose a provider and model:
```bash
# Claude CLI (default; works inside Claude Code with no API key)
cram init

# Anthropic SDK
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5-20251001
cram init

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini
cram init

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash
cram init

# Local (Ollama: free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init
```
Also supports AWS Bedrock, GCP Vertex AI, Azure OpenAI, and custom LiteLLM proxies, auto-discovered from environment variables and credentials.
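`AICONTEXT_MODEL` takes either a bare alias or a `provider/model` string, the same `provider/model` naming convention LiteLLM uses. A sketch of how such a value can be split; it is an assumption, not cram's code, and the helper names are hypothetical. It splits only on the first `/`, so model names that themselves contain a slash stay intact, and an unset variable falls back to auto-detection as the environment-variable table describes.

```python
from __future__ import annotations

import os


def split_model(value: str) -> tuple[str | None, str]:
    """Split 'provider/model' into (provider, model); a bare alias has no provider."""
    provider, sep, model = value.partition("/")  # first slash only
    if not sep:
        return None, value  # bare alias, e.g. 'haiku'
    return provider, model


def configured_model() -> tuple[str | None, str] | None:
    """Read AICONTEXT_MODEL; None means 'fall back to auto-detection'."""
    value = os.environ.get("AICONTEXT_MODEL", "").strip()
    return split_model(value) if value else None
```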
Session discipline
The context files handle orientation. These rules handle the rest:
- Run `cram task "..."` before every session; never let the model hunt for files itself.
- Keep a hard session boundary: end the session the moment a feature works. New code means growing context, and growing context means rising cost.
- Mid-task commit? Run `cram continue` first to extend the grace period.
- Run `cram sync` after every commit to keep `ARCHITECTURE.md` and `SYMBOLS.md` accurate.
- Architectural decision? Run `cram decide "use Redis for sessions"` to keep `DECISIONS.md` current without opening the file.
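`cram sync` is meant to run after every commit, and Quick start notes it can fire automatically from a git hook. If you prefer to wire the hook yourself, a sketch along these lines writes a `post-commit` hook. It is hypothetical: it assumes hooks live in the default `.git/hooks` directory (not a custom `core.hooksPath`), and `cram init` may already install an equivalent.

```python
import stat
from pathlib import Path

# Shell script git runs after each commit (hypothetical manual hook).
HOOK = """#!/bin/sh
# Refresh ARCHITECTURE.md and SYMBOLS.md after each commit.
cram sync
"""


def install_post_commit_hook(repo: Path) -> Path:
    """Write .git/hooks/post-commit that runs `cram sync`, refusing to overwrite."""
    hook = repo / ".git" / "hooks" / "post-commit"
    if hook.exists():
        raise FileExistsError(f"{hook} already exists; add `cram sync` to it by hand")
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK, encoding="utf-8")
    # Git only runs hooks that are executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

Refusing to overwrite avoids silently clobbering a hook another tool already installed.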
Environment variables
| Variable | Default | Description |
|---|---|---|
| `AICONTEXT_MODEL` | auto-detected | Model for context tasks: bare alias or `provider/model` |
| `ANTHROPIC_API_KEY` | unset | Anthropic API key (optional inside Claude Code) |
| `AICONTEXT_MAX_FILES` | `5` | Max files inlined per task |
| `AICONTEXT_MAX_LINES` | `300` | Max lines in `ARCHITECTURE.md` |
| `AICONTEXT_MAX_EXCERPT_LINES` | `80` | Max lines excerpted per file in `CURRENT_TASK.md` |
| `CRAM_TASK_GRACE_SECONDS` | `600` | Seconds after `cram task` before a commit resets context |
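`CRAM_TASK_GRACE_SECONDS` and `cram continue` together decide whether a commit resets context. The sketch below encodes one reading of that rule: a commit inside the window that began at the last `cram task` leaves context alone, a later commit resets it, and `cram continue` is modelled as restarting the window. The function names and the restart behaviour are assumptions, not cram's implementation.

```python
from __future__ import annotations

import os
import time


def grace_seconds() -> float:
    """CRAM_TASK_GRACE_SECONDS, defaulting to 600 as documented above."""
    return float(os.environ.get("CRAM_TASK_GRACE_SECONDS", "600"))


def commit_resets_context(window_start: float,
                          commit_time: float | None = None,
                          grace: float | None = None) -> bool:
    """True if a commit lands after the grace window that began at `window_start`.

    `window_start` is the timestamp of the last `cram task`, or (in this sketch)
    of the last `cram continue`, which restarts the window.
    """
    now = time.time() if commit_time is None else commit_time
    limit = grace_seconds() if grace is None else grace
    return now - window_start > limit
```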
Works with any AI coding tool
| Tool | How context loads |
|---|---|
| Claude Code | Reads .cram-ai-context/ recursively — all files auto-loaded |
| Cursor | Writes to .cursor/rules/cram-task.md — auto-loaded by Cursor |
| Windsurf | Writes to .windsurf/rules/cram-task.md — auto-loaded |
| Codex | Writes to .cram-ai-context/AGENTS.md — auto-loaded |
| GitHub Copilot | Writes to .github/cram-task.md — include once in copilot-instructions.md |
For non-Claude tools, cram automatically prepends a compact architecture summary so the model has repo orientation even without recursive file loading.
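For the non-Claude tools in the table, the write step amounts to prepending a short architecture summary to the task context and saving it to that tool's rules path. A minimal sketch: the `TARGETS` keys, the `write_task_context` name, and the blank-line separator are illustrative assumptions; only the paths come from the table.

```python
from pathlib import Path

# Destination per tool, taken from the table above.
TARGETS = {
    "cursor": ".cursor/rules/cram-task.md",
    "windsurf": ".windsurf/rules/cram-task.md",
    "codex": ".cram-ai-context/AGENTS.md",
    "copilot": ".github/cram-task.md",
}


def write_task_context(repo: Path, tool: str, task_md: str, arch_summary: str) -> Path:
    """Write task context for a non-Claude tool, prefixed with an architecture summary."""
    target = repo / TARGETS[tool]
    target.parent.mkdir(parents=True, exist_ok=True)
    # Summary first so the model is oriented before it reads the task excerpts.
    body = f"{arch_summary.strip()}\n\n{task_md.strip()}\n"
    target.write_text(body, encoding="utf-8")
    return target
```

For Copilot the file still has to be referenced once from `copilot-instructions.md`, as the table notes.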
Running tests
```bash
pip install pytest
pytest
```
57 passing tests, no API key required. All model calls are mocked.
License
MIT
Download files
File details
Details for the file cram_ai-0.1.0.tar.gz.
File metadata
- Download URL: cram_ai-0.1.0.tar.gz
- Size: 62.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f4b761e618c9ca5f173b75cab659ef173f6ba38125364c3e768c080341d0c628` |
| MD5 | `d4d53002cf8bbbd8c3e6fd312e4844cc` |
| BLAKE2b-256 | `d716da6faacd5fed219309295d201d07e833d9126d954bccc58020e83c5d6fdc` |
File details
Details for the file cram_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cram_ai-0.1.0-py3-none-any.whl
- Size: 65.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7f917c9f0cec6fbb31cb61c9872eb0b4925a97141f9459da86e0e0bfd8cbaaa6` |
| MD5 | `b8b0e6ec0766f89c3c75144d35555119` |
| BLAKE2b-256 | `e94aa917f178cb59b7cedfc8b653cf0f735116fb6d8fafd6fb613ea20b01a67b` |