Stable context layer for AI coding tools — Haiku-generated, delivered via MCP to keep the prefix tiny
Project description
cram-ai
Stable context layer for AI coding tools — generated once by a cheap model, delivered cheaply on every session.
Install
# pip (MCP support required for Claude Code)
pip install 'cram-ai[mcp]'
# Homebrew (macOS)
brew tap vishbay/cram-ai
brew install cram-ai
What it does
cram-ai generates three files from your repo (via Haiku or equivalent cheap model):
.cram-ai-context/
├── ARCHITECTURE.md — repo structure, tech stack, key files
├── DECISIONS.md — architectural decisions the AI should respect
└── SYMBOLS.md — every source file mapped to its public functions and classes
At session start you call get_context("your task") via the MCP server. cram picks the relevant files using the symbol index, extracts focused excerpts, and returns them as a tool result. The model gets exactly what it needs — no repo auto-indexing, no hunting.
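As a rough illustration of the "pick files, then excerpt" idea, here is a minimal sketch. It is not cram's actual implementation: the `SYMBOLS` dictionary, the word-overlap scoring, and the function names are all hypothetical. Only the limits (5 files, 80 excerpt lines) come from the documented defaults.

```python
import re

# Hypothetical in-memory form of SYMBOLS.md: file path -> public symbol names.
SYMBOLS = {
    "auth/login.py": ["login_user", "hash_password"],
    "billing/invoice.py": ["Invoice", "render_invoice"],
    "db/session.py": ["open_session"],
}


def tokenize(text: str) -> set:
    """Lowercase word fragments, splitting on underscores, dots and slashes."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def pick_files(task: str, symbols: dict, max_files: int = 5) -> list:
    """Rank files by how many task words appear in their path or symbol names."""
    task_words = tokenize(task)
    scored = []
    for path, names in symbols.items():
        file_words = tokenize(path + " " + " ".join(names))
        overlap = len(task_words & file_words)
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:max_files]]


def excerpt(source: str, max_lines: int = 80) -> str:
    """Keep only the first max_lines lines of a file's text."""
    return "\n".join(source.splitlines()[:max_lines])


print(pick_files("fix the login password check", SYMBOLS))
```

The real tool may rank and excerpt quite differently; the point is only that a symbol index lets file selection happen without reading the whole repo.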
Why MCP, not CLAUDE.md
Anthropic's prompt cache has a 5-minute TTL by default. Any content in the conversation prefix gets cache-written on every new session and on every TTL expiry. Cache writes cost 1.25× the base input price. Injecting 10K tokens of context into CLAUDE.md means 10K tokens of cache writes fire every time you open a new session, even if the content hasn't changed.
MCP tool results land in the conversation tail, not the prefix. They don't expand what gets cache-written on session start. The prefix stays tiny (tool definitions only, ~1–2K tokens), written once, read cheaply thereafter.
Run cram benchmark to see the exact cost difference for your repo.
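To make the 1.25× write multiplier concrete, the arithmetic can be sketched as below. The function name is hypothetical, and the base price of 1.00 per million tokens is a placeholder chosen for easy reading, not a real rate; substitute your model's input price.

```python
def cache_write_cost(tokens: int, base_price_per_mtok: float,
                     write_multiplier: float = 1.25) -> float:
    """Cost of cache-writing `tokens` tokens, in the currency of base_price_per_mtok."""
    return tokens / 1_000_000 * base_price_per_mtok * write_multiplier


# Illustrative price only: 1.00 per million input tokens (not a real rate).
per_session = cache_write_cost(10_000, base_price_per_mtok=1.00)
print(f"per new session:   {per_session:.4f}")
print(f"over 100 sessions: {100 * per_session:.2f}")
```

This covers cache writes only. cram benchmark remains the source for numbers tied to your repo.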
Quick start
pip install 'cram-ai[mcp]'
cd your-repo
cram init # one-time: scans repo, generates context files, installs git hook
Add cram-ai to your .claude/settings.json (or see CLAUDE.md for the snippet after init):
{
  "mcpServers": {
    "cram-ai": {
      "command": "cram",
      "args": ["mcp", "--repo", "/absolute/path/to/your-repo"]
    }
  }
}
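The args entry needs an absolute repo path. If you would rather generate the snippet than type the path by hand, something like the following works; `mcp_settings` is a hypothetical helper, not part of cram-ai, and it only prints the JSON for you to paste.

```python
import json
from pathlib import Path


def mcp_settings(repo: str) -> dict:
    """Build the mcpServers entry shown above, resolving repo to an absolute path."""
    return {
        "mcpServers": {
            "cram-ai": {
                "command": "cram",
                "args": ["mcp", "--repo", str(Path(repo).resolve())],
            }
        }
    }


# Run from inside your repo; "." resolves to its absolute path.
print(json.dumps(mcp_settings("."), indent=2))
```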
Then at the start of each Claude Code session:
get_context("your task description")
That's it. The tool runs the full pipeline — symbol lookup, file selection, excerpt extraction — and returns the context as a tool result.
CLI commands
| Command | When to run | What it does |
|---|---|---|
| cram init | Once per repo | Scans structure, generates ARCHITECTURE.md + SYMBOLS.md via Haiku |
| cram mcp | On session start | Starts the MCP server (stdio); wire this into your editor settings |
| cram sync | After every commit | Updates ARCHITECTURE.md + SYMBOLS.md from git diff |
| cram decide "..." | When making arch choices | Appends a dated decision entry to DECISIONS.md |
| cram task "..." | Optional CLI path | Runs the context pipeline and writes CURRENT_TASK.md without MCP |
| cram benchmark | Anytime | Shows three-scenario cache-write cost model for your repo |
cram task --inject "..." writes task content into CLAUDE.md directly (backward compat for tools without MCP support).
Provider support
The context generation (init, sync) is model-agnostic. Set AICONTEXT_MODEL to any provider:
# Claude CLI (default — works inside Claude Code with no API key)
cram init
# Anthropic SDK
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5-20251001
cram init
# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini
cram init
# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash
cram init
# Local (Ollama — free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init
Also supports: AWS Bedrock, GCP Vertex AI, Azure OpenAI, custom LiteLLM proxies.
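The AICONTEXT_MODEL values above follow a provider/model shape. A minimal sketch of how such a value could be split is shown here; `split_model` is a hypothetical helper and the bare alias "haiku" is an invented example, so treat both as assumptions rather than cram's own parsing.

```python
from typing import Optional, Tuple


def split_model(value: str) -> Tuple[Optional[str], str]:
    """Split 'provider/model' into (provider, model); a bare alias has no provider."""
    provider, sep, model = value.partition("/")
    if not sep:
        return None, value
    # Only the first slash separates the provider; later slashes stay in the model name.
    return provider, model


for value in ("anthropic/claude-haiku-4-5-20251001", "ollama/mistral", "haiku"):
    print(value, "->", split_model(value))
```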
Tool support
| Tool | Context delivery |
|---|---|
| Claude Code | MCP server — get_context() tool result, prefix stays tiny |
| Cursor | Prefix injection — writes to .cursor/rules/cram-task.md |
| Windsurf | Prefix injection — writes to .windsurf/rules/cram-task.md |
| Codex | Prefix injection — writes to .cram-ai-context/AGENTS.md |
| GitHub Copilot | Prefix injection — writes to .github/cram-task.md |
For Cursor, Windsurf, Codex, and Copilot, cram has no MCP delivery path: they use prefix injection via cram task. The cache-write cost savings only apply to the MCP (Claude Code) path.
Benchmarks
Run cram benchmark in your repo for exact numbers. Three scenarios are modelled:
| Scenario | What gets cache-written per session |
|---|---|
| No cram — model auto-indexes repo | N × full repo tokens |
| cram prefix-injected — content in CLAUDE.md | N × frozen context tokens |
| cram MCP-delivered — content as tool result | 1 × frozen context tokens + (N−1) reads |
At Sonnet 4.6 pricing for a medium repo (~50K tokens, 4 tasks/session):
| Scenario | Cache writes | $/session | $/100 sessions |
|---|---|---|---|
| No cram | ~200,000 tok | ~$0.94 | ~$94 |
| Prefix-injected | ~40,000 tok | ~$0.19 | ~$19 |
| MCP-delivered | ~10,000 tok | ~$0.05 | ~$5 |
The MCP path reduces cache-write cost ~20× vs no cram and ~4× vs prefix injection. Savings scale with repo size and session frequency. Run cram benchmark against your actual repo to get numbers grounded in your file sizes.
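The token counts and ratios in the table follow directly from the three-scenario model. A small sketch of that arithmetic is below; the function name is hypothetical, the 50K-token repo and 4 tasks per session come from the text, and the 10K-token frozen context is inferred from the table (40,000 / 4).

```python
def cache_write_tokens(repo_tokens: int, context_tokens: int, tasks: int) -> dict:
    """Cache-written tokens per session under the three modelled scenarios."""
    return {
        "no_cram": tasks * repo_tokens,             # N x full repo tokens
        "prefix_injected": tasks * context_tokens,  # N x frozen context tokens
        "mcp_delivered": 1 * context_tokens,        # written once, then read
    }


# 10K context tokens is an inference from the table, not a documented value.
writes = cache_write_tokens(repo_tokens=50_000, context_tokens=10_000, tasks=4)
for name, tokens in writes.items():
    ratio = tokens / writes["mcp_delivered"]
    print(f"{name}: {tokens:,} tokens ({ratio:.0f}x the MCP path)")
```

Only token counts are modelled here. The dollar columns depend on the rates cram benchmark applies.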
Note: the frozen prefix must exceed the model's cache minimum (2,048 tokens for Sonnet 4.6, 4,096 for Opus and Haiku) to cache at all. If cram benchmark flags this, run cram sync to rebuild the context files with more detail.
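A quick way to reason about the minimum is a threshold check like the one below. The thresholds are the ones quoted in the note; the dictionary keys and the function name are illustrative labels, and the sketch treats a prefix as cacheable once it reaches the minimum.

```python
# Minimums as quoted in the note above; keys are illustrative labels, not API model IDs.
CACHE_MINIMUM_TOKENS = {"sonnet-4.6": 2048, "opus": 4096, "haiku": 4096}


def prefix_is_cacheable(prefix_tokens: int, model: str) -> bool:
    """True when the prefix is long enough to be cached for the given model label."""
    return prefix_tokens >= CACHE_MINIMUM_TOKENS[model]


for tokens in (1_500, 3_000, 5_000):
    print(tokens, {m: prefix_is_cacheable(tokens, m) for m in CACHE_MINIMUM_TOKENS})
```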
Environment variables
| Variable | Default | Description |
|---|---|---|
| AICONTEXT_MODEL | auto-detected | Model for context tasks: bare alias or provider/model |
| ANTHROPIC_API_KEY | (none) | Anthropic API key (optional inside Claude Code) |
| AICONTEXT_MAX_FILES | 5 | Max files inlined per task |
| AICONTEXT_MAX_LINES | 300 | Max lines per ARCHITECTURE.md |
| AICONTEXT_MAX_EXCERPT_LINES | 80 | Max lines excerpted per file |
| CRAM_TASK_GRACE_SECONDS | 600 | Seconds after cram task before a commit resets context |
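For scripts that want to mirror these settings, the numeric variables can be read with their documented defaults as sketched here. `int_setting` and the `settings` keys are hypothetical names; how cram itself parses these values is not shown in this README.

```python
import os


def int_setting(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to the documented default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


# Variable names and defaults are taken from the table above.
settings = {
    "max_files": int_setting("AICONTEXT_MAX_FILES", 5),
    "max_lines": int_setting("AICONTEXT_MAX_LINES", 300),
    "max_excerpt_lines": int_setting("AICONTEXT_MAX_EXCERPT_LINES", 80),
    "task_grace_seconds": int_setting("CRAM_TASK_GRACE_SECONDS", 600),
}
print(settings)
```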
Upgrading from v0.1.0
v0.2.0 changes the Claude Code delivery path from CLAUDE.md prefix injection to MCP tool results. This is the main behavioral change:
| | v0.1.0 | v0.2.0 |
|---|---|---|
| Claude Code delivery | cram task writes context into CLAUDE.md | get_context() MCP tool, CLAUDE.md is a config pointer |
| cram task | Writes to CLAUDE.md | Writes CURRENT_TASK.md only, prints MCP reminder |
| cram task --inject | n/a | Restores old CLAUDE.md injection behavior |
| Other tools (Cursor, Windsurf, etc.) | Unchanged | Unchanged |
If you had cram task wired into a pre-session script:
- For Claude Code: remove it. Use the MCP get_context() tool instead.
- For other tools: keep it as-is, or add --inject if you want CLAUDE.md injection preserved.
Running tests
pip install pytest
pytest
99 passing tests, no API key required. All model calls are mocked.
License
MIT
File details
Details for the file cram_ai-0.2.0.tar.gz.
File metadata
- Download URL: cram_ai-0.2.0.tar.gz
- Upload date:
- Size: 67.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f319d499ae596aecd4f1bf8fa94813dcdfc383d01b469814fa6967bb3aa40786 |
| MD5 | 066097d1ee3d75ac7c5fe7e66b6b7c29 |
| BLAKE2b-256 | ebc63916ba917ef0574e48cede7a351739a2d8025eaf9f36b59904530610cddb |
File details
Details for the file cram_ai-0.2.0-py3-none-any.whl.
File metadata
- Download URL: cram_ai-0.2.0-py3-none-any.whl
- Upload date:
- Size: 68.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb08d3c1b4c2853815922fb6c6c2fbb60f5e7d3a644bd1f4b57976e25c5279d6 |
| MD5 | 5a18eededd31e051e69bbaa40f73a15e |
| BLAKE2b-256 | 5a505dc7daf085534478c61edfa973ff9e2e192d2710c52a0fe2c4cf9e5169c8 |
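If you download either file manually, its SHA256 digest can be checked against the values listed above. This is a generic sketch using Python's standard hashlib; the helper names are hypothetical and it assumes the file keeps its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "cram_ai-0.2.0.tar.gz": "f319d499ae596aecd4f1bf8fa94813dcdfc383d01b469814fa6967bb3aa40786",
    "cram_ai-0.2.0-py3-none-any.whl": "cb08d3c1b4c2853815922fb6c6c2fbb60f5e7d3a644bd1f4b57976e25c5279d6",
}


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Compare a downloaded file against the digest listed for its file name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]
```

For example, `verify("cram_ai-0.2.0.tar.gz")` returns True only when the local file matches the published digest.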