Stable context layer for AI coding tools — Haiku-generated, delivered via MCP to keep the prefix tiny
cram-ai
Your AI coding tool starts fresh every session. cram gives it memory.
cram maintains a curated context layer for your repo — architecture, symbols, decisions, known gotchas, and focused file excerpts. Your AI tool loads exactly what it needs instead of re-discovering your codebase from scratch every time.
Works with Claude Code, Cursor, Windsurf, Zed, Codex, GitHub Copilot, and any tool that reads a file on startup.
Install
# Standard — includes MCP server for Claude Code / Cursor / Windsurf / Zed
pip install 'cram-ai[mcp]'
# With macOS menu bar app
pip install 'cram-ai[mcp,tray]'
# With additional model providers (OpenAI, Gemini, Bedrock, Ollama …)
pip install 'cram-ai[mcp,multi-provider]'
# Homebrew (macOS)
brew tap vishbay/cram-ai && brew install cram-ai
Quick start
cd your-repo
# 1. One-time setup
cram init
# → scans your repo, generates ARCHITECTURE.md + SYMBOLS.md via a cheap model
# → scaffolds DECISIONS.md + GOTCHAS.md for you to fill in
# → installs a git post-commit hook to keep context fresh
# 2. Fill in the manual files — this is where cram's real value lives
vim .cram-ai-context/DECISIONS.md # architectural invariants, naming conventions
vim .cram-ai-context/GOTCHAS.md # non-obvious traps that burned your team
# 3. Commit so teammates get the context layer too
git add .cram-ai-context/ CLAUDE.md
git commit -m "chore: init cram-ai context layer"
Then wire up your tool of choice — MCP or prefix injection.
The context layer
cram maintains five files in .cram-ai-context/. Two are auto-generated. Two are manual. One is
generated per task.
your-repo/
└── .cram-ai-context/
    ├── ARCHITECTURE.md   ← auto · repo structure, tech stack, key files
    ├── SYMBOLS.md        ← auto · every source file mapped to its public identifiers
    ├── DECISIONS.md      ← manual · architectural commitments your team has made
    ├── GOTCHAS.md        ← manual · non-obvious traps, foot-guns, things that burn people
    └── CURRENT_TASK.md   ← per-task · focused excerpts for the current work
Auto-generated (ARCHITECTURE.md, SYMBOLS.md):
- Generated by cram init, refreshed automatically via the git post-commit hook after each commit
- SYMBOLS.md uses regex: deterministic, no LLM cost, byte-stable across runs
- ARCHITECTURE.md uses a cheap model (Haiku / Gemini Flash / GPT-4o Mini)
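To illustrate why a regex-based index can be deterministic and byte-stable, here is a minimal sketch. The pattern, the function names `extract_symbols` and `render_symbols`, and the output layout are assumptions made for illustration; they are not cram's actual patterns or file format.

```python
import re
from typing import Dict, List

# Hypothetical pattern: top-level Python defs and classes whose names do not
# start with an underscore. cram's real patterns are not shown in this README.
PUBLIC_DEF = re.compile(
    r"^(?:async\s+def|def|class)\s+([A-Za-z][A-Za-z0-9_]*)", re.MULTILINE
)


def extract_symbols(source: str) -> List[str]:
    """Return the sorted, de-duplicated public identifiers found in source."""
    return sorted(set(PUBLIC_DEF.findall(source)))


def render_symbols(files: Dict[str, str]) -> str:
    """Render a path -> identifiers index; sorting keeps the output byte-stable."""
    lines = []
    for path in sorted(files):
        names = extract_symbols(files[path])
        lines.append(f"{path}: {', '.join(names)}")
    return "\n".join(lines) + "\n"


example = {
    "app/users.py": (
        "class UserService:\n"
        "    def get(self): ...\n"
        "\n"
        "def list_users():\n"
        "    pass\n"
        "\n"
        "def _helper():\n"
        "    pass\n"
    ),
}
print(render_symbols(example), end="")
```

Because no model is involved and both paths and names are sorted, running the sketch twice on the same input yields identical bytes.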
Manual (DECISIONS.md, GOTCHAS.md):
- Scaffolded by cram init; you fill them in over time
- DECISIONS.md: "we use X", "never do Y", naming conventions, non-obvious invariants
- GOTCHAS.md: silent side effects, middleware gaps, surprising nulls (things grep can't tell you)
- Append entries with cram decide "..." or cram gotcha "..."
Per-task (CURRENT_TASK.md):
When you call get_context("task description"), cram runs a four-stage pipeline: reads the
symbol index, asks a cheap model to identify relevant files, extracts identifier-focused excerpts
from those files, and writes the result. Typically 800–1,500 tokens covering exactly the files
that matter for the task.
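The four stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not cram's implementation: `keyword_selector` is a hypothetical stand-in for the cheap-model call, the symbol index is passed in as a plain dict, and the output layout is invented.

```python
import re
from typing import Callable, Dict, List

SymbolIndex = Dict[str, List[str]]


def keyword_selector(task: str, index: SymbolIndex) -> List[str]:
    """Stand-in for the model call: rank files by words shared with the task."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    scored = []
    for path, names in index.items():
        tokens = set(re.findall(r"[a-z]+", " ".join([path, *names]).lower()))
        overlap = len(words & tokens)
        if overlap:
            scored.append((-overlap, path))
    return [path for _, path in sorted(scored)]


def excerpt(source: str, identifiers: List[str], max_lines: int) -> List[str]:
    """Keep only lines that mention one of the identifiers, capped at max_lines."""
    if not identifiers:
        return []
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, identifiers)) + r")\b")
    return [ln for ln in source.splitlines() if pattern.search(ln)][:max_lines]


def build_task_context(
    task: str,
    index: SymbolIndex,
    sources: Dict[str, str],
    select_files: Callable[[str, SymbolIndex], List[str]] = keyword_selector,
    max_files: int = 5,
    max_lines: int = 300,
) -> str:
    # Stage 1: read the symbol index (here it is passed in as a dict).
    # Stage 2: choose relevant files; select_files stands in for the model.
    chosen = select_files(task, index)[:max_files]
    parts = [f"# Task: {task}"]
    for path in chosen:
        # Stage 3: extract identifier-focused excerpts from each chosen file.
        lines = excerpt(sources.get(path, ""), index.get(path, []), max_lines)
        parts.append(f"## {path}\n" + "\n".join(lines))
    # Stage 4: return the text that would be written to CURRENT_TASK.md.
    return "\n\n".join(parts) + "\n"


index = {"api/users.py": ["list_users"], "api/billing.py": ["charge"]}
sources = {
    "api/users.py": "import db\n\ndef list_users(limit):\n    return db.query(limit)\n",
    "api/billing.py": "def charge(amount):\n    return amount\n",
}
context = build_task_context("add pagination to the users endpoint", index, sources)
print(context)
```

The defaults of 5 files and 300 lines mirror the AICONTEXT_MAX_FILES and AICONTEXT_MAX_LINES defaults listed in this README.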
What this replaces: the agent spending 3–5 tool calls grep-ing and reading files to orient itself at the start of every session. With cram the context arrives in one call and includes knowledge — decisions, gotchas — that the agent can't discover by searching.
MCP delivery
If your tool supports MCP (Claude Code, Cursor, Windsurf, Zed, Codex CLI), wire up the cram MCP server once and the tool can call context tools directly.
One-time server config (same format for all MCP clients):
{
"mcpServers": {
"cram-ai": {
"command": "cram",
"args": ["mcp", "--repo", "/absolute/path/to/your-repo"]
}
}
}
| Client | Config file |
|---|---|
| Claude Code | .claude/settings.json |
| Cursor | .cursor/mcp.json or Cursor Settings → MCP |
| Windsurf | Windsurf MCP settings |
| Zed | Zed assistant settings → context servers |
| Codex CLI | ~/.codex/config.yaml → mcpServers |
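For clients whose config file is JSON (for example .cursor/mcp.json), the server entry can be merged in with a short script instead of edited by hand. The sketch below is an assumption-laden illustration: `add_cram_server` is a hypothetical helper, not part of cram, and it assumes the target file either does not exist or already contains valid JSON.

```python
import json
import tempfile
from pathlib import Path


def add_cram_server(config_path: Path, repo: Path) -> dict:
    """Merge a cram-ai entry into an MCP JSON config, keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["cram-ai"] = {
        "command": "cram",
        # The README asks for an absolute repo path, so resolve it here.
        "args": ["mcp", "--repo", str(repo.resolve())],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo in a temporary directory so no real config file is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / ".cursor" / "mcp.json"
    cfg.parent.mkdir(parents=True)
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    cfg.write_text(json.dumps(existing), encoding="utf-8")
    merged = add_cram_server(cfg, Path(tmp))
    on_disk = json.loads(cfg.read_text(encoding="utf-8"))
```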
Available MCP tools:
| Tool | What it returns | When to call it |
|---|---|---|
| get_context(task='') | Runs symbol lookup → file selection → excerpt extraction. No-arg: returns last CURRENT_TASK.md without re-running the LLM. | First thing every session |
| get_architecture() | ARCHITECTURE.md: repo structure, tech stack, key files | Orientation in an unfamiliar area |
| get_symbols(query='') | SYMBOLS.md: source files mapped to public identifiers, optionally filtered | Finding where a function is defined |
| get_decisions() | DECISIONS.md: architectural commitments | Before making a design choice |
| get_gotchas() | GOTCHAS.md: non-obvious traps and foot-guns | Before touching an unfamiliar area |
| add_file(path, identifiers='') | Appends a file's excerpts to CURRENT_TASK.md | When a mid-task discovery needs new context |
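The no-arg behavior of get_context (return the last CURRENT_TASK.md without another model call) can be pictured with the sketch below. The function signature, the `run_pipeline` callable, and `fake_pipeline` are hypothetical and exist only to show the control flow; the real tool is exposed over MCP rather than as a plain function.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def get_context(task: str, context_dir: Path, run_pipeline: Callable[[str], str]) -> str:
    """Return task context, reusing the last result when task is empty."""
    current = context_dir / "CURRENT_TASK.md"
    if not task:
        # No-arg call: serve the stored file, no pipeline or model call.
        return current.read_text(encoding="utf-8") if current.exists() else ""
    text = run_pipeline(task)
    context_dir.mkdir(parents=True, exist_ok=True)
    current.write_text(text, encoding="utf-8")
    return text


calls: List[str] = []


def fake_pipeline(task: str) -> str:
    """Test double that records each invocation instead of calling a model."""
    calls.append(task)
    return f"# Task: {task}\n"


with tempfile.TemporaryDirectory() as tmp:
    ctx_dir = Path(tmp) / ".cram-ai-context"
    first = get_context("fix the rate limiter", ctx_dir, fake_pipeline)
    again = get_context("", ctx_dir, fake_pipeline)
```

After both calls, `calls` holds a single entry, which is the property the table row describes.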
Prefix injection
For tools that don't support MCP, run cram task "..." --target <tool> before your session.
cram writes focused context into the file the tool auto-loads at startup.
# GitHub Copilot
cram task "add pagination to the users endpoint" --target copilot
# → writes to .github/cram-task.md (one-time: add an include line to copilot-instructions.md)
# Cursor (no-MCP fallback)
cram task "add pagination to the users endpoint" --target cursor
# → writes to .cursor/rules/cram-task.md
# Windsurf (no-MCP fallback)
cram task "add pagination to the users endpoint" --target windsurf
# → writes to .windsurf/rules/cram-task.md
# All targets at once
cram task "add pagination to the users endpoint" --target all
| Target | File written |
|---|---|
| cursor | .cursor/rules/cram-task.md |
| windsurf | .windsurf/rules/cram-task.md |
| copilot | .github/cram-task.md |
| codex | .cram-ai-context/AGENTS.md |
| claude | CLAUDE.md (escape hatch; prefer MCP for Claude Code) |
| all | All of the above |
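The target table amounts to a lookup from a --target value to one or more file paths. A minimal sketch of that lookup follows; the paths come from the table above, while `TARGET_FILES` and `resolve_targets` are hypothetical names and say nothing about how cram actually writes or merges into each file.

```python
from typing import Dict, List

# Paths copied from the target table in this README.
TARGET_FILES: Dict[str, str] = {
    "cursor": ".cursor/rules/cram-task.md",
    "windsurf": ".windsurf/rules/cram-task.md",
    "copilot": ".github/cram-task.md",
    "codex": ".cram-ai-context/AGENTS.md",
    "claude": "CLAUDE.md",
}


def resolve_targets(target: str) -> List[str]:
    """Map a --target value to the repo-relative files it would write."""
    if target == "all":
        return list(TARGET_FILES.values())
    if target not in TARGET_FILES:
        raise ValueError(f"unknown target: {target!r}")
    return [TARGET_FILES[target]]


print(resolve_targets("copilot"))
print(resolve_targets("all"))
```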
Daily workflow
# Before a session — MCP path (Claude Code / Cursor / Windsurf / Zed)
# Nothing to run. The agent calls get_context() itself.
# Before a session — injection path (Copilot / no-MCP tools)
cram task "fix the rate limiter" --target copilot
# Log a decision while working
cram decide "use cursor-based pagination, not offset — offset breaks under concurrent writes"
# Log a gotcha you just found
cram gotcha "the users.email column is nullable in prod despite NOT NULL in schema.prisma"
# Extend grace period if you commit mid-task (prevents context reset)
cram continue
# Check context freshness
cram status
After every commit the git post-commit hook runs cram sync automatically to refresh
ARCHITECTURE.md and SYMBOLS.md. A session grace period prevents sync from firing while
you're mid-task.
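One way to picture the grace period is as a timestamp check. The sketch below is an interpretation, not cram's code: the `GraceWindow` class is hypothetical, and it assumes that cram continue simply restarts the window. The 600-second default matches CRAM_TASK_GRACE_SECONDS.

```python
from typing import Optional


class GraceWindow:
    """Tracks whether a commit falls inside the window opened by a task."""

    def __init__(self, grace_seconds: int = 600) -> None:
        self.grace_seconds = grace_seconds
        self.started_at: Optional[float] = None

    def start(self, now: float) -> None:
        """Called when a task begins (the `cram task` moment)."""
        self.started_at = now

    def extend(self, now: float) -> None:
        """Assumed behavior of `cram continue`: restart the window."""
        self.started_at = now

    def commit_resets_context(self, now: float) -> bool:
        """True when a commit at `now` lands outside the grace window."""
        if self.started_at is None:
            return True
        return now - self.started_at >= self.grace_seconds


window = GraceWindow(grace_seconds=600)
window.start(now=0)
inside = window.commit_resets_context(now=300)    # mid-task commit
outside = window.commit_resets_context(now=700)   # window has elapsed
window.extend(now=650)
after_extend = window.commit_resets_context(now=700)
```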
CLI reference
| Command | What it does |
|---|---|
| cram init [path] [--team] | One-time setup: scans repo, generates context files, installs git hook |
| cram mcp [--repo PATH] | Start MCP server (stdio). Wire into your tool's settings once; clients launch it automatically. |
| cram task "..." [--target T] | Run context pipeline, write CURRENT_TASK.md, optionally inject into tool's auto-loaded file |
| cram sync [path] | Refresh ARCHITECTURE.md + SYMBOLS.md from current repo state |
| cram decide "..." [path] | Append a dated architectural decision to DECISIONS.md |
| cram gotcha "..." [path] | Append a non-obvious trap to GOTCHAS.md |
| cram continue [path] | Extend grace period: keep context across a mid-task commit |
| cram status [path] | Show each context file with age, line count, staleness warning |
| cram benchmark [path] | Show token and cost comparison across delivery strategies |
| cram doctor [path] | Health check: models, hooks, git, context files |
| cram hook install\|uninstall | Manage the git post-commit hook manually |
| cram menu [path] | Launch macOS menu bar app |
| cram autostart on\|off | Start menu bar app at login (macOS) |
Model providers
cram uses a cheap model for its maintenance calls (generating ARCHITECTURE.md, selecting files,
extracting excerpts). Set AICONTEXT_MODEL to any provider:
# Inside Claude Code — zero config, uses session credentials
cram init
# Anthropic API key
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5
# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash
# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini
# Ollama (local, free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init
Also supports: AWS Bedrock, GCP Vertex AI, Azure OpenAI, custom LiteLLM proxies (install
cram-ai[multi-provider]).
Environment variables
| Variable | Default | Description |
|---|---|---|
| AICONTEXT_MODEL | auto-detected | Model for context tasks: bare alias (haiku) or provider/model |
| ANTHROPIC_API_KEY | (none) | Optional inside Claude Code (uses session credentials) |
| AICONTEXT_MAX_FILES | 5 | Max files included in CURRENT_TASK.md per task |
| AICONTEXT_MAX_LINES | 300 | Max lines per file when extracting excerpts |
| AICONTEXT_TASKS_PER_SESSION | 4 | Assumed tasks per cache window (used by cram benchmark) |
| CRAM_TASK_GRACE_SECONDS | 600 | Seconds after cram task before a commit resets context |
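As an illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object. The variable names and defaults are taken from the table; the `Settings` dataclass and `load_settings` function are hypothetical and not part of cram's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model: Optional[str]       # None stands for "auto-detected"
    max_files: int
    max_lines: int
    tasks_per_session: int
    grace_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        model=env.get("AICONTEXT_MODEL"),
        max_files=int(env.get("AICONTEXT_MAX_FILES", "5")),
        max_lines=int(env.get("AICONTEXT_MAX_LINES", "300")),
        tasks_per_session=int(env.get("AICONTEXT_TASKS_PER_SESSION", "4")),
        grace_seconds=int(env.get("CRAM_TASK_GRACE_SECONDS", "600")),
    )


defaults = load_settings({})
custom = load_settings({"AICONTEXT_MODEL": "ollama/mistral", "AICONTEXT_MAX_FILES": "8"})
print(defaults)
print(custom)
```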
💰 Real-world token consumption
Without context pre-loading, an agent spends the first few exchanges of every session re-discovering the codebase — reading files, running searches, building orientation from scratch.
What a typical session consumes (no cram):
| Phase | What happens | Tokens |
|---|---|---|
| Session start | System prompt + tool definitions + rules files | 3–8K |
| Orientation | find / grep / read calls to discover relevant files cold | 20–60K |
| Active work | Conversation, edits, test runs | 20–50K |
| Output | Code written, explanations | 5–15K |
| Per task total | | 50–130K |
The orientation phase (30–50% of every session) is pure re-discovery overhead — the agent reads
10–20 files cold before it knows where to work. cram replaces that with one get_context() call
returning ~1–2K tokens of targeted excerpts.
Scaled to a full day:
| Usage | Tasks/day | Est. tokens/day | Cost at Sonnet 4.6 |
|---|---|---|---|
| Light | 2 short tasks | ~150K | ~$0.50 |
| Average | 4 feature tasks | ~400K | ~$1.20 |
| Heavy | 6+ complex tasks | ~900K | ~$2.70 |
For the average developer (~400K tokens/day), roughly 120–200K tokens/day is orientation overhead — paid on every session, for every task, from scratch.
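The daily figures above follow from simple arithmetic, sketched here. The sketch assumes every token is billed at the Sonnet 4.6 base input price of $3.00 / MTok listed in this README, which is the simplification the table appears to use; the function names are illustrative.

```python
from typing import Tuple

SONNET_INPUT_USD_PER_MTOK = 3.00  # base input price quoted in this README


def daily_cost(tokens_per_day: int, usd_per_mtok: float = SONNET_INPUT_USD_PER_MTOK) -> float:
    """Cost of a day's tokens if all were billed at the base input price."""
    return tokens_per_day / 1_000_000 * usd_per_mtok


def orientation_overhead(
    tokens_per_day: int, low: float = 0.30, high: float = 0.50
) -> Tuple[int, int]:
    """Token range spent on orientation, using the 30-50% share stated above."""
    return round(tokens_per_day * low), round(tokens_per_day * high)


for label, tokens in [("Light", 150_000), ("Average", 400_000), ("Heavy", 900_000)]:
    print(f"{label}: ${daily_cost(tokens):.2f}")

print(orientation_overhead(400_000))
```

The light tier works out to $0.45, which the table rounds to ~$0.50.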
The cram tray app (launched with cram menu) shows live daily estimates based on your actual repo size, assuming 4 sessions per day × 4 tasks per session. Use the model selector in the tray popup to see estimates for your model:
| Model | Base input price |
|---|---|
| Haiku 4.5 | $1.00 / MTok |
| Sonnet 4.6 | $3.00 / MTok |
| Opus 4 | $5.00 / MTok |
| Metric | What it shows |
|---|---|
| Context reduction | How much smaller cram context is vs full repo scan |
| Without cram/day | Estimated daily cost if the agent reads the full repo each task |
| With cram/day | Estimated daily cost using the frozen context layer (MCP path) |
| Saved/day | The difference — scales with repo size and session frequency |
Run cram benchmark for a full breakdown across all three delivery strategies and all model tiers.
💸 Claude Code users: cache-write bonus
This section is specific to Claude Code + Anthropic. The context layer is useful for any tool, but Claude's prompt caching gives MCP delivery an additional cost advantage.
Anthropic's prompt cache has a 5-minute TTL. Content in the conversation prefix gets cache-written at 1.25× the base input price on every new session and every TTL expiry. Content that doesn't touch the prefix — like MCP tool results — isn't.
Prefix injection vs MCP:
| | Prefix injection (--target claude) | MCP (get_context()) |
|---|---|---|
| Where context lands | CLAUDE.md → front of prefix | Conversation tail (tool result) |
| Cache writes per session | N × task context tokens | 1 × tool definitions (~1–2K tokens) |
| Per-task context cost | 1.25× write per task change | 0.1× read after first session write |
| 10K-token context, 4 tasks | ~$0.09–0.15 in cache writes | ~$0.01 in cache writes |
The larger your context and the more tasks per session, the more the MCP path saves.
Run cram benchmark to model the exact numbers for your repo.
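As a rough check on the last table row, the cache-write arithmetic can be reproduced in a few lines. This sketch uses only numbers quoted in this README (the 1.25× write multiplier, the $3.00 / MTok Sonnet 4.6 base price, a 10K-token context over 4 tasks, and up to 2K tokens of tool definitions); `cache_write_cost` is a hypothetical helper, and it ignores cache reads and TTL expiries.

```python
def cache_write_cost(
    tokens: int,
    writes: int,
    base_usd_per_mtok: float,
    write_multiplier: float = 1.25,
) -> float:
    """Dollar cost of writing `tokens` to the prompt cache `writes` times."""
    return tokens * writes * write_multiplier * base_usd_per_mtok / 1_000_000


# Prefix injection: the 10K-token context is rewritten for each of 4 tasks.
prefix_cost = cache_write_cost(tokens=10_000, writes=4, base_usd_per_mtok=3.00)

# MCP: only the tool definitions (taken here as 2K tokens) are written once.
mcp_cost = cache_write_cost(tokens=2_000, writes=1, base_usd_per_mtok=3.00)

print(f"prefix injection: ${prefix_cost:.4f}")
print(f"MCP:              ${mcp_cost:.4f}")
```

At Sonnet pricing this lands at the upper end of the table's prefix-injection range and rounds to about one cent for the MCP path.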
The floor check: the frozen prefix must exceed 2,048 tokens (Sonnet 4.6) or 4,096 tokens
(Opus 4.8 / Haiku 4.5) to cache at all. cram benchmark flags this if your context files are
below the threshold.
Running tests
pip install pytest
pytest
No API key required — all model calls are mocked.
License
MIT