Stable context layer for AI coding tools — Haiku-generated, delivered via MCP to keep the prefix tiny

cram-ai

Your AI coding tool starts fresh every session. cram gives it memory.

cram maintains a curated context layer for your repo — architecture, symbols, decisions, known gotchas, and focused file excerpts. Your AI tool loads exactly what it needs instead of re-discovering your codebase from scratch every time.

Works with Claude Code, Cursor, Windsurf, Zed, Codex, GitHub Copilot, and any tool that reads a file on startup.


Install

# Standard — includes MCP server for Claude Code / Cursor / Windsurf / Zed
pip install 'cram-ai[mcp]'

# With macOS menu bar app
pip install 'cram-ai[mcp,tray]'

# With additional model providers (OpenAI, Gemini, Bedrock, Ollama …)
pip install 'cram-ai[mcp,multi-provider]'

# Homebrew (macOS)
brew tap vishbay/cram-ai && brew install cram-ai

Quick start

cd your-repo

# 1. One-time setup
cram init
#   → scans your repo, generates ARCHITECTURE.md + SYMBOLS.md via a cheap model
#   → scaffolds DECISIONS.md + GOTCHAS.md for you to fill in
#   → installs a git post-commit hook to keep context fresh

# 2. Fill in the manual files — this is where cram's real value lives
vim .cram-ai-context/DECISIONS.md   # architectural invariants, naming conventions
vim .cram-ai-context/GOTCHAS.md     # non-obvious traps that burned your team

# 3. Commit so teammates get the context layer too
git add .cram-ai-context/ CLAUDE.md
git commit -m "chore: init cram-ai context layer"

Then wire up your tool of choice — MCP or prefix injection.


The context layer

cram maintains five files in .cram-ai-context/. Two are auto-generated. Two are manual. One is generated per task.

your-repo/
└── .cram-ai-context/
    ├── ARCHITECTURE.md   ← auto  · repo structure, tech stack, key files
    ├── SYMBOLS.md        ← auto  · every source file mapped to its public identifiers
    ├── DECISIONS.md      ← manual · architectural commitments your team has made
    ├── GOTCHAS.md        ← manual · non-obvious traps, foot-guns, things that burn people
    └── CURRENT_TASK.md   ← per-task · focused excerpts for the current work

Auto-generated (ARCHITECTURE.md, SYMBOLS.md):

  • Generated by cram init, refreshed automatically via the git post-commit hook after each commit
  • SYMBOLS.md uses regex — deterministic, no LLM cost, byte-stable across runs
  • ARCHITECTURE.md uses a cheap model (Haiku / Gemini Flash / GPT-4o Mini)
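
The regex approach behind SYMBOLS.md can be illustrated with a minimal sketch. The pattern, the function names `extract_symbols` and `render_symbols`, and the one-line-per-file layout are assumptions for illustration, not cram's actual implementation. It only handles top-level Python `def`/`class` statements and skips underscore-prefixed names.

```python
import re

# Assumed pattern: unindented def / async def / class statements whose name
# does not start with an underscore (treated here as "public").
_PUBLIC_DEF = re.compile(
    r"^(?:async\s+def|def|class)\s+([A-Za-z][A-Za-z0-9_]*)", re.MULTILINE
)


def extract_symbols(source: str) -> list[str]:
    """Return public top-level identifiers, sorted so the output is stable."""
    return sorted(set(_PUBLIC_DEF.findall(source)))


def render_symbols(files: dict[str, str]) -> str:
    """Map each file path to its identifiers, one line per file, in path order."""
    lines = []
    for path in sorted(files):
        names = extract_symbols(files[path])
        if names:
            lines.append(f"{path}: {', '.join(names)}")
    return "\n".join(lines) + "\n"
```

Because both the identifiers and the paths are sorted and no model is involved, the same input always yields the same bytes, which is the property the bullet above describes.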

Manual (DECISIONS.md, GOTCHAS.md):

  • Scaffolded by cram init — you fill them in over time
  • DECISIONS.md: "we use X", "never do Y", naming conventions, non-obvious invariants
  • GOTCHAS.md: silent side effects, middleware gaps, surprising nulls — things grep can't tell you
  • Append entries with cram decide "..." or cram gotcha "..."
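
Appending to the manual files amounts to adding one dated line per entry. The sketch below shows that idea; the `append_entry` helper and the `- YYYY-MM-DD: text` line format are assumptions for illustration, and the format cram actually writes may differ.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def append_entry(path: Path, text: str, today: date | None = None) -> str:
    """Append one dated bullet to a context file and return the line written."""
    stamp = (today or date.today()).isoformat()
    line = f"- {stamp}: {text.strip()}\n"  # assumed entry format
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:  # append, never overwrite
        fh.write(line)
    return line
```

Opening the file in append mode keeps earlier entries untouched, so the file reads as a chronological log.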

Per-task (CURRENT_TASK.md): When you call get_context("task description"), cram runs a four-stage pipeline: reads the symbol index, asks a cheap model to identify relevant files, extracts identifier-focused excerpts from those files, and writes the result. Typically 800–1,500 tokens covering exactly the files that matter for the task.
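
The four stages can be sketched as plain functions. Everything here is a hypothetical illustration: the `path: name, name` index layout, the function names, and especially `keyword_selector`, which stands in for the cheap-model call with simple keyword overlap so the example runs offline. The `max_files` and `max_lines` defaults mirror the documented AICONTEXT_MAX_FILES and AICONTEXT_MAX_LINES values.

```python
from __future__ import annotations

import re
from typing import Callable

Index = dict[str, list[str]]


def read_symbol_index(symbols_md: str) -> Index:
    """Stage 1: parse 'path: name, name' lines (an assumed SYMBOLS.md layout)."""
    index: Index = {}
    for line in symbols_md.splitlines():
        path, sep, names = line.partition(":")
        if sep and names.strip():
            index[path.strip()] = [n.strip() for n in names.split(",") if n.strip()]
    return index


def keyword_selector(task: str, index: Index, max_files: int) -> list[str]:
    """Stage 2 stand-in: rank files by keyword overlap where cram would ask a model."""
    words = {w for w in re.findall(r"[a-z0-9]+", task.lower()) if len(w) >= 4}
    scores = {
        path: sum(any(w in name.lower() for w in words) for name in names)
        for path, names in index.items()
    }
    ranked = sorted((p for p in scores if scores[p] > 0), key=lambda p: (-scores[p], p))
    return ranked[:max_files]


def extract_excerpt(source: str, names: list[str], context: int = 2, max_lines: int = 300) -> str:
    """Stage 3: keep each line that mentions an identifier plus `context` lines after it."""
    lines = source.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if any(re.search(rf"\b{re.escape(n)}\b", line) for n in names):
            keep.update(range(i, min(len(lines), i + context + 1)))
    return "\n".join(lines[i] for i in sorted(keep)[:max_lines])


def build_current_task(task: str, symbols_md: str, sources: dict[str, str],
                       select: Callable[[str, Index, int], list[str]] = keyword_selector,
                       max_files: int = 5) -> str:
    """Stage 4: assemble the text a caller would write to CURRENT_TASK.md."""
    index = read_symbol_index(symbols_md)
    sections = [f"# Task: {task}"]
    for path in select(task, index, max_files):
        sections.append(f"## {path}\n\n{extract_excerpt(sources[path], index[path])}")
    return "\n\n".join(sections) + "\n"
```

Passing the selector in as a parameter keeps the model call swappable, which is also what makes a pipeline like this testable without an API key.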

What this replaces: the agent spending 3–5 tool calls grepping and reading files to orient itself at the start of every session. With cram, the context arrives in one call and includes knowledge (decisions, gotchas) that the agent can't discover by searching.


MCP delivery

If your tool supports MCP (Claude Code, Cursor, Windsurf, Zed, Codex CLI), wire up the cram MCP server once and the tool can call context tools directly.

One-time server config (same format for all MCP clients):

{
  "mcpServers": {
    "cram-ai": {
      "command": "cram",
      "args": ["mcp", "--repo", "/absolute/path/to/your-repo"]
    }
  }
}

| Client | Config file |
|---|---|
| Claude Code | .claude/settings.json |
| Cursor | .cursor/mcp.json or Cursor Settings → MCP |
| Windsurf | Windsurf MCP settings |
| Zed | Zed assistant settings → context servers |
| Codex CLI | ~/.codex/config.yaml (mcpServers key) |
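
For clients whose config is a JSON file, the server entry from the JSON snippet above can be merged in with a short script instead of by hand. This is a sketch under assumptions: `add_cram_server` is a hypothetical helper, not part of cram, and it only applies to JSON configs with an `mcpServers` object such as `.cursor/mcp.json`.

```python
import json
from pathlib import Path


def add_cram_server(config_path: Path, repo: Path) -> dict:
    """Merge a cram-ai entry into a JSON MCP config, keeping other servers intact."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["cram-ai"] = {
        "command": "cram",
        # --repo takes an absolute path, so resolve it before writing.
        "args": ["mcp", "--repo", str(repo.resolve())],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_cram_server(Path(".cursor/mcp.json"), Path("."))` from the repo root would write the entry with the repo's absolute path while leaving any other configured servers as they were.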

Available MCP tools:

| Tool | What it returns | When to call it |
|---|---|---|
| get_context(task='') | Runs symbol lookup → file selection → excerpt extraction. With no argument, returns the last CURRENT_TASK.md without re-running the LLM. | First thing every session |
| get_architecture() | ARCHITECTURE.md: repo structure, tech stack, key files | Orientation in an unfamiliar area |
| get_symbols(query='') | SYMBOLS.md: source files mapped to public identifiers, optionally filtered | Finding where a function is defined |
| get_decisions() | DECISIONS.md: architectural commitments | Before making a design choice |
| get_gotchas() | GOTCHAS.md: non-obvious traps and foot-guns | Before touching an unfamiliar area |
| add_file(path, identifiers='') | Appends a file's excerpts to CURRENT_TASK.md | When a mid-task discovery needs new context |

Prefix injection

For tools that don't support MCP, run cram task "..." --target <tool> before your session. cram writes focused context into the file the tool auto-loads at startup.

# GitHub Copilot
cram task "add pagination to the users endpoint" --target copilot
# → writes to .github/cram-task.md (one-time: add an include line to copilot-instructions.md)

# Cursor (no-MCP fallback)
cram task "add pagination to the users endpoint" --target cursor
# → writes to .cursor/rules/cram-task.md

# Windsurf (no-MCP fallback)
cram task "add pagination to the users endpoint" --target windsurf
# → writes to .windsurf/rules/cram-task.md

# All targets at once
cram task "add pagination to the users endpoint" --target all

| Target | File written |
|---|---|
| cursor | .cursor/rules/cram-task.md |
| windsurf | .windsurf/rules/cram-task.md |
| copilot | .github/cram-task.md |
| codex | .cram-ai-context/AGENTS.md |
| claude | CLAUDE.md (escape hatch; prefer MCP for Claude Code) |
| all | All of the above |
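
The target table above is effectively a lookup from a `--target` value to a file path, with `all` expanding to every entry. A minimal sketch of that mapping follows; `TARGET_FILES` and `resolve_targets` are hypothetical names, and only the paths themselves come from the table.

```python
# Paths copied from the target table; the names below are illustrative only.
TARGET_FILES = {
    "cursor": ".cursor/rules/cram-task.md",
    "windsurf": ".windsurf/rules/cram-task.md",
    "copilot": ".github/cram-task.md",
    "codex": ".cram-ai-context/AGENTS.md",
    "claude": "CLAUDE.md",
}


def resolve_targets(target: str) -> list[str]:
    """Return the files one --target value would write, expanding 'all'."""
    if target == "all":
        return list(TARGET_FILES.values())
    if target not in TARGET_FILES:
        known = ", ".join([*TARGET_FILES, "all"])
        raise ValueError(f"unknown target {target!r}; expected one of: {known}")
    return [TARGET_FILES[target]]
```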

Daily workflow

# Before a session — MCP path (Claude Code / Cursor / Windsurf / Zed)
# Nothing to run. The agent calls get_context() itself.

# Before a session — injection path (Copilot / no-MCP tools)
cram task "fix the rate limiter" --target copilot

# Log a decision while working
cram decide "use cursor-based pagination, not offset — offset breaks under concurrent writes"

# Log a gotcha you just found
cram gotcha "the users.email column is nullable in prod despite NOT NULL in schema.prisma"

# Extend grace period if you commit mid-task (prevents context reset)
cram continue

# Check context freshness
cram status

After every commit the git post-commit hook runs cram sync automatically to refresh ARCHITECTURE.md and SYMBOLS.md. A session grace period prevents sync from firing while you're mid-task.
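
The grace-period check can be pictured as a timestamp comparison. The sketch below is an assumption about the logic, not cram's code: `in_grace_period` is a hypothetical name, and how cram records the time of the last `cram task` or `cram continue` is not documented here. Only the CRAM_TASK_GRACE_SECONDS variable and its 600-second default come from this README.

```python
from __future__ import annotations

import os
import time
from typing import Mapping

DEFAULT_GRACE_SECONDS = 600  # documented default for CRAM_TASK_GRACE_SECONDS


def in_grace_period(last_task_at: float, now: float | None = None,
                    env: Mapping[str, str] = os.environ) -> bool:
    """True while a commit should leave the current task context alone."""
    grace = int(env.get("CRAM_TASK_GRACE_SECONDS", DEFAULT_GRACE_SECONDS))
    current = time.time() if now is None else now
    return (current - last_task_at) < grace
```

Under this model, `cram continue` would simply reset `last_task_at` to the current time, which restarts the window.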


CLI reference

| Command | What it does |
|---|---|
| cram init [path] [--team] | One-time setup: scans repo, generates context files, installs git hook |
| cram mcp [--repo PATH] | Start MCP server (stdio). Wire into your tool's settings once; clients launch it automatically. |
| cram task "..." [--target T] | Run context pipeline, write CURRENT_TASK.md, optionally inject into tool's auto-loaded file |
| cram sync [path] | Refresh ARCHITECTURE.md + SYMBOLS.md from current repo state |
| cram decide "..." [path] | Append a dated architectural decision to DECISIONS.md |
| cram gotcha "..." [path] | Append a non-obvious trap to GOTCHAS.md |
| cram continue [path] | Extend grace period to keep context across a mid-task commit |
| cram status [path] | Show each context file with age, line count, staleness warning |
| cram benchmark [path] | Show token and cost comparison across delivery strategies |
| cram doctor [path] | Health check: models, hooks, git, context files |
| cram hook install\|uninstall | Manage the git post-commit hook manually |
| cram menu [path] | Launch macOS menu bar app |
| cram autostart on\|off | Start menu bar app at login (macOS) |

Model providers

cram uses a cheap model for its maintenance calls (generating ARCHITECTURE.md, selecting files, extracting excerpts). Set AICONTEXT_MODEL to a model from any supported provider:

# Inside Claude Code — zero config, uses session credentials
cram init

# Anthropic API key
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini

# Ollama (local, free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init

Also supports: AWS Bedrock, GCP Vertex AI, Azure OpenAI, custom LiteLLM proxies (install cram-ai[multi-provider]).


Environment variables

| Variable | Default | Description |
|---|---|---|
| AICONTEXT_MODEL | auto-detected | Model for context tasks: bare alias (haiku) or provider/model |
| ANTHROPIC_API_KEY | | Optional inside Claude Code (uses session credentials) |
| AICONTEXT_MAX_FILES | 5 | Max files included in CURRENT_TASK.md per task |
| AICONTEXT_MAX_LINES | 300 | Max lines per file when extracting excerpts |
| AICONTEXT_TASKS_PER_SESSION | 4 | Assumed tasks per cache window (used by cram benchmark) |
| CRAM_TASK_GRACE_SECONDS | 600 | Seconds after cram task before a commit resets context |
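
A script that wraps cram might want the same settings with the same fallbacks. The sketch below reads the variables and applies the defaults from the table above; the `Settings` class and `load_settings` function are hypothetical and not part of cram's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model: Optional[str]  # None stands for "auto-detected"
    max_files: int
    max_lines: int
    tasks_per_session: int
    grace_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read cram's documented variables, falling back to the documented defaults."""
    return Settings(
        model=env.get("AICONTEXT_MODEL") or None,
        max_files=int(env.get("AICONTEXT_MAX_FILES", "5")),
        max_lines=int(env.get("AICONTEXT_MAX_LINES", "300")),
        tasks_per_session=int(env.get("AICONTEXT_TASKS_PER_SESSION", "4")),
        grace_seconds=int(env.get("CRAM_TASK_GRACE_SECONDS", "600")),
    )
```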

💰 Real-world token consumption

Without context pre-loading, an agent spends the first few exchanges of every session re-discovering the codebase — reading files, running searches, building orientation from scratch.

What a typical session consumes (no cram):

| Phase | What happens | Tokens |
|---|---|---|
| Session start | System prompt + tool definitions + rules files | 3–8K |
| Orientation | find / grep / read calls to discover relevant files cold | 20–60K |
| Active work | Conversation, edits, test runs | 20–50K |
| Output | Code written, explanations | 5–15K |
| Per task total | | 50–130K |

The orientation phase (30–50% of every session) is pure re-discovery overhead — the agent reads 10–20 files cold before it knows where to work. cram replaces that with one get_context() call returning ~1–2K tokens of targeted excerpts.

Scaled to a full day:

| Usage | Tasks/day | Est. tokens/day | Cost at Sonnet 4.6 |
|---|---|---|---|
| Light | 2 short tasks | ~150K | ~$0.50 |
| Average | 4 feature tasks | ~400K | ~$1.20 |
| Heavy | 6+ complex tasks | ~900K | ~$2.70 |

For the average developer (~400K tokens/day), roughly 120–200K tokens/day is orientation overhead — paid on every session, for every task, from scratch.
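
The daily figures can be reproduced with simple arithmetic. The sketch below assumes every token is priced at the Sonnet 4.6 base input rate of $3.00 / MTok, which is what the table's numbers are consistent with; the function names are illustrative and real bills also depend on output and cache pricing.

```python
SONNET_USD_PER_MTOK = 3.00  # base input price listed for Sonnet 4.6


def daily_cost(tokens_per_day: int, usd_per_mtok: float = SONNET_USD_PER_MTOK) -> float:
    """Price a day's tokens at a flat per-million rate, rounded to cents."""
    return round(tokens_per_day / 1_000_000 * usd_per_mtok, 2)


def orientation_overhead(tokens_per_day: int, low_pct: int = 30,
                         high_pct: int = 50) -> tuple[int, int]:
    """Token range spent on orientation, using the 30-50% share quoted above."""
    return tokens_per_day * low_pct // 100, tokens_per_day * high_pct // 100


for label, tokens in [("Light", 150_000), ("Average", 400_000), ("Heavy", 900_000)]:
    print(f"{label}: ~{tokens // 1000}K tokens/day, about ${daily_cost(tokens):.2f}")
```

At this rate the Average and Heavy rows come out at $1.20 and $2.70, the Light row at $0.45 (shown as ~$0.50 in the table), and `orientation_overhead(400_000)` gives the 120–200K range quoted for the average developer.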

The cram menu bar app (cram menu, installed with the tray extra) shows live daily estimates based on your actual repo size (4 sessions × 4 tasks/day). Use the model selector in the tray popup to see estimates for your model:

| Model | Base input price |
|---|---|
| Haiku 4.5 | $1.00 / MTok |
| Sonnet 4.6 | $3.00 / MTok |
| Opus 4 | $5.00 / MTok |

| Metric | What it shows |
|---|---|
| Context reduction | How much smaller cram context is vs full repo scan |
| Without cram/day | Estimated daily cost if the agent reads the full repo each task |
| With cram/day | Estimated daily cost using the frozen context layer (MCP path) |
| Saved/day | The difference; scales with repo size and session frequency |

Run cram benchmark for a full breakdown across all three delivery strategies and all model tiers.


💸 Claude Code users: cache-write bonus

This section is specific to Claude Code + Anthropic. The context layer is useful for any tool, but Claude's prompt caching gives MCP delivery an additional cost advantage.

Anthropic's prompt cache has a 5-minute TTL. Content in the conversation prefix is cache-written at 1.25× the base input price on every new session and after every TTL expiry. Content that stays out of the prefix, such as MCP tool results, does not incur that repeated cache write.

Prefix injection vs MCP:

| | Prefix injection (--target claude) | MCP (get_context()) |
|---|---|---|
| Where context lands | CLAUDE.md → front of prefix | Conversation tail (tool result) |
| Cache writes per session | N × task context tokens | 1 × tool definitions (~1–2K tokens) |
| Per-task context cost | 1.25× write per task change | 0.1× read after first session write |
| 10K-token context, 4 tasks | ~$0.09–0.15 in cache writes | ~$0.01 in cache writes |

The larger your context and the more tasks per session, the more the MCP path saves.
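
The cache-write comparison reduces to two small formulas. The sketch below applies the 1.25× write multiplier to each path; the function names are illustrative, the price argument is whatever base input rate applies to your model, and it models only cache writes, not reads or output.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x the base input price


def prefix_injection_write_cost(context_tokens: int, tasks: int, usd_per_mtok: float) -> float:
    """Prefix path: the task context is re-written to cache once per task change."""
    return tasks * context_tokens * CACHE_WRITE_MULTIPLIER * usd_per_mtok / 1_000_000


def mcp_write_cost(tool_definition_tokens: int, usd_per_mtok: float) -> float:
    """MCP path: only the tool definitions sit in the prefix, written once."""
    return tool_definition_tokens * CACHE_WRITE_MULTIPLIER * usd_per_mtok / 1_000_000


if __name__ == "__main__":
    sonnet = 3.00  # USD per million input tokens
    print(f"prefix injection: ${prefix_injection_write_cost(10_000, 4, sonnet):.4f}")
    print(f"MCP:              ${mcp_write_cost(2_000, sonnet):.4f}")
```

With a 10K-token context, 4 tasks, and the Sonnet 4.6 rate, the prefix path comes to $0.15, matching the upper figure in the table, while 2K tokens of tool definitions cost under one cent.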

Run cram benchmark to model the exact numbers for your repo.

The floor check: the frozen prefix must exceed 2,048 tokens (Sonnet 4.6) or 4,096 tokens (Opus 4.8 / Haiku 4.5) to cache at all. cram benchmark flags this if your context files are below the threshold.


Running tests

pip install pytest
pytest

No API key required — all model calls are mocked.


License

MIT
