Skip to main content

Stable context layer for AI coding tools — Haiku-generated, delivered via MCP to keep the prefix tiny

Project description

cram-ai

Stable context layer for AI coding tools — generated once by a cheap model, delivered cheaply on every session.

Install

# pip (MCP support required for Claude Code)
pip install 'cram-ai[mcp]'

# Homebrew (macOS)
brew tap vishbay/cram-ai
brew install cram-ai

What it does

cram-ai generates three files from your repo (via Haiku or equivalent cheap model):

.cram-ai-context/
├── ARCHITECTURE.md   — repo structure, tech stack, key files
├── DECISIONS.md      — architectural decisions the AI should respect
└── SYMBOLS.md        — every source file mapped to its public functions and classes

At session start you call get_context("your task") via the MCP server. cram picks the relevant files using the symbol index, extracts focused excerpts, and returns them as a tool result. The model gets exactly what it needs — no repo auto-indexing, no hunting.


Why MCP, not CLAUDE.md

Anthropic's prompt cache has a 5-minute TTL. Any content in the conversation prefix gets cache-written on every new session and on every TTL expiry. Cache writes cost 1.25× the base input price. Injecting 10K tokens of context into CLAUDE.md means 10K tokens of cache writes fire every time you open a new session — even if the content hasn't changed.

MCP tool results land in the conversation tail, not the prefix. They don't expand what gets cache-written on session start. The prefix stays tiny (tool definitions only, ~1–2K tokens), written once, read cheaply thereafter.

Run cram benchmark to see the exact cost difference for your repo.


Quick start

pip install 'cram-ai[mcp]'

cd your-repo
cram init                  # one-time: scans repo, generates context files, installs git hook

Add cram-ai to your .claude/settings.json (or see CLAUDE.md for the snippet after init):

{
  "mcpServers": {
    "cram-ai": {
      "command": "cram",
      "args": ["mcp", "--repo", "/absolute/path/to/your-repo"]
    }
  }
}

Then at the start of each Claude Code session:

get_context("your task description")

That's it. The tool runs the full pipeline — symbol lookup, file selection, excerpt extraction — and returns the context as a tool result.


CLI commands

Command When to run What it does
cram init Once per repo Scans structure, generates ARCHITECTURE.md + SYMBOLS.md via Haiku
cram mcp On session start Starts the MCP server (stdio) — wire this into your editor settings
cram sync After every commit Updates ARCHITECTURE.md + SYMBOLS.md from git diff
cram decide "..." When making arch choices Appends a dated decision entry to DECISIONS.md
cram task "..." Optional CLI path Runs the context pipeline and writes CURRENT_TASK.md without MCP
cram benchmark Anytime Shows three-scenario cache-write cost model for your repo

cram task --inject "..." writes task content into CLAUDE.md directly (backward compat for tools without MCP support).


Provider support

The context generation (init, sync) is model-agnostic. Set AICONTEXT_MODEL to any provider:

# Claude CLI (default — works inside Claude Code with no API key)
cram init

# Anthropic SDK
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5-20251001
cram init

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini
cram init

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash
cram init

# Local (Ollama — free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init

Also supports: AWS Bedrock, GCP Vertex AI, Azure OpenAI, custom LiteLLM proxies.


Tool support

Tool Context delivery
Claude Code MCP server — get_context() tool result, prefix stays tiny
Cursor Prefix injection — writes to .cursor/rules/cram-task.md
Windsurf Prefix injection — writes to .windsurf/rules/cram-task.md
Codex Prefix injection — writes to .cram-ai-context/AGENTS.md
GitHub Copilot Prefix injection — writes to .github/cram-task.md

Cursor, Windsurf, Codex, and Copilot have no MCP option — they use prefix injection via cram task. The cache-write cost savings only apply to the MCP (Claude Code) path.


Benchmarks

Run cram benchmark in your repo for exact numbers. Three scenarios are modelled:

Scenario What gets cache-written per session
No cram — model auto-indexes repo N × full repo tokens
cram prefix-injected — content in CLAUDE.md N × frozen context tokens
cram MCP-delivered — content as tool result 1 × frozen context tokens + (N−1) reads

At Sonnet 4.6 pricing for a medium repo (~50K tokens, 4 tasks/session):

Scenario Cache writes $/session $/100 sessions
No cram ~200,000 tok ~$0.94 ~$94
Prefix-injected ~40,000 tok ~$0.19 ~$19
MCP-delivered ~10,000 tok ~$0.05 ~$5

The MCP path reduces cache-write cost ~20× vs no cram and ~4× vs prefix injection. Savings scale with repo size and session frequency. Run cram benchmark against your actual repo to get numbers grounded in your file sizes.

Note: the frozen prefix must exceed the model's cache minimum (2,048 tokens for Sonnet 4.6, 4,096 for Opus and Haiku) to cache at all. If cram benchmark flags this, run cram sync to rebuild the context files with more detail.


Environment variables

Variable Default Description
AICONTEXT_MODEL auto-detected Model for context tasks — bare alias or provider/model
ANTHROPIC_API_KEY Anthropic API key (optional inside Claude Code)
AICONTEXT_MAX_FILES 5 Max files inlined per task
AICONTEXT_MAX_LINES 300 Max lines per ARCHITECTURE.md
AICONTEXT_MAX_EXCERPT_LINES 80 Max lines excerpted per file
CRAM_TASK_GRACE_SECONDS 600 Seconds after cram task before a commit resets context

Upgrading from v0.1.0

v0.2.0 changes the Claude Code delivery path from CLAUDE.md prefix injection to MCP tool results. This is the main behavioral change:

v0.1.0 v0.2.0
Claude Code delivery cram task writes context into CLAUDE.md get_context() MCP tool, CLAUDE.md is a config pointer
cram task Writes to CLAUDE.md Writes CURRENT_TASK.md only, prints MCP reminder
cram task --inject n/a Restores old CLAUDE.md injection behavior
Other tools (Cursor, Windsurf, etc.) Unchanged Unchanged

If you had cram task wired into a pre-session script:

  • For Claude Code: remove it. Use the MCP get_context() tool instead.
  • For other tools: keep it as-is, or add --inject if you want CLAUDE.md injection preserved.

Running tests

pip install pytest
pytest

99 passing tests, no API key required. All model calls are mocked.


License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cram_ai-0.2.0.tar.gz (67.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cram_ai-0.2.0-py3-none-any.whl (68.6 kB view details)

Uploaded Python 3

File details

Details for the file cram_ai-0.2.0.tar.gz.

File metadata

  • Download URL: cram_ai-0.2.0.tar.gz
  • Upload date:
  • Size: 67.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for cram_ai-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f319d499ae596aecd4f1bf8fa94813dcdfc383d01b469814fa6967bb3aa40786
MD5 066097d1ee3d75ac7c5fe7e66b6b7c29
BLAKE2b-256 ebc63916ba917ef0574e48cede7a351739a2d8025eaf9f36b59904530610cddb

See more details on using hashes here.

File details

Details for the file cram_ai-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: cram_ai-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 68.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for cram_ai-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cb08d3c1b4c2853815922fb6c6c2fbb60f5e7d3a644bd1f4b57976e25c5279d6
MD5 5a18eededd31e051e69bbaa40f73a15e
BLAKE2b-256 5a505dc7daf085534478c61edfa973ff9e2e192d2710c52a0fe2c4cf9e5169c8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page