Cut AI coding token costs by 96-98% — curated context, MCP server, and tray app for AI coding tools

cram-ai

Slash AI coding token costs by injecting only what the model needs — nothing more.

AI coding tools auto-index your entire repo at session start. That indexing generates cache writes, the most expensive input token type (billed above both regular input and cache reads). cram-ai replaces auto-indexing with a small set of curated files that give the model exactly what it needs: repo structure, key decisions, and focused excerpts of only the files relevant to your current task.


Benchmarks

cram-ai itself (31 source files, Python CLI tool)

| Setup | Tokens | Sonnet cost/session | Opus cost/session |
| --- | --- | --- | --- |
| Without cram (full repo auto-indexed) | 49,683 | $0.186 | $0.932 |
| Without cram (orientation set only¹) | 19,687 | $0.074 | $0.369 |
| With cram (ARCHITECTURE + SYMBOLS + task context) | 1,898 | $0.007 | $0.036 |

96% token reduction. $0.18 saved per session. $18 over 100 sessions (Sonnet).


pallets/flask (118 source files, Python web framework)

| Setup | Tokens | Sonnet cost/session | Opus cost/session |
| --- | --- | --- | --- |
| Without cram (full repo auto-indexed) | 171,641 | $0.644 | $3.22 |
| Without cram (orientation set only¹) | 60,863 | $0.228 | $1.14 |
| With cram (ARCHITECTURE + SYMBOLS + task context) | 5,929 | $0.022 | $0.111 |

96.5% token reduction. $0.62 saved per session. $62 over 100 sessions (Sonnet).


hoppscotch/hoppscotch (2,151 source files, TypeScript monorepo)

| Setup | Tokens | Sonnet cost/session |
| --- | --- | --- |
| Without cram (full repo auto-indexed) | 418,697 | $1.57 |
| With cram | 7,239 | $0.027 |

98.3% reduction. $154 saved over 100 sessions (Sonnet).


¹ Orientation set = file tree + README + pyproject.toml/package.json + 5 largest source files. A realistic estimate for tools that don't index everything.
Pricing: Claude Sonnet 4.6 cache write $3.75/M, Opus 4.8 $18.75/M. Savings scale with team size and session frequency.
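
The per-session figures in the tables are token count times the cache-write price. A minimal sketch of that arithmetic, using the cram-ai rows and the Sonnet rate quoted above; the function names are illustrative and not part of cram-ai:

```python
def session_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of writing `tokens` to the cache once."""
    return tokens * price_per_million / 1_000_000


def reduction(before: int, after: int) -> float:
    """Fractional token reduction going from `before` to `after`."""
    return 1 - after / before


SONNET_CACHE_WRITE = 3.75  # $ per million tokens, from the pricing note

full_repo = 49_683  # cram-ai repo, full auto-index
with_cram = 1_898   # cram-ai repo, curated context

saved = session_cost(full_repo, SONNET_CACHE_WRITE) - session_cost(with_cram, SONNET_CACHE_WRITE)
print(f"reduction: {reduction(full_repo, with_cram):.1%}")
print(f"saved per session: ${saved:.3f}")
print(f"saved over 100 sessions: ${saved * 100:.2f}")
```

Swapping in the flask or hoppscotch token counts gives the other rows.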


How it works

AI agents spend most tokens on orientation — finding relevant files, understanding structure, reading configs. cram-ai replaces that with a curated map the model reads instead of building itself.

your-repo/
└── .cram-ai-context/
    ├── ARCHITECTURE.md   ← repo structure, tech stack, key files (auto-generated by Haiku)
    ├── DECISIONS.md      ← architectural decisions you want the AI to respect
    ├── SYMBOLS.md        ← public function/class index across all source files (auto-generated)
    └── CURRENT_TASK.md   ← per-session: task + focused excerpts of relevant files

SYMBOLS.md is the key accuracy improvement. Rather than asking a model to guess which files matter based on filenames alone, cram maps every source file to its public identifiers (api/routes.py: handle_rate_limit, check_throttle, apply_backoff). The model uses that map to select files and identify the exact functions to excerpt — so "fix the rate limiter" finds check_throttle even if the words don't match.
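
How cram builds the index is not described here. As a rough illustration of the idea for Python sources only, the standard-library ast module can list a file's top-level public names. The helper names and the sample source below are hypothetical:

```python
import ast


def public_symbols(source: str) -> list[str]:
    """Return top-level function and class names that do not start with '_'."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if not node.name.startswith("_"):
                names.append(node.name)
    return names


def index_line(path: str, source: str) -> str:
    """Format one index entry as 'path: name1, name2'."""
    return f"{path}: {', '.join(public_symbols(source))}"


# Hypothetical file contents, used only to show the output shape.
example = '''
def handle_rate_limit(request): ...
def check_throttle(user): ...
def _internal_helper(): ...
class Backoff: ...
'''
print(index_line("api/routes.py", example))
```

A real index would also need per-language parsers for non-Python repos such as the TypeScript monorepo in the benchmarks.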

cram task "..." runs before every session:

  1. [1/4] Loads SYMBOLS.md — 455 identifiers across 65 files, zero LLM calls
  2. [2/4] Sends architecture + symbol index to Haiku → model returns path | RelevantFunc, OtherClass
  3. [3/4] Extracts identifier-focused excerpts — only the lines that contain those functions, plus context window
  4. [4/4] Writes to your tool's instruction file, warns if below cache minimum for your model
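
Stages 2 and 3 can be sketched as follows, assuming the model returns one `path | Name, Name` line per file and that an excerpt is every line mentioning a selected identifier plus a few lines of context. Both functions are hypothetical stand-ins, not cram's code:

```python
def parse_selection(line: str) -> tuple[str, list[str]]:
    """Split 'path | FuncA, ClassB' into the path and its identifier list."""
    path, _, names = line.partition("|")
    return path.strip(), [n.strip() for n in names.split(",") if n.strip()]


def excerpt(lines: list[str], identifiers: list[str], context: int = 2) -> list[str]:
    """Keep lines mentioning any identifier, plus `context` lines on each side."""
    keep: set[int] = set()
    for i, text in enumerate(lines):
        if any(name in text for name in identifiers):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]


path, names = parse_selection("api/routes.py | check_throttle, apply_backoff")

# Hypothetical file contents for the demo.
source = [
    "import time",
    "",
    "def check_throttle(user):",
    "    return user.hits > LIMIT",
    "",
    "def unrelated():",
    "    pass",
]
for line in excerpt(source, names, context=1):
    print(line)
```

Substring matching is the simplest possible rule; it also picks up call sites, which may or may not be what you want in an excerpt.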

All stages stream live to the popup so you see exactly what's happening.


Quick start

pip install cram-ai

cd your-repo
cram init                              # one-time setup — scans repo, generates docs, indexes symbols
cram task "add login validation"       # run before every session
# → context pre-loaded into your AI tool
cram sync                              # run after every commit (or fires automatically via git hook)

First command to context ready: under 60 seconds.


CLI commands

| Command | When to run | What it does |
| --- | --- | --- |
| cram init | Once per repo | Scans structure, generates ARCHITECTURE.md + SYMBOLS.md via Haiku |
| cram task "..." | Before every session | Identifies relevant files by symbol, inlines focused excerpts |
| cram continue | Mid-session before committing | Extends the grace period to prevent a context reset on mid-task commits |
| cram sync | After every commit | Updates ARCHITECTURE.md + SYMBOLS.md from git diff |
| cram decide "..." | When making arch choices | Appends a dated decision entry to DECISIONS.md |
| cram status | Anytime | Shows .cram-ai-context/ files and freshness |
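
To make "appends a dated decision entry" concrete, here is a small sketch of what such an append could look like. The entry format (`- YYYY-MM-DD: text`) and both function names are assumptions; the real layout of DECISIONS.md is not specified here:

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def format_decision(text: str, today: date | None = None) -> str:
    """Build one dated entry line (hypothetical format)."""
    stamp = (today or date.today()).isoformat()
    return f"- {stamp}: {text}\n"


def append_decision(decisions_file: Path, text: str) -> None:
    """Append the entry to the file, creating parent directories if needed."""
    decisions_file.parent.mkdir(parents=True, exist_ok=True)
    with decisions_file.open("a", encoding="utf-8") as fh:
        fh.write(format_decision(text))


print(format_decision("use Redis for sessions", date(2025, 1, 2)), end="")
```

Opening the file in append mode keeps earlier entries untouched, which matches the "append" wording in the table.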

Provider support

The tool is model-agnostic. Set AICONTEXT_MODEL to choose the provider and model:

# Claude CLI (default — works inside Claude Code with no API key)
cram init

# Anthropic SDK
export ANTHROPIC_API_KEY=sk-...
export AICONTEXT_MODEL=anthropic/claude-haiku-4-5-20251001
cram init

# OpenAI
export OPENAI_API_KEY=sk-...
export AICONTEXT_MODEL=openai/gpt-4o-mini
cram init

# Google Gemini
export GEMINI_API_KEY=...
export AICONTEXT_MODEL=gemini/gemini-2.0-flash
cram init

# Local (Ollama — free, no key needed)
export AICONTEXT_MODEL=ollama/mistral
cram init

Also supports: AWS Bedrock, GCP Vertex AI, Azure OpenAI, custom LiteLLM proxies — auto-discovered from env/credentials.


Session discipline

The context files handle orientation. These rules handle the rest:

  1. Run cram task "..." before every session — never let the model hunt for files itself.
  2. Hard session boundary — end the session the moment a feature works. New code = growing context = rising cost.
  3. Mid-task commit? Run cram continue first to extend the grace period.
  4. Run cram sync after every commit — keeps ARCHITECTURE.md and SYMBOLS.md accurate.
  5. Architectural decision? Run cram decide "use Redis for sessions" — keeps DECISIONS.md current without opening the file.
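
Rule 3 depends on the grace period. A minimal sketch of that check, assuming the window starts when cram task runs and that cram continue restarts it from the current time; the restart behaviour and the function names are assumptions, while the 600-second default comes from CRAM_TASK_GRACE_SECONDS:

```python
GRACE_SECONDS = 600  # documented default for CRAM_TASK_GRACE_SECONDS


def commit_resets_context(window_started_at: float, commit_at: float,
                          grace_seconds: int = GRACE_SECONDS) -> bool:
    """True when the commit lands after the grace window has elapsed."""
    return commit_at - window_started_at > grace_seconds


def extend_grace(now: float) -> float:
    """Model `cram continue` as restarting the window at `now` (an assumption)."""
    return now


started = 0.0                                   # cram task ran at t=0
print(commit_resets_context(started, 300.0))    # commit 5 minutes in
print(commit_resets_context(started, 900.0))    # commit 15 minutes in
started = extend_grace(850.0)                   # cram continue at t=850
print(commit_resets_context(started, 900.0))    # same commit, after continue
```

Timestamps are plain seconds here so the logic can be checked without a clock.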

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| AICONTEXT_MODEL | auto-detected | Model for context tasks: bare alias or provider/model |
| ANTHROPIC_API_KEY | | Anthropic API key (optional inside Claude Code) |
| AICONTEXT_MAX_FILES | 5 | Max files inlined per task |
| AICONTEXT_MAX_LINES | 300 | Max lines per ARCHITECTURE.md |
| AICONTEXT_MAX_EXCERPT_LINES | 80 | Max lines excerpted per file in CURRENT_TASK.md |
| CRAM_TASK_GRACE_SECONDS | 600 | Seconds after cram task before a commit resets context |
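
For scripts that wrap cram, the variables and defaults above can be read like this. The Settings class and both helpers are hypothetical, not cram's internals; only the variable names and default values come from the table:

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model: str | None        # None means "let the tool auto-detect"
    max_files: int
    max_lines: int
    max_excerpt_lines: int
    grace_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        model=env.get("AICONTEXT_MODEL"),
        max_files=int(env.get("AICONTEXT_MAX_FILES", "5")),
        max_lines=int(env.get("AICONTEXT_MAX_LINES", "300")),
        max_excerpt_lines=int(env.get("AICONTEXT_MAX_EXCERPT_LINES", "80")),
        grace_seconds=int(env.get("CRAM_TASK_GRACE_SECONDS", "600")),
    )


def split_model(value: str) -> tuple[str | None, str]:
    """Split 'provider/model'; a bare alias has no provider part."""
    provider, sep, name = value.partition("/")
    return (provider, name) if sep else (None, value)


print(load_settings({"AICONTEXT_MAX_FILES": "8"}))
print(split_model("openai/gpt-4o-mini"))
```

Passing the environment in as a mapping keeps the function easy to test without touching os.environ.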

Works with any AI coding tool

| Tool | How context loads |
| --- | --- |
| Claude Code | Reads .cram-ai-context/ recursively; all files auto-loaded |
| Cursor | Writes to .cursor/rules/cram-task.md; auto-loaded by Cursor |
| Windsurf | Writes to .windsurf/rules/cram-task.md; auto-loaded |
| Codex | Writes to .cram-ai-context/AGENTS.md; auto-loaded |
| GitHub Copilot | Writes to .github/cram-task.md; include once in copilot-instructions.md |

For non-Claude tools, cram automatically prepends a compact architecture summary so the model has repo orientation even without recursive file loading.
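
The per-tool routing can be pictured as a lookup from tool to destination file, with the architecture summary prepended for everything except Claude Code. This is a sketch under assumptions: the tool keys and function names are hypothetical, while the paths are the ones listed above:

```python
from pathlib import Path

# Destination per tool; the dictionary keys are made up for this example.
TASK_FILES = {
    "claude-code": ".cram-ai-context/CURRENT_TASK.md",
    "cursor": ".cursor/rules/cram-task.md",
    "windsurf": ".windsurf/rules/cram-task.md",
    "codex": ".cram-ai-context/AGENTS.md",
    "copilot": ".github/cram-task.md",
}


def task_file(repo: Path, tool: str) -> Path:
    """Resolve where the task context goes for `tool`."""
    return repo / TASK_FILES[tool]


def compose_body(tool: str, task: str, summary: str) -> str:
    """Prepend the architecture summary for tools that do not load the whole directory."""
    if tool == "claude-code" or not summary:
        return task
    return f"{summary}\n\n{task}"


def write_task_context(repo: Path, tool: str, task: str, summary: str = "") -> Path:
    """Write the composed context to the tool's file and return its path."""
    target = task_file(repo, tool)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(compose_body(tool, task, summary), encoding="utf-8")
    return target


print(task_file(Path("your-repo"), "cursor"))
```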


Running tests

pip install pytest
pytest

57 passing tests, no API key required. All model calls are mocked.


License

MIT
