
RAG pipeline for Claude Code — indexes your codebase and exposes it as an MCP server


ccrag

A RAG pipeline for Claude Code. Indexes your codebase locally and exposes it as an MCP server so Claude Code can semantically search your code during sessions.

How it works

ccrag index .        →  AST-chunks your code + embeds with sentence-transformers
                         stores vectors in .ccrag/ (LanceDB, stays in your repo)

ccrag serve .        →  MCP server (stdio) that Claude Code connects to
                         exposes search_codebase("how does auth work?") as a tool

Claude Code          →  automatically calls search_codebase when it needs context
                         gets back file paths, line ranges, and code snippets
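
To make the stdio exchange concrete, here is a minimal sketch of the kind of JSON-RPC 2.0 `tools/call` request an MCP client sends to invoke a tool. The envelope follows the MCP specification; the helper name `build_tool_call` and the request id are illustrative assumptions, and the exact messages Claude Code emits may differ.

```python
import json


def build_tool_call(request_id: int, query: str, n_results: int = 8) -> str:
    """Serialize a tools/call request for search_codebase as one line of JSON.

    build_tool_call is a hypothetical helper, not part of ccrag.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_codebase",
            "arguments": {"query": query, "n_results": n_results},
        },
    }
    # The stdio transport carries one JSON message per line, so the
    # serialized form must not contain embedded newlines.
    return json.dumps(message, separators=(",", ":"))


if __name__ == "__main__":
    print(build_tool_call(1, "how does auth work?"))
```

In normal use you never write these messages yourself; Claude Code produces them once the server is registered.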

Install

pip install ccrag

Usage

1. Index your codebase

cd /your/project
ccrag index .

2. Get the MCP config snippet

ccrag mcp-config .

This prints a JSON block to paste into .claude/settings.json:

{
  "mcpServers": {
    "ccrag": {
      "command": "/path/to/ccrag",
      "args": ["serve", "/your/project"]
    }
  }
}
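
If the settings file already contains other keys, merging is safer than pasting over them. The following is a minimal sketch, assuming the snippet has the shape shown above; `merge_mcp_config` is a hypothetical helper and not part of ccrag.

```python
import json
from pathlib import Path


def merge_mcp_config(settings_path: Path, snippet: dict) -> dict:
    """Merge snippet["mcpServers"] into the settings file, keeping other keys.

    Hypothetical helper for illustration; ccrag does not ship this function.
    """
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    # Add or overwrite only the servers named in the snippet.
    servers = settings.setdefault("mcpServers", {})
    servers.update(snippet.get("mcpServers", {}))
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


if __name__ == "__main__":
    snippet = {
        "mcpServers": {
            "ccrag": {"command": "/path/to/ccrag", "args": ["serve", "/your/project"]}
        }
    }
    print(merge_mcp_config(Path(".claude/settings.json"), snippet))
```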

3. Start a Claude Code session

Claude Code will now automatically call search_codebase whenever it needs to understand the codebase. No changes to your prompts needed.

Commands

Command                       Description
ccrag index [PATH]            Index or re-index the codebase
ccrag index --force [PATH]    Drop and rebuild the index
ccrag serve [PATH]            Start the MCP server (used by Claude Code)
ccrag watch [PATH]            Watch for file changes and re-index incrementally
ccrag status [PATH]           Show index stats (files, chunks)
ccrag mcp-config [PATH]       Print the settings.json snippet

How the index works

  • Chunking: Uses tree-sitter to split code at function/class/method boundaries — not arbitrary line windows. Falls back to line-window chunking for unsupported languages.
  • Embeddings: mixedbread-ai/mxbai-embed-large-v1 via sentence-transformers (1024-dimensional vectors, ranks well on the MTEB retrieval benchmark). It runs entirely locally, with no API keys and no remote code. Override with --model <name>, for example a lighter model such as BAAI/bge-base-en-v1.5, or a code-specialized one such as jinaai/jina-embeddings-v2-base-code; the latter needs trust_remote_code=True and a compatible transformers version.
  • Model cache: Weights are downloaded to .ccrag/models/ on first run and reused offline on every later run.
  • Storage: LanceDB in .ccrag/ inside your project. Add .ccrag/ to .gitignore.
  • Search: Cosine similarity over dense embeddings. Returns top-K chunks with file path, line range, language, and source.
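
The chunking idea above can be illustrated with the standard library alone. This sketch swaps in Python's `ast` module for tree-sitter (so it only handles Python source and only top-level definitions) and falls back to fixed line windows when parsing fails. The names `Chunk`, `chunk_source`, and `line_windows`, and the window size of 40 lines, are assumptions for illustration and do not describe ccrag's internals.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def line_windows(source: str, window: int = 40) -> list[Chunk]:
    """Fallback: split the source into fixed-size line windows."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), window):
        block = lines[start:start + window]
        chunks.append(Chunk("window", start + 1, start + len(block), "\n".join(block)))
    return chunks


def chunk_source(source: str, window: int = 40) -> list[Chunk]:
    """Split Python source at top-level function and class boundaries."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable input is treated like an unsupported language.
        return line_windows(source, window)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            text = "\n".join(lines[node.lineno - 1:node.end_lineno])
            chunks.append(Chunk(node.name, node.lineno, node.end_lineno, text))
    # Files without any definitions still get chunked via the fallback.
    return chunks or line_windows(source, window)
```

Splitting at definition boundaries keeps each chunk a self-contained unit, so a search hit returns a whole function or class rather than a fragment cut mid-body.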

Supported languages

Python, JavaScript, TypeScript, TSX, Go, Rust, Java, C, C++, Ruby, PHP, C#, Swift, Kotlin, Scala, Lua, Elixir, Haskell, OCaml, Bash, YAML, JSON, TOML, Markdown, and more.

MCP tools exposed

Tool                                   Description
search_codebase(query, n_results=8)    Semantic search over indexed code
codebase_stats()                       Number of indexed files and chunks
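
The ranking behind a semantic search tool like search_codebase can be sketched in pure Python. This is a minimal illustration, assuming chunks are already embedded; tiny hand-written vectors stand in for real 1024-dimensional embeddings, the chunk ids are made up, and `cosine` and `top_k` are hypothetical names. ccrag itself stores and queries vectors through LanceDB.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], indexed: dict[str, list[float]], n_results: int = 8):
    """Return the n_results best (score, chunk_id) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_results]


if __name__ == "__main__":
    # Illustrative chunk ids and vectors, not real index contents.
    indexed = {
        "auth.py:10-42": [1.0, 0.0],
        "db.py:1-30": [0.0, 1.0],
        "login.py:5-20": [0.9, 0.1],
    }
    for score, chunk_id in top_k([1.0, 0.0], indexed, n_results=2):
        print(f"{score:.3f}  {chunk_id}")
```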

Incremental updates

ccrag index . is incremental. It keeps a manifest of per-file content hashes in .ccrag/manifest.json and on each run:

  • skips unchanged files entirely — no re-chunking, no re-embedding (the expensive step);
  • re-embeds only changed/new files;
  • prunes chunks for deleted files;
  • rebuilds from scratch if you switch --model (old vectors are incompatible).

So re-running it after editing a few files only touches those files:

$ ccrag index .
Found 20 files (1 new/changed, 19 unchanged, 0 removed)
Done. 3 chunks from 1 file(s) embedded, 0 file(s) removed.
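
The bookkeeping behind such a run can be sketched as a diff between stored and current content hashes. This is a minimal sketch under stated assumptions: the manifest is taken to be a flat JSON object mapping relative paths to SHA-256 digests, and `plan_update` is a hypothetical name. The actual layout of .ccrag/manifest.json and the hash algorithm ccrag uses are not documented here.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash of a file (SHA-256 is an assumption, not ccrag's documented choice)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_update(root: Path, manifest_path: Path, pattern: str = "*.py"):
    """Compare current file hashes with the manifest.

    Returns (new_manifest, changed_or_new, unchanged, removed).
    """
    old = {}
    if manifest_path.exists():
        old = json.loads(manifest_path.read_text(encoding="utf-8"))
    new = {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }
    changed = [f for f, digest in new.items() if old.get(f) != digest]  # re-embed these
    unchanged = [f for f, digest in new.items() if old.get(f) == digest]  # skip these
    removed = [f for f in old if f not in new]  # prune their chunks
    return new, changed, unchanged, removed
```

Writing the new manifest only after embedding succeeds would keep an interrupted run from marking files as indexed when they are not.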

For continuous updates, run the watcher, which re-indexes each file on save (and keeps the manifest in sync):

ccrag watch .

License

MIT

