MCP server for codebase Q&A powered by RAG — index any git repo and ask questions about the code
Codebase Q&A MCP Server
An MCP server that lets AI assistants answer questions about any codebase using RAG (Retrieval-Augmented Generation). Index any git repository locally, then ask natural-language questions — answers are grounded in actual source code.
Built with FastMCP, ChromaDB, and sentence-transformers.
How It Works
┌─────────────┐     ┌──────────────────────────────────────────────┐
│  AI Client  │     │           Codebase Q&A MCP Server            │
│(Claude Code,│ MCP │                                              │
│   Claude    │◄───►│   ┌─────────┐  ┌──────────┐  ┌───────────┐   │
│  Desktop,   │     │   │ Load &  │─►│  Embed   │─►│ ChromaDB  │   │
│   Cursor)   │     │   │  Chunk  │  │ (MiniLM) │  │ (persist) │   │
│             │     │   └─────────┘  └──────────┘  └───────────┘   │
└─────────────┘     └──────────────────────────────────────────────┘
- Index: Point the server at a git repo. It reads all source files, splits them into chunks, generates embeddings with all-MiniLM-L6-v2, and stores them in a persistent ChromaDB instance.
- Query: Ask a natural-language question. The server finds the most relevant code chunks via semantic similarity and returns them with file paths and metadata.
- Update: After new commits, run an incremental update that re-indexes only the changed files (using git diff).
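A minimal sketch of the Update step, assuming the server compares the last indexed commit with HEAD using git diff --name-status and maps each status letter to an action. plan_incremental_update is a hypothetical helper written for illustration; the actual logic in indexer.py may differ.

```python
def plan_incremental_update(name_status: str) -> tuple[set[str], set[str]]:
    """Split `git diff --name-status OLD NEW` output into (paths to re-index, paths to drop).

    Hypothetical helper for illustration; assumes paths contain no tabs or newlines
    (i.e. the diff was produced without -z).
    """
    to_reindex: set[str] = set()
    to_delete: set[str] = set()
    for line in name_status.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        kind = parts[0][0]  # R and C statuses carry a similarity score, e.g. "R087"
        if kind in ("A", "M", "T"):
            # Added, modified, or type-changed: (re-)embed this path. A real update
            # would first delete the path's old chunks so they don't linger.
            to_reindex.add(parts[1])
        elif kind == "D":
            to_delete.add(parts[1])
        elif kind == "R":
            # Renamed: chunks stored under the old path are stale; embed the new path.
            to_delete.add(parts[1])
            to_reindex.add(parts[2])
        elif kind == "C":
            # Copied: the source path is untouched, only the copy is new.
            to_reindex.add(parts[2])
    return to_reindex, to_delete


reindex, delete = plan_incremental_update("M\tsrc/app.py\nD\told.py\nR087\tutils.py\tlib/utils.py\n")
print(sorted(reindex))  # ['lib/utils.py', 'src/app.py']
print(sorted(delete))   # ['old.py', 'utils.py']
```

Working from the diff rather than re-reading the whole tree is what keeps the update cheap: only the listed paths need to be re-embedded or removed.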
Features
- 7 MCP tools covering the full codebase Q&A workflow
- Local embeddings: no API keys needed (runs all-MiniLM-L6-v2 on your machine)
- Persistent storage: ChromaDB persists indexes across sessions
- Incremental updates: only re-index files that changed since the last indexed commit
- Multi-project support: index multiple repos and switch between them
- File filtering: narrow queries to specific paths (e.g. only auth-related files)
- 30+ file types supported (Python, JS/TS, Go, Rust, Java, C/C++, Markdown, YAML, SQL, and more)
Installation
Option 1: uvx (recommended)
uvx codebase-qa-mcp
Option 2: pip
pip install codebase-qa-mcp
Option 3: From source
git clone https://github.com/gokul-viswanathan/codebase-qa-mcp.git
cd codebase-qa-mcp
pip install -e .
Option 4: agentregistry
# Install arctl
curl -fsSL https://raw.githubusercontent.com/agentregistry-dev/agentregistry/main/scripts/get-arctl | bash
# Deploy the server
arctl deploy codebase-qa-mcp
# Auto-configure your IDE
arctl configure claude-desktop
Quick Start
1. Add to Claude Code
claude mcp add codebase-qa-mcp -- codebase-qa-mcp
Or if running from source:
claude mcp add codebase-qa-mcp -- /path/to/codebase-qa-mcp/.venv/bin/python -m codebase_qa_mcp.server
2. Add to Claude Desktop
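Add this to your Claude Desktop config file, claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows: %APPDATA%\Claude\claude_desktop_config.json):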
{
"mcpServers": {
"codebase-qa-mcp": {
"command": "codebase-qa-mcp",
"args": []
}
}
}
Or from source:
{
"mcpServers": {
"codebase-qa-mcp": {
"command": "/path/to/codebase-qa-mcp/.venv/bin/python",
"args": ["-m", "codebase_qa_mcp.server"]
}
}
}
3. Use it
Once connected, your AI assistant can use these tools:
You: "Index the repo at /home/user/my-project"
AI: → calls index_repository("/home/user/my-project")
✓ Indexed 42 files → 380 chunks
You: "How does authentication work in this codebase?"
AI: → calls query_codebase("how does authentication work")
Returns relevant code chunks from auth-related files
You: "What files handle database migrations?"
AI: → calls query_codebase("database migrations", file_filter="migrations")
Returns chunks specifically from migration files
MCP Tools Reference
| Tool | Description |
|---|---|
| index_repository | Fully index a git repo into ChromaDB: reads all source files, chunks them, embeds the chunks, and stores them. |
| update_index | Incrementally update the index using git diff. Only re-indexes changed files. |
| query_codebase | Semantic search for code chunks relevant to a natural-language question. |
| switch_project | Switch to a previously indexed repo without re-indexing. |
| list_indexed_files | List all file paths in the current project's index. |
| get_index_stats | Get stats: total chunks, total files, last indexed commit. |
| list_projects | List all previously indexed repositories. |
Configuration
The server uses sensible defaults, but you can customize it with the following environment variable:
| Variable | Default | Description |
|---|---|---|
| CODEBASE_QA_CHROMA_PATH | ~/.local/share/codebase-qa-mcp/chroma_db | Where ChromaDB stores its data |
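A sketch of how such an override is typically resolved, assuming an unset or empty variable falls back to the default and a leading ~ is expanded. resolve_chroma_path is a hypothetical name for illustration, not the server's own function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default copied from the configuration table.
DEFAULT_CHROMA_PATH = "~/.local/share/codebase-qa-mcp/chroma_db"


def resolve_chroma_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the ChromaDB directory, letting CODEBASE_QA_CHROMA_PATH override the default.

    Hypothetical helper: an unset or empty variable falls back to the default,
    and a leading ~ is expanded to the user's home directory.
    """
    env = os.environ if env is None else env
    raw = env.get("CODEBASE_QA_CHROMA_PATH") or DEFAULT_CHROMA_PATH
    return Path(raw).expanduser()
```

To store indexes elsewhere, set CODEBASE_QA_CHROMA_PATH in the environment your MCP client launches the server with.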
Built-in defaults:
- Chunk size: 500 characters with 50-character overlap
- Embedding model: all-MiniLM-L6-v2 (384-dimensional, runs locally)
- Top-K results: 5
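To make the chunk-size and overlap arithmetic concrete, here is a simplified fixed-window chunker using the defaults above. It is a stand-in, not the server's code: the real indexer uses LangChain's RecursiveCharacterTextSplitter, which also looks for natural break points, whereas chunk_text (a hypothetical name) simply starts a new 500-character window every 450 characters.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` characters.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores line and
    paragraph boundaries and only illustrates the size/overlap arithmetic.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # with the defaults, each window starts 450 chars after the last
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print([len(c) for c in chunk_text("x" * 1200)])  # [500, 500, 300]
```

A 1,200-character file therefore becomes three chunks covering characters 0 to 500, 450 to 950, and 900 to 1200. Because neighbouring chunks share 50 characters, any span of up to 50 characters that straddles a cut still appears intact in one of the chunks.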
Project Structure
codebase-qa-mcp/
├── pyproject.toml # Package config, deps, entry point
├── Dockerfile # Container build for agentregistry
└── src/codebase_qa_mcp/
    ├── __init__.py
    ├── config.py # Constants (chunk size, model, extensions)
    ├── indexer.py # Load → chunk → embed → store + incremental updates
    ├── retriever.py # Query, list files, stats, list projects
    └── server.py # FastMCP server with 7 tools
How RAG Works Here
Retrieval-Augmented Generation (RAG) grounds AI responses in actual source code rather than relying on the model's training data:
- Chunking: Source files are split into ~500-character chunks using RecursiveCharacterTextSplitter from LangChain, which prefers natural break points such as blank lines and line breaks over cutting mid-line.
- Embedding: Each chunk is converted to a 384-dimensional vector using all-MiniLM-L6-v2, a fast sentence-transformer model that runs entirely on your machine.
- Storage: Vectors are stored in ChromaDB with metadata (file path, chunk index, language). The database persists to disk, so indexes survive restarts.
- Retrieval: When you ask a question, the question is embedded with the same model and compared against the stored chunks via cosine similarity. The top-K most relevant chunks (5 by default) are returned.
- Generation: The AI client (Claude, etc.) receives the relevant code chunks and uses them to generate an answer grounded in the retrieved code.
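The Retrieval step boils down to ranking stored vectors by cosine similarity to the question vector and keeping the top K. Below is a pure-Python sketch with toy 2-D embeddings standing in for the 384-dimensional MiniLM vectors. The Chunk type, the top_k helper, and the substring-matching behaviour given to file_filter are assumptions for illustration, not the server's retriever.py. ChromaDB performs this search with an approximate nearest-neighbour index rather than the linear scan shown here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    file_path: str
    chunk_index: int
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_embedding: list[float], chunks: list[Chunk], k: int = 5,
          file_filter: Optional[str] = None) -> list[Chunk]:
    """Return the k chunks most similar to the query, optionally restricted by path."""
    # Hypothetical file_filter semantics: keep chunks whose path contains the string.
    candidates = [c for c in chunks if file_filter is None or file_filter in c.file_path]
    # Linear scan for clarity; a vector database would use an ANN index instead.
    candidates.sort(key=lambda c: cosine_similarity(query_embedding, c.embedding), reverse=True)
    return candidates[:k]


# Toy 2-D embeddings standing in for 384-D MiniLM vectors.
index = [
    Chunk("src/auth/login.py", 0, "def login(...): ...", [1.0, 0.0]),
    Chunk("db/migrations/001_init.sql", 0, "CREATE TABLE users ...", [0.0, 1.0]),
    Chunk("src/auth/token.py", 1, "def issue_token(...): ...", [0.7, 0.7]),
]
query = [1.0, 0.1]
print([c.file_path for c in top_k(query, index, k=2)])
# ['src/auth/login.py', 'src/auth/token.py']
print([c.file_path for c in top_k(query, index, file_filter="migrations")])
# ['db/migrations/001_init.sql']
```

The returned chunks, together with their file paths, are what the AI client receives for the Generation step.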
License
MIT
Download files
File details
Details for the file codebase_qa_mcp-0.1.0.tar.gz.
File metadata
- Download URL: codebase_qa_mcp-0.1.0.tar.gz
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1befac6525672d2934754dd1311b9fd0e8bc6b74c8b56d2ed4829f1b33cbff69 |
| MD5 | 5387664058874d717c36875e5f3c3d00 |
| BLAKE2b-256 | 10bf2012ca7d367aee0ccb3e615b6382672e680931994da1de631ef8e5c973c9 |
File details
Details for the file codebase_qa_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: codebase_qa_mcp-0.1.0-py3-none-any.whl
- Size: 11.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d787ef6a6b5ec83142337b8c2dbcbd957da5f3a89a5b254398d4b330e1fd155 |
| MD5 | 44753e2162889b29d9523c6cbeae54e4 |
| BLAKE2b-256 | 53b9df98998c94b18cb21ee169a8fd6d9252b0a6c88e7148c958d2f9cdedf006 |