MCP server for codebase Q&A powered by RAG — index any git repo and ask questions about the code

Codebase Q&A MCP Server

An MCP server that lets AI assistants answer questions about any codebase using RAG (Retrieval-Augmented Generation). Index any git repository locally, then ask natural-language questions — answers are grounded in actual source code.

Built with FastMCP, ChromaDB, and sentence-transformers.

How It Works

┌─────────────┐     ┌──────────────────────────────────────────────┐
│  AI Client   │     │         Codebase Q&A MCP Server              │
│ (Claude Code,│ MCP │                                              │
│  Claude      │◄───►│  ┌─────────┐  ┌──────────┐  ┌───────────┐  │
│  Desktop,    │     │  │  Load &  │─►│ Embed    │─►│ ChromaDB  │  │
│  Cursor)     │     │  │  Chunk   │  │(MiniLM)  │  │ (persist) │  │
│              │     │  └─────────┘  └──────────┘  └───────────┘  │
└─────────────┘     └──────────────────────────────────────────────┘
  1. Index — Point the server at a git repo. It reads all source files, splits them into chunks, generates embeddings with all-MiniLM-L6-v2, and stores them in a persistent ChromaDB instance.
  2. Query — Ask a natural-language question. The server finds the most relevant code chunks via semantic similarity and returns them with file paths and metadata.
  3. Update — After new commits, run an incremental update that only re-indexes changed files (using git diff).
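
The Update step can be sketched as follows. This is a minimal illustration, not the package's actual code: the function names `parse_name_status` and `changed_since` are hypothetical, and it assumes the server remembers the commit it last indexed and asks git for a name-status diff against HEAD.

```python
import subprocess


def parse_name_status(diff_output: str) -> tuple[list[str], list[str]]:
    """Split `git diff --name-status` output into (to_reindex, to_delete)."""
    to_reindex: list[str] = []
    to_delete: list[str] = []
    for line in diff_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<status>\t<path>" or, for renames/copies, "<status>\t<old>\t<new>".
        status, *paths = line.split("\t")
        if status.startswith("D"):
            to_delete.append(paths[0])
        elif status.startswith("R"):
            # Rename: drop chunks stored under the old path, index the new path.
            to_delete.append(paths[0])
            to_reindex.append(paths[1])
        else:
            # Added, modified, copied, or type-changed: index the resulting path.
            to_reindex.append(paths[-1])
    return to_reindex, to_delete


def changed_since(repo_path: str, last_indexed_commit: str) -> tuple[list[str], list[str]]:
    """Ask git which files differ between the last indexed commit and HEAD."""
    result = subprocess.run(
        ["git", "-C", repo_path, "diff", "--name-status", last_indexed_commit, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_name_status(result.stdout)


# Parsing a hand-written sample, so no repository is needed to try it.
sample = "M\tsrc/auth.py\nA\tsrc/new_module.py\nD\tsrc/old.py\nR100\tsrc/a.py\tsrc/b.py\n"
to_reindex, to_delete = parse_name_status(sample)
print(to_reindex)  # ['src/auth.py', 'src/new_module.py', 'src/b.py']
print(to_delete)   # ['src/old.py', 'src/a.py']
```

Keeping deletions separate from re-indexing matters because chunks for removed or renamed files would otherwise stay in the index and keep appearing in query results.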

Features

  • 7 MCP tools covering the full codebase Q&A workflow
  • Local embeddings — no API keys needed (runs all-MiniLM-L6-v2 on your machine)
  • Persistent storage — ChromaDB persists indexes across sessions
  • Incremental updates — only re-index files that changed since the last indexed commit
  • Multi-project support — index multiple repos and switch between them
  • File filtering — narrow queries to specific paths (e.g. only auth-related files)
  • 30+ file types supported (Python, JS/TS, Go, Rust, Java, C/C++, Markdown, YAML, SQL, and more)
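
To illustrate how file-type support might work during indexing, here is a small sketch that walks a directory and keeps only files with a recognised extension. Both sets below are assumptions made for the example: the real extension list lives in the package's config and is longer, and the skipped directory names are a guess.

```python
import tempfile
from pathlib import Path

# Hypothetical subset of extensions, for illustration only.
SUPPORTED_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp", ".md", ".yaml", ".sql"}
# Assumed: directories that should never be indexed.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def collect_source_files(repo_root: str) -> list[str]:
    """Return repo-relative paths of files whose extension is supported."""
    root = Path(repo_root)
    found = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            found.append(rel.as_posix())
    return sorted(found)


# Build a throwaway directory tree to try the function on.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "app.py").write_text("print('hi')\n")
    (Path(tmp) / "logo.png").write_bytes(b"\x89PNG")
    (Path(tmp) / ".git").mkdir()
    (Path(tmp) / ".git" / "config.yaml").write_text("x: 1\n")
    files = collect_source_files(tmp)

print(files)  # ['src/app.py']
```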

Installation

Option 1: uvx (recommended)

uvx codebase-qa-mcp

Option 2: pip

pip install codebase-qa-mcp

Option 3: From source

git clone https://github.com/gokul-viswanathan/codebase-qa-mcp.git
cd codebase-qa-mcp
pip install -e .

Option 4: agentregistry

# Install arctl
curl -fsSL https://raw.githubusercontent.com/agentregistry-dev/agentregistry/main/scripts/get-arctl | bash

# Deploy the server
arctl deploy codebase-qa-mcp

# Auto-configure your IDE
arctl configure claude-desktop

Quick Start

1. Add to Claude Code

claude mcp add codebase-qa-mcp -- codebase-qa-mcp

Or if running from source:

claude mcp add codebase-qa-mcp -- /path/to/codebase-qa-mcp/.venv/bin/python -m codebase_qa_mcp.server

2. Add to Claude Desktop

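Add this to your Claude Desktop config file. The path given here is ~/.config/claude-desktop/config.json; on macOS the file is typically ~/Library/Application Support/Claude/claude_desktop_config.json, and on Windows %APPDATA%\Claude\claude_desktop_config.json: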

{
  "mcpServers": {
    "codebase-qa-mcp": {
      "command": "codebase-qa-mcp",
      "args": []
    }
  }
}

Or from source:

{
  "mcpServers": {
    "codebase-qa-mcp": {
      "command": "/path/to/codebase-qa-mcp/.venv/bin/python",
      "args": ["-m", "codebase_qa_mcp.server"]
    }
  }
}

3. Use it

Once connected, your AI assistant calls the tools on your behalf. For example:

You: "Index the repo at /home/user/my-project"
AI:  → calls index_repository("/home/user/my-project")
     ✓ Indexed 42 files → 380 chunks

You: "How does authentication work in this codebase?"
AI:  → calls query_codebase("how does authentication work")
     Returns relevant code chunks from auth-related files

You: "What files handle database migrations?"
AI:  → calls query_codebase("database migrations", file_filter="migrations")
     Returns chunks specifically from migration files
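
The file_filter argument in the last exchange can be pictured as a path filter applied to candidate chunks. The sketch below is an assumption about its behaviour (a case-insensitive substring match on the file path), and the `Chunk` class, `apply_file_filter` function, and scores are all made up for the example; the server may instead filter inside the vector store.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    file_path: str
    text: str
    score: float  # higher means more similar to the question


def apply_file_filter(chunks, file_filter=None, top_k=5):
    """Keep chunks whose path contains file_filter, then return the best top_k."""
    if file_filter:
        needle = file_filter.lower()
        chunks = [c for c in chunks if needle in c.file_path.lower()]
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]


results = [
    Chunk("app/auth/login.py", "def login(user): ...", 0.61),
    Chunk("db/migrations/0002_add_index.sql", "CREATE INDEX ...", 0.74),
    Chunk("db/migrations/0001_init.sql", "CREATE TABLE users ...", 0.82),
]
filtered = apply_file_filter(results, file_filter="migrations")
print([c.file_path for c in filtered])
# ['db/migrations/0001_init.sql', 'db/migrations/0002_add_index.sql']
```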

MCP Tools Reference

Tool                Description
index_repository    Full-index a git repo into ChromaDB. Reads all source files, chunks, embeds, and stores them.
update_index        Incrementally update using git diff. Only re-indexes changed files.
query_codebase      Semantic search for code chunks relevant to a natural-language question.
switch_project      Switch to a previously indexed repo without re-indexing.
list_indexed_files  List all file paths in the current project's index.
get_index_stats     Get stats: total chunks, total files, last indexed commit.
list_projects       List all previously indexed repositories.

Configuration

The server uses sensible defaults. The storage location can be overridden with an environment variable:

Variable                 Default                                   Description
CODEBASE_QA_CHROMA_PATH  ~/.local/share/codebase-qa-mcp/chroma_db  Where ChromaDB stores its data

Built-in defaults:

  • Chunk size: 500 characters with 50-character overlap
  • Embedding model: all-MiniLM-L6-v2 (384-dimensional, runs locally)
  • Top-K results: 5
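
As a rough picture of how these settings fit together, the sketch below resolves the storage path from CODEBASE_QA_CHROMA_PATH and falls back to the documented default. The `Settings` class and `load_settings` function are hypothetical names for illustration; only the environment variable and the default values come from this README.

```python
import os
from dataclasses import dataclass
from pathlib import Path

DEFAULT_CHROMA_PATH = "~/.local/share/codebase-qa-mcp/chroma_db"


@dataclass(frozen=True)
class Settings:
    chroma_path: Path
    chunk_size: int = 500        # characters per chunk
    chunk_overlap: int = 50      # characters shared by neighbouring chunks
    embedding_model: str = "all-MiniLM-L6-v2"
    top_k: int = 5               # chunks returned per query


def load_settings(env=None) -> Settings:
    """Read the storage path from the environment, else use the default."""
    env = os.environ if env is None else env
    raw = env.get("CODEBASE_QA_CHROMA_PATH") or DEFAULT_CHROMA_PATH
    return Settings(chroma_path=Path(raw).expanduser())


# Passing a dict instead of os.environ keeps the example reproducible.
settings = load_settings({"CODEBASE_QA_CHROMA_PATH": "/tmp/my_index"})
print(settings.chroma_path)
print(load_settings({}).chunk_size)
```

Setting the variable before launching the server (for example in the shell or in the client's MCP configuration) is how a custom location would reach a function like this one.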

Project Structure

codebase-qa-mcp/
├── pyproject.toml                  # Package config, deps, entry point
├── Dockerfile                      # Container build for agentregistry
└── src/codebase_qa_mcp/
    ├── __init__.py
    ├── config.py                   # Constants (chunk size, model, extensions)
    ├── indexer.py                  # Load → chunk → embed → store + incremental updates
    ├── retriever.py                # Query, list files, stats, list projects
    └── server.py                   # FastMCP server with 7 tools

How RAG Works Here

Retrieval-Augmented Generation (RAG) grounds AI responses in actual source code rather than relying on the model's training data:

  1. Chunking — Source files are split into ~500-character chunks using RecursiveCharacterTextSplitter from LangChain, which prefers to break at natural boundaries such as blank lines and newlines before falling back to smaller separators.
  2. Embedding — Each chunk is converted to a 384-dimensional vector using all-MiniLM-L6-v2, a fast sentence-transformer model that runs entirely on your machine.
  3. Storage — Vectors are stored in ChromaDB with metadata (file path, chunk index, language). The database persists to disk so indexes survive restarts.
  4. Retrieval — When you ask a question, it's embedded with the same model and compared against stored chunks via cosine similarity. The top-K most relevant chunks are returned.
  5. Generation — The AI client (Claude, etc.) receives the relevant code chunks and uses them to generate an accurate, grounded answer.
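
The chunk, embed, and retrieve steps above can be tried end to end with a dependency-free toy. Everything here is a stand-in chosen for the example: fixed-size windows replace RecursiveCharacterTextSplitter, a bag-of-words count vector replaces all-MiniLM-L6-v2, and a plain list replaces ChromaDB. Only the chunk size, overlap, and the use of cosine similarity follow the description above.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character windows; each starts (size - overlap) after the previous one."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-transformer: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, index, k: int = 5):
    """Embed the question and return the k most similar (score, path, chunk) entries."""
    q = embed(question)
    scored = [(cosine(q, vec), path, chunk) for path, chunk, vec in index]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


files = {
    "auth.py": "def login(user, password):\n    return check_password(user, password)\n",
    "db.py": "def run_migration(name):\n    apply_schema_change(name)\n",
}
# "Index" step: chunk each file and store (path, chunk, vector) entries.
index = [(path, chunk, embed(chunk))
         for path, text in files.items()
         for chunk in chunk_text(text)]

best = top_k("where is the login password checked", index, k=1)
print(best[0][1])  # auth.py
```

A real embedding model matches on meaning rather than shared words, so it can find relevant chunks even when the question and the code use different vocabulary; the toy only shows the data flow.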

License

MIT

