mcp-codebase-index
A structural codebase indexer with an MCP server for AI-assisted development. Zero runtime dependencies — uses Python's ast module for Python analysis and regex for TypeScript/JS. Requires Python 3.11+.
What It Does
Indexes codebases by parsing source files into structural metadata -- functions, classes, imports, dependency graphs, and cross-file call chains -- then exposes 17 query tools via the Model Context Protocol, enabling Claude Code and other MCP clients to navigate codebases efficiently without reading entire files.
Automatic incremental re-indexing: In git repositories, the index stays up to date automatically. Before every query, the server checks git diff and git status (~1-2ms). If files changed, only those files are re-parsed and the dependency graph is rebuilt. No need to manually call reindex after edits, branch switches, or pulls.
Language Support
| Language | Method | Extracts |
|---|---|---|
| Python (.py) | AST parsing | Functions, classes, methods, imports, dependency graph |
| TypeScript/JS (.ts, .tsx, .js, .jsx) | Regex-based | Functions, arrow functions, classes, interfaces, type aliases, imports |
| Markdown/Text (.md, .txt, .rst) | Heading detection | Sections (# headings, underlines, numbered, ALL-CAPS) |
| Other | Generic | Line counts only |
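To give a feel for the regex-based approach used for TypeScript/JS, here is a deliberately simplified sketch. These patterns are hypothetical; the shipped extractor covers more declaration forms (interfaces, type aliases, imports) and edge cases.

```python
import re

# Simplified illustrative patterns, not the package's actual regexes
FUNC_RE = re.compile(r"^\s*(?:export\s+)?(?:async\s+)?function\s+(\w+)", re.M)
ARROW_RE = re.compile(r"^\s*(?:export\s+)?const\s+(\w+)\s*=\s*(?:async\s*)?\(", re.M)
CLASS_RE = re.compile(r"^\s*(?:export\s+)?class\s+(\w+)", re.M)

def extract_ts_symbols(source: str) -> dict[str, list[str]]:
    """Pull function, arrow-function, and class names out of TS/JS source."""
    return {
        "functions": FUNC_RE.findall(source) + ARROW_RE.findall(source),
        "classes": CLASS_RE.findall(source),
    }
```

Regex extraction trades precision for zero dependencies: no TypeScript compiler or tree-sitter grammar is required, at the cost of missing unusual declaration styles.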
Installation
pip install "mcp-codebase-index[mcp]"
The [mcp] extra includes the MCP server dependency. Omit it if you only need the programmatic API.
For development (from a local clone):
pip install -e ".[dev,mcp]"
MCP Server
Running
# As a console script
PROJECT_ROOT=/path/to/project mcp-codebase-index
# As a Python module
PROJECT_ROOT=/path/to/project python -m mcp_codebase_index.server
PROJECT_ROOT specifies which directory to index. Defaults to the current working directory.
Configuring with OpenClaw
Install the package on the machine where OpenClaw is running:
# Local install
pip install "mcp-codebase-index[mcp]"
# Or inside a Docker container / remote VPS
docker exec -it openclaw bash
pip install "mcp-codebase-index[mcp]"
Add the MCP server to your OpenClaw agent config (openclaw.json):
{
"agents": {
"list": [{
"id": "main",
"mcp": {
"servers": [
{
"name": "codebase-index",
"command": "mcp-codebase-index",
"env": {
"PROJECT_ROOT": "/path/to/project"
}
}
]
}
}]
}
}
Restart OpenClaw and verify the connection:
openclaw mcp list
All 17 tools will be available to your agent.
Performance note: OpenClaw's default MCP integration via mcporter spawns a fresh server process per tool call, which means the index is rebuilt each time (~1-2s for small projects, longer for large ones). For persistent connections, use the openclaw-mcp-adapter plugin, which connects once at startup and keeps the server running:
pip install openclaw-mcp-adapter
Configuring with Claude Code
Add to your project's .mcp.json:
{
"mcpServers": {
"codebase-index": {
"command": "mcp-codebase-index",
"env": {
"PROJECT_ROOT": "/path/to/project"
}
}
}
}
Or using the Python module directly (useful if installed in a virtualenv):
{
"mcpServers": {
"codebase-index": {
"command": "/path/to/.venv/bin/python3",
"args": ["-m", "mcp_codebase_index.server"],
"env": {
"PROJECT_ROOT": "/path/to/project"
}
}
}
}
Tip: Encourage the AI to Use Indexed Tools
By default, AI assistants may still read entire files instead of using the indexed tools. Add this to your project's CLAUDE.md (or equivalent instructions file) to nudge the assistant toward them:
Prefer using codebase-index MCP tools (get_project_summary, find_symbol, get_function_source,
get_class_source, get_dependencies, get_dependents, get_change_impact, get_call_chain, etc.)
over reading entire files when navigating the codebase.
This ensures the AI reaches for surgical indexed queries first, which saves tokens and context window.
Available Tools (17)
| Tool | Description |
|---|---|
| get_project_summary | File count, packages, top classes/functions |
| list_files | List indexed files with optional glob filter |
| get_structure_summary | Structure of a file or the whole project |
| get_functions | List functions with name, lines, params |
| get_classes | List classes with name, lines, methods, bases |
| get_imports | List imports with module, names, line |
| get_function_source | Full source of a function/method |
| get_class_source | Full source of a class |
| find_symbol | Find where a symbol is defined (file, line, type) |
| get_dependencies | What a symbol calls/uses |
| get_dependents | What calls/uses a symbol |
| get_change_impact | Direct + transitive dependents |
| get_call_chain | Shortest dependency path (BFS) |
| get_file_dependencies | Files imported by a given file |
| get_file_dependents | Files that import from a given file |
| search_codebase | Regex search across all files (max 100 results) |
| reindex | Force full re-index (rarely needed — incremental updates happen automatically in git repos) |
Benchmarks
Tested across four real-world projects on an M-series MacBook Pro, from a small project to CPython itself (1.1 million lines):
Index Build Performance
| Project | Files | Lines | Functions | Classes | Index Time | Peak Memory |
|---|---|---|---|---|---|---|
| RMLPlus | 36 | 7,762 | 237 | 55 | 0.9s | 2.4 MB |
| FastAPI | 2,556 | 332,160 | 4,139 | 617 | 5.7s | 55 MB |
| Django | 3,714 | 707,493 | 29,995 | 7,371 | 36.2s | 126 MB |
| CPython | 2,464 | 1,115,334 | 59,620 | 9,037 | 55.9s | 197 MB |
Query Response Size vs Total Source
Querying CPython — 41 million characters of source code:
| Query | Response | Total Source | Reduction |
|---|---|---|---|
| find_symbol("TestCase") | 67 chars | 41,077,561 chars | 99.9998% |
| get_dependencies("compile") | 115 chars | 41,077,561 chars | 99.9997% |
| get_change_impact("TestCase") | 16,812 chars | 41,077,561 chars | 99.96% |
| get_function_source("compile") | 4,531 chars | 41,077,561 chars | 99.99% |
| get_function_source("run_unittest") | 439 chars | 41,077,561 chars | 99.999% |
find_symbol returns 54-67 characters regardless of whether the project is 7K lines or 1.1M lines. Response size scales with the answer, not the codebase.
get_change_impact("TestCase") on CPython found 154 direct dependents and 492 transitive dependents in 0.45ms — the kind of query that's impossible without a dependency graph. Use max_direct and max_transitive to cap output to your token budget.
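The direct/transitive split with output caps can be sketched as follows. This is an illustrative sketch, assuming a hypothetical `symbol -> dependents` adjacency map; the parameter names mirror the documented max_direct and max_transitive controls.

```python
def change_impact(dependents: dict[str, set[str]], symbol: str,
                  max_direct: int = 50, max_transitive: int = 200) -> dict:
    """Direct dependents plus transitive closure, capped for token budget."""
    direct = sorted(dependents.get(symbol, set()))
    seen = set(direct) | {symbol}
    frontier = list(direct)
    transitive = []
    while frontier and len(transitive) < max_transitive:
        current = frontier.pop(0)
        for dep in sorted(dependents.get(current, set())):
            if dep not in seen:
                seen.add(dep)
                transitive.append(dep)
                frontier.append(dep)
    return {
        "direct": direct[:max_direct],
        "transitive": transitive[:max_transitive],
    }
```

Because the traversal stops once max_transitive is reached, the response stays bounded even when a hub symbol like TestCase has hundreds of dependents.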
Query Response Time
Targeted queries return in sub-millisecond to low single-digit millisecond time, even on CPython's 1.1M lines:
| Query | RMLPlus | FastAPI | Django | CPython |
|---|---|---|---|---|
| find_symbol | 0.01ms | 0.01ms | 0.03ms | 0.08ms |
| get_dependencies | 0.00ms | 0.00ms | 0.00ms | 0.01ms |
| get_change_impact | 0.02ms | 0.00ms | 2.81ms | 0.45ms |
| get_function_source | 0.01ms | 0.02ms | 0.03ms | 0.10ms |
Run the benchmarks yourself: python benchmarks/benchmark.py
How Is This Different from LSP?
LSP answers "where is this function?" — mcp-codebase-index answers "what happens if I change it?" LSP is point queries: one symbol, one file, one position. It can tell you where LLMClient is defined and who references it. But ask "what breaks transitively if I refactor LLMClient?" and LSP has nothing. This tool returns 11 direct dependents and 31 transitive impacts in a single call — 204 characters. To get the same answer from LSP, the AI would need to chain dozens of find-reference calls recursively, reading files at every step, burning thousands of tokens to reconstruct what the dependency graph already knows.
LSP also requires you to install a separate language server for every language in your project — pyright for Python, vtsls for TypeScript, gopls for Go. Each one is a heavyweight binary with its own dependencies and configuration. mcp-codebase-index is zero dependencies, handles Python + TypeScript/JS + Markdown out of the box, and every response has built-in token budget controls (max_results, max_lines). LSP was built for IDEs. This was built for AI.
Programmatic Usage
from mcp_codebase_index.project_indexer import ProjectIndexer
from mcp_codebase_index.query_api import create_project_query_functions
indexer = ProjectIndexer("/path/to/project", include_patterns=["**/*.py"])
index = indexer.index()
query_funcs = create_project_query_functions(index)
# Use query functions
print(query_funcs["get_project_summary"]())
print(query_funcs["find_symbol"]("MyClass"))
print(query_funcs["get_change_impact"]("some_function"))
Development
pip install -e ".[dev,mcp]"
pytest tests/ -v
ruff check src/ tests/
References
The structural indexer was originally developed as part of the RMLPlus project, an implementation of the Recursive Language Models framework.
License
This project is dual-licensed:
- AGPL-3.0 for open-source use — see LICENSE
- Commercial License for proprietary use — see COMMERCIAL-LICENSE.md
If you're using mcp-codebase-index as a standalone MCP server for development, the AGPL-3.0 license applies at no cost. If you're embedding it in a proprietary product or offering it as part of a hosted service, you'll need a commercial license. See COMMERCIAL-LICENSE.md for details.