
MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews.

Project description

MCP Code Indexer 🚀


A production-ready Model Context Protocol (MCP) server that provides intelligent codebase navigation for AI agents through searchable file descriptions, token-aware overviews, and advanced merge capabilities.

🎯 What It Does

The MCP Code Indexer solves a critical problem for AI agents working with large codebases: understanding code structure without repeatedly scanning files. Instead of reading every file, agents can:

  • Query file purposes instantly with natural language descriptions
  • Search across codebases using full-text search
  • Get a size-based recommendation: a full overview for small codebases, search for large ones
  • Merge branch descriptions with conflict resolution
  • Inherit descriptions from upstream repositories automatically

Perfect for AI-powered code review, refactoring tools, documentation generation, and codebase analysis workflows.

⚡ Quick Start

Install from PyPI

# Install the package
pip install mcp-code-indexer

# Run the server
mcp-code-indexer --token-limit 32000

# Check version
mcp-code-indexer --version

Install from Source

# Clone and setup
git clone https://github.com/your-username/mcp-code-indexer.git
cd mcp-code-indexer

# Install in development mode
pip install -e .

# Run the server
mcp-code-indexer --token-limit 32000

🔧 Development Setup

For development work, you must install the package in editable mode to ensure proper import resolution:

# Setup development environment
git clone https://github.com/your-username/mcp-code-indexer.git
cd mcp-code-indexer

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install package in editable mode (REQUIRED for development)
pip install -e .

# Install development dependencies (quotes stop shells such as zsh from globbing the brackets)
pip install -e ".[dev]"

# Verify installation
python main.py --help
mcp-code-indexer --version

Why Editable Install is Required

The project uses a standard Python package layout with absolute imports such as from mcp_code_indexer.database.database import DatabaseManager. Until the package is installed (for development, with pip install -e .), mcp_code_indexer is not on Python's import path, so these imports fail with ModuleNotFoundError.
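
A quick way to confirm that the editable install took effect is to check whether the top-level package resolves. The sketch below uses only the standard library; the helper name is_importable is illustrative and not part of the project.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "mcp_code_indexer" is the import name used by the project's absolute imports.
    if is_importable("mcp_code_indexer"):
        print("mcp_code_indexer resolves; absolute imports should work")
    else:
        print("mcp_code_indexer not found; run `pip install -e .` from the repo root")
```

Run it inside the activated virtual environment, since the result depends on which interpreter executes the check.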

Development Workflow

# Activate virtual environment
source venv/bin/activate

# Run the server directly
python main.py --token-limit 32000

# Or use the installed CLI command
mcp-code-indexer --token-limit 32000

# Run tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Format code
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

🛠️ MCP Tools Available

The server provides 8 powerful MCP tools for intelligent codebase management:

Core Operations

  • get_file_description - Retrieve stored file descriptions instantly
  • update_file_description - Store detailed file summaries and metadata
  • check_codebase_size - Get token count and size-based recommendations
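
The size-based recommendation can be pictured as a comparison between the total token count of the stored descriptions and the configured --token-limit. The following is a minimal sketch under stated assumptions, not the server's implementation: the function names, the totalTokens and recommendation fields, and the rough four-characters-per-token estimate are all hypothetical. Only the isLarge field appears in this README.

```python
from typing import Dict, Mapping


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption: about 4 characters per token, rounded up)."""
    return (len(text) + 3) // 4


def recommend(descriptions: Mapping[str, str], token_limit: int = 32000) -> Dict[str, object]:
    """Compare the estimated size of all descriptions against the token limit."""
    total = sum(
        estimate_tokens(path) + estimate_tokens(desc)
        for path, desc in descriptions.items()
    )
    is_large = total > token_limit
    return {
        "totalTokens": total,  # hypothetical field name
        "isLarge": is_large,
        # hypothetical values: search when the overview would not fit
        "recommendation": "use_search" if is_large else "use_overview",
    }


if __name__ == "__main__":
    sample = {"src/app.py": "Application entry point and CLI argument parsing"}
    print(recommend(sample, token_limit=32000))
```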

Batch Operations

  • find_missing_descriptions - Scan projects for files without descriptions
  • update_missing_descriptions - Bulk update multiple file descriptions

Search & Discovery

  • search_descriptions - Fast full-text search across all descriptions
  • get_codebase_overview - Complete hierarchical project structure

Advanced Features

  • merge_branch_descriptions - Two-phase merge with conflict resolution
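
One way to read "two-phase merge" is: first report the files whose descriptions differ between branches, then apply the merge once every conflict has a chosen resolution. The sketch below illustrates that idea on plain dictionaries. The function names and data shapes are assumptions made for illustration; the actual tool's parameters and behavior are described in the API docs.

```python
from typing import Dict, List

Descriptions = Dict[str, str]  # file path -> description


def detect_conflicts(source: Descriptions, target: Descriptions) -> List[str]:
    """Phase 1: list paths that both branches describe differently."""
    return sorted(p for p in source.keys() & target.keys() if source[p] != target[p])


def apply_merge(
    source: Descriptions, target: Descriptions, resolutions: Descriptions
) -> Descriptions:
    """Phase 2: merge source into target, using a resolution for each conflict."""
    conflicts = detect_conflicts(source, target)
    missing = [p for p in conflicts if p not in resolutions]
    if missing:
        raise ValueError(f"unresolved conflicts: {missing}")
    merged = dict(target)
    merged.update(source)  # non-conflicting entries carry over unchanged
    merged.update({p: resolutions[p] for p in conflicts})
    return merged


if __name__ == "__main__":
    feature = {"a.py": "Parses config (rewritten)", "b.py": "New helper"}
    main = {"a.py": "Parses config", "c.py": "Entry point"}
    print(detect_conflicts(feature, main))
    print(apply_merge(feature, main, {"a.py": "Parses and validates config"}))
```

Splitting detection from application lets the caller (for example, an AI agent) inspect the conflicts and write resolutions before anything is changed.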

🏗️ Architecture Highlights

Performance Optimized

  • SQLite with WAL mode for high-concurrency access
  • Connection pooling for efficient database operations
  • FTS5 full-text search with prefix indexing
  • Token-aware caching to minimize expensive operations
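
To make the first and third bullets concrete, here is a small standalone sketch of WAL mode and an FTS5 table with prefix indexes, using Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from the project's schema. It also assumes your SQLite build includes FTS5, which is common but not guaranteed.

```python
import os
import sqlite3
import tempfile
from typing import List


def build_index(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # prefix='2 3' asks FTS5 to maintain indexes for 2- and 3-character prefixes.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS descriptions "
        "USING fts5(file_path UNINDEXED, description, prefix='2 3')"
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Return matching file paths, best match first."""
    rows = conn.execute(
        "SELECT file_path FROM descriptions WHERE descriptions MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = build_index(os.path.join(tmp, "index.db"))
        conn.executemany(
            "INSERT INTO descriptions (file_path, description) VALUES (?, ?)",
            [
                ("src/auth/login.py", "Authentication logic and session handling"),
                ("src/db/pool.py", "Connection pooling for SQLite access"),
            ],
        )
        conn.commit()
        print(search(conn, "aut*"))  # prefix query served by the 3-character index
        conn.close()
```

WAL mode needs a file-backed database, which is why the example writes to a temporary directory rather than using an in-memory connection.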

Production Ready

  • Comprehensive error handling with structured JSON logging
  • Async-first design with proper resource cleanup
  • MCP protocol compliant with clean stdio streams
  • Upstream inheritance for fork workflows
  • Git integration with .gitignore support
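
Upstream inheritance for forks can be thought of as a fallback lookup: a file described in the fork keeps its own description, and a file with no local description borrows the upstream one. The sketch below shows only that idea on plain dictionaries. The function name and data shapes are hypothetical, and the server's actual inheritance rules may differ.

```python
from typing import Dict

Descriptions = Dict[str, str]  # file path -> description


def inherit_missing(local: Descriptions, upstream: Descriptions) -> Descriptions:
    """Fill in descriptions that the fork lacks; local entries always win."""
    merged = dict(upstream)
    merged.update(local)
    return merged


if __name__ == "__main__":
    upstream = {"core.py": "Core indexing loop", "cli.py": "Command-line entry point"}
    fork = {"cli.py": "Command-line entry point with extra --dry-run flag"}
    for path, description in sorted(inherit_missing(fork, upstream).items()):
        print(f"{path}: {description}")
```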

Developer Friendly

  • 95%+ test coverage with async support
  • Integration tests for complete workflows
  • Performance benchmarks for large codebases
  • Clear error messages with MCP protocol compliance

🚦 System Requirements

  • Python 3.8+ with asyncio support
  • SQLite 3.35+ (bundled with most Python builds; check yours with python -c "import sqlite3; print(sqlite3.sqlite_version)")
  • 4GB+ RAM for large codebases (1000+ files)
  • SSD storage recommended for optimal performance

📊 Performance

Tested with codebases up to 10,000 files:

  • File description retrieval: < 10ms
  • Full-text search: < 100ms
  • Codebase overview generation: < 2s
  • Merge conflict detection: < 5s

🔧 Advanced Configuration

# Production setup with custom limits
mcp-code-indexer \
  --token-limit 50000 \
  --db-path /data/mcp-index.db \
  --cache-dir /tmp/mcp-cache \
  --log-level INFO

# Enable structured logging
export MCP_LOG_FORMAT=json
mcp-code-indexer

🤝 Integration Examples

With AI Agents

# Example: AI agent using MCP tools
# `mcp_client` is any connected MCP client object that exposes an async call_tool()
async def analyze_codebase(mcp_client, project_path):
    # Arguments shared by every tool call
    base_args = {
        "projectName": "my-project",
        "folderPath": project_path,
        "branch": "main",
    }

    # Check if codebase is large
    size_info = await mcp_client.call_tool("check_codebase_size", base_args)

    if size_info["isLarge"]:
        # Use search for large codebases
        return await mcp_client.call_tool("search_descriptions", {
            **base_args,
            "query": "authentication logic",
        })

    # Get full overview for smaller projects
    return await mcp_client.call_tool("get_codebase_overview", base_args)

With CI/CD Pipelines

# Example: GitHub Actions integration
- name: Update Code Descriptions
  run: |
    python -c "
    import asyncio
    # mcp_client / MCPClient is a placeholder for whichever MCP client library you use
    from mcp_client import MCPClient
    
    async def update_descriptions():
        client = MCPClient('mcp-code-indexer')
        
        # Find files without descriptions
        missing = await client.call_tool('find_missing_descriptions', {
            'projectName': '${{ github.repository }}',
            'folderPath': '.',
            'branch': '${{ github.ref_name }}'
        })
        
        # Process with AI and update...
    
    asyncio.run(update_descriptions())
    "

🧪 Testing

# Install with test dependencies (quotes stop shells such as zsh from globbing the brackets)
pip install "mcp-code-indexer[test]"

# Run full test suite
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Run performance tests
python -m pytest tests/ -m performance

# Run integration tests only
python -m pytest tests/integration/ -v

📈 Monitoring

The server provides structured JSON logs for monitoring:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Tool search_descriptions completed",
  "tool_usage": {
    "tool_name": "search_descriptions",
    "success": true,
    "duration_seconds": 0.045,
    "result_size": 1247
  }
}
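
Logs in this shape are easy to aggregate. The sketch below sums call counts, failures, and total duration per tool. It assumes the server writes one JSON object per line when MCP_LOG_FORMAT=json is set, which is an assumption since the sample above is pretty-printed. It relies only on the tool_usage fields shown in the sample, and the function name is illustrative.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_tool_usage(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Aggregate calls, failures, and total duration for each tool."""
    stats = defaultdict(lambda: {"calls": 0, "failures": 0, "total_seconds": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        usage = record.get("tool_usage") if isinstance(record, dict) else None
        if not isinstance(usage, dict) or "tool_name" not in usage:
            continue  # skip records without tool usage details
        entry = stats[usage["tool_name"]]
        entry["calls"] += 1
        if not usage.get("success", False):
            entry["failures"] += 1
        entry["total_seconds"] += float(usage.get("duration_seconds", 0.0))
    return {name: dict(values) for name, values in stats.items()}


if __name__ == "__main__":
    sample = [
        '{"level": "INFO", "tool_usage": {"tool_name": "search_descriptions", '
        '"success": true, "duration_seconds": 0.045, "result_size": 1247}}',
    ]
    print(summarize_tool_usage(sample))
```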

🛡️ Security Features

  • Input validation on all MCP tool parameters
  • SQL injection protection via parameterized queries
  • File system sandboxing with .gitignore respect
  • Error sanitization to prevent information leakage
  • Async resource cleanup to prevent memory leaks
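
As an illustration of the parameterized-query bullet, the sketch below passes user-supplied values to SQLite as bound parameters instead of formatting them into the SQL string. The table and column names are hypothetical and are not the project's schema.

```python
import sqlite3
from typing import Optional


def find_description(
    conn: sqlite3.Connection, project: str, file_path: str
) -> Optional[str]:
    """Look up one description; values are bound with ?, never interpolated."""
    row = conn.execute(
        "SELECT description FROM file_descriptions "
        "WHERE project = ? AND file_path = ?",
        (project, file_path),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_descriptions (project TEXT, file_path TEXT, description TEXT)"
    )
    conn.execute(
        "INSERT INTO file_descriptions VALUES (?, ?, ?)",
        ("my-project", "src/app.py", "Application entry point"),
    )
    print(find_description(conn, "my-project", "src/app.py"))
    # A hostile value is treated as a literal string, so nothing matches.
    print(find_description(conn, "my-project", "x' OR '1'='1"))
    conn.close()
```

The first lookup returns the stored description, and the second returns None because the quote characters are compared as data rather than parsed as SQL.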

🚀 Next Steps

  1. Read the API docs to understand available tools
  2. Check the configuration guide for advanced setup
  3. Review the architecture for technical details
  4. Contribute to help improve the project

🤝 Contributing

We welcome contributions! See our Contributing Guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

📄 License

MIT License - see LICENSE for details.

Ready to supercharge your AI agents with intelligent codebase navigation? 🚀 Install from PyPI or explore the API docs!


