
MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews.


MCP Code Indexer


A production-ready Model Context Protocol (MCP) server that helps AI agents navigate and understand codebases. Built for high-concurrency environments with database resilience features, the server provides fast access to file descriptions, semantic search, and context-aware recommendations while sustaining 800+ writes/sec.

What It Does

The MCP Code Indexer solves a critical problem for AI agents working with large codebases: understanding code structure without repeatedly scanning files. Instead of reading every file, agents can:

  • Query file purposes instantly with natural language descriptions
  • Search across codebases using full-text search
  • Get intelligent recommendations based on codebase size (overview vs search)
  • Generate condensed overviews for project understanding

Perfect for AI-powered code review, refactoring tools, documentation generation, and codebase analysis workflows.

Quick Start

For Developers

Get started integrating MCP Code Indexer into your AI agent workflow:

# Install with Poetry
poetry add mcp-code-indexer

# Or with pip
pip install mcp-code-indexer

# Start the MCP server
mcp-code-indexer

# Connect your MCP client and start using tools
# See API Reference for complete tool documentation

For Web Applications

Enable HTTP/REST API access for browser-based applications:

# Start HTTP server with authentication
mcp-code-indexer --http --auth-token "your-secret-token"

# Custom host and port
mcp-code-indexer --http --host 0.0.0.0 --port 8080

# CORS configuration for web apps
mcp-code-indexer --http --cors-origins "https://localhost:3000" "https://myapp.com"

Complete HTTP API Reference →

For AI-Powered Q&A

Ask questions about your codebase using natural language:

# Set OpenRouter API key for Claude access
export OPENROUTER_API_KEY="your-openrouter-api-key"

# Simple questions about project architecture
mcp-code-indexer --ask "What does this project do?" my-project

# Enhanced analysis with file search
mcp-code-indexer --deepask "How is authentication implemented?" web-app

# JSON output for programmatic use
mcp-code-indexer --ask "List the main components" my-project --json

Complete Q&A Interface Guide →

For System Administrators

Deploy and configure the server for your team:

# Production deployment with custom settings
mcp-code-indexer \
  --token-limit 64000 \
  --db-path /data/mcp-index.db \
  --cache-dir /var/cache/mcp \
  --log-level INFO

# Check installation
mcp-code-indexer --version

For Everyone

New to MCP Code Indexer? Start here:

  1. Install: poetry add mcp-code-indexer (or pip install mcp-code-indexer)
  2. Run: mcp-code-indexer --token-limit 32000
  3. Connect: Use your favorite MCP client
  4. Explore: Try the check_codebase_size tool first

Development Setup:

# Clone and setup for contributing
git clone https://github.com/fluffypony/mcp-code-indexer.git
cd mcp-code-indexer

# Install with Poetry (recommended)
poetry install

# Or install in development mode with pip
pip install -e .

# Run the server
mcp-code-indexer --token-limit 32000

Git Hook Integration

New feature: automated code indexing with AI-powered analysis. Keep your file descriptions synchronized automatically as your codebase evolves.

For Users: Quick Setup

# Set your OpenRouter API key
export OPENROUTER_API_KEY="sk-or-v1-your-api-key-here"

# Test git hook functionality
mcp-code-indexer --githook

# Install post-commit hook
cp examples/git-hooks/post-commit .git/hooks/
chmod +x .git/hooks/post-commit

For Developers: How It Works

The git hook integration provides intelligent automation:

  • Git Analysis: Automatically analyzes git diffs after commits/merges
  • AI Processing: Uses OpenRouter API with Anthropic's Claude Sonnet 4
  • Smart Updates: Only processes files that actually changed
  • Overview Maintenance: Updates project overview when structure changes
  • Error Isolation: Git operations continue even if indexing fails
  • Rate Limiting: Built-in retry logic with exponential backoff
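
The retry behavior in the last bullet can be pictured with a minimal sketch. This is not the project's implementation: the function name `retry_with_backoff`, its parameters, and the 10% jitter are assumptions chosen for illustration.

```python
import asyncio
import random


async def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                             max_delay=30.0, sleep=asyncio.sleep):
    """Await operation(), retrying failures with exponential backoff.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt (0.5s, 1s, 2s, ...), capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter keeps parallel clients from retrying in lockstep
            await sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a fake sleep function can record the delays instead of waiting.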

Key Benefits

  • Zero Manual Work: Descriptions stay current without any effort
  • Performance: Only analyzes changed files, not entire codebase
  • Reliability: Robust error handling ensures git operations never fail
  • Configurable: Support for custom models and timeout settings

Learn More: See Git Hook Setup Guide for complete configuration options and troubleshooting.

Vector Mode (BETA)

New feature: semantic code search with vector embeddings. Vector Mode supports code discovery that understands context and meaning, not just keywords.

What is Vector Mode?

Vector Mode transforms how you search and understand codebases by using AI embeddings:

  • Semantic Search: Find code by meaning, not just text matching
  • Real-time Indexing: Automatic embedding generation as code changes
  • Secure by Default: Comprehensive secret redaction before API calls
  • Multi-language: Python, JavaScript, TypeScript with AST-based chunking
  • Smart Chunking: Context-aware code segmentation for optimal embeddings
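
To make the "secret redaction before API calls" idea concrete, here is a minimal sketch of regex-based redaction. The three patterns and the `redact_secrets` function are hypothetical and only illustrate the approach; the project's actual pattern set is not shown in this README.

```python
import re

# Hypothetical patterns for illustration only (not the project's real list)
SECRET_PATTERNS = [
    ("aws_access_key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("bearer_token", re.compile(r"(?i)\bbearer\s+[a-z0-9._\-]{20,}")),
    ("assigned_secret", re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\b(\s*[:=]\s*)(['\"])[^'\"]{8,}\3")),
]


def redact_secrets(text):
    """Replace likely secrets with placeholders; return (redacted_text, hit_count)."""
    hits = 0
    for name, pattern in SECRET_PATTERNS:
        if name == "assigned_secret":
            # Keep the variable name and operator, replace only the quoted value
            text, count = pattern.subn(r"\1\2\3[REDACTED]\3", text)
        else:
            text, count = pattern.subn(f"[REDACTED:{name}]", text)
        hits += count
    return text, hits
```

Running the redaction on each chunk before it is sent for embedding means the raw secret value never leaves the machine, at the cost of occasional false positives.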

Quick Start

# Install MCP Code Indexer (includes vector mode)
pip install mcp-code-indexer

# Set required API keys
export VOYAGE_API_KEY="pa-your-voyage-api-key"
export TURBOPUFFER_API_KEY="your-turbopuffer-api-key"

# Optional: Configure region (default: gcp-europe-west3)
export TURBOPUFFER_REGION="gcp-europe-west3" 

# Start with vector mode enabled
mcp-code-indexer --vector

# The daemon automatically starts and begins indexing your projects

Key Features

  • Secret Redaction: 20+ pattern types automatically detected and redacted
  • Merkle Trees: Efficient change detection without full directory scans
  • Circuit Breakers: Resilient API integration with automatic retry logic
  • Production Ready: Built for high-concurrency with comprehensive monitoring
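
As a rough illustration of Merkle-tree change detection, the sketch below hashes a directory into a tree of digests and then compares two trees, descending only into subtrees whose digests differ. The helper names `build_tree` and `diff_trees` are hypothetical, and this simple version still reads every file when building a tree; it only shows why a comparison can skip unchanged directories.

```python
import hashlib
from pathlib import Path


def build_tree(path):
    """Return {"hash": hex digest, "children": dict or None} for a file or directory."""
    path = Path(path)
    if path.is_file():
        return {"hash": hashlib.sha256(path.read_bytes()).hexdigest(), "children": None}
    children = {child.name: build_tree(child) for child in sorted(path.iterdir())}
    digest = hashlib.sha256()
    for name, node in children.items():
        # A directory digest covers each child's name and digest,
        # so any change below propagates up to the root
        digest.update(name.encode("utf-8") + b"\0" + node["hash"].encode("ascii"))
    return {"hash": digest.hexdigest(), "children": children}


def diff_trees(old, new, prefix=""):
    """List changed paths, skipping any subtree whose digests match."""
    if old is not None and new is not None and old["hash"] == new["hash"]:
        return []  # identical subtree: nothing below needs to be visited
    if old is None or new is None or old["children"] is None or new["children"] is None:
        return [prefix or "."]  # added, removed, or modified entry
    changed = []
    for name in sorted(old["children"].keys() | new["children"].keys()):
        child_path = f"{prefix}/{name}" if prefix else name
        changed += diff_trees(old["children"].get(name), new["children"].get(name), child_path)
    return changed
```

If the two root digests are equal, `diff_trees` returns immediately, which is the property that makes this structure attractive for frequent re-indexing checks.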

Advanced Configuration

# Custom configuration
mcp-code-indexer --vector --vector-config /path/to/config.yaml

# HTTP mode with vector search
mcp-code-indexer --vector --http --port 8080

Architecture

Vector Mode adds powerful new MCP tools:

  • vector_search - Semantic code search across projects
  • similarity_search - Find similar code patterns
  • dependency_search - Discover code relationships
  • vector_status - Monitor indexing progress

Status: Currently in BETA - foundations implemented, full pipeline in development.

Development Setup

For Contributors

Contributing to MCP Code Indexer? Follow these steps for a proper development environment:

# Setup development environment
git clone https://github.com/fluffypony/mcp-code-indexer.git
cd mcp-code-indexer

# Install with Poetry (recommended)
poetry install

# Or use pip with virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .[dev]

# Verify installation
python main.py --help
mcp-code-indexer --version

Important: The editable install (pip install -e .) is required for development. The project uses proper PyPI package structure with absolute imports like from mcp_code_indexer.database.database import DatabaseManager. Without editable installation, you'll get ModuleNotFoundError exceptions.

Development Workflow

# Activate virtual environment
source venv/bin/activate

# Run the server directly
python main.py --token-limit 32000

# Or use the installed CLI command
mcp-code-indexer --token-limit 32000

# Run tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Format code
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

MCP Tools Available

The server provides 11 powerful MCP tools for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.

Essential Tools (Start Here)

| Tool | Purpose | When to Use |
|------|---------|-------------|
| check_codebase_size | Get navigation recommendations | First tool to call for any project |
| search_descriptions | Find files by functionality | When you need specific files |
| get_codebase_overview | Project architectural summary | Understanding system design |

Core Operations

| Tool | Purpose | Best For |
|------|---------|----------|
| get_file_description | Retrieve file summaries | Quick file understanding |
| update_file_description | Store detailed file analysis | AI agents updating descriptions |
| find_missing_descriptions | Scan for undocumented files | Maintenance and coverage |

Advanced Features

| Tool | Purpose | Use Case |
|------|---------|----------|
| get_all_descriptions | Complete project structure | Small-to-medium codebases |
| get_word_frequency | Technical vocabulary analysis | Domain understanding |
| update_codebase_overview | Create project documentation | Architecture documentation |
| search_codebase_overview | Search in project overviews | Finding specific topics |

System Health

| Tool | Purpose | For |
|------|---------|-----|
| check_database_health | Real-time performance monitoring | Production deployments |

Pro Tip: Always start with check_codebase_size to get personalized recommendations for navigating your specific codebase.

Complete API Documentation: View all 11 tools with examples →
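
The overview-versus-search recommendation made by check_codebase_size can be pictured with a small sketch. It assumes a crude estimate of about four characters per token and a hypothetical `recommend_navigation` helper; apart from `isLarge`, which appears in the integration example in this README, the field names are assumptions, and the server's real token counting may differ.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def recommend_navigation(descriptions, token_limit=32000):
    """Suggest 'use_overview' or 'use_search' for a {file_path: description} mapping."""
    total = sum(
        estimate_tokens(path) + estimate_tokens(description)
        for path, description in descriptions.items()
    )
    is_large = total > token_limit
    return {
        "totalTokens": total,       # hypothetical field name
        "tokenLimit": token_limit,  # mirrors the --token-limit option
        "isLarge": is_large,
        "recommendation": "use_search" if is_large else "use_overview",
    }
```

Under this sketch, raising `--token-limit` would simply move the point at which a project stops fitting into a single overview response.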

Git Hook Integration

Keep your codebase documentation automatically synchronized with automated analysis on every commit:

# Analyze current staged changes
mcp-code-indexer --githook

# Analyze a specific commit
mcp-code-indexer --githook abc123def

# Analyze using HEAD syntax
mcp-code-indexer --githook HEAD
mcp-code-indexer --githook HEAD~1
mcp-code-indexer --githook HEAD~3

# Analyze a commit range (perfect for rebases)
mcp-code-indexer --githook abc123 def456
mcp-code-indexer --githook HEAD~5 HEAD

Perfect for:

  • Automated documentation that never goes stale
  • Rebase-aware analysis that handles complex git operations
  • Zero-effort maintenance with background processing

See the Git Hook Setup Guide for complete installation instructions including post-commit, post-merge, and post-rewrite hooks.

Architecture Highlights

Performance Optimized

  • SQLite with WAL mode for high-concurrency access (800+ writes/sec)
  • Smart connection pooling with optimized pool size (3 connections default)
  • FTS5 full-text search with prefix indexing for sub-100ms queries
  • Token-aware caching to minimize expensive operations
  • Write operation serialization to eliminate database lock conflicts
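
For readers unfamiliar with the SQLite features named above, the following sketch shows WAL mode and an FTS5 table with prefix indexes using Python's standard sqlite3 module. The table name `file_descriptions` and its columns are hypothetical, not the project's schema, and the example assumes the local SQLite build includes FTS5.

```python
import os
import sqlite3
import tempfile


def open_index(db_path):
    """Open a database in WAL mode with an FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(db_path, timeout=10.0)
    # WAL lets readers keep working while a single writer appends to the log
    conn.execute("PRAGMA journal_mode=WAL")
    # prefix='2 3' adds indexes that speed up 2- and 3-character prefix queries
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS file_descriptions "
        "USING fts5(file_path, description, prefix='2 3')"
    )
    return conn


def search(conn, query, limit=10):
    """Return matching file paths, best match first, for an FTS5 query string."""
    rows = conn.execute(
        "SELECT file_path FROM file_descriptions "
        "WHERE file_descriptions MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

A query such as `search(conn, "auth*")` uses FTS5 prefix matching, so it finds rows containing "auth", "authentication", and similar tokens.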

Production Ready

  • Database resilience features with <2% error rate under high load
  • Exponential backoff retry logic with intelligent failure recovery
  • Comprehensive health monitoring with automatic pool refresh
  • Structured JSON logging with performance metrics tracking
  • Async-first design with proper resource cleanup
  • MCP protocol compliant with clean stdio streams
  • Upstream inheritance for fork workflows
  • Git integration with .gitignore support

Developer Friendly

  • 95%+ test coverage with async support and concurrent access tests
  • Integration tests for complete workflows including database stress testing
  • Performance benchmarks for large codebases with resilience validation
  • Clear error messages with MCP protocol compliance
  • Comprehensive configuration options for production tuning

Documentation

Comprehensive documentation organized by user journey and expertise level.

Getting Started (New Users)

| Guide | Purpose | Time Investment |
|-------|---------|-----------------|
| Quick Start | Install and run your first server | 2 minutes |
| API Reference | Master all 11 MCP tools | 15 minutes |
| HTTP API Reference | REST API for web applications | 10 minutes |
| Q&A Interface | AI-powered codebase analysis | 8 minutes |
| Git Hook Setup | Automate your workflow | 5 minutes |

Production Deployment (Teams & Admins)

| Guide | Focus | Best For |
|-------|-------|----------|
| CLI Reference | Complete command documentation | All users |
| Administrative Commands | Project & database management | System administrators |
| Configuration Guide | Production setup & tuning | System administrators |
| Performance Tuning | High-concurrency optimization | DevOps teams |
| Monitoring & Diagnostics | Production monitoring | Operations teams |

Advanced Topics (Power Users)

| Guide | Depth | For |
|-------|-------|-----|
| Architecture Overview | System design deep dive | Developers & architects |
| Database Resilience | Advanced error handling | Senior developers |
| Contributing Guide | Development workflow | Contributors |

Quick References

Reading Paths:

  • New to MCP Code Indexer? Quick Start → API Reference → HTTP API → Q&A Interface
  • Web developers? Quick Start → HTTP API Reference → Q&A Interface → Git Hooks
  • AI/ML engineers? Quick Start → Q&A Interface → API Reference → Git Hooks
  • Setting up for a team? CLI Reference → Configuration → Administrative Commands → Monitoring
  • Contributing to the project? Architecture → Contributing → API Reference

System Requirements

  • Python 3.8+ with asyncio support
  • SQLite 3.35+ (included with Python)
  • 4GB+ RAM for large codebases (1000+ files)
  • SSD storage recommended for optimal performance

Performance

Tested with codebases up to 10,000 files:

  • File description retrieval: < 10ms
  • Full-text search: < 100ms
  • Codebase overview generation: < 2s
  • Merge conflict detection: < 5s

Advanced Configuration

For Developers: Basic Configuration

# Production setup with custom limits
mcp-code-indexer \
  --token-limit 50000 \
  --db-path /data/mcp-index.db \
  --cache-dir /tmp/mcp-cache \
  --log-level INFO

# Enable structured logging
export MCP_LOG_FORMAT=json
mcp-code-indexer

For System Administrators: Database Resilience Tuning

Configure advanced database resilience features for high-concurrency environments:

# High-performance production deployment
mcp-code-indexer \
  --token-limit 64000 \
  --db-path /data/mcp-index.db \
  --cache-dir /var/cache/mcp \
  --log-level INFO \
  --db-pool-size 5 \
  --db-retry-count 7 \
  --db-timeout 15.0 \
  --enable-wal-mode \
  --health-check-interval 20.0

# Environment variable configuration
export DB_POOL_SIZE=5
export DB_RETRY_COUNT=7
export DB_TIMEOUT=15.0
export DB_WAL_MODE=true
export DB_HEALTH_CHECK_INTERVAL=20.0
mcp-code-indexer --token-limit 64000

Configuration Options

| Parameter | Default | Description | Use Case |
|-----------|---------|-------------|----------|
| --db-pool-size | 3 | Database connection pool size | Higher for more concurrent clients |
| --db-retry-count | 5 | Max retry attempts for failed operations | Increase for unstable environments |
| --db-timeout | 10.0 | Transaction timeout (seconds) | Increase for large operations |
| --enable-wal-mode | true | Enable WAL mode for concurrency | Always enable for production |
| --health-check-interval | 30.0 | Health monitoring interval (seconds) | Lower for faster issue detection |

Performance Tip: For environments with 10+ concurrent clients, use --db-pool-size 5 and --health-check-interval 15.0 for optimal throughput.
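
The environment variables and defaults listed above could be combined along the following lines. This is a sketch of the general pattern only: the `DatabaseSettings` class and `load_settings` function are hypothetical and do not describe how the server itself reads its configuration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseSettings:
    """Defaults copied from the options table; the class itself is hypothetical."""
    pool_size: int = 3
    retry_count: int = 5
    timeout: float = 10.0
    wal_mode: bool = True
    health_check_interval: float = 30.0


def load_settings(env=None):
    """Build settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = DatabaseSettings()
    wal = env.get("DB_WAL_MODE")
    return DatabaseSettings(
        pool_size=int(env.get("DB_POOL_SIZE", defaults.pool_size)),
        retry_count=int(env.get("DB_RETRY_COUNT", defaults.retry_count)),
        timeout=float(env.get("DB_TIMEOUT", defaults.timeout)),
        # Accept common truthy spellings; anything else counts as false
        wal_mode=defaults.wal_mode if wal is None
        else wal.strip().lower() in {"1", "true", "yes", "on"},
        health_check_interval=float(
            env.get("DB_HEALTH_CHECK_INTERVAL", defaults.health_check_interval)
        ),
    )
```

Accepting an explicit `env` mapping makes the function easy to exercise without touching the real process environment.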

Integration Examples

With AI Agents

# Example: AI agent using MCP tools
# `mcp_client` is any connected MCP client exposing an async call_tool() method
async def analyze_codebase(mcp_client, project_path):
    project = {"projectName": "my-project", "folderPath": project_path}

    # Ask the server whether the codebase is too large for a full overview
    size_info = await mcp_client.call_tool("check_codebase_size", project)

    if size_info["isLarge"]:
        # Use search for large codebases
        return await mcp_client.call_tool("search_descriptions", {
            **project,
            "query": "authentication logic"
        })

    # Get full overview for smaller projects
    return await mcp_client.call_tool("get_codebase_overview", project)

With CI/CD Pipelines

# Example: GitHub Actions integration
- name: Update Code Descriptions
  run: |
    python -c "
    import asyncio
    from mcp_client import MCPClient

    async def update_descriptions():
        client = MCPClient('mcp-code-indexer')

        # Find files without descriptions
        missing = await client.call_tool('find_missing_descriptions', {
            'projectName': '${{ github.repository }}',
            'folderPath': '.'
        })

        # Process with AI and update...

    asyncio.run(update_descriptions())
    "

Testing

# Install with test dependencies using Poetry
poetry install --with test

# Or with pip
pip install mcp-code-indexer[test]

# Run full test suite
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Run performance tests
python -m pytest tests/ -m performance

# Run integration tests only
python -m pytest tests/integration/ -v

Monitoring

The server provides structured JSON logs for monitoring:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Tool search_descriptions completed",
  "tool_usage": {
    "tool_name": "search_descriptions",
    "success": true,
    "duration_seconds": 0.045,
    "result_size": 1247
  }
}
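
A log record like the one above can be consumed with a few lines of Python. The sketch below assumes one JSON object per log line and uses a hypothetical `flag_tool_calls` helper to pick out failed or slow tool calls; the 0.1 second threshold is an arbitrary choice for illustration.

```python
import json


def flag_tool_calls(log_lines, threshold_seconds=0.1):
    """Return (tool_name, duration_seconds, success) for failed or slow tool calls."""
    flagged = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        usage = record.get("tool_usage")
        if not isinstance(usage, dict):
            continue  # not a tool-usage record
        duration = usage.get("duration_seconds", 0)
        success = usage.get("success", True)
        if not success or duration > threshold_seconds:
            flagged.append((usage.get("tool_name"), duration, success))
    return flagged
```

The same loop could feed a metrics system instead of returning a list; the point is only that the structured fields can be read directly without parsing the message text.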

Command Line Options

Server Mode (Default)

mcp-code-indexer [OPTIONS]

Options:
  --token-limit INT     Maximum tokens before recommending search (default: 32000)
  --db-path PATH        SQLite database path (default: ~/.mcp-code-index/tracker.db)
  --cache-dir PATH      Cache directory path (default: ~/.mcp-code-index/cache)
  --log-level LEVEL     Logging level: DEBUG|INFO|WARNING|ERROR|CRITICAL (default: INFO)

Git Hook Mode

mcp-code-indexer --githook [OPTIONS]

# Automated analysis of git changes using OpenRouter API
# Requires: OPENROUTER_API_KEY environment variable

HTTP Server Mode

# Start HTTP/REST API server
mcp-code-indexer --http [OPTIONS]

# HTTP server with authentication
mcp-code-indexer --http --auth-token "your-secret-token"

# Custom host and port configuration
mcp-code-indexer --http --host 0.0.0.0 --port 8080

Q&A Commands

# Simple AI-powered questions (requires OPENROUTER_API_KEY)
mcp-code-indexer --ask "What does this project do?" PROJECT_NAME

# Enhanced analysis with file search
mcp-code-indexer --deepask "How is authentication implemented?" PROJECT_NAME

# JSON output for programmatic use
mcp-code-indexer --ask "Question" PROJECT_NAME --json

Administrative Commands

# List all projects
mcp-code-indexer --getprojects

# Execute MCP tool directly
mcp-code-indexer --runcommand '{"method": "tools/call", "params": {...}}'

# Export descriptions for a project
mcp-code-indexer --dumpdescriptions PROJECT_ID

# Create local database for a project
mcp-code-indexer --makelocal /path/to/project

# Generate project documentation map
mcp-code-indexer --map PROJECT_NAME

Security Features

  • Input validation on all MCP tool parameters
  • SQL injection protection via parameterized queries
  • File system sandboxing with .gitignore respect
  • Error sanitization to prevent information leakage
  • Async resource cleanup to prevent memory leaks

Quick Troubleshooting

Common issues and instant solutions:

| Issue | Quick Fix | Learn More |
|-------|-----------|------------|
| "No module named 'mcp_code_indexer'" | pip install -e . (for development) | Contributing Guide |
| "OPENROUTER_API_KEY not found" | export OPENROUTER_API_KEY="your-key" | Git Hook Setup |
| "Database is locked" | Enable WAL mode: --enable-wal-mode | CLI Reference |
| "Large codebase - use search" | Normal for 200+ files. Use search_descriptions | API Reference |
| HTTP authentication failed | Check --auth-token configuration | HTTP API Reference |
| Q&A commands not working | Set OPENROUTER_API_KEY environment variable | Q&A Interface |
| High memory usage | Reduce token limit: --token-limit 10000 | Configuration Guide |

Not finding your issue? Check the complete troubleshooting guides in our documentation.

Next Steps

Ready to supercharge your AI agents with intelligent codebase navigation?

Choose Your Path

New to MCP Code Indexer?

  1. Install and run your first server - Get up and running in 2 minutes
  2. Master the API tools - Learn all 11 tools with examples
  3. Try HTTP API access - REST API for web applications
  4. Explore AI-powered Q&A - Ask questions about your code
  5. Set up git hooks - Automate your workflow

Setting up for a team?

  1. Learn all CLI commands - Complete command reference
  2. Configure for production - Production deployment guide
  3. Set up administrative workflows - Project & database management
  4. Performance optimization - High-concurrency setup
  5. Monitoring & alerts - Production monitoring

Want to contribute?

  1. Understand the architecture - Technical deep dive
  2. Development setup - Contribution workflow
  3. Report issues - Share feedback and suggestions


Contributing

We welcome contributions! See our Contributing Guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

License

MIT License - see LICENSE for details.



Transform how your AI agents understand code!

  • New User? Get started in 2 minutes
  • Developer? Explore the complete API
  • Production? Deploy with confidence
