MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews.

MCP Code Indexer 🚀

A production-ready Model Context Protocol (MCP) server that changes how AI agents navigate and understand codebases. Instead of repeatedly scanning files, agents get instant access to stored file descriptions, full-text search, and size-aware recommendations.

🎯 What It Does

The MCP Code Indexer solves a critical problem for AI agents working with large codebases: understanding code structure without repeatedly scanning files. Instead of reading every file, agents can:

  • Query file purposes instantly with natural language descriptions
  • Search across codebases using full-text search
  • Get intelligent recommendations based on codebase size (overview vs search)
  • Merge branch descriptions with conflict resolution
  • Inherit descriptions from upstream repositories automatically

Perfect for AI-powered code review, refactoring tools, documentation generation, and codebase analysis workflows.

⚡ Quick Start

👨‍💻 For Developers

Get started integrating MCP Code Indexer into your AI agent workflow:

# Install the package
pip install mcp-code-indexer

# Start the MCP server
mcp-code-indexer

# Connect your MCP client and start using tools
# See API Reference for complete tool documentation

🔧 For System Administrators

Deploy and configure the server for your team:

# Production deployment with custom settings
mcp-code-indexer \
  --token-limit 64000 \
  --db-path /data/mcp-index.db \
  --cache-dir /var/cache/mcp \
  --log-level INFO

# Check installation
mcp-code-indexer --version

🎯 For Everyone

New to MCP Code Indexer? Start here:

  1. Install: pip install mcp-code-indexer
  2. Run: mcp-code-indexer --token-limit 32000
  3. Connect: Use your favorite MCP client
  4. Explore: Try the check_codebase_size tool first

Development Setup:

# Clone and setup for contributing
git clone https://github.com/fluffypony/mcp-code-indexer.git
cd mcp-code-indexer

# Install in development mode (required)
pip install -e .

# Run the server
mcp-code-indexer --token-limit 32000

🔗 Git Hook Integration

🚀 NEW Feature: Automated code indexing with AI-powered analysis! Keep your file descriptions synchronized automatically as your codebase evolves.

👤 For Users: Quick Setup

# Set your OpenRouter API key
export OPENROUTER_API_KEY="sk-or-v1-your-api-key-here"

# Test git hook functionality
mcp-code-indexer --githook

# Install post-commit hook
cp examples/git-hooks/post-commit .git/hooks/
chmod +x .git/hooks/post-commit

👨‍💻 For Developers: How It Works

The git hook integration provides intelligent automation:

  • 📊 Git Analysis: Automatically analyzes git diffs after commits/merges
  • 🤖 AI Processing: Uses OpenRouter API with Anthropic's Claude Sonnet 4
  • ⚡ Smart Updates: Only processes files that actually changed
  • 🔄 Overview Maintenance: Updates project overview when structure changes
  • 🛡️ Error Isolation: Git operations continue even if indexing fails
  • ⏱️ Rate Limiting: Built-in retry logic with exponential backoff
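
The retry behaviour mentioned in the last bullet can be pictured with a small sketch. This is not the project's actual implementation: the function name `retry_with_backoff`, its parameters, and the jitter amount are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays.

    Hypothetical helper: names and defaults are illustrative only.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps parallel hooks from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.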

🎯 Key Benefits

💡 Zero Manual Work: Descriptions stay current without any effort
⚡ Performance: Only analyzes changed files, not the entire codebase
🔒 Reliability: Indexing errors are isolated, so they never block git operations
🎛️ Configurable: Support for custom models and timeout settings

Learn More: See Git Hook Setup Guide for complete configuration options and troubleshooting.

🔧 Development Setup

👨‍💻 For Contributors

Contributing to MCP Code Indexer? Follow these steps for a proper development environment:

# Setup development environment
git clone https://github.com/fluffypony/mcp-code-indexer.git
cd mcp-code-indexer

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install package in editable mode (REQUIRED for development)
pip install -e .

# Install development dependencies
pip install -e ".[dev]"

# Verify installation
python main.py --help
mcp-code-indexer --version

⚠️ Important: The editable install (pip install -e .) is required for development. The project uses proper PyPI package structure with absolute imports like from mcp_code_indexer.database.database import DatabaseManager. Without editable installation, you'll get ModuleNotFoundError exceptions.

🎯 Development Workflow

# Activate virtual environment
source venv/bin/activate

# Run the server directly
python main.py --token-limit 32000

# Or use the installed CLI command
mcp-code-indexer --token-limit 32000

# Run tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Format code
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

🛠️ MCP Tools Available

The server provides 11 powerful MCP tools for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.

🎯 For Everyone: Start Here

  • check_codebase_size - Get instant recommendations for how to navigate your codebase
  • search_descriptions - Find files by what they do, not just their names
  • get_codebase_overview - Get a high-level understanding of any project

👨‍💻 For Developers: Core Operations

  • get_file_description - Retrieve stored file descriptions instantly
  • update_file_description - Store detailed file summaries and metadata
  • find_missing_descriptions - Scan projects for files without descriptions
  • update_missing_descriptions - Bulk update multiple file descriptions

🔍 For Advanced Users: Search & Discovery

  • get_all_descriptions - Complete hierarchical project structure
  • get_word_frequency - Technical vocabulary analysis with stop-word filtering
  • merge_branch_descriptions - Two-phase merge with conflict resolution
  • update_codebase_overview - Create comprehensive codebase documentation
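
To make the "two-phase merge" idea concrete, here is a minimal sketch over plain dictionaries that map file paths to descriptions. It only illustrates the general pattern (detect conflicts first, then apply with explicit resolutions); the function names and data shapes are assumptions, not the tool's real request or response format.

```python
def find_conflicts(source, target):
    """Phase 1: list paths described differently on the two branches."""
    return sorted(
        path for path in source
        if path in target and source[path] != target[path]
    )


def apply_merge(source, target, resolutions):
    """Phase 2: merge source into target, using one resolution per conflict."""
    unresolved = [p for p in find_conflicts(source, target) if p not in resolutions]
    if unresolved:
        raise ValueError(f"unresolved conflicts: {unresolved}")
    merged = dict(target)
    merged.update(source)       # new and non-conflicting entries come from source
    merged.update(resolutions)  # conflicting entries use the chosen description
    return merged
```

Splitting the work this way lets a caller inspect the conflicts and decide on wording before anything is written.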

💡 Pro Tip: Always start with check_codebase_size to get personalized recommendations for navigating your specific codebase.

🔗 Git Hook Integration

Keep your codebase documentation synchronized with automated analysis on every commit, rebase, or merge:

# Analyze current staged changes
mcp-code-indexer --githook

# Analyze a specific commit
mcp-code-indexer --githook abc123def

# Analyze a commit range (perfect for rebases)
mcp-code-indexer --githook abc123 def456
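
One way to picture the three invocation forms is as three different git queries for "which files changed". The sketch below builds the corresponding git command lines; it is an assumption about how such a mode could be implemented, not a description of what mcp-code-indexer runs internally, and `changed_files_command` is a hypothetical name.

```python
def changed_files_command(refs):
    """Return a git argv that lists files touched by zero, one, or two refs."""
    if len(refs) == 0:
        # No refs: look at what is currently staged.
        return ["git", "diff", "--cached", "--name-only"]
    if len(refs) == 1:
        # One ref: files changed by that single commit.
        return ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", refs[0]]
    if len(refs) == 2:
        # Two refs: files that differ between the two commits.
        return ["git", "diff", "--name-only", f"{refs[0]}..{refs[1]}"]
    raise ValueError("expected at most two refs")
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the output split on newlines to get file paths.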

🎯 Perfect for:

  • Automated documentation that never goes stale
  • Rebase-aware analysis that handles complex git operations
  • Zero-effort maintenance with background processing

See the Git Hook Setup Guide for complete installation instructions including post-commit, post-merge, and post-rewrite hooks.

🏗️ Architecture Highlights

Performance Optimized

  • SQLite with WAL mode for high-concurrency access
  • Connection pooling for efficient database operations
  • FTS5 full-text search with prefix indexing
  • Token-aware caching to minimize expensive operations
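
For readers unfamiliar with these SQLite features, the following sketch shows WAL mode and an FTS5 table with prefix indexes using only the standard library. The table name, column names, and prefix lengths are assumptions for illustration and are not taken from the project's schema; it also assumes the Python build ships SQLite with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3


def open_index(db_path):
    """Open a database in WAL mode with a prefix-indexed FTS5 table."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers keep working while a writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # prefix='2 3' builds extra indexes for 2- and 3-character prefix queries.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS file_descriptions "
        "USING fts5(file_path, description, prefix='2 3')"
    )
    return conn


def search(conn, query, limit=10):
    """Return matching file paths, best match first."""
    rows = conn.execute(
        "SELECT file_path FROM file_descriptions "
        "WHERE file_descriptions MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

A query such as `search(conn, "auth*")` uses FTS5 prefix syntax, so it matches descriptions containing words like "authentication".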

Production Ready

  • Comprehensive error handling with structured JSON logging
  • Async-first design with proper resource cleanup
  • MCP protocol compliant with clean stdio streams
  • Upstream inheritance for fork workflows
  • Git integration with .gitignore support

Developer Friendly

  • 95%+ test coverage with async support
  • Integration tests for complete workflows
  • Performance benchmarks for large codebases
  • Clear error messages with MCP protocol compliance

📖 Documentation

👤 For Users

👨‍💻 For Developers

🤝 For Contributors

🚦 System Requirements

  • Python 3.8+ with asyncio support
  • SQLite 3.35+ (included with Python)
  • 4GB+ RAM for large codebases (1000+ files)
  • SSD storage recommended for optimal performance

📊 Performance

Tested with codebases up to 10,000 files:

  • File description retrieval: < 10ms
  • Full-text search: < 100ms
  • Codebase overview generation: < 2s
  • Merge conflict detection: < 5s

🔧 Advanced Configuration

# Production setup with custom limits
mcp-code-indexer \
  --token-limit 50000 \
  --db-path /data/mcp-index.db \
  --cache-dir /tmp/mcp-cache \
  --log-level INFO

# Enable structured logging
export MCP_LOG_FORMAT=json
mcp-code-indexer

🤝 Integration Examples

With AI Agents

# Example: AI agent using MCP tools
# `mcp_client` is any connected MCP client object exposing an async call_tool()
async def analyze_codebase(mcp_client, project_path):
    # Arguments shared by every tool call below
    project = {
        "projectName": "my-project",
        "folderPath": project_path,
        "branch": "main",
    }

    # Check if codebase is large
    size_info = await mcp_client.call_tool("check_codebase_size", project)

    if size_info["isLarge"]:
        # Use search for large codebases
        return await mcp_client.call_tool("search_descriptions", {
            **project,
            "query": "authentication logic",
        })

    # Get full overview for smaller projects
    return await mcp_client.call_tool("get_codebase_overview", project)

With CI/CD Pipelines

# Example: GitHub Actions integration
- name: Update Code Descriptions
  run: |
    python -c "
    import asyncio
    from mcp_client import MCPClient
    
    async def update_descriptions():
        client = MCPClient('mcp-code-indexer')
        
        # Find files without descriptions
        missing = await client.call_tool('find_missing_descriptions', {
            'projectName': '${{ github.repository }}',
            'folderPath': '.',
            'branch': '${{ github.ref_name }}'
        })
        
        # Process with AI and update...
    
    asyncio.run(update_descriptions())
    "

🧪 Testing

# Install with test dependencies
pip install "mcp-code-indexer[test]"

# Run full test suite
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Run performance tests
python -m pytest tests/ -m performance

# Run integration tests only
python -m pytest tests/integration/ -v

📈 Monitoring

The server provides structured JSON logs for monitoring:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Tool search_descriptions completed",
  "tool_usage": {
    "tool_name": "search_descriptions",
    "success": true,
    "duration_seconds": 0.045,
    "result_size": 1247
  }
}
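
If you want to produce log lines of the same shape from your own tooling, a `logging.Formatter` subclass is one straightforward approach. The sketch below is an assumption about how such output could be generated, not the server's own logging code; `JsonFormatter` is a hypothetical name, and the field names simply mirror the sample above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via logger.info(..., extra={"tool_usage": {...}})
        # show up as attributes on the record.
        tool_usage = getattr(record, "tool_usage", None)
        if tool_usage is not None:
            entry["tool_usage"] = tool_usage
        return json.dumps(entry)
```

Attach it with `handler.setFormatter(JsonFormatter())` on a handler that writes to stderr or a file, so that stdout stays free for MCP protocol traffic.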

📋 Command Line Options

Server Mode (Default)

mcp-code-indexer [OPTIONS]

Options:
  --token-limit INT     Maximum tokens before recommending search (default: 32000)
  --db-path PATH        SQLite database path (default: ~/.mcp-code-index/tracker.db)
  --cache-dir PATH      Cache directory path (default: ~/.mcp-code-index/cache)
  --log-level LEVEL     Logging level: DEBUG|INFO|WARNING|ERROR|CRITICAL (default: INFO)
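
The role of --token-limit can be illustrated with a rough sketch: add up an estimated token count for all stored descriptions and compare it with the limit. The four-characters-per-token estimate, the function names, and every result key other than `isLarge` are assumptions for illustration; the server's real counting method is not described here.

```python
def estimate_tokens(text):
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def recommend_navigation(descriptions, token_limit=32000):
    """Suggest 'use_overview' or 'use_search' for a {path: description} map."""
    total = sum(estimate_tokens(text) for text in descriptions.values())
    is_large = total > token_limit
    return {
        "totalTokens": total,
        "isLarge": is_large,
        "recommendation": "use_search" if is_large else "use_overview",
    }
```

Raising the limit makes the full overview the suggested path for larger projects, while lowering it pushes agents toward search sooner.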

Git Hook Mode

mcp-code-indexer --githook [OPTIONS]

# Automated analysis of git changes using OpenRouter API
# Requires: OPENROUTER_API_KEY environment variable

Utility Commands

# List all projects and branches
mcp-code-indexer --getprojects

# Execute MCP tool directly
mcp-code-indexer --runcommand '{"method": "tools/call", "params": {...}}'

# Export descriptions for a project
mcp-code-indexer --dumpdescriptions PROJECT_ID [BRANCH]

🛡️ Security Features

  • Input validation on all MCP tool parameters
  • SQL injection protection via parameterized queries
  • File system sandboxing with .gitignore respect
  • Error sanitization to prevent information leakage
  • Async resource cleanup to prevent memory leaks
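
As an illustration of the file system sandboxing idea, the sketch below checks that a requested path resolves inside the project folder before it is used. It shows the general technique only; `is_inside_project` is a hypothetical helper and not part of the package's API.

```python
import os


def is_inside_project(project_root, candidate):
    """Return True if candidate resolves to a path inside project_root."""
    root = os.path.realpath(project_root)
    # Joining first means relative paths are interpreted from the project root;
    # realpath then collapses ".." segments and follows symlinks.
    target = os.path.realpath(os.path.join(root, candidate))
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False
```

Resolving the path before comparing it is what defeats inputs such as `../../etc/passwd`, which would pass a naive string-prefix check.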

🚀 Next Steps

Ready to supercharge your AI agents with intelligent codebase navigation?

👤 Getting Started

  1. Install and run your first server - Get up and running in 2 minutes
  2. Set up git hooks - Automate your workflow
  3. Configure for production - Deploy for your team

👨‍💻 For Developers

  1. Explore the API tools - Master all 11 MCP tools
  2. Understand the architecture - Deep dive into the technical design

🤝 Join the Community

  1. Contribute to the project - Help make it even better
  2. Report issues on GitHub - Share feedback and suggestions

🤝 Contributing

We welcome contributions! See our Contributing Guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

📄 License

MIT License - see LICENSE for details.

🙏 Built With


Transform how your AI agents understand code! 🚀

🎯 New User? Get started in 2 minutes
👨‍💻 Developer? Explore the complete API
🔧 Production? Deploy with confidence
