
Code indexing and analysis tools for LLMs using MCP


Code Indexer (MCP) - Complete Edition


🚀 Large Scale Code Analysis Platform with Hybrid Database Architecture

Advanced MCP server with PostgreSQL metadata storage, Elasticsearch search, and comprehensive version tracking

🚀 Unified Edition - v4.0.0

This version introduces the Meta-Registry System (v2.1.0) and the Unified Core Engine, a next-generation search and indexing architecture:

🗂️ Meta-Registry System (v2.1.0)

  • ✅ Centralized Project Tracking - Single SQLite registry for all indexed projects
  • ✅ Automatic Registration - Projects auto-register during indexing
  • ✅ MessagePack Format - 3-5x faster serialization than pickle
  • ✅ Automatic Backups - Periodic backups every 24 hours with 7-day rotation
  • ✅ Corruption Recovery - Automatic recovery from backup or filesystem scan
  • ✅ Orphan Detection - Automatic detection of orphaned index directories
  • ✅ Concurrent Access - WAL mode lets multiple readers run alongside a single writer
  • ✅ Migration Support - Automatic migration from pickle to MessagePack
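
To make the registry idea concrete, here is a minimal sketch of a SQLite-backed project registry with WAL journaling, using only the standard library. The table name, columns, and the `open_registry` / `register_project` helpers are assumptions for illustration and are not taken from the project's code.

```python
import sqlite3
import time
from pathlib import Path


def open_registry(db_path: Path) -> sqlite3.Connection:
    """Open (or create) a registry database with WAL journaling enabled."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers keep working while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per indexed project.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "path TEXT PRIMARY KEY, index_dir TEXT NOT NULL, registered_at REAL NOT NULL)"
    )
    conn.commit()
    return conn


def register_project(conn: sqlite3.Connection, path: str, index_dir: str) -> None:
    """Insert a project, or overwrite the existing row for the same path."""
    conn.execute(
        "INSERT OR REPLACE INTO projects (path, index_dir, registered_at) VALUES (?, ?, ?)",
        (path, index_dir, time.time()),
    )
    conn.commit()
```

WAL mode needs a file-backed database; an in-memory connection would report a different journal mode.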

🌟 Unified Core Engine

  • ✅ Local Vector Search - FAISS-based semantic search with zero cloud dependencies, zero cost, and full privacy
  • ✅ Hybrid Intelligence - Intelligently routes queries between Local Vector (Semantic), Zoekt (Regex/Symbolic), and Elasticsearch
  • ✅ Semantic Search & Reranking - Deep understanding of code intent with automatic reranking for high precision
  • ✅ Web Search Integration - Optional capability to augment code search with web results
  • ✅ Standalone Power - Operates efficiently without external databases (PostgreSQL/Elasticsearch) for many use cases
  • ✅ Dual-Mode Architecture - Supports both "Core Engine" (Vector-first) and "Legacy" (ES/PG) backends simultaneously
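
The query routing described above could look roughly like the sketch below. The heuristics and the backend labels (`symbol`, `regex`, `semantic`) are assumptions chosen for illustration; the actual router's rules are not documented here.

```python
import re

# Characters that usually signal a regular expression rather than prose.
_REGEX_HINTS = re.compile(r"[\\^$*+?\[\](){}|]")
# A single identifier-like token, e.g. processData or auth_user.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def route_query(query: str) -> str:
    """Return a hypothetical backend label: 'symbol', 'regex', or 'semantic'."""
    text = query.strip()
    if _IDENTIFIER.fullmatch(text):
        return "symbol"      # exact-name lookups suit a symbolic index
    if len(text.split()) >= 3:
        return "semantic"    # longer phrases read as natural language
    if _REGEX_HINTS.search(text):
        return "regex"       # short queries with metacharacters
    return "semantic"
```

Checking word count before regex metacharacters keeps a question ending in `?` from being mistaken for a pattern.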

๐Ÿ—๏ธ Architecture Transformation

  • โœ… Unified Router - Centralized dispatch for all search and indexing operations.
  • โœ… PostgreSQL Integration - Robust metadata storage with ACID compliance
  • โœ… Elasticsearch Integration - High-performance full-text search capabilities
  • โœ… Hybrid Database Architecture - Best of both worlds for different data types
  • โœ… Version Tracking System - Complete file change history with diffs
  • โœ… Real-time Indexing - RabbitMQ-based async processing
  • โœ… Cross-Platform Compatibility - Robust path handling for all environments

📊 Migration Achievements

  • ✅ Zero Data Loss - Complete ETL migration from SQLite
  • ✅ Backward Compatibility - Dual-write/read strategy during transition
  • ✅ Performance Gains - 10x faster search with Elasticsearch & Core Engine
  • ✅ Scalability - Enterprise-grade database backends
  • ✅ Version Control - Full file history tracking with PostgreSQL
  • ✅ Real-time Updates - Instant search index updates

🛠️ New Features

  1. Local Vector Store - FAISS + sentence-transformers for zero-cost semantic search (BAAI/bge-small-en-v1.5 default)
  2. Unified Core Engine - Semantic search, Reranking, Web Search
  3. File Version Tracking - Complete change history with diffs
  4. PostgreSQL Metadata - Structured data with relationships
  5. Elasticsearch Search - Advanced full-text search capabilities
  6. Real-time Indexing - RabbitMQ message queue processing
  7. ETL Migration Tools - Seamless data migration utilities
  8. Database Migrations - Alembic-based schema management
  9. Backup & Recovery - Comprehensive backup strategies
  10. Monitoring & Logging - Enterprise-grade observability

📋 Verified Functionality

  • ✅ Unified Search - Semantic + Regex + Web search capabilities
  • ✅ File Operations - Create, modify, delete with full tracking
  • ✅ Search Capabilities - Basic, advanced, and pattern-based search
  • ✅ Version History - Complete file change tracking and retrieval
  • ✅ Database Integration - PostgreSQL + Elasticsearch + Core Vector Store working seamlessly
  • ✅ Migration Tools - ETL scripts for data migration
  • ✅ Backup Systems - Automated backup and recovery procedures

See CHANGELOG.md for migration details and docs/TOOLS_LIST.md for complete tool documentation.


What is Code Indexer?

Code Indexer is a specialized MCP server that provides intelligent code indexing and analysis capabilities. It enables Large Language Models to interact with your code repositories, offering real-time insights and navigation through complex codebases.

This server integrates with the Model Context Protocol (MCP), a standardized way for AI models to interact with external tools and data sources.

Key Features

🚀 Performance & Optimization

  • Incremental Indexing: Only processes changed files, reducing re-indexing time by 90%+
  • Parallel Processing: 4x faster indexing with multi-core support
  • Memory Optimization: 70% memory reduction with lazy loading and intelligent caching
  • High-Performance Search: 10x faster searches with enterprise-grade tools (Zoekt, ripgrep, ugrep)
  • Smart Filtering: Advanced gitignore integration and size-based filtering
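
Incremental indexing depends on knowing which files changed since the last run. A minimal sketch, assuming content hashes are compared against a snapshot saved by the previous run (the `changed_files` helper and the snapshot format are hypothetical, not the server's implementation):

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, previous: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Compare current digests with a previous snapshot.

    Returns the relative paths that are new or modified, plus the new snapshot.
    """
    current: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            current[path.relative_to(root).as_posix()] = file_digest(path)
    changed = [rel for rel, digest in current.items() if previous.get(rel) != digest]
    return changed, current
```

Only the paths in `changed` would need re-indexing; deleted files can be found by checking which keys of `previous` are missing from the new snapshot.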

🔍 Advanced Search & Analysis

  • Local Semantic Search: FAISS-based vector search with zero cloud dependencies (p50 < 20 ms once the model is loaded)
  • Async Search: Non-blocking search operations with real-time progress tracking
  • Multi-Pattern Search: Concurrent search across multiple patterns with scoped results
  • Intelligent Caching: 90% faster repeated searches with LRU cache
  • Fuzzy Search: Native fuzzy matching with safety checks
  • Search Result Pagination: Efficient handling of large result sets
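
The caching bullet can be illustrated with the standard library's `functools.lru_cache`. This is a sketch under stated assumptions: `CORPUS` is a made-up in-memory stand-in for an index, and `cached_search` is a hypothetical function, not part of the server's API.

```python
from functools import lru_cache

# Hypothetical in-memory corpus standing in for an indexed project.
CORPUS = {
    "src/auth.py": "def authenticate_user(name): ...",
    "src/api.py": "API_ENDPOINT = '/v1/users'",
}


@lru_cache(maxsize=256)
def cached_search(pattern: str) -> tuple[str, ...]:
    """Return paths whose content contains the pattern; results are memoized."""
    return tuple(sorted(path for path, text in CORPUS.items() if pattern in text))
```

A cache like this has to be invalidated when the index changes, for example by calling `cached_search.cache_clear()` after a refresh.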

🛠️ Complete Feature Set

  • Progress Tracking: Real-time progress events with cancellation support
  • Performance Monitoring: Comprehensive metrics and Prometheus export
  • Memory Profiling: Real-time memory usage monitoring and limits
  • Configurable Settings: YAML-based configuration with per-project overrides
  • Storage Backends: SQLite and trie-based optimized storage

🔧 Plug and Play Experience

  • MCP Tools: 9 consolidated mega-tools for code analysis and management (replacing 50+ individual tools)
  • Action-Based Routing: Each mega-tool supports multiple actions for streamlined operations
  • Background Cleanup: Automatic cache management and garbage collection
  • Error Recovery: Graceful handling of failures with automatic fallbacks
  • Extensible Architecture: Pluggable storage and search backends

Supported File Types

The server supports multiple programming languages and file extensions including:

  • Python (.py)
  • JavaScript/TypeScript (.js, .ts, .jsx, .tsx, .mjs, .cjs)
  • Frontend Frameworks (.vue, .svelte, .astro)
  • Java (.java)
  • C/C++ (.c, .cpp, .h, .hpp)
  • C# (.cs)
  • Go (.go)
  • Ruby (.rb)
  • PHP (.php)
  • Swift (.swift)
  • Kotlin (.kt)
  • Rust (.rs)
  • Scala (.scala)
  • Shell scripts (.sh, .bash)
  • Zig (.zig)
  • Web files (.html, .css, .scss, .less, .sass, .stylus, .styl)
  • Template engines (.hbs, .handlebars, .ejs, .pug)
  • Database & SQL:
    • SQL files (.sql, .ddl, .dml)
    • Database-specific (.mysql, .postgresql, .psql, .sqlite, .mssql, .oracle, .ora, .db2)
    • Database objects (.proc, .procedure, .func, .function, .view, .trigger, .index)
    • Migration & tools (.migration, .seed, .fixture, .schema, .liquibase, .flyway)
    • NoSQL & modern (.cql, .cypher, .sparql, .gql)
  • Documentation/Config (.md, .mdx, .json, .xml, .yml, .yaml)
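
A file-type filter over a list like this is typically a suffix lookup. The sketch below uses a small subset of the extensions above; the `is_supported` helper is hypothetical.

```python
from pathlib import Path

# A small subset of the extensions listed above, for illustration only.
SUPPORTED_EXTENSIONS = {".py", ".ts", ".tsx", ".go", ".rs", ".sql", ".md"}


def is_supported(path: str) -> bool:
    """True when the file's final suffix is in the supported set (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```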

📦 Installation & Setup

🚀 Quick Start

For immediate use with AI applications:

uvx git+https://github.com/scooter-lacroix/code-indexer.git

For package installation:

uv add code-index-mcp
# or
pip install code-index-mcp

For local vector search support (recommended):

# Install FAISS and sentence-transformers dependencies
uv pip install 'faiss-cpu>=1.7.4' 'sentence-transformers>=2.2.0' 'numpy>=1.24.0'

๐Ÿ—๏ธ Comprehensive Setup (PostgreSQL + Elasticsearch)

1. Start Required Services:

# Stop any locally installed Elasticsearch first so it does not hold port 9200
sudo systemctl stop elasticsearch 2>/dev/null || true

# Using Podman Compose (recommended for AMD/CPU-only systems)
podman compose up -d

# Or using Docker Compose (alternative)
docker compose up -d

This will start:

  • PostgreSQL on port 5432
  • Elasticsearch on port 9200
  • RabbitMQ on ports 5672 (AMQP) and 15672 (Management UI)

2. Environment Configuration:

# Set environment variables
export DAL_BACKEND_TYPE=postgresql_elasticsearch_only

# Local Vector Store Configuration (optional, has defaults)
export LOCAL_VECTOR_MODEL=BAAI/bge-small-en-v1.5  # Options: BAAI/bge-small-en-v1.5, microsoft/codebert-base, all-MiniLM-L6-v2
export FAISS_INDEX_THRESHOLD=100000  # Switch to IVFFlat after this many vectors
export FAISS_INDEX_PATH=./faiss_index  # Where to store the index

# PostgreSQL Configuration
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5432
export POSTGRES_USER=codeindex
export POSTGRES_PASSWORD=your-secure-password
export POSTGRES_DB=code_index_db

# Elasticsearch Configuration
export ELASTICSEARCH_HOSTS=http://localhost:9200
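
For reference, this is one way a Python process might read the PostgreSQL variables above. It is a sketch: the `PostgresSettings` class and `load_postgres_settings` function are hypothetical, and the fallback defaults simply mirror the example values shown here rather than the server's real defaults.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class PostgresSettings:
    host: str
    port: int
    user: str
    password: str
    database: str

    @property
    def dsn(self) -> str:
        """Connection URL with the password percent-encoded."""
        secret = quote(self.password, safe="")
        return f"postgresql://{self.user}:{secret}@{self.host}:{self.port}/{self.database}"


def load_postgres_settings(env=None) -> PostgresSettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return PostgresSettings(
        host=env.get("POSTGRES_HOST", "localhost"),
        port=int(env.get("POSTGRES_PORT", "5432")),
        user=env.get("POSTGRES_USER", "codeindex"),
        password=env.get("POSTGRES_PASSWORD", ""),
        database=env.get("POSTGRES_DB", "code_index_db"),
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise without touching the real environment.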

3. Migration (if upgrading from SQLite):

# Backup existing data
python backup_script.py

# Run migration
python src/scripts/etl_script.py --mode full

# Verify migration
python src/scripts/etl_script.py --mode verify

📚 Complete Installation Guide

For detailed installation instructions, database setup, and troubleshooting, see: 📖 Installation Guide

For Elasticsearch security configuration (authentication, TLS/SSL, API keys), see: 🔒 Elasticsearch Security Guide

For local vector store architecture, configuration, and supported models, see: 🔍 Local Vector Store Guide

For Meta-Registry architecture and implementation details, see: 🗂️ Architecture Documentation

For migrating from v2.0 to v2.1.0 with the Meta-Registry system, see: 🔄 Migration Guide v2.1

For troubleshooting registry and migration issues, see: 🔧 Registry Troubleshooting Guide

🤖 Agent Integration

The Unified Core Engine is designed for deep integration with AI agents.

🛠️ CLI & Skills

We provide a dedicated CLI tool code-search and a Skill definition for agents to autonomously navigate your codebase.

1. CLI Tool:

# Search code (Semantic + Symbolic)
code-search "How does the indexing work?"

# Ask questions (RAG)
code-search --answer "Explain the core engine architecture"

2. Agent Configuration: Run the installation helper to get configuration snippets for your agent:

python src/scripts/install_agent.py [claude-code|openai|vscode|all]

3. Skill Integration: For agents that support Skills or System Prompts, refer to plugins/code_indexer/skills/code-search/SKILL.md for the optimal prompt to ensure the agent uses the Unified Engine correctly.

🔌 MCP Integration

The Code Index MCP server supports multiple integration methods:

Method 1: Direct Git Integration (Recommended)

For LM Studio, Claude Desktop, VS Code, and other MCP-compatible applications:

{
  "mcpServers": {
    "code-index": {
      "command": "uvx",
      "args": ["git+https://github.com/scooter-lacroix/code-indexer.git"],
      "env": {},
      "start_on_launch": true
    }
  }
}

Method 2: Package Installation

For package manager installed versions:

{
  "mcpServers": {
    "code-index": {
      "command": "code-index-mcp",
      "args": [],
      "env": {},
      "start_on_launch": true
    }
  }
}

Method 3: Local Development

For locally installed versions:

{
  "mcpServers": {
    "code-index": {
      "command": "uv",
      "args": ["run", "code_index_mcp"],
      "env": {},
      "start_on_launch": true
    }
  }
}

Method 4: HTTP Server Mode

For web-based integrations:

# Start HTTP server
python -m code_index.server --port 8765

# Or using uv
uv run code_index.server --port 8765

Then configure your application:

{
  "mcpServers": {
    "code-index": {
      "transport": "http",
      "url": "http://localhost:8765/mcp",
      "headers": {
        "Authorization": "Bearer your-token-here"
      }
    }
  }
}

Application-Specific Configurations

VS Code/Cursor/Windsurf (using MCP extension):

{
  "mcp.servers": {
    "code-index": {
      "command": "code-index-mcp",
      "args": [],
      "env": {},
      "transport": "stdio"
    }
  }
}

Jan AI:

{
  "mcp_servers": {
    "code-index": {
      "command": "code-index-mcp",
      "args": [],
      "env": {}
    }
  }
}

OpenHands:

{
  "mcp": {
    "servers": {
      "code-index": {
        "command": "code-index-mcp",
        "args": [],
        "env": {}
      }
    }
  }
}

🛠️ Usage

Command Line Interface

# Start stdio server (MCP integration)
code-index-mcp

# Start HTTP server (web integration)
code-index-mcp --http --port 8765

# Local development (runs the server)
uv run code_index_mcp server

Local Development with Docker Compose

For local development using PostgreSQL and Elasticsearch via Docker Compose:

# Start PostgreSQL and Elasticsearch containers in detached mode
python run.py start-dev-dbs

# Stop PostgreSQL and Elasticsearch containers and remove their volumes
python run.py stop-dev-dbs

# Restart PostgreSQL and Elasticsearch containers
python run.py restart-dev-dbs

# Run the Code Index MCP server (connects to Dockerized databases if configured)
python run.py server

Debug with MCP Inspector

# For git installation
npx @modelcontextprotocol/inspector uvx git+https://github.com/scooter-lacroix/code-indexer.git

# For package installation
npx @modelcontextprotocol/inspector code-index-mcp

# For local development
npx @modelcontextprotocol/inspector uv run code_index_mcp

🛠️ Available Tools

The Code Index MCP server provides 9 consolidated mega-tools for comprehensive code analysis and management. These mega-tools use action-based routing to streamline operations while preserving all functionality from the original 50+ individual tools.

🌟 Mega-Tools Architecture

Each mega-tool accepts an action, operation, type, or mode parameter that routes to the specific functionality you need:

| Mega-Tool | Actions/Operations/Modes | Consolidates |
|---|---|---|
| manage_project | set_path, refresh, reindex, clear, reset | 5 tools |
| search_content | search, find, rank | 3 tools |
| modify_file | write, diff, insert, replace | 4 tools |
| manage_files | delete, rename, revert, history | 4 tools |
| get_diagnostics | memory, index, backend, performance, operations, settings, ignore, filtering, ranking | 9 tools |
| manage_memory | cleanup, configure, export | 3 tools |
| manage_operations | list, cancel, cleanup | 4 tools |
| read_file | smart, chunks, detect_errors, metadata | 4 tools |
| manage_temp | create, check | 2 tools |
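
Action-based routing of this kind is often implemented as a lookup table from action name to handler. The sketch below shows the pattern for two `manage_project` actions; the handler functions and their return strings are placeholders, and the server's internals may be organized differently.

```python
from typing import Callable


def _set_path(path: str) -> str:
    return f"project path set to {path}"


def _refresh() -> str:
    return "index refreshed"


# Hypothetical handler table; only two of the five actions are sketched.
_PROJECT_ACTIONS: dict[str, Callable[..., str]] = {
    "set_path": _set_path,
    "refresh": _refresh,
}


def manage_project(action: str, **kwargs) -> str:
    """Route an action name to its handler, rejecting unknown actions."""
    try:
        handler = _PROJECT_ACTIONS[action]
    except KeyError:
        valid = ", ".join(sorted(_PROJECT_ACTIONS))
        raise ValueError(f"unknown action {action!r}; expected one of: {valid}") from None
    return handler(**kwargs)
```

Listing the valid actions in the error message gives a calling model something to correct itself with.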

📋 Detailed Mega-Tools Reference

1. manage_project - Project Lifecycle Management

# Set project path
manage_project(action="set_path", path="/path/to/project")

# Refresh with incremental indexing
manage_project(action="refresh")

# Force complete reindex
manage_project(action="reindex", clear_cache=True)

# Clear all settings
manage_project(action="clear")

# Reset server state
manage_project(action="reset")

2. search_content - Content Search and Discovery

# Advanced semantic search
search_content(action="search", pattern="function foo", fuzzy=True)

# Find files by glob pattern
search_content(action="find", pattern="*.py")

# Rank search results by relevance
search_content(action="rank", results=[...], query="authentication")

3. modify_file - File Content Modifications

# Write complete file
modify_file(operation="write", path="src/main.py", content="...", line_count=10)

# Apply diff with regex
modify_file(operation="diff", path="config.json", search="old", replace="new")

# Insert content at specific line
modify_file(operation="insert", path="README.md", line=10, content="new section")

# Search and replace
modify_file(operation="replace", path="api.py", search="todo", replace="TODO")

4. manage_files - File System Operations

# Delete a file
manage_files(action="delete", file_path="old_file.py")

# Rename/move a file
manage_files(action="rename", file_path="src/old.py", new_file_path="src/new.py")

# Revert to specific version
manage_files(action="revert", file_path="config.json", version_id="v1.2.3")

# Get file history
manage_files(action="history", file_path="src/main.py")

5. get_diagnostics - System Diagnostics and Metrics

# Check memory usage
get_diagnostics(type="memory")

# Get index statistics
get_diagnostics(type="index", force_refresh=True)

# Check backend health
get_diagnostics(type="backend")

# Get performance metrics
get_diagnostics(type="performance")

# List active operations
get_diagnostics(type="operations")

# View ignore patterns
get_diagnostics(type="ignore")

# Check ranking configuration
get_diagnostics(type="ranking")

6. manage_memory - Memory Management

# Trigger memory cleanup
manage_memory(action="cleanup")

# Configure memory limits
manage_memory(action="configure", soft_limit_mb=1024, hard_limit_mb=2048)

# Export memory profile
manage_memory(action="export", file_path="/tmp/memory.json")

7. manage_operations - Operation Tracking and Control

# List all active operations
manage_operations(action="list")

# Cancel specific operation
manage_operations(action="cancel", operation_id="op-123")

# Cleanup old completed operations
manage_operations(action="cleanup", max_age_hours=2.0)

8. read_file - File Reading Strategies

# Full smart analysis
read_file(mode="smart", file_path="src/main.py", include_content=True)

# Read large file in chunks
read_file(mode="chunks", file_path="large.json", chunk_size=8388608)

# Detect errors in a file
read_file(mode="detect_errors", file_path="config.py")

# Get file metadata only
read_file(mode="metadata", file_path="README.md")
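
The `chunk_size=8388608` value above is 8 MiB. As an illustration of what chunked reading means, here is a generator that yields a file in fixed-size pieces; `read_in_chunks` is a hypothetical helper, not the tool's implementation.

```python
from pathlib import Path
from typing import Iterator


def read_in_chunks(file_path: Path, chunk_size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with file_path.open("rb") as handle:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := handle.read(chunk_size):
            yield chunk
```

Memory use stays bounded by `chunk_size` regardless of how large the file is.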

9. manage_temp - Temporary Directory Management

# Ensure temp directory exists
manage_temp(action="create")

# Check temp directory status
manage_temp(action="check")

📚 Migration from Old Tools

If you were using the previous 50+ individual tools, see the Migration Guide for a complete mapping of old tool names to the new mega-tools.

Quick Migration Examples:

  • set_project_path() -> manage_project(action="set_path", path=...)
  • search_code_advanced() -> search_content(action="search", pattern=...)
  • write_to_file() -> modify_file(operation="write", path=..., content=...)
  • get_file_history() -> manage_files(action="history", file_path=...)
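
If existing scripts still use the old names, a small translation table can bridge the gap. The sketch below covers only the four mappings listed above; `LEGACY_TOOL_MAP` and `translate_call` are hypothetical names for illustration.

```python
# Old tool name -> (mega-tool, routing parameter, value), taken from the list above.
LEGACY_TOOL_MAP = {
    "set_project_path": ("manage_project", "action", "set_path"),
    "search_code_advanced": ("search_content", "action", "search"),
    "write_to_file": ("modify_file", "operation", "write"),
    "get_file_history": ("manage_files", "action", "history"),
}


def translate_call(old_name: str, **kwargs) -> tuple[str, dict]:
    """Translate a legacy call into (mega_tool_name, arguments)."""
    tool, param, value = LEGACY_TOOL_MAP[old_name]
    return tool, {param: value, **kwargs}
```

Argument names may also have changed between versions, so the keyword arguments are passed through untouched here and would need checking against the Migration Guide.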

📖 Complete Documentation: See docs/TOOLS_LIST.md for detailed tool reference with examples and system prompt templates.

Common Workflows and Examples

Here's a typical workflow for using Code Index MCP with an AI assistant like Claude.

1. Set Project Path & Initial Indexing

This is the first and most important step. When you set the project path, the server automatically creates a file index for the first time or loads a previously cached one.

Example Prompt:

Please set the project path to C:\Users\username\projects\my-react-app

Mega-tool used:

manage_project(action="set_path", path="C:\\Users\\username\\projects\\my-react-app")

2. Refresh the Index (When Needed)

If you make significant changes to your project files after the initial setup, you can manually refresh the index to ensure all tools are working with the latest information.

Example Prompt:

I've just added a few new components, please refresh the project index.

Mega-tool used:

manage_project(action="refresh")

3. Explore the Project Structure

Once the index is ready, you can find files using patterns (globs) to understand the codebase and locate relevant files.

Example Prompt:

Find all TypeScript component files in the 'src/components' directory.

Mega-tool used:

search_content(action="find", pattern="src/components/**/*.tsx")
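
For readers unfamiliar with `**` globs, the pattern above matches `.tsx` files at any depth under `src/components`. The sketch below reproduces that behavior with `pathlib`; `find_files` is a hypothetical helper, and the server's matcher may differ in details such as ignore rules.

```python
from pathlib import Path


def find_files(root: Path, pattern: str) -> list[str]:
    """Return project-relative POSIX paths matching a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )
```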

4. Analyze a Specific File

Before diving into the full content of a file, you can get a quick summary of its structure, including functions, classes, and imports.

Example Prompt:

Can you give me a summary of the 'src/api/userService.ts' file?

Mega-tool used:

read_file(mode="smart", file_path="src/api/userService.ts", include_content=True)

5. Search for Code

With an up-to-date index, you can search for code snippets, function names, or any text pattern to find where specific logic is implemented.

Example: Simple Search

Search for all occurrences of the "processData" function.

search_content(action="search", pattern="function processData")

Example: Search with Fuzzy Matching

I'm looking for a function related to user authentication, it might be named 'authUser', 'authenticateUser', or something similar. Can you do a fuzzy search for 'authUser'?

search_content(action="search", pattern="authUser", fuzzy=True)

Example: Search within Specific Files

Search for the string "API_ENDPOINT" only in Python files.

search_content(action="search", pattern="API_ENDPOINT", file_pattern="*.py")
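
To show what fuzzy matching on symbol names can look like, here is a sketch built on the standard library's `difflib`. It is not the server's matcher (which the feature list associates with external search tools); `fuzzy_symbols` and the 0.5 cutoff are assumptions.

```python
import difflib


def fuzzy_symbols(query: str, symbols: list[str], cutoff: float = 0.5) -> list[str]:
    """Return symbol names similar to the query, best match first."""
    # Compare in lowercase but report the original spelling.
    lowered = {name.lower(): name for name in symbols}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[m] for m in matches]
```

With a candidate list of `authenticateUser`, `authUser`, and `parseConfig`, a query for `authUser` keeps the two authentication names and drops `parseConfig`.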

6. Modify Files

When you need to make changes to your code, use the modify_file mega-tool with different operations.

Example: Write a New File

Create a new file 'src/utils/helpers.ts' with utility functions.

modify_file(
    operation="write",
    path="src/utils/helpers.ts",
    content="export function formatDate(date: Date): string { ... }",
    line_count=5
)

Example: Search and Replace

Replace all occurrences of 'TODO' with 'FIXME' in the api.ts file.

modify_file(
    operation="replace",
    path="src/api.ts",
    search="TODO",
    replace="FIXME"
)

7. Get System Diagnostics

Monitor the health and performance of your code indexing system.

Example: Check Memory Usage

How much memory is the indexer currently using?

get_diagnostics(type="memory")

Example: View Index Statistics

Show me the current index statistics.

get_diagnostics(type="index", force_refresh=True)

8. Manage File Versions

Track and revert changes to files using the version history system.

Example: Get File History

Show me the change history for config.json.

manage_files(action="history", file_path="config.json")

Example: Revert to Previous Version

Revert main.py to the version from yesterday.

manage_files(
    action="revert",
    file_path="main.py",
    timestamp="2024-01-15T10:30:00"
)
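
Conceptually, history and revert amount to keeping earlier versions and diffing between them. The sketch below keeps versions in memory and uses `difflib` for diffs; the `FileHistory` class is hypothetical and much simpler than the PostgreSQL-backed tracking described earlier.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    """Keeps every saved version of one file in memory (illustrative only)."""

    path: str
    versions: list[str] = field(default_factory=list)

    def save(self, content: str) -> int:
        """Store a new version and return its zero-based version number."""
        self.versions.append(content)
        return len(self.versions) - 1

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two stored versions."""
        lines = difflib.unified_diff(
            self.versions[old].splitlines(keepends=True),
            self.versions[new].splitlines(keepends=True),
            fromfile=f"{self.path}@{old}",
            tofile=f"{self.path}@{new}",
        )
        return "".join(lines)

    def revert(self, version: int) -> str:
        """Re-save an older version as the newest one and return its content."""
        content = self.versions[version]
        self.save(content)
        return content
```

Recording a revert as a new version, rather than deleting later ones, keeps the history append-only so the revert itself can be undone.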

Development

Building from Source

  1. Clone the repository:
git clone https://github.com/scooter-lacroix/code-indexer.git
cd code-indexer
  2. Install dependencies:
uv sync
  3. Run the server locally:
uv run code_index_mcp

Debugging

You can use the MCP inspector to debug the server:

npx @modelcontextprotocol/inspector uvx code-index-mcp

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
