LeIndex: AI-powered code search and indexing system with MCP integration - Performance Optimization Release
LeIndex
AI-Powered Multi-Project Code Search With Advanced Memory Management
Lightning-fast semantic code search with global index, cross-project search, and intelligent memory management. Find code by meaning, not just by matching text.
The LeIndex experience - powerful, fast, and beautiful
✨ What Makes LeIndex Special?
LeIndex isn't just another code search tool. It's your intelligent code companion that understands what you're looking for, not just where it might be typed.
Imagine searching for "authentication flow" and finding not just files containing those words, but the actual authentication logic, login handlers, session management, and security patterns - even if they're named completely differently. That's the magic of semantic search!
Quick Start (You'll Be Searching in Under 2 Minutes. It's Easier Than Making Coffee!)
One-Click Installation
The easiest way to get started:
Requirements
- Python 3.10 or higher
- 4GB RAM minimum (8GB+ for large codebases)
- About 1GB disk space
Linux/Unix:
curl -sSL https://raw.githubusercontent.com/scooter-lacroix/LeIndex/master/install.sh | bash
macOS:
curl -sSL https://raw.githubusercontent.com/scooter-lacroix/LeIndex/master/install_macos.sh | bash
Windows:
irm https://raw.githubusercontent.com/scooter-lacroix/LeIndex/master/install.ps1 | iex
That's it. The installer will:
- Install LeIndex MCP server
- Detect your AI tools (Claude Code, Cursor, etc.)
- Configure integrations automatically
- Install optional skills for enhanced workflows
Manual installation? See below.
# Install LeIndex - seriously, that's it
pip install leindex
# Index your codebase (no Docker, no databases, no headache)
leindex init /path/to/your/project
leindex index /path/to/your/project
# Search like a wizard
leindex-search "authentication logic"
# Or use it via MCP in Claude, Cursor, or your favorite AI assistant
# LeIndex MCP server does the heavy lifting automatically!
OR
PIP Install
pip install leindex
That's literally it. No Docker. No databases. No configuration files (unless you want them). Just works. ✨
Verify It's Alive
leindex --version
# Output: LeIndex 2.0.2 - Ready to search!
Install from Source (For the Adventurous)
git clone https://github.com/scooter-lacroix/leindex.git
cd leindex
pip install -e .
Boom! You're now searching your codebase at the speed of thought.
Why Developers Love LeIndex
Zero Dependencies, Zero Drama
- No Docker - Your laptop will thank you
- No PostgreSQL - No database setup nightmares
- No Elasticsearch - No Java memory leaks
- No RabbitMQ - No message queue complexity
- Just pure Python magic - pip install and you're done
⚡ Blazing Fast Performance
- LEANN vector search - Find similar code in milliseconds
- Tantivy full-text search - Rust-powered Lucene goodness
- Hybrid scoring - Best of both worlds: semantic + lexical
- Handles 100K+ files - Scale from side projects to monorepos
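To make "hybrid scoring" concrete, here is a minimal sketch of one common way to blend a semantic score with a lexical one: normalize each engine's scores to [0, 1], then take a weighted sum. This is an illustration only, not LeIndex's actual ranking code; the function names, the `alpha` weight, and the sample scores are all assumptions.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so two engines become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(semantic: Dict[str, float],
                  lexical: Dict[str, float],
                  alpha: float = 0.5) -> Dict[str, float]:
    """Blend normalized scores: alpha weights semantic, 1 - alpha weights lexical."""
    sem = min_max_normalize(semantic)
    lex = min_max_normalize(lexical)
    docs = set(sem) | set(lex)
    # A document missing from one engine contributes 0 from that engine.
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0) for d in docs}


# Hypothetical raw scores: cosine similarity vs. a BM25-style score.
semantic = {"auth/login.py": 0.91, "auth/session.py": 0.80, "utils/crypto.py": 0.40}
lexical = {"auth/login.py": 9.0, "docs/auth.md": 12.0, "utils/crypto.py": 3.0}

ranked = sorted(hybrid_scores(semantic, lexical).items(),
                key=lambda kv: kv[1], reverse=True)
for path, score in ranked:
    print(f"{score:.3f}  {path}")
```

A file that scores well in both engines ends up ahead of one that only matches the text, which is the behavior the "best of both worlds" bullet describes.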
Semantic Understanding
- CodeRankEmbed embeddings - Understands code meaning and intent
- Finds by concept - Search "error handling" and find try/except, error types, logging, and recovery patterns
- Smart symbol search - Jump to definitions and references instantly
- Regex power - For when you need precise pattern matching
Privacy-First & Self-Hosted
- Your code stays yours - Nothing leaves your machine
- Works offline - No internet required after installation
- No telemetry - We don't track your searches
- Enterprise-ready - Deploy on your own infrastructure
MCP-Native Design
- First-class MCP support - Built from the ground up for Model Context Protocol
- AI assistant ready - Works seamlessly with Claude, Cursor, Windsurf, and more
- Token efficient - Saves ~200 tokens per session (no hook overhead!)
- Optional skill integration - For complex multi-project workflows
The LeIndex Magic Show
Search That Reads Your Mind
# Search semantically
results = indexer.search("authentication flow")
# Get results that actually make sense:
# - Login handlers (even if named 'sign_in')
# - Session management (even if called 'user_state')
# - JWT verification (even if labeled 'token_check')
# - Password hashing (even if in 'crypto_utils')
The Secret Sauce (Technology Stack)
| Component | Technology | Superpower |
|---|---|---|
| Vector Search | LEANN | Storage-efficient semantic similarity |
| Code Brain | CodeRankEmbed | Understands code meaning & intent |
| Text Search | Tantivy | Lucene-inspired engine written in Rust (fast!) |
| Metadata | SQLite | Reliable ACID-compliant storage |
| Analytics | DuckDB | In-memory analytical queries |
| Async Engine | asyncio | Built-in Python async (no RabbitMQ needed!) |
Architecture That Makes Sense
┌─────────────────────────────────────────────────────────┐
│                 The LeIndex Experience                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐   ┌──────────────┐   ┌───────────┐    │
│  │ MCP Server   │──▶│ Core Engine  │──▶│ LEANN     │    │
│  │ (Your AI     │   │ (The Brains) │   │ (Vectors) │    │
│  │  Assistant)  │   │              │   │           │    │
│  └──────────────┘   └──────────────┘   └───────────┘    │
│          │                  │                │          │
│          │                  ▼                ▼          │
│          │          ┌──────────────┐   ┌───────────┐    │
│          │          │ Query Router │──▶│ Tantivy   │    │
│          │          │ (Traffic     │   │(Full-Text)│    │
│          │          │  Cop)        │   │           │    │
│          │          └──────────────┘   └───────────┘    │
│          │                  │                │          │
│          ▼                  ▼                ▼          │
│  ┌──────────────┐   ┌──────────────┐   ┌───────────┐    │
│  │ CLI Tools    │   │ Data Access  │   │ SQLite    │    │
│  │ (Power User) │   │ Layer        │   │ (Metadata)│    │
│  └──────────────┘   └──────────────┘   └───────────┘    │
│                             │                           │
│                             ▼                           │
│                     ┌──────────────┐                    │
│                     │ DuckDB       │                    │
│                     │ (Analytics)  │                    │
│                     └──────────────┘                    │
└─────────────────────────────────────────────────────────┘
Everything runs locally. No cloud. No dependencies. Just speed.
Usage: Let's Search Some Code!
MCP Integration (The Cool Way)
LeIndex comes with a built-in MCP server that makes your AI assistant code-aware:
Available MCP Superpowers:
- manage_project - Set up and manage indexing for your projects
- search_content - Search code with semantic + full-text powers
- get_diagnostics - Get project stats and health checks
Configuration in your MCP client (Claude, Cursor, etc.):
{
  "mcpServers": {
    "leindex": {
      "command": "leindex",
      "args": ["mcp"],
      "env": {}
    }
  }
}
Start the MCP server:
leindex mcp
Now your AI assistant can search your codebase like a pro!
When to use what:
| Approach | Best For |
|---|---|
| MCP Tools | Single-project searches, simple queries, direct API access |
| Skills | Multi-project operations, complex workflows, automated pipelines |
Python API (For the Coders)
from leindex import LeIndex
# Initialize and index
indexer = LeIndex("~/my-awesome-project")
indexer.index()
# Search semantically - it understands meaning!
results = indexer.search("authentication flow")
# Filter like a boss
results = indexer.search(
    query="database connection",
    file_patterns=["*.py"],          # Only Python files
    exclude_patterns=["test_*.py"]   # But not tests
)
# Access the good stuff
for result in results:
    print(f"{result.file}:{result.line}")
    print(result.content)
    print(f"Relevance Score: {result.score}")
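For readers wondering how glob filters like file_patterns and exclude_patterns typically behave, here is a small standalone sketch built on Python's standard fnmatch module. It is not taken from LeIndex's source: `filter_paths` is a hypothetical helper, and matching against the file name only (rather than the full path) is an assumption.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence


def filter_paths(paths: Iterable[str],
                 file_patterns: Sequence[str] = ("*",),
                 exclude_patterns: Sequence[str] = ()) -> List[str]:
    """Keep paths whose file name matches an include glob and no exclude glob."""
    kept = []
    for path in paths:
        name = PurePosixPath(path).name  # compare globs against the file name only
        included = any(fnmatch(name, pat) for pat in file_patterns)
        excluded = any(fnmatch(name, pat) for pat in exclude_patterns)
        if included and not excluded:
            kept.append(path)
    return kept


candidates = ["db/connection.py", "tests/test_connection.py", "db/schema.sql"]
selected = filter_paths(candidates, ["*.py"], ["test_*.py"])
print(selected)
```

Excludes win over includes here, so a test file is dropped even though it also matches "*.py".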
CLI Tools (For the Terminal Lovers)
# Initialize indexing for a project
leindex init /path/to/project
# Run the indexing (it's fast, we promise)
leindex index /path/to/project
# Search from terminal
leindex-search "authentication logic"
# Search with filters
# Quote the glob so your shell doesn't expand it before LeIndex sees it
leindex-search "database" --ext py --exclude "test_*"
⚙️ Configuration (Optional but Powerful)
LeIndex works great out of the box, but you can tweak it to your heart's content with config.yaml:
# Data Access Layer (The Engine Room)
dal_settings:
  backend_type: "sqlite_duckdb"               # The good stuff
  db_path: "./data/leindex.db"                # Where metadata lives
  duckdb_db_path: "./data/leindex.db.duckdb"  # Analytics heaven

# Vector Store (Semantic Search Magic)
vector_store:
  backend_type: "leann"                       # Storage-efficient vectors
  index_path: "./leann_index"                 # Where vectors chill
  embedding_model: "nomic-ai/CodeRankEmbed"   # Code brain
  embedding_dim: 768                          # Vector dimensions

# Async Processing (Speed Demon)
async_processing:
  enabled: true
  worker_count: 4                             # Parallel indexing
  max_queue_size: 10000                       # Queue buffer

# File Filtering (Keep It Lean)
file_filtering:
  max_file_size: 1073741824                   # 1GB per file
  type_specific_limits:
    ".py": 1073741824                         # Python files up to 1GB
    ".json": 104857600                        # JSON files up to 100MB

# Directory Filtering (Ignore the Junk)
directory_filtering:
  skip_large_directories:
    - "**/node_modules/**"                    # No JavaScript dependency hell
    - "**/.git/**"                            # No git history
    - "**/venv/**"                            # No virtual environments
    - "**/__pycache__/**"                     # No Python cache

# Performance Optimization (NEW in v1.1.0)
performance:
  # File stat caching
  file_stat_cache:
    enabled: true
    max_size: 10000                           # Maximum cache entries
    ttl_seconds: 300                          # Cache TTL (5 minutes)
  # Parallel processing
  parallel_scanner:
    max_workers: 4                            # Concurrent directory scans
    timeout_seconds: 300                      # Scan timeout
  parallel_processor:
    max_workers: 4                            # Content extraction workers
    batch_size: 100                           # Files per batch
  # Embedding optimization
  embeddings:
    batch_size: 32                            # Files per embedding batch
    enable_gpu: true                          # Use GPU if available
    device: "auto"                            # auto, cuda, mps, rocm, cpu
    fp16: true                                # Use half-precision on GPU
  # Pattern matching
  pattern_trie:
    enabled: true
    cache_size: 1000                          # Pattern cache size
Need more speed? Check out the Performance Optimization Guide for advanced tuning!
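The file_stat_cache settings (max_size and ttl_seconds) describe a bounded cache with expiry. A minimal sketch of that idea, assuming least-recently-used eviction, might look like the following. `StatCache` is a hypothetical class written for this README, not LeIndex's FileStatCache, and the eviction policy is an assumption.

```python
import os
import time
from collections import OrderedDict
from typing import Callable, Tuple


class StatCache:
    """Cache os.stat() results, bounded by entry count and by age."""

    def __init__(self, max_size: int = 10000, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[str, Tuple[float, os.stat_result]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def stat(self, path: str) -> os.stat_result:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self._entries.move_to_end(path)    # mark as recently used
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = os.stat(path)                 # the syscall we avoid repeating
        self._entries[path] = (now, result)
        self._entries.move_to_end(path)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


cache = StatCache(max_size=2, ttl_seconds=300)
first = cache.stat(os.getcwd())
second = cache.stat(os.getcwd())               # served from the cache
print(cache.hits, cache.misses)
```

A longer TTL saves more syscalls but delays noticing changed files, which is the trade-off ttl_seconds controls.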
Performance Stats (We're Not Slow)
v1.1.0 Performance Optimization Release
| Metric | Before (v1.0.8) | After (v1.1.0) | Improvement |
|---|---|---|---|
| Indexing Speed | ~2K files/min | ~10K files/min | 5x faster |
| File Scanning | Sequential os.walk() | ParallelScanner | 3-5x faster |
| Pattern Matching | Naive O(n*m) | PatternTrie O(m) | 10-100x faster |
| File Stats | Uncached syscalls | FileStatCache | 5-10x faster |
| Embeddings (CPU) | Single-file | Batching (32) | 3-5x faster |
| Embeddings (GPU) | CPU-only | GPU-accelerated | 5-10x faster |
| Memory Efficiency | High overhead | Optimized batching | 30% reduction |
| Search Latency (p50) | ~50ms | ~50ms | Maintained |
| Search Latency (p99) | ~200ms | ~180ms | 10% faster |
| Max Scalability | 100K+ files | 100K+ files | Maintained |
| Memory Usage | <4GB | <3GB | 25% reduction |
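The pattern-matching row is easier to picture with code. Below is a minimal sketch of a trie keyed by path components: a lookup walks the path once, so its cost depends on the path length m and not on how many patterns are stored. This is an illustration of the general technique, not LeIndex's PatternTrie; the `PrefixTrie` name is hypothetical, and it only handles prefixes anchored at the project root (no `**` wildcards).

```python
from typing import Dict, Iterable


class PrefixTrie:
    """Trie over path components; an empty-string key marks the end of a prefix."""

    def __init__(self, prefixes: Iterable[str] = ()) -> None:
        self._root: Dict[str, dict] = {}
        for prefix in prefixes:
            self.add(prefix)

    def add(self, prefix: str) -> None:
        node = self._root
        for part in prefix.strip("/").split("/"):
            node = node.setdefault(part, {})
        node[""] = {}  # terminal marker

    def matches(self, path: str) -> bool:
        """True if a stored prefix covers `path`; one step per path component."""
        node = self._root
        for part in path.strip("/").split("/"):
            if part not in node:
                return False
            node = node[part]
            if "" in node:
                return True
        return False


ignored = PrefixTrie(["node_modules", "build/tmp", ".git"])
print(ignored.matches("node_modules/react/index.js"))
print(ignored.matches("build/src/main.c"))
```

The naive alternative tests every pattern against every path, which is where the O(n*m) figure comes from.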
Comparison with Typical Code Search
| Metric | LeIndex v1.1.0 | Typical Code Search | Difference |
|---|---|---|---|
| Indexing Speed | ~10K files/min | ~500 files/min | 20x faster |
| Search Latency (p50) | ~50ms | ~500ms | 10x faster |
| Search Latency (p99) | ~180ms | ~5s | 28x faster |
| Max Scalability | 100K+ files | 10K files | 10x more |
| Memory Usage | <3GB | >8GB | 2.7x less |
| Setup Time | 2 minutes | 2+ hours | 60x faster |
Hardware Requirements
Minimum (CPU-only):
- CPU: 4 cores (any modern processor)
- RAM: 4GB
- Storage: 1GB disk space
- Expected: ~2K files/min indexing speed
Recommended (with GPU):
- CPU: 8+ cores (Intel/AMD)
- RAM: 8-16GB
- GPU: NVIDIA RTX, Apple M1/M2/M3, or AMD RX (optional)
- Storage: SSD preferred
- Expected: ~10K files/min indexing speed
Large Repositories (100K+ files):
- CPU: 16+ cores
- RAM: 16-32GB
- GPU: 8GB+ VRAM (RTX 3060 or better)
- Storage: NVMe SSD
- Expected: ~20K+ files/min indexing speed
GPU Acceleration
Supported Platforms:
- NVIDIA CUDA: GTX 10xx, RTX 20xx/30xx/40xx series
- Apple MPS: M1, M2, M3 (Pro/Max/Ultra)
- AMD ROCm: RX 6000/7000 series
- CPU Fallback: Any modern CPU
Performance with GPU:
- Embeddings: 5-10x faster than CPU
- Indexing: 2-3x overall speedup
- Energy efficiency: 50% less power per operation
Enable GPU in config.yaml:
performance:
  embeddings:
    enable_gpu: true
    device: "auto"  # Auto-detects CUDA/MPS/ROCm
For detailed performance tuning, see Performance Optimization Guide
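If you're curious what device: "auto" could resolve to, here is a minimal sketch of auto-detection. It assumes the embedding model runs on PyTorch, which this README does not state, and `resolve_device` is a hypothetical helper rather than a LeIndex function. It falls back to "cpu" when PyTorch isn't installed.

```python
def resolve_device(preference: str = "auto") -> str:
    """Map a config value such as "auto" to a concrete PyTorch device string."""
    if preference != "auto":
        return preference              # an explicit choice is passed through
    try:
        import torch
    except ImportError:
        return "cpu"                   # no PyTorch available, nothing to accelerate
    if torch.cuda.is_available():      # also true on ROCm builds of PyTorch
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"                   # Apple Silicon GPU
    return "cpu"


device = resolve_device("auto")
print(f"Embeddings would run on: {device}")
```

Note that ROCm builds of PyTorch expose AMD GPUs through the "cuda" device name, so a "rocm" config value would need mapping in a setup like this one.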
- Benchmarks on 10K-100K file repositories. Your mileage may vary, but it'll still be fast!
NEW in v2.0: Global Index & Advanced Memory Management
Global Index - Cross-Project Search
Search across ALL your projects simultaneously with intelligent query routing and graceful degradation:
from leindex.global_index import cross_project_search
# Search across multiple projects at once
results = cross_project_search(
    pattern="authentication",
    project_ids=["project1", "project2", "project3"],
    fuzzy=True,
    case_sensitive=False
)
# Get aggregated results with project-specific metadata
for result in results:
    print(f"{result.project_id}: {result.matches} matches")
    for match in result.results:
        print(f"  {match.file_path}:{match.line_number}")
Global Index Features:
- Two-Tier Architecture: Tier 1 (metadata) + Tier 2 (query cache)
- Project Comparison Dashboard: Compare projects by size, language, health score
- Event-Driven Updates: Real-time synchronization across projects
- Graceful Degradation: Falls back to alternative search methods on errors
- Cross-Project Statistics: Aggregate metrics across all indexed projects
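Graceful degradation is the feature in this list that benefits most from an example. The sketch below shows the general fallback-chain pattern: try search backends in order and return the first one that succeeds. The backend functions and `search_with_fallback` are hypothetical stand-ins, not LeIndex internals, and the simulated outage is only there to exercise the fallback.

```python
from typing import Callable, List, Sequence, Tuple

SearchFn = Callable[[str], List[str]]


def search_with_fallback(query: str,
                         backends: Sequence[Tuple[str, SearchFn]]) -> Tuple[str, List[str]]:
    """Try each backend in order and return the first result that succeeds."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(query)
        except Exception as exc:  # degrade instead of failing the whole query
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all search backends failed: " + "; ".join(errors))


def semantic_search(query: str) -> List[str]:
    raise TimeoutError("vector index unavailable")  # simulated outage


def full_text_search(query: str) -> List[str]:
    return [f"auth/login.py: {query}"]


used, hits = search_with_fallback(
    "authentication",
    [("semantic", semantic_search), ("full_text", full_text_search)],
)
print(used, hits)
```

Returning the name of the backend that answered lets the caller tell the user that results came from a degraded path.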
MCP Tools for Global Index:
# Get global statistics
get_global_stats()
# List all projects with health scores
list_projects(format="detailed")
# Cross-project search
cross_project_search_tool(
    pattern="database",
    project_ids=["project1", "project2"]
)
# Project comparison dashboard
get_dashboard(
    language="Python",
    min_health_score=0.8,
    sort_by="last_indexed"
)
Advanced Memory Management
Intelligent memory management with automatic cleanup and zero-downtime configuration:
from leindex.memory import MemoryManager, ThresholdManager
# Monitor memory usage
manager = MemoryManager()
status = manager.get_status()
print(f"Memory: {status.current_mb:.1f} MB / {status.peak_mb:.1f} MB peak")
# Automatic memory actions at thresholds
# - 80%: Trigger garbage collection
# - 93%: Spill cached data to disk
# - 98%: Emergency eviction of low-priority data
Memory Management Features:
- Hierarchical Configuration: Global defaults + per-project overrides
- RSS Memory Tracking: Actual memory usage (not just allocations)
- Priority-Based Eviction: Intelligently frees memory based on data importance
- Zero-Downtime Reload: Update memory config without restarting
- Graceful Shutdown: Persist cache state for fast recovery
- Continuous Monitoring: Background memory tracking with alerts
Configuration Example:
# Global memory settings
memory:
  total_budget_mb: 3072        # 3GB total budget
  soft_limit_percent: 0.80     # 80% = cleanup triggered
  hard_limit_percent: 0.93     # 93% = spill to disk
  emergency_percent: 0.98      # 98% = emergency eviction
# Project-specific overrides
project_defaults:
  max_loaded_files: 1000       # Max files in memory
  max_cached_queries: 500      # Max cached search results
# Per-project override
projects:
  my-large-project:
    memory:
      max_loaded_files: 5000   # Override for large project
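To show how the three thresholds fit together, here is a minimal sketch that maps current usage to the action described for each level. The default values mirror the example configuration; the `memory_action` function and the action names it returns are hypothetical, not part of the leindex.memory API.

```python
def memory_action(used_mb: float, total_budget_mb: float = 3072,
                  soft: float = 0.80, hard: float = 0.93,
                  emergency: float = 0.98) -> str:
    """Return the most severe action whose threshold has been reached."""
    usage = used_mb / total_budget_mb
    if usage >= emergency:
        return "emergency_eviction"   # 98%: evict low-priority data
    if usage >= hard:
        return "spill_to_disk"        # 93%: spill cached data to disk
    if usage >= soft:
        return "garbage_collect"      # 80%: trigger garbage collection
    return "none"


for used in (2000, 2500, 2900, 3050):
    print(f"{used} MB -> {memory_action(used)}")
```

Checking the highest threshold first guarantees that only one action is reported even when several limits have been crossed.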
⚙️ Advanced Configuration System
Hierarchical YAML configuration with validation, migration, and hot-reload:
from leindex.config import GlobalConfigManager, first_time_setup
# First-time setup with hardware detection
result = first_time_setup()
if result.success:
    print(f"Config created at: {result.config_path}")
# Load configuration with validation
manager = GlobalConfigManager()
config = manager.get_config()
# Access configuration
print(f"Memory budget: {config.memory.total_budget_mb} MB")
print(f"Max workers: {config.performance.parallel_scanner_max_workers}")
# Zero-downtime reload
from leindex.config import reload_config
result = reload_config()
print(f"Reloaded: {result.success}")
Configuration Features:
- Hardware Detection: Automatic optimization for your system
- Validation Rules: Catch configuration errors before runtime
- Migration Support: Automatic upgrade from older config versions
- Hot Reload: Update config without restarting (SIGHUP)
- Project Overrides: Per-project settings override global defaults
- Secure Permissions: Config files protected with restrictive permissions
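The "Project Overrides" behavior is a layered merge: start from the global settings, then apply whatever a project redefines. A minimal sketch of that merge, assuming nested mappings combine key by key, is shown here. `merge_config` is a hypothetical helper for illustration, not the GlobalConfigManager implementation.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults with overrides applied; nested mappings merge key by key."""
    merged = copy.deepcopy(defaults)   # never mutate the global settings
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


global_config = {"memory": {"total_budget_mb": 3072, "max_loaded_files": 1000}}
project_override = {"memory": {"max_loaded_files": 5000}}

effective = merge_config(global_config, project_override)
print(effective)
```

Only the overridden key changes; everything else in the same section is inherited from the global configuration.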
Configuration Locations:
~/.leindex/
├── config.yaml              # Global configuration
├── config.backup.yaml       # Automatic backups
└── projects/
    ├── project-a.yaml       # Project-specific overrides
    └── project-b.yaml
New Documentation
- docs/GLOBAL_INDEX.md - Global index architecture and usage
- docs/MEMORY_MANAGEMENT.md - Memory management guide
- docs/CONFIGURATION.md - Configuration reference
- docs/MIGRATION.md - v1 to v2 migration guide
- examples/cross_project_search.py - Cross-project search examples
- examples/memory_configuration.py - Memory config examples
- examples/dashboard_usage.py - Dashboard examples
v2.0 Performance Improvements
| Feature | v1.1.0 | v2.0.0 | Improvement |
|---|---|---|---|
| Cross-Project Search | Not available | <100ms | NEW |
| Memory Efficiency | Manual tuning | Automatic management | 70% reduction |
| Config Reload | Restart required | Zero-downtime | Instant |
| Project Comparison | Manual | Dashboard API | Automated |
| Graceful Degradation | All-or-nothing | Fallback chain | Resilient |
| Indexing Speed | ~10K files/min | ~12K files/min | 20% faster |
The Evolution of LeIndex
LeIndex is a complete reimagining of the code indexing experience:
- CLI streamlined - Simple leindex commands
- Environment unified - LEINDEX_* environment variables
- Revolutionary stack - No external services (no Docker, PostgreSQL, Elasticsearch, or RabbitMQ)
- Lightweight architecture - Pure Python with LEANN + Tantivy + SQLite + DuckDB
What we gained:
- Simplicity
- Speed
- Token efficiency (~200 tokens/session saved)
- Pure MCP architecture
- Developer happiness
Documentation That Doesn't Suck
Core Documentation
- Installation Guide - Detailed setup instructions
- MCP Configuration - MCP server setup and examples
- Architecture Deep Dive - System design and internals
- API Reference - Complete API documentation
- Migration Guide - Upgrading from v1 to v2
- Performance Optimization Guide - Tuning for maximum speed ⚡
- Contributing - Join the fun!
v2.0 Feature Documentation
- docs/GLOBAL_INDEX.md - Global index architecture and cross-project search
- docs/MEMORY_MANAGEMENT.md - Memory management and monitoring
- docs/CONFIGURATION.md - Configuration reference and examples
Development (For the Curious)
Project Structure
leindex/
├── src/leindex/                # The magic happens here
│   ├── dal/                    # Data Access Layer
│   ├── storage/                # Storage backends
│   ├── search/                 # Search engines
│   ├── core_engine/            # Core indexing & search
│   ├── config_manager.py       # Config wizardry
│   ├── project_settings.py     # Project settings
│   ├── constants.py            # Shared constants
│   └── server.py               # MCP server
├── tests/                      # Test suite
├── config.yaml                 # Configuration
└── pyproject.toml              # Project metadata
Running Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run with coverage (because we care)
pytest --cov=leindex tests/
Contributing (Join the Party!)
We love contributions! Whether it's bug fixes, new features, documentation improvements, or just spreading the word - it's all appreciated.
Please see CONTRIBUTING.md for guidelines. We promise we're friendly!
License
MIT License - see LICENSE for details. Use it anywhere, modify it, share it. Go wild!
Acknowledgments (Standing on Giants)
LeIndex is built on amazing open-source projects:
- LEANN - Storage-efficient vector search
- Tantivy - Full-text search engine written in Rust (Lucene-inspired)
- DuckDB - Fast analytical database
- SQLite - Embedded relational database
- CodeRankEmbed - Code embeddings
- Model Context Protocol - AI integration
Massive thanks to all the contributors!
Support & Community
- GitHub Issues: https://github.com/scooter-lacroix/leindex/issues
- Documentation: https://github.com/scooter-lacroix/leindex
- Star us on GitHub - It helps more people discover LeIndex! ⭐
Built with ❤️ for developers who love their code
⭐ Star us on GitHub - it makes us smile!
Ready to search smarter? Install LeIndex now.