Lightweight semantic code search engine — hybrid vector + FTS + AST graph + regex fusion + MCP server
codexlens-search
Semantic code search engine with MCP server for Claude Code.
Hybrid search: vector + FTS + AST graph + ripgrep regex — with RRF fusion and reranking.
Quick Start (Claude Code MCP)
Add to your project .mcp.json:
{
"mcpServers": {
"codexlens": {
"command": "uvx",
"args": ["--from", "codexlens-search[mcp]", "codexlens-mcp"],
"env": {
"CODEXLENS_EMBED_API_URL": "https://api.openai.com/v1",
"CODEXLENS_EMBED_API_KEY": "${OPENAI_API_KEY}",
"CODEXLENS_EMBED_API_MODEL": "text-embedding-3-small",
"CODEXLENS_EMBED_DIM": "1536",
"CODEXLENS_AST_CHUNKING": "true"
}
}
}
}
That's it. Claude Code will auto-discover the tools. A typical flow is index_project first, then Search.
Install
# Standard install
uv pip install codexlens-search
# With MCP server
uv pip install codexlens-search[mcp]
# With AST parsing (symbol extraction, cross-references, graph search)
uv pip install codexlens-search[mcp,ast]
Optional extras:
| Extra | Description |
|---|---|
| `mcp` | MCP server (`codexlens-mcp` command) |
| `ast` | tree-sitter AST parsing (symbol extraction, graph search) |
| `gpu` | GPU-accelerated embedding (onnxruntime-gpu) |
| `faiss-cpu` | FAISS ANN backend |
| `watcher` | File watcher for auto-indexing |
| `gitignore` | Recursive .gitignore filtering |
MCP Tools
Search
Unified code search with 4 modes:
| Mode | Description | Requires Index | Requires rg |
|---|---|---|---|
| `auto` (default) | Semantic and regex in parallel; falls back to regex if no index | - | - |
| `symbol` | Find definitions by name (class, function, method) | ✓ | |
| `refs` | Find cross-references (imports, calls, inheritance) | ✓ | |
| `regex` | Ripgrep regex on live files | | ✓ |
Parameters:
- `project_path`: Absolute path to the project root
- `query`: Natural language, code symbol, or regex pattern
- `mode`: `auto` / `symbol` / `refs` / `regex`
- `top_k`: Max results (default 10)
- `scope`: Relative path to restrict search (e.g. `src/auth`)
Auto mode behavior:
- Has index + has rg → semantic and regex run in parallel, results merged with dedup
- Has index + no rg → semantic only
- No index + has rg → regex fallback
- No index + no rg → error with guidance
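The four cases above amount to a small decision table. The sketch below is illustrative only: `Strategy` and `choose_auto_strategy` are hypothetical names, not part of the codexlens-search API, and the real implementation may detect the index and ripgrep differently.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical labels for the outcomes of auto mode."""
    SEMANTIC_AND_REGEX = "semantic+regex"
    SEMANTIC_ONLY = "semantic"
    REGEX_FALLBACK = "regex"


def choose_auto_strategy(has_index: bool, has_rg: bool) -> Strategy:
    """Map index and ripgrep availability to a search strategy."""
    if has_index and has_rg:
        return Strategy.SEMANTIC_AND_REGEX  # run both, merge with dedup
    if has_index:
        return Strategy.SEMANTIC_ONLY
    if has_rg:
        return Strategy.REGEX_FALLBACK
    # Neither backend is usable, so fail with guidance.
    raise RuntimeError("No index and no ripgrep: run index_project or install rg.")
```

One plausible way to compute `has_rg` is `shutil.which("rg") is not None`; that is an assumption about detection, not something the README states.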
index_project
Build, update, or inspect the search index.
| Action | Description |
|---|---|
| `sync` (default) | Incremental update: only re-indexes changed files |
| `rebuild` | Full re-index from scratch |
| `status` | Show index statistics (files, chunks, symbols, refs) |
Parameters:
- `project_path`: Absolute path to the project root
- `action`: `sync` / `rebuild` / `status`
- `scope`: Relative directory to limit indexing
- `force`: Alias for `action="rebuild"`
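The README does not say how `sync` decides which files changed. A common approach, assumed here purely for illustration, is to compare content hashes against a stored manifest. `file_digest` and `plan_sync` are hypothetical helpers, not codexlens-search functions, and the sketch ignores size limits and .gitignore filtering.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_sync(root: Path, manifest: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare files under root with a manifest of relative path -> digest.

    Returns (to_index, to_remove): files that are new or changed,
    and manifest entries whose files no longer exist.
    """
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    to_index = sorted(rel for rel, digest in current.items() if manifest.get(rel) != digest)
    to_remove = sorted(rel for rel in manifest if rel not in current)
    return to_index, to_remove
```

Under this assumption, `rebuild` would correspond to calling `plan_sync` with an empty manifest, so every file lands in `to_index`.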
find_files
Glob-based file discovery.
- `project_path`: Absolute path to the project root
- `pattern`: Glob pattern (default `**/*`)
- `max_results`: Max file paths to return (default 100)
AST Features
When CODEXLENS_AST_CHUNKING=true and [ast] extra is installed:
- Smart chunking — Splits code at symbol boundaries (functions, classes, methods) instead of fixed-size windows
- Symbol extraction — Indexes 12 symbol kinds: function, class, method, module, variable, constant, interface, type_alias, enum, struct, trait, property
- Cross-references — Extracts import, call, inherit, type_ref edges between symbols
- Graph search — BFS expansion from matched symbols, fused into hybrid results with adaptive weights
# Install AST support (tree-sitter 0.23+)
uv pip install codexlens-search[ast]
# Or individual grammar packages for Python 3.13+
pip install tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript
Supported languages: Python, JavaScript, TypeScript, Go, Java, Rust, C, C++, Ruby, PHP, Scala, Kotlin, Swift, C#, Bash, Lua, Haskell, Elixir, Erlang.
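To make the graph search idea concrete, here is a minimal sketch of BFS expansion over cross-reference edges. The adjacency-dict graph, the `expand_symbols` name, and the `max_depth` parameter are assumptions for illustration; the README does not document the actual data structures or depth limit.

```python
from collections import deque


def expand_symbols(
    graph: dict[str, list[str]], seeds: list[str], max_depth: int = 2
) -> dict[str, int]:
    """Breadth-first expansion from matched symbols.

    graph maps a symbol to the symbols it references (import, call,
    inherit, type_ref). Returns symbol -> hop distance from the
    nearest seed, limited to max_depth hops.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

A fusion step could then weight neighbors by hop distance, so that closer symbols contribute more to the hybrid ranking. How the project's adaptive weights are actually computed is not stated in the source.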
MCP Configuration Examples
API Embedding + AST (recommended)
{
"mcpServers": {
"codexlens": {
"command": "uvx",
"args": ["--from", "codexlens-search[mcp,ast]", "codexlens-mcp"],
"env": {
"CODEXLENS_EMBED_API_URL": "https://api.openai.com/v1",
"CODEXLENS_EMBED_API_KEY": "${OPENAI_API_KEY}",
"CODEXLENS_EMBED_API_MODEL": "text-embedding-3-small",
"CODEXLENS_EMBED_DIM": "1536",
"CODEXLENS_AST_CHUNKING": "true"
}
}
}
}
API Embedding + API Reranker (best quality)
{
"mcpServers": {
"codexlens": {
"command": "uvx",
"args": ["--from", "codexlens-search[mcp,ast]", "codexlens-mcp"],
"env": {
"CODEXLENS_EMBED_API_URL": "https://api.openai.com/v1",
"CODEXLENS_EMBED_API_KEY": "${OPENAI_API_KEY}",
"CODEXLENS_EMBED_API_MODEL": "text-embedding-3-small",
"CODEXLENS_EMBED_DIM": "1536",
"CODEXLENS_RERANKER_API_URL": "https://api.jina.ai/v1",
"CODEXLENS_RERANKER_API_KEY": "${JINA_API_KEY}",
"CODEXLENS_RERANKER_API_MODEL": "jina-reranker-v2-base-multilingual",
"CODEXLENS_AST_CHUNKING": "true"
}
}
}
}
Multi-Endpoint Load Balancing
{
"mcpServers": {
"codexlens": {
"command": "uvx",
"args": ["--from", "codexlens-search[mcp]", "codexlens-mcp"],
"env": {
"CODEXLENS_EMBED_API_ENDPOINTS": "https://api1.example.com/v1|sk-key1|model,https://api2.example.com/v1|sk-key2|model",
"CODEXLENS_EMBED_DIM": "1536"
}
}
}
}
Format: url|key|model,url|key|model,...
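As a sketch of how a string in this format can be split into endpoints (the `Endpoint` class and `parse_endpoints` function are hypothetical names, not the library's own parser):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    url: str
    key: str
    model: str


def parse_endpoints(spec: str) -> list[Endpoint]:
    """Parse 'url|key|model,url|key|model,...' into Endpoint records."""
    endpoints = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        parts = [part.strip() for part in entry.split("|")]
        if len(parts) != 3:
            raise ValueError(f"expected url|key|model, got {entry!r}")
        endpoints.append(Endpoint(*parts))
    return endpoints
```

Because entries are separated by commas and fields by pipes, this format assumes that neither character appears inside a URL, key, or model name.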
Local Models (Offline, No API)
uv pip install codexlens-search[mcp]
codexlens-search download-models
{
"mcpServers": {
"codexlens": {
"command": "codexlens-mcp",
"env": {}
}
}
}
CLI
codexlens-search --db-path .codexlens sync --root ./src
codexlens-search --db-path .codexlens search -q "auth handler" -k 10
codexlens-search --db-path .codexlens status
codexlens-search list-models
codexlens-search download-models
Environment Variables
Embedding
| Variable | Description | Example |
|---|---|---|
| `CODEXLENS_EMBED_API_URL` | Embedding API base URL | `https://api.openai.com/v1` |
| `CODEXLENS_EMBED_API_KEY` | API key | `sk-xxx` |
| `CODEXLENS_EMBED_API_MODEL` | Model name | `text-embedding-3-small` |
| `CODEXLENS_EMBED_API_ENDPOINTS` | Multi-endpoint: `url\|key\|model,...` | See Multi-Endpoint Load Balancing above |
| `CODEXLENS_EMBED_DIM` | Vector dimension | `1536` |
Reranker
| Variable | Description | Example |
|---|---|---|
| `CODEXLENS_RERANKER_API_URL` | Reranker API base URL | `https://api.jina.ai/v1` |
| `CODEXLENS_RERANKER_API_KEY` | API key | `jina-xxx` |
| `CODEXLENS_RERANKER_API_MODEL` | Model name | `jina-reranker-v2-base-multilingual` |
AST & Filtering
| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_AST_CHUNKING` | `false` | Enable tree-sitter AST chunking + symbol extraction |
| `CODEXLENS_GITIGNORE_FILTERING` | `false` | Enable recursive .gitignore filtering |
Tuning
| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_BINARY_TOP_K` | `200` | Binary coarse search candidates |
| `CODEXLENS_ANN_TOP_K` | `50` | ANN fine search candidates |
| `CODEXLENS_FTS_TOP_K` | `50` | FTS results per method |
| `CODEXLENS_FUSION_K` | `60` | RRF fusion k parameter |
| `CODEXLENS_RERANKER_TOP_K` | `20` | Results to rerank |
| `CODEXLENS_EMBED_BATCH_SIZE` | `32` | Max texts per API batch (auto-splits on 413) |
| `CODEXLENS_EMBED_MAX_TOKENS` | `8192` | Max tokens per text (truncate if exceeded, 0=no limit) |
| `CODEXLENS_INDEX_WORKERS` | `2` | Parallel indexing workers |
| `CODEXLENS_MAX_FILE_SIZE` | `1000000` | Max file size in bytes |
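The `CODEXLENS_BINARY_TOP_K` and `CODEXLENS_ANN_TOP_K` settings describe a coarse-to-fine vector search: a cheap Hamming pass picks candidates, then a cosine pass ranks them. The pure-Python sketch below illustrates that pattern under two assumptions that the README does not confirm: vectors are binarized by sign, and an exact cosine sort stands in for the ANN index. All function names are hypothetical.

```python
import math


def binarize(vec: list[float]) -> int:
    """Pack the sign of each component into an integer bitmask."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bitmasks."""
    return bin(a ^ b).count("1")


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query: list[float],
    vectors: dict[str, list[float]],
    coarse_k: int = 200,
    fine_k: int = 50,
) -> list[str]:
    """Coarse Hamming filter followed by cosine ranking of the survivors."""
    q_bits = binarize(query)
    coarse = sorted(vectors, key=lambda i: hamming(q_bits, binarize(vectors[i])))[:coarse_k]
    return sorted(coarse, key=lambda i: cosine(query, vectors[i]), reverse=True)[:fine_k]
```

Raising `coarse_k` trades speed for recall, since a relevant chunk dropped by the Hamming pass can never be recovered by the fine pass.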
Architecture
Query → [Embedder] → query vector
├→ [BinaryStore] → candidates (Hamming)
│ └→ [ANNIndex] → ranked IDs (cosine)
├→ [FTS exact] → exact matches
├→ [FTS fuzzy] → fuzzy matches
├→ [GraphSearcher] → symbol neighbors (BFS)
└→ [ripgrep] → regex matches (parallel)
└→ [RRF Fusion] → merged ranking
└→ [Reranker] → final top-k
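The fusion stage in the diagram uses Reciprocal Rank Fusion, where each result scores the sum of 1 / (k + rank) over the rankings it appears in. Below is a minimal unweighted sketch with k = 60, the `CODEXLENS_FUSION_K` default. The `rrf_fuse` name is hypothetical, and the project's adaptive per-source weights are not modeled here.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each list is ordered best-first; ranks start at 1. Ties in the
    fused score are broken by ID so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because only ranks matter, RRF can merge sources whose raw scores are not comparable, such as cosine similarity, FTS scores, and regex hits.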
Development
git clone https://github.com/catlog22/codexlens-search.git
cd codexlens-search
uv pip install -e ".[dev,ast]"
pytest
License
MIT