# codexlens-search

Lightweight semantic code search engine with an MCP server for Claude Code.
Hybrid search: vector + FTS + AST graph + ripgrep regex — with RRF fusion and reranking.
## Quick Start

```bash
uv pip install codexlens-search
```

Add to your project's `.mcp.json`:

```json
{
  "mcpServers": {
    "codexlens": {
      "command": "uvx",
      "args": ["--from", "codexlens-search", "codexlens-mcp"],
      "env": {
        "CODEXLENS_EMBED_API_URL": "https://api.openai.com/v1",
        "CODEXLENS_EMBED_API_KEY": "${OPENAI_API_KEY}",
        "CODEXLENS_EMBED_API_MODEL": "text-embedding-3-small",
        "CODEXLENS_EMBED_DIM": "1536"
      }
    }
  }
}
```
That's it. Claude Code auto-discovers the tools: run `index_project`, then `Search`.
All features are included by default — MCP server, AST parsing, USearch + FAISS backends, file watcher, gitignore filtering. GPU acceleration is auto-detected when available.
## Install

```bash
# One command — GPU auto-detected on Windows, no extra steps
uv pip install codexlens-search

# Linux with NVIDIA GPU (requires CUDA + cuDNN)
uv pip install "codexlens-search[gpu]"
```
Default install includes:

- **MCP server** — `codexlens-mcp` command
- **AST parsing** — tree-sitter symbol extraction + graph search (on by default)
- **USearch** — high-performance HNSW ANN backend (default, cross-platform)
- **FAISS** — ANN + binary index backend (Hamming coarse search)
- **File watcher** — watchdog auto-indexing
- **Gitignore filtering** — recursive `.gitignore` support (on by default)
- **GPU acceleration** — Windows auto-installs `onnxruntime-directml`, which works with any DirectX 12 GPU (NVIDIA/AMD/Intel), no CUDA needed. GPU is auto-detected at runtime — no config needed

`[gpu]` adds `onnxruntime-gpu` + `faiss-gpu` for Linux CUDA setups.
## ANN Backend Selection

Three backends for approximate nearest neighbor search, auto-selected in this order:

| Backend | Install | Best for |
|---|---|---|
| `usearch` (default) | Included | Cross-platform, fastest CPU HNSW |
| `faiss` | Included | GPU acceleration, binary Hamming search |
| `hnswlib` | Included | Lightweight fallback |

Override with `CODEXLENS_ANN_BACKEND`:

```bash
# Force a specific backend
CODEXLENS_ANN_BACKEND=faiss    # use FAISS (GPU when available)
CODEXLENS_ANN_BACKEND=usearch  # use USearch (default)
CODEXLENS_ANN_BACKEND=hnswlib  # use hnswlib
CODEXLENS_ANN_BACKEND=auto     # auto-select (usearch → faiss → hnswlib)
```
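The `auto` fallback order amounts to a chain of import-availability checks. A minimal sketch (illustrative only — `select_ann_backend` is not the actual codexlens selection code):

```python
import importlib.util


def select_ann_backend(preferred: str = "auto") -> str:
    """Pick an ANN backend by import availability, mirroring the
    documented fallback order: usearch -> faiss -> hnswlib."""
    if preferred != "auto":
        # An explicit CODEXLENS_ANN_BACKEND value wins outright.
        return preferred
    for name in ("usearch", "faiss", "hnswlib"):
        if importlib.util.find_spec(name) is not None:
            return name
    raise RuntimeError("no ANN backend installed")
```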
## MCP Tools

### Search

Hybrid code search combining semantic vector, FTS, AST graph, and ripgrep regex.

| Mode | Description | Requires |
|---|---|---|
| `auto` (default) | Semantic + regex in parallel. Auto-triggers background indexing if none exists. | — |
| `symbol` | Find definitions by exact/fuzzy name match | Index |
| `refs` | Find cross-references — incoming and outgoing edges | Index |
| `regex` | Ripgrep regex on live files | `rg` |

Parameters: `project_path`, `query`, `mode`, `scope` (restricts `auto`/`regex` to a subdirectory).
Results are capped by the `CODEXLENS_TOP_K` env var (default 10).
### index_project

Build, update, or inspect the search index.

| Action | Description |
|---|---|
| `sync` (default) | Incremental — only changed files |
| `rebuild` | Full re-index from scratch |
| `status` | Index statistics (files, chunks, symbols, refs) |

Parameters: `project_path`, `action`, `scope`
### find_files

Glob-based file discovery. Parameters: `project_path`, `pattern` (default `**/*`).
Max results are controlled by the `CODEXLENS_FIND_MAX_RESULTS` env var (default 100).

### watch_project

Manage the file watcher for automatic re-indexing on file changes.
Parameters: `project_path`, `action` (`start` / `stop` / `status`)
## AST Features

Enabled by default. Disable with `CODEXLENS_AST_CHUNKING=false`.

- **Smart chunking** — splits at symbol boundaries instead of fixed-size windows
- **Symbol extraction** — 12 kinds: function, class, method, module, variable, constant, interface, type_alias, enum, struct, trait, property
- **Cross-references** — import, call, inherit, type_ref edges
- **Graph search** — BFS expansion from matches, fused with adaptive weights

Languages: Python, JavaScript, TypeScript, Go, Java, Rust, C, C++, Ruby, PHP, Scala, Kotlin, Swift, C#, Bash, Lua, Haskell, Elixir, Erlang.
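As a rough illustration of the graph-search step, BFS expansion from initial hits over a symbol reference graph might look like the following (a hand-rolled sketch; the edge kinds match the list above, but the data structures are hypothetical, not the codexlens internals):

```python
from collections import deque


def expand_matches(graph: dict, seeds: list, max_depth: int = 2) -> set:
    """Breadth-first expansion from initial search hits: pull in symbols
    reachable via import/call/inherit/type_ref edges up to max_depth."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Tiny example graph: auth_handler calls verify_token, which raises TokenError
graph = {
    "auth_handler": ["verify_token"],
    "verify_token": ["TokenError"],
    "TokenError": [],
}
```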
## Configuration Examples

### Reranker (best quality)

Add a reranker API on top of the Quick Start config:

```json
"CODEXLENS_RERANKER_API_URL": "https://api.jina.ai/v1",
"CODEXLENS_RERANKER_API_KEY": "${JINA_API_KEY}",
"CODEXLENS_RERANKER_API_MODEL": "jina-reranker-v2-base-multilingual"
```

### Multi-Endpoint Load Balancing

```json
"CODEXLENS_EMBED_API_ENDPOINTS": "https://api1.example.com/v1|sk-key1|model,https://api2.example.com/v1|sk-key2|model",
"CODEXLENS_EMBED_DIM": "1536"
```

Format: `url|key|model,url|key|model,...` — replaces the single-endpoint `EMBED_API_URL`/`KEY`/`MODEL` variables.
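The endpoint string splits cleanly on `,` and `|`; for clarity, here is how it decomposes (`parse_endpoints` is an illustrative helper, not part of the codexlens API):

```python
def parse_endpoints(spec: str) -> list:
    """Split a 'url|key|model,url|key|model' spec into endpoint records."""
    endpoints = []
    for entry in spec.split(","):
        url, key, model = entry.strip().split("|")
        endpoints.append({"url": url, "key": key, "model": model})
    return endpoints


spec = "https://api1.example.com/v1|sk-key1|model,https://api2.example.com/v1|sk-key2|model"
```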
### Local Models (Offline)

```bash
codexlens-search download-models
```

```json
{
  "mcpServers": {
    "codexlens": {
      "command": "codexlens-mcp",
      "env": {}
    }
  }
}
```
## GPU

**Windows:** GPU acceleration is included by default — `onnxruntime-directml` is auto-installed and works with any DirectX 12 GPU (NVIDIA/AMD/Intel). No CUDA, no extra install, no config.

**Linux:** `uv pip install "codexlens-search[gpu]"` adds CUDA support (requires CUDA + cuDNN).

Auto-detection priority: CUDA > DirectML > CPU

- **Embedding** — the ONNX runtime selects the best available GPU provider, ~12x faster than CPU
- **FAISS** — the index auto-transfers to GPU 0 (CUDA only)

Force a specific device: `CODEXLENS_DEVICE=directml` / `cuda` / `cpu`
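The CUDA > DirectML > CPU priority amounts to picking the first match from a ranked list of ONNX Runtime execution providers. A minimal sketch (the provider names are real ONNX Runtime identifiers, but `pick_provider` itself is illustrative, not codexlens code):

```python
PRIORITY = ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]

FORCED_MAP = {
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",
    "cpu": "CPUExecutionProvider",
}


def pick_provider(available: list, forced: str = "") -> str:
    """Return the highest-priority available provider.
    `forced` models CODEXLENS_DEVICE overriding auto-detection."""
    if forced:
        return FORCED_MAP[forced]
    for provider in PRIORITY:
        if provider in available:
            return provider
    return "CPUExecutionProvider"
```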
## CLI

```bash
codexlens-search --db-path .codexlens sync --root ./src
codexlens-search --db-path .codexlens search -q "auth handler" -k 10
codexlens-search --db-path .codexlens status
codexlens-search list-models
codexlens-search download-models
```
## Environment Variables

### Embedding

| Variable | Description |
|---|---|
| `CODEXLENS_EMBED_API_URL` | API base URL (e.g. `https://api.openai.com/v1`) |
| `CODEXLENS_EMBED_API_KEY` | API key |
| `CODEXLENS_EMBED_API_MODEL` | Model name (e.g. `text-embedding-3-small`) |
| `CODEXLENS_EMBED_API_ENDPOINTS` | Multi-endpoint: `url\|key\|model,...` |
| `CODEXLENS_EMBED_DIM` | Vector dimension (e.g. `1536`) |

### Reranker

| Variable | Description |
|---|---|
| `CODEXLENS_RERANKER_API_URL` | Reranker API base URL |
| `CODEXLENS_RERANKER_API_KEY` | API key |
| `CODEXLENS_RERANKER_API_MODEL` | Model name |
### Features

| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_AST_CHUNKING` | `true` | AST chunking + symbol extraction |
| `CODEXLENS_GITIGNORE_FILTERING` | `true` | Recursive `.gitignore` filtering |
| `CODEXLENS_DEVICE` | `auto` | `auto` / `cuda` / `directml` / `cpu` |
| `CODEXLENS_AUTO_WATCH` | `false` | Auto-start file watcher after indexing |
### MCP Tool Defaults

| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_TOP_K` | `10` | Search result limit |
| `CODEXLENS_FIND_MAX_RESULTS` | `100` | `find_files` result limit |
### Tuning

| Variable | Default | Description |
|---|---|---|
| `CODEXLENS_BINARY_TOP_K` | `200` | Binary coarse search candidates |
| `CODEXLENS_ANN_TOP_K` | `50` | ANN fine search candidates |
| `CODEXLENS_FTS_TOP_K` | `50` | FTS results per method |
| `CODEXLENS_FUSION_K` | `60` | RRF fusion `k` parameter |
| `CODEXLENS_RERANKER_TOP_K` | `20` | Results to rerank |
| `CODEXLENS_EMBED_BATCH_SIZE` | `32` | Texts per API batch |
| `CODEXLENS_EMBED_MAX_TOKENS` | `8192` | Max tokens per text (`0` = no limit) |
| `CODEXLENS_INDEX_WORKERS` | `2` | Parallel indexing workers |
| `CODEXLENS_MAX_FILE_SIZE` | `1000000` | Max file size in bytes |
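To make the `BINARY_TOP_K` / `ANN_TOP_K` funnel concrete, here is a numpy-only sketch of a two-stage search: binarize vectors by sign, shortlist by Hamming distance, then rerank the shortlist by exact cosine similarity (illustrative only — the real backends use FAISS/USearch indexes, not brute force):

```python
import numpy as np


def two_stage_search(db, query, binary_top_k, ann_top_k):
    """Coarse Hamming shortlist, then fine cosine rerank; returns row IDs."""
    # Stage 1: sign-binarize and shortlist by Hamming distance.
    db_bits = db > 0
    q_bits = query > 0
    hamming = (db_bits != q_bits).sum(axis=1)
    shortlist = np.argsort(hamming)[:binary_top_k]

    # Stage 2: exact cosine similarity on the shortlist only.
    cand = db[shortlist]
    sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)[:ann_top_k]
    return shortlist[order]


rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64))
query = db[42] + 0.01 * rng.standard_normal(64)  # near-duplicate of row 42
```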
## Architecture

```text
Query → [Embedder] → query vector
  ├→ [FAISS Binary] → candidates (Hamming)
  │    └→ [USearch/FAISS HNSW] → ranked IDs (cosine)
  ├→ [FTS exact + fuzzy] → text matches
  ├→ [GraphSearcher] → symbol neighbors (BFS)
  └→ [ripgrep] → regex matches
       └→ [RRF Fusion] → merged ranking
            └→ [Reranker] → final top-k
```
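The fusion step is standard Reciprocal Rank Fusion: each method's ranking contributes `1 / (k + rank)` per document, and the sums are sorted. A minimal sketch, with `k` matching the `CODEXLENS_FUSION_K` default of 60 (the document IDs here are made up for illustration):

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge several ranked ID lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["auth.py:12", "token.py:4", "main.py:88"]
fts_hits = ["token.py:4", "auth.py:12"]
regex_hits = ["auth.py:12"]
```

A document found by all four methods outranks one found by a single method, even if the latter ranked first in its own list — which is why RRF needs no score normalization across heterogeneous retrievers.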
## Development

```bash
git clone https://github.com/catlog22/codexlens-search.git
cd codexlens-search
uv pip install -e ".[dev]"
pytest
```
## License

MIT