
codexlens-search

Semantic code search engine with MCP server for Claude Code.

Hybrid search: vector + FTS + AST graph + ripgrep regex — with RRF fusion and reranking.

Quick Start

uv pip install codexlens-search

Add to your project .mcp.json:

{
  "mcpServers": {
    "codexlens": {
      "command": "uvx",
      "args": ["--from", "codexlens-search", "codexlens-mcp"],
      "env": {
        "CODEXLENS_EMBED_API_URL": "https://api.openai.com/v1",
        "CODEXLENS_EMBED_API_KEY": "${OPENAI_API_KEY}",
        "CODEXLENS_EMBED_API_MODEL": "text-embedding-3-small",
        "CODEXLENS_EMBED_DIM": "1536"
      }
    }
  }
}

That's it. Claude Code will auto-discover the tools: Search, index_project, find_files, and watch_project.

All features are included by default — MCP server, AST parsing, FAISS backend, file watcher, gitignore filtering. GPU acceleration is auto-detected when available.

Install

# Standard (batteries included)
uv pip install codexlens-search

# GPU acceleration (CUDA)
uv pip install codexlens-search[gpu]

Default install includes:

  • MCP server — codexlens-mcp command
  • AST parsing — tree-sitter symbol extraction + graph search (on by default)
  • FAISS — ANN + binary index backend
  • File watcher — watchdog auto-indexing
  • Gitignore filtering — recursive .gitignore support (on by default)

[gpu] adds onnxruntime-gpu + faiss-gpu. When GPU is detected, embedding and FAISS indexing automatically use CUDA — no config needed.

MCP Tools

Search

Hybrid code search combining semantic vector, FTS, AST graph, and ripgrep regex.

| Mode | Description | Requires |
| --- | --- | --- |
| auto (default) | Semantic + regex in parallel; auto-triggers background indexing if none exists | — |
| symbol | Find definitions by exact/fuzzy name match | Index |
| refs | Find cross-references — incoming and outgoing edges | Index |
| regex | Ripgrep regex on live files | rg |

Parameters: project_path, query, mode, scope (restricts auto/regex to subdirectory)

Results capped by CODEXLENS_TOP_K env var (default 10).

index_project

Build, update, or inspect the search index.

| Action | Description |
| --- | --- |
| sync (default) | Incremental — only changed files |
| rebuild | Full re-index from scratch |
| status | Index statistics (files, chunks, symbols, refs) |

Parameters: project_path, action, scope

find_files

Glob-based file discovery. Parameters: project_path, pattern (default **/*)

Max results controlled by CODEXLENS_FIND_MAX_RESULTS env var (default 100).
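The behavior described above — glob matching rooted at the project, capped at a maximum result count — can be sketched with the stdlib pathlib (a hypothetical stand-in, not the actual codexlens implementation; the function name and signature here are illustrative):

```python
from pathlib import Path

def find_files(project_path: str, pattern: str = "**/*", max_results: int = 100):
    """Glob-based file discovery, capped like CODEXLENS_FIND_MAX_RESULTS (sketch)."""
    root = Path(project_path)
    results = []
    for p in root.glob(pattern):
        if p.is_file():
            results.append(str(p.relative_to(root)))
            if len(results) >= max_results:
                break
    return results
```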

watch_project

Manage file watcher for automatic re-indexing on file changes.

Parameters: project_path, action (start / stop / status)

AST Features

Enabled by default. Disable with CODEXLENS_AST_CHUNKING=false.

  • Smart chunking — splits at symbol boundaries instead of fixed-size windows
  • Symbol extraction — 12 kinds: function, class, method, module, variable, constant, interface, type_alias, enum, struct, trait, property
  • Cross-references — import, call, inherit, type_ref edges
  • Graph search — BFS expansion from matches, fused with adaptive weights

Languages: Python, JavaScript, TypeScript, Go, Java, Rust, C, C++, Ruby, PHP, Scala, Kotlin, Swift, C#, Bash, Lua, Haskell, Elixir, Erlang.
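The real chunker uses tree-sitter, but the core idea of smart chunking — splitting at symbol boundaries rather than fixed-size windows — can be sketched for Python source alone with the stdlib ast module (a simplified illustration, not the actual implementation):

```python
import ast

def chunk_at_symbols(source: str):
    """Split source into chunks at top-level function/class boundaries
    instead of fixed-size windows (sketch of the smart-chunking idea)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno give the symbol's full source span (1-based)
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append((node.name, text))
    return chunks
```

Each chunk then stays a semantically coherent unit, which is what makes the extracted symbols and embeddings line up.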

Configuration Examples

Reranker (best quality)

Add reranker API on top of the Quick Start config:

"CODEXLENS_RERANKER_API_URL": "https://api.jina.ai/v1",
"CODEXLENS_RERANKER_API_KEY": "${JINA_API_KEY}",
"CODEXLENS_RERANKER_API_MODEL": "jina-reranker-v2-base-multilingual"

Multi-Endpoint Load Balancing

"CODEXLENS_EMBED_API_ENDPOINTS": "https://api1.example.com/v1|sk-key1|model,https://api2.example.com/v1|sk-key2|model",
"CODEXLENS_EMBED_DIM": "1536"

Format: url|key|model,url|key|model,... — replaces single-endpoint EMBED_API_URL/KEY/MODEL.
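Parsing the url|key|model,... format is straightforward; a minimal sketch (the function name is illustrative, and codexlens's actual parser may handle errors differently):

```python
def parse_endpoints(spec: str):
    """Parse 'url|key|model,url|key|model,...' into endpoint dicts (sketch)."""
    endpoints = []
    for entry in spec.split(","):
        url, key, model = entry.strip().split("|")
        endpoints.append({"url": url, "key": key, "model": model})
    return endpoints
```

Requests can then be distributed round-robin (or by health) over the parsed list.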

Local Models (Offline)

codexlens-search download-models

Then run the MCP server with no embedding API configured:
{
  "mcpServers": {
    "codexlens": {
      "command": "codexlens-mcp",
      "env": {}
    }
  }
}

GPU

uv pip install codexlens-search[gpu]

Auto-detection handles everything:

  • Embedding — ONNX runtime selects CUDA provider
  • FAISS — index auto-transfers to GPU 0

Force CPU: CODEXLENS_DEVICE=cpu

CLI

codexlens-search --db-path .codexlens sync --root ./src
codexlens-search --db-path .codexlens search -q "auth handler" -k 10
codexlens-search --db-path .codexlens status
codexlens-search list-models
codexlens-search download-models

Environment Variables

Embedding

| Variable | Description |
| --- | --- |
| CODEXLENS_EMBED_API_URL | API base URL (e.g. https://api.openai.com/v1) |
| CODEXLENS_EMBED_API_KEY | API key |
| CODEXLENS_EMBED_API_MODEL | Model name (e.g. text-embedding-3-small) |
| CODEXLENS_EMBED_API_ENDPOINTS | Multi-endpoint: url\|key\|model,... |
| CODEXLENS_EMBED_DIM | Vector dimension (e.g. 1536) |

Reranker

| Variable | Description |
| --- | --- |
| CODEXLENS_RERANKER_API_URL | Reranker API base URL |
| CODEXLENS_RERANKER_API_KEY | API key |
| CODEXLENS_RERANKER_API_MODEL | Model name |

Features

| Variable | Default | Description |
| --- | --- | --- |
| CODEXLENS_AST_CHUNKING | true | AST chunking + symbol extraction |
| CODEXLENS_GITIGNORE_FILTERING | true | Recursive .gitignore filtering |
| CODEXLENS_DEVICE | auto | auto / cuda / cpu |
| CODEXLENS_AUTO_WATCH | false | Auto-start file watcher after indexing |

MCP Tool Defaults

| Variable | Default | Description |
| --- | --- | --- |
| CODEXLENS_TOP_K | 10 | Search result limit |
| CODEXLENS_FIND_MAX_RESULTS | 100 | find_files result limit |

Tuning

| Variable | Default | Description |
| --- | --- | --- |
| CODEXLENS_BINARY_TOP_K | 200 | Binary coarse search candidates |
| CODEXLENS_ANN_TOP_K | 50 | ANN fine search candidates |
| CODEXLENS_FTS_TOP_K | 50 | FTS results per method |
| CODEXLENS_FUSION_K | 60 | RRF fusion k parameter |
| CODEXLENS_RERANKER_TOP_K | 20 | Results to rerank |
| CODEXLENS_EMBED_BATCH_SIZE | 32 | Texts per API batch |
| CODEXLENS_EMBED_MAX_TOKENS | 8192 | Max tokens per text (0 = no limit) |
| CODEXLENS_INDEX_WORKERS | 2 | Parallel indexing workers |
| CODEXLENS_MAX_FILE_SIZE | 1000000 | Max file size in bytes |
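Several of these knobs feed Reciprocal Rank Fusion. A minimal RRF, with k playing the role of CODEXLENS_FUSION_K, can be sketched as (an illustration of the standard technique, not codexlens's exact code):

```python
def rrf_fuse(rankings, k: int = 60):
    """Merge multiple ranked ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 matches the CODEXLENS_FUSION_K default. Larger k flattens the
    difference between top and bottom ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```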

Architecture

Query → [Embedder] → query vector
         ├→ [FAISS Binary] → candidates (Hamming)
         │     └→ [FAISS HNSW] → ranked IDs (cosine)
         ├→ [FTS exact + fuzzy] → text matches
         ├→ [GraphSearcher] → symbol neighbors (BFS)
         └→ [ripgrep] → regex matches
              └→ [RRF Fusion] → merged ranking
                    └→ [Reranker] → final top-k
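The binary-then-HNSW stage above can be illustrated with a toy two-stage search: Hamming distance on sign-bit codes for coarse filtering, then cosine similarity to rank the survivors. This is a pure-Python sketch of the idea, not the FAISS implementation:

```python
import math

def to_binary(vec):
    """Pack a vector's sign bits into an int (one bit per dimension)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def two_stage_search(query, vectors, coarse_k=200, top_k=10):
    """Hamming coarse filter over binary codes, then cosine fine ranking
    (roughly the CODEXLENS_BINARY_TOP_K -> CODEXLENS_ANN_TOP_K funnel)."""
    q_code = to_binary(query)
    # Coarse stage: keep the coarse_k codes closest in Hamming distance
    coarse = sorted(range(len(vectors)),
                    key=lambda i: bin(q_code ^ to_binary(vectors[i])).count("1"))[:coarse_k]
    # Fine stage: rank survivors by cosine similarity to the full query vector
    return sorted(coarse, key=lambda i: cosine(query, vectors[i]), reverse=True)[:top_k]
```

The coarse stage is cheap (XOR + popcount) and prunes most of the corpus before the more expensive float-vector comparison runs.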

Development

git clone https://github.com/catlog22/codexlens-search.git
cd codexlens-search
uv pip install -e ".[dev]"
pytest

License

MIT

