AST-based semantic code search that knows the neighborhood — every result comes with its call graph. MCP server included.

codelumen

AST-based semantic code search that knows the neighborhood — every result comes with what it calls and what calls it.

Install · Quickstart · MCP Setup · Architecture · Config · Troubleshooting


   ┌──────────────┐    parse + embed     ┌──────────────┐    LLM enrich    ┌──────────────┐
   │  source tree │ ───────────────────► │ code  vector │ ──────────────► │  description │
   │              │                      │ + BM25 sparse│                  │  + queries   │
   └──────────────┘                      └──────────────┘                  └──────────────┘

                                Search via CLI, HTTP, or MCP.

codelumen indexes your codebase the way a developer thinks about it: every function, method, and class becomes a chunk, with its call graph, docstring, and signature attached. You search by intent ("how do we retry transient payment failures?") and get back the few chunks that actually answer the question.

Two stages: structural (always, free, ~seconds) and enrichment (optional, LLM-paid, drains a pending queue). The index is queryable after stage 1; stage 2 just makes natural-language matches sharper.

The package ships an MCP server so Claude Desktop, Cursor, Cline, Continue, Kiro, Zed, and any other MCP-aware client can call codelumen as a tool — your AI agent gets search, find_symbol, and get_chunk_context next to its built-in read_file and grep.


Features

  • AST-aware chunking via Tree-sitter — every function, method, and class becomes a first-class searchable unit
  • Call graph included — each chunk knows what it calls and what calls it
  • Hybrid retrieval — dense vectors (code, description, developer_queries) + BM25 sparse, all in one query
  • Token-efficient for AI agents — replaces dozens of read_file + grep calls with one search
  • MCP server out of the box — works with Claude Desktop, Cursor, Cline, Continue, Kiro, Zed
  • 10 languages — Python, JavaScript, TypeScript, Java, Go, PHP, C#, Ruby, Rust, C++
  • Free tier works — local embeddings + structural pipeline need zero API keys
  • Incremental indexing: --changed re-indexes only what git diff touched

Install

# Local-only (free, offline embeddings via sentence-transformers)
pip install "codelumen[local]"

# With everything: voyage + openai + anthropic + cohere + chroma
pip install "codelumen[all]"

# Pick exactly what you need
pip install "codelumen[anthropic,local]"

Prefer an isolated CLI install? pipx keeps codelumen and its deps out of your global environment — recommended for a command-line tool:

pipx install "codelumen[local]"

Just want to try it without installing? With uv, run it straight from PyPI:

uvx --from "codelumen[local]" codelumen index .

Requires Python 3.11+. First index downloads the embedding model (~420 MB for the default all-mpnet-base-v2) and caches it.


60-second quickstart

No config files, no API keys, no setup. codelumen works offline with a local embedder by default. Just cd into any project and index it:

# 1. structural index — no LLM, free, no config needed
cd ~/code/your-project
codelumen index .

# 2. search — pure retrieval, ~80 ms (a background daemon stays warm)
codelumen search "how does the payment retry logic work?"

# 3. (optional) LLM enrichment — needs an Anthropic/OpenAI/OpenRouter/Ollama key
codelumen enrich

# 4. (optional) full RAG with answer generation
codelumen query "where is auth handled?"

The first command auto-starts a background daemon that loads the embedding model once and keeps it warm — so every later search (from any terminal or your editor) is instant. Each project gets its own index automatically under ~/.codelumen/indexes/, keyed by repo root. One global config lives at ~/.codelumen/config.yaml (created on first run); there is no per-project config file to manage.


CLI commands

| Command | What it does |
|---|---|
| codelumen index <path> | Stage 1: parse + embed + upsert. No LLM. |
| codelumen index <path> --changed | Incremental: only files in git diff HEAD~1. |
| codelumen index <path> --reset | Rebuild from scratch (after changing the embedding model). |
| codelumen enrich | Stage 2: drain pending chunks through an LLM. |
| codelumen enrich --force | Re-enrich every chunk. |
| codelumen compact | Prune orphaned records from .codelumen/enrichment.jsonl. |
| codelumen search "query" | Pure retrieval. --format json/paths/compact/table. |
| codelumen query "question" | Full RAG: retrieval + answer generation. |
| codelumen status | Per-state chunk counts + provider summary. |
| codelumen projects | List every indexed project in ~/.codelumen/indexes/. |
| codelumen serve | Run the daemon in the foreground (it otherwise auto-starts). |
| codelumen stop | Stop the background daemon. |
| codelumen doctor | Daemon + config health check. |

All commands act on the current project (nearest git root of your cwd). Override with --root /path/to/repo. They're thin clients to the daemon — no model loading, no config.yaml flag.
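
To make the --changed behavior concrete, here is a minimal sketch of how a tool could turn git diff HEAD~1 into a list of files to re-index. The function names and the extension set are assumptions for illustration only, not codelumen's internals.

```python
import subprocess
from pathlib import Path

# Hypothetical extension set covering the supported languages.
INDEXABLE = {".py", ".js", ".ts", ".java", ".go", ".php", ".cs", ".rb", ".rs", ".cpp"}


def parse_changed(diff_output: str) -> list[str]:
    """Keep indexable paths from `git diff --name-only` output."""
    paths = [line.strip() for line in diff_output.splitlines() if line.strip()]
    return [p for p in paths if Path(p).suffix in INDEXABLE]


def changed_files(root: str) -> list[str]:
    """Ask git which files differ from the previous commit."""
    out = subprocess.run(
        ["git", "-C", root, "diff", "--name-only", "HEAD~1"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_changed(out)
```

Splitting the parsing from the subprocess call keeps the filter testable without a real repository.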


MCP — wire it into your AI editor

The package ships an MCP server (codelumen-mcp) that exposes eight tools — five for search, three for agent-driven enrichment:

| Tool | Use it for |
|---|---|
| search | Semantic search over the index. |
| search_many | Several searches in one call, with results grouped per query. |
| find_symbol | Exact-name lookup for a function/method/class. |
| get_chunk_context | Full source + calls + called_by for a symbol. |
| index_status | Sanity-check the index. |
| list_pending_enrichments | Get a batch of chunks needing LLM enrichment. |
| save_enrichment | Persist a summary the agent wrote. |
| enrichment_progress | Loop sentinel: pending vs. done. |

Register it once, globally — it then works in every project you open, with no per-project setup. This is the entire config — drop it into your editor's user-level MCP file:

{
  "mcpServers": {
    "codelumen": {
      "command": "codelumen-mcp"
    }
  }
}

Where that file lives, per editor:

| Editor | Config file |
|---|---|
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project) |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| Cline (VS Code) | Cline panel → MCP Servers → Configure (edits cline_mcp_settings.json) |
| Continue | ~/.continue/config.yaml → under a mcpServers: block |
| Kiro | ~/.kiro/settings/mcp.json |
| Claude Code | claude mcp add codelumen codelumen-mcp |

A few clients use a slightly different shape — e.g. VS Code's native MCP uses a top-level "servers" key instead of "mcpServers". If yours differs, keep the command: "codelumen-mcp" part and match the client's MCP docs for the wrapper.
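
If the editor's MCP file already lists other servers, the codelumen entry has to be merged in rather than pasted over them. The sketch below is one hedged way to do that; add_codelumen is a hypothetical helper, and the key argument covers clients that use "servers" instead of "mcpServers".

```python
import json


def add_codelumen(config_text: str, key: str = "mcpServers") -> str:
    """Return config JSON with the codelumen server merged in, other entries kept."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault(key, {})
    servers["codelumen"] = {"command": "codelumen-mcp"}
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "other-mcp"}}}'
print(add_codelumen(existing))
```

An empty or missing file is treated as an empty config, so the same helper works for a first-time setup.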

That's the whole config — no config.yaml path, no per-project entry. The server is a thin proxy: it forwards each call to the warm daemon, tagged with the project the editor currently has open (its nearest git root). Most editors launch the server with the workspace as the working directory, so this Just Works; if yours doesn't, pass the root explicitly:

{ "command": "codelumen-mcp", "args": ["--root", "${workspaceFolder}"] }

After restarting the editor, the agent sees codelumen.search, codelumen.find_symbol, etc. The daemon loads the model once for the whole machine — so no matter how many editors and terminals you have open, there's one model in memory and every call is ~80 ms.

Teach the agent when to use it (optional)

The MCP server gives the agent the tools; a short skill teaches it to reach for semantic search before grep/read. Install it once, user-level, into every AI editor you use:

codelumen install-skill            # auto-detects installed editors
codelumen install-skill -t all     # or force every supported target

Supported: Claude Code (skill), Cursor & Windsurf (rules), Kiro (steering), Codex (AGENTS.md). It writes to each editor's own convention at the user level — so, like everything else here, it's set up once and applies to every project. Re-run any time to update in place.

Why this matters for token usage

Without semantic search, an AI agent asked "where do we validate emails?" runs grep -r email, gets 200 hits, and reads dozens of files. With codelumen it calls search("validate email"), gets 3 ranked chunks back as JSON (~500 tokens), and reads only what it needs.

Worked example — "How does the OrderService validate orders?"

I indexed sample_project/, then asked exactly that. The agent calls two MCP tools and is done — no read_file, no grep.

Step 1 — search narrows the question to a few candidates
{
  "query": "how does OrderService validate an order before processing",
  "embedding_dim": 768,
  "results": [
    {
      "score": 0.455,
      "qualified_name": "OrderService::_validate",
      "file": "orders.py", "line_start": 115, "line_end": 124,
      "type": "method", "enrichment_state": "pending"
    },
    {
      "score": 0.397,
      "qualified_name": "OrderService",
      "file": "orders.py", "line_start": 49, "line_end": 124,
      "type": "class",
      "summary": "High-level orchestrator for placing and processing orders. Coordinates payment authorization (PaymentProcessor) and customer notification (NotificationService)."
    },
    {
      "score": 0.363,
      "qualified_name": "OrderService::place_order",
      "file": "orders.py", "line_start": 67, "line_end": 93,
      "summary": "Validate, charge, and confirm an order in one shot."
    }
  ]
}

The _validate method ranks #1. The class and place_order rank just below — useful context, not the answer.
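
A script consuming this JSON might keep only the strongest hits before asking for full context. The sketch below assumes the result shape shown above; the 0.4 cutoff is an arbitrary illustration, not a recommended threshold.

```python
import json


def top_hits(payload: str, min_score: float = 0.4) -> list[tuple[str, str, int]]:
    """Return (qualified_name, file, line_start) for results at or above min_score."""
    results = json.loads(payload)["results"]
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [
        (r["qualified_name"], r["file"], r["line_start"])
        for r in ranked
        if r["score"] >= min_score
    ]


sample = '''{"results": [
  {"score": 0.397, "qualified_name": "OrderService", "file": "orders.py", "line_start": 49},
  {"score": 0.455, "qualified_name": "OrderService::_validate", "file": "orders.py", "line_start": 115}
]}'''
print(top_hits(sample))
```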

Step 2 — get_chunk_context pulls the body, calls, and called_by without opening the file
{
  "found": true,
  "chunk": {
    "qualified_name": "OrderService::_validate",
    "file": "orders.py", "line_start": 115, "line_end": 124,
    "source": "def _validate(self, order: Order) -> None:\n    if order.is_empty():\n        raise ValueError(\"Order has no line items\")\n    if order.total_cents() < self.MIN_ORDER_CENTS:\n        raise ValueError(...)\n    if \"@\" not in order.customer_email:\n        raise ValueError(\"customer_email must be a valid email address\")",
    "calls": ["order.is_empty", "ValueError", "order.total_cents"],
    "called_by": [
      {"function": "place_order", "class_name": "OrderService", "file": "orders.py", "line": 67}
    ]
  }
}

That's the entire answer: three checks (non-empty, minimum total, email contains @), called once from place_order. Total cost: ~1.4 KB of JSON, two MCP calls, sub-second.

Compared to the grep-then-read alternative — open orders.py (3.5 KB), scan for validate, then trace into place_order to confirm the call — the MCP path uses roughly 40% of the tokens and skips file I/O entirely.


Two ways to enrich

Stage 2 (description / developer_queries vectors) needs an LLM. You pick how:

Option A — direct API

codelumen calls Claude / GPT / OpenRouter / Ollama itself.

codelumen enrich

Best for CI, batch jobs, or users without an AI editor. Set llm.provider + llm.<provider>.api_key in config.yaml.

Option B — your AI agent does it

Skip the API key. The agent in your editor (Claude in Cursor, Kiro, Cline, Continue, etc.) loops over pending chunks via the MCP tools and writes summaries itself. No codelumen-side LLM cost. Just say:

"enrich the codebase index"

The agent uses list_pending_enrichments → writes summary → save_enrichment → repeats until enrichment_progress reports zero pending. Uses the model + subscription you already pay for.
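
The control flow of that loop can be sketched with an in-memory stand-in for the three tools. FakeIndex, summarize, and all argument shapes here are hypothetical; the real tools are called over MCP and their parameters may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeIndex:
    """In-memory stand-in for the index behind the three enrichment tools."""
    pending: dict[str, str]  # chunk id -> source
    done: dict[str, str] = field(default_factory=dict)

    def list_pending_enrichments(self, limit: int = 2) -> list[tuple[str, str]]:
        return list(self.pending.items())[:limit]

    def save_enrichment(self, chunk_id: str, summary: str) -> None:
        self.pending.pop(chunk_id)
        self.done[chunk_id] = summary

    def enrichment_progress(self) -> dict[str, int]:
        return {"pending": len(self.pending), "done": len(self.done)}


def summarize(source: str) -> str:
    """Placeholder for the agent's own model writing a summary."""
    return f"Summary of {len(source.splitlines())} line(s) of code."


def enrich_all(index: FakeIndex) -> dict[str, int]:
    # Loop until the progress sentinel reports nothing pending.
    while index.enrichment_progress()["pending"] > 0:
        for chunk_id, source in index.list_pending_enrichments():
            index.save_enrichment(chunk_id, summarize(source))
    return index.enrichment_progress()
```

Working in small batches keeps each tool response short, which matters when the agent's context window is the budget.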


Portable enrichment — pay once, share, survive model changes

Enrichment (the LLM-written summaries + developer queries) is the only part of codelumen that costs tokens. codelumen persists that text to .codelumen/enrichment.jsonl in your repo, keyed by each chunk's content hash — separate from the vector index (which lives in ~/.codelumen/indexes/ and is disposable).

Commit .codelumen/enrichment.jsonl to git. Doing so gives you:

  • Share across a team — a teammate clones, runs codelumen index, and the enrichment is restored from the file and re-embedded locally. They pay zero LLM tokens for code that's already enriched.
  • Survive embedding-model changes — switch from a local model to Voyage/OpenAI, run codelumen index --reset, and your enrichment is re-embedded into the new vector space with no re-enrichment. (Without this, changing models meant re-paying for everything.)
  • Survive branch switches / reverts — old records are kept as a content-addressed cache, so checking out an older revision reuses its enrichment instantly.

It's content-addressed, so it's always safe: if a chunk's code changed, its hash misses and that chunk is simply re-enriched — stale summaries can never attach to changed code. Run codelumen compact to prune records no longer referenced by any code. The file holds only text and contains no secrets.

.codelumen/enrichment.jsonl   ← commit this (portable enrichment, plain text)
~/.codelumen/indexes/<id>/    ← never committed (vectors, machine-local)

OpenRouter — one key, every model

If you don't want to manage Anthropic + OpenAI + Google accounts separately, point codelumen at OpenRouter:

llm:
  provider: "openrouter"
  openrouter:
    api_key: "${OPENROUTER_API_KEY}"
    model: "anthropic/claude-sonnet-4.6"   # or openai/gpt-5.5, anthropic/claude-opus-4.7, etc.

Use any supported model slug. Same codelumen enrich and codelumen query commands as before.


Architecture

Three named dense vectors per chunk: code (always), description (post-enrichment), developer_queries (post-enrichment), plus a bm25 sparse vector. Pre-enrichment, search still works on code + BM25 alone.
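
How the dense and sparse rankings are combined is not spelled out here. One common approach is reciprocal rank fusion (RRF), sketched below as an illustration of hybrid retrieval in general; it is an assumption, not a statement of what codelumen does internally.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


dense = ["_validate", "OrderService", "place_order"]  # ranking from the code vector
sparse = ["place_order", "_validate", "refund"]       # ranking from BM25
fused = rrf_fuse([dense, sparse])
print(fused)
```

A chunk that appears near the top of both lists outranks one that leads only a single list, which is the behavior a hybrid query is after.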

Stage 1 (codelumen index) parses files with Tree-sitter, builds the call graph, computes a source hash, embeds the raw source into the code vector, and upserts to Qdrant. No LLM calls. Every chunk lands in state pending.
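
The shape of a Stage 1 chunk can be approximated with Python's standard ast module. Note the swap: codelumen uses Tree-sitter across ten languages, while this sketch handles Python only, and the field names and SHA-256 hash are assumptions for illustration.

```python
import ast
import hashlib


def chunk_module(source: str) -> list[dict]:
    """Return one chunk per function, method, and class, with outgoing calls."""
    chunks = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            text = ast.get_source_segment(source, node) or ""
            # Every call expression inside this node, rendered as source text.
            calls = sorted({
                ast.unparse(n.func)
                for n in ast.walk(node)
                if isinstance(n, ast.Call)
            })
            chunks.append({
                "name": node.name,
                "type": "class" if isinstance(node, ast.ClassDef) else "function",
                "line_start": node.lineno,
                "line_end": node.end_lineno,
                "source_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "calls": calls,
                "state": "pending",
            })
    return chunks
```

The called_by side of the graph is the inverse of calls: once every chunk's outgoing calls are known, a second pass can map each callee name back to its callers.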

Stage 2 (codelumen enrich) drains the pending queue through an LLM, generating a logic_summary + 5–10 developer_queries per chunk, embedding them into the description and developer_queries vectors and mirroring the text to the portable .codelumen/enrichment.jsonl store. State flips to fresh. Trivial chunks get a free template; tests get pattern-skipped.
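
The skip and template rules can be pictured as a small decision function. This sketch reuses the default skip_patterns and max_lines values from the configuration, but the matching logic is an assumption: it uses fnmatch, whose "*" crosses "/" boundaries, so it is looser than a true glob, and it treats max_lines as inclusive.

```python
from fnmatch import fnmatch

SKIP_PATTERNS = ["**/test_*.py", "**/*_test.go", "**/tests/**"]
MAX_TRIVIAL_LINES = 3


def matches_skip(path: str) -> bool:
    # A leading "/" lets "**/" patterns also match files at the repo root.
    return any(fnmatch("/" + path, pattern) for pattern in SKIP_PATTERNS)


def enrichment_plan(path: str, source: str) -> str:
    """Decide how a pending chunk is handled: 'skip', 'template', or 'llm'."""
    if matches_skip(path):
        return "skip"
    if len(source.splitlines()) <= MAX_TRIVIAL_LINES:
        return "template"
    return "llm"
```

Only chunks that reach the "llm" branch cost tokens, which is why both filters run first.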


Configuration

codelumen uses one global config at ~/.codelumen/config.yaml (created with working defaults on first run), which the daemon loads for every project. You can still point at an explicit file with $CODELUMEN_CONFIG. Environment-variable refs (${VAR}) resolve from ~/.codelumen/.env, a local .env or .codelumen/.env, or the real environment. Indexes are stored centrally under ~/.codelumen/indexes/<id>/, one per project root, so you don't set a path.

embeddings:
  provider: "local"
  local:
    model: "sentence-transformers/all-mpnet-base-v2"

llm:
  provider: "anthropic"
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-sonnet-4-6"

vector_db:
  provider: "qdrant"        # path is managed per-project by the daemon

enrichment:
  enabled: true
  version: 1
  llm_concurrency: 4
  skip_patterns: ["**/test_*.py", "**/*_test.go", "**/tests/**"]
  skip_trivial:
    enabled: true
    max_lines: 3
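
The ${VAR} references in this file are resolved before the values are used. Below is a minimal sketch of that substitution; resolve_refs is a hypothetical helper, and how the .env files and the real environment are merged into one mapping (including which source wins) is left to the caller as an assumption.

```python
import re

_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_refs(text: str, env: dict[str, str]) -> str:
    """Replace each ${VAR} with env[VAR]; fail loudly on a missing variable."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"config references undefined variable {name}")
        return env[name]
    return _REF.sub(substitute, text)


print(resolve_refs('api_key: "${ANTHROPIC_API_KEY}"', {"ANTHROPIC_API_KEY": "sk-test"}))
```

Raising on an undefined variable surfaces a missing key at load time instead of as an authentication error during enrichment.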

Supported languages

Python · JavaScript · TypeScript · Java · Go · PHP · C# · Ruby · Rust · C++


Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| qdrant... already accessed by another instance | Something opened an index directly while the daemon holds it | The daemon is the sole index owner by design: use the CLI/MCP and don't open ~/.codelumen/indexes/* yourself. Stop the daemon with codelumen stop; it auto-starts again on the next command. |
| Search returns nothing on description / developer_queries | Stage 2 hasn't run | Run codelumen enrich, or use the MCP list_pending_enrichments flow. |
| Dimension mismatch on query | Embedding model changed since last index | Delete that project's dir under ~/.codelumen/indexes/ and re-run codelumen index . (or run codelumen index . --reset). |
| First command takes ~15 s | Daemon is loading the model (one time, machine-wide) | Normal on first use; every call after is ~80 ms. Run codelumen doctor to confirm it's warm. |
| Daemon won't start / commands hang | Bad config or port in use | Check ~/.codelumen/daemon.log; set CODELUMEN_DAEMON_PORT if 7711 is taken. |
| 429 rate limit on Voyage / Anthropic | Free tier, low limits | Add billing or lower enrichment.llm_concurrency. |

Status

| Item | Value |
|---|---|
| Version | 3.1.2 |
| Python | 3.11, 3.12 |
| License | MIT |
| Stage 1 (structural) | Stable |
| MCP server | Stable, main shipping surface |
| Stage 2 (enrichment) | Beta, rate-limit-sensitive on free LLM tiers |

Tune enrichment.llm_concurrency, or use the agent-driven enrichment flow if you hit rate limits.


Contributing

Issues and PRs welcome.

# Run tests
pytest

# Build a release
python -m build

See open issues for things to pick up.


Developed and designed by Ahmed Gamil

MIT licensed — see LICENSE.
