Local-first code graph builder with 5-signal hybrid search for AI coding agents
codeloom
"With codeloom, your coding agent knows what to read."
Quick Start · Korean · Japanese · Chinese · German
Why codeloom?
"raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki" — Andrej Karpathy
codeloom builds a queryable code graph and knowledge base from codebases with 10,000+ files and knowledge documents, powered by lightweight local LLM models. Hybrid vector + keyword search, fused with Reciprocal Rank Fusion (RRF) and returned as an MST-connected subgraph, lets coding agents truly understand your entire project instead of just matching keywords. Install it, and Claude Code sees the full picture — no extra tokens, no extra commands, everything runs 100% locally.
Quick Start
```shell
pip install codeloom
cd your-project/
codeloom opencode install   # for OpenCode
# or: codeloom claude install   # for Claude Code
```
Then tell Claude Code or OpenCode:
"Build a code graph for this project"
That's it. Your agent will build the graph, and from then on, consult it before every search. The graph auto-rebuilds when your session ends.
AI Agent Integrations
codeloom integrates with major AI coding agents in one command:
| Agent | Install | What it does |
|---|---|---|
| Claude Code | codeloom claude install | Skill + CLAUDE.md + PreToolUse hook |
| OpenCode | codeloom opencode install | Skill in .opencode/skills/ |
| Codex CLI | codeloom codex install | AGENTS.md + PreToolUse hook |
| Gemini CLI | codeloom gemini install | GEMINI.md + BeforeTool hook |
| Cursor IDE | codeloom cursor install | .cursor/rules/ rule file |
| Windsurf IDE | codeloom windsurf install | .windsurf/rules/ rule file |
| Cline | codeloom cline install | .clinerules file |
| Aider CLI | codeloom aider install | CONVENTIONS.md + .aider.conf.yml |
| MCP Server | claude mcp add codeloom -- codeloom mcp | 5 tools over Model Context Protocol |
Each install does two things: writes a context file with rules, and (where supported) registers a hook that fires before tool calls. To remove: codeloom <platform> uninstall.
Supported Languages
Structural Extraction (20+ languages)
codeloom extracts functions, classes, methods, calls, imports, and inheritance from source code using tree-sitter and native parsers.
| Python | JavaScript | TypeScript | Go |
|---|---|---|---|
| Rust | Java | C | C++ |
| C# | Ruby | Swift | Scala |
| Lua | PHP | Elixir | Kotlin |
| Objective-C | Terraform/HCL | | |
Also extracts structure from config and document formats: YAML, JSON, TOML, Markdown, PDF, HTML, CSV, Shell, R, and more.
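codeloom's extraction is built on tree-sitter across all of these languages. As an illustration of the kind of structural facts it collects (functions, classes, calls, imports, inheritance), here is a sketch for Python source only, using the standard-library `ast` module; the function name is hypothetical, not codeloom's API.

```python
import ast

def extract_symbols(source: str) -> dict:
    """Collect functions, classes, calls, imports, and base classes
    from Python source. Illustrative only: codeloom itself uses
    tree-sitter and native parsers across 20+ languages."""
    tree = ast.parse(source)
    out = {"functions": [], "classes": [], "calls": [], "imports": [], "bases": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            out["classes"].append(node.name)
            # Inheritance edges: record base classes that are plain names
            out["bases"].extend(b.id for b in node.bases if isinstance(b, ast.Name))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            out["calls"].append(node.func.id)
        elif isinstance(node, ast.Import):
            out["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            out["imports"].append(node.module)
    return out
```

Each symbol becomes a node in the graph, and each call/import/inheritance fact becomes an edge between nodes.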
Multilingual Natural Language
Text nodes (docs, comments, markdown) are embedded with intfloat/multilingual-e5-small supporting 100+ natural languages — Korean, Japanese, Chinese, German, French, and more. Search in your language, find results in any language.
Features
Auto-Rebuild
When integrated with AI coding agents (Claude Code, Codex, etc.), codeloom automatically rebuilds the graph when code changes. The Stop/SessionEnd hook detects modified files via git diff and triggers an incremental rebuild in the background — zero manual intervention.
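The change-detection step described above can be sketched as follows; the function names are illustrative, not codeloom's internals.

```python
import subprocess

def parse_name_only(diff_output: str) -> list[str]:
    """Turn `git diff --name-only` output into a list of changed paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]

def modified_files(repo_dir: str = ".") -> list[str]:
    """Ask git which tracked files have changed relative to HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_name_only(result.stdout)
```

A session-end hook would feed this list to an incremental rebuild so only the touched files are re-processed.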
Smart Ignore
codeloom respects ignore patterns from three sources, all using full gitignore spec (negation !, ** globs, directory-only patterns):
| Source | Description |
|---|---|
| Built-in | .git, node_modules, __pycache__, dist, build, etc. |
| .gitignore | Auto-read from project root — your existing git ignores just work |
| .codeloom-ignore | Project-specific overrides for the code graph |
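The last-match-wins semantics of gitignore negation can be sketched in a few lines. This is a deliberately simplified version using `fnmatch`; a faithful implementation of the full gitignore spec (anchoring, `**` globs, directory-only patterns) requires a real matcher such as the `pathspec` library.

```python
from fnmatch import fnmatch

def is_ignored(path: str, patterns: list[str]) -> bool:
    """Last-match-wins ignore check with `!` negation.

    Simplified sketch: only basic globs, no anchoring or
    directory-only patterns.
    """
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        # Match the whole path or just the basename, like an
        # unanchored gitignore pattern.
        if fnmatch(path, pattern) or fnmatch(path.rsplit("/", 1)[-1], pattern):
            ignored = not negated
    return ignored
```

Because later patterns override earlier ones, `["*.log", "!keep.log"]` ignores every log file except `keep.log`.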
Incremental Builds
SHA-256 content hashing per file. Only changed files are re-extracted and re-embedded. Unchanged files are merged from the existing graph — typically 95%+ faster than a full rebuild.
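The hash-and-compare step can be sketched like this; `plan_rebuild` and the manifest shape are illustrative, not codeloom's internals.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Stream a file through SHA-256 so large files never load whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def plan_rebuild(files: list[Path], manifest: dict[str, str]):
    """Split files into changed (re-extract + re-embed) vs unchanged
    (merge from the existing graph). `manifest` maps path -> hash
    recorded by the previous build."""
    changed, unchanged = [], []
    for path in files:
        bucket = unchanged if manifest.get(str(path)) == file_sha256(path) else changed
        bucket.append(path)
    return changed, unchanged
```

Only the `changed` list goes through the expensive extraction and embedding stages, which is where the 95%+ speedup comes from.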
Memory Management
4GB memory budget with stage-wise release. The pipeline generates → stores → frees at each stage: extraction results are freed after graph build, embeddings are streamed in batches and freed after DB write, and the full graph is released after persistence. GC triggers proactively at 75% threshold.
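The budget-and-flush pattern can be sketched as a small tracker; all names here are illustrative, not codeloom's internals, and real usage would measure actual process memory rather than an estimated counter.

```python
class StageBudget:
    """Track an estimated byte budget and signal when to flush.

    Sketch of stage-wise release: a pipeline stage adds work in
    batches, and once estimated usage crosses the threshold (75%,
    matching the GC trigger described above), it persists and frees
    what it holds before continuing.
    """

    def __init__(self, budget_bytes: int = 4 * 1024**3, threshold: float = 0.75):
        self.budget = budget_bytes
        self.threshold = threshold
        self.used = 0

    def add(self, nbytes: int) -> bool:
        """Record nbytes of staged data; True means a flush is due."""
        self.used += nbytes
        return self.used >= self.budget * self.threshold

    def flush(self) -> None:
        """Caller persists staged data, then resets the counter."""
        self.used = 0
```

For example, the embedding stage would call `add()` per batch and write embeddings to the database whenever `add()` returns True, keeping peak memory under the 4GB budget.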
100% Local
No cloud services, no API keys, no telemetry. SQLite + FAISS for storage, sentence-transformers for embeddings. All data stays on your machine.
Hybrid Search with Subgraph Response
Every query returns seed nodes and a subgraph showing how they connect:
Search Pipeline
| Signal | What it finds |
|---|---|
| Vector Search | Semantically similar code and documents (dual-model: code + text) |
| Keyword Search | Exact name matches via FTS5 (BM25) |
Results are fused via Weighted Reciprocal Rank Fusion (RRF), then connected through MST-based shortest paths to reveal how seed nodes relate.
Response Format
```
seeds:
codeloom/core/pipeline.py:71
codeloom/query/embeddings.py:70
edges:
codeloom/core/pipeline.py:71 -calls-> codeloom/core/extract.py:747
codeloom/core/pipeline.py:0 -co_change-> codeloom/query/embeddings.py:0
```

- seeds: node IDs (file:line) found by search
- edges: subgraph connecting the seeds through shortest paths (intermediate nodes appear in the edges)
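An agent consuming this text format can split it into structures with a few lines; this parser is an illustrative helper, not part of codeloom's API.

```python
def parse_response(text: str) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Parse the seeds/edges text response shown above.

    An edge like `a.py:1 -calls-> b.py:2` becomes the tuple
    (source, relation, target).
    """
    seeds: list[str] = []
    edges: list[tuple[str, str, str]] = []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line == "seeds:":
            section = "seeds"
        elif line == "edges:":
            section = "edges"
        elif line and section == "seeds":
            seeds.append(line)
        elif line and section == "edges":
            src, rest = line.split(" -", 1)
            relation, dst = rest.split("-> ", 1)
            edges.append((src, relation, dst))
    return seeds, edges
```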
CLI Reference
All commands output compact text by default (designed for AI agent consumption).
| Command | Description |
|---|---|
| build <dir> | Build code graph (--incremental) |
| search <query> | Hybrid vector + keyword search with subgraph (--top-k, --fast) |
| search-vector <query> | Vector similarity only (code + text dual model) |
| search-keyword <query> | FTS5 keyword matching only (BM25 ranking) |
| query | Interactive search REPL |
| communities | List and search communities (--search, --level) |
| stats | Graph statistics |
| node <id> | Node details with fuzzy matching |
| export | Export as JSON, GraphML, or D3.js |
| visualize | Interactive HTML visualization |
| clean | Remove .codeloom/ database |
| doctor | Check installation health |
| mcp | Start MCP server (stdio) |
| claude install\|uninstall | Manage Claude Code integration |
| codex install\|uninstall | Manage Codex CLI integration |
| gemini install\|uninstall | Manage Gemini CLI integration |
| cursor install\|uninstall | Manage Cursor IDE integration |
| windsurf install\|uninstall | Manage Windsurf IDE integration |
| cline install\|uninstall | Manage Cline integration |
| aider install\|uninstall | Manage Aider CLI integration |
| opencode install\|uninstall | Manage OpenCode integration |
Performance
Benchmarks on codeloom's own codebase (~3,500 lines, 90 files, 1,300 nodes):
| Operation | Time |
|---|---|
| Full build | ~14s |
| Incremental (changes) | ~4s |
| Incremental (no changes) | ~0.4s |
| Cold search (dual model) | ~2.8s |
| Cold search (--fast) | ~0.2s |
| Warm search | ~0.08s |
| Cached search | <1ms |
- Embedding models: ~180MB, downloaded once to ~/.codeloom/models/
- Database: ~2MB (SQLite + FTS5 + FAISS indices)
- Incremental builds: SHA-256 hashing, 95%+ faster than full rebuild
Requirements
- Python 3.10+
- ~180MB disk for embedding models (cached on first use)
```shell
# Optional: PDF extraction
pip install codeloom[docs]
```
Development
```shell
pip install -e ".[dev]"
pytest
ruff check codeloom/
```
License
MIT License. See LICENSE for details.
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
File details
Details for the file codeloom-0.1.0.tar.gz.
File metadata
- Download URL: codeloom-0.1.0.tar.gz
- Upload date:
- Size: 408.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9bac59a33ce7d5f94a0cb0a97a17e0c956e9fc0145235c88aaaeca9113672d45 |
| MD5 | 1ad59765eeb28fd594fefbc5cfd1f7bc |
| BLAKE2b-256 | c0eb01718ccbdcd55ea993e55fd80e7c43cd0dd98ba3b945cc7df06d08ca62f8 |
Provenance

The following attestation bundles were made for codeloom-0.1.0.tar.gz:

Publisher: release.yml on algodesigner/codeloom

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: codeloom-0.1.0.tar.gz
- Subject digest: 9bac59a33ce7d5f94a0cb0a97a17e0c956e9fc0145235c88aaaeca9113672d45
- Sigstore transparency entry: 1552732352
- Sigstore integration time:
- Permalink: algodesigner/codeloom@01d52166ecccda4d5df046974541c383746d02b2
- Branch / Tag: refs/heads/main
- Owner: https://github.com/algodesigner
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@01d52166ecccda4d5df046974541c383746d02b2
- Trigger Event: workflow_dispatch
File details
Details for the file codeloom-0.1.0-py3-none-any.whl.
File metadata
- Download URL: codeloom-0.1.0-py3-none-any.whl
- Upload date:
- Size: 193.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e50659d6247de900f0ce5eb7803e0fcdbbf653aa3dc7cefb5f442bce5f4d2b72 |
| MD5 | 42ce42bd8fa0164e9f4ecc6d2ed623b6 |
| BLAKE2b-256 | 350c51bb68544c2a6ff4079e26670d437bcdf7800183197810be0ab07c7d867a |
Provenance

The following attestation bundles were made for codeloom-0.1.0-py3-none-any.whl:

Publisher: release.yml on algodesigner/codeloom

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: codeloom-0.1.0-py3-none-any.whl
- Subject digest: e50659d6247de900f0ce5eb7803e0fcdbbf653aa3dc7cefb5f442bce5f4d2b72
- Sigstore transparency entry: 1552732359
- Sigstore integration time:
- Permalink: algodesigner/codeloom@01d52166ecccda4d5df046974541c383746d02b2
- Branch / Tag: refs/heads/main
- Owner: https://github.com/algodesigner
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@01d52166ecccda4d5df046974541c383746d02b2
- Trigger Event: workflow_dispatch