# greploom

Semantic code search with graph-aware context retrieval, built on treeloom.
greploom reads a treeloom Code Property Graph (CPG JSON), indexes it for hybrid search (vector embeddings + BM25), and returns structurally complete context neighborhoods for LLM consumption. Vector search finds the right neighborhood; graph traversal expands it to include callers, callees, imports, and data-flow sources.
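The changelog below names Reciprocal Rank Fusion (RRF) as the scheme that merges the vector and BM25 rankings. A minimal sketch of the general RRF technique, with made-up hit lists (not greploom's internals):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hit lists from the two retrievers
vector_hits = ["auth.login", "auth.verify_token", "db.connect"]
bm25_hits = ["auth.login", "http.route", "auth.verify_token"]

print(rrf([vector_hits, bm25_hits]))
# auth.login ranks first: it leads both lists
```

The constant `k` (conventionally 60) damps the influence of any single retriever's top ranks, which is why RRF works without score normalization.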
## Installation
```bash
pip install greploom          # Core — CLI and search engine
pip install "greploom[mcp]"   # Adds MCP server (requires fastmcp)
```
The default embedding model is `nomic-embed-text` via a local Ollama instance. Any OpenAI-compatible embedding endpoint works via `GREPLOOM_EMBEDDING_URL`.
## Quick Start
```bash
# 1. Build a CPG with treeloom
treeloom build src/ -o cpg.json

# 2. Index for search (creates .greploom/index.db)
greploom index cpg.json

# 3. Search
greploom query "where is authentication handled?"
```
## How It Works
```text
Source code
    |
    v
treeloom build  -->  CPG (JSON)
    |
    v
greploom index  -->  vector store + BM25 index (SQLite)
    |
    v
greploom query "how is auth handled?"  -->  context bundle
    |
    v
LLM agent receives focused, graph-aware context
```
Storage is a single SQLite file using `sqlite-vec` for vectors and FTS5 for BM25. No server, no Docker; the index is portable and inspectable.
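FTS5's built-in BM25 ranking can be inspected directly from the stdlib `sqlite3` module (most CPython builds ship FTS5). A generic illustration of the mechanism, not greploom's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# A small FTS5 table; greploom's real column layout may differ.
db.execute("CREATE VIRTUAL TABLE docs USING fts5(name, summary)")
db.executemany("INSERT INTO docs VALUES (?, ?)", [
    ("login", "validate user credentials and issue a session token"),
    ("render_dashboard", "draw widgets on the dashboard page"),
])
# bm25() returns a negative score; ORDER BY rank puts the best match first.
rows = db.execute(
    "SELECT name, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY rank",
    ("credentials",),
).fetchall()
print(rows[0][0])  # prints "login"
```

Because everything lives in one file, the index can be inspected with any SQLite client.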
## CLI Reference
### greploom index
Build or update the search index from a treeloom CPG JSON file.
```bash
greploom index CPG_JSON [OPTIONS]
```

Arguments:

```text
CPG_JSON    Path to the treeloom CPG JSON file
```

Options:

```text
--db PATH               SQLite database path (default: .greploom/index.db)
--tier [fast|enhanced]  Summary tier (default: enhanced)
--model TEXT            Embedding model name
--ollama-url URL        Ollama server URL
--force                 Re-index all nodes, ignoring content hashes
```
Re-indexing is incremental by default — only nodes whose content has changed are re-embedded. Use `--force` to rebuild from scratch.
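The content-hash check can be pictured like this (an illustrative sketch, not greploom's internals; the function and storage names are invented):

```python
import hashlib

def should_embed(node_id, node_text, stored, force=False):
    """Skip nodes whose content hash is unchanged; force re-embeds everything."""
    digest = hashlib.sha256(node_text.encode("utf-8")).hexdigest()
    if not force and stored.get(node_id) == digest:
        return False  # unchanged — reuse the existing embedding
    stored[node_id] = digest
    return True

stored = {}
print(should_embed("func:a", "def a(): pass", stored))      # True  — new node
print(should_embed("func:a", "def a(): pass", stored))      # False — unchanged
print(should_embed("func:a", "def a(): return 1", stored))  # True  — content changed
```

Hashing the node text rather than file mtimes means a `git checkout` that restores identical content costs nothing to re-index.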
Summary tiers:

- `fast` — function signatures only; fastest to build
- `enhanced` — signatures, parameters, callees, decorators, and class methods; better recall
```bash
# Index with defaults
greploom index cpg.json

# Use a custom database path and force full re-index
greploom index cpg.json --db /tmp/myproject.db --force

# Point at a non-default Ollama instance
greploom index cpg.json --ollama-url http://gpu-box:11434
```
### greploom query
Search the index and return graph-aware context.
```bash
greploom query [QUERY_TEXT] [OPTIONS]
```

Arguments:

```text
QUERY_TEXT    Natural language or symbol query (mutually exclusive with --node)
```

Options:

```text
--db PATH                SQLite database path (default: .greploom/index.db)
--cpg PATH               CPG JSON path for graph expansion
--budget INT             Token budget (default: 8192)
--top-k INT              Number of search results (default: 5)
--format [context|json]  Output format (default: context)
--model TEXT             Embedding model name
--ollama-url URL         Ollama server URL
--node NODE_ID           CPG node ID for direct lookup (repeatable; bypasses search)
```
Without `--cpg`, the query returns ranked search hits with scores and summaries. With `--cpg`, hits are expanded through the graph and assembled into a context bundle trimmed to the token budget. When `--format json` is used with text search and `--cpg`, the output is `{"metadata": {...}, "blocks": [...]}`, including index metadata. The `--node` JSON output is a bare list of blocks (no metadata envelope, since the index is not consulted).

Use `--node` to retrieve context for known CPG node IDs directly, bypassing the search step entirely. Requires `--cpg`.
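Since the two JSON shapes differ, a consumer can normalize them. A small sketch based only on the shapes described above (the block contents shown are hypothetical):

```python
import json

def load_bundle(payload):
    """Return (metadata, blocks) for either greploom query JSON shape:
    text search with --cpg yields {"metadata": ..., "blocks": [...]};
    --node yields a bare list of blocks."""
    data = json.loads(payload)
    if isinstance(data, list):
        return {}, data  # --node output: no metadata envelope
    return data.get("metadata", {}), data.get("blocks", [])

meta, blocks = load_bundle('{"metadata": {"model": "nomic-embed-text"}, "blocks": [{"id": "f1"}]}')
print(meta["model"], len(blocks))  # nomic-embed-text 1

meta, blocks = load_bundle('[{"id": "f1"}, {"id": "f2"}]')
print(meta, len(blocks))  # {} 2
```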
```bash
# Simple search — ranked hits with summaries
greploom query "user authentication"

# Full graph-expanded context, ready for an LLM
greploom query "where is authentication handled?" --cpg cpg.json

# JSON output with index metadata
greploom query "UserService" --cpg cpg.json --format json | jq '.metadata'

# Direct lookup by CPG node ID (no index required)
greploom query --node "function:src/auth.py:10:0:3" --cpg cpg.json

# Narrow token budget for smaller context windows
greploom query "error handling" --cpg cpg.json --budget 4096
```
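The `--budget` trimming behaves roughly like a greedy cutoff. A sketch assuming whitespace tokenization (greploom's real tokenizer and block ordering are not specified here):

```python
def trim_to_budget(blocks, budget, count=lambda s: len(s.split())):
    """Keep blocks in priority order until the token budget is exhausted."""
    kept, used = [], 0
    for block in blocks:
        cost = count(block)
        if used + cost > budget:
            break  # stop at the first block that would overflow the budget
        kept.append(block)
        used += cost
    return kept

blocks = ["def login(user): ...", "def verify(token): ...", "class Session: ..."]
print(trim_to_budget(blocks, budget=7))
```

A smaller `--budget` therefore trades neighborhood breadth for a context that fits tighter model windows.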
### greploom serve
Start the MCP server.
```bash
greploom serve [OPTIONS]
```

Options:

```text
--db PATH                            SQLite database path
--cpg PATH                           Default CPG JSON path
--host TEXT                          Host to bind (default: 0.0.0.0)
--port INT                           Port to listen on (default: 8901)
--transport [streamable-http|stdio]  MCP transport (default: streamable-http)
```
```bash
# Start the MCP server on default port 8901
greploom serve --db .greploom/index.db --cpg cpg.json

# stdio transport for direct agent integration
greploom serve --transport stdio
```
## MCP Server
The MCP server exposes three tools:

- `search_code` — search code semantically and return graph-aware context.
  Parameters: `query` (required), `cpg_path` (required), `db_path`, `budget`, `top_k`
- `get_node_context` — return graph-aware context for specific CPG node IDs, bypassing search.
  Parameters: `node_ids` (required), `cpg_path` (required), `budget`
- `index_code` — build or update the search index from a CPG JSON file.
  Parameters: `cpg_path` (required), `db_path`, `tier`
Example MCP server URL for agent configuration: `http://localhost:8901/mcp`
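Many MCP clients accept an HTTP server entry shaped roughly like the following; the exact schema varies by client, and the server name `greploom` here is just a label:

```json
{
  "mcpServers": {
    "greploom": {
      "url": "http://localhost:8901/mcp"
    }
  }
}
```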
## Configuration
All settings can be provided via environment variables. CLI flags override environment variables for individual commands.
| Variable | Default | Description |
|---|---|---|
| `GREPLOOM_EMBEDDING_URL` | `http://localhost:11434` | Ollama or OpenAI-compatible endpoint |
| `GREPLOOM_EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name |
| `GREPLOOM_DB_PATH` | `.greploom/index.db` | SQLite database path |
| `GREPLOOM_TOKEN_BUDGET` | `8192` | Default token budget for context assembly |
| `GREPLOOM_SUMMARY_TIER` | `enhanced` | Summary tier (`fast` or `enhanced`) |
To use an OpenAI-compatible embedding API instead of Ollama:
```bash
export GREPLOOM_EMBEDDING_URL=https://api.openai.com/v1
export GREPLOOM_EMBEDDING_MODEL=text-embedding-3-small
```
## Relationship to treeloom
greploom reads treeloom's CPG JSON format but does not import treeloom at runtime. `greploom index` reads the CPG JSON to build the search index; `greploom query` reads both the index and the CPG JSON for graph expansion. Any tool that produces treeloom-compatible CPG JSON will work.
greploom's query output includes structural summaries and graph context (callers, callees, parameters), not raw source code. Source text inclusion is a treeloom CPG concern — when treeloom adds source spans to CPG nodes, greploom will surface them automatically.
## Changelog

### Version 0.2.0
- `--node` mode for `greploom query`: retrieve graph context for known CPG node IDs without running a search query.
- Index metadata: embedding model, greploom version, and timestamps are stored in the index and surfaced in JSON output (text search with `--cpg --format json` wraps results in `{"metadata": ..., "blocks": ...}`; `--node` JSON remains a bare list).
- MCP server: added `get_node_context` tool (direct node ID lookup, parallel to `--node` in the CLI).
### Version 0.1.0
Initial release — full indexing pipeline (summarize, embed, store), hybrid search with RRF, graph expansion for context neighborhoods, token budget management, CLI (`index`/`query`/`serve`), and MCP server with `search_code`/`index_code` tools.
## LLM Documentation
This project provides `llms.txt` and `llms-full.txt` files following the llmstxt.org specification for LLM-friendly documentation.
## License
MIT