Free local semantic code search using Ollama
skygrep is a fully-offline semantic code-search CLI for natural-language
questions about your codebase. Ask in plain English, get the right file
and line range. Indexing, retrieval, and optional answer synthesis all
run locally against your own Ollama server. No remote service, no
subscription, no data leaves your machine.
In 30 seconds
$ pip install skylakegrep
$ for m in nomic-embed-text qwen2.5:1.5b qwen2.5:3b; do ollama pull "$m"; done # one-time
$ cd ~/your-project
$ skygrep "where is the cascade tau threshold defined?"
=== skylakegrep/src/storage.py:578-602 (score: 0.781) ===
CASCADE_DEFAULT_TAU = 0.015
def cascade_search(...
[0.51s · cascade=cheap (gap=0.020 τ=0.015) · index 20s ago · 36 files · L2 symbols on · graph prior on]
That is the entire happy path. The first query in a fresh project completes in under 1 s via a ripgrep fallback while a background process builds the semantic index. Every later query uses the full cascade with a local LLM kept warm in memory.
Quickstart
pip install skylakegrep
for m in nomic-embed-text qwen2.5:1.5b qwen2.5:3b; do ollama pull "$m"; done # ~3 GB total
# One-time: register skylakegrep with detected LLM CLIs
# (Claude Code / Codex / OpenCode / Gemini CLI / Cursor)
skygrep setup
cd /your/project
skygrep "<your question>"
skygrep doctor # verify runtime + models + index + integrations
skygrep stats # show current project's index info
skygrep derives the project root from git rev-parse --show-toplevel
(falling back to the working directory) and keeps a per-project index
under ~/.skylakegrep/repos/. Subcommand names (index, doctor,
stats, watch, serve, setup, enrich) take precedence — anything
else is treated as a query, so skygrep "stats and metrics" (quoted) is
unambiguous.
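The root lookup and the subcommand-versus-query rule described above can be sketched as follows. This is a minimal illustration, not skygrep's actual code: the function names `resolve_project_root` and `classify` are hypothetical, and the injectable `run` parameter exists only so the logic can be exercised without git installed.

```python
import subprocess
from pathlib import Path

# Subcommand names as listed in the paragraph above.
SUBCOMMANDS = {"index", "doctor", "stats", "watch", "serve", "setup", "enrich"}


def resolve_project_root(cwd, run=subprocess.run):
    """Return the git toplevel for cwd, or cwd itself when git gives no answer."""
    try:
        proc = run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=str(cwd),
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is missing or cwd is unusable: fall back to the working directory.
        return Path(cwd)
    top = proc.stdout.strip()
    if proc.returncode == 0 and top:
        return Path(top)
    return Path(cwd)


def classify(argv):
    """Return ("subcommand", name) or ("query", text) for a list of CLI words."""
    if argv and argv[0] in SUBCOMMANDS:
        return ("subcommand", argv[0])
    return ("query", " ".join(argv))
```

Under this sketch, `classify(["stats"])` selects the subcommand while `classify(["stats and metrics"])` is a query, because the shell passes the quoted string as a single word that is not in the subcommand set.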
skygrep setup writes a small markdown snippet to each detected LLM
CLI's user-level instructions file (e.g. ~/.claude/CLAUDE.md,
~/.codex/AGENTS.md, ~/.gemini/GEMINI.md) telling the agent to
prefer skygrep for natural-language code search and fall back to rg
otherwise. Snippets are delimited by markers; skygrep setup --uninstall
removes them cleanly without touching your other instructions.
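Marker-delimited snippets make install and uninstall safe to repeat. The sketch below shows the general technique on plain strings. The marker text and both function names are assumptions made for illustration; the README does not state which markers skygrep actually writes.

```python
# Hypothetical marker strings; the real ones are not given in this README.
BEGIN = "<!-- skygrep:begin -->"
END = "<!-- skygrep:end -->"


def remove_snippet(text):
    """Delete every BEGIN..END block and leave all other lines untouched."""
    kept = []
    skipping = False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == BEGIN:
            skipping = True
            continue
        if stripped == END:
            skipping = False
            continue
        if not skipping:
            kept.append(line)
    return "".join(kept)


def install_snippet(text, body):
    """Drop any existing block, then append a fresh one, so reinstalling is idempotent."""
    base = remove_snippet(text)
    if base and not base.endswith("\n"):
        base += "\n"
    return f"{base}{BEGIN}\n{body.rstrip()}\n{END}\n"
```

Because installation always removes an existing block first, running it twice yields the same file, and removal restores the user's original instructions byte for byte.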
Performance
Public, reproducible benchmark on three popular open-source codebases,
with 30 hand-labelled questions in total (10 per repo). Anyone can clone
the repos and re-run it with one command; see
benchmarks/public_oss_bench.py and docs/parity-benchmarks.md.
| Repo | Language | LOC ≈ | Tasks | skygrep recall | rg recall | Token reduction |
|---|---|---|---|---|---|---|
| Django | Python | 524 K | 10 | 10 / 10 | 10 / 10 | 703 × |
| Tokio | Rust | 80 K | 10 | 10 / 10 | 10 / 10 | 61 × |
| React | JS+TS | 270 K | 10 | 10 / 10 | 10 / 10 | 773 × |
| Aggregate | | | 30 | 30 / 30 (100 %) | 30 / 30 | 60×–770× |
Honest reading:
- Hit-rate parity across all three (10/10 each on Django, Tokio, React). The two React misses that previously surfaced (test-fixture path bias on react-007, devtools-vs-reconciler on react-010) were resolved by upgrading the embedding substrate to bge-m3 and extending the non-canonical-path filter; see docs/parity-benchmarks.md for the failure analysis and the resolution.
- rg "100 %" is a recall-ceiling baseline. It returns 20 M+ tokens per task (term-OR scan with 2-line context windows). The answer is in the dump, but the agent has to read all 20 M tokens to find it.
- skygrep delivers the answer ranked top-10 in 30 / 30 cases while emitting 60×–770× less context for the agent's LLM round-trip downstream. That is the user-facing claim.
Recall counts a query as a hit when at least one returned chunk
matches the canonical expected path or any of the question's
expected_alternatives.
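The hit rule can be written down in a few lines. This is a sketch of the rule as stated, assuming "matches" means exact path equality; the real benchmark script may normalise or suffix-match paths, and the function names here are hypothetical.

```python
def is_hit(returned_paths, expected_path, expected_alternatives=()):
    """True when any returned chunk path is the canonical path or an accepted alternative."""
    accepted = {expected_path, *expected_alternatives}
    return any(path in accepted for path in returned_paths)


def recall(tasks):
    """tasks: iterable of (returned_paths, expected_path, expected_alternatives).

    Returns (hits, total) so the result reads like the "10 / 10" cells in the table.
    """
    tasks = list(tasks)
    hits = sum(is_hit(returned, expected, alts) for returned, expected, alts in tasks)
    return hits, len(tasks)
```

Note that a query counts once no matter how many of its returned chunks match, so recall here measures whether the answer is present in the top-K, not how highly it is ranked.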
Cascade-only ablation (Django, in-bench numbers)
| Tier | What it does | Cold first query | Warm avg s/q |
|---|---|---|---|
| cascade ⭐ default | rg prefilter → file-mean cosine → escalate to HyDE only when uncertain | ~10 s (Ollama loads) | 0.5–2 s (warm) |
| cascade-cheap | early-exit only, no LLM call | <1 s | <0.2 s |
| cascade + small HyDE (OLLAMA_HYDE_MODEL=qwen2.5:1.5b) | uses 1.5 B for HyDE: faster, slightly lower recall | ~5 s | 2.0 s |
| chunk + rerank | classic chunk cosine + cross-encoder rerank | ~10 s + 30 s reranker load | ~10 s |
| ripgrep raw | rg -il -F token-OR (file membership only) | <1 s | <1 s |
The cascade is bimodal by design: ~80 % of queries take the cheap path
(file-mean cosine, no LLM call) and complete in under 200 ms warm; the
remaining ~20 % escalate to a HyDE-augmented retrieval and complete in
the 1–2 s band. With Ollama models kept resident in memory
(OLLAMA_KEEP_ALIVE=-1, the 0.6.0 default) the second query in a shell
session no longer pays the 5–10 s Ollama cold-load.
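The cheap-versus-escalate split comes down to one comparison: the gap between the top two file-mean cosine scores against the threshold τ. The sketch below assumes that reading of `--cascade-tau` and of the `gap=0.020 τ=0.015` status line; the function name and the handling of fewer than two candidates are assumptions, not documented behaviour.

```python
CASCADE_DEFAULT_TAU = 0.015  # default value shown in the sample output


def cascade_decision(file_scores, tau=CASCADE_DEFAULT_TAU):
    """Return ("cheap", gap) when the top-1/top-2 gap exceeds tau, else ("escalate", gap).

    file_scores maps a file path to its file-mean cosine score.
    """
    ranked = sorted(file_scores.values(), reverse=True)
    if len(ranked) >= 2:
        gap = ranked[0] - ranked[1]
    elif ranked:
        # Assumption: a lone candidate has no rival, so treat it as confident.
        gap = float("inf")
    else:
        # Assumption: no candidates at all means the cheap path has nothing to return.
        gap = 0.0
    return ("cheap" if gap > tau else "escalate", gap)
```

With scores of 0.781 and 0.761 the gap is about 0.020, which exceeds 0.015, so the query takes the cheap path and no LLM call is made.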
A second self-test benchmark compares
skygrep against a simulated grep agent over 30 navigation tasks against
this repo: 30 / 30 recall at top-k 10 with 2× total-token reduction
and 2.9× context-token reduction vs the agent baseline.
How it works
your query
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ 1. ripgrep prefilter Fast surface-token narrowing │
│ 2. file-mean cosine Rank files by mean of chunk vectors │
│ 3. cascade decision Confident? return cheap. Else escalate│
│ 4. HyDE escalation LLM rewrites query → cosine union │
│ 5. symbol + graph Tree-sitter symbol boost + PageRank │
│ (L2 + L4) tiebreaker on near-tied candidates │
└─────────────────────────────────────────────────────────────────┘
│
▼
top-K chunks (path · line range · score · snippet)
Each layer is offline-paid and query-time-free where possible: embeddings are precomputed at index time, symbol extraction runs once per project, the file-export PageRank is one regex pass over the corpus. Only the cascade's HyDE-escalation path makes a query-time LLM call, and it only runs on the ~20 % of queries the cheap path is uncertain about.
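To make the "precomputed at index time" point concrete, here is a small pure-Python sketch of step 2, file-mean cosine. It assumes each file's chunk embeddings are already available as lists of floats; the function names are hypothetical and a real implementation would more likely use NumPy arrays.

```python
import math


def mean_vector(chunk_vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(chunk_vectors)
    dim = len(chunk_vectors[0])
    return [sum(vec[i] for vec in chunk_vectors) / count for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_file_means(chunks_by_file):
    """Index-time step: one mean vector per file, skipping files with no chunks."""
    return {path: mean_vector(vecs) for path, vecs in chunks_by_file.items() if vecs}


def rank_files(query_vector, file_means):
    """Query-time step: score every file mean against the query, best first."""
    scored = [(cosine(query_vector, mean), path) for path, mean in file_means.items()]
    return [(path, score) for score, path in sorted(scored, reverse=True)]
```

Only `rank_files` runs per query, and it costs one cosine per file rather than one per chunk, which is what keeps the cheap path fast.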
The full architecture diagram and module-by-module walk-through are at
docs/skylakegrep-0.1.0.md and
docs/roadmap.md.
When to use what
| You want | Use |
|---|---|
| Find code by concept ("how does X work?") | skygrep "<query>" |
| Find code with a known token | rg <token> (it's faster, no setup) |
| Synthesize an answer with citations | skygrep "<query>" --answer |
| Decompose a broad question | skygrep "<query>" --agentic --max-subqueries 3 --answer |
| Machine-readable output for an agent | skygrep "<query>" --json |
| Re-rank candidates with a cross-encoder | skygrep "<query>" --no-cascade --rerank |
| Continuously index a watched dir | skygrep watch /path |
| Keep the cross-encoder warm across queries | skygrep serve & skygrep "<q>" --daemon-url http://127.0.0.1:7878 |
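For the machine-readable row, an agent-side wrapper might look like the sketch below. It uses only the `--json` and `--top` flags documented in this README and relies solely on the statement that `--json` emits a JSON array; the fields inside each array element are not specified here, so the sketch does not assume any. The function names are hypothetical.

```python
import json
import subprocess


def build_command(query, top=5):
    """Build the argv for a JSON search, using only flags listed in this README."""
    return ["skygrep", query, "--json", "--top", str(top)]


def parse_results(stdout):
    """Parse skygrep's JSON output and check that it is an array."""
    data = json.loads(stdout)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array from skygrep --json")
    return data


def search(query, top=5):
    """Run skygrep and return the parsed result list (requires skygrep on PATH)."""
    proc = subprocess.run(
        build_command(query, top),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_results(proc.stdout)
```

Passing the query as a single list element avoids shell quoting problems, which matters because an unquoted first word could otherwise be mistaken for a subcommand.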
Configuration
| Variable | Default | Effect |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Ollama server URL. |
| OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model. Switching requires skygrep index --reset. |
| OLLAMA_LLM_MODEL | qwen2.5:3b | Used for --answer and --agentic. |
| OLLAMA_HYDE_MODEL | qwen2.5:3b | Used for cascade-escalation HyDE. Falls back to OLLAMA_LLM_MODEL if not installed. Set to qwen2.5:1.5b for ~30 % speedup at the cost of 1 task on 16-task Rust. |
| OLLAMA_KEEP_ALIVE | -1 | Passed to every Ollama call. -1 keeps models resident indefinitely (recommended). |
| SKYGREP_DB_PATH | per-project | When set, skygrep treats the index as curated and disables auto-mutation. |
| SKYGREP_AUTO_PULL | unset | Set yes to auto-ollama pull missing models without prompting. |
| SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS | 30 | Skip the mtime scan if the previous refresh ran more recently. |
| SKYGREP_RERANK_MODEL | mixedbread-ai/mxbai-rerank-large-v2 | Cross-encoder for --rerank. |
| SKYGREP_RERANK_POOL | 50 | Candidate pool before reranking. |
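As a reading aid, the table's defaults can be expressed as a small loader. This is a sketch under stated assumptions: `load_config` and `should_skip_refresh` are hypothetical names, the type conversions are guesses about how the values are consumed, and the throttle check is one plausible reading of "ran more recently".

```python
import os

# Defaults copied from the configuration table.
DEFAULTS = {
    "OLLAMA_URL": "http://localhost:11434",
    "OLLAMA_EMBED_MODEL": "nomic-embed-text",
    "OLLAMA_LLM_MODEL": "qwen2.5:3b",
    "OLLAMA_HYDE_MODEL": "qwen2.5:3b",
    "OLLAMA_KEEP_ALIVE": "-1",
    "SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS": "30",
    "SKYGREP_RERANK_MODEL": "mixedbread-ai/mxbai-rerank-large-v2",
    "SKYGREP_RERANK_POOL": "50",
}


def load_config(env=None):
    """Merge environment overrides onto the documented defaults."""
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    cfg["SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS"] = float(
        cfg["SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS"]
    )
    cfg["SKYGREP_RERANK_POOL"] = int(cfg["SKYGREP_RERANK_POOL"])
    # None stands for the per-project default location.
    cfg["SKYGREP_DB_PATH"] = env.get("SKYGREP_DB_PATH")
    # Only the literal value "yes" enables auto-pull, per the table.
    cfg["SKYGREP_AUTO_PULL"] = env.get("SKYGREP_AUTO_PULL") == "yes"
    return cfg


def should_skip_refresh(last_refresh, now, throttle_seconds):
    """True when the previous refresh is newer than the throttle window."""
    return (now - last_refresh) < throttle_seconds
```

Passing a plain dict as `env` keeps the loader easy to test without touching the real process environment.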
Releases
This is the first public release of skylakegrep. See
docs/skylakegrep-0.1.0.md for the full
description of capabilities, architecture, CLI flags, and
environment variables.
- 0.1.0: first public release. LLM-driven query routing
(filename / lexical / semantic cascade tiers, intent-aware
merge), confidence-gated semantic cascade with HyDE escalation +
cross-encoder rerank + PageRank tiebreaker, lazy PDF / docx
content extraction (pdftotext + pypdf fallback, optional --ocr),
framed Pygments-highlighted card rendering, four --detail levels,
skygrep setup auto-registration with major LLM CLIs (Claude Code,
Codex, OpenCode, Gemini CLI, Cursor). PolyForm Noncommercial 1.0.0 license.
CLI reference
skygrep setup [--list|--uninstall|--yes] # register with Claude Code / Codex / OpenCode / Gemini / Cursor
skygrep "<query>" [OPTIONS] # bare-form search
skygrep search "<query>" [OPTIONS] # explicit search
skygrep doctor # health check
skygrep stats # project index info
skygrep index [PATH] [--reset] # explicit reindex
skygrep watch [PATH] --interval N # poll for changes
skygrep serve [--host H] [--port P] # warm-reranker daemon
skygrep enrich [--max N] [--batch B] # opt-in doc2query enrichment
skygrep search options
| Option | Default | Effect |
|---|---|---|
| -m, -n, --top | 5 | Number of final results. |
| --json | off | Emit a JSON array; suppresses human formatting. |
| --answer | off | Synthesize an answer from retrieved snippets via Ollama. |
| --content / --no-content | on | Show or hide snippet bodies in human output. |
| --language | none | Restrict to one or more language keys; repeatable. |
| --include / --exclude | none | Glob filter (repeatable). |
| --cascade / --no-cascade | on | Confidence-gated retrieval. Off = chunk-only legacy path. |
| --cascade-tau | 0.015 | Top-1 / top-2 file-mean cosine gap above which to early-exit. |
| --rerank / --no-rerank | on | Cross-encoder rerank on the non-cascade path. |
| --rerank-pool | 50 | Candidate pool before reranking. |
| --rerank-model | env or default | HuggingFace cross-encoder id. |
| --hyde / --no-hyde | off | Force HyDE outside the cascade (rare; cascade decides per query). |
| --multi-resolution / --no-multi-resolution | on | File-level cosine top-N → chunk-level inside those files. |
| --file-top | 30 | Files surfaced by file-level retrieval. |
| --lexical-prefilter / --no-lexical-prefilter | on | Use ripgrep to narrow the candidate file set. |
| --lexical-root | cwd / git toplevel | Root directory ripgrep scans. |
| --lexical-min-candidates | 2 | Fall back to corpus-wide cosine when ripgrep returns fewer files. |
| --rank-by | chunk | chunk (per-file diversity cap) or file (one chunk per file). |
| --auto-index / --no-auto-index | on | Auto-build the index on first query and refresh on mtime change. |
| --daemon-url | none | Send the search to a running skygrep serve daemon. |
| --agentic | off | Decompose into subqueries via Ollama before search. |
| --max-subqueries | 3 | Upper bound on agentic subqueries. |
| --semantic-only | off | Skip lexical reranking; rank by cosine alone. |
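The two `--rank-by` modes differ only in how many chunks a single file may contribute. The sketch below illustrates that idea; `rank_results` is a hypothetical name and the cap of 2 chunks per file is an invented value, since the README does not state the real cap.

```python
def rank_results(chunks, rank_by="chunk", top=5, per_file_cap=2):
    """Select the top results from (path, score) pairs.

    rank_by="file" keeps one chunk per file; rank_by="chunk" keeps up to
    per_file_cap chunks per file (the cap value is an assumption).
    """
    if rank_by not in ("chunk", "file"):
        raise ValueError(f"unknown rank_by: {rank_by!r}")
    limit = 1 if rank_by == "file" else per_file_cap
    taken = {}
    selected = []
    # Visit chunks best-first; skip any file that already used its allowance.
    for path, score in sorted(chunks, key=lambda item: item[1], reverse=True):
        if taken.get(path, 0) >= limit:
            continue
        taken[path] = taken.get(path, 0) + 1
        selected.append((path, score))
        if len(selected) == top:
            break
    return selected
```

File mode trades depth for breadth: it surfaces more distinct files in the same number of results, which suits navigation questions where the goal is to find the right file first.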
Capability matrix (every feature, when introduced)
| Capability | Since |
|---|---|
| Semantic code search via local Ollama | 0.1.0 |
| Tree-sitter chunking + line-window fallback | 0.2.0 |
| .gitignore / .skygrepignore hygiene | 0.2.0 |
| Incremental indexing (mtime-based) | 0.2.0 |
| Stale row cleanup | 0.2.0 |
| Watch mode | 0.2.0 |
| Hybrid lexical + semantic ranking | 0.2.0 |
| Stable JSON output | 0.2.0 |
| Local answer mode | 0.2.0 |
| Local agentic decomposition | 0.2.0 |
| Cross-encoder rerank | 0.3.0 |
| Asymmetric query/document embedding prefixes | 0.3.0 |
| HyDE query rewriting | 0.3.0 |
| Multi-resolution retrieval | 0.3.0 |
| Lexical prefilter (ripgrep first stage) | 0.3.0 |
| File-rank (one chunk per file) | 0.3.0 |
| Daemon mode | 0.3.0 |
| Quantisation / device knobs | 0.3.0 |
| Confidence-gated cascade | 0.3.0 (default in 0.4.0) |
| Bare-form invocation skygrep "<q>" | 0.4.0 |
| Per-project auto-index | 0.4.0 |
| skygrep doctor health check | 0.4.0 |
| Ripgrep fallback for first query | 0.4.1 |
| Symbol-aware indexing (L2) | 0.5.0 |
| doc2query enrichment (L3, opt-in via skygrep enrich) | 0.5.0 |
| File-export PageRank tiebreaker (L4) | 0.5.0 |
| Cascade file-mean cosine corpus-wide | 0.5.1 |
| Smaller default HyDE model + keep_alive=-1 | 0.6.0 |
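Two rows in the matrix, mtime-based incremental indexing and stale row cleanup, fit together as one comparison between what the index remembers and what is on disk. The following is a minimal sketch of that comparison, assuming both sides are available as path-to-mtime mappings; `plan_refresh` is a hypothetical name and the real index schema is not described in this README.

```python
def plan_refresh(indexed, on_disk):
    """Compare stored mtimes with current ones.

    indexed / on_disk: dict mapping file path -> modification time.
    Returns (to_reindex, stale): files that are new or changed, and files
    whose rows should be deleted because the file no longer exists.
    """
    to_reindex = sorted(
        path for path, mtime in on_disk.items() if indexed.get(path) != mtime
    )
    stale = sorted(path for path in indexed if path not in on_disk)
    return to_reindex, stale
```

Files whose mtime is unchanged appear in neither list, so a refresh on an untouched project does no embedding work at all.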
Development
git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[rerank]"
.venv/bin/pytest -q tests/
.venv/bin/python benchmarks/agent_context_benchmark.py --top-k 10 --summary-only
To reproduce the public OSS benchmark numbers, follow the instructions
in docs/parity-benchmarks.md.
License
PolyForm Noncommercial 1.0.0 — see LICENSE.
Personal, academic, research, hobby, and any other non-commercial use is fully permitted, including modification and redistribution. Commercial use is NOT permitted under this license. To obtain a commercial license, open a GitHub issue titled "Commercial license inquiry" or email chentianchi@gmail.com.
Acknowledgments
- Ollama for the local embedding and generation runtime.
- tree-sitter for syntax-aware parsing.
- ripgrep for the lexical prefilter stage.
- Mixedbread for the open-source mxbai-rerank-*-v2 cross-encoder family.
- nomic-embed-text for the embedding model.
- Click, NumPy, and SQLite for the core runtime dependencies.