Skip to main content

Free local semantic code search using Ollama

Project description

skylakegrep — semantic code search over a local index

PyPI Python 3.9+ PolyForm Noncommercial 1.0.0 Documentation Latest release

Quickstart  ·  30s demo  ·  Performance  ·  How it works  ·  Releases  ·  Docs


skygrep is a fully-offline semantic code-search CLI for natural-language questions about your codebase. Ask in plain English, get the right file and line range. Indexing, retrieval, and optional answer synthesis all run locally against your own Ollama server. No remote service, no subscription, no data leaves your machine.

In 30 seconds

$ pip install skylakegrep
$ ollama pull nomic-embed-text qwen2.5:1.5b qwen2.5:3b   # one-time
$ cd ~/your-project

$ skygrep "where is the cascade tau threshold defined?"
=== skylakegrep/src/storage.py:578-602 (score: 0.781) ===
CASCADE_DEFAULT_TAU = 0.015

def cascade_search(...
[0.51s · cascade=cheap (gap=0.020 τ=0.015) · index 20s ago · 36 files · L2 symbols on · graph prior on]

That is the entire happy path. First query in a fresh project completes in under 1 s via a ripgrep fallback while a background process builds the semantic index. Every query after that uses the full cascade with a local LLM kept warm in memory.

Quickstart

pip install skylakegrep
ollama pull nomic-embed-text qwen2.5:1.5b qwen2.5:3b   # ~3 GB total

# One-time: register skylakegrep with detected LLM CLIs
# (Claude Code / Codex / OpenCode / Gemini CLI / Cursor)
skygrep setup

cd /your/project
skygrep "<your question>"
skygrep doctor                # verify runtime + models + index + integrations
skygrep stats                 # show current project's index info

skygrep derives the project root from git rev-parse --show-toplevel (falling back to the working directory) and keeps a per-project index under ~/.skylakegrep/repos/. Subcommand names (index, doctor, stats, watch, serve, setup, enrich) take precedence — anything else is treated as a query, so skygrep "stats and metrics" (quoted) is unambiguous.

skygrep setup writes a small markdown snippet to each detected LLM CLI's user-level instructions file (e.g. ~/.claude/CLAUDE.md, ~/.codex/AGENTS.md, ~/.gemini/GEMINI.md) telling the agent to prefer skygrep for natural-language code search and fall back to rg otherwise. Snippets are delimited by markers; skygrep setup --uninstall removes them cleanly without touching your other instructions.

Performance

Measured on a Mac M-series CPU, no GPU. Three repositories, three languages, 40 hand-labelled questions:

Repo Language Tasks Recall Avg s/q
repo-A Rust 16 16 / 16 4.17
repo-B Python 12 11 / 12 2.45
repo-C TypeScript 12 11 / 12 3.83
Aggregate 40 38 / 40 (95 %) 3.55

Reproducible runner at benchmarks/v0_7_multilang_bench.py. Per-repo breakdown, per-tier comparison (cascade only / +L2 / +L4 / full default), and the two honest misses are documented in docs/parity-benchmarks.md.

Recall counts a query as a hit when at least one of the top-10 returned chunks matches the canonical answer dirs (or any listed expected_alternatives).

Tier breakdown (repo-A 16-task)

Tier What it does Recall Cold first query Warm avg s/q
cascade ⭐ default rg prefilter → file-mean cosine → escalate to HyDE only when uncertain 16 / 16 ~10 s (Ollama loads) 4.2 s
cascade-cheap early-exit only, no LLM call 11 / 16 <1 s <0.2 s
cascade + small HyDE (OLLAMA_HYDE_MODEL=qwen2.5:1.5b) uses 1.5 B for HyDE — faster, slightly lower recall 15 / 16 ~5 s 2.0 s
chunk + rerank classic chunk cosine + cross-encoder rerank 11 / 16 ~10 s + 30 s reranker load ~10 s
ripgrep raw rg -il -F token-OR (file membership only) 16 / 16 <1 s <1 s

The cascade is bimodal by design: ~80 % of queries take the cheap path (file-mean cosine, no LLM call) and complete in under 200 ms warm; the remaining ~20 % escalate to a HyDE-augmented retrieval and complete in the 1–2 s band. With Ollama models kept resident in memory (OLLAMA_KEEP_ALIVE=-1, the 0.6.0 default) the second query in a shell session no longer pays the 5–10 s Ollama cold-load.

A second self-test benchmark compares skygrep against a simulated grep agent over 30 navigation tasks against this repo: 30 / 30 recall at top-k 10 with 2× total-token reduction and 2.9× context-token reduction vs the agent baseline.

How it works

your query
    │
    ▼
┌─────────────────────────────────────────────────────────────────┐
│ 1.  ripgrep prefilter      Fast surface-token narrowing          │
│ 2.  file-mean cosine       Rank files by mean of chunk vectors   │
│ 3.  cascade decision       Confident? return cheap. Else escalate│
│ 4.  HyDE escalation        LLM rewrites query → cosine union     │
│ 5.  symbol + graph         Tree-sitter symbol boost + PageRank   │
│     (L2 + L4)              tiebreaker on near-tied candidates    │
└─────────────────────────────────────────────────────────────────┘
    │
    ▼
top-K chunks (path · line range · score · snippet)

Each layer is offline-paid and query-time-free where possible: embeddings are precomputed at index time, symbol extraction runs once per project, the file-export PageRank is one regex pass over the corpus. Only the cascade's HyDE-escalation path makes a query-time LLM call, and it only runs on the ~20 % of queries the cheap path is uncertain about.

The full architecture diagram and module-by-module walk-through is at docs/skylakegrep-0.6.0.md and docs/roadmap.md.

When to use what

You want Use
Find code by concept ("how does X work?") skygrep "<query>"
Find code with a known token rg <token> (it's faster, no setup)
Synthesize an answer with citations skygrep "<query>" --answer
Decompose a broad question skygrep "<query>" --agentic --max-subqueries 3 --answer
Machine-readable output for an agent skygrep "<query>" --json
Re-rank candidates with a cross-encoder skygrep "<query>" --no-cascade --rerank
Continuously index a watched dir skygrep watch /path
Keep the cross-encoder warm across queries skygrep serve & ; skygrep "<q>" --daemon-url http://127.0.0.1:7878

Configuration

Variable Default Effect
OLLAMA_URL http://localhost:11434 Ollama server URL.
OLLAMA_EMBED_MODEL nomic-embed-text Embedding model. Switching requires skygrep index --reset.
OLLAMA_LLM_MODEL qwen2.5:3b Used for --answer and --agentic.
OLLAMA_HYDE_MODEL qwen2.5:3b Used for cascade-escalation HyDE. Falls back to OLLAMA_LLM_MODEL if not installed. Set to qwen2.5:1.5b for ~30 % speedup at the cost of 1 task on repo-A 16-task.
OLLAMA_KEEP_ALIVE -1 Passed to every Ollama call. -1 keeps models resident indefinitely (recommended).
SKYGREP_DB_PATH per-project When set, skygrep treats the index as curated and disables auto-mutation.
SKYGREP_AUTO_PULL unset Set yes to auto-ollama pull missing models without prompting.
SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS 30 Skip the mtime scan if the previous refresh ran more recently.
SKYGREP_RERANK_MODEL mixedbread-ai/mxbai-rerank-large-v2 Cross-encoder for --rerank.
SKYGREP_RERANK_POOL 50 Candidate pool before reranking.

Releases

Each release ships with comprehensive notes covering the architecture change, benchmark deltas, compatibility notes, and download artifacts. Browse them at https://github.com/danielchen26/skylakegrep/releases — the latest is also installable from PyPI.

The full sequence so far:

  • 0.15.1 — fix: editor/app session-lock and swap files (~$*.docx, *.swp, .#*, *~) no longer leak into filename-lookup results. Two-layer fix: find filter at source
    • lock-file detection in extract_docx with friendly hint. Resolves Word ~$pert Letter ....docx showing up with the cryptic "Package not found" error.
  • 0.15.0 — LLM-driven query routing replaces hand-rolled heuristics as the primary source of routing decisions. A small local Ollama model (qwen2.5:3b) reads each query and returns structured {intent, primary_token, skip_cascade, extract_content, confidence} JSON. Three-layer fallback chain (LLM → v0.14.0 rules → mixed) keeps the CLI working air-gapped or when Ollama is down. New binary_extract.py extracts PDF/docx content inline for filename matches; --detail=brief|standard|full|summary and --ocr (opt-in tesseract) round out the verbosity story. Confidence threshold (0.7) protects accuracy: an unsure LLM never skips the cascade. Filename queries return to ~150 ms.
  • 0.14.0 — hierarchical merge: every enabled tier (filename / lexical / semantic cascade) always runs, results dedupe by path, and classify_intent(query) decides which tier wins top slots. A query like where is config file returns the config.py file and the cascade chunks discussing config loading in one pass. Latency note: every query now pays cascade cost; pass --no-cascade to opt out.
  • 0.13.0 — three-tier smart routing + framed card rendering. New filename-lookup tier (~10 ms) handles where is eb1b file? / find package.json / show me README queries via find -iname — no index needed. Result rendering rewritten to proper rounded card frames with Pygments-driven syntax highlighting (300+ languages) tuned to the website hero. --filename-shortcut/--no-filename-shortcut flag; pygments>=2.0 is now a hard dependency.
  • 0.12.1 — terminal output rework: cyan repo-relative paths, right-aligned language pill + bold-green score, dim separator rule, lightweight ANSI syntax highlighting on the code body. Visually aligned with the landing-page hero. --json and pipe/redirect behaviour unchanged; NO_COLOR=1 opt-out honoured. No retrieval-pipeline change.
  • 0.12.0 — smart-routing: a four-condition lexical pre-gate short-circuits ripgrep-friendly queries (~50 ms) so calling skygrep is no longer ever a tax over rg for the easy cases. Vocabulary-mismatch queries still run the full semantic cascade. New --rg-shortcut/--no-rg-shortcut flag (default on); skygrep setup snippet rewritten to reflect auto-routing.
  • 0.11.0skygrep setup auto-registers skylakegrep as the preferred semantic search with Claude Code, Codex, OpenCode, Gemini CLI, and Cursor. First-run banner nudges new users; skygrep setup --uninstall removes all snippets cleanly. New skygrep doctor row shows registration state per CLI.
  • 0.10.0 — multi-turn agent benchmark + single-turn sample expanded to 20 tasks. −82 % tool calls in multi-turn repo-A session, −37.6 % across 20 single-turn tasks. On 5 / 6 medium-difficulty single-turn tasks, skygrep finds the canonical file in 1 tool call vs rg-only's 4-8.
  • 0.9.0 — e2e Claude Code agent benchmark extended to 14 hand-labelled tasks (8 hard semantic + 6 easy single-shot) across Rust + Python + TypeScript. −30 % tool calls and +2 / 14 answer-correctness with skygrep on. Best-case task: 25× fewer tool calls on the repo-A editor cursor query. Worst-case task: skygrep slightly worse on lexical-friendly signin question — both published.
  • 0.8.0 — first e2e Claude Code agent benchmark (6 easy single-shot questions). Superseded by 0.9.0 with larger sample.
  • 0.7.0 — multi-language benchmark across Rust + Python + TypeScript: 38 / 40 (95 %) recall at 3.55 s/q on Mac CPU. New benchmarks/cross_repo/repo-b.json (12 Python tasks) and repo-c.json (12 TypeScript tasks); unified runner at benchmarks/v0_7_multilang_bench.py. Two honest misses documented in docs/parity-benchmarks.md.
  • 0.6.2 — Ollama preheat (fire-and-forget warm-up at search start); GitHub Actions CI workflows (pytest + auto-PyPI on tag); 1200×630 social preview card.
  • 0.6.1 — Ollama keep_alive=-1 correctness fix (was sending string "-1" causing 400 Bad Request); HyDE default model reverted to qwen2.5:3b after the repo-A benchmark showed qwen2.5:1.5b cost 1 task in recall; tag-aware model presence check in skygrep doctor (no more false-positives when only a different tag of the same base name is installed).
  • 0.6.0 — introduced OLLAMA_HYDE_MODEL and Ollama keep_alive plumbing; superseded by 0.6.1 for default correctness.
  • 0.5.1 — cascade file-mean cosine corpus-wide; repo-A benchmark relabeled to acceptable-alternatives form (16/16 with corrected labels).
  • 0.5.0 — symbol-aware indexing, doc2query enrichment, file-export PageRank tiebreaker.
  • 0.4.1 — ripgrep fallback for the first query in a fresh project.
  • 0.4.0 — bare-form skygrep "<query>", per-project auto-index, skygrep doctor, cascade default.
  • 0.3.0–0.3.1 — confidence-gated cascade.
  • 0.2.0 — vectorized retrieval, lexical reranker, agentic decomposition.
  • 0.1.0 — initial release.

CLI reference

skygrep setup    [--list|--uninstall|--yes] # register with Claude Code / Codex / OpenCode / Gemini / Cursor
skygrep "<query>" [OPTIONS]                 # bare-form search
skygrep search   "<query>" [OPTIONS]        # explicit search
skygrep doctor                              # health check
skygrep stats                               # project index info
skygrep index    [PATH] [--reset]           # explicit reindex
skygrep watch    [PATH] --interval N        # poll for changes
skygrep serve    [--host H] [--port P]      # warm-reranker daemon
skygrep enrich   [--max N] [--batch B]      # opt-in doc2query enrichment
skygrep search options
Option Default Effect
-m, -n, --top 5 Number of final results.
--json off Emit a JSON array; suppresses human formatting.
--answer off Synthesize an answer from retrieved snippets via Ollama.
--content / --no-content on Show or hide snippet bodies in human output.
--language Restrict to one or more language keys; repeatable.
--include / --exclude Glob filter (repeatable).
--cascade / --no-cascade on Confidence-gated retrieval. Off = chunk-only legacy path.
--cascade-tau 0.015 Top-1 / top-2 file-mean cosine gap above which to early-exit.
--rerank / --no-rerank on Cross-encoder rerank on the non-cascade path.
--rerank-pool 50 Candidate pool before reranking.
--rerank-model env or default HuggingFace cross-encoder id.
--hyde / --no-hyde off Force HyDE outside the cascade (rare; cascade decides per query).
--multi-resolution / --no-multi-resolution on File-level cosine top-N → chunk-level inside those files.
--file-top 30 Files surfaced by file-level retrieval.
--lexical-prefilter / --no-lexical-prefilter on Use ripgrep to narrow the candidate file set.
--lexical-root cwd / git toplevel Root directory ripgrep scans.
--lexical-min-candidates 2 Fall back to corpus-wide cosine when ripgrep returns fewer files.
--rank-by chunk chunk (per-file diversity cap) or file (one chunk per file).
--auto-index / --no-auto-index on Auto-build the index on first query and refresh on mtime change.
--daemon-url Send the search to a running skygrep serve daemon.
--agentic off Decompose into subqueries via Ollama before search.
--max-subqueries 3 Upper bound on agentic subqueries.
--semantic-only off Skip lexical reranking; rank by cosine alone.
Capability matrix (every feature, when introduced)
Capability Since
Semantic code search via local Ollama 0.1.0
Tree-sitter chunking + line-window fallback 0.2.0
.gitignore / .mgrepignore hygiene 0.2.0
Incremental indexing (mtime-based) 0.2.0
Stale row cleanup 0.2.0
Watch mode 0.2.0
Hybrid lexical + semantic ranking 0.2.0
Stable JSON output 0.2.0
Local answer mode 0.2.0
Local agentic decomposition 0.2.0
Cross-encoder rerank 0.3.0
Asymmetric query/document embedding prefixes 0.3.0
HyDE query rewriting 0.3.0
Multi-resolution retrieval 0.3.0
Lexical prefilter (ripgrep first stage) 0.3.0
File-rank (one chunk per file) 0.3.0
Daemon mode 0.3.0
Quantisation / device knobs 0.3.0
Confidence-gated cascade 0.3.0 (default in 0.4.0)
Bare-form invocation skygrep "<q>" 0.4.0
Per-project auto-index 0.4.0
skygrep doctor health check 0.4.0
Ripgrep fallback for first query 0.4.1
Symbol-aware indexing (L2) 0.5.0
doc2query enrichment (L3, opt-in via skygrep enrich) 0.5.0
File-export PageRank tiebreaker (L4) 0.5.0
Cascade file-mean cosine corpus-wide 0.5.1
Smaller default HyDE model + keep_alive=-1 0.6.0

Development

git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[rerank]"

.venv/bin/pytest -q tests/
.venv/bin/python benchmarks/agent_context_benchmark.py --top-k 10 --summary-only

To reproduce a repo-A benchmark row, follow the indexing/run instructions in docs/parity-benchmarks.md.

License

PolyForm Noncommercial 1.0.0 — see LICENSE.

⚠️ License change as of v0.16.0. This project is now licensed under PolyForm Noncommercial 1.0.0. Personal, academic, research, hobby, and any other non-commercial use is fully permitted, including modification and redistribution. Commercial use is NOT permitted under this license. To obtain a commercial license, open a GitHub issue titled "Commercial license inquiry" or email chentianchi@gmail.com.

Earlier releases (v0.2.0 – v0.15.1) were originally distributed under the MIT License at release time. The git history has been rewritten retroactively so all checked-in LICENSE files now reflect the new license; binaries that were already published under MIT remain so wherever they exist in the wild.

Acknowledgments

  • Ollama for the local embedding and generation runtime.
  • tree-sitter for syntax-aware parsing.
  • ripgrep for the lexical prefilter stage.
  • Mixedbread for the open-source mxbai-rerank-*-v2 cross-encoder family.
  • nomic-embed-text for the embedding model.
  • Click, NumPy, and SQLite for the core runtime dependencies.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skylakegrep-0.1.0.tar.gz (117.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

skylakegrep-0.1.0-py3-none-any.whl (98.5 kB view details)

Uploaded Python 3

File details

Details for the file skylakegrep-0.1.0.tar.gz.

File metadata

  • Download URL: skylakegrep-0.1.0.tar.gz
  • Upload date:
  • Size: 117.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for skylakegrep-0.1.0.tar.gz
Algorithm Hash digest
SHA256 698a690f51c15a476ce70a149855c3d416e4d9baf52ddbb15776df85104a3b4a
MD5 66cdae7f7348e49d797b41144f2ddd15
BLAKE2b-256 901ba76e047d90db39a6046b8b683fe32b83aa2b25d9f447192b0d7b473ae720

See more details on using hashes here.

File details

Details for the file skylakegrep-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: skylakegrep-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 98.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for skylakegrep-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8ba61f2cfa0f9722bd698774d8dbbe3e28ba25dc694cd023e77e15e472464bd9
MD5 a36232d539ddbce20b994f1429b0c8b3
BLAKE2b-256 177fac100f1e25eed5900434a6dc5619884030a3a8ad30a9a298547757adaad7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page