
Fully-offline semantic search over your local files — powered by Ollama


skylakegrep — fully-offline semantic search over your local files

PyPI · Python 3.9+ · PolyForm Noncommercial 1.0.0 · Documentation · Latest release

Quickstart  ·  30s demo  ·  What you can search  ·  What's new in 0.2.x  ·  Performance  ·  How it works  ·  Releases  ·  Docs


skygrep is a fully-offline semantic search CLI for your local files: code, markdown, PDFs, Word docs, plain text, anything you index. Ask in plain English, get the right file and line range. It is not a code-only tool: the content-agnostic retrieval substrate (Karpathy "knowledge graph as prior") works across content types via a pluggable extractor registry. Indexing, retrieval, and optional answer synthesis all run locally against your own Ollama server. There is no remote service and no subscription, and no data leaves your machine.

What you can search

The retrieval substrate is content-agnostic by design — the embedder (bge-m3, multilingual XLM-RoBERTa), the cascade, and the reference graph all abstract over "A references B" rather than over any specific programming language or document format. New content types plug in via a one-line register_extractor() call.

| Content type | How it's parsed | Reference graph | Since |
| --- | --- | --- | --- |
| Code: Rust · Python · JS · TS | tree-sitter symbol-aware chunking + line-window fallback | imports / use / require / dynamic import() | 0.1.0 |
| Markdown | line-window chunks; [](link), ![](), [[wiki]] link extraction | relative-path resolution + Obsidian-style wiki links | 0.2.0 |
| PDF | pypdf text extraction; opt-in OCR for scanned pages | | 0.1.0 |
| Word docs (.docx) | python-docx paragraph extraction | | 0.1.0 |
| Plain text · TOML · YAML · CSV · JSON · … | line-window chunking via the default text path | | 0.1.0 |
| Custom (your content type) | register an extractor returning (source, target) edges | your call | 0.2.0 |

Register a new content type in one line:

from skylakegrep.src.reference_graph import register_extractor

def yaml_anchor_extractor(path):
    """Return list of (source, target) reference edges."""
    ...

register_extractor("yaml", [".yaml", ".yml"], yaml_anchor_extractor)
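
As an illustration of what an extractor body might look like, here is a minimal sketch of a markdown link extractor that returns (source, target) edges. It is not skylakegrep's built-in extractor: the function name markdown_edges, its (source, text) signature, and both regexes are assumptions made for the example, and a real extractor passed to register_extractor() receives a path rather than text.

```python
import posixpath
import re

# Inline links [text](target) and images ![alt](target); the optional "!" covers both.
INLINE_LINK = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Obsidian-style wiki links: [[Page]] or [[Page|label]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def markdown_edges(source, text):
    """Return (source, target) edges for the local links found in `text`.

    Hypothetical helper: `source` is the path of the markdown file and
    `text` is its content.
    """
    edges = []
    base = posixpath.dirname(source)
    for match in INLINE_LINK.finditer(text):
        target = match.group(1)
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # skip external URLs and in-page anchors
        target = target.split("#", 1)[0]
        # Resolve the link relative to the directory of the source file.
        edges.append((source, posixpath.normpath(posixpath.join(base, target))))
    for match in WIKI_LINK.finditer(text):
        # Wiki links name a page, not a path, so they are kept unresolved here.
        edges.append((source, match.group(1).strip()))
    return edges
```

Called as markdown_edges("docs/guide.md", text), the sketch resolves a link such as ../img/a.png to img/a.png and keeps a wiki link such as [[Roadmap|plan]] as the bare page name Roadmap.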

What's new in 0.2.x

| Area | What | Why it matters |
| --- | --- | --- |
| Substrate | bge-m3 default embedder (1024-d, multilingual, symmetric, 8k context) replaces mxbai-embed-large | Single largest accuracy contributor: React bench 8/10 → 10/10. Existing indexes must rebuild (--reset). |
| Reference graph | register_extractor() registry; code_graph.py is a back-compat facade | Content-agnostic abstraction ("A references B"). Markdown shipped in 0.2.0. New types in one line, no retrieval-code changes. |
| Cascade | σ-adaptive threshold max(τ_floor, k·σ_topK) | Derived from cosine evidence (MacKay/Williams Bayesian framing), not a magic number. Auto-recalibrates when embedder swaps. New tau_mode telemetry. |
| Path filter | 24 universal aux-path conventions (/fixtures/, /vendor/, /dist/, .development.js, .min.js, …) | Structural prior, not language-specific. Applies to any corpus with similar conventions. |
| Bench | 30 / 30 on Django + React + Tokio public OSS bench | Was 28 / 30 in 0.1.0; latency aggregate −19 % (Tokio +57 % is the real trade-off). |
| Symbol channel | multi_channel_search with RRF k=60 fusion (internal, opt-in) | Tree-sitter symbol-as-retriever experimental primitive. Not in default CLI yet; auto-router is a 0.3.0 follow-up. |
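
The σ-adaptive threshold max(τ_floor, k·σ_topK) from the Cascade row can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name adaptive_tau, the value k=0.5, the use of the population standard deviation, and the "sigma" / "floor" labels for tau_mode are all assumptions; only the formula and the 0.015 default come from this page.

```python
import statistics


def adaptive_tau(top_scores, tau_floor=0.015, k=0.5):
    """Return (tau, tau_mode) with tau = max(tau_floor, k * sigma_topK).

    `top_scores` are the cosine scores of the top-K candidates.
    `k` and the mode labels are hypothetical choices for this sketch.
    """
    # Spread of the top-K scores; a single score has no spread.
    sigma = statistics.pstdev(top_scores) if len(top_scores) > 1 else 0.0
    scaled = k * sigma
    if scaled > tau_floor:
        return scaled, "sigma"   # the score spread sets the threshold
    return tau_floor, "floor"    # scores are tightly packed; use the floor
```

Because the threshold scales with the observed spread of cosine scores, swapping in an embedder with a different score distribution shifts the threshold without retuning a constant.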

Full release notes: docs/skylakegrep-0.2.0.md · docs/skylakegrep-0.2.1.md
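
The RRF k=60 fusion named in the Symbol channel row is reciprocal rank fusion: each item scores the sum of 1 / (k + rank) over every ranking it appears in. Below is a generic sketch of that technique, assuming each channel yields an ordered list of paths; rrf_fuse is a hypothetical name and this is not the body of multi_channel_search.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists with reciprocal rank fusion.

    Each item scores sum(1 / (k + rank)) across the lists that contain it,
    with ranks starting at 1. Ties are broken alphabetically so the
    output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


semantic = ["a.py", "b.py", "c.py"]   # e.g. the embedding channel
symbol = ["b.py", "d.py", "a.py"]     # e.g. the symbol channel
fused = rrf_fuse([semantic, symbol])
```

In this example b.py ranks first after fusion because it sits near the top of both lists, while a.py is first in one list but third in the other.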

In 30 seconds

$ pip install skylakegrep
$ for m in nomic-embed-text qwen2.5:1.5b qwen2.5:3b; do ollama pull "$m"; done   # one-time; ollama pull takes one model per call
$ cd ~/your-project

$ skygrep "where is the cascade tau threshold defined?"
=== skylakegrep/src/storage.py:578-602 (score: 0.781) ===
CASCADE_DEFAULT_TAU = 0.015

def cascade_search(...
[0.51s · cascade=cheap (gap=0.020 τ=0.015) · index 20s ago · 36 files · L2 symbols on · graph prior on]

That is the entire happy path. The first query in a fresh project completes in under 1 s via a ripgrep fallback while a background process builds the semantic index. Every later query uses the full cascade with a local LLM kept warm in memory.

Quickstart

pip install skylakegrep
# ~3 GB total; ollama pull takes one model per call
for m in nomic-embed-text qwen2.5:1.5b qwen2.5:3b; do ollama pull "$m"; done

# One-time: register skylakegrep with detected LLM CLIs
# (Claude Code / Codex / OpenCode / Gemini CLI / Cursor)
skygrep setup

cd /your/project
skygrep "<your question>"
skygrep doctor                # verify runtime + models + index + integrations
skygrep stats                 # show current project's index info

skygrep derives the project root from git rev-parse --show-toplevel (falling back to the working directory) and keeps a per-project index under ~/.skylakegrep/repos/. Subcommand names (index, doctor, stats, watch, serve, setup, enrich) take precedence — anything else is treated as a query, so skygrep "stats and metrics" (quoted) is unambiguous.
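
The two rules in the paragraph above (project root from git with a working-directory fallback, and subcommand names taking precedence over queries) can be sketched as below. The function names project_root and route are hypothetical, and the subcommand set adds search from the CLI reference to the names listed above; the actual dispatch code may differ.

```python
import subprocess
from pathlib import Path

# Names from the paragraph above, plus the explicit "search" form.
SUBCOMMANDS = {"index", "doctor", "stats", "watch", "serve", "setup", "enrich", "search"}


def project_root(cwd="."):
    """Return the git toplevel for `cwd`, or `cwd` itself outside a repo."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
        return Path(out.stdout.strip())
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or cwd is not inside a work tree.
        return Path(cwd).resolve()


def route(argv):
    """Split argv into (command, remaining args); bare text is a search query."""
    if argv and argv[0] in SUBCOMMANDS:
        return argv[0], argv[1:]
    return "search", argv
```

Under this sketch route(["stats"]) selects the stats subcommand, while route(["stats and metrics"]) is a search, because the quoted query arrives as a single argument that matches no subcommand name.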

skygrep setup writes a small markdown snippet to each detected LLM CLI's user-level instructions file (e.g. ~/.claude/CLAUDE.md, ~/.codex/AGENTS.md, ~/.gemini/GEMINI.md) telling the agent to prefer skygrep for natural-language search across local files and fall back to rg otherwise. Snippets are delimited by markers; skygrep setup --uninstall removes them cleanly without touching your other instructions.
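
A marker-delimited snippet is what makes a clean uninstall possible: everything between the markers belongs to the tool and everything else is left alone. The sketch below shows the general technique. The marker strings and the install / uninstall function names are invented for illustration and are not the markers skygrep setup actually writes.

```python
import re

# Hypothetical marker text; the real markers are not documented on this page.
BEGIN = "<!-- skygrep:begin -->"
END = "<!-- skygrep:end -->"
_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.DOTALL)


def install(text, snippet):
    """Insert or refresh the delimited snippet in an instructions file."""
    block = f"{BEGIN}\n{snippet.strip()}\n{END}\n"
    if _BLOCK.search(text):
        # Replace the existing block in place so repeated installs are idempotent.
        return _BLOCK.sub(lambda _match: block, text, count=1)
    separator = "" if not text or text.endswith("\n") else "\n"
    return text + separator + block


def uninstall(text):
    """Remove the delimited snippet and leave all other content untouched."""
    return _BLOCK.sub("", text)
```

Installing and then uninstalling returns the file to its original content, and installing twice does not duplicate the block.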

Performance

Public, reproducible benchmark on three popular open-source codebases. 30 hand-labelled questions total (10 per repo). Anyone can clone the repos and re-run with one command — see benchmarks/public_oss_bench.py and docs/parity-benchmarks.md.

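| Repo | Language | LOC ≈ | Tasks | skygrep recall | rg recall | Token reduction |
| --- | --- | --- | --- | --- | --- | --- |
| Django | Python | 524 K | 10 | 10 / 10 | 10 / 10 | 703 × |
| Tokio | Rust | 80 K | 10 | 10 / 10 | 10 / 10 | 61 × |
| React | JS+TS | 270 K | 10 | 10 / 10 | 10 / 10 | 773 × |
| Aggregate | | | 30 | 30 / 30 (100 %) | 30 / 30 | 60×–770× |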

Honest reading:

  • Hit-rate parity across all three (10/10 each on Django, Tokio, React). The two React misses that previously surfaced (test-fixture path bias on react-007, devtools-vs-reconciler on react-010) were resolved by upgrading the embedding substrate to bge-m3 and extending the non-canonical-path filter — see docs/parity-benchmarks.md for the failure analysis and the resolution.
  • rg "100 %" is a recall-ceiling baseline. It returns 20 M+ tokens per task (term-OR scan with 2-line context windows). The answer is somewhere in that dump, but the agent has to read all 20 M tokens to find it.
  • skygrep delivers the answer ranked top-10 in 30 / 30 cases while emitting 60×–770× less context for the agent's LLM round-trip downstream. That is the user-facing claim.

Recall counts a query as a hit when at least one returned chunk matches the canonical expected path or any of the question's expected_alternatives.
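
The hit rule above can be written down in a few lines. This sketch assumes that "matches" means exact path equality; is_hit and recall are hypothetical names, and the benchmark script may compare paths differently.

```python
def is_hit(returned_paths, expected_path, expected_alternatives=()):
    """True when any returned chunk path is the canonical path or an alternative."""
    accepted = {expected_path, *expected_alternatives}
    return any(path in accepted for path in returned_paths)


def recall(tasks):
    """Return (hits, total) over (returned_paths, expected, alternatives) tuples."""
    hits = sum(is_hit(returned, expected, alts) for returned, expected, alts in tasks)
    return hits, len(tasks)
```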

Cascade-only ablation (Django, in-bench numbers)

| Tier | What it does | Cold first query | Warm avg s/q |
| --- | --- | --- | --- |
| cascade ⭐ default | rg prefilter → file-mean cosine → escalate to HyDE only when uncertain | ~10 s (Ollama loads) | 0.5–2 s (warm) |
| cascade-cheap | early-exit only, no LLM call | <1 s | <0.2 s |
| cascade + small HyDE (OLLAMA_HYDE_MODEL=qwen2.5:1.5b) | uses 1.5 B for HyDE; faster, slightly lower recall | ~5 s | 2.0 s |
| chunk + rerank | classic chunk cosine + cross-encoder rerank | ~10 s + 30 s reranker load | ~10 s |
| ripgrep raw | rg -il -F token-OR (file membership only) | <1 s | <1 s |

The cascade is bimodal by design: ~80 % of queries take the cheap path (file-mean cosine, no LLM call) and complete in under 200 ms warm; the remaining ~20 % escalate to a HyDE-augmented retrieval and complete in the 1–2 s band. With Ollama models kept resident in memory (OLLAMA_KEEP_ALIVE=-1, the 0.6.0 default) the second query in a shell session no longer pays the 5–10 s Ollama cold-load.
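
The cheap-versus-escalate decision can be sketched as follows: score each file by the cosine between the query and the mean of its chunk vectors, then compare the top-1 / top-2 gap with τ. This is an illustration under assumptions, not the code in storage.py: the function names and the pure-Python vector math are invented for the example, and the real cascade also runs the ripgrep prefilter first.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def file_mean(chunks):
    """Component-wise mean of a file's chunk vectors."""
    return [sum(column) / len(chunks) for column in zip(*chunks)]


def cascade_decision(query_vec, files, tau=0.015):
    """Return (tier, gap, ranked_paths) for a {path: [chunk vectors]} mapping."""
    ranked = sorted(
        ((cosine(query_vec, file_mean(chunks)), path) for path, chunks in files.items()),
        reverse=True,
    )
    # A single candidate has no runner-up, so the gap is unbounded.
    gap = ranked[0][0] - ranked[1][0] if len(ranked) > 1 else float("inf")
    tier = "cheap" if gap >= tau else "escalate"
    return tier, gap, [path for _, path in ranked]
```

A clear winner (large gap) returns immediately on the cheap path; two near-identical files produce a gap below τ and would be handed to the HyDE escalation.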

A second self-test benchmark compares skygrep against a simulated grep agent over 30 navigation tasks against this repo: 30 / 30 recall at top-k 10 with 2× total-token reduction and 2.9× context-token reduction vs the agent baseline.

How it works

your query
    │
    ▼
┌─────────────────────────────────────────────────────────────────┐
│ 1.  ripgrep prefilter      Fast surface-token narrowing          │
│ 2.  file-mean cosine       Rank files by mean of chunk vectors   │
│ 3.  cascade decision       Confident? return cheap. Else escalate│
│ 4.  HyDE escalation        LLM rewrites query → cosine union     │
│ 5.  symbol + graph         Tree-sitter symbol boost + PageRank   │
│     (L2 + L4)              tiebreaker on near-tied candidates    │
└─────────────────────────────────────────────────────────────────┘
    │
    ▼
top-K chunks (path · line range · score · snippet)

Each layer is offline-paid and query-time-free where possible: embeddings are precomputed at index time, symbol extraction runs once per project, the file-export PageRank is one regex pass over the corpus. Only the cascade's HyDE-escalation path makes a query-time LLM call, and it only runs on the ~20 % of queries the cheap path is uncertain about.
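
To make the PageRank tiebreaker concrete, here is a small power-iteration sketch over (source, target) reference edges, followed by a reordering of near-tied candidates. The damping factor, iteration count, epsilon, and both function names are assumptions for illustration; the page only states that PageRank breaks ties among near-tied candidates.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                nxt[target] += damping * rank[source] / len(out_links[source])
        rank = nxt
    return rank


def break_ties(scored, rank, epsilon=0.005):
    """Order candidates within `epsilon` of the best score by PageRank."""
    best = max(score for _, score in scored)
    tied = [path for path, score in scored if best - score <= epsilon]
    rest = [(path, score) for path, score in scored if best - score > epsilon]
    tied.sort(key=lambda path: -rank.get(path, 0.0))
    rest.sort(key=lambda item: -item[1])
    return tied + [path for path, _ in rest]
```

The graph is computed once at index time, so the query-time cost of the tiebreak is a dictionary lookup per candidate.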

The full architecture diagram and module-by-module walk-through are in docs/skylakegrep-0.1.0.md and docs/roadmap.md.

When to use what

| You want | Use |
| --- | --- |
| Find code by concept ("how does X work?") | skygrep "<query>" |
| Find code with a known token | rg <token> (it's faster, no setup) |
| Synthesize an answer with citations | skygrep "<query>" --answer |
| Decompose a broad question | skygrep "<query>" --agentic --max-subqueries 3 --answer |
| Machine-readable output for an agent | skygrep "<query>" --json |
| Re-rank candidates with a cross-encoder | skygrep "<query>" --no-cascade --rerank |
| Continuously index a watched dir | skygrep watch /path |
| Keep the cross-encoder warm across queries | skygrep serve & skygrep "<q>" --daemon-url http://127.0.0.1:7878 |

Configuration

| Variable | Default | Effect |
| --- | --- | --- |
| OLLAMA_URL | http://localhost:11434 | Ollama server URL. |
| OLLAMA_EMBED_MODEL | nomic-embed-text | Embedding model. Switching requires skygrep index --reset. |
| OLLAMA_LLM_MODEL | qwen2.5:3b | Used for --answer and --agentic. |
| OLLAMA_HYDE_MODEL | qwen2.5:3b | Used for cascade-escalation HyDE. Falls back to OLLAMA_LLM_MODEL if not installed. Set to qwen2.5:1.5b for ~30 % speedup at the cost of 1 task on 16-task Rust. |
| OLLAMA_KEEP_ALIVE | -1 | Passed to every Ollama call. -1 keeps models resident indefinitely (recommended). |
| SKYGREP_DB_PATH | per-project | When set, skygrep treats the index as curated and disables auto-mutation. |
| SKYGREP_AUTO_PULL | unset | Set yes to auto-ollama pull missing models without prompting. |
| SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS | 30 | Skip the mtime scan if the previous refresh ran more recently. |
| SKYGREP_RERANK_MODEL | mixedbread-ai/mxbai-rerank-large-v2 | Cross-encoder for --rerank. |
| SKYGREP_RERANK_POOL | 50 | Candidate pool before reranking. |
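
For scripts that wrap skygrep, the defaults in the table above can be mirrored with a small environment reader. This is a sketch for illustration: load_config and its dictionary keys are hypothetical, and the defaults are copied from the table rather than from the package source.

```python
import os


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "ollama_url": env.get("OLLAMA_URL", "http://localhost:11434"),
        "embed_model": env.get("OLLAMA_EMBED_MODEL", "nomic-embed-text"),
        "llm_model": env.get("OLLAMA_LLM_MODEL", "qwen2.5:3b"),
        "hyde_model": env.get("OLLAMA_HYDE_MODEL", "qwen2.5:3b"),
        "keep_alive": env.get("OLLAMA_KEEP_ALIVE", "-1"),
        # None means the per-project index location is used.
        "db_path": env.get("SKYGREP_DB_PATH"),
        "auto_pull": env.get("SKYGREP_AUTO_PULL") == "yes",
        "refresh_throttle_seconds": int(env.get("SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS", "30")),
        "rerank_model": env.get("SKYGREP_RERANK_MODEL", "mixedbread-ai/mxbai-rerank-large-v2"),
        "rerank_pool": int(env.get("SKYGREP_RERANK_POOL", "50")),
    }
```

Passing an explicit mapping instead of os.environ keeps the reader easy to test, for example load_config({"OLLAMA_HYDE_MODEL": "qwen2.5:1.5b"}).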

Releases

The first public release of skylakegrep was 0.1.0. See docs/skylakegrep-0.1.0.md for the full description of capabilities, architecture, CLI flags, and environment variables; the 0.2.x release notes are in docs/skylakegrep-0.2.0.md and docs/skylakegrep-0.2.1.md.

  • 0.1.0 — first public release. LLM-driven query routing (filename / lexical / semantic cascade tiers, intent-aware merge), confidence-gated semantic cascade with HyDE escalation + cross-encoder rerank + PageRank tiebreaker, lazy PDF / docx content extraction (pdftotext + pypdf fallback, optional --ocr), framed Pygments-highlighted card rendering, four --detail levels, skygrep setup auto-registration with major LLM CLIs (Claude Code, Codex, OpenCode, Gemini CLI, Cursor). PolyForm Noncommercial 1.0.0 license.

CLI reference

skygrep setup    [--list|--uninstall|--yes] # register with Claude Code / Codex / OpenCode / Gemini / Cursor
skygrep "<query>" [OPTIONS]                 # bare-form search
skygrep search   "<query>" [OPTIONS]        # explicit search
skygrep doctor                              # health check
skygrep stats                               # project index info
skygrep index    [PATH] [--reset]           # explicit reindex
skygrep watch    [PATH] --interval N        # poll for changes
skygrep serve    [--host H] [--port P]      # warm-reranker daemon
skygrep enrich   [--max N] [--batch B]      # opt-in doc2query enrichment
skygrep search options
| Option | Default | Effect |
| --- | --- | --- |
| -m, -n, --top | 5 | Number of final results. |
| --json | off | Emit a JSON array; suppresses human formatting. |
| --answer | off | Synthesize an answer from retrieved snippets via Ollama. |
| --content / --no-content | on | Show or hide snippet bodies in human output. |
| --language | | Restrict to one or more language keys; repeatable. |
| --include / --exclude | | Glob filter (repeatable). |
| --cascade / --no-cascade | on | Confidence-gated retrieval. Off = chunk-only legacy path. |
| --cascade-tau | 0.015 | Top-1 / top-2 file-mean cosine gap above which to early-exit. |
| --rerank / --no-rerank | on | Cross-encoder rerank on the non-cascade path. |
| --rerank-pool | 50 | Candidate pool before reranking. |
| --rerank-model | env or default | HuggingFace cross-encoder id. |
| --hyde / --no-hyde | off | Force HyDE outside the cascade (rare; cascade decides per query). |
| --multi-resolution / --no-multi-resolution | on | File-level cosine top-N → chunk-level inside those files. |
| --file-top | 30 | Files surfaced by file-level retrieval. |
| --lexical-prefilter / --no-lexical-prefilter | on | Use ripgrep to narrow the candidate file set. |
| --lexical-root | cwd / git toplevel | Root directory ripgrep scans. |
| --lexical-min-candidates | 2 | Fall back to corpus-wide cosine when ripgrep returns fewer files. |
| --rank-by | chunk | chunk (per-file diversity cap) or file (one chunk per file). |
| --auto-index / --no-auto-index | on | Auto-build the index on first query and refresh on mtime change. |
| --daemon-url | | Send the search to a running skygrep serve daemon. |
| --agentic | off | Decompose into subqueries via Ollama before search. |
| --max-subqueries | 3 | Upper bound on agentic subqueries. |
| --semantic-only | off | Skip lexical reranking; rank by cosine alone. |

Capability matrix (every feature, when introduced)
| Capability | Since |
| --- | --- |
| Semantic file search via local Ollama (code + docs + PDFs) | 0.1.0 |
| bge-m3 multilingual substrate as default embedder | 0.2.0 |
| Content-agnostic reference-graph registry (register_extractor()) | 0.2.0 |
| Built-in markdown link extractor ([](link), ![](), [[wiki]]) | 0.2.0 |
| σ-adaptive cascade threshold (max(τ_floor, k·σ_topK)) | 0.2.0 |
| Universal non-canonical-path filter (24 patterns: fixtures/vendor/dist/.min.js/…) | 0.2.0 |
| Symbol-as-retriever channel (internal, opt-in) | 0.2.0 |
| Public-OSS bench: 30 / 30 (Django · React · Tokio) | 0.2.0 |
| Tree-sitter chunking + line-window fallback | 0.2.0 |
| .gitignore / .skygrepignore hygiene | 0.2.0 |
| Incremental indexing (mtime-based) | 0.2.0 |
| Stale row cleanup | 0.2.0 |
| Watch mode | 0.2.0 |
| Hybrid lexical + semantic ranking | 0.2.0 |
| Stable JSON output | 0.2.0 |
| Local answer mode | 0.2.0 |
| Local agentic decomposition | 0.2.0 |
| Cross-encoder rerank | 0.3.0 |
| Asymmetric query/document embedding prefixes | 0.3.0 |
| HyDE query rewriting | 0.3.0 |
| Multi-resolution retrieval | 0.3.0 |
| Lexical prefilter (ripgrep first stage) | 0.3.0 |
| File-rank (one chunk per file) | 0.3.0 |
| Daemon mode | 0.3.0 |
| Quantisation / device knobs | 0.3.0 |
| Confidence-gated cascade | 0.3.0 (default in 0.4.0) |
| Bare-form invocation skygrep "<q>" | 0.4.0 |
| Per-project auto-index | 0.4.0 |
| skygrep doctor health check | 0.4.0 |
| Ripgrep fallback for first query | 0.4.1 |
| Symbol-aware indexing (L2) | 0.5.0 |
| doc2query enrichment (L3, opt-in via skygrep enrich) | 0.5.0 |
| File-export PageRank tiebreaker (L4) | 0.5.0 |
| Cascade file-mean cosine corpus-wide | 0.5.1 |
| Smaller default HyDE model + keep_alive=-1 | 0.6.0 |

Development

git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[rerank]"

.venv/bin/pytest -q tests/
.venv/bin/python benchmarks/agent_context_benchmark.py --top-k 10 --summary-only

To reproduce the public OSS benchmark numbers, follow the instructions in docs/parity-benchmarks.md.

License

PolyForm Noncommercial 1.0.0 — see LICENSE.

Personal, academic, research, hobby, and any other non-commercial use is fully permitted, including modification and redistribution. Commercial use is NOT permitted under this license. To obtain a commercial license, open a GitHub issue titled "Commercial license inquiry" or email chentianchi@gmail.com.

Acknowledgments

  • Ollama for the local embedding and generation runtime.
  • tree-sitter for syntax-aware parsing.
  • ripgrep for the lexical prefilter stage.
  • Mixedbread for the open-source mxbai-rerank-*-v2 cross-encoder family.
  • nomic-embed-text for the embedding model.
  • Click, NumPy, and SQLite for the core runtime dependencies.

