
skylakegrep — fully-offline semantic search over your local files

PyPI · Python 3.9+ · PolyForm Noncommercial 1.0.0 · Documentation · Latest release

Install  ·  Scenarios  ·  Why?  ·  How it works  ·  Benchmarks  ·  Docs site


Find anything on your machine.

Semantic search for code, PDFs, notes, and docs. Fully offline. No cloud. No telemetry. No subscription. Ask in plain English (or any of 100+ languages) and get the right file + line range in under a second.

$ skygrep "where does the auth token get refreshed?"

═══ auth/middleware.py:78-94          score 0.91 · python
async def renew_session(req: Request):
    # swap the access cookie when the refresh JWT is still valid
    if req.cookies.get("rt") and access_expired(req):
        return await refresh_token(claims, key)

[0.5s · path=cosine-cheap · σ-gap=0.082 ≥ τ=0.005 (adaptive) → high-confidence early-exit · ✓ quality=BEST]

Install in 30 s →  ·  How it works →  ·  Benchmarks →

30 / 30 public-OSS recall  ·  ~1 s warm queries  ·  100 % local  ·  14 releases shipped


Three ways people use it

🧠 Code by concept

Find code by what it does, not what it's called. The semantic substrate (bge-m3) bridges your phrasing to the actual identifier even when the function name uses different words.

$ skygrep "where does session refresh logic live?"

→ auth/middleware.py:78  ·  renew_session()

No rg hit for "session refresh"; cosine bridges to renew_session in 0.5 s.
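
The bridging step is plain cosine similarity over embedding vectors. A toy sketch, with hand-made 4-d vectors standing in for real 1024-d bge-m3 embeddings (the vectors and identifier names here are illustrative, not skylakegrep internals):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Semantically close phrases land on close vectors even with zero shared tokens.
query = [0.9, 0.1, 0.4, 0.2]                      # "session refresh logic"
candidates = {
    "renew_session": [0.8, 0.2, 0.5, 0.1],        # no lexical overlap with the query
    "parse_config":  [0.1, 0.9, 0.0, 0.3],
}

ranked = sorted(candidates, key=lambda n: cosine(query, candidates[n]), reverse=True)
print(ranked[0])   # renew_session ranks first despite no token overlap
```

This is why "session refresh" can land on renew_session with no rg hit: the ranking happens in embedding space, not token space.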


📄 Cross-content

One query across code, PDFs, notes, and docs. Markdown, PDF, Word, plain text — all indexed via the same content-agnostic substrate. Your query searches all of them at once, ranked by semantic relevance.

$ skygrep "the design doc on rate limiter rewrite"

→ docs/rate-limiter-redesign.md  ·  designs/q3-rewrite.pdf

Markdown link graph + PDF text-layer extraction in one cascade.


🌐 Multilingual · private

bge-m3 understands 100+ languages out of the box. Index, retrieval, ranking, optional answer synthesis — all run locally via Ollama. Zero network calls.

$ skygrep "我昨天写的 cascade 调度代码"

→ src/storage.py:847  ·  cascade_search()

Mixed Chinese / English query. Zero network. Audit-friendly.


Why skylakegrep?

Sized against four named alternatives, not generic categories.

skylakegrep — comparison matrix vs ripgrep, mgrep (predecessor), autodev-codebase, Sourcegraph Cody

| | skylakegrep v0.2.15 | ripgrep (lexical) | mgrep (predecessor) | autodev-codebase | Sourcegraph Cody (cloud) |
|---|---|---|---|---|---|
| Find by concept, not just token | ✓ | ✗ | legacy substrate | ✓ | ✓ |
| Privacy — no data egress | ✓ | ✓ | ✓ | ✓ | ✗ (cloud-side index) |
| Content — code · md · PDF · docx | all four | text only | code-first | code-first | code-first |
| Setup | pip install | brew install | pip install | npm + Ollama | account + sub |
| Cost | $ 0 / mo | $ 0 / mo | $ 0 / mo | $ 0 / mo | $ 20 – 100+ / mo |
| Multilingual queries (NL → code id) | bge-m3 native | n/a | English-leaning | embedder-dep. | supported |

mgrep is the predecessor; skylakegrep is the next-generation evolution with bge-m3 substrate + content-agnostic graph + σ-adaptive cascade. autodev-codebase is the closest direct competitor in the offline-Ollama-CLI lane.


How it works

┌─────────┐      ┌──────────────────┐      ┌──────────────────┐      ┌─────────┐
│  query  │ ──▶  │   LLM router     │ ──▶  │ Cosine cascade   │ ──▶  │ results │
│  text   │      │  (qwen2.5:3b)    │      │   (bge-m3 +      │      │         │
│         │      │  intent + scope  │      │    rerank)       │      │         │
└─────────┘      └──────────────────┘      └──────────────────┘      └─────────┘
                       ~50 ms                    0.5 – 2 s
                       (local)                    (local)

Local Ollama + SQLite. Zero network calls. Zero subscription. The same architecture handles every content type — code · PDFs · notes · markdown · any file you register an extractor for. The LLM router classifies intent + scope + primary token on every query; the cosine cascade uses bge-m3 (multilingual, 1024-d, symmetric XLM-RoBERTa) with σ-adaptive early-exit. Two proactive enhancers kick in when the cascade can't answer: filename_extend extends the search to common home directories; recovery_progress_hint surfaces live re-embed progress when the index is being rebuilt.
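
The σ-adaptive early exit can be sketched as a gap test over the top-K cosine scores. The real gate adapts τ per query and is internal to skylakegrep; this sketch with a fixed τ only illustrates the comparison the telemetry footer reports:

```python
def cascade_decision(scores, tau=0.005):
    """Pick a retrieval path from descending top-K cosine scores.

    sigma_gap is the separation between the best candidate and the runner-up:
    a gap well above tau means the ranking is unambiguous, so the expensive
    reranker is skipped; a near-tie escalates to the rerank stage.
    """
    sigma_gap = scores[0] - scores[1]
    if sigma_gap >= tau:
        return "cosine-cheap", sigma_gap            # high-confidence early exit
    return "cosine-escalated-rerank", sigma_gap     # candidates tied: rerank

path, gap = cascade_decision([0.91, 0.828, 0.64])
print(path, round(gap, 3))   # cosine-cheap 0.082
```

With the example footer's numbers (top score 0.91, runner-up 0.828) the gap of 0.082 clears τ = 0.005 by a wide margin, so the query exits on the cheap cosine path.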

Architecture deep-dive →


Install

# 1. install (Python 3.9+)
pip install skylakegrep

# 2. pull the local models (~3 GB, one time)
ollama pull bge-m3
ollama pull qwen2.5:1.5b
ollama pull qwen2.5:3b

# 3. (one time) register skygrep with your LLM CLI of choice
skygrep setup     # Claude Code · Codex · OpenCode · Gemini CLI · Cursor

# 4. ask anything, anywhere
skygrep "your question here"

That's it. The first query in a fresh project completes in under a second via a ripgrep fallback while a background process builds the semantic index. Every query after that uses the full cascade with the local LLM kept warm in memory.


Performance

Public-OSS reproducible benchmark across three popular codebases (Django · React · Tokio; 30 hand-labelled questions, 10 per repo):

skylakegrep — public-OSS benchmark performance (30/30 recall on Django + Tokio + React)

| Repo | Lang | LOC ≈ | skygrep recall | rg recall | Token reduction vs rg |
|---|---|---|---|---|---|
| Django | Python | 524 K | 10 / 10 | 10 / 10 | 703 × |
| Tokio | Rust | 80 K | 10 / 10 | 10 / 10 | 61 × |
| React | JS · TS | 270 K | 10 / 10 | 10 / 10 | 773 × |
| Aggregate | | | 30 / 30 (100 %) | 30 / 30 | 60 × – 770 × |

Honest reading:

  • rg's 100 % is a recall-ceiling baseline — it returns 20 M+ tokens per query (term-OR scan with 2-line context windows). Yes, the answer is in the dump; no, the agent has to read all of it to find it.
  • skygrep returns the right file ranked top-10 in 30 / 30 cases while emitting 60 × – 770 × less context for the agent's LLM round-trip downstream. That's the user-facing number.
  • Reproduce: git clone Django + React + Tokio at any commit, run benchmarks/public_oss_bench.py. Numbers within ±5 %.

For the full bench protocol, per-task analysis, and worked example (one query · 1,395 × token reduction), see docs/parity-benchmarks.md.
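
The reduction factor is just a ratio of context sizes seen by the downstream LLM. With illustrative figures only (the 20 M-token rg dump quoted above against an assumed ~28 K-token ranked snippet set, roughly the Django row):

```python
def token_reduction(rg_tokens, skygrep_tokens):
    """How many times less context the agent's LLM has to read."""
    return rg_tokens / skygrep_tokens

# 20 M-token term-OR dump vs an assumed ~28.4 K-token ranked snippet set.
print(round(token_reduction(20_000_000, 28_400)))   # 704, the Django-row ballpark
```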


What you can search

The retrieval substrate is content-agnostic by design. The embedder, the cascade, and the reference graph all abstract over "A references B" — not over any specific programming language or file format. New content types plug in via a one-line register_extractor() call.

| Content type | How it's parsed | Reference graph | Since |
|---|---|---|---|
| Code — Rust · Python · JS · TS | tree-sitter symbol-aware chunking + line-window fallback | imports / use / require / dynamic import() | 0.1.0 |
| Markdown | line-window chunks; [](link), ![](), [[wiki]] link extraction | relative-path resolution + Obsidian wiki links | 0.2.0 |
| PDF | pypdf text extraction; opt-in OCR for scanned pages | — | 0.1.0 |
| Word docs (.docx) | python-docx paragraph extraction | — | 0.1.0 |
| Plain text · TOML · YAML · CSV · JSON · … | line-window chunking via the default text path | — | 0.1.0 |
| Custom (your content type) | register an extractor returning (source, target) edges | your call | 0.2.0 |
from skylakegrep.src.reference_graph import register_extractor
import re

def yaml_anchor_extractor(path):
    """Return list of (source, target) reference edges."""
    # Illustrative body: treat each *alias as a reference back to its &anchor.
    text = open(path, encoding="utf-8").read()
    anchors = set(re.findall(r"&(\w+)", text))
    return [(str(path), alias)
            for alias in re.findall(r"\*(\w+)", text) if alias in anchors]

register_extractor("yaml", [".yaml", ".yml"], yaml_anchor_extractor)
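
The line-window chunking the table refers to can be sketched as a sliding window of lines with overlap (window and overlap sizes here are assumptions, not skylakegrep's actual defaults):

```python
def line_window_chunks(text, window=40, overlap=10):
    """Split text into overlapping line windows.

    Returns (start_line, end_line, chunk_text) triples, 1-indexed inclusive.
    Overlap keeps a match near a window boundary visible in two chunks.
    """
    lines = text.splitlines()
    step = window - overlap
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        body = lines[start:start + window]
        chunks.append((start + 1, start + len(body), "\n".join(body)))
        if start + window >= len(lines):
            break
    return chunks

doc = "\n".join(f"line {i}" for i in range(1, 101))      # a 100-line file
print([(s, e) for s, e, _ in line_window_chunks(doc)])   # [(1, 40), (31, 70), (61, 100)]
```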

Command cheatsheet

The bare form covers ~95 % of real-world use: skygrep "<your question>". No subcommand, no flags. The system auto-routes (LLM router → find / rg / semantic cascade), auto-indexes on first query, and auto-recovers when the embedder is upgraded.

| Command | When to use | Example |
|---|---|---|
| skygrep "<query>" (bare) | Default. Just ask a question. Auto-indexes, auto-recovers. | skygrep "where is the auth refresh logic" |
| skygrep search <query> | Explicit form when you need flags. | skygrep search "session token" --top 20 --json |
| skygrep doctor | First-time troubleshooting. Probes Ollama, lists models, summarises the project index, checks integrations. | skygrep doctor |
| skygrep setup | Register skygrep with detected LLM CLIs. Run once. | skygrep setup · skygrep setup --uninstall |
| skygrep stats | Print chunk and file counts. | skygrep stats |
| skygrep index [PATH] [--reset] | Rarely needed. Auto-recovery (0.2.2+) handles embedder upgrades. | skygrep index . --reset |
| skygrep watch [PATH] -i N | Keep index live in the background. Polls every N seconds. | skygrep watch . |
| skygrep serve --port P | Daemon mode. Keeps cross-encoder + Ollama warm for 0.5 – 2 s warm queries. | skygrep serve --port 7878 |
| skygrep enrich | Advanced. Generate doc2query-style descriptions for vocab-mismatch queries. | skygrep enrich |

Reading the per-query telemetry footer (0.2.2+)

Every search prints a one-line footer so you can see which retrieval path answered your query and why:

✓ 0.42s · quality=BEST
   path     : cosine-cheap (high-confidence early-exit)
   router   : llm → intent=mixed (0.83)
   evidence : σ-gap=0.0820 ≥ τ=0.0050 (adaptive)
   pool     : 1 filename + 0 lexical · cascade
   index    : 20s ago · 36 files · L2 symbols + graph prior

Field guide:

  • path=cosine-cheap / cosine-escalated-rerank / rg-only / cascade-skipped. The retrieval strategy this specific query took.
  • σ-gap=… → reason — Bayesian-evidence proxy that drove the cascade decision. High σ-gap = top-K candidates well separated → cosine trusted, exit cheap. Low σ-gap = candidates tied → escalate to rerank.
  • recovery=… (only when the recovery worker is active) — live progress + ETA for the in-progress re-embed.
  • quality=BEST / DEGRADED-recovery — at-a-glance trust indicator.

Configuration

Set via environment variables. Defaults assume a local Ollama server on the default port.

| Variable | Default | Effect |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Ollama server URL. |
| OLLAMA_EMBED_MODEL | bge-m3 | Embedding model. Switching is auto-detected (0.2.2+) — no manual --reset needed. |
| OLLAMA_LLM_MODEL | qwen2.5:3b | Used for --answer, --agentic, the LLM router, and recovery telemetry. |
| OLLAMA_HYDE_MODEL | qwen2.5:3b | Used for cascade-escalation HyDE rewrite. Falls back to OLLAMA_LLM_MODEL. |
| OLLAMA_KEEP_ALIVE | -1 | Passed to every Ollama call. -1 keeps models resident indefinitely (recommended). |
| SKYGREP_DB_PATH | per-project | When set, treats the index as curated and disables auto-mutation. |
| SKYGREP_AUTO_PULL | unset | Set yes to auto-ollama pull missing models without prompting. |
| SKYGREP_AUTO_REFRESH_THROTTLE_SECONDS | 30 | Skip the mtime scan if the previous refresh ran more recently. |
| SKYGREP_RERANK_MODEL | mixedbread-ai/mxbai-rerank-large-v2 | Cross-encoder for --rerank. |
| SKYGREP_RERANK_POOL | 50 | Candidate pool before reranking. |
| SKYGREP_NO_HINTS | unset | Set 1 to silence all intelligent-CLI hints (out-of-scope, typo, low-conf, first-run). |
| SKYGREP_NO_PROACTIVE | unset | Set 1 to disable the proactive enhancement framework. |
| SKYGREP_PROACTIVE_BUDGET_MS | 2000 | Total wall-clock cap on proactive enhancers per query. |
| SKYGREP_FOOTER_COMPACT | unset | Set 1 for the legacy single-line telemetry footer. |
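
Resolution follows the usual environment-variable pattern: read the variable, fall back to the documented default. A minimal sketch of the lookup the table implies (skylakegrep's actual loader may differ):

```python
import os

def load_config(env=None):
    """Resolve a few of the settings above from an environment mapping."""
    env = os.environ if env is None else env
    return {
        "ollama_url":  env.get("OLLAMA_URL", "http://localhost:11434"),
        "embed_model": env.get("OLLAMA_EMBED_MODEL", "bge-m3"),
        "keep_alive":  int(env.get("OLLAMA_KEEP_ALIVE", "-1")),
        "hints":       env.get("SKYGREP_NO_HINTS") != "1",   # 1 silences hints
    }

cfg = load_config({"SKYGREP_NO_HINTS": "1"})
print(cfg["ollama_url"], cfg["hints"])   # http://localhost:11434 False
```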

What's new

Recent releases (as of 2026-05-05, newest first):

  • 0.2.13 — Privacy-only sweep: removed user-personal references from public release notes / docs / README. No code change.
  • 0.2.12 — filename_extend morphology fallback when the LLM is unreachable. Plus the conversational session-state plan.
  • 0.2.11 — Second built-in proactive enhancer: recovery_progress_hint. Plus ProactiveContext infrastructure for future enhancers.
  • 0.2.10 — Critical fix: the per-dir find budget bug that silenced the proactive framework in the user's actual scenarios. End-to-end verified before tagging.
  • 0.2.9 – 0.2.7 — Three iterations on the proactive framework's gate logic, recorded in the Principle 1 receipts table; each fixed a Principle 1 lapse the user caught.
  • 0.2.6 — LLM-driven scope classification replaces the keyword _METADATA_TOKENS list. Principle 1 ✓ shipped.
  • 0.2.0 — bge-m3 substrate · content-agnostic reference graph registry · σ-adaptive cascade · 30 / 30 public-OSS recall (was 28 / 30).

Full release notes →


Project principles

Architecture rules every contributor (human or AI agent) should follow. Recorded in docs/PRINCIPLES.md. Loaded into Claude sessions automatically via CLAUDE.md.

  1. Understanding > Enumeration — substrate (LLM / embedder / registry) over hardcoded lists. Receipts table tracks 5 past lapses.
  2. Substrate before scaffolding — upgrade the underlying model before layering priors on top.
  3. Latency / quality / correctness — in that priority order.
  4. Public surfaces sync at every release — the 8-surface checklist in docs/RELEASING.md.
  5. Honest evaluation over hopeful claims — name the bench, show the numbers, don't combine across benches.
  6. Proactive over Passive — when the cascade can't answer, try bounded extra work in parallel rather than shrug.

Development

git clone https://github.com/danielchen26/skylakegrep.git
cd skylakegrep
python3 -m venv .venv
source .venv/bin/activate
pip install -e .[rerank]

# Verify
.venv/bin/python -m pytest -q tests/        # 201 / 201 should pass

The release protocol is documented in docs/RELEASING.md. Every release must sync 8 public-facing surfaces (PyPI, GitHub Release, README, GitHub Pages, plan docs, principles, version bump, tag) in a specific order.


License

PolyForm Noncommercial 1.0.0. Personal · academic · research · hobby use is fully permitted. Commercial use requires a separate license — contact chentianchi@gmail.com.


Acknowledgments

Built on the shoulders of:

  • Ollama — local model serving
  • bge-m3 — multilingual embedder (BAAI)
  • qwen2.5 — local LLM family for routing + answer synthesis
  • tree-sitter — symbol-aware chunking
  • SQLite — durable index storage
  • pypdf · python-docx — binary content extraction
  • Pygments — syntax highlighting in the rendered terminal output

