A ridiculously simple search engine factory
grub
A ridiculously simple search engine factory.
Point grub at anything — a folder, a codebase, a Python package, a
website, a pile of notes — and get back a working search engine in one
line. No servers, no indexes to babysit, no configuration.
pip install grub
from grub import grub
grub('./my_notes', 'where did I write about retirement savings')
The AI-first way (start here)
You probably shouldn't be calling grub yourself at all.
grub ships with agent skills — instruction files that teach an AI
coding agent (Claude Code, Cursor, and friends) how to drive grub on your
behalf. The skills live in .claude/skills/:
| Skill | What it lets the agent do |
|---|---|
| grub-search | Build a search index over a folder, codebase, module, website, or list of strings, and answer questions against it. |
| grub-extend | Wire grub up to custom embedding providers (OpenAI, Cohere, …), new backends, or new data sources. |
With the skills in place, you stop writing code and start asking:
"Search my ./docs folder and tell me which file explains the deployment process."
"Index this codebase and find where rate limiting is implemented."
"Use semantic search over my meeting notes to find anything about the Q3 budget."
"Search these three documentation URLs and summarize what they say about authentication."
The agent reads the skill, picks the right source type, the right search
method (lexical, semantic, or hybrid), chunks long documents when it
helps, and hands you the answer. You never see a SearchStore
constructor. You never tune a vectorizer. You describe the outcome you
want, in English, and it happens.
Why AI-first?
Because the interface to software is changing, and grub is built for the change.
For decades, "using a tool" meant learning the tool — its API, its flags, its mental model — and then translating your intent into its vocabulary. That translation tax was unavoidable. It is not anymore.
An AI agent is a universal adapter between human intent and machine capability. It already knows grub's vocabulary; you don't have to. So the job of a well-designed library is no longer "expose a clever API to humans" — it's "expose powerful, composable capabilities, and ship the knowledge an agent needs to wield them." That knowledge is the skill files.
grub leans all the way into this:
- The skills are the primary interface. They are documentation an agent executes, not documentation a human reads and then forgets.
- The Python API is the substrate. It stays clean, small, and honest — because an agent calling it deserves the same good design a human would.
- You operate at the level of intent. "Find the doc about X" instead of "instantiate, configure, fit, query, parse."
The future of tooling is not humans memorizing more APIs. It's humans stating goals and agents composing capabilities. grub is a small tool, so it's a small example — but the shape is the same all the way up.
For the dinosaurs who want to operate with code directly 🦖
No judgment. Sometimes you are the agent, and a REPL is the fastest path. The Python API is built to be a pleasure to use directly.
One function does it all
from grub import grub
search = grub('./docs') # build a searcher
results = grub('./docs', 'how to deploy') # ...or search in one call
grub() figures out what you handed it:
grub('./docs') # a folder of files
grub('src/**/*.py') # a glob
grub(some_module) # a Python package's source
grub('https://example.com/guide') # a web page (HTML stripped to text)
grub({'intro': '...', 'faq': '...'}) # a dict of documents
grub(['first doc', 'second doc']) # a list of strings
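To make the dispatch idea concrete, here is a minimal sketch of how a few of these source types could be normalized into one key-to-text mapping. It is not grub's actual code: `to_mapping` is a hypothetical helper, and the choice of keys (list positions, relative file paths) is an assumption made for illustration.

```python
from collections.abc import Mapping
from pathlib import Path


def to_mapping(source):
    """Normalize a source into a {key: text} dict (hypothetical helper, not grub's code)."""
    if isinstance(source, Mapping):
        # Already key -> text; copy so the caller's object is not mutated.
        return dict(source)
    if isinstance(source, (list, tuple)):
        # A list of strings: use each item's position as its key (an assumption).
        return {str(i): text for i, text in enumerate(source)}
    if isinstance(source, (str, Path)) and Path(source).is_dir():
        # A folder: read every file under it, keyed by its relative path.
        root = Path(source)
        return {
            p.relative_to(root).as_posix(): p.read_text(errors="ignore")
            for p in sorted(root.rglob("*"))
            if p.is_file()
        }
    raise TypeError(f"unsupported source: {source!r}")


print(to_mapping(["first doc", "second doc"]))
```

Globs, modules, and URLs would need their own branches; they are left out here to keep the sketch short.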
Results that explain themselves
results = grub('./docs', 'configure logging')
for hit in results:
print(hit.score, hit.key, hit.snippet)
results.keys # ['logging.md', 'setup.md', ...] best-first
results.scores # [0.71, 0.33, ...]
print(results.show()) # a tidy ranked rendering
search = grub('./docs') # the searcher itself, as built earlier
print(search['logging.md']) # the full original text of a hit
Every hit carries a score and a snippet — the line that shows you why it matched.
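One simple way to pick such a snippet is to return the line that shares the most words with the query. The sketch below shows that idea only; how grub really selects snippets is not described here, and `tokenize` and `best_snippet` are hypothetical names.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_snippet(document, query):
    """Return the line of `document` sharing the most distinct words with `query`."""
    query_words = set(tokenize(query))
    lines = [line.strip() for line in document.splitlines() if line.strip()]
    if not lines:
        return ""
    # max() keeps the first line on ties, so earlier lines win.
    return max(lines, key=lambda line: len(query_words & set(tokenize(line))))


doc = "Install the package.\nConfigure logging with a dict.\nRun the tests."
print(best_snippet(doc, "configure logging"))  # Configure logging with a dict.
```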
Three ways to search
grub(src, query, method='tfidf') # lexical: shared words (default, fast)
grub(src, query, method='semantic') # embeddings: shared *meaning*
grub(src, query, method='hybrid') # a blend of both
Semantic search finds "automobile" when you searched "car". It needs
embeddings — either pip install 'grub[semantic]' (a local
sentence-transformers model) or your own provider:
grub('./docs', method='semantic', embed=my_openai_embedding_function)
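As a rough picture of what a hybrid blend can look like, the sketch below rescales lexical and semantic scores to a common range and takes a weighted sum. This is one common approach, not necessarily the one grub uses. The function names, the `alpha` weight, and the sample scores are all made up for illustration, and both score dicts are assumed to be non-empty and to share the same keys.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant set of scores maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def blend(lexical, semantic, alpha=0.5):
    """Weighted sum of normalized lexical and semantic scores, best-first."""
    lex, sem = min_max(lexical), min_max(semantic)
    combined = {key: alpha * lex[key] + (1 - alpha) * sem[key] for key in lex}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative scores for the query "car" (invented numbers, not grub output).
lexical = {"car_care.md": 0.8, "automobile.md": 0.0, "boats.md": 0.1}
semantic = {"car_care.md": 0.9, "automobile.md": 0.85, "boats.md": 0.2}
for key, score in blend(lexical, semantic):
    print(f"{score:.2f}  {key}")
```

With these numbers, `automobile.md` has no lexical overlap with the query but still ranks above `boats.md`, because its semantic score carries it.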
Long documents, chunking, and persistence
grub('./book.txt', chunk=1500) # split into passages, not whole files
grub('./src', extensions=['.py']) # filter what gets indexed
from grub import Searcher
grub('./big_codebase').save('code.grub') # build once
Searcher.load('code.grub') # reload instantly
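For intuition about the chunking step, here is a minimal sketch that packs paragraphs into passages of bounded size. It assumes the chunk size is a maximum character count, which is not confirmed above, and `chunk_text` is a hypothetical helper rather than grub's implementation.

```python
def chunk_text(text, size=1500):
    """Split text into passages of at most `size` characters, preferring paragraph breaks."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any paragraph that is longer than one chunk on its own.
        pieces = [paragraph[i:i + size] for i in range(0, len(paragraph), size)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= size:
                current = candidate  # still fits: keep growing this chunk
            else:
                chunks.append(current)  # full: close it and start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


text = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
for passage in chunk_text(text, size=22):
    print(len(passage), repr(passage))
```

Each passage can then be indexed as its own document, so a hit points at the relevant part of a long file rather than at the whole file.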
From the command line
grub ./docs "how do I configure logging"
grub ./src --extensions .py --snippets "retry with backoff"
grub https://example.com/guide --semantic "getting started"
grub ./docs # interactive prompt
The legacy API still works
The original SearchStore and friends are unchanged and still exported,
so existing code keeps running:
from grub import SearchStore
import sklearn, os
search = SearchStore(os.path.dirname(sklearn.__file__) + '/{}.py')
search('how to calibrate the estimates of my classifier')
How it works
grub is a thin, honest pipeline:
source ──to_store──▶ store ──backend──▶ scores ──▶ SearchResults
- to_store turns any source into a Mapping[str, str].
- A backend (TF-IDF, embeddings, or a hybrid) scores every document against your query.
- Results come back ranked, scored, and annotated with snippets.
Every stage is swappable — see the grub-extend skill or
grub/backends.py. That's the whole trick: simple
things stay simple, powerful things stay possible.
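The pipeline above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions: the store is already a key-to-text mapping, the backend is a hand-rolled TF-IDF with cosine similarity, and `tokenize` and `tfidf_search` are hypothetical names. How grub's own TF-IDF backend is implemented is not described here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_search(store, query):
    """Rank the documents in `store` (a {key: text} mapping) against `query`."""
    doc_counts = {key: Counter(tokenize(text)) for key, text in store.items()}
    n_docs = len(doc_counts)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for counts in doc_counts.values() for word in counts)

    def idf(word):
        # Smoothed inverse document frequency; safe for words no document has.
        return math.log((1 + n_docs) / (1 + df[word])) + 1

    def vectorize(counts):
        return {word: tf * idf(word) for word, tf in counts.items()}

    def cosine(a, b):
        dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = vectorize(Counter(tokenize(query)))
    scores = {key: cosine(query_vec, vectorize(c)) for key, c in doc_counts.items()}
    # Best-first, like the ranked results described above.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


store = {
    "logging.md": "configure logging handlers and log levels",
    "deploy.md": "deploy the service with docker",
    "setup.md": "install and configure the package",
}
for key, score in tfidf_search(store, "configure logging"):
    print(f"{score:.2f}  {key}")
```

Swapping the backend means replacing only the scoring step: anything that returns a score per key fits the same shape.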
Install
pip install grub # core (TF-IDF / lexical search)
pip install 'grub[semantic]' # adds local embedding-based search
License
Apache-2.0