Dependency-free BM25 indexer and search CLI for local code/docs
searxh
Small BM25 search tool for local code/docs. No third-party dependencies.
what it does
- Indexes text/code files into SQLite (bm25.sqlite by default)
- Supports incremental indexing (only changed files are reprocessed)
- Ranks with BM25
- Shows optional snippets with line numbers
- Supports path/extension filters, JSON output, and colored matches
requirements
- Python 3.9+
install
Editable install:
python3 -m pip install -e .
After install, use:
searxh index .
searxh "replication backlog"
sx index .
sx "replication backlog"
quick start
Build or update index:
./search index .
Search:
./search "replication backlog"
command forms
./search [global-options] index [root] [index-options]
./search [global-options] search "query"
./search [global-options] "query"
common examples
Full rebuild:
./search index . --full
Search with snippet + color:
./search --snippet --color "aof fsync"
Filter by path:
./search --path src/ "replication"
Filter by extension:
./search --ext .c,.h,.md "dict"
JSON output:
./search --json "cluster slots"
Custom index path:
./search index . --out /path/to/myindex.sqlite
./search --index /path/to/myindex.sqlite "term"
key options
- --k: number of results (default 10)
- --k1, --b: BM25 tuning knobs
- --path-boost: extra weight for path token matches (default 1.5)
- --stem: enable simple stemming
- --no-stopwords: disable stopword filtering
- --workers: indexing worker threads
- --no-progress: hide indexing progress output
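To make the role of --k1 and --b concrete, here is a minimal sketch of textbook BM25 scoring over pre-tokenized documents. It is not taken from searxh's engine: the function name `bm25_scores` is hypothetical, the defaults of 1.2 and 0.75 are the conventional values rather than confirmed searxh defaults, and path boosting, stemming, and stopword handling are left out.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    `docs` is a list of token lists. `k1` controls term-frequency
    saturation and `b` controls document-length normalization.
    The defaults are conventional values, assumed for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0

    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document, contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = 1 - b + b * (len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

In this formulation, raising k1 lets repeated occurrences of a term keep adding to the score for longer, and lowering b toward 0 reduces the penalty applied to long files.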
indexing behavior
- Scan files by extension/name and skip likely binary files.
- Compare mtime and size with index metadata.
- Reindex changed files and remove deleted files.
- Update postings and document metadata in SQLite.
If a file produces no tokens (for example, empty/whitespace-only), it is saved as a zero-length doc so incremental runs do not keep retrying it.
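The change-detection step described above can be sketched as follows. This is an illustration under assumptions, not searxh's actual code: the function name `diff_against_index`, the in-memory dict standing in for the SQLite metadata table, and the use of nanosecond mtimes are all hypothetical choices.

```python
import os


def diff_against_index(paths, indexed_meta):
    """Split files into changed and deleted relative to stored metadata.

    `paths` is the list of files found on disk during the scan.
    `indexed_meta` maps path -> (mtime_ns, size) from the previous run
    (a stand-in for whatever the real index stores).
    Returns (changed, deleted, current_meta).
    """
    changed = []
    current_meta = {}
    for path in paths:
        st = os.stat(path)
        meta = (st.st_mtime_ns, st.st_size)
        current_meta[path] = meta
        # New files and files whose mtime or size differ need reindexing.
        if indexed_meta.get(path) != meta:
            changed.append(path)

    # Anything in the index that was not seen on disk has been deleted.
    deleted = [p for p in indexed_meta if p not in current_meta]
    return changed, deleted, current_meta
```

Because metadata is recorded for every scanned file, including ones that yield no tokens, an empty file only shows up as changed again when its mtime or size actually changes.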
tests
python3 -m unittest discover -s . -p 'test_*.py' -q
troubleshooting
sqlite3.OperationalError: unable to open database file
- Make sure the DB parent directory exists and is writable.
- Try writing the DB in the current project first:
./search index . --out ./bm25.sqlite
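If it is unclear which of the two conditions is failing, a small standalone check like the one below can help. It is a hypothetical helper, not part of searxh, and it only inspects the parent directory of the path you intend to pass to --out or --index.

```python
import os


def check_db_location(db_path):
    """Return a problem description for `db_path`, or None if it looks usable."""
    parent = os.path.dirname(os.path.abspath(db_path))
    if not os.path.isdir(parent):
        return f"parent directory does not exist: {parent}"
    if not os.access(parent, os.W_OK):
        return f"parent directory is not writable: {parent}"
    return None


if __name__ == "__main__":
    problem = check_db_location("./bm25.sqlite")
    print(problem or "database location looks usable")
```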
Results look weak
- Rebuild with --full
- Try --stem
- Increase --k
- Recheck filters (--ext, --path)
files
- src/sx_search/cli.py: CLI
- src/sx_search/engine.py: indexing/search engine
- search: compatibility wrapper script
- bm25tool.py: compatibility import wrapper
- test_bm25tool.py: tests
- SEARCH.md: short usage notes