Dependency-free BM25 indexer and search CLI for local code/docs

searxh

Small BM25 search tool for local code/docs. No third-party dependencies.

what it does

  • Indexes text/code files into SQLite (bm25.sqlite by default)
  • Supports incremental indexing (only changed files are reprocessed)
  • Ranks with BM25
  • Shows optional snippets with line numbers
  • Supports path/extension filters, JSON output, and colored matches

requirements

  • Python 3.9+

install

Editable install:

python3 -m pip install -e .

After installing, run:

searxh index .
searxh "replication backlog"

quick start

Build or update the index (./search is the compatibility wrapper script in the repository root):

./search index .

Search:

./search "replication backlog"

command forms

./search [global-options] index [root] [index-options]
./search [global-options] search "query"
./search [global-options] "query"

common examples

Full rebuild:

./search index . --full

Search with snippet + color:

./search --snippet --color "aof fsync"
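To illustrate what a line-numbered, highlighted snippet involves, here is a minimal sketch. It is not searxh's actual implementation: the function name snippet_lines, the max_lines limit, and the red ANSI escape are all assumptions made for the example.

```python
import re

# Hypothetical ANSI codes; the colors searxh really uses are not documented here.
HIGHLIGHT_ON = "\033[1;31m"
HIGHLIGHT_OFF = "\033[0m"


def snippet_lines(text, terms, color=False, max_lines=3):
    """Return up to max_lines (line_number, line) pairs that contain a query term."""
    if not terms:
        return []
    # Case-insensitive match on any of the query terms.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not pattern.search(line):
            continue
        if color:
            # Wrap each match in ANSI escapes so terminals render it highlighted.
            line = pattern.sub(
                lambda m: HIGHLIGHT_ON + m.group(0) + HIGHLIGHT_OFF, line
            )
        hits.append((lineno, line))
        if len(hits) >= max_lines:
            break
    return hits
```

Line numbers start at 1 so they match what an editor would show.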

Filter by path:

./search --path src/ "replication"

Filter by extension:

./search --ext .c,.h,.md "dict"
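The path and extension filters shown above can be pictured as a simple predicate over each candidate file path. The sketch below is an assumption about how such filtering might work, not a description of searxh internals: the helper names are hypothetical, and treating --path as a prefix match is a guess.

```python
import os


def parse_ext_list(value):
    """Turn a comma-separated string such as '.c,.h,.md' into a set of extensions."""
    exts = set()
    for part in value.split(","):
        part = part.strip().lower()
        if part:
            # Accept both 'md' and '.md' (an assumption, for convenience).
            exts.add(part if part.startswith(".") else "." + part)
    return exts


def matches_filters(path, exts=None, path_prefix=None):
    """Return True if path passes the optional extension and path filters."""
    if exts and os.path.splitext(path)[1].lower() not in exts:
        return False
    # Assumption: the path filter is treated as a prefix match.
    if path_prefix and not path.startswith(path_prefix):
        return False
    return True
```

For example, matches_filters("src/dict.c", parse_ext_list(".c,.h"), "src/") passes both filters, while a Markdown file under docs/ would fail each of them.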

JSON output:

./search --json "cluster slots"

Custom index path:

./search index . --out /path/to/myindex.sqlite
./search --index /path/to/myindex.sqlite "term"

key options

  • --k: number of results (default 10)
  • --k1, --b: BM25 parameters (k1 controls term-frequency saturation, b controls document-length normalization)
  • --path-boost: extra weight for path token matches (default 1.5)
  • --stem: enable simple stemming
  • --no-stopwords: disable stopword filtering
  • --workers: number of worker threads used for indexing
  • --no-progress: hide indexing progress output
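To make the roles of --k1 and --b concrete, here is a sketch of the standard Okapi BM25 score for a single document. It follows the textbook formula with a smoothed IDF and is not taken from searxh's engine. The function name, the argument layout, and the defaults k1=1.2 and b=0.75 are common conventions assumed for illustration, not searxh's documented defaults.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Score one document against the query terms with Okapi BM25.

    doc_tf: {term: frequency in this document}
    df:     {term: number of documents containing the term}
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
        idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Length normalization: b=0 ignores document length, b=1 applies it fully.
        norm = 1.0 - b + b * (doc_len / avg_len) if avg_len > 0 else 1.0
        # k1 controls how quickly repeated occurrences stop adding score.
        score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return score
```

With b above zero, a document shorter than average scores higher than a longer one with the same term counts. Raising k1 lets repeated occurrences of a term keep adding to the score for longer.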

indexing behavior

  1. Scan files by extension/name and skip likely binary files.
  2. Compare mtime and size with index metadata.
  3. Reindex changed files and remove deleted files.
  4. Update postings and document metadata in SQLite.
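Steps 2 and 3 amount to comparing a per-file signature on disk with the one stored in the index. A minimal sketch, assuming a hypothetical signature of (mtime in nanoseconds, size in bytes) and plain dictionaries in place of the real SQLite metadata:

```python
import os


def file_signature(path):
    """Return the (mtime_ns, size) pair used to detect changes."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def plan_incremental(on_disk, indexed):
    """Decide what an incremental run has to do.

    on_disk: {path: signature} for files found by the scan
    indexed: {path: signature} as stored in the index metadata
    Returns (changed, deleted) as sorted lists of paths.
    """
    # New files and files whose signature differs must be (re)indexed.
    changed = sorted(p for p, sig in on_disk.items() if indexed.get(p) != sig)
    # Paths known to the index but missing from the scan are removed.
    deleted = sorted(p for p in indexed if p not in on_disk)
    return changed, deleted
```

Comparing mtime and size is cheap because it needs no file reads. The cost is that an edit which preserves both values goes unnoticed, and a full rebuild with --full covers that case.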

If a file produces no tokens (for example, empty/whitespace-only), it is saved as a zero-length doc so incremental runs do not keep retrying it.
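The zero-length case can be pictured as follows. The docs table and its columns are a hypothetical schema invented for this sketch, since the real searxh schema is not shown here. The point is only that a row is written even when the token list is empty, so the next run finds matching metadata and skips the file.

```python
import sqlite3


def record_doc(conn, path, mtime_ns, size, tokens):
    """Store document metadata; an empty token list yields a row with length 0."""
    conn.execute(
        "INSERT OR REPLACE INTO docs(path, mtime_ns, size, length) VALUES (?, ?, ?, ?)",
        (path, mtime_ns, size, len(tokens)),
    )


def stored_signature(conn, path):
    """Return the stored (mtime_ns, size) for path, or None if it was never indexed."""
    row = conn.execute(
        "SELECT mtime_ns, size FROM docs WHERE path = ?", (path,)
    ).fetchone()
    return tuple(row) if row else None


def make_index():
    """Create an in-memory database with the hypothetical docs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs("
        "path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER, length INTEGER)"
    )
    return conn
```

After record_doc(conn, "empty.txt", 1, 0, []), stored_signature(conn, "empty.txt") returns (1, 0), so a later mtime/size comparison treats the file as unchanged.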

tests

python3 -m unittest discover -s . -p 'test_*.py' -q

troubleshooting

sqlite3.OperationalError: unable to open database file

  • Make sure the DB parent directory exists and is writable.
  • Try writing the DB in the current project first:

./search index . --out ./bm25.sqlite
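To check the first point independently of searxh, the following sketch creates the parent directory before opening the database with Python's standard sqlite3 module. The open_index helper is a hypothetical name for this example and is not part of searxh.

```python
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) a SQLite file, creating its parent directory first."""
    path = Path(db_path).expanduser()
    # sqlite3.connect raises "unable to open database file" when the parent
    # directory is missing, so create it up front.
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(str(path))
```

If open_index still fails with the same error on the target path, the likely cause is directory permissions, not a missing directory.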

Results look weak

  • Rebuild with --full
  • Try --stem
  • Increase --k
  • Recheck filters (--ext, --path)

files

  • src/sx_search/cli.py: CLI
  • src/sx_search/engine.py: indexing/search engine
  • search: compatibility wrapper script
  • bm25tool.py: compatibility import wrapper
  • test_bm25tool.py: tests
  • SEARCH.md: short usage notes
