Dependency-free BM25 indexer and search CLI for local code/docs

searxh

Small BM25 search tool for local code/docs. No third-party dependencies.

what it does

  • Indexes text/code files into SQLite (bm25.sqlite by default)
  • Supports incremental indexing (only changed files are reprocessed)
  • Ranks with BM25
  • Shows optional snippets with line numbers
  • Supports path/extension filters, JSON output, and colored matches

requirements

  • Python 3.9+

install

Editable install:

python3 -m pip install -e .

After installing, use the searxh command or its shorter sx alias:

searxh index .
searxh "replication backlog"
sx index .
sx "replication backlog"

quick start

Build or update the index (./search is the compatibility wrapper script listed under files):

./search index .

Search:

./search "replication backlog"

command forms

./search [global-options] index [root] [index-options]
./search [global-options] search "query"
./search [global-options] "query"

common examples

Full rebuild:

./search index . --full

Search with snippet + color:

./search --snippet --color "aof fsync"

Filter by path:

./search --path src/ "replication"

Filter by extension:

./search --ext .c,.h,.md "dict"

JSON output:

./search --json "cluster slots"

Custom index path:

./search index . --out /path/to/myindex.sqlite
./search --index /path/to/myindex.sqlite "term"
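The examples above can also be driven from a script. The following is a minimal sketch, not part of searxh itself: the helper names `build_search_argv` and `run_search_json` are hypothetical, it assumes the installed `searxh` command accepts the same global options as `./search`, and it makes no assumption about the shape of the JSON beyond it being a single JSON document on standard output.

```python
import json
import subprocess


def build_search_argv(query, *, exe="searxh", index=None, path=None,
                      exts=None, as_json=False):
    """Build an argument list that mirrors the documented command form:
    <exe> [global-options] "query". Only flags shown in this README are used."""
    argv = [exe]
    if index:
        argv += ["--index", index]
    if path:
        argv += ["--path", path]
    if exts:
        # The README passes extensions as one comma-separated value.
        argv += ["--ext", ",".join(exts)]
    if as_json:
        argv.append("--json")
    argv.append(query)
    return argv


def run_search_json(query, **options):
    """Run the CLI and parse its stdout as JSON (schema is not assumed here)."""
    argv = build_search_argv(query, as_json=True, **options)
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with multi-word queries such as "cluster slots".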

key options

  • --k: number of results (default 10)
  • --k1, --b: BM25 tuning knobs
  • --path-boost: extra weight for path token matches (default 1.5)
  • --stem: enable simple stemming
  • --no-stopwords: disable stopword filtering
  • --workers: indexing worker threads
  • --no-progress: hide indexing progress output
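To show what --k1 and --b control, here is a sketch of the textbook Okapi BM25 formula. It is an illustration only and may differ from the engine's actual code: the function name `bm25_score`, the IDF variant, and the defaults k1=1.2 and b=0.75 are assumptions (common textbook values, not confirmed defaults of this tool), and the path boost is left out.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len,
               k1=1.2, b=0.75):
    """Score one document against a query using textbook Okapi BM25.

    doc_freq maps a term to the number of documents containing it.
    k1 controls term-frequency saturation; b controls length normalization.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        f = tf.get(term, 0)
        if f == 0:
            continue  # the term does not occur in this document
        df = doc_freq.get(term, 0)
        # IDF variant that never goes negative (assumed, as used by Lucene).
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: b=0 ignores length, b=1 normalizes fully.
        norm = 1.0 - b + b * (doc_len / avg_len if avg_len else 0.0)
        score += idf * f * (k1 + 1.0) / (f + k1 * norm)
    return score
```

In this formulation, raising k1 lets repeated occurrences of a term keep adding to the score for longer, and lowering b reduces the penalty on long files.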

indexing behavior

  1. Scan files by extension/name and skip likely binary files.
  2. Compare mtime and size with index metadata.
  3. Reindex changed files and remove deleted files.
  4. Update postings and document metadata in SQLite.
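Step 1 does not say how "likely binary" is decided. One common heuristic, shown here as an assumption rather than as what searxh does, is to look for a NUL byte in the first few kilobytes of the file. The names `looks_binary` and `is_probably_binary` and the 8192-byte sample size are hypothetical.

```python
def looks_binary(sample: bytes) -> bool:
    """Heuristic: text files almost never contain NUL bytes."""
    return b"\x00" in sample


def is_probably_binary(path, sample_size=8192):
    """Read only the start of the file so large files stay cheap to check."""
    with open(path, "rb") as fh:
        return looks_binary(fh.read(sample_size))
```

A check like this also flags UTF-16 text, which contains NUL bytes, so a real implementation may need an extra rule for such files.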

If a file produces no tokens (for example, empty/whitespace-only), it is saved as a zero-length doc so incremental runs do not keep retrying it.
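The incremental steps and the zero-length rule can be sketched with an in-memory dictionary standing in for the SQLite metadata table. This is a simplified model under stated assumptions: the function names, the regex tokenizer, and the metadata layout are all hypothetical, and postings are omitted.

```python
import os
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")  # hypothetical tokenizer


def file_signature(path):
    """The (mtime, size) pair compared against the stored metadata."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def plan_update(files_on_disk, indexed):
    """Return (changed, deleted) given the files found by the scan and the
    stored metadata: path -> {"sig": (mtime_ns, size), "length": int}."""
    present = set(files_on_disk)
    changed = [p for p in files_on_disk
               if p not in indexed or indexed[p]["sig"] != file_signature(p)]
    deleted = [p for p in indexed if p not in present]
    return changed, deleted


def apply_update(changed, deleted, indexed):
    """Drop deleted files and (re)index changed ones."""
    for p in deleted:
        del indexed[p]
    for p in changed:
        with open(p, encoding="utf-8", errors="replace") as fh:
            tokens = TOKEN_RE.findall(fh.read().lower())
        # Record the file even when it has no tokens. Without this entry an
        # empty file would look "new" on every run and be retried forever.
        indexed[p] = {"sig": file_signature(p), "length": len(tokens)}
```

Running `plan_update` again right after `apply_update` yields nothing to do, including for whitespace-only files, because their signature is stored with a length of 0.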

tests

python3 -m unittest discover -s . -p 'test_*.py' -q

troubleshooting

sqlite3.OperationalError: unable to open database file

  • Make sure the DB parent directory exists and is writable.
  • Try writing the DB in the current project first:

    ./search index . --out ./bm25.sqlite
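The first check can be run before indexing. The helper below is a hypothetical sketch, not something shipped with searxh. It only inspects the parent directory, which is the usual cause of this error when SQLite has to create a new database file.

```python
import os


def check_db_location(db_path):
    """Return a list of problems that would stop SQLite from creating db_path."""
    parent = os.path.dirname(os.path.abspath(db_path))
    problems = []
    if not os.path.isdir(parent):
        problems.append(f"parent directory does not exist: {parent}")
    elif not os.access(parent, os.W_OK | os.X_OK):
        # Creating a file needs write and search permission on the directory.
        problems.append(f"parent directory is not writable: {parent}")
    return problems
```

An empty list means the location looks usable. Any message points at the directory to create or fix before rerunning the index command with --out.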

Results look weak

  • Rebuild with --full
  • Try --stem
  • Increase --k
  • Recheck filters (--ext, --path)

files

  • src/sx_search/cli.py: CLI
  • src/sx_search/engine.py: indexing/search engine
  • search: compatibility wrapper script
  • bm25tool.py: compatibility import wrapper
  • test_bm25tool.py: tests
  • SEARCH.md: short usage notes
