Dependency-free BM25 indexer and search CLI for local code/docs

searxh

Small BM25 search tool for local code/docs. No third-party dependencies.

what it does

  • Indexes text/code files into SQLite (bm25.sqlite by default)
  • Supports incremental indexing (only changed files are reprocessed)
  • Ranks with BM25
  • Shows optional snippets with line numbers
  • Supports path/extension filters, JSON output, and colored matches
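
As an illustration of the snippet feature, here is a minimal sketch of how matching lines might be collected together with their line numbers. The function name `snippet_lines` and its parameters are hypothetical, and the substring matching is a simplification; searxh's actual snippet logic may differ.

```python
def snippet_lines(text, terms, max_lines=3):
    """Return up to max_lines (line_number, line) pairs that mention any term.

    Hypothetical helper for illustration; not part of searxh's API.
    """
    wanted = {t.lower() for t in terms}
    hits = []
    # Line numbers are 1-based, as in most editors.
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(term in lowered for term in wanted):
            hits.append((lineno, line.strip()))
            if len(hits) >= max_lines:
                break
    return hits
```

Calling `snippet_lines(source_text, ["backlog"])` yields pairs that can be printed as `path:line: text`.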

requirements

  • Python 3.9+

install

Editable install:

python3 -m pip install -e .

After install, use either the searxh command or the shorter sx command:

searxh index .
searxh "replication backlog"
sx index .
sx "replication backlog"

quick start

Build or update the index:

./search index .

Check whether the current directory is indexed:

./search status

Search:

./search "replication backlog"

command forms

./search [global-options] index [root] [index-options]
./search [global-options] search "query"
./search [global-options] "query"

common examples

Full rebuild:

./search index . --full

Search with snippet + color:

./search --snippet --color "aof fsync"

Filter by path:

./search --path src/ "replication"

Filter by extension:

./search --ext .c,.h,.md "dict"

JSON output:

./search --json "cluster slots"

Custom index path:

./search index . --out /path/to/myindex.sqlite
./search --index /path/to/myindex.sqlite "term"

key options

  • --k: number of results (default 10)
  • --k1, --b: BM25 tuning parameters (k1 controls term-frequency saturation, b controls document-length normalization)
  • --path-boost: extra weight for path token matches (default 1.5)
  • --stem: enable simple stemming
  • --no-stopwords: disable stopword filtering
  • --workers: number of indexing worker threads
  • --no-progress: hide indexing progress output
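
To show what --k1 and --b influence, here is a minimal sketch of textbook BM25 scoring over pre-tokenized documents. The function name `bm25_scores` is hypothetical, the defaults k1=1.2 and b=0.75 are conventional values rather than searxh's documented defaults, and the IDF variant is one common choice; the tool's own formula may differ.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against query_tokens.

    Hypothetical sketch; k1 and b defaults are assumed, not taken from searxh.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0

    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant: rare terms weigh more.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b scales the penalty for documents longer than average.
            norm = 1.0 - b + b * (len(d) / avg_len)
            # k1 caps how much repeated occurrences can add.
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```

Raising k1 lets repeated terms keep adding to the score for longer; setting b to 0 turns off length normalization entirely.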

indexing behavior

  1. Scan files by extension/name and skip likely binary files.
  2. Compare mtime and size with index metadata.
  3. Reindex changed files and remove deleted files.
  4. Update postings and document metadata in SQLite.
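
The first three steps can be sketched as follows. This is an illustration under assumptions: the names `looks_binary` and `plan_incremental` are hypothetical, the NUL-byte probe is a common heuristic rather than searxh's confirmed check, and the stored metadata is modeled as a plain dict instead of SQLite rows.

```python
import os

def looks_binary(path, probe_bytes=4096):
    """Heuristic (assumed): treat a file as binary if its first bytes contain NUL."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(probe_bytes)

def plan_incremental(paths_on_disk, indexed):
    """Split work into (to_reindex, to_remove).

    paths_on_disk: iterable of candidate file paths found by the scan.
    indexed: dict mapping path -> (mtime_ns, size) recorded by the last run.
    """
    to_reindex = []
    seen = set()
    for path in paths_on_disk:
        if looks_binary(path):
            continue
        seen.add(path)
        st = os.stat(path)
        # A file is unchanged only if both mtime and size match the index.
        if indexed.get(path) != (st.st_mtime_ns, st.st_size):
            to_reindex.append(path)
    # Anything indexed but no longer seen on disk should be dropped.
    to_remove = [p for p in indexed if p not in seen]
    return to_reindex, to_remove
```

Comparing mtime and size avoids reading unchanged files, at the cost of missing edits that preserve both values.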

If a file produces no tokens (for example, empty/whitespace-only), it is saved as a zero-length doc so incremental runs do not keep retrying it.

tests

python3 -m unittest discover -s . -p 'test_*.py' -q

troubleshooting

sqlite3.OperationalError: unable to open database file

  • Make sure the DB parent directory exists and is writable.
  • Try writing the DB in the current project first:

./search index . --out ./bm25.sqlite
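
If the index is opened from your own Python code, the same error can be avoided by creating the parent directory before connecting. A minimal sketch using only the standard library; `open_index` is a hypothetical helper, not a function exported by searxh.

```python
import sqlite3
from pathlib import Path

def open_index(db_path):
    """Create the parent directory if needed, then open the SQLite file.

    Hypothetical helper for illustration; not part of searxh's API.
    """
    path = Path(db_path).expanduser()
    # sqlite3 does not create missing directories, so do it up front.
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(str(path))
```

If the directory exists but is not writable, `mkdir` succeeds and `sqlite3.connect` still raises the same OperationalError, so check permissions in that case.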

Results look weak

  • Rebuild with --full
  • Try --stem
  • Increase --k
  • Recheck filters (--ext, --path)

files

  • src/sx_search/cli.py: CLI
  • src/sx_search/engine.py: indexing/search engine
  • search: compatibility wrapper script
  • bm25tool.py: compatibility import wrapper
  • test_bm25tool.py: tests
  • SEARCH.md: short usage notes
