Full text and vector search library with pluggable storage (file, SQL, Redis). Fork of Whoosh.

Wesh?!

Wesh?! is a fast, pure-Python full-text and vector search library with pluggable storage: filesystem, RAM, SQL, or Redis.

It's a 3.0 reboot of Whoosh by Matt Chaput (via the Sygil-Dev/whoosh-reloaded fork). Both upstreams are unmaintained.

The original upstream README is preserved as README-whoosh.md.

Install

pip install wesh
# with vector / semantic search (lightweight, works on Alpine too):
pip install "wesh[model2vec]"
# or with SQL backend:
pip install "wesh[sql]"
# or with Redis backend:
pip install "wesh[redis]"

Requires Python 3.12+. Core dependencies are tiny (orjson, loguru). Backends and embedders are opt-in extras: [sql], [redis], [vector], [model2vec], [fastembed], [hnsw], [openai], or [all] for the lot.

Quick start

from wesh import index
from wesh.fields import Schema, TEXT, ID
from wesh.qparser import QueryParser

schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
ix = index.create_in("./indexdir", schema)

with ix.writer() as w:
    w.add_document(title="First", path="/a", content="hello world")
    w.add_document(title="Second", path="/b", content="goodbye world")

with ix.searcher() as s:
    q = QueryParser("content", ix.schema).parse("hello")
    for hit in s.search(q):
        print(hit["title"], hit.score)

Want to store the index elsewhere? Pass a URL instead of a path:

ix = index.create_in("sqlite:///./wesh.db", schema)
ix = index.create_in("postgresql://user:pw@host/db", schema)
ix = index.create_in("redis://localhost:6379/0", schema)
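As a rough illustration of how a single string argument can select a backend, the sketch below classifies a target by its URL scheme using only the standard library. The function name backend_kind and the returned labels are hypothetical; this is not Wesh's actual dispatch code, only one plausible way to tell a plain path from a storage URL.

```python
from urllib.parse import urlsplit


def backend_kind(target: str) -> str:
    """Classify a storage target by URL scheme (illustrative, not Wesh's code)."""
    scheme = urlsplit(target).scheme.lower()
    # SQLAlchemy-style URLs may carry a driver suffix, e.g. "postgresql+psycopg".
    if scheme.startswith(("sqlite", "postgresql")):
        return "sql"
    if scheme == "redis":
        return "redis"
    # No recognised scheme: treat the target as a filesystem path.
    return "file"


for target in (
    "./indexdir",
    "sqlite:///./wesh.db",
    "postgresql://user:pw@host/db",
    "redis://localhost:6379/0",
):
    print(f"{target} -> {backend_kind(target)}")
```

Anything without a recognised scheme falls through to the filesystem case, which matches the README's framing of a path as the default.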

Full documentation: build it locally with uv run zensical build (output goes to site/), or read the Markdown sources under docs/.

What's new in 3.0

  • Pluggable storage backends. File, RAM, SQL (SQLAlchemy), Redis. Same Wesh API across all of them.
  • Vector search. VectorField + KnnQuery + an Embedder ABC. Three embedders ship in-box: Model2VecEmbedder (pure-numpy, runs everywhere — recommended default), FastembedEmbedder (ONNX, more models, glibc-only), and OpenAIEmbedder (HTTP, BYO endpoint).
  • Columnar numeric values with an auto-routed fast path for numeric range queries (at 64K docs, the SQL hybrid codec is 5.9× faster than the in-RAM W3 codec).
  • Facets v1 — count-histogram facets with Typesense-style filter semantics.
  • Python 3.12 floor; everything below the public API surface was rebuilt.
  • Modernised tooling: uv, ruff, mypy, pyrefly, ty, orjson, nox, pytest.
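To make the vector search item concrete, here is a minimal, library-free sketch of what a k-nearest-neighbour lookup computes conceptually: score every stored vector against the query and keep the top k. It does not use Wesh's VectorField, KnnQuery, or Embedder classes; the cosine similarity metric, the function names, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force k-nearest neighbours: best score first, ties broken by doc id."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy "embeddings" keyed by document path (hypothetical data).
docs = {"/a": [1.0, 0.0], "/b": [0.0, 1.0], "/c": [1.0, 1.0]}
top = knn([1.0, 0.1], docs, k=2)
for doc_id, score in top:
    print(doc_id, round(score, 3))
```

A brute-force scan like this is linear in the number of documents; approximate indexes (the [hnsw] extra suggests one is available) trade some exactness for speed.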

Full release notes: CHANGES.md and docs/releases/3_0.md.

Performance, honestly

Pure Python is 10–50× slower than native engines (Lucene, tantivy, …). If you need sub-millisecond queries over billions of documents, use one of those. Wesh?! is sized for the 10K–10M document range with patient indexing, where the win is "embedded in your Python app, no separate service, no native build."

That said, the SQL hybrid codec ended up faster than the in-RAM W3 codec at 64K docs on most query shapes — see the backends overview and the design history under notes/refactor-pluggable-backends.md.

Development

git clone https://github.com/abilian/wesh
cd wesh
uv sync                          # install dev deps
uv run pytest tests/             # 962 tests + 2 vector-extra skips
uv run nox -s check              # ruff + mypy + ty
uv run zensical build            # build the docs (output: site/)

The cross-backend correctness check is the canonical guardrail for any change that affects performance:

uv run python benchmark/equivalence.py

It builds the same corpus on every backend and asserts identical observable outputs (matched docs, ranked order, term info, doc count). Exits 1 on mismatch.
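The shape of such a check can be sketched as follows. This is not the contents of benchmark/equivalence.py; the function names, the dictionary layout, and the hard-coded observations are hypothetical stand-ins for what a real run would collect from each backend.

```python
import sys


def find_mismatches(results_by_backend: dict[str, dict]) -> list[tuple]:
    """Compare each backend's observations against the first backend's."""
    names = list(results_by_backend)
    reference = results_by_backend[names[0]]
    mismatches = []
    for name in names[1:]:
        other = results_by_backend[name]
        # Check the union of keys so a missing observation also counts as a mismatch.
        for key in sorted(reference.keys() | other.keys()):
            expected, actual = reference.get(key), other.get(key)
            if expected != actual:
                mismatches.append((name, key, expected, actual))
    return mismatches


def main(results_by_backend: dict[str, dict]) -> int:
    """Report mismatches on stderr; return 1 if any were found, else 0."""
    mismatches = find_mismatches(results_by_backend)
    for name, key, expected, actual in mismatches:
        print(f"{name}: {key!r} expected {expected!r}, got {actual!r}", file=sys.stderr)
    return 1 if mismatches else 0


# Hypothetical observations; a real check would build and query each backend.
observed = {
    "file": {"doc_count": 2, "ranked": ["/a", "/b"]},
    "sql": {"doc_count": 2, "ranked": ["/a", "/b"]},
    "redis": {"doc_count": 2, "ranked": ["/a", "/b"]},
}
exit_code = main(observed)
print("exit code:", exit_code)
```

In a script, the return value would typically be passed to sys.exit so that CI fails on the first divergence between backends.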

Lineage

Whoosh was created by Matt Chaput at Side Effects Software for the Houdini documentation site, then open-sourced. Upstream development stopped around 2012 for the 1.x line and around 2014 for the 2.x line. The Sygil-Dev/whoosh-reloaded fork kept the 2.x line working on Python 3 but is itself unmaintained.

Wesh?! picks up from the Sygil-Dev branch, drops the legacy compatibility layers, rebuilds the storage layer around an abstract Storage ABC, and adds vector search. The Python import name was renamed from whoosh to wesh for 3.0.

License

BSD 2-clause (inherited from upstream Whoosh). See LICENSE.txt.
