Full-text and vector search library with pluggable storage (file, SQL, Redis). Fork of Whoosh.

Wesh?!

Wesh?! is a fast, pure-Python full-text and vector search library with pluggable storage: filesystem, RAM, SQL, or Redis.

It's a 3.0 reboot of Matt Chaput's Whoosh, by way of the Sygil-Dev/whoosh-reloaded fork. Both upstreams are unmaintained.

The original upstream README is preserved as README-whoosh.md.

Install

pip install wesh
# with vector / semantic search:
pip install "wesh[fastembed]"

Requires Python 3.12+. Wesh?! itself is pure Python; its runtime dependencies are orjson, sqlalchemy>=2.0, redis>=5.0, loguru, and cached-property. The optional [vector], [fastembed], [hnsw], and [openai] extras pull in numpy and the matching embedder backend.

Quick start

from wesh import index
from wesh.fields import Schema, TEXT, ID
from wesh.qparser import QueryParser

schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
ix = index.create_in("./indexdir", schema)

with ix.writer() as w:
    w.add_document(title="First", path="/a", content="hello world")
    w.add_document(title="Second", path="/b", content="goodbye world")

with ix.searcher() as s:
    q = QueryParser("content", ix.schema).parse("hello")
    for hit in s.search(q):
        print(hit["title"], hit.score)

Want to store the index elsewhere? Pass a URL instead of a path:

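# Pick one: each call creates the index on a different backend.
ix = index.create_in("sqlite:///./wesh.db", schema)           # SQL (SQLite file)
ix = index.create_in("postgresql://user:pw@host/db", schema)  # SQL (PostgreSQL)
ix = index.create_in("redis://localhost:6379/0", schema)      # Redis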
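
How a single argument can stand for either a path or a URL is easiest to see in code. The sketch below is an assumption about how such dispatch could work, not Wesh's actual implementation: classify_target and SCHEME_TO_BACKEND are hypothetical names, and only the standard library is used.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to a backend label; the real
# Wesh dispatch logic may differ.
SCHEME_TO_BACKEND = {
    "sqlite": "sql",
    "postgresql": "sql",
    "redis": "redis",
}


def classify_target(target: str) -> str:
    """Return a backend label for a storage target (URL or filesystem path)."""
    scheme = urlsplit(target).scheme.lower()
    # SQLAlchemy-style URLs may carry a driver suffix, e.g. "postgresql+psycopg".
    dialect = scheme.split("+", 1)[0]
    # Require "://" so that a Windows path such as "C:\\index" (which parses
    # with scheme "c") is never mistaken for a URL.
    if "://" in target and dialect in SCHEME_TO_BACKEND:
        return SCHEME_TO_BACKEND[dialect]
    return "file"


if __name__ == "__main__":
    for target in ("./indexdir", "sqlite:///./wesh.db", "redis://localhost:6379/0"):
        print(target, "->", classify_target(target))
```

Anything that does not look like a known URL falls back to the filesystem backend, which keeps plain paths such as "./indexdir" working unchanged.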

Full documentation: build it locally with uv run zensical build (output goes to site/), or read the Markdown sources under docs/.

What's new in 3.0

  • Pluggable storage backends. File, RAM, SQL (SQLAlchemy), Redis. Same Wesh API across all of them.
  • Vector search. VectorField + KnnQuery + an Embedder ABC, with FastembedEmbedder and FakeEmbedder in-box.
  • Columnar numeric values with an auto-routed fast path for numeric range queries. On the SQL hybrid codec this is 5.9× faster than the in-RAM W3 codec at 64K docs.
  • Facets v1 — count-histogram facets with Typesense-style filter semantics.
  • Python 3.12 floor; everything below the public API surface was rebuilt.
  • Modernised tooling: uv, ruff, mypy, pyrefly, ty, orjson, nox, pytest.

Full release notes: CHANGES.md and docs/releases/3_0.md.

Performance, honestly

Pure Python is 10–50× slower than native engines (Lucene, tantivy, …). If you need sub-millisecond queries over billions of documents, use one of those. Wesh?! is sized for the 10K–10M document range, for workloads that can tolerate slow indexing; the win there is "embedded in your Python app, no separate service, no native build."

That said, the SQL hybrid codec ended up faster than the in-RAM W3 codec at 64K docs on most query shapes — see the backends overview and the design history under notes/refactor-pluggable-backends.md.
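
Whether these trade-offs are acceptable depends on your own corpus and queries, so it is worth measuring. The harness below is a minimal sketch using only the standard library. median_latency_ms is a hypothetical helper, and the stand-in workload in the usage section should be replaced with a call that runs a real search.

```python
import statistics
import time
from collections.abc import Callable


def median_latency_ms(run_query: Callable[[], object], repeats: int = 50) -> float:
    """Median wall-clock time of `run_query` in milliseconds over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; swap in a closure that executes one search.
    print(f"{median_latency_ms(lambda: sorted(range(10_000))):.3f} ms")
```

The median is used rather than the mean so that a single slow call (cold cache, garbage collection) does not dominate the result.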

Development

git clone https://github.com/sfermigier/wheesh
cd wheesh
uv sync                          # install dev deps
uv run pytest tests/             # 962 tests + 2 vector-extra skips
uv run nox -s check              # ruff + mypy + ty
uv run zensical build            # build the docs (output: site/)

The cross-backend equivalence check is the canonical guardrail for any change that affects performance:

uv run python benchmark/equivalence.py

It builds the same corpus on every backend and asserts identical observable outputs (matched docs, ranked order, term info, doc count). It exits with status 1 on any mismatch.
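
The shape of such a check can be sketched as follows. This is an illustration of the technique, not the contents of benchmark/equivalence.py: the two "backends" are hypothetical stand-ins (a linear scan and an inverted index over a two-document corpus), and find_mismatches and main are assumed names.

```python
from collections.abc import Callable

# A "backend" here is any callable mapping a query term to observable output.
# Both stand-ins below are hypothetical; they are not Wesh storage classes.
Backend = Callable[[str], list[str]]

CORPUS = {"/a": "hello world", "/b": "goodbye world"}


def scan_backend(query: str) -> list[str]:
    """Reference implementation: scan every document for the query term."""
    return sorted(path for path, text in CORPUS.items() if query in text.split())


def inverted_backend(query: str) -> list[str]:
    """Alternative implementation: look the term up in an inverted index."""
    postings: dict[str, set[str]] = {}
    for path, text in CORPUS.items():
        for term in text.split():
            postings.setdefault(term, set()).add(path)
    return sorted(postings.get(query, set()))


def find_mismatches(backends: dict[str, Backend], queries: list[str]) -> list[str]:
    """Compare every backend against the first one; return mismatch descriptions."""
    names = list(backends)
    reference = names[0]
    problems = []
    for query in queries:
        expected = backends[reference](query)
        for name in names[1:]:
            actual = backends[name](query)
            if actual != expected:
                problems.append(f"{query!r}: {name}={actual} != {reference}={expected}")
    return problems


def main() -> int:
    """Return a process exit status: 0 if all backends agree, 1 otherwise."""
    backends = {"scan": scan_backend, "inverted": inverted_backend}
    mismatches = find_mismatches(backends, ["hello", "world", "missing"])
    for line in mismatches:
        print(line)
    return 1 if mismatches else 0

# In a script, finish with: raise SystemExit(main())
```

Comparing against a single reference backend keeps the report readable: each line names the query, the diverging backend, and both outputs.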

Lineage

Whoosh was created by Matt Chaput at Side Effects Software for the Houdini documentation site, then open-sourced. Upstream development stopped around 2012 for the 1.x line and around 2014 for the 2.x line. The Sygil-Dev/whoosh-reloaded fork kept the 2.x line on Python 3 but is itself unmaintained.

Wesh?! picks up from the Sygil-Dev branch, drops the legacy compatibility layers, rebuilds the storage layer around an abstract Storage ABC, and adds vector search. The Python import name was renamed from whoosh to wesh for 3.0.

License

BSD 2-clause (inherited from upstream Whoosh). See LICENSE.txt.

File details

wesh-3.0.0.tar.gz (source distribution, 586.1 kB). Uploaded via uv 0.11.1, not using Trusted Publishing.

  • SHA256: fe5c4313048699b09f8691a551fefe3160720bb971baad70782adb356d06be19
  • MD5: 75b5624f0440c5f0cc147a80625299f9
  • BLAKE2b-256: 8d4cdb9b05b57b8206900e7166ce51bbf7908cbaa800b910ab1eb6807c44adfa

wesh-3.0.0-py3-none-any.whl (built distribution, Python 3, 523.6 kB). Uploaded via uv 0.11.1, not using Trusted Publishing.

  • SHA256: 57e2eea484e7c853275e7829e51dc8ff3bfde122a075d26184a6e2e2940835ba
  • MD5: bef9dfd530f4d2151135ffc2efbd217f
  • BLAKE2b-256: 121b8180a5d6040ffcd4df16b90557cc70234dc2268e16d0500625442eafad51