Full text and vector search library with pluggable storage (file, SQL, Redis). Fork of Whoosh.

Wesh?!

Wesh?! is a fast, pure-Python full-text and vector search library with pluggable storage: filesystem, RAM, SQL, or Redis.

It's a 3.0 reboot of Whoosh by Matt Chaput (via the Sygil-Dev/whoosh-reloaded fork). Both upstreams are unmaintained.

The original upstream README is preserved as README-whoosh.md.

Install

pip install wesh
# with vector / semantic search (lightweight, works on Alpine too):
pip install "wesh[model2vec]"
# or with SQL backend:
pip install "wesh[sql]"
# or with Redis backend:
pip install "wesh[redis]"

Python 3.12+. Core deps are tiny (orjson, loguru). Backends and embedders are opt-in extras: [sql], [redis], [vector], [model2vec], [fastembed], [hnsw], [openai], or [all] for the lot.

Quick start

from wesh import index
from wesh.fields import Schema, TEXT, ID
from wesh.qparser import QueryParser

schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
ix = index.create_in("./indexdir", schema)

with ix.writer() as w:
    w.add_document(title="First", path="/a", content="hello world")
    w.add_document(title="Second", path="/b", content="goodbye world")

with ix.searcher() as s:
    q = QueryParser("content", ix.schema).parse("hello")
    for hit in s.search(q):
        print(hit["title"], hit.score)

Want to store the index elsewhere? Pass a URL instead of a path:

ix = index.create_in("sqlite:///./wesh.db", schema)
ix = index.create_in("postgresql://user:pw@host/db", schema)
ix = index.create_in("redis://localhost:6379/0", schema)
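To make the path-versus-URL distinction concrete, here is a minimal standard-library sketch of how a target string could be classified by its scheme. The function name `classify_target` and the scheme-to-backend table are assumptions made for illustration; this is not Wesh's actual dispatch code, which may recognise other schemes or work differently.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to backend family (an assumption
# for illustration; Wesh's real dispatch logic may differ).
SCHEME_TO_BACKEND = {
    "sqlite": "sql",
    "postgresql": "sql",
    "redis": "redis",
}


def classify_target(target: str) -> str:
    """Return the backend family a storage target string points at."""
    scheme = urlsplit(target).scheme
    # SQLAlchemy-style URLs may carry a driver suffix such as
    # "postgresql+psycopg"; keep only the part before the "+".
    base = scheme.split("+", 1)[0]
    # Anything without a recognised scheme is treated as a filesystem path.
    return SCHEME_TO_BACKEND.get(base, "file")


for target in ("./indexdir", "sqlite:///./wesh.db", "redis://localhost:6379/0"):
    print(target, "->", classify_target(target))
```

Falling back to "file" for unknown schemes keeps plain relative paths working without any special casing.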

Full documentation: build it locally with uv run zensical build (output goes to site/), or read the Markdown sources under docs/.

What's new in 3.0

  • Pluggable storage backends. File, RAM, SQL (SQLAlchemy), Redis. Same Wesh API across all of them.
  • Vector search. VectorField + KnnQuery + an Embedder ABC. Three embedders ship in-box: Model2VecEmbedder (pure-numpy, runs everywhere — recommended default), FastembedEmbedder (ONNX, more models, glibc-only), and OpenAIEmbedder (HTTP, BYO endpoint).
  • Columnar numeric values with an auto-routed fast path for numeric range queries (on the SQL hybrid codec, 5.9× faster than the in-RAM W3 codec at 64K docs).
  • Facets v1 — count-histogram facets with Typesense-style filter semantics.
  • Python 3.12 floor; everything below the public API surface was rebuilt.
  • Modernised tooling: uv, ruff, mypy, pyrefly, ty, orjson, nox, pytest.
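For readers new to vector search, the idea behind a k-nearest-neighbour query can be sketched in a few lines of plain Python. This is an exact brute-force scan with cosine similarity, written for illustration only; it does not use or describe the actual VectorField or KnnQuery implementation, and the names `cosine` and `knn` are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first.

    doc_vecs maps a document id to its embedding vector.
    """
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy 2-dimensional "embeddings"; real embedders produce hundreds of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for doc_id, score in knn([1.0, 0.1], docs, k=2):
    print(doc_id, round(score, 3))
```

A linear scan like this costs one similarity computation per document per query, which is why approximate indexes are commonly used once collections grow large.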

Full release notes: CHANGES.md and docs/releases/3_0.md.

Performance, honestly

Pure Python is 10–50× slower than native engines (Lucene, tantivy, …). If you need sub-millisecond queries over billions of documents, use one of those. Wesh?! is sized for the 10K–10M document range with patient indexing, where the win is "embedded in your Python app, no separate service, no native build."

That said, the SQL hybrid codec ended up faster than the in-RAM W3 codec at 64K docs on most query shapes — see the backends overview and the design history under notes/refactor-pluggable-backends.md.

Development

git clone https://github.com/sfermigier/wheesh
cd wheesh
uv sync                          # install dev deps
uv run pytest tests/             # 962 tests + 2 vector-extra skips
uv run nox -s check              # ruff + mypy + ty
uv run zensical build            # build the docs (output: site/)

The cross-backend correctness check is the canonical guardrail for any change that affects performance:

uv run python benchmark/equivalence.py

It builds the same corpus on every backend and asserts identical observable outputs (matched docs, ranked order, term info, doc count). Exits 1 on mismatch.
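The shape of such a check can be sketched as follows. This is not the contents of benchmark/equivalence.py; it is a simplified illustration in which each backend is modelled as a hypothetical callable that takes a query string and returns its ranked results, and the names `check_equivalence` and `main` are assumptions.

```python
import sys


def check_equivalence(backends, queries):
    """Run every query on every backend and describe any disagreements.

    backends maps a backend name to a callable(query) -> ranked results.
    The first backend in the mapping serves as the reference.
    """
    mismatches = []
    names = list(backends)
    reference = names[0]
    for query in queries:
        expected = backends[reference](query)
        for name in names[1:]:
            got = backends[name](query)
            if got != expected:
                mismatches.append(
                    f"{query!r}: {name} returned {got!r}, "
                    f"{reference} returned {expected!r}"
                )
    return mismatches


def main(backends, queries):
    """Print mismatches to stderr; return 1 on any mismatch, else 0."""
    mismatches = check_equivalence(backends, queries)
    for line in mismatches:
        print(line, file=sys.stderr)
    return 1 if mismatches else 0
```

A script would typically end with `sys.exit(main(backends, queries))`, so that a CI job fails as soon as two backends disagree.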

Lineage

Whoosh was created by Matt Chaput at Side Effects Software for the Houdini documentation site, then open-sourced. Upstream development stopped around 2012 for the 1.x line and around 2014 for the 2.x line. The Sygil-Dev/whoosh-reloaded fork kept the 2.x line on Python 3 but is itself unmaintained.

Wesh?! picks up from the Sygil-Dev branch, drops the legacy compatibility layers, rebuilds the storage layer around an abstract Storage ABC, and adds vector search. The Python import name was renamed from whoosh to wesh for 3.0.

License

BSD 2-clause (inherited from upstream Whoosh). See LICENSE.txt.
