SIE integration for Weaviate

sie-weaviate

SIE integration for Weaviate, built on the v4 Python client.

Two integration paths

1. Client-side (this package, works now)

sie-weaviate provides vectorizer helpers that call SIE's encode() and return vectors in the format Weaviate expects. You configure collections with Configure.Vectors.self_provided() and pass vectors on insert/query.

pip install sie-weaviate

import weaviate
import weaviate.classes as wvc
from sie_weaviate import SIEVectorizer

vectorizer = SIEVectorizer(base_url="http://localhost:8080", model="BAAI/bge-m3")

client = weaviate.connect_to_local()
try:
    collection = client.collections.create(
        "Documents",
        properties=[wvc.config.Property(name="text", data_type=wvc.config.DataType.TEXT)],
        vector_config=wvc.config.Configure.Vectors.self_provided(),
    )

    texts = ["first doc", "second doc"]
    vectors = vectorizer.embed_documents(texts)
    collection.data.insert_many([
        wvc.data.DataObject(properties={"text": t}, vector=v)
        for t, v in zip(texts, vectors)
    ])

    query_vec = vectorizer.embed_query("search text")
    results = collection.query.near_vector(near_vector=query_vec, limit=5)
finally:
    client.close()
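
To try the insert/query flow above without a running SIE server, a stand-in exposing the same two methods can be handy. The sketch below is a minimal illustration: `HashVectorizer` is a hypothetical class, not part of sie-weaviate, and it only assumes what the example implies, namely that `embed_documents` returns one list of floats per text and `embed_query` returns a single list of floats.

```python
import hashlib
import math


class HashVectorizer:
    """Hypothetical stand-in (NOT part of sie-weaviate) with the same two methods."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        # Derive `dim` deterministic floats from a SHA-256 digest of the text.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = [digest[i % len(digest)] / 255.0 - 0.5 for i in range(self.dim)]
        # Normalize to unit length so cosine and dot-product rankings agree.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


vectorizer = HashVectorizer(dim=8)
texts = ["first doc", "second doc"]
vectors = vectorizer.embed_documents(texts)

# Pair each text with its vector, mirroring the zip(texts, vectors) step above.
rows = [{"properties": {"text": t}, "vector": v} for t, v in zip(texts, vectors)]
```

These vectors carry no semantic meaning; they only exercise the plumbing, so search results from such a stand-in say nothing about retrieval quality.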

2. Server-side module (partnership, planned)

A text2vec-sie Go module for the Weaviate server is planned. It would enable native vectorizer config (Configure.Vectorizer.text2vec_sie(...)). See weaviate-module-spec/ for the spec and reference implementation.

Named vectors (dense + sparse)

SIE's multi-output encode produces dense and sparse vectors in one call. Weaviate's named vectors feature stores them separately:

from sie_weaviate import SIENamedVectorizer

vectorizer = SIENamedVectorizer(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
    output_types=["dense", "sparse"],
)

collection = client.collections.create(
    "Documents",
    properties=[wvc.config.Property(name="text", data_type=wvc.config.DataType.TEXT)],
    vector_config=[
        wvc.config.Configure.Vectors.self_provided(name="dense"),
        wvc.config.Configure.Vectors.self_provided(name="sparse"),
    ],
)

named = vectorizer.embed_documents(["hello world"])
collection.data.insert_many([
    wvc.data.DataObject(properties={"text": "hello world"}, vector=named[0])
])
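
The insert above relies on each element of `named` lining up with the vector names declared in vector_config. The sketch below makes that assumption explicit: it treats each element as a dict mapping a vector name to a list of floats, which is inferred from the example rather than confirmed by the sie-weaviate API, and `check_named_vectors` is a hypothetical helper.

```python
from typing import Mapping, Sequence


def check_named_vectors(
    named: Sequence[Mapping[str, Sequence[float]]],
    expected_names: Sequence[str],
) -> None:
    """Raise ValueError if any document's vector names differ from the expected ones."""
    expected = set(expected_names)
    for i, doc_vectors in enumerate(named):
        got = set(doc_vectors)
        if got != expected:
            raise ValueError(
                f"document {i}: expected vectors {sorted(expected)}, got {sorted(got)}"
            )


# Illustrative data only; real vectors would come from the vectorizer.
sample = [{"dense": [0.1, 0.2, 0.3], "sparse": [0.0, 0.7, 0.0]}]
check_named_vectors(sample, ["dense", "sparse"])  # passes silently
```

Running a check like this before insert_many can surface a mismatch between output_types and the configured vector names early, instead of at insert time.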

Storage note: SIE sparse vectors (SPLADE/BGE-M3) are expanded to full vocabulary length (~30K floats per document for BERT-based models) so that positional information (which vocabulary index each weight belongs to) is preserved for similarity search. Assuming 4-byte floats, that is roughly 120 KB per document, which becomes significant storage at large scale. If you only need keyword-style hybrid search, use Weaviate's built-in BM25 instead; it requires no extra vectors:

results = collection.query.hybrid(query="search text", alpha=0.75)
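
To make the storage note concrete, the sketch below expands a sparse mapping of token index to weight into a full-length list and estimates the raw size. The helper names, the vocabulary size of 30,522 (the bert-base-uncased vocabulary) and the 4 bytes per float are illustrative assumptions, not values taken from sie-weaviate.

```python
def expand_sparse(weights: dict[int, float], vocab_size: int) -> list[float]:
    """Place each weight at its vocabulary index; every other position stays 0.0."""
    dense = [0.0] * vocab_size
    for idx, weight in weights.items():
        if not 0 <= idx < vocab_size:
            raise ValueError(f"token index {idx} is outside the vocabulary")
        dense[idx] = weight
    return dense


def raw_bytes(num_docs: int, vocab_size: int, bytes_per_float: int = 4) -> int:
    """Raw vector payload only, ignoring index structures and compression."""
    return num_docs * vocab_size * bytes_per_float


VOCAB_SIZE = 30_522  # assumption: bert-base-uncased vocabulary size

# Two non-zero weights still occupy the full vocabulary length once expanded.
expanded = expand_sparse({101: 0.8, 2023: 1.3}, VOCAB_SIZE)

per_doc = raw_bytes(1, VOCAB_SIZE)                   # 122_088 bytes
per_million_docs = raw_bytes(1_000_000, VOCAB_SIZE)  # 122_088_000_000 bytes
```

Under these assumptions a million documents need on the order of 122 GB for the expanded sparse vectors alone, which is the cost the BM25 alternative avoids.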

Testing

# Unit tests (no server needed)
pytest

# Integration tests (requires SIE + Weaviate)
pytest -m integration
