A clean, multi-tenant Weaviate wrapper for isolated data management.
WeavScope 🔭
A clean, multi-tenant wrapper for Weaviate: batteries-included, no boilerplate.
WeavScope lets you interact with Weaviate using a simple, Pythonic API. It handles the full lifecycle: connecting, creating collections, managing tenants, inserting vectors, and searching, all with one consistent interface. Stop writing boilerplate; start building.
Table of Contents
- Why WeavScope?
- How It Works
- Installation
- Core Concepts
- Quick Start (Two-Step Pattern)
- Detailed Usage Guide
- All Query Methods
- Supported Embedding Providers
- Error Handling
- Architecture Overview
- AI/LLM Documentation
- License
Why WeavScope?
Working with Weaviate directly involves a lot of ceremony: creating clients, managing connections, building multi-tenancy configs, handling batch contexts, deserializing responses, and cleaning up after yourself. WeavScope abstracts all of that away.
| Without WeavScope | With WeavScope |
|---|---|
| Manual connect_to_custom(...) calls | Auto-connects from WeaviateConfig |
| Manually build Configure.multi_tenancy(...) | ensure_collection() handles it |
| Manually create/delete tenants | Auto-created & deleted by context manager |
| Manage batch.dynamic() context | scope.batch.add_objects(...) and done |
| Deserialize raw Weaviate objects | Results are plain Python dicts |
How It Works
WeavScope is built around a two-step pattern, because collection creation (schema setup) is a one-time operation, while tenant-scoped data operations happen repeatedly:
STEP 1 (once): WeavScope.ensure_collection()
  -> Creates the Weaviate collection with multi-tenancy enabled.
  -> Idempotent: safe to call again, skips if the collection already exists.
        |
        v
STEP 2 (per tenant): with WeavScope(config, tenant_id="...") as ws:
  -> Creates the tenant on __enter__
  -> Exposes ws.batch -> insert objects
  -> Exposes ws.query -> search objects
  -> Deletes the tenant + all its data on __exit__
This separation ensures the collection schema exists before any tenant operations happen, and the context manager keeps each "scope" of work clean and isolated.
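The create-on-enter, delete-on-exit behaviour described above can be sketched with a plain Python context manager. This is a minimal illustration, not WeavScope's implementation: `InMemoryTenantStore` and `TenantScope` are hypothetical stand-ins that keep tenants in a dict instead of talking to Weaviate.

```python
class InMemoryTenantStore:
    """Hypothetical stand-in for a multi-tenant collection (not part of WeavScope)."""

    def __init__(self):
        self.tenants = {}

    def create_tenant(self, tenant_id):
        # Idempotent: an existing tenant keeps its data.
        self.tenants.setdefault(tenant_id, [])

    def delete_tenant(self, tenant_id):
        # Removes the tenant and everything stored under it.
        self.tenants.pop(tenant_id, None)


class TenantScope:
    """Creates a tenant on entry and removes it (with its data) on exit."""

    def __init__(self, store, tenant_id):
        self.store = store
        self.tenant_id = tenant_id

    def __enter__(self):
        self.store.create_tenant(self.tenant_id)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even when the body raises, so the tenant is always cleaned up.
        self.store.delete_tenant(self.tenant_id)
        return False  # do not swallow exceptions

    def add(self, obj):
        self.store.tenants[self.tenant_id].append(obj)


store = InMemoryTenantStore()
with TenantScope(store, "project-A") as scope:
    scope.add({"title": "Intro to AI"})
    inside = dict(store.tenants)  # snapshot taken while the tenant exists
after = dict(store.tenants)       # snapshot taken after the scope has exited
```

Because cleanup lives in `__exit__`, the tenant is removed whether the body finishes normally or raises, which is what keeps each scope isolated.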
Installation
pip install weavscope
Requires Python 3.11+ and a running Weaviate instance (v1.24+ recommended for multi-tenancy support).
Core Concepts
WeaviateConfig
WeaviateConfig is a plain Python dataclass that holds all your connection and embedding settings. There is no hidden env-var magic: you control how values are supplied (hardcoded, from os.environ, from a secrets manager, etc.).
from weavscope import WeaviateConfig
config = WeaviateConfig(
    WEAVIATE_HOST="localhost",                    # Weaviate instance hostname or IP
    WEAVIATE_PORT=8080,                           # HTTP port (default: 8080)
    WEAVIATE_GRPC_PORT=50051,                     # gRPC port (default: 50051)
    WEAVIATE_CLASS_NAME="MyCollection",           # Collection (class) name in Weaviate
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="openai",   # Embedding provider
    WEAVIATE_EMBEDDING_MODEL_NAME="text-embedding-3-small",  # Model name
    WEAVIATE_API_KEY="",                          # Weaviate API key (empty = no auth)
    WEAVIATE_EMBEDDING_MODEL_API_KEY="",          # Embedding provider API key
)
Key fields:
| Field | Type | Default | Description |
|---|---|---|---|
| WEAVIATE_HOST | str | - | Hostname or IP of your Weaviate instance |
| WEAVIATE_PORT | int | 8080 | HTTP API port |
| WEAVIATE_GRPC_PORT | int | 50051 | gRPC port (required for batch imports) |
| WEAVIATE_USE_GRPC | bool | True | Use gRPC for batch inserts (faster). Set False for environments without gRPC |
| WEAVIATE_CLASS_NAME | str | - | Name of the Weaviate collection (PascalCase recommended) |
| WEAVIATE_API_KEY | str | "" | Weaviate auth key. Leave empty for open/anonymous instances |
| WEAVIATE_EMBEDDING_MODEL_PROVIDER | str | - | Embedding provider (see Supported Providers) |
| WEAVIATE_EMBEDDING_MODEL_NAME | str | - | Model name for the selected provider |
| WEAVIATE_EMBEDDING_MODEL_API_KEY | str | "" | API key for the embedding provider |
WeavScope
WeavScope is the main entry point. It connects to Weaviate on instantiation and exposes two sub-interfaces:
- scope.batch: for inserting data (WeavScopeBatch)
- scope.query: for searching data (WeavScopeQuery)
It can be used as a context manager (recommended) or manually with try/finally.
# Context manager (recommended)
with WeavScope(config, tenant_id="my-tenant") as scope:
    # tenant is created here automatically
    scope.batch.add_objects(objects=[...], id_field="title")
    results = scope.query.hybrid("my search query")
# tenant is deleted and connection is closed here automatically
# Manual usage (when you need more control)
scope = WeavScope(config)
try:
    scope.ensure_tenant("my-tenant")
    scope.batch.add_objects(objects=[...], tenant_id="my-tenant")
    results = scope.query.hybrid("my query", tenant_id="my-tenant")
    scope.delete_tenant("my-tenant")
finally:
    scope.close()
Tenants
WeavScope is built around Weaviate's multi-tenancy feature, which provides data isolation at the tenant level. Each tenant is a logically separate storage space within the same collection.
- Tenants are identified by a string ID (e.g., "project-A", "user-123", "event-42").
- When you use WeavScope(config, tenant_id="..."), the tenant is auto-created on enter and auto-deleted (with all its data) on exit.
- If you want tenants to persist after the scope exits, manage them manually (don't pass tenant_id to the constructor).
Why tenants? They let multiple isolated workloads share a single Weaviate collection without interfering with each other. Ideal for multi-user applications, per-project vector stores, or ephemeral session data.
Batch Insertions
scope.batch.add_objects(...) handles inserting a list of dictionaries into a tenant. It supports:
- gRPC batching (fast, default if WEAVIATE_USE_GRPC=True)
- REST fallback (sequential inserts if gRPC is disabled)
- Deterministic UUIDs: pass id_field="title" to generate a UUID from (object_value, tenant_id), ensuring idempotent inserts (inserting the same object twice won't duplicate it)
Querying
scope.query exposes four search methods and two fetch methods. The search methods and fetch_all return a list of plain Python dicts; fetch_by_id returns a single object:
| Method | Description |
|---|---|
| .hybrid(query) | BM25 keyword + vector similarity (recommended default) |
| .near_text(query) | Pure semantic (vector) search by text |
| .near_vector(vector) | Vector search using a pre-computed embedding |
| .bm25(query) | Pure keyword (BM25) search |
| .fetch_all() | Fetch all objects from a tenant |
| .fetch_by_id(uuid) | Fetch a single object by UUID |
Each result dict has the shape:
{
    "uuid": "...",
    "properties": { "title": "...", "content": "...", ... },
    "score": 0.87,      # hybrid/BM25 score
    "distance": 0.12,   # vector distance
    "certainty": 0.88,  # semantic certainty
}
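Not every metric applies to every search type (a score comes from hybrid/BM25, distance and certainty from vector search), so code that prints results may want to tolerate missing values. The helper below is a small sketch under the assumption that an inapplicable metric is either absent or `None`; the README does not say which, and `format_hit` is a hypothetical name, not part of WeavScope.

```python
def format_hit(hit):
    """Render one result dict as a single line, skipping metrics that are missing."""
    title = hit.get("properties", {}).get("title", "<untitled>")
    metrics = []
    for key in ("score", "distance", "certainty"):
        value = hit.get(key)
        if value is not None:  # assumption: unused metrics are None or absent
            metrics.append(f"{key}={value:.2f}")
    return f"{title} ({', '.join(metrics)})" if metrics else title


# Hand-written sample dicts in the documented shape (not real query output).
hybrid_hit = {"uuid": "a", "properties": {"title": "Intro to AI"}, "score": 0.87}
vector_hit = {
    "uuid": "b",
    "properties": {"title": "Vector DBs"},
    "distance": 0.12,
    "certainty": None,
}

print(format_hit(hybrid_hit))
print(format_hit(vector_hit))
```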
Quick Start (Two-Step Pattern)
Here's the minimal, complete example to get running with a local Weaviate instance:
from weavscope import WeavScope, WeaviateConfig
# Configure your connection (no credentials needed for a local open instance)
config = WeaviateConfig(
    WEAVIATE_HOST="localhost",
    WEAVIATE_PORT=8080,
    WEAVIATE_GRPC_PORT=50051,
    WEAVIATE_CLASS_NAME="Articles",
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="gemini",
    WEAVIATE_EMBEDDING_MODEL_NAME="gemini-embedding-001",
    WEAVIATE_EMBEDDING_MODEL_API_KEY="your-gemini-api-key",
)
# STEP 1: Create the collection (run once; idempotent, safe to repeat)
setup = WeavScope(config)
try:
    setup.ensure_collection(
        provider="gemini",
        model="gemini-embedding-001",
    )
finally:
    setup.close()
# STEP 2: Operate within a tenant scope
# The tenant "project-A" is auto-created on entry and auto-deleted on exit.
with WeavScope(config, tenant_id="project-A") as scope:
    # Insert documents; UUIDs are derived deterministically from the title field
    scope.batch.add_objects(
        objects=[
            {"title": "Intro to AI", "content": "AI is changing the world..."},
            {"title": "Vector DBs", "content": "Vector databases are cool."},
        ],
        id_field="title",
    )
    # Search using hybrid (BM25 + vector) search
    results = scope.query.hybrid("machine learning")
    for hit in results:
        print(f"Found: {hit['properties']['title']} (score: {hit['score']})")
# Connection is closed and tenant "project-A" (with all its data) is deleted.
Why two steps? Weaviate requires the collection (schema) to exist before tenants can be added to it.
ensure_collection() is idempotent: safe to call every time, but typically run once during app startup or deployment.
Detailed Usage Guide
Step 1: Define Your Configuration
All settings live in one WeaviateConfig object. Use os.environ to pull secrets from environment variables:
import os
from weavscope import WeaviateConfig
config = WeaviateConfig(
    WEAVIATE_HOST=os.environ.get("WEAVIATE_HOST", "localhost"),
    WEAVIATE_PORT=int(os.environ.get("WEAVIATE_PORT", 8080)),
    WEAVIATE_GRPC_PORT=int(os.environ.get("WEAVIATE_GRPC_PORT", 50051)),
    WEAVIATE_CLASS_NAME="Articles",
    WEAVIATE_API_KEY=os.environ.get("WEAVIATE_API_KEY", ""),
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="gemini",
    WEAVIATE_EMBEDDING_MODEL_NAME="gemini-embedding-001",
    WEAVIATE_EMBEDDING_MODEL_API_KEY=os.environ["GEMINI_API_KEY"],
)
For open/anonymous local Weaviate instances (no auth), leave WEAVIATE_API_KEY empty (it defaults to ""). The embedding model key is only required if you're using a hosted model (OpenAI, Gemini, Cohere, etc.) for server-side vectorization. If you're supplying your own pre-computed vectors, use provider="custom" and omit the embedding key.
Step 2: Create the Collection
The collection is the Weaviate "class" (schema) that holds all your data. Multi-tenancy is enabled automatically.
from weavscope import WeavScope
setup = WeavScope(config)
try:
    setup.ensure_collection(
        provider="gemini",             # Which embedding provider powers this collection
        model="gemini-embedding-001",  # The specific model to use for vectorization
    )
finally:
    setup.close()
ensure_collection() is idempotent: if the collection already exists, it does nothing and logs a debug message. Run it at startup without worry.
You can also add extra properties to the schema:
from weaviate.classes.config import Property, DataType
setup.ensure_collection(
    provider="openai",
    model="text-embedding-3-small",
    extra_properties=[
        Property(name="author", data_type=DataType.TEXT),
        Property(name="published_at", data_type=DataType.DATE),
        Property(name="word_count", data_type=DataType.INT),
    ],
)
Note: tenant_id and object_id properties are always added automatically as base properties by WeavScope.
Step 3: Insert Data in a Tenant Scope
with WeavScope(config, tenant_id="project-A") as scope:
    scope.batch.add_objects(
        objects=[
            {"title": "Intro to AI", "content": "Artificial Intelligence is..."},
            {"title": "Deep Learning", "content": "Neural networks learn by..."},
            {"title": "RAG Systems", "content": "Retrieval Augmented Generation..."},
        ],
        id_field="title",  # Use "title" to generate deterministic UUIDs
    )
How deterministic UUIDs work: When you specify id_field="title", WeavScope generates a UUID from the combination of the field value and the tenant ID using a UUID5 hash. This means:
- Inserting the same object into the same tenant produces the same UUID every time.
- Re-running your ingestion pipeline won't create duplicate records.
- Objects with the same title in different tenants get different UUIDs.
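The three properties above follow directly from how UUID5 works: it hashes a namespace plus a name, so the same inputs always give the same ID. The sketch below uses Python's standard `uuid` module; the namespace and the way the value and tenant ID are combined are assumptions for illustration, since the README does not document WeavScope's exact scheme.

```python
import uuid

# Assumption: any fixed namespace works for the illustration; WeavScope's
# actual namespace is not documented here.
EXAMPLE_NAMESPACE = uuid.NAMESPACE_DNS


def deterministic_uuid(value, tenant_id):
    """Derive a stable UUID5 from a field value and a tenant ID."""
    # A separator reduces accidental collisions such as ("ab", "c") vs ("a", "bc").
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, f"{tenant_id}:{value}"))


first = deterministic_uuid("Intro to AI", "project-A")
again = deterministic_uuid("Intro to AI", "project-A")
other_tenant = deterministic_uuid("Intro to AI", "project-B")

print(first == again)         # same value + same tenant
print(first == other_tenant)  # same value + different tenant
```

Since the ID is a pure function of the field value and tenant, re-running an ingestion job addresses the same objects again instead of creating new ones.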
Inserting a single object:
scope.batch.add_object(
    properties={"title": "One Document", "content": "..."},
    id_field="title",
)
Inserting with pre-computed vectors (custom provider):
my_vector = [0.1, 0.3, 0.5, ...]  # Your own embedding
scope.batch.add_object(
    properties={"title": "Doc", "content": "..."},
    vector=my_vector,
)
Deleting objects by filter:
scope.batch.delete_objects_where(
    filter_property="title",
    filter_value="Intro to AI",
)
Step 4: Query Within the Scope
with WeavScope(config, tenant_id="project-A") as scope:
    # ... (insert objects) ...
    results = scope.query.hybrid(
        query_text="neural networks",
        limit=5,     # Return up to 5 results (default: 10)
        alpha=0.75,  # 0.0 = pure BM25, 1.0 = pure vector (default: 0.75)
    )
    for hit in results:
        print(f"[{hit['score']:.3f}] {hit['properties']['title']}")
All Query Methods
scope.query.hybrid(query_text, ...)
Combines BM25 (keyword) and vector (semantic) search. The alpha parameter controls the blend.
results = scope.query.hybrid(
    query_text="machine learning tutorial",
    limit=10,
    alpha=0.75,                   # 75% vector, 25% BM25
    exclude_property="title",     # Optional: filter out objects where...
    exclude_value="Intro to AI",  # ...title == "Intro to AI"
    return_properties=["title"],  # Optional: only return specific properties
)
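To build intuition for alpha, the blend can be pictured as a weighted sum of a keyword score and a vector score. This is a simplified illustration only: Weaviate's own fusion algorithms normalize and combine results in more involved ways, and `blend_scores` is a hypothetical helper, not part of WeavScope or Weaviate.

```python
def blend_scores(bm25_score, vector_score, alpha=0.75):
    """Weighted blend of two scores that are already normalized to [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    # alpha weights the vector side; the remainder goes to the keyword side.
    return alpha * vector_score + (1.0 - alpha) * bm25_score


keyword_only = blend_scores(bm25_score=0.4, vector_score=0.8, alpha=0.0)
vector_only = blend_scores(bm25_score=0.4, vector_score=0.8, alpha=1.0)
default_mix = blend_scores(bm25_score=0.4, vector_score=0.8)  # alpha=0.75
```

With alpha at 0.0 only the keyword score counts, at 1.0 only the vector score counts, and values in between shift the balance.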
scope.query.near_text(query_text, ...)
Pure semantic search: finds objects whose vectors are closest to the query text's embedding.
results = scope.query.near_text(
    query_text="deep neural architectures",
    limit=5,
    certainty=0.8,   # Minimum similarity threshold (0.0 to 1.0)
    # distance=0.2,  # Alternative to certainty: maximum vector distance (pass one or the other)
)
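Certainty and distance express the same threshold on different scales. For the cosine metric, Weaviate documents certainty as 1 - distance / 2, so the two can be converted as sketched below. This assumes the collection uses cosine distance, which the README does not state; the helper names are hypothetical.

```python
def certainty_from_cosine_distance(distance):
    """Convert a cosine distance in [0, 2] to a certainty in [0, 1]."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be between 0.0 and 2.0")
    return 1.0 - distance / 2.0


def cosine_distance_from_certainty(certainty):
    """Convert a certainty in [0, 1] back to a cosine distance in [0, 2]."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0.0 and 1.0")
    return 2.0 * (1.0 - certainty)


# A certainty threshold of 0.8 corresponds to a cosine distance of about 0.4.
equivalent_distance = cosine_distance_from_certainty(0.8)
```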
scope.query.near_vector(vector, ...)
Search using a pre-computed embedding vector. Useful when you already have an embedding from your own pipeline.
my_embedding = [0.12, 0.45, ...]  # 768-dim or however many dims your model uses
results = scope.query.near_vector(
    vector=my_embedding,
    limit=5,
    certainty=0.7,
)
scope.query.bm25(query_text, ...)
Pure keyword search (no vectors). Fast and effective for exact or near-exact term matching.
results = scope.query.bm25(
    query_text="vector database performance",
    limit=10,
    properties=["title", "content"],  # Only search within these fields
)
scope.query.fetch_all(limit=100, ...)
Retrieve all objects in a tenant up to a limit.
all_docs = scope.query.fetch_all(limit=50, return_properties=["title"])
scope.query.fetch_by_id(uuid, ...)
Retrieve a single object by its Weaviate UUID.
doc = scope.query.fetch_by_id("3fa85f64-5717-4562-b3fc-2c963f66afa6")
if doc:
    print(doc["properties"]["title"])
Supported Embedding Providers
Pass the provider name as a string. WeavScope maps it to the correct Weaviate vectorizer config internally.
| Provider Alias | Weaviate Vectorizer | Notes |
|---|---|---|
| "openai" | text2vec_openai | OpenAI embedding models |
| "gemini" | text2vec_google_gemini | Gemini Embedding API |
| "cohere" | text2vec_cohere | Cohere embedding models |
| "google" / "vertexai" | text2vec_palm | Legacy Vertex AI / PaLM |
| "huggingface" | text2vec_huggingface | HuggingFace Inference API |
| "voyageai" | text2vec_voyageai | VoyageAI embedding models |
| "mistral" | text2vec_mistral | Mistral embedding models |
| "jinaai" | text2vec_jinaai | Jina AI embedding models |
| "azure" | text2vec_azure_openai | Azure OpenAI; pass deployment name as model |
| "custom" | None | You supply vectors manually via vector= |
The embedding model API key is passed to Weaviate via the appropriate provider-specific HTTP header (e.g., X-OpenAI-Api-Key, X-Goog-Api-Key), all handled automatically by WeavScope.
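The header mapping can be pictured as a small lookup table. The sketch below is illustrative only: the two header names are the ones mentioned above, but pairing them with the "openai" and "gemini" aliases, and the `build_headers` helper itself, are assumptions rather than WeavScope's actual code.

```python
# Assumption: alias-to-header pairing chosen for illustration.
PROVIDER_HEADERS = {
    "openai": "X-OpenAI-Api-Key",
    "gemini": "X-Goog-Api-Key",
}


def build_headers(provider, api_key):
    """Return extra HTTP headers for a provider, or {} when none apply."""
    header = PROVIDER_HEADERS.get(provider.lower())
    if header is None or not api_key:
        # Unknown provider (e.g. "custom") or no key: send no extra header.
        return {}
    return {header: api_key}


openai_headers = build_headers("openai", "sk-test")
custom_headers = build_headers("custom", "")
```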
Error Handling
All WeavScope exceptions inherit from WeavscopeError, so you can catch them broadly or specifically:
from weavscope import (
    WeavScope,
    WeavscopeError,            # Base: catch all WeavScope errors
    WeavscopeConnectionError,  # Failed to connect to Weaviate
    WeavscopeCollectionError,  # Collection create/delete failed
    WeavscopeTenantError,      # Tenant create/delete/list failed
    WeavscopeBatchError,       # Batch insert/delete failed
    WeavscopeQueryError,       # Query execution failed
)
try:
    with WeavScope(config, tenant_id="project-A") as scope:
        scope.batch.add_objects(objects=[...], id_field="title")
        results = scope.query.hybrid("neural networks")
except WeavscopeConnectionError as e:
    print(f"Could not reach Weaviate: {e}")
except WeavscopeBatchError as e:
    print(f"Insertion failed: {e}")
except WeavscopeQueryError as e:
    print(f"Search failed: {e}")
except WeavscopeError as e:
    print(f"Unexpected WeavScope error: {e}")
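The order of the except clauses matters: Python picks the first clause that matches, so specific exceptions have to come before the base class. The sketch below demonstrates this with stand-in classes that only mirror the names above; they are defined locally for illustration and are not imported from weavscope.

```python
class WeavscopeError(Exception):
    """Stand-in for the library's base exception."""


class WeavscopeQueryError(WeavscopeError):
    """Stand-in for the query-specific exception."""


def classify(exc):
    """Report which except clause handles a given exception."""
    try:
        raise exc
    except WeavscopeQueryError:
        return "query"      # most specific clause first
    except WeavscopeError:
        return "weavscope"  # base class catches every other library error
    except Exception:
        return "other"      # anything not raised by the library


labels = [
    classify(WeavscopeQueryError("search failed")),
    classify(WeavscopeError("generic failure")),
    classify(ValueError("unrelated")),
]
```

If the base-class clause came first, it would also catch the query error and the more specific handler would never run.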
Architecture Overview
weavscope/
├── __init__.py          # Public API exports
├── config/
│   └── settings.py      # WeaviateConfig dataclass
├── core/
│   ├── connection.py    # Weaviate client factory (connect_to_custom)
│   ├── providers.py     # Maps provider names -> Weaviate VectorConfig
│   ├── store.py         # WeavScope: collection & tenant lifecycle
│   ├── batch.py         # WeavScopeBatch: object insertion
│   └── query.py         # WeavScopeQuery: all search methods
└── utils/
    ├── exceptions.py    # Custom exception hierarchy
    ├── logging.py       # Structured logger setup
    └── uuid.py          # Deterministic UUID5 generation
Data flow for a batch insert:
User -> scope.batch.add_objects(objects, id_field)
  -> WeavScopeBatch._store.ensure_tenant(tenant_id)
  -> Generate UUID5(object[id_field] + tenant_id) [if id_field set]
  -> collection.with_tenant(tenant_id).batch.dynamic()
  -> batch.add_object(properties=obj, uuid=uuid)
  -> Weaviate vectorizes server-side using configured provider
  -> Stores (properties + vector) in tenant's shard
AI/LLM Documentation
For AI coding assistants and LLMs looking for an in-depth technical overview of WeavScope's architecture and API, see LLM.txt.
License
MIT. Copyright © 2026 Tahcin Ul Karim (Mycin)