Skip to main content

A lightweight ORM-style layer for Weaviate

Project description

Weaviate ORM

A Pythonic ORM-style layer for Weaviate

weaviate-orm provides a structured, object-oriented interface to the Weaviate vector database. Inspired by traditional ORM patterns, it allows you to define collections as Python classes, automatically generate Weaviate schemas, and perform CRUD and similarity-based queries without leaving the object-oriented paradigm.

Requirements

  • Python 3.11+
  • weaviate-client >= 4.16.0
  • Weaviate server >= 1.27.0
  • For examples using Ollama: a reachable Ollama instance with the "snowflake-arctic-embed2" embedding model available

Server Requirements

Weaviate 1.27.0 or higher is required. The weaviate-client >= 4.16.0 enforces this constraint. During engine initialization, the ORM performs a runtime check:

engine = Weaviate_Engine()
engine.create_all_schemas()  # Raises RuntimeError if server version < 1.27.0

Why 1.27.0?

  • Unified vector configuration API (Configure.Vectors.* instead of deprecated vectorizer_config)
  • Robust named vector support
  • Improved schema handling and stability

If your server doesn't meet this requirement, the engine will raise a clear error:

RuntimeError: Weaviate server version 1.25.4 is not supported. 
Please use Weaviate >= 1.27.0 to match the client requirements.

Features

  • Declarative Schema Definition
    Define collections using Python classes and descriptors for properties and cross-references.

  • Dynamic Schema Generation via Metaclass
    A metaclass extracts property and reference definitions to auto-generate Weaviate-compatible schema structures.

  • Polymorphic References
    Define references to base classes to accept objects from multiple subclass collections. Includes automatic type tracking and runtime type resolution.

  • Object-Oriented CRUD Operations
    Seamlessly create, retrieve, update, and delete data using instance methods — no raw queries needed.

  • Recursive Save & Update
    Handles deeply nested references and automatically saves or updates related objects when desired.

  • Flexible Engine Binding
    The engine takes any connection method from the weaviate library, allowing support for local, remote, or custom configurations.

  • UUID Management
    Built-in support for both uuid4 and uuid5 strategies — including UUID validation and immutability enforcement.

  • Optional Auto-loading of References
    Cross-references can be automatically resolved into full model instances when accessed.

  • Support for Near-Vector and Near-Text Queries
    Leverage Weaviate’s native similarity search APIs directly from your model classes.

  • Strict Typing and Validation
    Type enforcement and optional validation logic on properties and references using Python descriptors.

  • Fully Tested
    Includes both unit and integration tests with Pytest and Docker-based Weaviate setups.


Design Patterns

  • Descriptor Pattern
    Custom Property and Reference descriptors manage validation, casting, and schema representation of scalar and relational fields.

  • Metaclass-Based Schema Introspection
    A metaclass (Weaviate_Meta) dynamically collects all descriptors from a class and builds the full Weaviate schema. It also generates a dynamic __init__ constructor based on the field signatures.

  • Engine Abstraction Layer
    The Weaviate_Engine encapsulates connection logic and schema management. It supports any connection method compatible with the weaviate Python client (e.g., local, remote, or cloud).

  • Instance-Oriented CRUD
    CRUD operations (including recursive reference handling and UUID safety checks) are exposed as instance methods on models that inherit from Base_Model.

  • Reference Resolution Strategy
    Reference descriptors can be configured to auto-load full objects (not just UUIDs), and support both one-way and two-way references, as well as single or list cardinalities.


Installation

You can install weaviate_orm either from PyPI (once published) or directly from the source repository.

📦 From PyPI (recommended)

pip install weaviate-orm

🛠 From Git (development and testing)

git clone https://gitlab.opencode.de/bbsr_ida_public/weaviate_orm.git
cd weaviate_orm
pip install -e .

Note: Make sure your Python environment includes a compatible Weaviate client (>= 4.16.0):

pip install "weaviate-client>=4.16.0"

Quick Start

In this example, we will use a local Weaviate instance using text2vec_ollama as the default vectorizer with a local ollama instance running and providing "snowflake-arctic-embed2" as embedding model.

Create a Model (vector_config)

Create two related models Paper and Author and specify configuration for the weaviate schema.

from __future__ import annotations

import os
from uuid import UUID, uuid5
import datetime

from weaviate_orm.weaviate_base import Base_Model
from weaviate_orm.weaviate_property import Property
from weaviate_orm.weavitae_reference import Reference, Reference_Type

from weaviate.classes.config import Configure, DataType

# Get the host and port from environment variables
llm_host = os.getenv("LLM_HOST", "llm")
llm_port = int(os.getenv("LLM_PORT", 11434))

class Paper(Base_Model):
  # Use _vector_config (new API) – replaces deprecated vectorizer_config
  # Protected with underscore prefix to prevent accidental overrides
  _vector_config = Configure.Vectors.text2vec_ollama(
            api_endpoint = f"http://{llm_host}:{llm_port}", #api-endpoint for the local ollama model
            model = "snowflake-arctic-embed2", #Embedding model to use
            vectorize_collection_name = False
        )
    
    title = Property(cast_type=str, description="The title of the paper", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    abstract = Property(cast_type=str, description="The abstract of the paper", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    pub_date = Property(cast_type=datetime.date, description="The publication date of the paper", required=True, weaviate_type=DataType.DATE, skip_vectorization=True)
    doi = Property(cast_type=str, description="The doi of the paper", required=True, weaviate_type=DataType.TEXT, skip_vectorization=True)
    author = Reference(target_collection_name="Author", auto_loading=True, description="The author of the paper", reference_type=Reference_Type.SINGLE, way_type=Reference_Type.TWOWAY, required=False, skip_validation=True)
    co_authors = Reference(target_collection_name="Author", auto_loading=False, description="The co-authors of the paper", reference_type=Reference_Type.LIST, way_type=Reference_Type.ONEWAY, required=False, skip_validation=True)

    _namespace = UUID("eb8bc242-5f59-4a47-8230-0cea6fcc1028")

    def _get_uuid_name_string(self):
        return self.doi
        
class Author(Base_Model):
    first_name = Property(cast_type=str, description="The first name of the author", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    last_name = Property(cast_type=str, description="The last name of the author", required=True, weaviate_type=DataType.TEXT, vectorize_property_name=True)
    orc_id = Property(cast_type=str, description="The orc_id of the author", required=False, weaviate_type=DataType.TEXT, skip_vectorization=True)
    papers = Reference(target_collection_name="Paper", auto_loading=False, description="The papers of the author", reference_type=Reference_Type.LIST, way_type=Reference_Type.TWOWAY, required=False, skip_validation=True)

    _namespace = UUID("eb8bc242-5f59-4a47-8230-0cea6fcc1028")

    def _get_uuid_name_string(self) -> str:
        return f"{self.first_name} {self.last_name}"

Create a Weaviate Engine, Register Models, and Create Schema

from weaviate_orm.weaviate_engine import Weaviate_Engine
from weaviate import connect_to_local, connect_to_weaviate_cloud

# Get host and ports from environment variables
host = os.getenv("WEAVIATE_HOST", "vdatabase")
port = int(os.getenv("WEAVIATE_PORT", 8080))
grpc_port = int(os.getenv("WEAVIATE_GRPC_PORT", 50051))

# Initialize the Weaviate engine using the connect_to_local method and its parameters
engine = Weaviate_Engine(connect_to_local, host=host, port=port, grpc_port=grpc_port)

# Register the models with the engine
engine.register_all_models(Paper, Author)

# Create the schema in Weaviate
engine.create_all_schemas()

Create and Save Instances

# Example data
paper_data = {
    "title": "A Study on Weaviate ORM",
    "abstract": "This paper discusses the Weaviate ORM and its features.",
    "pub_date": datetime.datetime.now(datetime.timezone.utc),
    "doi": "10.1234/weaviate-orm",
}

author_data = {
    "first_name": "Allen",
    "last_name": "Turing",
    "orc_id": "0000-0002-1234-5678",
}

# Create an insance of author and paper

author = Author(first_name=author_data["first_name"],
                last_name=author_data["last_name"],
                orc_id=author_data["orc_id"])

paper = Paper(title=paper_data["title"],
                abstract=paper_data["abstract"],
                pub_date=paper_data["pub_date"],
                doi=paper_data["doi"],
                author=author)

# Save the author and paper to Weaviate
paper.save(include_references=True, recursive=True)

Read an Instance

# Retrieve the paper by its UUID
paper_uuid = paper.get_uuid()

paper_instance = Paper.get(paper_uuid, include_references=True)
print(f"Paper Title: {paper_instance.title}")
print(f"Author: {paper_instance.author.first_name} {paper_instance.author.last_name}")

Update an Instance

# Update the paper's title
paper_instance.title = "An Updated Study on Weaviate ORM"
paper_instance.update()

#Check updated instance
paper_instance = Paper.get(paper_uuid, include_references=True)
print(f"Paper Title: {paper_instance.title}")

Delete an Instance

# Delete the paper instance
paper_instance.delete()

Schema Configuration Access

The ORM uses protected class-level attributes for schema configuration to prevent accidental overrides. All schema configs are prefixed with an underscore and accessed via read-only class properties.

Configuring Collections

Define schema configurations using the underscored attributes:

from weaviate.classes.config import Configure

class Article(Base_Model):
    # Class-level schema configuration (protected with underscore)
    _vector_config = Configure.Vectors.text2vec_ollama(
        api_endpoint="http://llm:11434",
        model="snowflake-arctic-embed2",
        vectorize_collection_name=False
    )
    
    _description = "A collection of articles with vector embeddings"
    
    _inverted_index_config = None  # Optional index tuning
    _generative_config = None      # Optional generative configuration

Accessing Configurations

You can read schema configurations at both class and instance levels:

# Class-level access (read-only)
print(Article.vector_config)  # Returns the configured vector
print(Article.description)     # Returns "A collection of articles..."

# Instance-level access
article = Article(...)
print(article.vector_config)   # Proxies to class-level config

Deprecation Path

Older code using non-underscored names will still work but emit a deprecation warning:

class LegacyModel(Base_Model):
    vector_config = Configure.Vectors.text2vec_ollama(...)  # DeprecationWarning
    description = "Legacy"  # DeprecationWarning

Migrate to the underscored versions to avoid future incompatibility:

class ModernModel(Base_Model):
    _vector_config = Configure.Vectors.text2vec_ollama(...)
    _description = "Modern"

Project Structure

weaviate_orm/
│
├── __init__.py              # Public interface and versioning
├── weaviate_base.py         # Base_Model with full CRUD logic and query support
├── weaviate_engine.py       # Manages Weaviate client and schema creation
├── weaviate_meta.py         # Metaclass for schema extraction and dynamic __init__
├── weaviate_property.py     # Descriptor for scalar fields
├── weavitae_reference.py    # Descriptor for references (one-way, two-way, single, list)
├── weaviate_decorators.py   # Client injection and async-to-sync conversion
└──  weaviate_utility.py      # Helpers for validation and reference comparison

License

This project is licensed under the GNU General Public License v3.0 (GPLv3).

You are free to use, modify, and distribute this software under the terms of the GPLv3 license. Any derivative work must also be distributed under the same license.

For full details, see the LICENSE.

Contributing & Credits

This project is created and maintained by the BBSR - IDA (Tobias Heimig-Elschner). Feel free to open issues or submit pull requests if you encounter bugs, have ideas, or want to improve the package.

📚 Citation

The project is published on Zenodo and can be cited as: Heimig, T. (2025). Weaviate ORM - (0.1.0). Bundesinstitut für Bau-, Stadt- und Raumforschung (BBSR). https://doi.org/10.58007/x1wa-rt92

Roadmap & Open Development Topics

  • Batch Operations
    Add support for batched inserts and updates for high-throughput use cases.

  • Generalized Query Interface
    Unify near-vector, near-text, and filter queries into a common, fluent interface.

  • Nested Reference Updates
    Extend update logic to fully support reference deletions and new nested reference creation during .update() calls.


Testing

Unit tests

Run the unit test suite:

pytest tests/unit -q

Integration tests

Integration tests require a running Weaviate (>= 1.27.0) and, for Ollama-based examples, a reachable LLM service. Using the included docker-compose setup:

# From repository root
docker-compose build vdatabase llm
docker-compose up -d vdatabase llm

# Run integration tests once services are healthy
pytest tests/integration -q -m integration

If you maintain your own Weaviate instance, set these environment variables so the engine can connect:

export WEAVIATE_HOST=vdatabase
export WEAVIATE_PORT=8080
export WEAVIATE_GRPC_PORT=50051

Migration note

The Weaviate Python client has deprecated vectorizer_config in favor of vector_config. This project now uses vector_config everywhere, including named vectors (e.g., Configure.Vectors.text2vec_ollama(...)). Ensure your server and client meet the versions above to avoid startup or schema warnings.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

weaviate_orm-0.1.30.tar.gz (54.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

weaviate_orm-0.1.30-py3-none-any.whl (52.8 kB view details)

Uploaded Python 3

File details

Details for the file weaviate_orm-0.1.30.tar.gz.

File metadata

  • Download URL: weaviate_orm-0.1.30.tar.gz
  • Upload date:
  • Size: 54.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for weaviate_orm-0.1.30.tar.gz
Algorithm Hash digest
SHA256 c0e98ff684f9b13afc713524029566fc26df2cd0cba3a8c4d34f76b62774786d
MD5 daf5d32394179c51fa2fc64042d2467f
BLAKE2b-256 b21907949f0838f07f4c95fa111620023fe886be34e193e9a76b9dfcd01d4e30

See more details on using hashes here.

File details

Details for the file weaviate_orm-0.1.30-py3-none-any.whl.

File metadata

  • Download URL: weaviate_orm-0.1.30-py3-none-any.whl
  • Upload date:
  • Size: 52.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for weaviate_orm-0.1.30-py3-none-any.whl
Algorithm Hash digest
SHA256 4610dd7af3539c6ea54e0b1bdbd08676b60e2b92fa8d2dab65e8b649710a4ac2
MD5 8df617b888ff7d5d9be8c2a9128e983e
BLAKE2b-256 6c67258d8b5de9b65ddd8ad7c9c67315eebe647a2c92ae4d1ca1d6e285e85ad9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page