vectordb-orm

vectordb-orm is an Object-Relational Mapping (ORM) library designed to work with vector databases. This project aims to provide a consistent and convenient interface for working with vector data, allowing you to interact with vector databases using familiar ORM concepts and syntax. Milvus and Pinecone are currently supported, with more backend engines planned for the future.

Getting Started

Here are some simple examples demonstrating common behavior with vectordb-orm. First, a note on structure. vectordb-orm is designed around the idea of a Schema, which is logically equivalent to a table in classic relational databases. This schema is marked up with Python typehints that define the type of vector and metadata that will be stored alongside the objects.

You create a class definition by subclassing VectorSchemaBase and providing typehints for the keys of your model, similar to pydantic. These fields also support custom initialization behavior if you want (or need) to modify their configuration options.
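
To make the typehint-driven approach concrete, here is a minimal, library-independent sketch of how a base class can collect annotated fields from a subclass. `SchemaSketch`, `FieldSketch`, and `fields()` are hypothetical names used only for illustration; this is not vectordb-orm's internal implementation, just the general pattern that pydantic-style libraries tend to follow.

```python
from typing import Any, Dict, get_type_hints


class FieldSketch:
    """Hypothetical field descriptor holding per-field configuration options."""

    def __init__(self, **options: Any) -> None:
        self.options = options


class SchemaSketch:
    """Hypothetical base class; subclasses declare their fields as annotations."""

    @classmethod
    def fields(cls) -> Dict[str, Dict[str, Any]]:
        # Map each annotated attribute to its declared type and its options.
        collected: Dict[str, Dict[str, Any]] = {}
        for name, hint in get_type_hints(cls).items():
            default = getattr(cls, name, None)
            options = default.options if isinstance(default, FieldSketch) else {}
            collected[name] = {"type": hint, "options": options}
        return collected


class Document(SchemaSketch):
    id: int = FieldSketch(primary_key=True)
    text: str = FieldSketch(max_length=128)


schema = Document.fields()
print(schema["text"])
```

The annotation supplies the Python type, while the assigned value carries configuration, which is the same split the field types in this README rely on.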

Make sure to have a vector database running on your system before connecting. We provide an archive of the official docker-compose that's mainly used for testing Milvus. Pinecone requires your API key and environment parameters.

git clone https://github.com/piercefreeman/vectordb-orm.git
cd vectordb-orm
docker-compose up -d
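
Containers may need some time before they accept connections. The following is an optional sketch of a readiness check that uses only the standard library. `wait_for_port` is a hypothetical helper and not part of vectordb-orm; the port you pass is an assumption about your setup (the Milvus connection example in this README uses `19530`).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 19530)` returns `True` as soon as the port is reachable. This only confirms that the TCP port is open, not that the database has finished initializing.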

Field Type	Description
BaseField	The `BaseField` provides the ability to add a default value for a given field. This should be used in cases where the more specific field types aren't relevant.
PrimaryKeyField	The `PrimaryKeyField` is used to specify the primary key of your model, and one is required per class.
VarCharField	The `VarCharField` is used to specify a string field.
EmbeddingField	The `EmbeddingField` is used to specify a vector field. It supports specifying a dimension for the vector and an index type for the field.

Object Definition

Defining a schema is almost entirely the same between backends but there are some small differences when it comes to index creation.

Milvus:

from vectordb_orm import VectorSchemaBase, EmbeddingField, VarCharField, PrimaryKeyField, Milvus_IVF_FLAT
import numpy as np

class MyObject(VectorSchemaBase):
    __collection_name__ = 'my_object_collection'

    id: int = PrimaryKeyField()
    text: str = VarCharField(max_length=128)
    embedding: np.ndarray = EmbeddingField(dim=128, index=Milvus_IVF_FLAT(cluster_units=128))

Pinecone:

from vectordb_orm import VectorSchemaBase, EmbeddingField, VarCharField, PrimaryKeyField, PineconeIndex, PineconeSimilarityMetric
import numpy as np

class MyObject(VectorSchemaBase):
    __collection_name__ = 'my_object_collection'

    id: int = PrimaryKeyField()
    text: str = VarCharField(max_length=128)
    embedding: np.ndarray = EmbeddingField(dim=128, index=PineconeIndex(metric_type=PineconeSimilarityMetric.COSINE))

Embedding Types

We currently support two different types of embeddings: floating point and binary. We distinguish these based on the type signatures of the embedding array.

For binary:

embedding: np.ndarray[np.bool_] = EmbeddingField(
    dim=128,
    index=BIN_FLAT()
)

For floating point:

embedding: np.ndarray = EmbeddingField(
    dim=128,
    index=FLAT()
)
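
As a rough illustration of how an annotation alone can carry this distinction, the sketch below inspects a subscripted type hint with the standard `typing` helpers. To stay dependency-free it uses a hypothetical stand-in class `Array` in place of `np.ndarray`, and `embedding_kind` is likewise an assumed helper name; vectordb-orm's own detection logic may differ.

```python
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class Array(Generic[T]):
    """Hypothetical stand-in for np.ndarray, used to keep this sketch stdlib-only."""


def embedding_kind(hint: object) -> str:
    # `Array[bool]` is treated as binary; a bare `Array` has no type
    # arguments and falls through to floating point.
    if get_origin(hint) is Array and bool in get_args(hint):
        return "binary"
    return "float"


class Example:
    binary_embedding: Array[bool]
    float_embedding: Array


kinds = {name: embedding_kind(hint) for name, hint in get_type_hints(Example).items()}
print(kinds)
```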

Querying Syntax

Milvus:

from pymilvus import Milvus, connections
from vectordb_orm import MilvusBackend, VectorSession

# Instantiate a Milvus session
session = VectorSession(MilvusBackend(Milvus()))
connections.connect("default", host="localhost", port="19530")

Pinecone:

from os import getenv

from vectordb_orm import PineconeBackend, VectorSession

# Instantiate a Pinecone session; credentials are read from environment variables
session = VectorSession(
    PineconeBackend(
        api_key=getenv("PINECONE_API_KEY"),
        environment=getenv("PINECONE_ENVIRONMENT"),
    )
)

import numpy as np

# Perform a simple boolean query
results = session.query(MyObject).filter(MyObject.text == 'bar').limit(2).all()

# Rank results by their similarity to a given reference vector
query_vector = np.array([8.0]*128)
results = session.query(MyObject).filter(MyObject.text == 'bar').order_by_similarity(MyObject.embedding, query_vector).limit(2).all()
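
Conceptually, the second query filters on metadata, ranks the remaining rows by vector similarity, and keeps the top results. The in-memory sketch below mirrors that sequence in plain Python. It assumes cosine similarity as the metric (the real metric depends on the index configured on the field) and uses the hypothetical names `Row`, `cosine_similarity`, and `top_k`. It is meant to explain the semantics, not how any backend executes the query.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Row:
    text: str
    embedding: List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(rows: List[Row], text: str, query: Sequence[float], k: int) -> List[Row]:
    # 1. Filter on metadata, 2. rank by similarity (highest first), 3. limit.
    matches = [row for row in rows if row.text == text]
    matches.sort(key=lambda row: cosine_similarity(row.embedding, query), reverse=True)
    return matches[:k]


rows = [
    Row("bar", [1.0, 0.0]),
    Row("bar", [0.6, 0.8]),
    Row("foo", [1.0, 0.0]),
    Row("bar", [0.0, 1.0]),
]
results = top_k(rows, text="bar", query=[1.0, 0.0], k=2)
print([row.embedding for row in results])
```

A vector database performs the ranking step with an index rather than a full scan, which is why the index type is part of the schema definition.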

Installation

To get started with vectordb-orm, simply install the package and its dependencies, then import the necessary modules:

pip install vectordb-orm

We use poetry for local development work:

poetry install
poetry run pytest

Why use an ORM?

Most vector databases use a JSON-like querying syntax where schemas and objects are specified as dictionary blobs. This makes it difficult to use IDE features like autocomplete or typehinting, and can also lead to error-prone code when translating between Python logic and querying syntax.

An ORM provides a high-level, abstracted interface to work with databases. This abstraction makes it easier to write, read, and maintain code, as well as to switch between different database backends with minimal changes. Furthermore, an ORM allows developers to work with databases in a more Pythonic way, using Python objects and classes instead of raw SQL queries or low-level API calls.
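
One way an ORM can bridge Python expressions and a backend's filter syntax is operator overloading on class-level attributes. The sketch below is a generic illustration under assumed names (`AttributeSketch`, `ItemSketch`) and an assumed string-based filter format; it is not vectordb-orm's actual implementation.

```python
class AttributeSketch:
    """Hypothetical class attribute that turns comparisons into filter strings."""

    def __set_name__(self, owner: type, name: str) -> None:
        # Python calls this at class creation with the attribute's name.
        self.name = name

    def __eq__(self, other: object) -> str:  # type: ignore[override]
        return f"{self.name} == {other!r}"

    def __gt__(self, other: object) -> str:
        return f"{self.name} > {other!r}"


class ItemSketch:
    text = AttributeSketch()
    id = AttributeSketch()


expression = ItemSketch.text == "bar"
print(expression)
```

Because the comparison is written against a real class attribute, an IDE can autocomplete `ItemSketch.text` and flag a misspelled field name, which a dictionary key cannot offer.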

Comparison to SQLAlchemy

While vectordb-orm is inspired by the widely-used SQLAlchemy ORM, it is specifically designed for vector databases, such as Milvus and Pinecone. This means that vectordb-orm offers features tailored to the needs of working with vector data, such as similarity search and index management. Although the two ORMs share some similarities in terms of syntax and structure, vectordb-orm focuses on providing a seamless experience for working with vector databases.

WIP

Please note that vectordb-orm is still a (somewhat large) work in progress. The current implementation supports Milvus and Pinecone; the goal is to eventually expand support to other vector databases. Contributions and feedback are welcome as we work to improve and expand the capabilities of vectordb-orm.

