A python SDK for OceanBase Vector Store, based on SQLAlchemy, compatible with Milvus API.

pyobvector

A python SDK for OceanBase Multimodal Store (Vector Store / Full Text Search / JSON Table), based on SQLAlchemy, compatible with Milvus API.

Installation

  • git clone this repo, then install with:
poetry install
  • install with pip:
pip install pyobvector==0.2.12
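To confirm which release ended up in the active environment (for example, that the pinned 0.2.12 was installed), you can use a small check based on the standard library's `importlib.metadata`. It works for both installation routes, provided you run it inside the environment you installed into (e.g. via `poetry run python`). `installed_version` is a hypothetical helper name, not part of pyobvector.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pyobvector"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pyobvector is not installed in this environment")
    else:
        # After `pip install pyobvector==0.2.12` this should report 0.2.12.
        print(f"pyobvector {found}")
```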

Build Doc

You can build the documentation locally with Sphinx:

mkdir build
make html

Usage

pyobvector supports two modes:

  • Milvus compatible mode: You can use the MilvusLikeClient class to work with vector storage through an API similar to Milvus's.
  • SQLAlchemy hybrid mode: You can use the vector storage functions provided by the ObVecClient class and execute relational database statements with the SQLAlchemy library. In this mode, you can regard pyobvector as an extension of SQLAlchemy.

Milvus compatible mode

Refer to tests/test_milvus_like_client.py for more examples.

A simple workflow to perform ANN search with OceanBase Vector Store:

  • set up a client:
from pyobvector import *

client = MilvusLikeClient(uri="127.0.0.1:2881", user="test@test")
  • create a collection with vector index:
test_collection_name = "ann_test"
# define the schema of collection with optional partitions
range_part = ObRangePartition(False, range_part_infos = [
    RangeListPartInfo('p0', 100),
    RangeListPartInfo('p1', 'maxvalue'),
], range_expr='id')
schema = client.create_schema(partitions=range_part)
# define field schema of collection
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=3)
schema.add_field(field_name="meta", datatype=DataType.JSON, nullable=True)
# define index parameters
idx_params = client.prepare_index_params()
idx_params.add_index(
    field_name='embedding',
    index_type=VecIndexType.HNSW,
    index_name='vidx',
    metric_type="L2",
    params={"M": 16, "efConstruction": 256},
)
# create collection
client.create_collection(
    collection_name=test_collection_name,
    schema=schema,
    index_params=idx_params,
)
  • insert data into your collection:
# prepare
vector_value1 = [0.748479,0.276979,0.555195]
vector_value2 = [0, 0, 0]
data1 = [{'id': i, 'embedding': vector_value1} for i in range(10)]
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(10, 13)])
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(111, 113)])
# insert data
client.insert(collection_name=test_collection_name, data=data1)
  • run an ANN search:
res = client.search(collection_name=test_collection_name, data=[0,0,0], anns_field='embedding', limit=5, output_fields=['id'])
# For example, the result will be:
# [{'id': 112}, {'id': 111}, {'id': 10}, {'id': 11}, {'id': 12}]
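To see where the sample rows land and what the search should return, the plain-Python sketch below replays the example data without a database. It assumes that `RangeListPartInfo('p0', 100)` and `RangeListPartInfo('p1', 'maxvalue')` translate to the usual range-partitioning rule (`VALUES LESS THAN (100)` and `VALUES LESS THAN MAXVALUE`). It also computes an exact, brute-force L2 top-k as a reference for the approximate HNSW search. The helpers `sample_rows`, `range_partition` and `exact_l2_topk` are hypothetical and not part of pyobvector.

```python
import math

# Hypothetical helpers that replay the README's sample data in plain Python;
# none of these names belong to pyobvector.
VECTOR_VALUE1 = [0.748479, 0.276979, 0.555195]
VECTOR_VALUE2 = [0.0, 0.0, 0.0]

# Assumed partition layout: p0 holds ids < 100, p1 takes everything else (MAXVALUE).
PARTITION_BOUNDS = (("p0", 100), ("p1", None))


def sample_rows():
    """Build the same 15 rows that the example inserts."""
    rows = [{"id": i, "embedding": VECTOR_VALUE1} for i in range(10)]
    rows.extend({"id": i, "embedding": VECTOR_VALUE2} for i in range(10, 13))
    rows.extend({"id": i, "embedding": VECTOR_VALUE2} for i in range(111, 113))
    return rows


def range_partition(row_id, bounds=PARTITION_BOUNDS):
    """Return the first partition whose exclusive upper bound exceeds row_id.

    An upper bound of None plays the role of MAXVALUE and accepts any id.
    """
    for name, upper in bounds:
        if upper is None or row_id < upper:
            return name
    raise ValueError(f"no partition accepts id {row_id}")


def exact_l2_topk(rows, query, k):
    """Brute-force k nearest neighbours by Euclidean (L2) distance.

    sorted() is stable, so rows at equal distance keep their insertion order.
    """
    ranked = sorted(rows, key=lambda row: math.dist(row["embedding"], query))
    return [row["id"] for row in ranked[:k]]


if __name__ == "__main__":
    rows = sample_rows()
    print({r["id"]: range_partition(r["id"]) for r in rows if r["id"] in (0, 12, 111)})
    # {0: 'p0', 12: 'p0', 111: 'p1'}
    print(exact_l2_topk(rows, [0, 0, 0], 5))
    # [10, 11, 12, 111, 112]
```

All five nearest rows are at distance 0 from the query, so their relative order is a tie. The example output above lists them as 112, 111, 10, 11, 12, while the sketch returns them in insertion order. When checking ANN results against such a reference, compare them as sets rather than ordered lists. Also keep in mind that HNSW is approximate, so on larger data its recall can fall below that of the exact search.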

SQLAlchemy hybrid mode

  • set up a client:
from pyobvector import *
from sqlalchemy import Column, Integer, JSON
from sqlalchemy import func

client = ObVecClient(uri="127.0.0.1:2881", user="test@test")
  • create a partitioned table with vector index:
test_collection_name = "ann_test"
# create partitioned table
range_part = ObRangePartition(False, range_part_infos = [
    RangeListPartInfo('p0', 100),
    RangeListPartInfo('p1', 'maxvalue'),
], range_expr='id')

cols = [
    Column('id', Integer, primary_key=True, autoincrement=False),
    Column('embedding', VECTOR(3)),
    Column('meta', JSON)
]
client.create_table(test_collection_name, columns=cols, partitions=range_part)

# create vector index
client.create_index(
    test_collection_name,
    is_vec_index=True,
    index_name='vidx',
    column_names=['embedding'],
    vidx_params='distance=l2, type=hnsw, lib=vsag',
)
  • insert data into your collection:
# insert data
vector_value1 = [0.748479,0.276979,0.555195]
vector_value2 = [0, 0, 0]
data1 = [{'id': i, 'embedding': vector_value1} for i in range(10)]
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(10, 13)])
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(111, 113)])
client.insert(test_collection_name, data=data1)
  • run an ANN search:
# perform ann search
res = client.ann_search(
    test_collection_name,
    vec_data=[0,0,0],
    vec_column_name='embedding',
    distance_func=l2_distance,
    topk=5,
    output_column_names=['id']
)
# For example, the result will be:
# [(112,), (111,), (10,), (11,), (12,)]
  • If you want to use the pure SQLAlchemy API with the OceanBase dialect, you can get an SQLAlchemy engine via client.engine. The engine can also be created as follows:
import pyobvector
from sqlalchemy.dialects import registry
from sqlalchemy import create_engine

uri: str = "127.0.0.1:2881"
user: str = "root@test"
password: str = ""
db_name: str = "test"
registry.register("mysql.oceanbase", "pyobvector.schema.dialect", "OceanBaseDialect")
connection_str = (
    f"mysql+oceanbase://{user}:{password}@{uri}/{db_name}?charset=utf8mb4"
)
engine = create_engine(connection_str)  # pass extra create_engine() keyword options here if needed
  • Async engine is also supported:
import pyobvector
from sqlalchemy.dialects import registry
from sqlalchemy.ext.asyncio import create_async_engine

uri: str = "127.0.0.1:2881"
user: str = "root@test"
password: str = ""
db_name: str = "test"
registry.register("mysql.aoceanbase", "pyobvector", "AsyncOceanBaseDialect")
connection_str = (
    f"mysql+aoceanbase://{user}:{password}@{uri}/{db_name}?charset=utf8mb4"
)
engine = create_async_engine(connection_str)
  • For further usage in pure SQLAlchemy mode, please refer to the SQLAlchemy documentation.
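If the password contains characters such as `@`, `:` or `/`, pasting it directly into the connection-string f-strings above produces a URL that SQLAlchemy may split in the wrong place. Below is a minimal sketch of a hypothetical helper, `build_oceanbase_url` (not part of pyobvector), that percent-encodes the password with `urllib.parse.quote` before assembling the same `mysql+oceanbase` / `mysql+aoceanbase` strings. SQLAlchemy's `URL.create()` is an alternative that avoids manual escaping altogether.

```python
from urllib.parse import quote


def build_oceanbase_url(uri, user, password, db_name, use_async=False):
    """Assemble the connection string used above (hypothetical helper).

    The password is percent-encoded with safe="" so '@', ':', '/' and spaces
    cannot be read as URL delimiters; the user name (e.g. "root@test") is
    passed through unchanged, as in the README's examples.
    """
    scheme = "mysql+aoceanbase" if use_async else "mysql+oceanbase"
    return f"{scheme}://{user}:{quote(password, safe='')}@{uri}/{db_name}?charset=utf8mb4"


if __name__ == "__main__":
    print(build_oceanbase_url("127.0.0.1:2881", "root@test", "p@ss:w/rd", "test"))
    # mysql+oceanbase://root@test:p%40ss%3Aw%2Frd@127.0.0.1:2881/test?charset=utf8mb4
```

The resulting string can be passed to `create_engine` or `create_async_engine` in the same way as in the examples above.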

Download files

Source Distribution

pyobvector-0.2.12.tar.gz (39.3 kB)

Built Distribution

pyobvector-0.2.12-py3-none-any.whl (52.4 kB)

File details

Details for the file pyobvector-0.2.12.tar.gz.

File metadata

  • Download URL: pyobvector-0.2.12.tar.gz
  • Upload date:
  • Size: 39.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pyobvector-0.2.12.tar.gz
  • SHA256: f85c03c7fd16753cb110fb7a49515c35c3fa3580783ab6caced42757b812138f
  • MD5: 5e1e95aec1d35b5799abf3126e755d65
  • BLAKE2b-256: 6dd89cb1190085e8b1713904f90a11c611711a7933348aaccf41ac007ccb9c47

Provenance

The following attestation bundles were made for pyobvector-0.2.12.tar.gz:

Publisher: python-publish.yml on oceanbase/pyobvector

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyobvector-0.2.12-py3-none-any.whl.

File metadata

  • Download URL: pyobvector-0.2.12-py3-none-any.whl
  • Upload date:
  • Size: 52.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pyobvector-0.2.12-py3-none-any.whl
  • SHA256: c018635d36b5caa821024a5186c486485c3846728b9f918fbea10344bc47b94a
  • MD5: dfd16ac03536edbd983e929e967fba9e
  • BLAKE2b-256: 3deb0325ed2b8a09f8d72a7128a4a02f1e34f9375a5b2f8fd4e7e38f2fb1b57e

Provenance

The following attestation bundles were made for pyobvector-0.2.12-py3-none-any.whl:

Publisher: python-publish.yml on oceanbase/pyobvector
