pyobvector
A Python SDK for OceanBase Multimodal Store (Vector Store / Full Text Search / JSON Table), based on SQLAlchemy and compatible with the Milvus API.
Installation
- Clone this repository with git, then install with Poetry:
poetry install
- Or install from PyPI with pip:
pip install pyobvector==0.2.12
Build Docs
You can build the documentation locally with Sphinx:
mkdir build
make html
Usage
pyobvector supports two modes:
- Milvus compatible mode: use the MilvusLikeClient class to work with vector storage in a way similar to the Milvus API.
- SQLAlchemy hybrid mode: use the vector storage functions provided by the ObVecClient class and execute relational database statements with the SQLAlchemy library. In this mode, you can regard pyobvector as an extension of SQLAlchemy.
Milvus compatible mode
Refer to tests/test_milvus_like_client.py for more examples.
A simple workflow to perform ANN search with OceanBase Vector Store:
- Set up a client:
from pyobvector import *
client = MilvusLikeClient(uri="127.0.0.1:2881", user="test@test")
- Create a collection with a vector index:
test_collection_name = "ann_test"
# define the schema of collection with optional partitions
range_part = ObRangePartition(False, range_part_infos=[
    RangeListPartInfo('p0', 100),
    RangeListPartInfo('p1', 'maxvalue'),
], range_expr='id')
schema = client.create_schema(partitions=range_part)
# define field schema of collection
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=3)
schema.add_field(field_name="meta", datatype=DataType.JSON, nullable=True)
# define index parameters
idx_params = client.prepare_index_params()
idx_params.add_index(
    field_name='embedding',
    index_type=VecIndexType.HNSW,
    index_name='vidx',
    metric_type="L2",
    params={"M": 16, "efConstruction": 256},
)
# create collection
client.create_collection(
    collection_name=test_collection_name,
    schema=schema,
    index_params=idx_params,
)
- Insert data into your collection:
# prepare
vector_value1 = [0.748479,0.276979,0.555195]
vector_value2 = [0, 0, 0]
data1 = [{'id': i, 'embedding': vector_value1} for i in range(10)]
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(10, 13)])
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(111, 113)])
# insert data
client.insert(collection_name=test_collection_name, data=data1)
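With the range partition defined above, the sample ids end up in both partitions. The sketch below mimics that routing in plain Python. It assumes that RangeListPartInfo('p0', 100) behaves like the usual MySQL-style "VALUES LESS THAN (100)" bound and that 'maxvalue' accepts every remaining key; route_partition is a hypothetical helper for illustration, not part of pyobvector.

```python
from collections import Counter

# Hypothetical mirror of the range partition used in this README.
# Assumption: each bound means "VALUES LESS THAN (bound)",
# and 'maxvalue' accepts every remaining key.
PARTITIONS = [("p0", 100), ("p1", "maxvalue")]


def route_partition(key, partitions=PARTITIONS):
    """Return the name of the first partition whose upper bound exceeds key."""
    for name, upper in partitions:
        if upper == "maxvalue" or key < upper:
            return name
    raise ValueError(f"no partition accepts key {key!r}")


# Same ids as data1: 0-9, 10-12 and 111-112.
ids = list(range(10)) + list(range(10, 13)) + list(range(111, 113))
print(Counter(route_partition(i) for i in ids))
# Counter({'p0': 13, 'p1': 2})
```

Keys exactly equal to a bound (here 100) fall into the next partition, which is why the bound reads as an exclusive upper limit.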
- Perform an ANN search:
res = client.search(collection_name=test_collection_name, data=[0,0,0], anns_field='embedding', limit=5, output_fields=['id'])
# For example, the result will be:
# [{'id': 112}, {'id': 111}, {'id': 10}, {'id': 11}, {'id': 12}]
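An HNSW index returns approximate neighbours, so an exact brute-force search over the same in-memory rows is a handy reference when checking results. The minimal sketch below uses only the standard library and ranks rows by Euclidean distance, which is what metric_type="L2" measures; exact_l2_search is a hypothetical helper, not a pyobvector API. All five zero vectors are at distance 0 from the query, so their relative order is a tie: this sketch breaks ties by id, while the server may return them in a different order, as in the example output above.

```python
import heapq
import math


def exact_l2_search(rows, query, limit):
    """Return the ids of the `limit` rows closest to `query` by L2 distance.

    Ties are broken by id because (distance, id) tuples compare element-wise.
    """
    scored = ((math.dist(row["embedding"], query), row["id"]) for row in rows)
    return [row_id for _, row_id in heapq.nsmallest(limit, scored)]


# Same sample data as the insert step.
vector_value1 = [0.748479, 0.276979, 0.555195]
vector_value2 = [0, 0, 0]
data1 = [{'id': i, 'embedding': vector_value1} for i in range(10)]
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(10, 13)])
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(111, 113)])

print(exact_l2_search(data1, [0, 0, 0], 5))
# [10, 11, 12, 111, 112]
```

Comparing the set of ids (rather than their order) with the server result is the safer check whenever distances tie.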
SQLAlchemy hybrid mode
- Set up a client:
from pyobvector import *
from sqlalchemy import Column, Integer, JSON
from sqlalchemy import func
client = ObVecClient(uri="127.0.0.1:2881", user="test@test")
- Create a partitioned table with a vector index:
# create partitioned table
test_collection_name = "ann_test"
range_part = ObRangePartition(False, range_part_infos=[
    RangeListPartInfo('p0', 100),
    RangeListPartInfo('p1', 'maxvalue'),
], range_expr='id')
cols = [
    Column('id', Integer, primary_key=True, autoincrement=False),
    Column('embedding', VECTOR(3)),
    Column('meta', JSON)
]
client.create_table(test_collection_name, columns=cols, partitions=range_part)
# create vector index
client.create_index(
    test_collection_name,
    is_vec_index=True,
    index_name='vidx',
    column_names=['embedding'],
    vidx_params='distance=l2, type=hnsw, lib=vsag',
)
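In this mode the index options are passed as a single comma-separated key=value string. If you build that string programmatically, for example to switch the distance or the index type, a pair of small helpers keeps it well-formed. format_vidx_params and parse_vidx_params are hypothetical helpers for illustration; which keys and values OceanBase actually accepts is decided by the server, not by this sketch.

```python
def format_vidx_params(params):
    """Join a dict into the 'key=value, key=value' form used for vidx_params."""
    return ", ".join(f"{key}={value}" for key, value in params.items())


def parse_vidx_params(text):
    """Split a 'key=value, key=value' string back into a dict, ignoring extra spaces."""
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        result[key.strip()] = value.strip()
    return result


print(format_vidx_params({"distance": "l2", "type": "hnsw", "lib": "vsag"}))
# distance=l2, type=hnsw, lib=vsag
```

Dicts keep insertion order in Python 3.7+, so the formatted string lists options in the order they were added.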
- Insert data into your table:
# insert data
vector_value1 = [0.748479,0.276979,0.555195]
vector_value2 = [0, 0, 0]
data1 = [{'id': i, 'embedding': vector_value1} for i in range(10)]
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(10, 13)])
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(111, 113)])
client.insert(test_collection_name, data=data1)
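The embedding column is declared as VECTOR(3), so every inserted vector needs exactly three components. A client-side check like the sketch below can catch dimension mismatches, missing keys and duplicate primary keys before anything is sent to the database; validate_rows is a hypothetical helper, not part of pyobvector.

```python
import math


def validate_rows(rows, dim, key_field="id", vec_field="embedding"):
    """Check rows before insert; return the row count or raise ValueError."""
    seen = set()
    for row in rows:
        key = row.get(key_field)
        if key is None:
            raise ValueError(f"row is missing {key_field!r}: {row!r}")
        if key in seen:
            raise ValueError(f"duplicate {key_field} {key!r}")
        seen.add(key)
        vec = row.get(vec_field)
        if vec is None or len(vec) != dim:
            raise ValueError(f"row {key!r}: expected a {dim}-dimensional {vec_field!r}")
        if not all(math.isfinite(x) for x in vec):
            raise ValueError(f"row {key!r}: {vec_field!r} contains NaN or infinity")
    return len(rows)


# Same sample data as the insert step.
vector_value1 = [0.748479, 0.276979, 0.555195]
vector_value2 = [0, 0, 0]
data1 = [{'id': i, 'embedding': vector_value1} for i in range(10)]
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(10, 13)])
data1.extend([{'id': i, 'embedding': vector_value2} for i in range(111, 113)])

print(validate_rows(data1, dim=3))  # 15
```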
- Perform an ANN search:
# perform ann search
res = client.ann_search(
    test_collection_name,
    vec_data=[0,0,0],
    vec_column_name='embedding',
    distance_func=l2_distance,
    topk=5,
    output_column_names=['id']
)
# For example, the result will be:
# [(112,), (111,), (10,), (11,), (12,)]
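The two modes return results in different shapes: in the example outputs above, Milvus compatible mode yields one dict per hit keyed by output field, while SQLAlchemy hybrid mode yields row tuples in output_column_names order. If code needs to accept both, a small normalizer can help. extract_ids is a hypothetical helper, and in the tuple case it assumes the id is the first selected column.

```python
def extract_ids(results, field="id"):
    """Return the primary keys from either result shape, preserving order."""
    ids = []
    for hit in results:
        if isinstance(hit, dict):
            # Milvus compatible mode, e.g. {'id': 112}
            ids.append(hit[field])
        else:
            # SQLAlchemy hybrid mode, e.g. (112,); assumes the id is the first column
            ids.append(hit[0])
    return ids


milvus_style = [{'id': 112}, {'id': 111}, {'id': 10}, {'id': 11}, {'id': 12}]
sqlalchemy_style = [(112,), (111,), (10,), (11,), (12,)]
print(extract_ids(milvus_style) == extract_ids(sqlalchemy_style))  # True
```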
- If you want to use the pure SQLAlchemy API with the OceanBase dialect, you can get an SQLAlchemy engine via client.engine. The engine can also be created as follows:
import pyobvector
from sqlalchemy.dialects import registry
from sqlalchemy import create_engine
uri: str = "127.0.0.1:2881"
user: str = "root@test"
password: str = ""
db_name: str = "test"
registry.register("mysql.oceanbase", "pyobvector.schema.dialect", "OceanBaseDialect")
connection_str = (
    f"mysql+oceanbase://{user}:{password}@{uri}/{db_name}?charset=utf8mb4"
)
# extra engine options can be passed to create_engine as keyword arguments
engine = create_engine(connection_str)
- An async engine is also supported:
import pyobvector
from sqlalchemy.dialects import registry
from sqlalchemy.ext.asyncio import create_async_engine
uri: str = "127.0.0.1:2881"
user: str = "root@test"
password: str = ""
db_name: str = "test"
registry.register("mysql.aoceanbase", "pyobvector", "AsyncOceanBaseDialect")
connection_str = (
    f"mysql+aoceanbase://{user}:{password}@{uri}/{db_name}?charset=utf8mb4"
)
engine = create_async_engine(connection_str)
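Both connection strings above interpolate the password directly into a URL, which breaks if it contains characters such as @, : or /. SQLAlchemy's documentation recommends percent-encoding such passwords with urllib.parse.quote_plus. The sketch below wraps that in a hypothetical build_connection_str helper that works for both the mysql+oceanbase and mysql+aoceanbase schemes; the user name is kept as-is, matching the examples above.

```python
from urllib.parse import quote_plus


def build_connection_str(user, password, uri, db_name, scheme="mysql+oceanbase"):
    """Build an OceanBase SQLAlchemy URL with a percent-encoded password."""
    # quote_plus encodes '@' as %40, '/' as %2F and ':' as %3A, among others.
    return f"{scheme}://{user}:{quote_plus(password)}@{uri}/{db_name}?charset=utf8mb4"


print(build_connection_str("root@test", "p@ss/w:rd", "127.0.0.1:2881", "test"))
# mysql+oceanbase://root@test:p%40ss%2Fw%3Ard@127.0.0.1:2881/test?charset=utf8mb4
print(build_connection_str("root@test", "", "127.0.0.1:2881", "test", scheme="mysql+aoceanbase"))
# mysql+aoceanbase://root@test:@127.0.0.1:2881/test?charset=utf8mb4
```

The resulting string can be passed to create_engine or create_async_engine as before. SQLAlchemy's URL.create(), which takes the parts separately, is an alternative that avoids manual escaping.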
- For further usage in pure SQLAlchemy mode, please refer to the SQLAlchemy documentation.
Download files
File details
Details for the file pyobvector-0.2.12.tar.gz.
File metadata
- Download URL: pyobvector-0.2.12.tar.gz
- Upload date:
- Size: 39.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f85c03c7fd16753cb110fb7a49515c35c3fa3580783ab6caced42757b812138f |
| MD5 | 5e1e95aec1d35b5799abf3126e755d65 |
| BLAKE2b-256 | 6dd89cb1190085e8b1713904f90a11c611711a7933348aaccf41ac007ccb9c47 |
Provenance
The following attestation bundles were made for pyobvector-0.2.12.tar.gz:
Publisher: python-publish.yml on oceanbase/pyobvector
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyobvector-0.2.12.tar.gz
- Subject digest: f85c03c7fd16753cb110fb7a49515c35c3fa3580783ab6caced42757b812138f
- Sigstore transparency entry: 233010226
- Permalink: oceanbase/pyobvector@3b7ca4e55f8f267c8c1dd7eb2c84194bf700d6ba
- Branch / Tag: refs/tags/release-v0.2.12
- Owner: https://github.com/oceanbase
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3b7ca4e55f8f267c8c1dd7eb2c84194bf700d6ba
- Trigger Event: release
File details
Details for the file pyobvector-0.2.12-py3-none-any.whl.
File metadata
- Download URL: pyobvector-0.2.12-py3-none-any.whl
- Upload date:
- Size: 52.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c018635d36b5caa821024a5186c486485c3846728b9f918fbea10344bc47b94a |
| MD5 | dfd16ac03536edbd983e929e967fba9e |
| BLAKE2b-256 | 3deb0325ed2b8a09f8d72a7128a4a02f1e34f9375a5b2f8fd4e7e38f2fb1b57e |
Provenance
The following attestation bundles were made for pyobvector-0.2.12-py3-none-any.whl:
Publisher: python-publish.yml on oceanbase/pyobvector
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyobvector-0.2.12-py3-none-any.whl
- Subject digest: c018635d36b5caa821024a5186c486485c3846728b9f918fbea10344bc47b94a
- Sigstore transparency entry: 233010233
- Permalink: oceanbase/pyobvector@3b7ca4e55f8f267c8c1dd7eb2c84194bf700d6ba
- Branch / Tag: refs/tags/release-v0.2.12
- Owner: https://github.com/oceanbase
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3b7ca4e55f8f267c8c1dd7eb2c84194bf700d6ba
- Trigger Event: release