Doris Vector Search Python SDK
Introduction
A Python SDK for performing vector search operations on the Apache Doris database. It provides an easy-to-use interface for table management, index management, and vector search.
Features
- Automatic schema detection when creating tables
- Efficient vector data insertion through the Stream Load API (CSV and Arrow formats are currently supported)
- Support for multiple input types (pandas DataFrame, PyArrow Table, list of dicts)
- Data and schema validation
Installation
pip install doris-vector-search
Install in local development mode:
git clone https://github.com/uchenily/doris_vector_search && cd doris_vector_search
pip install -e .
Basic Usage
Creating A Client
from doris_vector_search import DorisVectorClient, AuthOptions
# Use the default Doris auth options
client = DorisVectorClient("test_database")
# With custom auth options
auth = AuthOptions(
    host="localhost",
    query_port=9030,
    http_port=8030,
    user="root",
    password="",
)
client = DorisVectorClient("test_database", auth_options=auth)
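Connection settings often differ between environments, so it can be convenient to read them from environment variables rather than hard-coding them. The following is a minimal sketch, not part of the SDK: the helper name and the `DORIS_*` variable names are hypothetical, and only the keys and default values are taken from the `AuthOptions` example above.

```python
import os


def auth_kwargs_from_env(env=None):
    """Build keyword arguments for AuthOptions from environment variables.

    The DORIS_* names are hypothetical; the keys and defaults mirror the
    AuthOptions example in this README.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("DORIS_HOST", "localhost"),
        "query_port": int(env.get("DORIS_QUERY_PORT", "9030")),
        "http_port": int(env.get("DORIS_HTTP_PORT", "8030")),
        "user": env.get("DORIS_USER", "root"),
        "password": env.get("DORIS_PASSWORD", ""),
    }


# A plain dict stands in for os.environ so the sketch runs anywhere.
settings = auth_kwargs_from_env(
    {"DORIS_HOST": "doris-fe.internal", "DORIS_QUERY_PORT": "19030"}
)
print(settings)
```

The resulting dictionary can then be unpacked as `AuthOptions(**settings)` and passed to `DorisVectorClient` through `auth_options`.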
Creating A Table With Data
import pandas as pd
from doris_vector_search import IndexOptions
# Test data (pd.DataFrame)
data = pd.DataFrame([
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
])
# Test data (list of dicts), an alternative input format
test_data2 = [
    {'id': 1, 'name': 'Alice', 'embedding': [1.1, 2.2, 3.3]},
    {'id': 2, 'name': 'Bob', 'embedding': [4.4, 5.5, 6.6]},
    {'id': 3, 'name': 'Charlie', 'embedding': [8.8, 9.9, 10.0]},
    {'id': 4, 'name': 'David', 'embedding': [10.1, 11.2, 12.3]},
    {'id': 5, 'name': 'Eve', 'embedding': [15.6, 16.7, 17.8]},
]
# Create table with vector index
table = client.create_table("test_vector_table", data, create_index=True)
# Alternatively, create the table with specific index options
index_options = IndexOptions(index_type="hnsw", metric_type="l2_distance")
table = client.create_table("test_vector_table", data, index_options=index_options)
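Because the schema is detected from the data, a row whose vector has a different length from the others is worth catching before the table is created. The sketch below is an independent pre-check written in plain Python; `check_vector_rows` is a hypothetical helper and is not how the SDK's own validation is implemented.

```python
def check_vector_rows(rows, vector_column):
    """Return the shared vector dimension, or raise ValueError on bad rows."""
    dim = None
    for i, row in enumerate(rows):
        if vector_column not in row:
            raise ValueError(f"row {i} has no '{vector_column}' column")
        vector = row[vector_column]
        # bool is a subclass of int, so exclude it explicitly.
        if not all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector
        ):
            raise ValueError(f"row {i} contains a non-numeric vector element")
        if dim is None:
            dim = len(vector)
        elif len(vector) != dim:
            raise ValueError(f"row {i} has dimension {len(vector)}, expected {dim}")
    if not dim:
        raise ValueError("rows are empty or vectors have zero length")
    return dim


rows = [
    {"id": 1, "name": "Alice", "embedding": [1.1, 2.2, 3.3]},
    {"id": 2, "name": "Bob", "embedding": [4.4, 5.5, 6.6]},
]
dim = check_vector_rows(rows, "embedding")
print(dim)
```

The returned dimension is also the value one would expect to pass as `dim` when building index options by hand.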
Adding Data To An Existing Table
from doris_vector_search import LoadOptions  # assumed to be exported like IndexOptions
# Open an existing table
table = client.open_table("test_vector_table")
# Add more data
new_data = pd.DataFrame([
    {"id": 4, "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": 5, "vector": [0.2, 1.0, 0.5], "text": "thief"},
])
table.add(new_data)
# Add data with specific load options
load_options = LoadOptions(format="csv", batch_size=10000)
table.add(new_data, load_options=load_options)
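This README does not spell out how `batch_size` is applied. Assuming it caps the number of rows sent per load request, the splitting can be pictured with the following sketch. `iter_batches` is a hypothetical helper for illustration and is not taken from the SDK.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 25 rows with a batch size of 10 give two full batches and one partial batch.
batches = list(iter_batches(range(25), 10))
sizes = [len(batch) for batch in batches]
print(sizes)
```

Under that assumption, a larger batch size means fewer requests but more memory per request.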
Vector Search
query_vector = [0.8, 0.3, 0.8]
results = table.search(query_vector).limit(10).to_pandas()
print(results)
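To sanity-check what a search should return on a small table, the distances can be computed by brute force in plain Python. This sketch assumes that `l2_distance` means Euclidean distance; it does not call the SDK, and `brute_force_l2` is a hypothetical helper. An index-backed search may be approximate, so results on larger tables can differ from an exact scan.

```python
import math

rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
]


def brute_force_l2(rows, query, limit):
    """Return (distance, row) pairs sorted by Euclidean distance to query."""
    scored = [(math.dist(row["vector"], query), row) for row in rows]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]


nearest = brute_force_l2(rows, [0.8, 0.3, 0.8], limit=10)
for distance, row in nearest:
    print(row["id"], row["text"], round(distance, 4))
```

For the sample data and query above, the exact ordering is knight, then ranger, then cleric.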
Advanced Search Options
results = (
    table.search(query_vector)
    .limit(5)
    .distance_range(upper_bound=1.0)
    .where("text = 'knight'")
    .select(["id", "text"])
    .to_pandas()
)
print(results)
Index Management
from doris_vector_search import IndexOptions
# Create custom index options
index_options = IndexOptions(
    index_type="hnsw",
    metric_type="l2_distance",
    dim=64,
)
# Add index
table.add_index(index_options)
# Drop index
table.drop_index()
Setting Session Variables
from doris_vector_search import DorisVectorClient
db = DorisVectorClient(database="test")
# Set session variables
db.with_session("parallel_pipeline_task_num", 1) \
    .with_session("enable_profile", False)
# or
db.with_sessions(
    {"parallel_pipeline_task_num": 1, "enable_profile": False}
)
Thread Safety
The DorisVectorClient is not thread-safe because the underlying connection object created by mysql.connector.connect(...) cannot be shared across multiple threads. If you need to use the SDK in a multi-threaded environment, create a separate client instance in each thread.
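One way to follow this advice is to keep the client in thread-local storage so that each thread lazily creates its own instance. The sketch below uses only the standard library. `PerThreadClient` is a hypothetical wrapper, and a stand-in factory replaces the real client so the example runs without a Doris server.

```python
import threading


class PerThreadClient:
    """Lazily create one client per thread from a zero-argument factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._factory()
            self._local.client = client
        return client


# Stand-in factory; in real use this would be something like
# lambda: DorisVectorClient("test_database")
clients = PerThreadClient(factory=object)

seen = []


def worker():
    first = clients.get()
    second = clients.get()  # same thread, so the same instance is returned
    seen.append((first, first is second))


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

distinct = len({id(client) for client, _ in seen})
print(len(seen), "threads,", distinct, "distinct clients")
```

Each thread reuses its own client across calls, and no client object is ever shared between threads.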
Download files
File details
Details for the file doris_vector_search-0.0.9.tar.gz.
File metadata
- Download URL: doris_vector_search-0.0.9.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5eb0b8531d1127041a9a0edc874c91841896d6548c8284ac7fe993342e870caf |
| MD5 | 5ac2b78a6ab24d9cfc729e131fbbcf37 |
| BLAKE2b-256 | cb56b7ba1973dd81192da6ea149bf03e75eca0ff54f80aad749ddcde6679e0ab |
File details
Details for the file doris_vector_search-0.0.9-py3-none-any.whl.
File metadata
- Download URL: doris_vector_search-0.0.9-py3-none-any.whl
- Upload date:
- Size: 27.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f5f22af1d1f66be0550886c6a7187daaf4aba0d2ddf5c58b3602b9de5469e92 |
| MD5 | 5e65a168a16812d25f132ca06b16df65 |
| BLAKE2b-256 | ddd522b8e8f730b32463d648cbc496865ad3ed25c14804c8290b2be330482b7d |