Doris Vector Search Python SDK
Introduction
A Python SDK for performing vector search on Apache Doris. It provides an easy-to-use interface for table management, index management, and vector search.
Features
- Automatically detects the schema when creating tables
- Inserts vector data efficiently through the Stream Load API (CSV and Arrow formats are currently supported)
- Accepts several input data formats (pandas DataFrame, PyArrow Table, list of dicts)
- Validates data and schema
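To make the schema detection and validation features more concrete, here is a minimal sketch of how column types could be inferred from a list of dicts. The functions `infer_column_type` and `infer_schema` and the Python-to-Doris type mapping are assumptions made for illustration; they are not taken from the SDK, whose actual rules may differ.

```python
from typing import Any


def infer_column_type(values: list[Any]) -> str:
    """Map the Python values of one column to a Doris-style type name (assumed mapping)."""
    sample = values[0]
    # Validation: every value in a column must have the same Python type.
    if any(type(value) is not type(sample) for value in values):
        raise TypeError("mixed value types in one column")
    if isinstance(sample, bool):  # checked before int, since bool subclasses int
        return "BOOLEAN"
    if isinstance(sample, int):
        return "BIGINT"
    if isinstance(sample, float):
        return "DOUBLE"
    if isinstance(sample, str):
        return "STRING"
    if isinstance(sample, list):
        # Treat a list as a vector column; all rows must share one dimension.
        dims = {len(value) for value in values}
        if len(dims) != 1:
            raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
        return "ARRAY<FLOAT>"
    raise TypeError(f"unsupported value type: {type(sample).__name__}")


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a {column: type} mapping, checking that every row has the same columns."""
    if not rows:
        raise ValueError("cannot infer a schema from empty data")
    columns = list(rows[0])
    for row in rows:
        if set(row) != set(columns):
            raise ValueError("rows do not share the same columns")
    return {name: infer_column_type([row[name] for row in rows]) for name in columns}


rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
]
schema = infer_schema(rows)
print(schema)
```

Rejecting rows with mismatched vector lengths before any data is sent keeps the error close to its cause, since a vector index is built for a single fixed dimension.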
Installation
pip install doris-vector-search
Install in local development mode:
git clone https://github.com/uchenily/doris_vector_search && cd doris_vector_search
pip install -e .
Basic Usage
Creating A Client
from doris_vector_search import DorisVectorClient, AuthOptions
# Use the default Doris auth options
client = DorisVectorClient("test_database")
# With custom auth options
auth = AuthOptions(
    host="localhost",
    query_port=9030,
    http_port=8030,
    user="root",
    password=""
)
client = DorisVectorClient("test_database", auth_options=auth)
Creating A Table With Data
import pandas as pd
from doris_vector_search import IndexOptions

# Test data (pd.DataFrame)
data = pd.DataFrame([
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
])
# Test data (list of dicts)
test_data2 = [
    {'id': 1, 'name': 'Alice', 'embedding': [1.1, 2.2, 3.3]},
    {'id': 2, 'name': 'Bob', 'embedding': [4.4, 5.5, 6.6]},
    {'id': 3, 'name': 'Charlie', 'embedding': [8.8, 9.9, 10.0]},
    {'id': 4, 'name': 'David', 'embedding': [10.1, 11.2, 12.3]},
    {'id': 5, 'name': 'Eve', 'embedding': [15.6, 16.7, 17.8]},
]
# Create a table with a vector index
table = client.create_table("test_vector_table", data, create_index=True)
# Or create the table with specific index options
index_options = IndexOptions(index_type="hnsw", metric_type="l2_distance")
table = client.create_table("test_vector_table", data, index_options=index_options)
Adding Data To An Existing Table
# Open an existing table
table = client.open_table("test_vector_table")
# Add more data
new_data = pd.DataFrame([
    {"id": 4, "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": 5, "vector": [0.2, 1.0, 0.5], "text": "thief"},
])
table.add(new_data)
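# Add data with specific load options
# (LoadOptions is assumed to be importable from the package top level, like IndexOptions)
from doris_vector_search import LoadOptions
load_options = LoadOptions(format="csv", batch_size=10000)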
table.add(new_data, load_options=load_options)
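As a rough illustration of what the `format="csv"` and `batch_size` load options imply, the sketch below splits rows into fixed-size batches and serializes each batch as CSV text. The helper `iter_csv_batches` is hypothetical, and encoding vectors as JSON-style arrays is an assumption; the SDK's actual serialization and its Stream Load requests are not shown here.

```python
import csv
import io
import json
from typing import Any, Iterator


def iter_csv_batches(
    rows: list[dict[str, Any]], columns: list[str], batch_size: int
) -> Iterator[str]:
    """Yield CSV payloads containing at most `batch_size` rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        for row in rows[start:start + batch_size]:
            # Assumption: list values (vectors) are written as a JSON-style array string.
            writer.writerow(
                [json.dumps(row[c]) if isinstance(row[c], list) else row[c] for c in columns]
            )
        yield buffer.getvalue()


rows = [
    {"id": 4, "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": 5, "vector": [0.2, 1.0, 0.5], "text": "thief"},
]
batches = list(iter_csv_batches(rows, ["id", "vector", "text"], batch_size=1))
for payload in batches:
    print(payload, end="")
```

Because the vector contains commas, the CSV writer quotes that field, so each payload line stays parseable as three columns.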
Vector Search
query_vector = [0.8, 0.3, 0.8]
results = table.search(query_vector).limit(10).to_pandas()
print(results)
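For intuition about what a search under the `l2_distance` metric ranks by, here is a plain-Python brute-force reference over the three sample rows used earlier. It is a sketch, not the SDK's implementation: `brute_force_search` is a hypothetical helper, an HNSW index returns approximate neighbors rather than scanning every row, and whether the SDK's result includes a distance column is not confirmed by this document.

```python
import math

rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
]


def brute_force_search(rows, query, limit):
    """Return the `limit` rows closest to `query` by Euclidean (L2) distance."""
    scored = [{**row, "distance": math.dist(row["vector"], query)} for row in rows]
    scored.sort(key=lambda row: row["distance"])
    return scored[:limit]


query_vector = [0.8, 0.3, 0.8]
hits = brute_force_search(rows, query_vector, limit=10)
for hit in hits:
    print(hit["id"], hit["text"], round(hit["distance"], 4))
```

With this data the nearest row is "knight", followed by "ranger" and then "cleric". An exact scan like this can serve as a baseline when checking the recall of an approximate index on small datasets.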
Advanced Search Options
results = (
    table.search(query_vector)
    .limit(5)
    .distance_range(upper_bound=1.0)
    .where("text = 'knight'")
    .select(["id", "text"])
    .to_pandas()
)
print(results)
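The chained options above combine a row filter, a distance bound, a result limit, and a column projection. The sketch below mimics that combination in plain Python to show how the pieces interact. The helper `filtered_search` is hypothetical, and two details are assumptions rather than documented SDK behavior: the filter is applied before ranking, and the upper bound is treated as inclusive.

```python
import math

rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
]


def filtered_search(rows, query, *, predicate, upper_bound, limit, columns):
    """Filter rows, keep those within `upper_bound`, rank by distance, project columns."""
    hits = []
    for row in rows:
        if not predicate(row):  # plays the role of .where(...)
            continue
        distance = math.dist(row["vector"], query)
        if distance <= upper_bound:  # plays the role of .distance_range(upper_bound=...)
            hits.append((distance, row))
    hits.sort(key=lambda pair: pair[0])
    # .limit(...) then .select(...)
    return [{name: row[name] for name in columns} for _, row in hits[:limit]]


query_vector = [0.8, 0.3, 0.8]
knights = filtered_search(
    rows,
    query_vector,
    predicate=lambda row: row["text"] == "knight",
    upper_bound=1.0,
    limit=5,
    columns=["id", "text"],
)
print(knights)
```

Tightening `upper_bound` without a filter trims the tail of the ranking instead: with a bound of 0.6 only the "knight" and "ranger" rows remain.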
Index Management
from doris_vector_search import IndexOptions
# Create custom index options
index_options = IndexOptions(
    index_type="hnsw",
    metric_type="l2_distance",
    dim=64
)
# Add index
table.add_index(index_options)
# Drop index
table.drop_index()
Setting Session Variables
from doris_vector_search import DorisVectorClient
db = DorisVectorClient(database="test")
# Set session variables
db.with_session("parallel_pipeline_task_num", 1)\
    .with_session("num_scanner_threads", 1)\
    .with_session("enable_profile", False)
# Or set several variables at once
db.with_sessions({
    "parallel_pipeline_task_num": 1,
    "num_scanner_threads": 1,
    "enable_profile": False,
})
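The chained `with_session` calls suggest a fluent interface in which each call returns the object itself. The following sketch shows one way such an interface could collect variables and render them as `SET` statements. The class `SessionSettings` and its `to_statements` method are hypothetical and exist only for illustration; how the SDK stores and applies session variables internally is not described in this document.

```python
class SessionSettings:
    """Collect session variables through chainable calls (illustrative only)."""

    def __init__(self):
        self._variables = {}

    def with_session(self, name, value):
        if not name.isidentifier():
            raise ValueError(f"invalid session variable name: {name!r}")
        self._variables[name] = value
        return self  # returning self is what makes chaining possible

    def with_sessions(self, variables):
        for name, value in variables.items():
            self.with_session(name, value)
        return self

    @staticmethod
    def _format_value(value):
        if isinstance(value, bool):  # checked before numbers, since bool subclasses int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    def to_statements(self):
        """Render one SET statement per variable, in insertion order."""
        return [
            f"SET {name} = {self._format_value(value)}"
            for name, value in self._variables.items()
        ]


settings = (
    SessionSettings()
    .with_session("parallel_pipeline_task_num", 1)
    .with_session("num_scanner_threads", 1)
    .with_session("enable_profile", False)
)
statements = settings.to_statements()
print("\n".join(statements))
```

Validating the variable name and quoting string values keeps user-supplied settings from being spliced into SQL unchecked.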