Doris Vector Search Python SDK
Introduction
A Python SDK for vector search on the Apache Doris database. It provides an easy-to-use interface for table management, index management, and vector search.
Features
- Automatic schema detection when creating tables
- Efficient vector data insertion through the Stream Load API (CSV and Arrow formats are currently supported)
- Support for multiple input data types (pandas DataFrame, PyArrow Table, list of dicts)
- Data and schema validation
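To make the idea of automatic schema detection concrete, here is a minimal sketch that infers a column-to-type mapping from a list of dicts. It is not the SDK's implementation: the function names `infer_column_type` and `infer_schema` are hypothetical, and the mapping from Python values to Doris type names is an assumption chosen for illustration.

```python
from typing import Any


def infer_column_type(values: list[Any]) -> str:
    """Map one column's Python values to a Doris-style type name (assumed mapping)."""
    # Use the first non-null value as the representative sample.
    sample = next((v for v in values if v is not None), None)
    if isinstance(sample, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(sample, int):
        return "BIGINT"
    if isinstance(sample, float):
        return "DOUBLE"
    if isinstance(sample, (list, tuple)):
        return "ARRAY<FLOAT>"
    return "STRING"


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a column -> type mapping from a non-empty list of dicts."""
    if not rows:
        raise ValueError("cannot infer a schema from empty data")
    columns = list(rows[0].keys())
    return {col: infer_column_type([row.get(col) for row in rows]) for col in columns}


rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
]
schema = infer_schema(rows)
print(schema)
# {'id': 'BIGINT', 'vector': 'ARRAY<FLOAT>', 'text': 'STRING'}
```

Sampling only the first non-null value keeps the sketch short; a real detector would likely inspect every row and reject mixed types.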
Installation
pip install doris-vector-search
Install in local development mode:
git clone https://github.com/uchenily/doris_vector_search && cd doris_vector_search
pip install -e .
Basic Usage
Creating A Client
from doris_vector_search import DorisVectorClient, AuthOptions

# Use the default Doris auth options
client = DorisVectorClient("test_database")

# Or pass custom auth options
auth = AuthOptions(
    host="localhost",
    query_port=9030,
    http_port=8030,
    user="root",
    password="",
)
client = DorisVectorClient("test_database", auth_options=auth)
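In practice, connection settings often come from the environment rather than from literals in code. The sketch below builds the keyword arguments shown above from environment variables. The variable names (`DORIS_HOST` and so on) and the helper `auth_kwargs_from_env` are hypothetical, and the fallback values simply mirror the example values in this section.

```python
import os
from typing import Any, Mapping


def auth_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect connection settings from environment variables (hypothetical names)."""
    return {
        "host": env.get("DORIS_HOST", "localhost"),
        "query_port": int(env.get("DORIS_QUERY_PORT", "9030")),
        "http_port": int(env.get("DORIS_HTTP_PORT", "8030")),
        "user": env.get("DORIS_USER", "root"),
        "password": env.get("DORIS_PASSWORD", ""),
    }


# An explicit mapping is passed here so the example is deterministic.
kwargs = auth_kwargs_from_env({"DORIS_HOST": "doris.internal", "DORIS_QUERY_PORT": "9031"})
print(kwargs)
```

With the SDK installed, the resulting dict could be passed on as `AuthOptions(**kwargs)`, since its keys match the parameters used in the example above.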
Creating A Table With Data
import pandas as pd
from doris_vector_search import IndexOptions

# Test data (pd.DataFrame)
data = pd.DataFrame([
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
])

# Test data (list of dicts), an alternative input format
test_data2 = [
    {"id": 1, "name": "Alice", "embedding": [1.1, 2.2, 3.3]},
    {"id": 2, "name": "Bob", "embedding": [4.4, 5.5, 6.6]},
    {"id": 3, "name": "Charlie", "embedding": [8.8, 9.9, 10.0]},
    {"id": 4, "name": "David", "embedding": [10.1, 11.2, 12.3]},
    {"id": 5, "name": "Eve", "embedding": [15.6, 16.7, 17.8]},
]

# Create a table with a vector index
table = client.create_table("test_vector_table", data, create_index=True)

# Or create the table with specific index options
index_options = IndexOptions(index_type="hnsw", metric_type="l2_distance")
table = client.create_table("test_vector_table", data, index_options=index_options)
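Before creating a table, it can help to confirm that every row carries a vector of the same dimension. The following is a minimal sketch of such a check, not the SDK's own validation logic; `validate_vectors` is a hypothetical helper.

```python
from typing import Any


def validate_vectors(rows: list[dict[str, Any]], vector_column: str) -> int:
    """Return the shared vector dimension, or raise ValueError on the first bad row."""
    dim = None
    for i, row in enumerate(rows):
        if vector_column not in row:
            raise ValueError(f"row {i}: missing column {vector_column!r}")
        vec = row[vector_column]
        # Every component must be a real number (bool is excluded on purpose).
        if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec):
            raise ValueError(f"row {i}: vector contains a non-numeric value")
        if dim is None:
            dim = len(vec)
        elif len(vec) != dim:
            raise ValueError(f"row {i}: expected dimension {dim}, got {len(vec)}")
    if dim is None:
        raise ValueError("no rows to validate")
    return dim


people = [
    {"id": 1, "name": "Alice", "embedding": [1.1, 2.2, 3.3]},
    {"id": 2, "name": "Bob", "embedding": [4.4, 5.5, 6.6]},
]
dim = validate_vectors(people, "embedding")
print(dim)  # 3
```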
Adding Data To An Existing Table
# LoadOptions is assumed to be exported at the package top level, like IndexOptions
from doris_vector_search import LoadOptions

# Open an existing table
table = client.open_table("test_vector_table")

# Add more data
new_data = pd.DataFrame([
    {"id": 4, "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": 5, "vector": [0.2, 1.0, 0.5], "text": "thief"},
])
table.add(new_data)

# Add data with specific load options
load_options = LoadOptions(format="csv", batch_size=10000)
table.add(new_data, load_options=load_options)
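To illustrate what a `batch_size` setting combined with the CSV format could mean, the sketch below splits rows into fixed-size batches and serializes each batch as CSV text. This README does not describe how the SDK encodes rows for Stream Load, so writing vectors as JSON array text is an assumption, and `iter_csv_batches` is a hypothetical helper.

```python
import csv
import io
import json
from typing import Any, Iterator


def iter_csv_batches(
    rows: list[dict[str, Any]], columns: list[str], batch_size: int
) -> Iterator[str]:
    """Yield CSV payloads containing at most batch_size rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for row in rows[start:start + batch_size]:
            # Assumption: vectors are written as JSON array text in a single field.
            writer.writerow(
                [json.dumps(row[c]) if isinstance(row[c], list) else row[c] for c in columns]
            )
        yield buf.getvalue()


rows = [{"id": i, "vector": [0.5, float(i)], "text": f"item{i}"} for i in range(1, 6)]
batches = list(iter_csv_batches(rows, ["id", "vector", "text"], batch_size=2))
print(len(batches))  # 3
print(batches[0], end="")
```

Five rows with a batch size of 2 give three payloads of 2, 2, and 1 rows. The `csv` module quotes the vector field automatically because it contains commas.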
Vector Search
query_vector = [0.8, 0.3, 0.8]
results = table.search(query_vector).limit(10).to_pandas()
print(results)
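For intuition about what the search ranks by, here is a brute-force sketch of nearest-neighbour search under L2 (Euclidean) distance, using the sample rows from earlier. It is exact and runs in memory, whereas an HNSW index gives approximate results inside Doris. The `distance` field and the helper names are illustrative only and say nothing about the columns the SDK returns.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows: list[dict], query: list[float], limit: int) -> list[dict]:
    """Score every row against the query and return the closest `limit` rows."""
    scored = [{**row, "distance": l2_distance(row["vector"], query)} for row in rows]
    return sorted(scored, key=lambda r: r["distance"])[:limit]


rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
]
top = brute_force_search(rows, [0.8, 0.3, 0.8], limit=2)
for hit in top:
    print(hit["text"], round(hit["distance"], 4))
```

The closest row is "knight", followed by "ranger".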
Advanced Search Options
results = (
    table.search(query_vector)
    .limit(5)
    .distance_range(upper_bound=1.0)
    .where("text = 'knight'")
    .select(["id", "text"])
    .to_pandas()
)
print(results)
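The chained options above combine a row filter, a distance cutoff, a column projection, and a result limit. The sketch below reproduces that combination in plain Python to show how the pieces interact. It is an assumption-laden illustration: `filtered_search` is hypothetical, and whether the SDK treats `upper_bound` as inclusive is not stated in this README (the sketch treats it as inclusive).

```python
import math
from typing import Callable


def filtered_search(
    rows: list[dict],
    query: list[float],
    *,
    predicate: Callable[[dict], bool],
    upper_bound: float,
    columns: list[str],
    limit: int,
) -> list[dict]:
    """Filter rows, drop those beyond upper_bound, sort by distance, project columns."""
    hits = []
    for row in rows:
        if not predicate(row):
            continue
        distance = math.dist(row["vector"], query)  # Euclidean (L2) distance
        if distance > upper_bound:  # assumption: the bound is inclusive
            continue
        hits.append((distance, {c: row[c] for c in columns}))
    hits.sort(key=lambda h: h[0])
    return [projected for _, projected in hits[:limit]]


rows = [
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
    {"id": 4, "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": 5, "vector": [0.2, 1.0, 0.5], "text": "thief"},
]
query = [0.8, 0.3, 0.8]

knights = filtered_search(
    rows, query, predicate=lambda r: r["text"] == "knight",
    upper_bound=1.0, columns=["id", "text"], limit=5,
)
nearby = filtered_search(
    rows, query, predicate=lambda r: True,
    upper_bound=0.6, columns=["id", "text"], limit=5,
)
print(knights)  # [{'id': 1, 'text': 'knight'}]
print(nearby)   # [{'id': 1, 'text': 'knight'}, {'id': 2, 'text': 'ranger'}]
```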
Index Management
from doris_vector_search import IndexOptions
# Create custom index options
index_options = IndexOptions(
    index_type="hnsw",
    metric_type="l2_distance",
    dim=64,
)
# Add index
table.add_index(index_options)
# Drop index
table.drop_index()
Setting Session Variables
from doris_vector_search import DorisVectorClient
db = DorisVectorClient(database="test")
# Set session variables
(
    db.with_session("parallel_pipeline_task_num", 1)
    .with_session("num_scanner_threads", 1)
    .with_session("enable_profile", False)
)

# Or set several variables at once
db.with_sessions(
    {"parallel_pipeline_task_num": 1, "num_scanner_threads": 1, "enable_profile": False}
)
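The chained calls above work because each call returns an object that can be called again. Below is a minimal sketch of that fluent pattern, rendering the collected variables as `SET` statements. The class `SessionVars` and its `to_statements` method are hypothetical, and this README does not describe how the SDK actually applies session variables, so the `SET name = value` form is an assumption.

```python
from typing import Any, Mapping


class SessionVars:
    """Collects session variables through chainable calls (hypothetical helper)."""

    def __init__(self) -> None:
        self._vars: dict[str, Any] = {}

    def with_session(self, name: str, value: Any) -> "SessionVars":
        self._vars[name] = value
        return self  # returning self is what makes chaining possible

    def with_sessions(self, variables: Mapping[str, Any]) -> "SessionVars":
        self._vars.update(variables)
        return self

    @staticmethod
    def _format(value: Any) -> str:
        if isinstance(value, bool):  # check bool before int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    def to_statements(self) -> list[str]:
        """Render one SET statement per variable, in insertion order."""
        return [f"SET {name} = {self._format(v)}" for name, v in self._vars.items()]


statements = (
    SessionVars()
    .with_session("parallel_pipeline_task_num", 1)
    .with_sessions({"num_scanner_threads": 1, "enable_profile": False})
    .to_statements()
)
print(statements)
```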
Download files
Source Distribution
doris_vector_search-0.0.2.tar.gz (17.1 kB)
Built Distribution
doris_vector_search-0.0.2-py3-none-any.whl (16.1 kB)
File details
Details for the file doris_vector_search-0.0.2.tar.gz.
File metadata
- Download URL: doris_vector_search-0.0.2.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 947515ff4948a2332842aef9d707f3a264eb2188c62b87dfb8c88e90dd291c24 |
| MD5 | 78d3bb34a79bba76061e8d28c6b0fac5 |
| BLAKE2b-256 | 60dab67c541d83a5f7a4bdc2b8d66bd00ff05927e881cd20b3e6f91525c0f8c1 |
File details
Details for the file doris_vector_search-0.0.2-py3-none-any.whl.
File metadata
- Download URL: doris_vector_search-0.0.2-py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a30c5a29c38aae7091ddd2d8eb29e288654b622582448d18a5907e28b6d05f5 |
| MD5 | 7de1afea20d18f05b9c1166b27b847c8 |
| BLAKE2b-256 | a5eae4f5a58bcc41568861318d16aaa2428484a97b653262e591929385a6ba14 |