Skip to main content

Doris Vector Search Python SDK

Project description

Doris Vector Search Python SDK

Introduction

A Python SDK of Doris Vector Search for performing vector search operations on Apache Doris database. It provides an easy-to-use interface for table management, index management and performing vector search.

Features

  • Automatically detect schema when creating tables
  • Insert vector data efficiently using Stream Load API (currently, we support CSV and Arrow formats)
  • Support for various data formats (pandas DataFrame, PyArrow Table, list of dicts)
  • Support data and schema validation

Installation

pip install doris-vector-search

Install in local development mode:

git clone https://github.com/uchenily/doris_vector_search && cd doris_vector_search
pip install -e .

Basic Usage

Creating A Client

from doris_vector_search import DorisVectorClient, AuthOptions

# Usage doris default auth options
client = DorisVectorClient("test_database")


# With custom auth options
auth = AuthOptions(
    host="localhost",
    query_port=9030,
    http_port=8030,
    user="root",
    password=""
)
client = DorisVectorClient("test_database", auth_options=auth)

Creating A Table With Data

import pandas as pd

# Test data (pd.DataFrame)
data = pd.DataFrame([
    {"id": 1, "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": 2, "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": 3, "vector": [0.5, 0.9, 0.6], "text": "cleric"},
])

# Test data (List of dicts)
test_data2 = [
    {'id': 1, 'name': 'Alice', 'embedding': [1.1, 2.2, 3.3]},
    {'id': 2, 'name': 'Bob', 'embedding': [4.4, 5.5, 6.6]},
    {'id': 3, 'name': 'Charlie', 'embedding': [8.8, 9.9, 10.0]},
    {'id': 4, 'name': 'David', 'embedding': [10.1, 11.2, 12.3]},
    {'id': 5, 'name': 'Eve', 'embedding': [15.6, 16.7, 17.8]},
]

# Create table with vector index
table = client.create_table("test_vector_table", data, create_index=True)

# Create table with specific index options
index_options = IndexOptions(index_type="hnsw", metric_type="l2_distance")
table = client.create_table("test_vector_table", data, index_options=index_options)

Adding Data To Existed Table

# Open a existed table
table = client.open_table("test_vector_table")

# Add more data
new_data = pd.DataFrame([
    {"id": 4, "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": 5, "vector": [0.2, 1.0, 0.5], "text": "thief"},
])
table.add(new_data)

# Add data with specific load options
load_options = LoadOptions(format="csv", batch_size=10000)
table.add(new_data, load_options=load_options)

Vector Search

query_vector = [0.8, 0.3, 0.8]
results = table.search(query_vector).limit(10).to_pandas()
print(results)

Advanced Search Options

results = table.search(query_vector)\
    .limit(5)\
    .distance_range(upper_bound=1.0)\
    .where("text = 'knight'")\
    .select(["id", "text"])\
    .to_pandas()

print(results)

Index Management

from doris_vector_search import IndexOptions

# Create custom index options
index_options = IndexOptions(
    index_type="hnsw",
    metric_type="l2_distance",
    dim=64
)

# Add index
table.add_index(index_options)

# Drop index
table.drop_index()

Setting Session Variables

from doris_vector_search import DorisVectorClient

db = DorisVectorClient(database="test")

# Set session variables
db.with_session("parallel_pipeline_task_num", 1)\
  .with_session("enable_profile", False)

# or
db.with_sessions(
    {"parallel_pipeline_task_num": 1, "enable_profile": False})

Thread Safety

The DorisVectorClient is not thread-safe because the underlying connection object created by mysql.connector.connect(...) cannot be shared across multiple threads. If you need to use the SDK in a multi-threaded environment, create a separate client instance in each thread.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

doris_vector_search-0.0.8.tar.gz (24.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

doris_vector_search-0.0.8-py3-none-any.whl (27.0 kB view details)

Uploaded Python 3

File details

Details for the file doris_vector_search-0.0.8.tar.gz.

File metadata

  • Download URL: doris_vector_search-0.0.8.tar.gz
  • Upload date:
  • Size: 24.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for doris_vector_search-0.0.8.tar.gz
Algorithm Hash digest
SHA256 63d0000056183a61d3fba0aa322cc0f098f54d1b1d5a9d9d4e96e6d9c6063c78
MD5 edb9f9e61b9a3f7470cc2070d1d62035
BLAKE2b-256 e0aa14471b41e34a187b7626fa2d51a5e2d7d3b021c7c91dafc1b80d494188f1

See more details on using hashes here.

File details

Details for the file doris_vector_search-0.0.8-py3-none-any.whl.

File metadata

File hashes

Hashes for doris_vector_search-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 1eadad822adcdbfc4e52728c004cb3dec027a5c5ce5e482e2bd169b53cf7d718
MD5 700659d597e9b01efde921f3f0aab3d2
BLAKE2b-256 1c3bf9271d81bce50c1b7cf6b2aaf2d9503a7fe27787b832b07bdef9926e7a18

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page