hologres-vector

Use Hologres to store large amounts of vector data and perform high-speed k-nearest-neighbour search!


Installation

pip install hologres-vector

Usage

Enter the connection information for your Hologres instance

from hologres_vector import HologresVector
import os

host = os.environ["HOLO_HOST"]
port = os.environ["HOLO_PORT"]
dbname = os.environ["HOLO_DBNAME"]
user = os.environ["HOLO_USER"]
password = os.environ["HOLO_PASSWORD"]

connection_string = HologresVector.connection_string_from_db_params(host, port, dbname, user, password)
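If one of these variables is unset, `os.environ[...]` raises a `KeyError` for the first missing name only. A minimal sketch of a helper that reports every missing variable at once; `load_holo_params` and `REQUIRED_VARS` are hypothetical names, not part of hologres-vector, and the helper only reuses the variable names from the snippet above.

```python
import os

# Variable names taken from the snippet above, in the order the
# connection-string helper expects its arguments.
REQUIRED_VARS = ("HOLO_HOST", "HOLO_PORT", "HOLO_DBNAME", "HOLO_USER", "HOLO_PASSWORD")

def load_holo_params(env=None):
    """Return (host, port, dbname, user, password), or raise listing all missing variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned tuple can then be unpacked into the call shown above, for example `HologresVector.connection_string_from_db_params(*load_holo_params())`.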

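Connect to the database and create a table

When creating the table, specify the vector dimension and any strongly typed schema columns the table should have in addition to the vector data, the primary key, and the JSON metadata.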

table_name = "test_table"
holo = HologresVector(
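    connection_string,     # connection information
    5,                     # vector dimension
    table_name=table_name, # table name
    table_schema={"t": "text", "date": "timestamptz", "i": "int"}, # extra strongly typed columns
    distance_method="SquaredEuclidean", # distance function; the default is recommended, "Euclidean" and "InnerProduct" are also available
    pre_delete_table=False, # if True, drop the table first when it already exists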
)
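For reference, the three `distance_method` names correspond to standard measures. The following is a plain-Python sketch of the maths only, not of how Hologres computes them; the function names are illustrative and do not come from the SDK.

```python
import math

def squared_euclidean(a, b):
    # Sum of squared per-dimension differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(a, b))

def inner_product(a, b):
    # Dot product of the two vectors.
    return sum(x * y for x, y in zip(a, b))
```

Because the square root is monotonic, squared Euclidean distance ranks neighbours in the same order as Euclidean distance while skipping the root.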

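Insert vector data along with the corresponding values for the other columns

Both the strongly typed schema columns (schema_datas) and a single JSON column (metadatas) are supported.

This interface is a bulk import: internally, the input data is split into batches of 512 rows for insertion.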

vectors = [[0,0,0,0,0], [1,1,1,1,1], [2,2,2,2,2]]
ids = ['0', '1', '2'] # primary key
schema_datas = [
    {'t': 'text 0', 'date': '2023-08-02 18:30:00', 'i': 0},
    {'t': 'text 1', 'date': '2023-08-02 19:30:00', 'i': 1},
    {'t': 'text 2', 'date': '2023-08-02 20:30:00', 'i': 2},
]
metadatas = [
    {'a': "hello"},
    {'b': 123},
    {},
]

holo.upsert_vectors(vectors, ids, schema_datas=schema_datas, metadatas=metadatas)
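The 512-row batching mentioned above can be pictured with a small sketch. This is an assumption about the general technique, not the SDK's actual code; `iter_batches` is a hypothetical helper name.

```python
def iter_batches(rows, batch_size=512):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# 1300 rows are split into two full batches and one partial batch.
sizes = [len(batch) for batch in iter_batches(list(range(1300)))]
```

Here `sizes` is `[512, 512, 276]`, so callers can pass an input of any length without chunking it themselves.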

Query

  1. Plain query: fetch an arbitrary row from the database (a filter can be added)
holo.query(limit=1)
[{'id': '2', 'vector': [2.0, 2.0, 2.0, 2.0, 2.0], 'metadata': {}}]
  2. Nearest-neighbour query: fetch the nearest neighbours of a given vector
holo.search([0.1, 0.1, 0.1, 0.1, 0.1], k=2, select_columns=['t'])
[{'id': '0', 'metadata': {'a': 'hello'}, 'distance': 0.05, 't': 'text 0'},
{'id': '1', 'metadata': {'b': 123}, 'distance': 4.05, 't': 'text 1'}]
  3. Hybrid query: fetch the nearest neighbours of a given vector, constrained by conditions on other columns
holo.search([0.1, 0.1, 0.1, 0.1, 0.1], k=2, schema_data_filters={'t': 'text 1'})
[{'id': '1', 'metadata': {'b': 123}, 'distance': 4.05}]
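The distances in the results above can be checked by hand with an exact linear scan. This is a sketch for illustration only: `brute_force_search` is a hypothetical function, it assumes squared Euclidean distance and equality filters, and it says nothing about how Hologres actually executes the search.

```python
def brute_force_search(rows, query, k, filters=None):
    """Return the k rows closest to query, keeping only rows that match every filter."""
    filters = filters or {}
    candidates = [r for r in rows if all(r.get(col) == val for col, val in filters.items())]
    scored = [
        {"id": r["id"], "distance": sum((q - x) ** 2 for q, x in zip(query, r["vector"]))}
        for r in candidates
    ]
    return sorted(scored, key=lambda r: r["distance"])[:k]

# Same three rows as in the insertion example.
rows = [
    {"id": "0", "vector": [0, 0, 0, 0, 0], "t": "text 0"},
    {"id": "1", "vector": [1, 1, 1, 1, 1], "t": "text 1"},
    {"id": "2", "vector": [2, 2, 2, 2, 2], "t": "text 2"},
]
top2 = brute_force_search(rows, [0.1] * 5, k=2)
filtered = brute_force_search(rows, [0.1] * 5, k=2, filters={"t": "text 1"})
```

For the query `[0.1] * 5`, row '0' is at distance 5 × 0.1² = 0.05 and row '1' at 5 × 0.9² = 4.05 (up to floating-point rounding), matching the output shown above.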

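Replace (upsert)

By default, this SDK currently uses a replace-on-conflict strategy keyed on the primary key id: when an inserted row has the same primary key as an existing row, the new row replaces the existing row in its entirety.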

# First insert a row with id 3
holo.upsert_vectors([[3, 3, 3, 3, 3]], ['3'], schema_datas=[{'t': 'old data'}])
# Then insert another row with id 3; this call replaces the entire row inserted above
holo.upsert_vectors([[-3, -3, -3, -3, -3]], ['3'], schema_datas=[{'t': 'new data'}])

holo.query(schema_data_filters={'id': '3'})
[{'id': '3', 'vector': [-3.0, -3.0, -3.0, -3.0, -3.0], 'metadata': {}}]
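The whole-row replacement described above behaves like assigning into a dictionary keyed by id. A minimal in-memory sketch, assuming that columns omitted from the new row are not carried over from the old one; `upsert` and `table` are illustrative names, not SDK code.

```python
def upsert(table, row_id, vector, schema_data=None, metadata=None):
    """Insert a row, or replace the existing row with the same id in full."""
    table[row_id] = {
        "vector": vector,
        "schema_data": schema_data or {},
        "metadata": metadata or {},
    }

table = {}
upsert(table, "3", [3] * 5, schema_data={"t": "old data"}, metadata={"note": "first"})
# Same id: the old row, including its metadata, is discarded rather than merged.
upsert(table, "3", [-3] * 5, schema_data={"t": "new data"})
```

After the second call the table still holds a single row for id '3', with the new vector and an empty metadata object.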

Delete

Filter conditions in the same format as for queries can be used to delete a subset of the data.

holo.delete_vectors(schema_data_filters={'id': '2'})
holo.query(limit=10)
[{'id': '0', 'vector': [0.0, 0.0, 0.0, 0.0, 0.0], 'metadata': {'a': 'hello'}},
 {'id': '1', 'vector': [1.0, 1.0, 1.0, 1.0, 1.0], 'metadata': {'b': 123}},
 {'id': '3', 'vector': [-3.0, -3.0, -3.0, -3.0, -3.0], 'metadata': {}}]
holo.delete_vectors() # delete all data
holo.query(limit=10)
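The two calls above differ only in whether a filter is given. A small in-memory sketch of that behaviour, assuming equality filters; `delete_rows` is a hypothetical helper and not part of the SDK.

```python
def delete_rows(rows, filters=None):
    """Return the rows that survive deletion; an empty filter matches every row."""
    filters = filters or {}
    return [r for r in rows if not all(r.get(col) == val for col, val in filters.items())]

rows = [{"id": "0"}, {"id": "1"}, {"id": "2"}, {"id": "3"}]
after_partial = delete_rows(rows, {"id": "2"})  # removes only the row with id '2'
after_full = delete_rows(after_partial)         # no filter: removes everything
```

Because a call without a filter removes every row, it is worth double-checking that a filter argument was actually passed before running a delete against real data.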

License

hologres-vector is distributed under the terms of the MIT license.
