DashVector Client Python Library

DashVector is a scalable, fully managed vector database service for building machine learning applications. The DashVector client SDK is your gateway to the DashVector service.

For more information about DashVector, please visit: https://help.aliyun.com/document_detail/2510225.html

Installation

To install the DashVector client Python SDK, simply run:

pip install dashvector

QuickStart

import numpy as np
import dashvector

# Use DashVector `Client` api to communicate with the backend vectorDB service.
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')

# Create a collection named "quickstart" with a dimension of 4, using the default cosine distance metric
rsp = client.create(name='quickstart', dimension=4)
assert rsp

# Get a collection by name
collection = client.get(name='quickstart')

# Operations on a 'Collection' include Insert/Query/Upsert/Update/Delete/Fetch of docs
# Here we insert 16 sample docs (4-dimensional vectors) in a single batch
collection.insert(
    [
        dashvector.Doc(id=str(i), vector=np.random.rand(4), fields={'anykey': 'anyvalue'}) 
        for i in range(16)
    ]
)

# Query a vector from the collection
docs = collection.query([0.1, 0.2, 0.3, 0.4], topk=5)
print(docs)

# Get statistics about collection
stats = collection.stats()
print(stats)

# Delete a collection by name
client.delete(name='quickstart')

Reference

Create a Client

Client hosts various APIs for interacting with DashVector Collections.

dashvector.Client(
    api_key: str,
    endpoint: str = 'dashvector.cn-hangzhou.aliyuncs.com',
    protocol: dashvector.DashVectorProtocol = dashvector.DashVectorProtocol.GRPC,
    timeout: float = 10.0
) -> Client
Parameters Type Required Description
api_key str Yes Your DashVector API-KEY
endpoint str No Service endpoint.
Default value: dashvector.cn-hangzhou.aliyuncs.com
protocol DashVectorProtocol No Communication protocol; HTTP and GRPC are supported.
Default value: DashVectorProtocol.GRPC
timeout float No Timeout period (in seconds), -1 means no timeout.
Default value: 10.0

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
assert client
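Hard-coding the API key as above is fine for a quick test. Below is a minimal sketch that reads the `Client` arguments from the environment instead. The variable names `DASHVECTOR_API_KEY`, `DASHVECTOR_ENDPOINT` and `DASHVECTOR_TIMEOUT` are assumptions of this example; the SDK does not read them on its own. Optional settings are only passed when present, so the defaults from the table above (including `timeout` 10.0, where -1 means no timeout) still apply otherwise.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for dashvector.Client from environment variables.

    The variable names used here are hypothetical conventions for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("DASHVECTOR_API_KEY")
    if not api_key:
        raise KeyError("DASHVECTOR_API_KEY is not set")
    kwargs: Dict[str, Any] = {"api_key": api_key}
    # Only pass optional settings when present, so the SDK defaults apply otherwise.
    if env.get("DASHVECTOR_ENDPOINT"):
        kwargs["endpoint"] = env["DASHVECTOR_ENDPOINT"]
    if env.get("DASHVECTOR_TIMEOUT"):
        kwargs["timeout"] = float(env["DASHVECTOR_TIMEOUT"])
    return kwargs
```

Usage, with a live SDK installed: `client = dashvector.Client(**client_kwargs_from_env())`.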

Create Collection

Client.create(
    name: str,
    dimension: int,
    dtype: Union[Type[int], Type[float]] = float,
    fields_schema: Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool]]]] = None,
    metric: str = 'cosine',
    timeout: Optional[int] = None
) -> DashVectorResponse
Parameters Type Required Description
name str Yes The name of the Collection to create.
dimension int Yes The dimension of the Collection's vectors. Valid values: 1-20,000
dtype Union[Type[int], Type[float]] No The data type of the Collection's vectors.
Default value: float
fields_schema Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool]]]] No Fields schema of the Collection.
Default value: None
e.g. {"name": str, "age": int}
metric str No Vector similarity metric. For cosine, dtype must be float.
Valid values:
1. cosine (default)
2. dotproduct
3. euclidean
timeout Optional[int] No Timeout period (in seconds); -1 means the collection is created asynchronously.
Default value: None

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')

rsp = client.create('YOUR-COLLECTION-NAME', dimension=4)
assert rsp
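The sketch below makes the three `metric` values concrete by computing each measure for two plain Python vectors. It only illustrates the underlying math. How DashVector turns these measures into the `score` reported by `query` is not specified here and should be checked against the service documentation. The same applies to whether a higher or lower score counts as better for each metric.

```python
import math
from typing import Sequence


def dotproduct(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    return dotproduct(a, b) / (math.sqrt(dotproduct(a, a)) * math.sqrt(dotproduct(b, b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [0.1, 0.2, 0.3, 0.4]
b = [0.2, 0.4, 0.6, 0.8]  # same direction as a, twice as long
print(cosine_similarity(a, b))  # about 1.0: cosine ignores vector length
print(dotproduct(a, b))         # grows with vector length
print(euclidean(a, b))          # nonzero, because the vectors differ in length
```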

List Collections

Client.list() -> DashVectorResponse

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')

collections = client.list()

for collection in collections:
    print(collection)
# outputs:
# 'quickstart'
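Because `list()` yields collection names, it can be used to make setup scripts idempotent. Below is a minimal sketch. It assumes, as the example above suggests, that iterating the `list()` response yields plain name strings. `ensure_collection` is a hypothetical helper that accepts any client-like object, so it can be exercised without a live service. It treats a falsy `create` response as a failure, matching the `assert rsp` style used in this README.

```python
from typing import Any


def ensure_collection(client: Any, name: str, dimension: int, **create_kwargs: Any) -> bool:
    """Create the collection only if it does not exist yet.

    Returns True if a new collection was created, False if it already existed.
    Assumes iterating client.list() yields collection names.
    """
    existing = set(client.list())
    if name in existing:
        return False
    rsp = client.create(name, dimension=dimension, **create_kwargs)
    if not rsp:
        # The examples in this README treat a falsy response as a failure.
        raise RuntimeError(f"failed to create collection {name!r}")
    return True
```

With a live client this would be called as `ensure_collection(client, 'quickstart', dimension=4, metric='cosine')`.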

Describe Collection

Client.describe(name: str) -> DashVectorResponse

Parameters Type Required Description
name str Yes The name of the Collection to describe.

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
rsp = client.describe('YOUR-COLLECTION-NAME')

print(rsp)
# example output:
# {
#   "request_id": "8d3ac14e-5382-4736-b77c-4318761ddfab",
#   "code": 0,
#   "message": "",
#   "output": {
#     "name": "quickstart",
#     "dimension": 4,
#     "dtype": "FLOAT",
#     "metric": "dotproduct",
#     "fields_schema": {
#       "name": "STRING",
#       "age": "INT",
#       "height": "FLOAT"
#     },
#     "status": "SERVING",
#     "partitions": {
#       "default": "SERVING"
#     }
#   }
# }
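Before writing to a collection, it can be useful to check that it is ready. Below is a minimal sketch that checks whether a collection and all of its partitions report `SERVING`. It assumes the `output` part of the describe response is available as a dict shaped like the example above; the sketch builds one from JSON text. `is_serving` is a hypothetical helper, and only the `SERVING` status value is taken from the example.

```python
import json
from typing import Any, Dict


def is_serving(output: Dict[str, Any]) -> bool:
    """True if the collection and every one of its partitions report SERVING."""
    if output.get("status") != "SERVING":
        return False
    return all(state == "SERVING" for state in output.get("partitions", {}).values())


# The "output" object from the example describe response above (trimmed).
example = json.loads("""
{
  "name": "quickstart",
  "dimension": 4,
  "status": "SERVING",
  "partitions": {"default": "SERVING"}
}
""")
print(is_serving(example))  # True
```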

Delete Collection

Client.delete(name: str) -> DashVectorResponse

Parameters Type Required Description
name str Yes The name of the Collection to delete.

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
client.delete('YOUR-COLLECTION-NAME')

Get a Collection Instance

A Collection provides APIs for accessing its Docs and Partitions.

Client.get(name: str) -> Collection

Parameters Type Required Description
name str Yes The name of the Collection to get.

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
assert collection

Describe Collection Statistics

Collection.stats() -> DashVectorResponse

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.stats()

print(rsp)
# example output:
# {
#   "request_id": "14448bcb-c9a3-49a8-9152-0de3990bce59",
#   "code": 0,
#   "message": "Success",
#   "output": {
#     "total_doc_count": "26",
#     "index_completeness": 1.0,
#     "partitions": {
#       "default": {
#         "total_doc_count": "26"
#       }
#     }
#   }
# }
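In the example output, `total_doc_count` appears as a string rather than a number. Below is a small sketch that normalizes the per-partition counts to integers. It assumes a stats `output` dict shaped like the one above, and the helper name `doc_counts` is hypothetical. In this example output, the per-partition counts add up to the collection total.

```python
from typing import Any, Dict


def doc_counts(output: Dict[str, Any]) -> Dict[str, int]:
    """Map each partition name to its doc count, converting string counts to int."""
    return {
        name: int(stats["total_doc_count"])
        for name, stats in output.get("partitions", {}).items()
    }


output = {
    "total_doc_count": "26",
    "index_completeness": 1.0,
    "partitions": {"default": {"total_doc_count": "26"}},
}
counts = doc_counts(output)
print(counts)  # {'default': 26}
print(sum(counts.values()) == int(output["total_doc_count"]))  # True
```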

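Insert/Update/Upsert Docs

Collection.update() and Collection.upsert() take the same parameters as Collection.insert() shown here.

Collection.insert(
    docs: Union[Doc, List[Doc], Tuple, List[Tuple]],
    partition: Optional[str] = None,
    async_req: bool = False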
) -> DashVectorResponse
Parameters Type Required Description
docs Union[Doc, List[Doc], Tuple, List[Tuple]] Yes The docs to Insert/Update/Upsert.
partition Optional[str] No Name of the partition to Insert/Update/Upsert.
Default value: None
async_req bool No Enable async request or not.
Default value: False

Example:

import dashvector
import numpy as np

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')

# insert a doc with Tuple
collection.insert(('YOUR-DOC-ID1', [0.1, 0.2, 0.3, 0.4]))
collection.insert(('YOUR-DOC-ID2', [0.2, 0.3, 0.4, 0.5], {'age': 30, 'name': 'alice', 'anykey': 'anyvalue'}))

# insert a doc with dashvector.Doc
collection.insert(
    dashvector.Doc(
        id='YOUR-DOC-ID3', 
        vector=[0.3, 0.4, 0.5, 0.6], 
        fields={'foo': 'bar'}
    )
)

# insert a batch of docs given as tuples
ret = collection.insert(
    [
        ('YOUR-DOC-ID4', [0.2, 0.7, 0.8, 1.3], {'age': 1}),
        ('YOUR-DOC-ID5', [0.3, 0.6, 0.9, 1.2], {'age': 2}),
        ('YOUR-DOC-ID6', [0.4, 0.5, 1.0, 1.1], {'age': 3})
    ]
)

# insert a batch of dashvector.Doc objects with random vectors
ret = collection.insert(
    [
        dashvector.Doc(id=str(i), vector=np.random.rand(4)) for i in range(10)
    ]
)

# async insert: returns a future; get() waits for the response
ret_future = collection.insert(
    [
        dashvector.Doc(id=str(i+10), vector=np.random.rand(4)) for i in range(10)
    ],
    async_req=True
)
ret = ret_future.get()
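For larger data sets, it is common to split docs into fixed-size batches and call `insert` once per batch. Below is a minimal sketch of the chunking step, using the `(id, vector, fields)` tuple form shown above. The batch size of 16 is an arbitrary choice for illustration, not a documented DashVector limit. `batched` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Docs in the (id, vector, fields) tuple form accepted by Collection.insert.
docs = [(str(i), [0.1 * i, 0.2, 0.3, 0.4], {"age": i}) for i in range(40)]
batches = list(batched(docs, 16))
print([len(b) for b in batches])  # [16, 16, 8]

# With a live collection, each batch would then be sent separately:
# for batch in batches:
#     assert collection.insert(batch)
```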

Query a Collection

Collection.query(
    vector: Optional[Union[List[Union[int, float]], np.ndarray]] = None,
    id: Optional[str] = None,
    topk: int = 10,
    filter: Optional[str] = None,
    include_vector: bool = False,
    partition: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
    async_req: bool = False
) -> DashVectorResponse
Parameters Type Required Description
vector Optional[Union[List[Union[int, float]], np.ndarray]] No The vector to query
id Optional[str] No The doc id to query.
If id is set, the search uses the vector of the doc with that id.
topk int No Number of similarity results to return.
Default value: 10
filter Optional[str] No Expression used to filter results
Default value: None
e.g. age>20
include_vector bool No Return vector details or not.
Default value: False
partition Optional[str] No Name of the partition to Query.
Default value: None
output_fields Optional[List[str]] No List of field names to return.
Default value: None, means return all fields
e.g. ['name', 'age']
async_req bool No Enable async request or not.
Default value: False

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
match_docs = collection.query([0.1, 0.2, 0.3, 0.4], topk=100, filter='age>20', include_vector=True, output_fields=['age','name','foo'])
if match_docs:
    for doc in match_docs:
        print(doc.id)
        print(doc.vector)
        print(doc.fields)
        print(doc.score)
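Conceptually, a `query` with `topk` and `filter` returns the `topk` docs closest to the query vector among those that match the filter. The sketch below reproduces that behaviour with an exhaustive local search over in-memory docs. It uses cosine similarity, and a Python predicate stands in for a filter string such as `age>20`. It is only an illustration: the service uses its own index and scoring, and its `score` values are not guaranteed to match these. `local_query` is a hypothetical helper.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (id, vector, fields) tuples, as in the insert examples above.
LocalDoc = Tuple[str, Sequence[float], Dict[str, object]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def local_query(
    docs: List[LocalDoc],
    vector: Sequence[float],
    topk: int = 10,
    predicate: Optional[Callable[[Dict[str, object]], bool]] = None,
) -> List[Tuple[str, float]]:
    """Return (id, similarity) for the topk most similar docs that pass the predicate."""
    candidates = (
        (doc_id, cosine(vector, vec))
        for doc_id, vec, fields in docs
        if predicate is None or predicate(fields)
    )
    # Highest similarity first.
    return heapq.nlargest(topk, candidates, key=lambda pair: pair[1])


docs = [
    ("a", [0.1, 0.2, 0.3, 0.4], {"age": 30}),
    ("b", [0.4, 0.3, 0.2, 0.1], {"age": 25}),
    ("c", [0.1, 0.2, 0.3, 0.5], {"age": 10}),
]
# Roughly analogous to query([0.1, 0.2, 0.3, 0.4], topk=2, filter='age>20').
result = local_query(docs, [0.1, 0.2, 0.3, 0.4], topk=2, predicate=lambda f: f["age"] > 20)
print(result)
```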

Delete Docs

Collection.delete(
    ids: Union[str, List[str]],
    delete_all: bool = False,
    partition: Optional[str] = None,
    async_req: bool = False
) -> DashVectorResponse
Parameters Type Required Description
ids Union[str, List[str]] Yes The id (or list of ids) for the Doc(s) to Delete
delete_all bool No Delete all docs from the partition.
Default value: False
partition Optional[str] No Name of the partition to Delete from.
Default value: None
async_req bool No Enable async request or not.
Default value: False

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
collection.delete(['YOUR-DOC-ID1','YOUR-DOC-ID2'])

Fetch Docs

Collection.fetch(
    ids: Union[str, List[str]],
    partition: Optional[str] = None,
    async_req: bool = False
) -> DashVectorResponse
Parameters Type Required Description
ids Union[str, List[str]] Yes The id (or list of ids) for the Doc(s) to Fetch
partition Optional[str] No Name of the partition to Fetch from.
Default value: None
async_req bool No Enable async request or not.
Default value: False

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
fetch_docs = collection.fetch(['YOUR-DOC-ID1', 'YOUR-DOC-ID2'])
if fetch_docs:
    for doc_id in fetch_docs:
        doc = fetch_docs[doc_id]
        print(doc.id)
        print(doc.vector)
        print(doc.fields)
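Judging from the example above, the fetch response can be iterated and indexed by doc id. Below is a small sketch that reports which requested ids came back empty. It assumes that ids which do not exist are simply absent from that mapping, which is not confirmed here. A plain dict stands in for the response, and `missing_ids` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping


def missing_ids(requested: Iterable[str], fetched: Mapping[str, object]) -> List[str]:
    """Return the requested ids that have no entry in the fetch result, in request order."""
    return [doc_id for doc_id in requested if doc_id not in fetched]


# A plain dict stands in for the fetch response in this sketch.
fetched = {"YOUR-DOC-ID1": object()}
print(missing_ids(["YOUR-DOC-ID1", "YOUR-DOC-ID2"], fetched))  # ['YOUR-DOC-ID2']
```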

Create Collection Partition

Collection.create_partition(name: str, timeout: Optional[int] = None) -> DashVectorResponse

Parameters Type Required Description
name str Yes The name of the Partition to Create.
timeout Optional[int] No Timeout period (in seconds); -1 means the partition is created asynchronously.
Default value: None

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.create_partition('YOUR-PARTITION-NAME')
assert rsp
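Partitions can keep subsets of docs apart, for example one partition per product category, like the `shoes` partition in the describe and statistics examples below. Below is a minimal sketch that groups `(id, vector, fields)` tuples by a field value, so that each group can be passed to `insert(..., partition=...)`. The `category` field and the `group_by_partition` helper are hypothetical. Docs without the field go to the `default` partition seen in the describe output above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

DocTuple = Tuple[str, Sequence[float], Dict[str, object]]


def group_by_partition(
    docs: List[DocTuple], field: str, default: str = "default"
) -> Dict[str, List[DocTuple]]:
    """Group docs by the value of `field`; docs without that field go to `default`."""
    groups: Dict[str, List[DocTuple]] = defaultdict(list)
    for doc in docs:
        groups[str(doc[2].get(field, default))].append(doc)
    return dict(groups)


docs = [
    ("1", [0.1, 0.2, 0.3, 0.4], {"category": "shoes"}),
    ("2", [0.2, 0.3, 0.4, 0.5], {"category": "hats"}),
    ("3", [0.3, 0.4, 0.5, 0.6], {}),
]
groups = group_by_partition(docs, "category")
print(sorted(groups))  # ['default', 'hats', 'shoes']

# With a live collection whose partitions already exist:
# for partition, batch in groups.items():
#     assert collection.insert(batch, partition=partition)
```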

Delete Collection Partition

Collection.delete_partition(name: str) -> DashVectorResponse

Parameters Type Required Description
name str Yes The name of the Partition to Delete.

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.delete_partition('YOUR-PARTITION-NAME')
assert rsp

List Collection Partitions

Collection.list_partitions() -> DashVectorResponse

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
partitions = collection.list_partitions()

assert partitions
for pt in partitions:
    print(pt)

Describe Collection Partition

Collection.describe_partition(name: str) -> DashVectorResponse

Parameters Type Required Description
name str Yes The name of the Partition to Describe.

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')

rsp = collection.describe_partition('shoes')
print(rsp)
# example output:
# {"request_id":"296267a7-68e2-483a-87e6-5992d85a5806","code":0,"message":"","output":"SERVING"}

Statistics for Collection Partition

Collection.stats_partition(name: str) -> DashVectorResponse

Parameters Type Required Description
name str Yes The name of the Partition to get Statistics.

Example:

import dashvector

client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')

rsp = collection.stats_partition('shoes')
print(rsp)
# example output:
# {
#     "code":0,
#     "message":"",
#     "request_id":"330a2bcb-e4a7-4fc6-a711-2fe5f8a24e8c",
#     "output":{
#         "total_doc_count":0
#     }
# }

Classes

dashvector.Doc

@dataclass(frozen=True)
class Doc(object):
    id: str
    vector: Union[List[int], List[float], numpy.ndarray]
    fields: Optional[Dict[str, Union[str, int, float, bool]]] = None
    score: float = 0.0
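Because `Doc` is declared as a frozen dataclass, its attributes cannot be reassigned after construction. `dataclasses.replace` returns a modified copy instead. The sketch below uses a local stand-in class, `LocalDoc`, that mirrors the declaration above. It is not the SDK class itself, so the sketch runs without `dashvector` installed.

```python
import dataclasses
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class LocalDoc:
    """Stand-in mirroring the dashvector.Doc declaration shown above."""
    id: str
    vector: List[float]
    fields: Optional[Dict[str, Union[str, int, float, bool]]] = None
    score: float = 0.0


doc = LocalDoc(id="1", vector=[0.1, 0.2, 0.3, 0.4], fields={"age": 30})
try:
    doc.id = "2"  # not allowed on a frozen dataclass
except dataclasses.FrozenInstanceError:
    print("Doc instances are immutable")

# Build a changed copy instead of mutating the original.
updated = dataclasses.replace(doc, fields={"age": 31})
print(updated.fields, doc.fields)  # {'age': 31} {'age': 30}
```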

dashvector.DashVectorResponse

class DashVectorResponse(object):
    code: DashVectorCode
    message: str
    request_id: str
    output: Any
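The examples in this README check results with `assert rsp`, so they treat a falsy `DashVectorResponse` as a failure; the sample outputs report `code` 0 on success. Below is a minimal sketch of a helper that turns a failed response into an exception carrying `code`, `message` and `request_id`, which is handy when reporting problems. `require_ok` and `DashVectorRequestError` are hypothetical names, not part of the SDK. The helper relies only on truthiness and the attributes listed above.

```python
from typing import Any


class DashVectorRequestError(RuntimeError):
    """Hypothetical exception type for this sketch; not part of the SDK."""


def require_ok(rsp: Any, action: str = "request") -> Any:
    """Return rsp unchanged if it is truthy, otherwise raise with its diagnostic fields."""
    if not rsp:
        raise DashVectorRequestError(
            f"{action} failed: code={rsp.code}, message={rsp.message!r}, "
            f"request_id={rsp.request_id}"
        )
    return rsp


# Usage with a live collection:
# stats = require_ok(collection.stats(), 'stats')
```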

License

This project is licensed under the Apache License (Version 2.0).

Download files

Source Distribution

dashvector-1.0.14.tar.gz (46.6 kB)

Built Distribution

dashvector-1.0.14-py3-none-any.whl (76.2 kB), Python 3
