Cosdata Python SDK
A Python SDK for interacting with the Cosdata Vector Database.
Installation
pip install cosdata-client
Quick Start
import numpy as np

from cosdata import Client  # Import the Client class

# Initialize the client (all parameters are optional)
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",              # Default username
    password="admin",              # Default password
    verify=False                   # SSL verification
)

# Create a collection
collection = client.create_collection(
    name="my_collection",
    dimension=768,  # Vector dimension
    description="My vector collection"
)

# Create an index (all parameters are optional)
index = collection.create_index(
    distance_metric="cosine",    # Default: cosine
    num_layers=10,               # Default: 10
    max_cache_size=1000,         # Default: 1000
    ef_construction=128,         # Default: 128
    ef_search=64,                # Default: 64
    neighbors_count=32,          # Default: 32
    level_0_neighbors_count=64   # Default: 64
)

# Generate some vectors (example with random data)
def generate_random_vector(id: int, dimension: int) -> dict:
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{id}",
        "dense_values": values,
        "document_id": f"doc_{id//10}",  # Group vectors into documents
        "metadata": {  # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }

# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]

# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:], max_workers=8, max_retries=3)

# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,  # Number of nearest neighbors
    return_raw_text=True
)

# Fetch a specific vector
vector = collection.vectors.get("vec_1")

# Get collection information
collection_info = collection.get_info()
print(f"Collection info: {collection_info}")

# List all collections
print("Available collections:")
for coll in client.collections():
    print(f" - {coll.name}")

# Version management
current_version = collection.versions.get_current()
print(f"Current version: {current_version}")
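Because a collection is created with a fixed dimension, it can help to check a batch before sending it. The following is a minimal sketch, not part of the SDK: `validate_vectors` is a hypothetical helper that assumes the dense vector dictionary layout used in the Quick Start (`id` and `dense_values` keys).

```python
from typing import Any, Dict, List


def validate_vectors(vectors: List[Dict[str, Any]], dimension: int) -> List[str]:
    """Return a list of problems found; an empty list means the batch looks fine.

    Hypothetical helper, not part of the cosdata SDK.
    """
    problems: List[str] = []
    seen_ids = set()
    for position, vector in enumerate(vectors):
        # Every vector needs a non-empty ID that is unique within the batch.
        vector_id = vector.get("id")
        if not vector_id:
            problems.append(f"item {position}: missing 'id'")
        elif vector_id in seen_ids:
            problems.append(f"item {position}: duplicate id {vector_id!r}")
        else:
            seen_ids.add(vector_id)

        # Dense values must match the collection dimension exactly.
        values = vector.get("dense_values")
        if values is None:
            problems.append(f"item {position}: missing 'dense_values'")
        elif len(values) != dimension:
            problems.append(
                f"item {position}: expected {dimension} values, got {len(values)}"
            )
    return problems
```

With the Quick Start data, `validate_vectors(vectors, 768)` could be called just before opening the transaction, and the upsert skipped if the returned list is not empty.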
API Reference
Client
The main client for interacting with the Vector Database API.
client = Client(
    host="http://127.0.0.1:8443",  # Optional
    username="admin",              # Optional
    password="admin",              # Optional
    verify=False                   # Optional
)
Methods:
- create_collection(...) -> Collection
  - Returns a Collection object. Collection info can be accessed via collection.get_info():
    {
      "name": str,
      "description": str,
      "dense_vector": {"enabled": bool, "dimension": int},
      "sparse_vector": {"enabled": bool},
      "tf_idf_options": {"enabled": bool}
    }
- collections() -> List[Collection]
  - Returns a list of Collection objects.
- get_collection(name: str) -> Collection
  - Returns a Collection object for the given name.
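The three client methods can be combined into a "get or create" step so that a script can be run repeatedly. This is a sketch under assumptions: `get_or_create_collection` is a hypothetical helper, and it relies only on `collections()`, `get_collection()`, `create_collection()` and the `name` attribute used in the Quick Start.

```python
from typing import Any


def get_or_create_collection(
    client: Any, name: str, dimension: int, description: str = ""
) -> Any:
    """Return the named collection, creating it only if it is not listed yet.

    Hypothetical helper; `client` is expected to be a cosdata `Client`.
    """
    existing_names = {coll.name for coll in client.collections()}
    if name in existing_names:
        return client.get_collection(name)
    return client.create_collection(
        name=name,
        dimension=dimension,
        description=description,
    )
```

Listing collections first avoids depending on which exception the SDK raises for an unknown name, which this document does not specify.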
Collection
The Collection class provides access to all collection-specific operations.
collection = client.create_collection(
    name="my_collection",
    dimension=768,
    description="My collection"
)
Methods:
- create_index(...) -> Index
  - Returns an Index object. Index info can be fetched (if implemented) as:
    { "dense": {...}, "sparse": {...}, "tf-idf": {...} }
- create_sparse_index(...) -> Index
- create_tf_idf_index(...) -> Index
- get_index(name: str) -> Index
- get_info() -> dict
  - Returns collection metadata as above.
- delete() -> None
- load() -> None
- unload() -> None
- transaction() -> Transaction (context manager)
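Since every `create_index` parameter is optional, one way to keep tuning explicit is to hold the defaults in a dictionary and override only what changes. The sketch below is an illustration, not SDK code: `DEFAULT_INDEX_PARAMS` and `index_params` are hypothetical names, and the default values are copied from the comments in the Quick Start.

```python
from typing import Any, Dict

# Defaults as listed in the Quick Start comments (copied here, not read from the SDK).
DEFAULT_INDEX_PARAMS: Dict[str, Any] = {
    "distance_metric": "cosine",
    "num_layers": 10,
    "max_cache_size": 1000,
    "ef_construction": 128,
    "ef_search": 64,
    "neighbors_count": 32,
    "level_0_neighbors_count": 64,
}


def index_params(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults, rejecting misspelled parameter names."""
    unknown = sorted(set(overrides) - set(DEFAULT_INDEX_PARAMS))
    if unknown:
        raise ValueError(f"unknown index parameter(s): {', '.join(unknown)}")
    # A new dict is returned so the defaults are never mutated.
    return {**DEFAULT_INDEX_PARAMS, **overrides}
```

The result can then be unpacked into the call, for example `collection.create_index(**index_params(ef_search=128))`.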
Transaction
The Transaction class provides methods for vector operations.
with collection.transaction() as txn:
    txn.upsert_vector(vector)  # Single vector
    # Multiple vectors, with parallelism and retries
    txn.batch_upsert_vectors(vectors, max_workers=8, max_retries=3)
Methods:
- upsert_vector(vector: Dict[str, Any]) -> None
- batch_upsert_vectors(vectors: List[Dict[str, Any]], max_workers: Optional[int] = None, max_retries: int = 3) -> None
  - vectors: List of vector dictionaries to upsert
  - max_workers: Number of threads to use for parallel upserts (default: all available CPU threads)
  - max_retries: Number of times to retry a failed batch (default: 3)
- commit() -> None
- abort() -> None
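Best Practices gives a maximum batch size of 200 vectors per transaction, so larger datasets need to be split. The following sketch shows one way to do that with one transaction per chunk. `chunked`, `upsert_in_batches` and `MAX_VECTORS_PER_TRANSACTION` are hypothetical names introduced for illustration; only `collection.transaction()` and `txn.batch_upsert_vectors()` come from the SDK.

```python
from typing import Any, Dict, Iterator, List

MAX_VECTORS_PER_TRANSACTION = 200  # limit stated under Best Practices


def chunked(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(
    collection: Any,
    vectors: List[Dict[str, Any]],
    batch_size: int = MAX_VECTORS_PER_TRANSACTION,
) -> int:
    """Upsert `vectors` using one transaction per batch; return the batch count.

    Hypothetical helper; `collection` is expected to be a cosdata `Collection`.
    """
    batches = 0
    for batch in chunked(vectors, batch_size):
        # Each batch gets its own transaction, committed when the block exits.
        with collection.transaction() as txn:
            txn.batch_upsert_vectors(batch)
        batches += 1
    return batches
```

Because each chunk is committed separately, a failure partway through leaves the earlier chunks in place; whether that is acceptable depends on the application.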
Search
The Search class provides methods for vector similarity search.
results = collection.search.dense(
    query_vector=vector,
    top_k=5,
    return_raw_text=True
)
Methods:
- dense(query_vector: List[float], top_k: int = 5, return_raw_text: bool = False) -> dict
  - Returns:
    { "results": [ { "id": str, "document_id": str, "score": float, "text": str | None }, ... ] }
- sparse(query_terms: List[dict], top_k: int = 5, early_terminate_threshold: float = 0.0, return_raw_text: bool = False) -> dict
  - Same structure as above.
- text(query_text: str, top_k: int = 5, return_raw_text: bool = False) -> dict
  - Same structure as above.
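All three search methods return the same dictionary shape, so post-processing can be written once. As an illustration, the sketch below keeps only the best hit per `document_id`. `best_hit_per_document` is a hypothetical helper, and it assumes that a higher `score` means a closer match, which this document does not state explicitly.

```python
from typing import Any, Dict, List


def best_hit_per_document(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep the highest-scoring hit for each document_id, best first.

    Assumption: a higher score means a closer match.
    """
    best: Dict[Any, Dict[str, Any]] = {}
    for hit in response.get("results", []):
        doc_id = hit.get("document_id")
        # Replace the stored hit only when this one scores strictly higher.
        if doc_id not in best or hit["score"] > best[doc_id]["score"]:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)
```

This is useful when several vectors belong to one document, as in the Quick Start where ten vectors share each `document_id`.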
Vectors
The Vectors class provides methods for vector operations.
vector = collection.vectors.get("vec_1")
exists = collection.vectors.exists("vec_1")
Methods:
- get(vector_id: str) -> Vector
  - Returns a Vector dataclass object with attributes:
    vector.id: str
    vector.document_id: Optional[str]
    vector.dense_values: Optional[List[float]]
    vector.sparse_indices: Optional[List[int]]
    vector.sparse_values: Optional[List[float]]
    vector.text: Optional[str]
- get_by_document_id(document_id: str) -> List[Vector]
  - Returns a list of Vector objects as above.
- exists(vector_id: str) -> bool
  - Returns True if the vector exists, else False.
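A common pattern is to check `exists()` before `get()` and to inspect which optional attributes are populated on the returned object. The helpers below are a hedged sketch: `get_vector_or_none` and `populated_fields` are hypothetical names, and the attribute names are taken from the list above.

```python
from typing import Any, List, Optional


def get_vector_or_none(collection: Any, vector_id: str) -> Optional[Any]:
    """Return the stored vector, or None when the ID is not present.

    Hypothetical helper; `collection` is expected to be a cosdata `Collection`.
    """
    if not collection.vectors.exists(vector_id):
        return None
    return collection.vectors.get(vector_id)


def populated_fields(vector: Any) -> List[str]:
    """List the optional payload attributes that are not None on `vector`."""
    names = ("dense_values", "sparse_indices", "sparse_values", "text")
    return [name for name in names if getattr(vector, name, None) is not None]
```

Note that the check and the fetch are two separate calls, so a vector removed in between would still cause `get()` to fail.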
Versions
The Versions class provides methods for version management.
current_version = collection.versions.get_current()
all_versions = collection.versions.list()
Methods:
- list() -> dict
  - Returns:
    { "versions": [ { "hash": str, "version_number": int, "timestamp": int, "vector_count": int }, ... ], "current_hash": str }
- get_current() -> Version
  - Returns a Version dataclass object with attributes:
    version.hash: str
    version.version_number: int
    version.timestamp: int
    version.vector_count: int
    version.created_at: datetime  # property for creation time
- get(version_hash: str) -> Version
  - Same as above.
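The dictionary returned by `list()` can be inspected without further API calls. The sketch below works purely on that documented shape; `current_version_entry` and `vector_count_history` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional, Tuple


def current_version_entry(listing: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entry whose hash equals current_hash, or None if absent."""
    current_hash = listing.get("current_hash")
    for entry in listing.get("versions", []):
        if entry.get("hash") == current_hash:
            return entry
    return None


def vector_count_history(listing: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Return (version_number, vector_count) pairs ordered by version_number."""
    entries = sorted(
        listing.get("versions", []), key=lambda entry: entry["version_number"]
    )
    return [(entry["version_number"], entry["vector_count"]) for entry in entries]
```

For example, `vector_count_history(collection.versions.list())` gives a quick view of how the collection grew from version to version.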
Best Practices
- Connection Management
  - Reuse the client instance across your application
  - The client automatically handles authentication and token management
- Vector Operations
  - Use transactions for batch operations
  - The context manager (with statement) automatically handles commit/abort
  - Maximum batch size is 200 vectors per transaction
- Error Handling
  - All operations raise exceptions on failure
  - Use try/except blocks for error handling
  - Transactions automatically abort on exceptions when using the context manager
- Performance
  - Adjust index parameters based on your use case
  - Use appropriate vector dimensions
  - Consider batch sizes for large operations
- Version Management
  - Create versions before major changes
  - Use versions to track collection evolution
  - Clean up old versions when no longer needed
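The error-handling advice above can be sketched as a small wrapper around a transactional upsert. This is an illustration under assumptions: `try_upsert` is a hypothetical helper, and it catches the generic `Exception` because the specific exception classes raised by the SDK are not listed in this document.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def try_upsert(collection: Any, vectors: List[Dict[str, Any]]) -> bool:
    """Upsert in one transaction; return True on success, False on failure.

    Hypothetical helper; `collection` is expected to be a cosdata `Collection`.
    """
    try:
        with collection.transaction() as txn:
            txn.batch_upsert_vectors(vectors)
    except Exception:
        # The context manager has already aborted the transaction by now,
        # so only logging and reporting the failure is left to do here.
        logger.exception("upsert of %d vectors failed", len(vectors))
        return False
    return True
```

Returning a boolean keeps the caller in control of what happens next, for example retrying later or skipping the batch.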
License
This project is licensed under the MIT License - see the LICENSE file for details.