Cosdata Python SDK

A Python SDK for interacting with the Cosdata Vector Database.

Installation

pip install cosdata-client

Quick Start

from cosdata import Client  # Import the Client class

# Initialize the client (all parameters are optional)
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",               # Default username
    password="admin",               # Default password
    verify=False                    # Whether to verify TLS certificates
)

# Create a collection
collection = client.create_collection(
    name="my_collection",
    dimension=768,                  # Vector dimension
    description="My vector collection"
)

# Create an index (all parameters are optional; defaults are those listed
# for create_index in the API Reference)
index = collection.create_index(
    distance_metric="cosine",       # Default: cosine
    num_layers=10,                  # Default: 7
    max_cache_size=1000,            # Default: 1000
    ef_construction=128,            # Default: 512
    ef_search=64,                   # Default: 256
    neighbors_count=32,             # Default: 32
    level_0_neighbors_count=64      # Default: 64
)

# Generate some vectors (example with random data)
import numpy as np

def generate_random_vector(id: int, dimension: int) -> dict:
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{id}",
        "dense_values": values,
        "document_id": f"doc_{id//10}",  # Group vectors into documents
        "metadata": {  # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }

# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]

# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:])

# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,                                  # Number of nearest neighbors
    return_raw_text=True
)

# Fetch a specific vector
vector = collection.vectors.get("vec_1")

# Get collection information
collection_info = collection.get_info()
print(f"Collection info: {collection_info}")

# List all collections
print("Available collections:")
for coll in client.collections():
    print(f" - {coll.name}")

# Version management
current_version = collection.versions.get_current()
print(f"Current version: {current_version}")
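
The Quick Start creates the collection with a fixed dimension and then upserts dictionaries with `id` and `dense_values` keys. A small client-side check, sketched below, can catch malformed records before they are sent. The helper name `check_vectors` is hypothetical and not part of the SDK, and the server may apply its own validation.

```python
from typing import Any, Dict, List


def check_vectors(vectors: List[Dict[str, Any]], dimension: int) -> None:
    """Raise ValueError if a record lacks an id or has the wrong length.

    Hypothetical helper, not part of cosdata-client. It only inspects the
    "id" and "dense_values" keys used in the Quick Start.
    """
    for position, vec in enumerate(vectors):
        if not vec.get("id"):
            raise ValueError(f"vector at position {position} has no 'id'")
        values = vec.get("dense_values")
        if values is None or len(values) != dimension:
            raise ValueError(
                f"vector {vec['id']!r} must have {dimension} dense values"
            )
```

Calling `check_vectors(vectors, 768)` before opening the transaction keeps a bad record from failing the upsert partway through.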

API Reference

Client

The main client for interacting with the Vector Database API.

client = Client(
    host="http://127.0.0.1:8443",  # Optional
    username="admin",               # Optional
    password="admin",               # Optional
    verify=False                    # Optional
)

Methods:

  • create_collection(name: str, dimension: int = 1024, description: Optional[str] = None, dense_vector: Optional[Dict[str, Any]] = None, sparse_vector: Optional[Dict[str, Any]] = None, tf_idf_options: Optional[Dict[str, Any]] = None) -> Collection
  • collections() -> List[Collection]
  • get_collection(name: str) -> Collection
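
The three methods above can be combined into a "get or create" pattern, sketched here. The helper name `get_or_create_collection` is hypothetical. It assumes that the objects returned by `collections()` expose a `.name` attribute, as the Quick Start listing suggests.

```python
from typing import Any


def get_or_create_collection(client: Any, name: str, dimension: int) -> Any:
    """Return the collection called `name`, creating it if it is missing.

    Hypothetical helper, not part of cosdata-client. `client` is expected to
    offer collections(), get_collection() and create_collection().
    """
    # Assumption: each listed collection has a `.name` attribute.
    existing_names = {coll.name for coll in client.collections()}
    if name in existing_names:
        return client.get_collection(name)
    return client.create_collection(name=name, dimension=dimension)
```

This avoids calling `create_collection` for a name that is already in use. The check is not atomic, so two processes racing on the same name could still both try to create it.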

Collection

The Collection class provides access to all collection-specific operations.

collection = client.create_collection(
    name="my_collection",
    dimension=768,
    description="My collection"
)

Methods:

  • create_index(distance_metric: str = "cosine", num_layers: int = 7, max_cache_size: int = 1000, ef_construction: int = 512, ef_search: int = 256, neighbors_count: int = 32, level_0_neighbors_count: int = 64) -> Index
  • create_sparse_index(name: str, quantization: int = 64, sample_threshold: int = 1000) -> Index
  • create_tf_idf_index(name: str, sample_threshold: int = 1000, k1: float = 1.2, b: float = 0.75) -> Index
  • get_index(name: str) -> Index
  • get_info() -> Dict[str, Any]
  • delete() -> None
  • load() -> None
  • unload() -> None
  • transaction() -> Transaction (context manager)

Transaction

The Transaction class provides methods for writing vectors to a collection and for committing or aborting those writes.

with collection.transaction() as txn:
    txn.upsert_vector(vector)  # Single vector
    txn.batch_upsert_vectors(vectors)  # Multiple vectors

Methods:

  • upsert_vector(vector: Dict[str, Any]) -> None
  • batch_upsert_vectors(vectors: List[Dict[str, Any]]) -> None
  • commit() -> None
  • abort() -> None
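
The commit-on-success, abort-on-exception behaviour of the context manager can be illustrated with a self-contained sketch. This is not the SDK's source: `RecordingTransaction` and `transaction_scope` are hypothetical stand-ins that only record calls, so the pattern can be run without a server.

```python
from contextlib import contextmanager


class RecordingTransaction:
    """Stand-in for a Transaction; records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def upsert_vector(self, vector):
        self.calls.append(("upsert", vector["id"]))

    def commit(self):
        self.calls.append(("commit",))

    def abort(self):
        self.calls.append(("abort",))


@contextmanager
def transaction_scope(txn):
    """Commit when the block exits normally; abort and re-raise on an exception."""
    try:
        yield txn
    except Exception:
        txn.abort()
        raise
    else:
        txn.commit()


# Normal exit: the upsert is followed by a commit.
ok = RecordingTransaction()
with transaction_scope(ok) as txn:
    txn.upsert_vector({"id": "vec_0"})

# Exception inside the block: abort is called and the error propagates.
failed = RecordingTransaction()
try:
    with transaction_scope(failed) as txn:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

After this runs, `ok.calls` ends with a commit and `failed.calls` contains only an abort.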

Search

The Search class provides methods for vector similarity search.

results = collection.search.dense(
    query_vector=vector,
    top_k=5,
    return_raw_text=True
)

Methods:

  • dense(query_vector: List[float], top_k: int = 5, return_raw_text: bool = False) -> Dict[str, Any]
  • sparse(query_terms: List[List[float]], top_k: int = 5, early_terminate_threshold: float = 0.0, return_raw_text: bool = False) -> Dict[str, Any]
  • text(query_text: str, top_k: int = 5, return_raw_text: bool = False) -> Dict[str, Any]
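
On a small collection, dense results can be sanity-checked against an exhaustive cosine-similarity ranking computed locally. The sketch below uses only the standard library and the `id` / `dense_values` record layout from the Quick Start. The function names are hypothetical, and it makes no assumption about the structure of the dictionary that `dense()` returns.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    vectors: List[Dict[str, Any]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank every record by cosine similarity to `query`, highest first."""
    scored = [
        (vec["id"], cosine_similarity(query, vec["dense_values"]))
        for vec in vectors
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Because an approximate index may trade recall for speed, the two rankings need not match exactly. Large disagreements are a hint to revisit the index parameters.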

Vectors

The Vectors class provides methods for fetching stored vectors and checking whether they exist.

vector = collection.vectors.get("vec_1")
exists = collection.vectors.exists("vec_1")

Methods:

  • get(vector_id: str) -> Dict[str, Any]
  • get_by_document_id(document_id: str) -> List[Dict[str, Any]]
  • exists(vector_id: str) -> bool
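
`exists` and `get` can be paired so that a missing ID yields `None` instead of an exception. A minimal sketch follows. The helper name `get_vector_or_none` is hypothetical, and it assumes only the two methods listed above.

```python
from typing import Any, Dict, Optional


def get_vector_or_none(collection: Any, vector_id: str) -> Optional[Dict[str, Any]]:
    """Return the stored vector, or None when the ID is not present.

    Hypothetical helper, not part of cosdata-client. It costs one extra
    request (the exists check) per lookup.
    """
    if collection.vectors.exists(vector_id):
        return collection.vectors.get(vector_id)
    return None
```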

Versions

The Versions class provides methods for version management.

current_version = collection.versions.get_current()
all_versions = collection.versions.list()

Methods:

  • list() -> List[Dict[str, Any]]
  • get_current() -> Dict[str, Any]
  • get(version_hash: str) -> Dict[str, Any]
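
One way to observe how an operation affects versioning is to read the current version before and after it. The sketch below relies only on `get_current()`. The helper name is hypothetical, and whether a given operation actually produces a new version depends on the server, which this README does not specify.

```python
from typing import Any, Callable, Dict, Tuple


def run_and_compare_versions(
    collection: Any,
    operation: Callable[[], None],
) -> Tuple[Dict[str, Any], Dict[str, Any], bool]:
    """Run `operation` and report the current version before and after it.

    Hypothetical helper, not part of cosdata-client. The third element is
    True when the two version dictionaries differ.
    """
    before = collection.versions.get_current()
    operation()
    after = collection.versions.get_current()
    return before, after, before != after
```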

Best Practices

  1. Connection Management

    • Reuse the client instance across your application
    • The client automatically handles authentication and token management
  2. Vector Operations

    • Use transactions for batch operations
    • The context manager (with statement) automatically handles commit/abort
    • Maximum batch size is 200 vectors per transaction
  3. Error Handling

    • All operations raise exceptions on failure
    • Use try/except blocks for error handling
    • Transactions automatically abort on exceptions when using the context manager
  4. Performance

    • Adjust index parameters based on your use case
    • Use appropriate vector dimensions
    • Consider batch sizes for large operations
  5. Version Management

    • Create versions before major changes
    • Use versions to track collection evolution
    • Clean up old versions when no longer needed
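
The batch-size limit and the context-manager advice above can be combined by splitting a large list into chunks of at most 200 vectors and opening one transaction per chunk. This is a sketch under that reading of the limit. `chunked` and `upsert_in_batches` are hypothetical helpers, not SDK functions.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_SIZE = 200  # Limit quoted in the Best Practices list


def chunked(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(
    collection: Any,
    vectors: List[Dict[str, Any]],
    batch_size: int = MAX_BATCH_SIZE,
) -> int:
    """Upsert `vectors` using one transaction per batch.

    Returns the number of transactions opened. An exception in any batch
    propagates to the caller; earlier batches are not rolled back here.
    """
    transactions = 0
    for batch in chunked(vectors, batch_size):
        with collection.transaction() as txn:
            txn.batch_upsert_vectors(batch)
        transactions += 1
    return transactions
```

Wrapping the call in `try`/`except` lets the application decide whether to retry the failed batch or stop.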

License

This project is licensed under the MIT License - see the LICENSE file for details.
