Cosdata Python SDK

A Python SDK for interacting with the Cosdata Vector Database.

Installation

pip install cosdata-client

Quick Start

from cosdata import Client  # Import the Client class

# Initialize the client (all parameters are optional)
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",               # Default username
    password="admin",               # Default password
    verify=False                    # SSL certificate verification (disabled here)
)

# Create a collection
collection = client.create_collection(
    name="my_collection",
    dimension=768,                  # Vector dimension
    description="My vector collection"
)

# Create an index (all parameters are optional)
index = collection.create_index(
    distance_metric="cosine",       # Default: cosine
    num_layers=10,                  # Default: 10
    max_cache_size=1000,            # Default: 1000
    ef_construction=128,            # Default: 128
    ef_search=64,                   # Default: 64
    neighbors_count=32,             # Default: 32
    level_0_neighbors_count=64      # Default: 64
)

# Generate some vectors (example with random data)
import numpy as np

def generate_random_vector(index: int, dimension: int) -> dict:
    """Build one vector record with random dense values in [-1, 1)."""
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{index}",
        "dense_values": values,
        "document_id": f"doc_{index // 10}",  # Group every 10 vectors into one document
        "metadata": {  # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }

# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]

# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:], max_workers=8, max_retries=3)

# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,                                  # Number of nearest neighbors
    return_raw_text=True
)

# Fetch a specific vector
vector = collection.vectors.get("vec_1")

# Get collection information
collection_info = collection.get_info()
print(f"Collection info: {collection_info}")

# List all collections
print("Available collections:")
for coll in client.collections():
    print(f" - {coll.name}")

# Version management
current_version = collection.versions.get_current()
print(f"Current version: {current_version}")
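
The Quick Start fixes the collection dimension at 768 and builds each vector as a dict with an "id" and a "dense_values" list. A length mismatch between the two is an easy mistake to make, so a client-side check before opening a transaction can help. The sketch below is a hypothetical helper, not part of the SDK; it assumes IDs are non-empty strings, as in the examples above.

```python
from typing import Any, Dict, List


def validate_vector(vector: Dict[str, Any], dimension: int) -> List[str]:
    """Return a list of problems found in a vector dict (empty if none).

    Hypothetical client-side helper; not part of the cosdata SDK.
    """
    problems = []
    vector_id = vector.get("id")
    if not isinstance(vector_id, str) or not vector_id:
        problems.append("'id' must be a non-empty string")
    values = vector.get("dense_values")
    if not isinstance(values, list):
        problems.append("'dense_values' must be a list of floats")
    elif len(values) != dimension:
        problems.append(
            f"'dense_values' has {len(values)} entries, expected {dimension}"
        )
    return problems


# Tiny 3-dimensional records keep the example readable.
good = {"id": "vec_0", "dense_values": [0.1, -0.2, 0.3]}
bad = {"id": "vec_1", "dense_values": [0.1, -0.2]}
print(validate_vector(good, 3))
print(validate_vector(bad, 3))
```

Filtering the input with this check before calling batch_upsert_vectors keeps malformed records out of the transaction.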

API Reference

Client

The main client for interacting with the Vector Database API.

client = Client(
    host="http://127.0.0.1:8443",  # Optional
    username="admin",               # Optional
    password="admin",               # Optional
    verify=False                    # Optional
)

Methods:

  • create_collection(...) -> Collection
    • Returns a Collection object. Collection info can be accessed via collection.get_info():
      {
        "name": str,
        "description": str,
        "dense_vector": {"enabled": bool, "dimension": int},
        "sparse_vector": {"enabled": bool},
        "tf_idf_options": {"enabled": bool}
      }
      
  • collections() -> List[Collection]
    • Returns a list of Collection objects.
  • get_collection(name: str) -> Collection
    • Returns a Collection object for the given name.
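
A common pattern built from get_collection and create_collection is "get or create". The sketch below is a hypothetical helper, not an SDK function. It catches a bare Exception because this page does not say which exception get_collection raises for a missing collection; narrowing the except clause is advisable once the real type is known.

```python
def get_or_create_collection(client, name: str, dimension: int, description: str = ""):
    """Return the named collection, creating it if the lookup fails.

    Hypothetical helper. Catching Exception is an assumption: the error
    raised for a missing collection is not documented on this page.
    """
    try:
        return client.get_collection(name)
    except Exception:
        # Any lookup failure falls through to creation. If the failure was
        # something else (for example a network error), create_collection
        # will most likely raise as well and surface the problem.
        return client.create_collection(
            name=name, dimension=dimension, description=description
        )
```

With the Quick Start client this would be called as get_or_create_collection(client, "my_collection", 768).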

Collection

The Collection class provides access to all collection-specific operations.

collection = client.create_collection(
    name="my_collection",
    dimension=768,
    description="My collection"
)

Methods:

  • create_index(...) -> Index
    • Returns an Index object. Index info can be fetched (if implemented) as:
      {
        "dense": {...},
        "sparse": {...},
        "tf-idf": {...}
      }
      
  • create_sparse_index(...) -> Index
  • create_tf_idf_index(...) -> Index
  • get_index(name: str) -> Index
  • get_info() -> dict
    • Returns collection metadata in the structure shown under create_collection in the Client section.
  • delete() -> None
  • load() -> None
  • unload() -> None
  • transaction() -> Transaction (context manager)

Transaction

The Transaction class provides methods for upserting vectors and for committing or aborting those changes.

with collection.transaction() as txn:
    txn.upsert_vector(vector)  # Single vector
    txn.batch_upsert_vectors(vectors, max_workers=8, max_retries=3)  # Multiple vectors, with parallelism and retries

Methods:

  • upsert_vector(vector: Dict[str, Any]) -> None
  • batch_upsert_vectors(vectors: List[Dict[str, Any]], max_workers: Optional[int] = None, max_retries: int = 3) -> None
    • vectors: List of vector dictionaries to upsert
    • max_workers: Number of threads to use for parallel upserts (default: all available CPU threads)
    • max_retries: Number of times to retry a failed batch (default: 3)
  • commit() -> None
  • abort() -> None
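
To make the relationship between the with statement, commit() and abort() concrete, the sketch below reproduces the behaviour with stand-in objects. RecordingTransaction and the transaction() wrapper are illustrative only and are not the SDK's implementation; they show the general pattern of committing when the body finishes and aborting when it raises.

```python
from contextlib import contextmanager


class RecordingTransaction:
    """Stand-in that records calls; NOT the SDK's Transaction class."""

    def __init__(self):
        self.calls = []

    def upsert_vector(self, vector):
        if "id" not in vector:
            raise ValueError("vector needs an 'id'")
        self.calls.append(("upsert", vector["id"]))

    def commit(self):
        self.calls.append(("commit",))

    def abort(self):
        self.calls.append(("abort",))


@contextmanager
def transaction(txn):
    """Commit if the with-body finishes, abort and re-raise if it raises."""
    try:
        yield txn
    except Exception:
        txn.abort()
        raise
    else:
        txn.commit()


ok = RecordingTransaction()
with transaction(ok) as t:
    t.upsert_vector({"id": "vec_0"})

failed = RecordingTransaction()
try:
    with transaction(failed) as t:
        t.upsert_vector({"id": "vec_1"})
        t.upsert_vector({})  # raises, so the transaction is aborted
except ValueError:
    pass

print(ok.calls)      # [('upsert', 'vec_0'), ('commit',)]
print(failed.calls)  # [('upsert', 'vec_1'), ('abort',)]
```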

Search

The Search class provides methods for vector similarity search.

results = collection.search.dense(
    query_vector=vector,
    top_k=5,
    return_raw_text=True
)

Methods:

  • dense(query_vector: List[float], top_k: int = 5, return_raw_text: bool = False) -> dict
    • Returns:
      {
        "results": [
          {
            "id": str,
            "document_id": str,
            "score": float,
            "text": str | None
          },
          ...
        ]
      }
      
  • sparse(query_terms: List[dict], top_k: int = 5, early_terminate_threshold: float = 0.0, return_raw_text: bool = False) -> dict
    • Returns the same structure as dense().
  • text(query_text: str, top_k: int = 5, return_raw_text: bool = False) -> dict
    • Returns the same structure as dense().
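
Because all three search methods return the same dictionary shape, one small helper can post-process any of them. The sketch below is a hypothetical helper that extracts (id, score) pairs above a threshold. It assumes that a higher score means a closer match, which this page does not state, and the sample response is hand-written rather than real server output.

```python
from typing import Any, Dict, List, Tuple


def top_hits(response: Dict[str, Any], min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return (id, score) pairs with score >= min_score, highest score first.

    Hypothetical helper; assumes larger scores indicate closer matches.
    """
    hits = [
        (item["id"], item["score"])
        for item in response.get("results", [])
        if item["score"] >= min_score
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the documented shape (not real server output).
sample = {
    "results": [
        {"id": "vec_3", "document_id": "doc_0", "score": 0.42, "text": None},
        {"id": "vec_0", "document_id": "doc_0", "score": 0.99, "text": None},
        {"id": "vec_7", "document_id": "doc_0", "score": 0.10, "text": None},
    ]
}
print(top_hits(sample, min_score=0.4))  # [('vec_0', 0.99), ('vec_3', 0.42)]
```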

Vectors

The Vectors class provides methods for retrieving stored vectors and checking whether they exist.

vector = collection.vectors.get("vec_1")
exists = collection.vectors.exists("vec_1")

Methods:

  • get(vector_id: str) -> Vector
    • Returns a Vector dataclass object with attributes:
      vector.id: str
      vector.document_id: Optional[str]
      vector.dense_values: Optional[List[float]]
      vector.sparse_indices: Optional[List[int]]
      vector.sparse_values: Optional[List[float]]
      vector.text: Optional[str]
      
  • get_by_document_id(document_id: str) -> List[Vector]
    • Returns a list of Vector objects as above.
  • exists(vector_id: str) -> bool
    • Returns True if the vector exists, else False.
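
This page does not say what get() does when the ID is unknown, so one cautious option is to call exists() first. The sketch below is a hypothetical helper that takes any object exposing the documented vectors.exists and vectors.get methods. It costs two calls instead of one, which may matter for bulk lookups.

```python
from typing import Any, Optional


def fetch_if_exists(collection: Any, vector_id: str) -> Optional[Any]:
    """Return the vector, or None when exists() reports it is absent.

    Hypothetical helper; relies only on vectors.exists() and vectors.get().
    """
    if not collection.vectors.exists(vector_id):
        return None
    return collection.vectors.get(vector_id)
```

With the Quick Start collection, fetch_if_exists(collection, "vec_1") would return the Vector object and an unknown ID would yield None.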

Versions

The Versions class provides methods for version management.

current_version = collection.versions.get_current()
all_versions = collection.versions.list()

Methods:

  • list() -> dict
    • Returns:
      {
        "versions": [
          {
            "hash": str,
            "version_number": int,
            "timestamp": int,
            "vector_count": int
          },
          ...
        ],
        "current_hash": str
      }
      
  • get_current() -> Version
    • Returns a Version dataclass object with attributes:
      version.hash: str
      version.version_number: int
      version.timestamp: int
      version.vector_count: int
      version.created_at: datetime  # property for creation time
      
  • get(version_hash: str) -> Version
    • Returns a Version object with the same attributes as get_current().
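
The dictionary returned by list() can be matched against its own current_hash field to find the active version. The sketch below uses hypothetical helpers and a hand-written sample. It treats timestamp as Unix seconds in UTC, which is an assumption, since the unit is not stated here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def current_version_entry(listing: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entry whose hash equals current_hash, or None."""
    for entry in listing.get("versions", []):
        if entry["hash"] == listing.get("current_hash"):
            return entry
    return None


def created_at(entry: Dict[str, Any]) -> datetime:
    """Interpret timestamp as Unix seconds in UTC (an assumption)."""
    return datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc)


# Hand-written sample in the documented shape (not real server output).
sample = {
    "versions": [
        {"hash": "a1", "version_number": 1, "timestamp": 0, "vector_count": 10},
        {"hash": "b2", "version_number": 2, "timestamp": 86400, "vector_count": 25},
    ],
    "current_hash": "b2",
}
entry = current_version_entry(sample)
print(entry["version_number"], created_at(entry).isoformat())
# 2 1970-01-02T00:00:00+00:00
```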

Best Practices

  1. Connection Management

    • Reuse the client instance across your application
    • The client automatically handles authentication and token management

  2. Vector Operations

    • Use transactions for batch operations
    • The context manager (with statement) automatically handles commit/abort
    • Maximum batch size is 200 vectors per transaction

  3. Error Handling

    • All operations raise exceptions on failure
    • Use try/except blocks for error handling
    • Transactions automatically abort on exceptions when using the context manager

  4. Performance

    • Adjust index parameters based on your use case
    • Use appropriate vector dimensions
    • Consider batch sizes for large operations

  5. Version Management

    • Create versions before major changes
    • Use versions to track collection evolution
    • Clean up old versions when no longer needed
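
The 200-vector limit and the context-manager advice above combine naturally into a chunked upsert, one transaction per chunk. The sketch below is a hypothetical helper pair, not part of the SDK. It relies only on the documented collection.transaction() and batch_upsert_vectors() calls and lets any exception propagate to the caller.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_SIZE = 200  # per-transaction limit stated in the best practices


def chunked(items: List[Dict[str, Any]], size: int = MAX_BATCH_SIZE) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(collection: Any, vectors: List[Dict[str, Any]], size: int = MAX_BATCH_SIZE) -> int:
    """Upsert vectors using one transaction per chunk.

    Returns the number of chunks whose with-block completed. If a chunk
    raises, the exception propagates and later chunks are not attempted.
    """
    completed = 0
    for batch in chunked(vectors, size):
        with collection.transaction() as txn:
            txn.batch_upsert_vectors(batch)
        completed += 1
    return completed
```

Wrapping the call in try/except lets the caller decide whether to retry from the failed chunk. Chunks whose with-block already finished are not rolled back by a later failure in this sketch.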

License

This project is licensed under the MIT License - see the LICENSE file for details.
