Cosdata Python SDK

A Python SDK for interacting with the Cosdata Vector Database.

Installation

pip install cosdata-client

Quick Start

from cosdata import Client  # Import the Client class

# Initialize the client (all parameters are optional)
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",               # Default username
    password="admin",               # Default password
    verify=False                    # Whether to verify SSL certificates
)

# Create a collection
collection = client.create_collection(
    name="my_collection",
    dimension=768,                  # Vector dimension
    description="My vector collection"
)

# Create an index (all parameters are optional)
index = collection.create_index(
    distance_metric="cosine",       # Default: cosine
    num_layers=10,                  # Default: 10
    max_cache_size=1000,            # Default: 1000
    ef_construction=128,            # Default: 128
    ef_search=64,                   # Default: 64
    neighbors_count=32,             # Default: 32
    level_0_neighbors_count=64      # Default: 64
)

# Generate some vectors (example with random data)
import numpy as np

def generate_random_vector(idx: int, dimension: int) -> dict:
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{idx}",
        "dense_values": values,
        "document_id": f"doc_{idx // 10}",  # Group every 10 vectors into one document
        "metadata": {  # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }

# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]

# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:], max_workers=8, max_retries=3)

# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,                                  # Number of nearest neighbors
    return_raw_text=True
)

# Fetch a specific vector
vector = collection.vectors.get("vec_1")

# Get collection information
collection_info = collection.get_info()
print(f"Collection info: {collection_info}")

# List all collections
print("Available collections:")
for coll in client.collections():
    print(f" - {coll.name}")

# Version management
current_version = collection.versions.get_current()
print(f"Current version: {current_version}")
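Before upserting, it can help to check that every vector dictionary has the shape used in the Quick Start: a non-empty string `id` and a `dense_values` list whose length equals the collection's `dimension`. The helper below is a hypothetical, client-side sketch (`validate_vector` is not part of the SDK); it only inspects the dictionaries and never contacts the server. In the Quick Start you could run it over `vectors` with `dimension=768` before opening the transaction.

```python
import math
from typing import Any, Dict, List


def validate_vector(vector: Dict[str, Any], dimension: int) -> List[str]:
    """Return a list of problems found in a vector dict (empty if it looks valid).

    Hypothetical helper: checks only the fields used in the Quick Start
    ("id", "dense_values", and the optional "metadata").
    """
    problems: List[str] = []

    vector_id = vector.get("id")
    if not isinstance(vector_id, str) or not vector_id:
        problems.append("missing or non-string 'id'")

    values = vector.get("dense_values")
    if not isinstance(values, list):
        problems.append("'dense_values' must be a list of floats")
    else:
        if len(values) != dimension:
            problems.append(f"'dense_values' has length {len(values)}, expected {dimension}")
        # Reject non-numeric entries as well as NaN and infinity.
        if not all(isinstance(v, (int, float)) and math.isfinite(v) for v in values):
            problems.append("'dense_values' contains non-finite or non-numeric entries")

    metadata = vector.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("'metadata' must be a dict if present")

    return problems


# One well-formed vector and one with the wrong dimension.
good = {"id": "vec_0", "dense_values": [0.1, -0.2, 0.3], "document_id": "doc_0"}
bad = {"id": "vec_1", "dense_values": [0.1, 0.2]}
print(validate_vector(good, 3))  # []
print(validate_vector(bad, 3))
```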

🧩 Embedding Generation (Optional Convenience Feature)

Cosdata SDK provides a convenience utility for generating embeddings with cosdata-fastembed. It is optional: if you already have your own embeddings, you can upsert them directly. To generate embeddings in Python, use the following utility:

from cosdata.embedding import embed_texts

texts = [
    "Cosdata makes vector search easy!",
    "This is a test of the embedding utility."
]
embeddings = embed_texts(texts, model_name="thenlper/gte-base")  # Specify any supported model

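See the cosdata-fastembed list of supported models for available model names and their embedding dimensions. The model's output dimension should match the dimension of your collection: thenlper/gte-base produces 768-dimensional embeddings (matching the Quick Start collection), while the default BAAI/bge-small-en-v1.5 produces 384-dimensional embeddings.
The output is a list of lists (one embedding per input text), ready to upsert into your collection as dense_values.
If cosdata-fastembed is not installed, an ImportError is raised.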

Methods

embed_texts

embed_texts(texts: List[str], model_name: str = "BAAI/bge-small-en-v1.5") -> List[List[float]]
- Generates embeddings for a list of texts using cosdata-fastembed. Returns a list of embedding vectors (as plain Python lists). Raises ImportError if cosdata-fastembed is not installed.
Example:
```
from cosdata.embedding import embed_texts
embeddings = embed_texts(["hello world"], model_name="thenlper/gte-base")
```
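Because `embed_texts` returns plain lists of floats, turning its output into upsert-ready dictionaries only requires pairing each embedding with an ID. The sketch below uses a hypothetical helper (`to_vector_dicts` is not an SDK function) that builds dictionaries with the `id`, `dense_values`, and `document_id` keys used in the Quick Start; the ID scheme is an arbitrary choice for illustration, and stand-in embeddings are used so the example runs without cosdata-fastembed installed.

```python
from typing import Any, Dict, List, Optional


def to_vector_dicts(
    embeddings: List[List[float]],
    id_prefix: str = "text",
    document_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Pair each embedding with a generated ID (hypothetical helper)."""
    vectors: List[Dict[str, Any]] = []
    for i, values in enumerate(embeddings):
        vector: Dict[str, Any] = {"id": f"{id_prefix}_{i}", "dense_values": list(values)}
        if document_id is not None:
            vector["document_id"] = document_id
        vectors.append(vector)
    return vectors


# Stand-in for: embeddings = embed_texts(texts, model_name="thenlper/gte-base")
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
vectors = to_vector_dicts(embeddings, id_prefix="intro", document_id="doc_intro")
print([v["id"] for v in vectors])  # ['intro_0', 'intro_1']

# With a real collection, upsert them as in the Quick Start:
# with collection.transaction() as txn:
#     txn.batch_upsert_vectors(vectors)
```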

API Reference

Client

The main client for interacting with the Vector Database API.

client = Client(
    host="http://127.0.0.1:8443",  # Optional
    username="admin",               # Optional
    password="admin",               # Optional
    verify=False                    # Optional
)

Methods:

create_collection(...) -> Collection

Returns a Collection object. Collection info can be accessed via collection.get_info():

{
  "name": str,
  "description": str,
  "dense_vector": {"enabled": bool, "dimension": int},
  "sparse_vector": {"enabled": bool},
  "tf_idf_options": {"enabled": bool}
}

collections() -> List[Collection]
- Returns a list of Collection objects.
get_collection(name: str) -> Collection
- Returns a Collection object for the given name.
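The dictionary returned by `collection.get_info()` is enough to check which vector types a collection accepts and which dense dimension it expects, for example before generating embeddings for it. The helpers below are hypothetical (not SDK functions) and run on a sample dictionary in the documented shape; with a live server you would pass the result of `collection.get_info()` instead.

```python
from typing import Any, Dict, List, Optional


def enabled_features(info: Dict[str, Any]) -> List[str]:
    """Return the vector types whose "enabled" flag is true (hypothetical helper)."""
    return [
        key
        for key in ("dense_vector", "sparse_vector", "tf_idf_options")
        if (info.get(key) or {}).get("enabled")
    ]


def dense_dimension(info: Dict[str, Any]) -> Optional[int]:
    """Return the dense dimension, or None if dense vectors are disabled."""
    dense = info.get("dense_vector") or {}
    return dense.get("dimension") if dense.get("enabled") else None


# Illustrative sample in the documented get_info() shape.
info = {
    "name": "my_collection",
    "description": "My vector collection",
    "dense_vector": {"enabled": True, "dimension": 768},
    "sparse_vector": {"enabled": False},
    "tf_idf_options": {"enabled": False},
}
print(enabled_features(info))  # ['dense_vector']
print(dense_dimension(info))  # 768
```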

Collection

The Collection class provides access to all collection-specific operations.

collection = client.create_collection(
    name="my_collection",
    dimension=768,
    description="My collection"
)

Methods:

create_index(...) -> Index
- Returns an Index object. Index info can be fetched (if implemented) as:
```
{
  "dense": {...},
  "sparse": {...},
  "tf-idf": {...}
}
```
create_sparse_index(...) -> Index
create_tf_idf_index(...) -> Index
get_index(name: str) -> Index
get_info() -> dict
- Returns collection metadata as above.
delete() -> None
load() -> None
unload() -> None
transaction() -> Transaction (context manager)

Transaction

The Transaction class groups vector writes (single and batch upserts) and lets you commit or abort them.

with collection.transaction() as txn:
    txn.upsert_vector(vector)  # Single vector
    txn.batch_upsert_vectors(vectors, max_workers=8, max_retries=3)  # Multiple vectors, with parallelism and retries

Methods:

upsert_vector(vector: Dict[str, Any]) -> None
batch_upsert_vectors(vectors: List[Dict[str, Any]], max_workers: Optional[int] = None, max_retries: int = 3) -> None
- vectors: List of vector dictionaries to upsert
- max_workers: Number of threads to use for parallel upserts (default: all available CPU threads)
- max_retries: Number of times to retry a failed batch (default: 3)
commit() -> None
abort() -> None
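The Best Practices section gives a maximum of 200 vectors per transaction. If you have more vectors than that, one option is to split them into chunks and open one transaction per chunk. The `chunked` generator below is a plain-Python sketch, not part of the SDK; the commented loop shows how it might be combined with `collection.transaction()` and `batch_upsert_vectors`. Each chunk is its own transaction, so a failure part-way through would presumably leave earlier chunks committed.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

MAX_VECTORS_PER_TRANSACTION = 200  # Limit stated in the Best Practices section


def chunked(
    items: Iterable[Dict[str, Any]],
    size: int = MAX_VECTORS_PER_TRANSACTION,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


vectors = [{"id": f"vec_{i}", "dense_values": [0.0, 1.0]} for i in range(450)]
print([len(chunk) for chunk in chunked(vectors)])  # [200, 200, 50]

# With a real collection (requires the cosdata SDK and a running server):
# for chunk in chunked(vectors):
#     with collection.transaction() as txn:
#         txn.batch_upsert_vectors(chunk, max_workers=8, max_retries=3)
```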

Search

The Search class provides methods for vector similarity search.

results = collection.search.dense(
    query_vector=vector,
    top_k=5,
    return_raw_text=True
)

Methods:

dense(query_vector: List[float], top_k: int = 5, return_raw_text: bool = False) -> dict

Returns:

{
  "results": [
    {
      "id": str,
      "document_id": str,
      "score": float,
      "text": str | None
    },
    ...
  ]
}

sparse(query_terms: List[dict], top_k: int = 5, early_terminate_threshold: float = 0.0, return_raw_text: bool = False) -> dict
- Same structure as above.
text(query_text: str, top_k: int = 5, return_raw_text: bool = False) -> dict
- Same structure as above.
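Because `dense`, `sparse`, and `text` all return the same dictionary shape, result handling can be shared. The sketch below uses hypothetical helpers (not SDK functions) that keep hits at or above a score threshold and group the surviving vector IDs by `document_id`. It assumes that a higher `score` means a closer match, which you should confirm for the distance metric configured on your index; the sample `results` dictionary is made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def filter_hits(results: Dict[str, Any], min_score: float) -> List[Dict[str, Any]]:
    """Return hits whose score is at least `min_score`, best first.

    Assumes higher scores are better (verify this for your distance metric).
    """
    hits = [hit for hit in results.get("results", []) if hit["score"] >= min_score]
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


def group_by_document(hits: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each document_id to the IDs of its matching vectors."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        grouped[hit["document_id"]].append(hit["id"])
    return dict(grouped)


# Illustrative response in the documented shape (values are made up).
results = {
    "results": [
        {"id": "vec_3", "document_id": "doc_0", "score": 0.91, "text": None},
        {"id": "vec_12", "document_id": "doc_1", "score": 0.42, "text": None},
        {"id": "vec_7", "document_id": "doc_0", "score": 0.88, "text": None},
    ]
}
strong = filter_hits(results, min_score=0.5)
print(group_by_document(strong))  # {'doc_0': ['vec_3', 'vec_7']}
```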

Vectors

The Vectors class provides methods for retrieving stored vectors and checking whether a vector exists.

vector = collection.vectors.get("vec_1")
exists = collection.vectors.exists("vec_1")

Methods:

get(vector_id: str) -> Vector

Returns a Vector dataclass object with attributes:

vector.id: str
vector.document_id: Optional[str]
vector.dense_values: Optional[List[float]]
vector.sparse_indices: Optional[List[int]]
vector.sparse_values: Optional[List[float]]
vector.text: Optional[str]

get_by_document_id(document_id: str) -> List[Vector]
- Returns a list of Vector objects as above.
exists(vector_id: str) -> bool
- Returns True if the vector exists, else False.
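The `dense_values` attribute returned by `get` is a plain Python list, so two stored vectors can be compared locally, for example to spot-check results from an index created with `distance_metric="cosine"`. The function below is a standard cosine-similarity sketch in pure Python; it is not an SDK method, and the server's own scores may be normalized or scaled differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (between -1 and 1)."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors, approximately 1.0

# With real data (requires the SDK and a running server):
# v1 = collection.vectors.get("vec_1")
# v2 = collection.vectors.get("vec_2")
# if v1.dense_values and v2.dense_values:
#     print(cosine_similarity(v1.dense_values, v2.dense_values))
```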

Versions

The Versions class provides methods for version management.

current_version = collection.versions.get_current()
all_versions = collection.versions.list()

Methods:

list() -> dict

Returns:

{
  "versions": [
    {
      "hash": str,
      "version_number": int,
      "timestamp": int,
      "vector_count": int
    },
    ...
  ],
  "current_hash": str
}

get_current() -> Version

Returns a Version dataclass object with attributes:

version.hash: str
version.version_number: int
version.timestamp: int
version.vector_count: int
version.created_at: datetime  # property for creation time

get(version_hash: str) -> Version
- Same as above.
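The dictionary returned by `list()` can be inspected without further API calls. The sketch below uses hypothetical helpers that order entries by `version_number` and pick out the entry whose `hash` equals `current_hash`. It also converts `timestamp` to a `datetime`, assuming the value is a Unix timestamp in seconds; when you have a `Version` object, its `created_at` property is the SDK's own conversion and should be preferred. The sample data is made up.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def sorted_versions(listing: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return version entries ordered from oldest to newest version_number."""
    return sorted(listing.get("versions", []), key=lambda v: v["version_number"])


def current_entry(listing: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entry whose hash matches current_hash, or None."""
    current = listing.get("current_hash")
    return next((v for v in listing.get("versions", []) if v["hash"] == current), None)


def to_datetime(timestamp: int) -> datetime:
    """Assumes `timestamp` is Unix time in seconds (unverified assumption)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# Illustrative listing in the documented shape.
listing = {
    "versions": [
        {"hash": "b2", "version_number": 2, "timestamp": 1716163200, "vector_count": 150},
        {"hash": "a1", "version_number": 1, "timestamp": 1716076800, "vector_count": 100},
    ],
    "current_hash": "b2",
}
print([v["version_number"] for v in sorted_versions(listing)])  # [1, 2]
entry = current_entry(listing)
if entry is not None:
    print(entry["vector_count"], to_datetime(entry["timestamp"]).date())
```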

Best Practices

Connection Management
- Reuse the client instance across your application
- The client automatically handles authentication and token management
Vector Operations
- Use transactions for batch operations
- The context manager (with statement) automatically handles commit/abort
- Maximum batch size is 200 vectors per transaction
Error Handling
- All operations raise exceptions on failure
- Use try/except blocks for error handling
- Transactions automatically abort on exceptions when using the context manager
Performance
- Adjust index parameters based on your use case
- Use appropriate vector dimensions
- Consider batch sizes for large operations
Version Management
- Create versions before major changes
- Use versions to track collection evolution
- Clean up old versions when no longer needed
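The practices above state that the `with` statement commits on success and aborts when an exception is raised. The self-contained sketch below illustrates that general context-manager pattern with a hypothetical `SketchTransaction` class; it is not the SDK's `Transaction` implementation, which talks to the server and may differ in detail. Returning `False` from `__exit__` lets the original exception propagate, so your own `try`/`except` still sees it.

```python
from types import TracebackType
from typing import Any, Dict, List, Optional, Type


class SketchTransaction:
    """Illustrative stand-in: buffers upserts, commits on success, aborts on error."""

    def __init__(self) -> None:
        self.pending: List[Dict[str, Any]] = []
        self.state = "open"

    def upsert_vector(self, vector: Dict[str, Any]) -> None:
        self.pending.append(vector)

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.pending.clear()
        self.state = "aborted"

    def __enter__(self) -> "SketchTransaction":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        if exc_type is None:
            self.commit()
        else:
            self.abort()
        return False  # Do not swallow the exception; let the caller handle it.


ok = SketchTransaction()
with ok as txn:
    txn.upsert_vector({"id": "vec_0", "dense_values": [0.1, 0.2]})
print(ok.state)  # committed

failed = SketchTransaction()
try:
    with failed as txn:
        txn.upsert_vector({"id": "vec_1", "dense_values": [0.3, 0.4]})
        raise RuntimeError("simulated failure")
except RuntimeError as err:
    print(f"caught: {err}; transaction {failed.state}")
```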

License

This project is licensed under the MIT License - see the LICENSE file for details.


Release history

- 0.2.2 (this version): May 20, 2025
- 0.2.1: May 2, 2025
- 0.2.0: Apr 29, 2025
- 0.1.5: Apr 28, 2025
- 0.1.4: Mar 24, 2025
- 0.1.3: Mar 21, 2025

Download files


Source Distribution

cosdata_client-0.2.2.tar.gz (15.9 kB)

Uploaded May 20, 2025 Source

Built Distribution


cosdata_client-0.2.2-py3-none-any.whl (36.1 kB)

Uploaded May 20, 2025 Python 3

File details

Details for the file cosdata_client-0.2.2.tar.gz.

File metadata

Download URL: cosdata_client-0.2.2.tar.gz
Upload date: May 20, 2025
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for cosdata_client-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`d98030773d27b6785c00cb0d86dcb2f7e1dd08e349c0cb20d4e74f7a371cce02`
MD5	`85d26c1e7a7d7e90a7f49b857cb94fe6`
BLAKE2b-256	`4faf2f3ac948547edae4a29ef16ba2f12a9615199f5f651785825e353fbc36a9`


File details

Details for the file cosdata_client-0.2.2-py3-none-any.whl.

File metadata

Download URL: cosdata_client-0.2.2-py3-none-any.whl
Upload date: May 20, 2025
Size: 36.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for cosdata_client-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`174f4fe7c5861524d534e26de07a339ccea099908924fec252138798733ae92b`
MD5	`4ff2184b17aae391a4f3b70b1a597759`
BLAKE2b-256	`947d1dd65296f3ea34fb84063cbc99a59d0cc85c7f02b45c13ea2b9c9139c89c`

