TinyVecDB is a high-performance, lightweight, embedded vector database for similarity search.

TinyVecDB Python API Documentation

This document provides a comprehensive overview of the TinyVecDB Python API.

Installation

pip install tinyvecdb

Core Concepts

TinyVecDB is an embedded vector database that emphasizes speed, low memory usage, and simplicity. The core of TinyVecDB is written in C, and this library provides a Python binding to that engine. The key concepts are:

  • Embeddings: Fixed-dimension float vectors (e.g., 512 dimensions)
  • Metadata: JSON-serializable data associated with each vector
  • Similarity Search: Finding the nearest neighbors to a query vector using cosine similarity
  • Filtering: Query vectors based on metadata attributes
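To make the similarity search concept concrete, here is a minimal pure-Python sketch of cosine similarity and a brute-force top-k scan. It only illustrates the idea; it is not the C engine's implementation, and the helper names `cosine_similarity` and `brute_force_top_k` are hypothetical and not part of the TinyVecDB API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Return the k (similarity, index) pairs with the highest similarity."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top = brute_force_top_k([1.0, 0.0], stored, 2)
print(top)  # index 0 ranks first, index 2 second
```

When vectors are normalized to unit length beforehand, as in the Basic Usage example, the cosine similarity reduces to a plain dot product.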

Basic Usage

import asyncio
import numpy as np
import tinyvec

async def example():
    # Connect to database (will create the file if it doesn't exist)
    client = tinyvec.TinyVecClient()
    config = tinyvec.ClientConfig(dimensions=512)
    client.connect("./vectors.db", config)

    # Create sample vectors
    insertions = []
    for i in range(50):
        # Using NumPy (more efficient)
        vec = np.random.rand(512).astype(np.float32)
        vec = vec / np.linalg.norm(vec)  # Normalize the vector

        # Or using standard Python lists
        # vec = [random.random() for _ in range(512)]  # needs "import random"

        insertions.append(tinyvec.Insertion(
            vector=vec,
            metadata={"name": f"item-{i}", "category": "example"}
        ))

    # Insert vectors
    inserted = await client.insert(insertions)
    print("Inserted:", inserted)

    # Search for similar vectors (without filtering)
    query_vec = np.random.rand(512).astype(np.float32)
    results = await client.search(query_vec, 5)
    # Example results (first three shown):
    # [SearchResult(similarity=0.801587700843811, id=8, metadata={'category': 'example', 'name': 'item-8'}),
    #  SearchResult(similarity=0.7834401726722717, id=16, metadata={'category': 'example', 'name': 'item-16'}),
    #  SearchResult(similarity=0.7815409898757935, id=5, metadata={'category': 'example', 'name': 'item-5'})]

    # Search with filtering
    search_options = tinyvec.SearchOptions(
        filter={"category": {"$eq": "example"}}
    )
    filtered_results = await client.search(query_vec, 5, search_options)

    # Delete items by ID
    delete_result = await client.delete_by_ids([1, 2, 3])
    print(f"Deleted {delete_result.deleted_count} vectors. Success: {delete_result.success}")

    # Delete by metadata filter
    filter_result = await client.delete_by_filter(search_options)
    print(f"Deleted {filter_result.deleted_count} vectors by filter. Success: {filter_result.success}")

    # Get database statistics
    stats = await client.get_index_stats()
    print(f"Database has {stats.vector_count} vectors with {stats.dimensions} dimensions")

if __name__ == "__main__":
    asyncio.run(example())

API Reference

TinyVecClient

The main class you'll interact with is TinyVecClient. It provides all methods for managing the vector database.

Constructor and Connection

TinyVecClient()

Creates a new TinyVecDB client instance.

Example:

client = tinyvec.TinyVecClient()

connect(path, config)

Connects to a TinyVecDB database.

Parameters:

  • path: str - Path to the database file
  • config: ClientConfig - Configuration options

Example:

config = tinyvec.ClientConfig(dimensions=512)
client.connect("./vectors.db", config)

Instance Methods

async insert(vectors)

Inserts vectors with metadata into the database. Each metadata item must be a JSON-serializable object.

Parameters:

  • vectors: List[Insertion] - List of vectors to insert

Returns:

  • int - The number of vectors successfully inserted

Example:

vector = np.zeros(512, dtype=np.float32) + 0.1
count = await client.insert([
    tinyvec.Insertion(
        vector=vector,
        metadata={"document_id": "doc1", "title": "Example Document", "category": "reference"}
    )
])
# Example: count = 1

async search(query_vector, top_k, search_options=None)

Searches for the most similar vectors to the query vector.

Parameters:

  • query_vector: Union[List[float], np.ndarray]
    A query vector to search for, which can be any of the following types:

    • Python list of numbers
    • NumPy array (any numeric dtype)

    Internally, it will be converted to a float32 array for similarity calculations.

  • top_k: int - Number of results to return

  • search_options: SearchOptions - Optional. Contains filter criteria for the search.

Returns:

  • List[SearchResult] - List of search results

Example:

# Search without filtering
results = await client.search(query_vector, 10)
# Example results (first three shown):
# [SearchResult(similarity=0.801587700843811, id=8, metadata={'id': 8}),
#  SearchResult(similarity=0.7834401726722717, id=16, metadata={'id': 16}),
#  SearchResult(similarity=0.7815409898757935, id=5, metadata={'id': 5})]

# Search with filtering
search_options = tinyvec.SearchOptions(
    filter={"year": {"$eq": 2024}}
)
filtered_results = await client.search(query_vector, 10, search_options)

async delete_by_ids(ids)

Deletes vectors by their IDs.

Parameters:

  • ids: List[int] - List of vector IDs to delete

Returns:

  • DeletionResult - Object containing deletion count and success status

Example:

result = await client.delete_by_ids([1, 2, 3])
print(f"Deleted {result.deleted_count} vectors. Success: {result.success}")
# Example output: Deleted 3 vectors. Success: True

async delete_by_filter(search_options)

Deletes vectors that match the given filter criteria.

Parameters:

  • search_options: SearchOptions - Contains filter criteria for deletion

Returns:

  • DeletionResult - Object containing deletion count and success status

Example:

search_options = tinyvec.SearchOptions(
    filter={"year": {"$eq": 2024}}
)
result = await client.delete_by_filter(search_options)
print(f"Deleted {result.deleted_count} vectors. Success: {result.success}")

async update_by_id(items)

Updates existing database entries with new metadata and/or vector values.

Parameters:

  • items: List[UpdateItem] - List of items to update, each containing ID and optional metadata/vector data

Returns:

  • UpdateResult - Object containing update count and success status

Example:

import numpy as np
import tinyvec

# Create update items
update_items = [
    tinyvec.UpdateItem(
        id=1,
        vector=np.random.rand(512).astype(np.float32),  # 512 matches the ClientConfig used in the other examples
        metadata={"category": "electronics", "in_stock": True}
    ),
    tinyvec.UpdateItem(
        id=42,
        metadata={"price": 99.99, "featured": True}
    ),
    tinyvec.UpdateItem(
        id=75,
        vector=np.random.rand(512).astype(np.float32)
    )
]

# Update entries in the database
result = await client.update_by_id(update_items)
print(f"Updated {result.updated_count} entries. Success: {result.success}")

async get_index_stats()

Retrieves statistics about the database.

Parameters:

  • None

Returns:

  • Stats - Database statistics

Example:

stats = await client.get_index_stats()
print(f"Database has {stats.vector_count} vectors with {stats.dimensions} dimensions")
# Example output: Database has 47 vectors with 512 dimensions

Supporting Classes

DeletionResult

Result from delete operations.

Properties:

  • deleted_count: int - Number of vectors deleted
  • success: bool - Whether the operation was successful

Example:

result = await client.delete_by_ids([1, 2, 3])
if result.success:
    print(f"Successfully deleted {result.deleted_count} vectors")
else:
    print("Deletion operation failed")
# Example output: Successfully deleted 3 vectors

ClientConfig

Configuration for the vector database.

Parameters:

  • dimensions: int - The dimensionality of vectors to be stored

Example:

config = tinyvec.ClientConfig(dimensions=512)

Insertion

Class representing a vector to be inserted.

Parameters:

  • vector: Union[List[float], np.ndarray] - The vector data
  • metadata: Dict - JSON-serializable metadata associated with the vector

Example:

insertion = tinyvec.Insertion(
    vector=np.random.rand(512).astype(np.float32),
    metadata={"category": "example"}
)
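Because vectors are fixed-dimension and metadata must be JSON-serializable, it can help to check a payload before building an Insertion. The following is a minimal sketch; `validate_insertion` is a hypothetical helper and not part of the TinyVecDB API, and how the library itself reports invalid input is not covered here.

```python
import json

def validate_insertion(vector, metadata, dimensions):
    """Raise ValueError if the payload does not fit the documented constraints."""
    # Works for Python lists and one-dimensional NumPy arrays alike.
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")
    try:
        # json.dumps raises TypeError for unsupported types (e.g. sets)
        # and ValueError for circular references.
        json.dumps(metadata)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"metadata is not JSON-serializable: {exc}") from exc

validate_insertion([0.1] * 512, {"category": "example"}, 512)  # passes silently
```

Running the check on the caller's side keeps error messages close to the data that caused them.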

SearchOptions

Options for search queries, including filtering.

Parameters:

  • filter: Dict - Filter criteria in MongoDB-like query syntax

Available filter operators:

  • $eq: Matches values equal to a specified value
  • $gt: Matches values greater than a specified value
  • $gte: Matches values greater than or equal to a specified value
  • $in: Matches any values specified in an array
  • $lt: Matches values less than a specified value
  • $lte: Matches values less than or equal to a specified value
  • $ne: Matches values not equal to a specified value
  • $nin: Matches none of the values specified in an array

Filters can be nested for complex queries.

Example:

# Simple filter
search_options = tinyvec.SearchOptions(
    filter={"make": {"$eq": "Toyota"}}
)

# Complex nested filter
search_options = tinyvec.SearchOptions(
    filter={
        "category": {
            "subcategory": {
                "value": {"$gt": 200}
            }
        },
        "tags": {"$in": ["premium", "featured"]}
    }
)
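To show how such operator filters can be read, here is a small pure-Python sketch that evaluates a filter against one metadata dictionary. It is an illustration of the documented syntax under stated assumptions, not TinyVecDB's actual matching code: `OPERATORS` and `matches` are hypothetical names, all top-level conditions are assumed to be combined with AND, and only scalar field values are handled (how list-valued fields interact with `$in` is not specified here).

```python
OPERATORS = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
}

def matches(metadata, filter_spec):
    """Return True if metadata satisfies every condition in filter_spec."""
    for key, condition in filter_spec.items():
        if key in OPERATORS:
            # Operator applied to the value at the current nesting level.
            if not OPERATORS[key](metadata, condition):
                return False
        else:
            # Field name: descend into the nested metadata, if present.
            if not isinstance(metadata, dict) or key not in metadata:
                return False
            if not matches(metadata[key], condition):
                return False
    return True

item = {"make": "Toyota", "category": {"subcategory": {"value": 250}}}
print(matches(item, {"make": {"$eq": "Toyota"}}))
print(matches(item, {"category": {"subcategory": {"value": {"$gt": 200}}}}))
```

In this sketch a missing field never matches, which is one possible convention among several.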

SearchResult

Class representing a search result.

Properties:

  • similarity: float - Cosine similarity score
  • id: int - ID of the matched vector
  • metadata: Dict | None - Metadata associated with the matched vector

Example:

# Results from a search query
for result in results:
    print(f"ID: {result.id}, Similarity: {result.similarity}, Metadata: {result.metadata}")
# Example output:
# ID: 34, Similarity: 0.8109967303276062, Metadata: {'category': 'example', 'name': 'item-34'}
# ID: 46, Similarity: 0.789353609085083, Metadata: {'category': 'example', 'name': 'item-46'}
# ID: 22, Similarity: 0.7870827913284302, Metadata: {'category': 'example', 'name': 'item-22'}
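A search always returns up to top_k results, so callers may want to drop weak matches afterwards. The sketch below applies a minimum-similarity cutoff; `FakeSearchResult` is a stand-in that only mirrors the documented properties (it is not the real class), and `above_threshold` and the cutoff of 0.75 are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeSearchResult:
    """Stand-in with the documented SearchResult properties (not the real class)."""
    similarity: float
    id: int
    metadata: Optional[dict]

def above_threshold(results, min_similarity):
    """Keep only results whose similarity is at least min_similarity."""
    return [r for r in results if r.similarity >= min_similarity]

results = [
    FakeSearchResult(0.81, 34, {"name": "item-34"}),
    FakeSearchResult(0.79, 46, {"name": "item-46"}),
    FakeSearchResult(0.42, 22, None),  # metadata may be None
]
strong = above_threshold(results, 0.75)
ids = [r.id for r in strong]
print(ids)  # [34, 46]
```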

Stats

Class representing database statistics.

Properties:

  • vector_count: int - Number of vectors in the database
  • dimensions: int - Dimensionality of the vectors

Example:

stats = await client.get_index_stats()
print(f"Vector count: {stats.vector_count}, Dimensions: {stats.dimensions}")
