
TinyVecDB is a high-performance, lightweight, embedded vector database for similarity search.

Project description

TinyVecDB Python API Documentation

This document provides a comprehensive overview of the TinyVecDB Python API.


Installation

pip install tinyvecdb

Core Concepts

TinyVecDB is an embedded vector database that emphasizes speed, low memory usage, and simplicity. The core of TinyVecDB is written in C, and this library provides a Python binding to that engine. The key concepts are:

  • Embeddings: Fixed-dimension float vectors (e.g., 512 dimensions)
  • Metadata: JSON-serializable data associated with each vector
  • Similarity Search: Finding the nearest neighbors to a query vector using cosine similarity
  • Filtering: Query vectors based on metadata attributes
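To make the similarity-search concept concrete, here is a brute-force NumPy sketch of what a cosine-similarity top-k query computes. It is an illustration only, not TinyVecDB's C implementation, and the helper name `cosine_top_k` is hypothetical.

```python
from typing import List, Tuple

import numpy as np


def cosine_top_k(stored: np.ndarray, query: np.ndarray, top_k: int) -> List[Tuple[int, float]]:
    """Return (row index, cosine similarity) pairs for the top_k most similar rows.

    Hypothetical brute-force helper for illustration; not part of the tinyvec package.
    """
    stored = np.asarray(stored, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Cosine similarity is the dot product of L2-normalized vectors.
    stored_norm = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = stored_norm @ query_norm
    # argsort sorts ascending, so reverse it to put the highest similarities first.
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in order]


rng = np.random.default_rng(0)
vectors = rng.random((50, 512), dtype=np.float32)
print(cosine_top_k(vectors, vectors[8], 3))  # row 8 ranks first with similarity ~1.0
```

Because cosine similarity divides out vector length, scaling or normalizing a vector before insertion (as the Basic Usage example does) does not change the ranking.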

Basic Usage

import asyncio
import numpy as np
import tinyvec

async def example():
    # Connect to database (will create the file if it doesn't exist)
    client = tinyvec.TinyVecClient()
    config = tinyvec.ClientConfig(dimensions=512)
    client.connect("./vectors.db", config)

    # Create sample vectors
    insertions = []
    for i in range(50):
        # Using NumPy (more efficient)
        vec = np.random.rand(512).astype(np.float32)
        vec = vec / np.linalg.norm(vec)  # Normalize the vector

        # Or using standard Python lists (requires `import random`)
        # vec = [random.random() for _ in range(512)]

        insertions.append(tinyvec.Insertion(
            vector=vec,
            metadata={"name": f"item-{i}", "category": "example"}
        ))

    # Insert vectors
    inserted = await client.insert(insertions)
    print("Inserted:", inserted)

    # Search for similar vectors (without filtering)
    query_vec = np.random.rand(512).astype(np.float32)
    results = await client.search(query_vec, 5)
    # Example results (first three of the five returned):
    # [SearchResult(similarity=0.801587700843811, id=8, metadata={'category': 'example', 'name': 'item-8'}),
    #  SearchResult(similarity=0.7834401726722717, id=16, metadata={'category': 'example', 'name': 'item-16'}),
    #  SearchResult(similarity=0.7815409898757935, id=5, metadata={'category': 'example', 'name': 'item-5'})]

    # Search with filtering
    search_options = tinyvec.SearchOptions(
        filter={"category": {"$eq": "example"}}
    )
    filtered_results = await client.search(query_vec, 5, search_options)

    # Delete items by ID
    delete_result = await client.delete_by_ids([1, 2, 3])
    print(f"Deleted {delete_result.deleted_count} vectors. Success: {delete_result.success}")

    # Delete by metadata filter
    filter_result = await client.delete_by_filter(search_options)
    print(f"Deleted {filter_result.deleted_count} vectors by filter. Success: {filter_result.success}")

    # Get database statistics
    stats = await client.get_index_stats()
    print(f"Database has {stats.vector_count} vectors with {stats.dimensions} dimensions")

if __name__ == "__main__":
    asyncio.run(example())

API Reference

TinyVecClient

The main class you'll interact with is TinyVecClient. It provides all methods for managing the vector database.

Constructor and Connection

TinyVecClient()

Creates a new TinyVecDB client instance.

Example:

client = tinyvec.TinyVecClient()

connect(path, config)

Connects to a TinyVecDB database.

Parameters:

  • path: str - Path to the database file
  • config: ClientConfig - Configuration options

Example:

config = tinyvec.ClientConfig(dimensions=512)
client.connect("./vectors.db", config)

Instance Methods

async insert(vectors)

Inserts vectors with metadata into the database. Each metadata item must be a JSON-serializable object.
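The JSON-serializability requirement can be checked before calling insert. A minimal standard-library sketch follows; the helper `is_valid_metadata` is hypothetical, and reading "JSON-serializable object" as "a dict that `json.dumps` accepts" is an assumption about how tinyvec validates metadata.

```python
import json
from typing import Any


def is_valid_metadata(metadata: Any) -> bool:
    """Return True if metadata is a dict that the standard json module can encode.

    Hypothetical pre-insert check; tinyvec may enforce its own rules.
    """
    if not isinstance(metadata, dict):
        return False  # a JSON object corresponds to a Python dict
    try:
        json.dumps(metadata)
    except (TypeError, ValueError):  # unsupported types, circular references
        return False
    return True


print(is_valid_metadata({"name": "item-1", "tags": ["a", "b"], "score": 0.5}))  # True
print(is_valid_metadata({"created": {1, 2, 3}}))  # False: sets are not JSON
```

Note that NumPy `float32` scalars (for example, values read from a float32 array) are rejected by `json.dumps`; convert them with `float()` before putting them in metadata.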

Parameters:

  • vectors: List[Insertion] - List of vectors to insert

Returns:

  • int - The number of vectors successfully inserted

Example:

vector = np.full(512, 0.1, dtype=np.float32)
count = await client.insert([
    tinyvec.Insertion(
        vector=vector,
        metadata={"document_id": "doc1", "title": "Example Document", "category": "reference"}
    )
])
# Example: count = 1

async search(query_vector, top_k, search_options=None)

Searches for the most similar vectors to the query vector.

Parameters:

  • query_vector: Union[List[float], np.ndarray]
    A query vector to search for, which can be any of the following types:

    • Python list of numbers
    • NumPy array (any numeric dtype)

    Internally, it will be converted to a float32 array for similarity calculations.

  • top_k: int - Number of results to return

  • search_options: SearchOptions - Optional. Contains filter criteria for the search.
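The parameter description says query vectors are converted to float32 internally. The sketch below reproduces that conversion in NumPy and adds a length check against the `dimensions` value given to `ClientConfig`; the helper name `prepare_vector` and its `ValueError` behavior are assumptions for illustration, not documented tinyvec behavior.

```python
from typing import Sequence, Union

import numpy as np


def prepare_vector(vec: Union[Sequence[float], np.ndarray], dimensions: int) -> np.ndarray:
    """Convert a Python list or any-dtype NumPy array to a 1-D float32 array.

    Hypothetical helper: mirrors the float32 conversion described for search,
    plus a length check against the configured dimensions.
    """
    arr = np.asarray(vec, dtype=np.float32)
    if arr.ndim != 1 or arr.shape[0] != dimensions:
        raise ValueError(f"expected a 1-D vector of length {dimensions}, got shape {arr.shape}")
    return arr


q = prepare_vector(list(range(512)), dimensions=512)  # Python list of ints
print(q.dtype, q.shape)  # float32 (512,)
```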

Returns:

  • List[SearchResult] - List of search results

Example:

# Search without filtering
results = await client.search(query_vector, 10)
# Example results:
# [SearchResult(similarity=0.801587700843811, id=8, metadata={'id': 8}),
#  SearchResult(similarity=0.7834401726722717, id=16, metadata={'id': 16}),
#  SearchResult(similarity=0.7815409898757935, id=5, metadata={'id': 5})]

# Search with filtering
search_options = tinyvec.SearchOptions(
    filter={"year": {"$eq": 2024}}
)
filtered_results = await client.search(query_vector, 10, search_options)

async delete_by_ids(ids)

Deletes vectors by their IDs.

Parameters:

  • ids: List[int] - List of vector IDs to delete

Returns:

  • DeletionResult - Object containing deletion count and success status

Example:

result = await client.delete_by_ids([1, 2, 3])
print(f"Deleted {result.deleted_count} vectors. Success: {result.success}")
# Example output: Deleted 3 vectors. Success: True

async delete_by_filter(search_options)

Deletes vectors that match the given filter criteria.

Parameters:

  • search_options: SearchOptions - Contains filter criteria for deletion

Returns:

  • DeletionResult - Object containing deletion count and success status

Example:

search_options = tinyvec.SearchOptions(
    filter={"year": {"$eq": 2024}}
)
result = await client.delete_by_filter(search_options)
print(f"Deleted {result.deleted_count} vectors. Success: {result.success}")

async get_index_stats()

Retrieves statistics about the database.

Parameters:

  • None

Returns:

  • Stats - Database statistics

Example:

stats = await client.get_index_stats()
print(
  f"Database has {stats.vector_count} vectors with {stats.dimensions} dimensions"
)
# Example output: Database has 47 vectors with 512 dimensions

Supporting Classes

DeletionResult

Result from delete operations.

Properties:

  • deleted_count: int - Number of vectors deleted
  • success: bool - Whether the operation was successful

Example:

result = await client.delete_by_ids([1, 2, 3])
if result.success:
    print(f"Successfully deleted {result.deleted_count} vectors")
else:
    print("Deletion operation failed")
# Example output: Successfully deleted 3 vectors

ClientConfig

Configuration for the vector database.

Parameters:

  • dimensions: int - The dimensionality of vectors to be stored

Example:

config = tinyvec.ClientConfig(dimensions=512)

Insertion

Class representing a vector to be inserted.

Parameters:

  • vector: Union[List[float], np.ndarray] - The vector data
  • metadata: Dict - JSON-serializable metadata associated with the vector

Example:

insertion = tinyvec.Insertion(
    vector=np.random.rand(512).astype(np.float32),
    metadata={"category": "example"}
)

SearchOptions

Options for search queries, including filtering.

Parameters:

  • filter: Dict - Filter criteria in MongoDB-like query syntax

Available filter operators:

  • $eq: Matches values equal to a specified value
  • $gt: Matches values greater than a specified value
  • $gte: Matches values greater than or equal to a specified value
  • $in: Matches any of the values specified in an array
  • $lt: Matches values less than a specified value
  • $lte: Matches values less than or equal to a specified value
  • $ne: Matches values not equal to a specified value
  • $nin: Matches none of the values specified in an array

Filters can be nested to query fields inside nested metadata objects, and a single filter can combine conditions on several fields.

Example:

# Simple filter
search_options = tinyvec.SearchOptions(
    filter={"make": {"$eq": "Toyota"}}
)

# Complex nested filter
search_options = tinyvec.SearchOptions(
    filter={
        "category": {
            "subcategory": {
                "value": {"$gt": 200}
            }
        },
        "tags": {"$in": ["premium", "featured"]}
    }
)
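As a rough mental model of the MongoDB-like operators listed above, here is a pure-Python sketch that evaluates a filter against one metadata dict. The function `matches` is hypothetical; requiring every top-level key to match (AND), treating a plain nested dict as a path into nested metadata, and letting a list-valued field satisfy `$in` when any element is listed are assumptions borrowed from MongoDB, not confirmed tinyvec semantics.

```python
from typing import Any, Callable, Dict, List

_MISSING = object()  # sentinel for a field absent from the metadata


def _in(value: Any, options: List[Any]) -> bool:
    # MongoDB-style assumption: a list-valued field matches if any element is listed.
    if isinstance(value, list):
        return any(v in options for v in value)
    return value in options


def _ordered(op: Callable[[Any, Any], bool]) -> Callable[[Any, Any], bool]:
    # Ordering comparisons are False for missing fields or incomparable types.
    def check(value: Any, arg: Any) -> bool:
        if value is _MISSING:
            return False
        try:
            return op(value, arg)
        except TypeError:
            return False
    return check


OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": _ordered(lambda a, b: a > b),
    "$gte": _ordered(lambda a, b: a >= b),
    "$lt": _ordered(lambda a, b: a < b),
    "$lte": _ordered(lambda a, b: a <= b),
    "$in": _in,
    "$nin": lambda value, arg: not _in(value, arg),
}


def matches(metadata: Dict[str, Any], filter_: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in filter_ (hypothetical)."""
    for key, condition in filter_.items():
        if not isinstance(condition, dict):
            raise ValueError(f"condition for {key!r} must be a dict")
        value = metadata.get(key, _MISSING)
        if condition and all(k.startswith("$") for k in condition):
            # Operator object such as {"$gt": 200}: every operator must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        else:
            # Plain object: treat it as a path into nested metadata.
            nested = value if isinstance(value, dict) else {}
            if not matches(nested, condition):
                return False
    return True


record = {"category": {"subcategory": {"value": 250}}, "tags": ["premium", "new"]}
nested_filter = {
    "category": {"subcategory": {"value": {"$gt": 200}}},
    "tags": {"$in": ["premium", "featured"]},
}
print(matches(record, nested_filter))  # True
```

An unknown operator raises `KeyError` in this sketch; the real engine may handle it differently.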

SearchResult

Class representing a search result.

Properties:

  • similarity: float - Cosine similarity score
  • id: int - ID of the matched vector
  • metadata: Dict | None - Metadata associated with the matched vector

Example:

# Results from a search query
for result in results:
    print(f"ID: {result.id}, Similarity: {result.similarity}, Metadata: {result.metadata}")
# Example output:
# ID: 34, Similarity: 0.8109967303276062, Metadata: {'category': 'example', 'name': 'item-34'}
# ID: 46, Similarity: 0.789353609085083, Metadata: {'category': 'example', 'name': 'item-46'}
# ID: 22, Similarity: 0.7870827913284302, Metadata: {'category': 'example', 'name': 'item-22'}

Stats

Class representing database statistics.

Properties:

  • vector_count: int - Number of vectors in the database
  • dimensions: int - Dimensionality of the vectors

Example:

stats = await client.get_index_stats()
print(f"Vector count: {stats.vector_count}, Dimensions: {stats.dimensions}")



Download files

Download the file for your platform.

Source Distribution

tinyvecdb-0.2.0.tar.gz (2.6 MB)

Source

Built Distribution


tinyvecdb-0.2.0-py3-none-win_amd64.whl (754.5 kB)

Python 3, Windows x86-64

File details

Details for the file tinyvecdb-0.2.0.tar.gz.

File metadata

  • Download URL: tinyvecdb-0.2.0.tar.gz
  • Upload date:
  • Size: 2.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for tinyvecdb-0.2.0.tar.gz
Algorithm Hash digest
SHA256 d7ae93682b8ffa51a7705fd40da6ec2c0ca491dc95bba92f7e8fdf851a729cfd
MD5 2e774250ee911823bbdf7cdeba7d161b
BLAKE2b-256 8f55abf47d8acccf3af1304fcdb485f7daaf9524b5fddf6c764d20340d9058b6


File details

Details for the file tinyvecdb-0.2.0-py3-none-win_amd64.whl.

File metadata

  • Download URL: tinyvecdb-0.2.0-py3-none-win_amd64.whl
  • Upload date:
  • Size: 754.5 kB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for tinyvecdb-0.2.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 ee73bde9fe415721929040bcdcd5499bcce30f7b9f030f07be718c4c9c046a07
MD5 8a6cad87e79c9bf41856c1c9139a7761
BLAKE2b-256 f40874500a96a4777d7edc6f060d03ccf90102a25697e29a28d8a857847cd168

