TinyVecDB is a high-performance, lightweight, embedded vector database for similarity search.
TinyVecDB Python API Documentation
This document provides a comprehensive overview of the TinyVecDB Python API.
Installation
pip install tinyvecdb
Core Concepts
TinyVecDB is an embedded vector database that emphasizes speed, low memory usage, and simplicity. The core of TinyVecDB is written in C, and this library provides a Python binding to that engine. The key concepts are:
- Embeddings: Fixed-dimension float vectors (e.g., 512 dimensions)
- Metadata: JSON-serializable data associated with each vector
- Similarity Search: Finding the nearest neighbors to a query vector using cosine similarity
- Filtering: Restricting searches and deletions to vectors whose metadata matches given criteria
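To make "similarity search using cosine similarity" concrete, the following pure-Python sketch scores every stored vector against a query and keeps the `top_k` best. It is a brute-force reference model for illustration only: this document does not describe how TinyVecDB's C engine organizes or scans its data, and the function names `cosine_similarity` and `brute_force_top_k` are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); ranges from -1 (opposite) to 1 (same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # this sketch scores zero vectors as 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    vectors: Dict[int, Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Score every stored vector and return the top_k (id, similarity) pairs, best first."""
    scored = ((vid, cosine_similarity(query, vec)) for vid, vec in vectors.items())
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


stored = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
print(brute_force_top_k([1.0, 0.2], stored, 2))
```

Here the query points mostly along the first axis, so vector 1 ranks first, vector 3 (at 45 degrees) second, and vector 2 (orthogonal-ish) is dropped.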
Basic Usage
```python
import asyncio

import numpy as np
import tinyvec


async def example():
    # Connect to the database (the file is created if it doesn't exist)
    client = tinyvec.TinyVecClient()
    config = tinyvec.ClientConfig(dimensions=512)
    client.connect("./vectors.db", config)

    # Create sample vectors
    insertions = []
    for i in range(50):
        # Using NumPy (more efficient)
        vec = np.random.rand(512).astype(np.float32)
        vec = vec / np.linalg.norm(vec)  # Normalize the vector
        # Or using standard Python lists (requires `import random`)
        # vec = [random.random() for _ in range(512)]
        insertions.append(tinyvec.Insertion(
            vector=vec,
            metadata={"name": f"item-{i}", "category": "example"}
        ))

    # Insert vectors
    inserted = await client.insert(insertions)
    print("Inserted:", inserted)

    # Search for similar vectors (without filtering)
    query_vec = np.random.rand(512).astype(np.float32)
    results = await client.search(query_vec, 5)
    # Example results (first three of five shown):
    # [SearchResult(similarity=0.801587700843811, id=8, metadata={'category': 'example', 'name': 'item-8'}),
    #  SearchResult(similarity=0.7834401726722717, id=16, metadata={'category': 'example', 'name': 'item-16'}),
    #  SearchResult(similarity=0.7815409898757935, id=5, metadata={'category': 'example', 'name': 'item-5'})]

    # Search with filtering
    search_options = tinyvec.SearchOptions(
        filter={"category": {"$eq": "example"}}
    )
    filtered_results = await client.search(query_vec, 5, search_options)
    print("Filtered results:", len(filtered_results))

    # Delete items by ID
    delete_result = await client.delete_by_ids([1, 2, 3])
    print(f"Deleted {delete_result.deleted_count} vectors. Success: {delete_result.success}")

    # Delete by metadata filter (every sample vector has category "example",
    # so this removes all remaining sample vectors)
    filter_result = await client.delete_by_filter(search_options)
    print(f"Deleted {filter_result.deleted_count} vectors by filter. Success: {filter_result.success}")

    # Get database statistics
    stats = await client.get_index_stats()
    print(f"Database has {stats.vector_count} vectors with {stats.dimensions} dimensions")


if __name__ == "__main__":
    asyncio.run(example())
```
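The example above normalizes each vector before inserting it. Assuming the engine computes true cosine similarity, as the Core Concepts section states, this step does not change rankings: cosine similarity ignores vector length, and for unit vectors it reduces to a plain dot product. The standalone sketch below checks both properties with hypothetical helpers `dot`, `normalize`, and `cosine`; it does not call TinyVecDB.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale vec to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


rng = random.Random(0)
a = [rng.random() for _ in range(512)]
b = [rng.random() for _ in range(512)]

# Scaling a vector does not change its cosine similarity to another vector.
print(math.isclose(cosine(a, b), cosine([3.0 * x for x in a], b)))
# After normalization, cosine similarity equals the plain dot product.
print(math.isclose(cosine(a, b), dot(normalize(a), normalize(b))))
```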
API Reference
TinyVecClient
The main class you'll interact with is TinyVecClient. It provides all methods for managing the vector database.
Constructor and Connection
TinyVecClient()
Creates a new TinyVecDB client instance.
Example:
client = tinyvec.TinyVecClient()
connect(path, config)
Connects to a TinyVecDB database.
Parameters:
- path: str - Path to the database file (the file is created if it does not exist)
- config: ClientConfig - Configuration options
Example:
config = tinyvec.ClientConfig(dimensions=512)
client.connect("./vectors.db", config)
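`ClientConfig(dimensions=512)` fixes the vector length for the database. This document does not say how the engine reacts to a vector of the wrong length, so a caller-side guard such as the hypothetical `check_dimensions` below can catch mismatches before any call reaches the engine. It accepts anything with a length, which covers Python lists and 1-D NumPy arrays.

```python
from collections.abc import Sized


def check_dimensions(vector: Sized, dimensions: int) -> None:
    """Hypothetical guard: raise ValueError unless len(vector) == dimensions."""
    if len(vector) != dimensions:
        raise ValueError(
            f"expected a {dimensions}-dimensional vector, got {len(vector)} values"
        )


check_dimensions([0.0] * 512, 512)  # passes silently
try:
    check_dimensions([0.0] * 256, 512)
except ValueError as err:
    print(err)
```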
Instance Methods
async insert(vectors)
Inserts vectors with metadata into the database. Each metadata item must be a JSON-serializable object.
Parameters:
- vectors: List[Insertion] - List of vectors to insert
Returns:
int - The number of vectors successfully inserted
Example:
vector = np.zeros(512, dtype=np.float32) + 0.1
count = await client.insert([
tinyvec.Insertion(
vector=vector,
metadata={"document_id": "doc1", "title": "Example Document", "category": "reference"}
)
])
# Example: count = 1
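The requirement that metadata be a JSON-serializable object can be checked up front with the standard `json` module. The hypothetical helper `is_valid_metadata` below is slightly stricter than a plain `json.dumps` call: it passes `allow_nan=False`, rejecting `NaN` and infinity because they are not valid JSON. Whether TinyVecDB itself accepts such values is not stated here, so treat that strictness as an assumption.

```python
import json
from typing import Any


def is_valid_metadata(metadata: Any) -> bool:
    """Hypothetical check: metadata must be a dict that serializes to strict JSON."""
    if not isinstance(metadata, dict):
        return False  # a JSON object is required, not a list or scalar
    try:
        json.dumps(metadata, allow_nan=False)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. set, datetime)
        # ValueError: NaN/infinity with allow_nan=False, or a circular reference
        return False
    return True


print(is_valid_metadata({"document_id": "doc1", "title": "Example Document"}))
print(is_valid_metadata({"tags": {"a", "b"}}))  # sets are not JSON-serializable
```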
async search(query_vector, top_k, search_options=None)
Searches for the most similar vectors to the query vector.
Parameters:
- query_vector: Union[List[float], np.ndarray] - The vector to search for. It can be either of the following types:
  - Python list of numbers
  - NumPy array (any numeric dtype)
  Internally, it is converted to a float32 array for similarity calculations.
- top_k: int - Number of results to return
- search_options: SearchOptions - Optional. Contains filter criteria for the search.
Returns:
List[SearchResult] - List of search results
Example:
# Search without filtering
results = await client.search(query_vector, 10)
# Example results (first three shown):
# [SearchResult(similarity=0.801587700843811, id=8, metadata={'id': 8}),
# SearchResult(similarity=0.7834401726722717, id=16, metadata={'id': 16}),
# SearchResult(similarity=0.7815409898757935, id=5, metadata={'id': 5})]
# Search with filtering
search_options = tinyvec.SearchOptions(
filter={"year": {"$eq": 2024}}
)
filtered_results = await client.search(query_vector, 10, search_options)
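The note that query vectors are converted to float32 has a practical consequence: values that are exact in Python's 64-bit floats can shift slightly. The sketch below imitates that conversion with the standard-library `array` module, whose `'f'` type code stores C `float` values (32-bit on mainstream platforms). The helper name `to_float32` is hypothetical, and it only models the conversion; it is not the binding's own code.

```python
from array import array
from typing import Iterable, List


def to_float32(values: Iterable[float]) -> List[float]:
    """Round-trip numbers through C float storage, approximating a float32 cast."""
    return list(array("f", values))


print(to_float32([1, 2, 3]))  # [1.0, 2.0, 3.0]
print(to_float32([0.1])[0])   # slightly above 0.1: 0.1 has no exact float32 form
```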
async delete_by_ids(ids)
Deletes vectors by their IDs.
Parameters:
- ids: List[int] - List of vector IDs to delete
Returns:
DeletionResult - Object containing deletion count and success status
Example:
result = await client.delete_by_ids([1, 2, 3])
print(f"Deleted {result.deleted_count} vectors. Success: {result.success}")
# Example output: Deleted 3 vectors. Success: True
async delete_by_filter(search_options)
Deletes vectors that match the given filter criteria.
Parameters:
- search_options: SearchOptions - Contains filter criteria for deletion
Returns:
DeletionResult - Object containing deletion count and success status
Example:
search_options = tinyvec.SearchOptions(
filter={"year": {"$eq": 2024}}
)
result = await client.delete_by_filter(search_options)
print(f"Deleted {result.deleted_count} vectors. Success: {result.success}")
async get_index_stats()
Retrieves statistics about the database.
Parameters:
- None
Returns:
Stats - Database statistics
Example:
stats = await client.get_index_stats()
print(
f"Database has {stats.vector_count} vectors with {stats.dimensions} dimensions"
)
# Example output: Database has 47 vectors with 512 dimensions
Supporting Classes
DeletionResult
Result from delete operations.
Properties:
- deleted_count: int - Number of vectors deleted
- success: bool - Whether the operation was successful
Example:
```python
result = await client.delete_by_ids([1, 2, 3])
if result.success:
    print(f"Successfully deleted {result.deleted_count} vectors")
else:
    print("Deletion operation failed")
# Example output: Successfully deleted 3 vectors
```
ClientConfig
Configuration for the vector database.
Parameters:
- dimensions: int - The dimensionality of vectors to be stored
Example:
config = tinyvec.ClientConfig(dimensions=512)
Insertion
Class representing a vector to be inserted.
Parameters:
- vector: Union[List[float], np.ndarray] - The vector data
- metadata: Dict - JSON-serializable metadata associated with the vector
Example:
insertion = tinyvec.Insertion(
vector=np.random.rand(512).astype(np.float32),
metadata={"category": "example"}
)
SearchOptions
Options for search queries, including filtering.
Parameters:
- filter: Dict - Filter criteria in MongoDB-like query syntax
Available filter operators:
- $eq: Matches values equal to a specified value
- $gt: Matches values greater than a specified value
- $gte: Matches values greater than or equal to a specified value
- $in: Matches any of the values specified in an array
- $lt: Matches values less than a specified value
- $lte: Matches values less than or equal to a specified value
- $ne: Matches values not equal to a specified value
- $nin: Matches none of the values specified in an array
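Filters can be nested: a nested object in the filter matches the corresponding field inside nested metadata, as in the complex filter example below.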
Example:
# Simple filter
search_options = tinyvec.SearchOptions(
filter={"make": {"$eq": "Toyota"}}
)
# Complex nested filter
search_options = tinyvec.SearchOptions(
filter={
"category": {
"subcategory": {
"value": {"$gt": 200}
}
},
"tags": {"$in": ["premium", "featured"]}
}
)
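To make the operator semantics concrete, here is a small pure-Python evaluator that checks one metadata object against a filter written in the syntax above. It is a hypothetical model, not TinyVecDB's implementation, and it makes MongoDB-style assumptions that this document does not confirm: top-level clauses are ANDed, an array value matches `$in` when any of its elements is listed, a nested object descends into nested metadata, and a missing field or a type mismatch (such as comparing a string with a number) never matches.

```python
from typing import Any, Callable, Dict, List


def _in(value: Any, options: List[Any]) -> bool:
    """Assumed $in semantics: scalars must be listed; arrays need any listed element."""
    if isinstance(value, list):
        return any(item in options for item in value)
    return value in options


OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$in": _in,
    "$nin": lambda value, arg: not _in(value, arg),
}

_MISSING = object()  # sentinel for "field not present"


def matches(metadata: Any, flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every clause in flt (clauses are ANDed)."""
    for key, cond in flt.items():
        value = metadata.get(key, _MISSING) if isinstance(metadata, dict) else _MISSING
        if not isinstance(cond, dict):
            raise ValueError(f"clause for {key!r} must be an object")
        if cond and all(op.startswith("$") for op in cond):
            # Operator clause such as {"$gt": 200}
            if value is _MISSING:
                return False
            for op, arg in cond.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator {op!r}")
                try:
                    if not OPERATORS[op](value, arg):
                        return False
                except TypeError:
                    return False  # e.g. comparing a string with a number
        elif not matches(value, cond):
            # Nested clause: descend into the nested metadata object
            return False
    return True


doc = {"make": "Toyota", "category": {"subcategory": {"value": 250}}, "tags": ["premium", "new"]}
print(matches(doc, {"make": {"$eq": "Toyota"}}))
print(matches(doc, {
    "category": {"subcategory": {"value": {"$gt": 200}}},
    "tags": {"$in": ["premium", "featured"]},
}))
```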
SearchResult
Class representing a search result.
Properties:
- similarity: float - Cosine similarity score
- id: int - ID of the matched vector
- metadata: Dict | None - Metadata associated with the matched vector
Example:
```python
# Results from a search query
for result in results:
    print(f"ID: {result.id}, Similarity: {result.similarity}, Metadata: {result.metadata}")
# Example output:
# ID: 34, Similarity: 0.8109967303276062, Metadata: {'category': 'example', 'name': 'item-34'}
# ID: 46, Similarity: 0.789353609085083, Metadata: {'category': 'example', 'name': 'item-46'}
# ID: 22, Similarity: 0.7870827913284302, Metadata: {'category': 'example', 'name': 'item-22'}
```
Stats
Class representing database statistics.
Properties:
- vector_count: int - Number of vectors in the database
- dimensions: int - Dimensionality of the vectors
Example:
stats = await client.get_index_stats()
print(f"Vector count: {stats.vector_count}, Dimensions: {stats.dimensions}")
Download files
Details for the file tinyvecdb-0.2.2.tar.gz.
File metadata
- Download URL: tinyvecdb-0.2.2.tar.gz
- Upload date:
- Size: 2.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79ef34d1dc711c2e4b9f0a7666984188daae2b99e3ab809cd5f6796b9b361b38 |
| MD5 | ccd3485da80ee1e96f6f69c64f1ffd9f |
| BLAKE2b-256 | 95a8ea3a319993744b2d88c584d0e05eb9e7217d16734ad8b8aae41421f07326 |
File details
Details for the file tinyvecdb-0.2.2-py3-none-win_amd64.whl.
File metadata
- Download URL: tinyvecdb-0.2.2-py3-none-win_amd64.whl
- Upload date:
- Size: 754.6 kB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 903259b7317d3220a8809f0bdadbcc96ea14831c5af6c27ffa43f0036eda8f32 |
| MD5 | 90c1beba8c1828cfb22df875171a5111 |
| BLAKE2b-256 | fe9df9c57cd36b574e75abbf665018bec6f9e8270f3dc8a144776e6d5edff85d |
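The SHA256 digests listed above let you confirm that a downloaded file is intact. A minimal standard-library sketch, with hypothetical helper names `sha256_of_file` and `verify_download`, reads the file in chunks so large archives need not fit in memory:

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.strip().lower())


# Usage (assumes the sdist was downloaded to the current directory):
# verify_download("tinyvecdb-0.2.2.tar.gz",
#                 "79ef34d1dc711c2e4b9f0a7666984188daae2b99e3ab809cd5f6796b9b361b38")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).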