High-performance vector database library for Python with multiple index types and metadata support
GigaVector
A high-performance vector database library for Python. GigaVector provides efficient similarity search with support for multiple index types, metadata filtering, and persistent storage.
Features
- Multiple index types: KD-tree, HNSW, and IVFPQ
- Distance metrics: Euclidean and Cosine similarity
- Rich metadata support with key-value pairs
- Metadata filtering in search queries
- Persistent storage with snapshot and WAL (Write-Ahead Log)
- Batch operations for vector insertion and search
- Thread-safe operations
Installation
Install from PyPI:
pip install gigavector
The package includes pre-built native libraries for supported platforms. Apart from cffi (see Requirements), no external dependencies are required.
Quick Start
from gigavector import Database, DistanceType, IndexType
# Create an in-memory database
with Database.open(None, dimension=128, index=IndexType.HNSW) as db:
    # Add vectors with metadata
    db.add_vector([0.1] * 128, metadata={"id": "vec1", "category": "A"})
    db.add_vector([0.2] * 128, metadata={"id": "vec2", "category": "B"})
    # Search for similar vectors
    hits = db.search([0.1] * 128, k=5, distance=DistanceType.EUCLIDEAN)
    for hit in hits:
        print(f"Distance: {hit.distance}, Metadata: {hit.vector.metadata}")
API Reference
Database
The main class for vector database operations.
Database.open(path, dimension, index=IndexType.KDTREE)
Create or open a database instance.
Parameters:
- path (str | None): File path for persistent storage. Use None for an in-memory database.
- dimension (int): Vector dimension (must be consistent for all vectors).
- index (IndexType): Index type to use. Defaults to IndexType.KDTREE.
Returns: Database instance
Example:
# In-memory database
db = Database.open(None, dimension=128, index=IndexType.HNSW)
# Persistent database
db = Database.open("vectors.db", dimension=128, index=IndexType.KDTREE)
add_vector(vector, metadata=None)
Add a single vector to the database.
Parameters:
- vector (Sequence[float]): Vector data as a sequence of floats. Length must match database dimension.
- metadata (dict[str, str] | None): Optional dictionary of key-value metadata pairs.
Raises:
- ValueError: If vector dimension doesn't match database dimension.
- RuntimeError: If insertion fails.
Example:
# Vector without metadata
db.add_vector([1.0, 2.0, 3.0])
# Vector with single metadata entry
db.add_vector([1.0, 2.0, 3.0], metadata={"id": "123"})
# Vector with multiple metadata entries
db.add_vector([1.0, 2.0, 3.0], metadata={
    "id": "123",
    "category": "electronics",
    "price": "99.99"
})
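Because metadata is typed as dict[str, str], numeric values such as a price presumably have to be converted to strings before insertion, as the example above does by hand. A minimal sketch of that conversion follows; the helper name to_str_metadata is hypothetical and not part of GigaVector, and dropping None values is a design choice of this sketch.

```python
from typing import Any, Mapping


def to_str_metadata(raw: Mapping[str, Any]) -> dict[str, str]:
    """Convert every key and value to str, dropping entries whose value is None."""
    return {str(key): str(value) for key, value in raw.items() if value is not None}


# Hypothetical record with mixed value types.
metadata = to_str_metadata(
    {"id": 123, "category": "electronics", "price": 99.99, "note": None}
)
print(metadata)
```

The resulting dictionary can then be passed as the metadata argument of add_vector.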
add_vectors(vectors)
Add multiple vectors to the database in batch. Vectors added via this method cannot include metadata.
Parameters:
- vectors (Iterable[Sequence[float]]): Iterable of vectors. All vectors must have the same dimension.
Raises:
- ValueError: If vectors have inconsistent dimensions.
- RuntimeError: If batch insertion fails.
Example:
vectors = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
]
db.add_vectors(vectors)
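Since add_vectors cannot attach metadata, code that sometimes has metadata and sometimes does not needs to choose between the two insertion methods. The sketch below shows one way to do that; insert_all is a hypothetical helper, and db is assumed to be any object exposing the add_vector and add_vectors methods documented here.

```python
from typing import Optional, Sequence


def insert_all(
    db,
    vectors: Sequence[Sequence[float]],
    metadatas: Optional[Sequence[Optional[dict]]] = None,
) -> int:
    """Insert vectors, using the batch path only when no metadata is supplied."""
    vectors = list(vectors)
    if metadatas is None:
        # No metadata at all: a single batch call is enough.
        db.add_vectors(vectors)
        return len(vectors)
    if len(metadatas) != len(vectors):
        raise ValueError("vectors and metadatas must have the same length")
    # Metadata present: fall back to one add_vector call per vector.
    for vector, metadata in zip(vectors, metadatas):
        db.add_vector(vector, metadata=metadata)
    return len(vectors)
```

The per-vector path is likely slower than the batch path, so it is only taken when metadata is actually provided.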
search(query, k, distance=DistanceType.EUCLIDEAN, filter_metadata=None)
Search for k nearest neighbors to a query vector.
Parameters:
- query (Sequence[float]): Query vector. Length must match database dimension.
- k (int): Number of nearest neighbors to return.
- distance (DistanceType): Distance metric to use. Defaults to DistanceType.EUCLIDEAN.
- filter_metadata (tuple[str, str] | None): Optional metadata filter as (key, value) tuple. Only vectors matching the filter are considered.
Returns: list[SearchHit] - List of search results, ordered by distance (ascending).
Raises:
- ValueError: If query dimension doesn't match database dimension.
- RuntimeError: If search fails.
Example:
# Basic search
hits = db.search([1.0, 2.0, 3.0], k=5, distance=DistanceType.EUCLIDEAN)
# Search with metadata filter
hits = db.search(
    [1.0, 2.0, 3.0],
    k=5,
    distance=DistanceType.EUCLIDEAN,
    filter_metadata=("category", "electronics")
)
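The filter_metadata parameter is documented as a single (key, value) tuple. When several keys must match, one option is to filter the returned hits again on the client side. The sketch below assumes only the documented SearchHit shape (hit.vector.metadata is a dict); filter_hits is a hypothetical helper, not a GigaVector function.

```python
from typing import Iterable, Mapping


def filter_hits(hits: Iterable, required: Mapping[str, str]) -> list:
    """Keep hits whose metadata contains every (key, value) pair in `required`."""
    return [
        hit
        for hit in hits
        if all(hit.vector.metadata.get(key) == value for key, value in required.items())
    ]
```

Post-filtering can leave fewer than k results, so it may help to request a larger k from search and truncate the filtered list afterwards.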
search_batch(queries, k, distance=DistanceType.EUCLIDEAN)
Search for k nearest neighbors for multiple query vectors in batch.
Parameters:
- queries (Iterable[Sequence[float]]): Iterable of query vectors.
- k (int): Number of nearest neighbors to return per query.
- distance (DistanceType): Distance metric to use. Defaults to DistanceType.EUCLIDEAN.
Returns: list[list[SearchHit]] - List of search result lists, one per query.
Raises:
- ValueError: If any query dimension doesn't match database dimension.
- RuntimeError: If batch search fails.
Example:
queries = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0]
]
results = db.search_batch(queries, k=5)
for i, hits in enumerate(results):
    print(f"Query {i}: {len(hits)} results")
save(path=None)
Persist the database to a binary snapshot file. If path is given, the snapshot is written there; otherwise it is written to the path that was provided when opening the database.
Parameters:
- path (str | None): Optional file path. If None and the database was opened with a path, that path is used.
Raises:
RuntimeError: If save operation fails.
Example:
# Save to the path used when opening
db.save()
# Save to a different path
db.save("backup.db")
train_ivfpq(data)
Train the IVFPQ index with training vectors. Only applicable when using IndexType.IVFPQ.
Parameters:
- data (Sequence[Sequence[float]]): Training vectors. All vectors must match the database dimension.
Raises:
- ValueError: If training data is empty or dimensions don't match.
- RuntimeError: If training fails.
Example:
# Train with at least 256 vectors (recommended)
train_data = [[(i % 10) / 10.0 for _ in range(128)] for i in range(256)]
db.train_ivfpq(train_data)
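In practice the training vectors are often drawn from the data that will later be indexed. The sketch below picks a reproducible random subset and checks the two conditions that train_ivfpq is documented to reject (empty data and mismatched dimensions). The helper name sample_training_set and its seed parameter are hypothetical; the default of 256 follows the recommendation above.

```python
import random
from typing import Sequence


def sample_training_set(
    vectors: Sequence[Sequence[float]],
    dimension: int,
    size: int = 256,
    seed: int = 0,
) -> list:
    """Pick up to `size` vectors at random after checking their dimension."""
    if not vectors:
        raise ValueError("training data must not be empty")
    for vector in vectors:
        if len(vector) != dimension:
            raise ValueError(f"expected dimension {dimension}, got {len(vector)}")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(list(vectors), min(size, len(vectors)))
```

The returned list can be passed directly to train_ivfpq.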
close()
Close the database and release resources. Automatically called when using the context manager.
Example:
db = Database.open(None, dimension=128)
# ... use database ...
db.close()
IndexType
Enumeration of available index types.
- IndexType.KDTREE: KD-tree index. Good for low to medium dimensional data.
- IndexType.HNSW: Hierarchical Navigable Small World graph. Good for high-dimensional data with fast approximate search.
- IndexType.IVFPQ: Inverted File with Product Quantization. Memory-efficient for large-scale datasets. Requires training before use.
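Approximate indexes trade some accuracy for speed, so it can be useful to measure how many true nearest neighbors an index returns. The following pure-Python sketch computes exact neighbors by brute force and a recall score; it does not call GigaVector, and both function names are hypothetical.

```python
import math
from typing import Sequence


def brute_force_knn(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> list[int]:
    """Return the indices of the k vectors closest to `query` (Euclidean distance)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def recall_at_k(approx: Sequence[int], exact: Sequence[int]) -> float:
    """Fraction of the exact neighbors that the approximate result also found."""
    if not exact:
        return 1.0
    return len(set(approx) & set(exact)) / len(exact)
```

To compare against an index, one option is to store each vector's position under a metadata key such as "id" when inserting, then read it back from hit.vector.metadata to build the approximate id list.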
DistanceType
Enumeration of distance metrics.
- DistanceType.EUCLIDEAN: Euclidean (L2) distance.
- DistanceType.COSINE: Cosine similarity distance.
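For reference, the two metrics can be written out in plain Python. This is only an illustration of the math, not GigaVector's implementation. In particular, defining cosine distance as 1 minus cosine similarity is an assumption here, since the exact convention used by DistanceType.COSINE is not stated.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (assumed convention, see the note above)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to vector length, while the cosine form depends only on direction: under these definitions [1, 0] and [2, 0] are 1.0 apart in Euclidean terms but have a cosine distance of 0.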
Vector
Data class representing a vector with metadata.
Attributes:
- data (list[float]): Vector data.
- metadata (dict[str, str]): Dictionary of metadata key-value pairs.
SearchHit
Data class representing a search result.
Attributes:
- distance (float): Distance from the query vector.
- vector (Vector): The matched vector with its metadata.
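Search results are often passed on to other code as plain dictionaries, for example to serialize them as JSON. The sketch below relies only on the attributes listed above (distance, vector.metadata); hits_to_records is a hypothetical helper name.

```python
import json
from typing import Iterable


def hits_to_records(hits: Iterable) -> list[dict]:
    """Flatten SearchHit-like objects into JSON-friendly dictionaries."""
    return [
        {"rank": rank, "distance": float(hit.distance), "metadata": dict(hit.vector.metadata)}
        for rank, hit in enumerate(hits, start=1)
    ]


def hits_to_json(hits: Iterable) -> str:
    """Serialize the flattened records as a JSON array."""
    return json.dumps(hits_to_records(hits))
```

The vector data itself is left out of the records to keep them small; it can be added from hit.vector.data if needed.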
Usage Examples
Persistent Storage with WAL
from gigavector import Database, IndexType, DistanceType
# Create a persistent database
with Database.open("vectors.db", dimension=128, index=IndexType.KDTREE) as db:
    db.add_vector([0.1] * 128, metadata={"id": "1", "tag": "A"})
    db.add_vector([0.2] * 128, metadata={"id": "2", "tag": "B"})
    db.save()  # Create snapshot
# Reopen - WAL automatically replays any uncommitted changes
with Database.open("vectors.db", dimension=128, index=IndexType.KDTREE) as db:
    hits = db.search([0.1] * 128, k=5)
    # All vectors are restored, including metadata
IVFPQ Index with Training
from gigavector import Database, IndexType, DistanceType
import random
# Create IVFPQ database
db = Database.open(None, dimension=64, index=IndexType.IVFPQ)
# Generate training data (at least 256 vectors recommended)
train_data = [
    [random.random() for _ in range(64)]
    for _ in range(256)
]
db.train_ivfpq(train_data)
# Add vectors
with db:
    for i in range(1000):
        vec = [random.random() for _ in range(64)]
        db.add_vector(vec, metadata={"id": str(i)})
    # Search
    query = [random.random() for _ in range(64)]
    hits = db.search(query, k=10, distance=DistanceType.EUCLIDEAN)
Metadata Filtering
from gigavector import Database, IndexType, DistanceType
with Database.open(None, dimension=128, index=IndexType.HNSW) as db:
    # Add vectors with different categories
    db.add_vector([0.1] * 128, metadata={"category": "A", "price": "10"})
    db.add_vector([0.2] * 128, metadata={"category": "B", "price": "20"})
    db.add_vector([0.15] * 128, metadata={"category": "A", "price": "15"})
    # Search only in category A
    hits = db.search(
        [0.1] * 128,
        k=10,
        distance=DistanceType.EUCLIDEAN,
        filter_metadata=("category", "A")
    )
    # Returns only vectors with category="A"
Batch Operations
from gigavector import Database, IndexType, DistanceType
with Database.open(None, dimension=128, index=IndexType.KDTREE) as db:
    # Batch insert vectors (without metadata)
    vectors = [[i * 0.01] * 128 for i in range(1000)]
    db.add_vectors(vectors)
    # Batch search
    queries = [[i * 0.01] * 128 for i in range(10)]
    results = db.search_batch(queries, k=5)
    for i, hits in enumerate(results):
        print(f"Query {i}: {len(hits)} results")
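When the vectors come from a generator or a large file, building one huge list for add_vectors may not be practical. The sketch below feeds the data in fixed-size chunks instead. Both helpers are hypothetical, the chunk size of 1000 is an arbitrary choice, and db is assumed to be any object with the add_vectors method documented above.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence


def chunked(vectors: Iterable[Sequence[float]], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` vectors from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(vectors)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def add_in_chunks(db, vectors: Iterable[Sequence[float]], size: int = 1000) -> int:
    """Call db.add_vectors once per chunk and return the number of vectors added."""
    total = 0
    for chunk in chunked(vectors, size):
        db.add_vectors(chunk)
        total += len(chunk)
    return total
```

Only one chunk is held in memory at a time, at the cost of one add_vectors call per chunk.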
Requirements
- Python 3.9 or higher
- cffi >= 1.16
License
Licensed under the DBaJ-NC-CFL License. See LICENCE.md for details.
Links
- GitHub Repository: https://github.com/jaywyawhare/GigaVector