Python client for VectorCache Go DB server
VectorCache
VectorCache is a high-performance, in-memory vector database with gRPC and HTTP support. It allows storing, retrieving, and searching high-dimensional vector embeddings efficiently. The project provides a Go server and Python clients with type-safe APIs.
Features
- ✅ Vector storage and retrieval with user-defined IDs and metadata
- ✅ gRPC and HTTP clients for Python
- ✅ Multiple index types: Flat L2, Flat Inner Product (cosine similarity)
- ✅ Thread-safe in-memory database
- ✅ Graceful shutdown and logging
- ✅ Automatic server binary management for Python clients
Usage
Go Server
> [!WARNING]
> The Go server is not intended to be used as a standalone service; it is started and managed automatically by the Python clients.

```bash
go run cmd/main.go --port 8000 --indexType flatL2 --dim 3 --protocol grpc
```
| Flag | Description | Default |
|---|---|---|
| `--port` | Port to listen on | `8000` |
| `--indexType` | Type of index (`flatL2`, `flatIP`) | `flatL2` |
| `--dim` | Vector dimension | `3` |
| `--protocol` | Communication protocol (`grpc`, `http`) | `grpc` |
Installation
Via PyPI (Recommended)
```bash
pip install vector-cache-memory
```
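To verify the install, a quick import check (a minimal sketch; it assumes the package exposes `VectorCache` at the top level, as the examples below do):

```python
# Sanity check: the import should succeed after installation
from vector_cache import VectorCache

print(VectorCache)
```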
Quick Start
1. Standalone In-Memory Cache
Store and search vectors with minimal setup:
```python
import asyncio
from vector_cache import VectorCache

async def main():
    # Initialize cache (server starts automatically)
    cache = VectorCache(dim=768)  # 768-dimensional vectors

    # Store embeddings with metadata
    response = cache.set(
        emb=[0.1, 0.2, 0.3, ...],  # Your embedding
        data={"title": "Hello World", "url": "https://example.com"}
    )
    print(f"Stored vector: {response.uid}")

    # Search for similar vectors
    results = await cache.search([0.1, 0.2, 0.3, ...], top_k=5)
    for record in results:
        print(f"Match: {record.data['title']} (score: {record.score:.3f})")

asyncio.run(main())
```
2. With Pinecone Integration
Use VectorCache as a high-speed cache layer in front of Pinecone:
```python
import asyncio
from vector_cache import VectorCache, IndexType

async def main():
    cache = VectorCache(
        dim=1536,                     # OpenAI embedding dimension
        indexType=IndexType.FLAT_IP,  # Cosine-like similarity
        cacheCapacity=10000,          # Keep 10k vectors in RAM
        cache_thresold=0.85,          # Use cache if score >= 0.85
        # Pinecone configuration
        primarydb="pinecone",
        api_key="your-pinecone-api-key",
        index_name="your-index"
    )

    # Search: checks cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=5)
    print(results)

asyncio.run(main())
```
3. Batch Operations (High Throughput)
```python
import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        protocol="grpc",      # Faster than HTTP
        cacheCapacity=50000,
    )

    # Batch search multiple queries in parallel
    queries = [[0.1, 0.2, ...], [0.3, 0.4, ...], [0.5, 0.6, ...]]
    tasks = [cache.search(q, top_k=5) for q in queries]
    all_results = await asyncio.gather(*tasks)

    for results in all_results:
        print(f"Got {len(results)} matches")

asyncio.run(main())
```
Core Concepts
VectorCache Class
The main entry point. It manages:
- The local vector database server
- Client-side search/store operations
- Automatic lifecycle management (start on init, stop on exit)
```python
from vector_cache import VectorCache, Protocol, IndexType, EvictionPolicy

cache = VectorCache(
    port=6379,                            # Server port
    dim=512,                              # Vector dimensionality
    indexType=IndexType.FLAT_L2,          # L2 distance, or FLAT_IP for cosine
    protocol=Protocol.GRPC,               # gRPC (faster) or HTTP
    eviction_policy=EvictionPolicy.FIFO,  # Cache eviction strategy
    cacheCapacity=1000,                   # Max vectors in memory
    cache_thresold=0.8,                   # Use cache if similarity > 0.8
    log="stdout",                         # Log output
)
```
RecordData
Returned by search(). Contains the matched vector and its metadata:
```python
from vector_cache import RecordData

# Structure:
# RecordData(
#     uid: str,              # Unique ID for this vector
#     emb: List[float],      # The stored embedding
#     data: Dict[str, Any],  # Your metadata
#     score: float           # Similarity score
# )

results = await cache.search(query, top_k=3)
for r in results:
    print(f"ID: {r.uid}, Score: {r.score:.3f}, Data: {r.data}")
```
SetResponseData
Returned by set(). Indicates insertion status:
```python
from vector_cache import SetResponseData

response = cache.set(emb, data)
# response.uid: the assigned ID (save for later reference)
# response.status: "success" or error message

if response and response.status == "success":
    print(f"Stored with ID: {response.uid}")
```
Configuration Guide
Vector Dimensionality
Set `dim` to match your embeddings (a short sketch follows the list below):
- 768: BERT, sentence-transformers
- 1536: OpenAI text-embedding-ada-002
- 512: Custom embeddings (default)
- 1024: Cohere, Jina embeddings
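Whichever model you use, `dim` must equal the length of the vectors you pass to `set` and `search`. A minimal sketch (the `embed` stub below is a placeholder for your real embedding model, not part of this package):

```python
from vector_cache import VectorCache

def embed(text: str) -> list[float]:
    # Placeholder for your real embedding model (e.g. a sentence-transformer)
    return [0.0] * 768

embedding = embed("hello world")
cache = VectorCache(dim=len(embedding))  # dim must match the embedding length
```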
Index Type
Choose based on your embedding type:
| IndexType | Use Case | Similarity Metric |
|---|---|---|
| `FLAT_L2` | Raw embeddings, image features | Euclidean distance |
| `FLAT_IP` | Normalized embeddings, cosine-like | Inner product |
```python
# For normalized text embeddings (cosine similarity)
cache = VectorCache(indexType=IndexType.FLAT_IP)

# For raw feature vectors (L2 distance)
cache = VectorCache(indexType=IndexType.FLAT_L2)
```
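Note that inner-product search behaves like cosine similarity only when the stored and query vectors are unit-normalized. A minimal normalization sketch (the `normalize` helper is illustrative, not part of the library):

```python
import math

from vector_cache import VectorCache, IndexType

def normalize(v):
    # Scale to unit length so inner product matches cosine similarity
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

cache = VectorCache(dim=3, indexType=IndexType.FLAT_IP)
cache.set(emb=normalize([0.1, 0.2, 0.3]), data={"title": "normalized doc"})
```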
Cache Capacity & Eviction
```python
cache = VectorCache(
    cacheCapacity=10000,                  # Keep up to 10k vectors
    # Memory ≈ vectors × dim × 4 bytes
    # 10k vectors × 768 dim × 4 ≈ 30 MB
    eviction_policy=EvictionPolicy.FIFO,  # Remove oldest when full
)
```
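As a back-of-the-envelope check of the formula above (assuming float32 vectors and ignoring metadata overhead):

```python
def estimate_cache_memory_bytes(capacity: int, dim: int) -> int:
    # vectors × dimension × 4 bytes per float32 value
    return capacity * dim * 4

print(estimate_cache_memory_bytes(10_000, 768) / 1e6, "MB")  # ≈ 30.7 MB
```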
Protocol Selection
```python
# gRPC (default, recommended for production)
cache = VectorCache(protocol=Protocol.GRPC)  # Lower latency, binary

# HTTP (good for debugging, firewall-friendly)
cache = VectorCache(protocol=Protocol.HTTP)  # Easy to inspect with curl
```
Integrations
Pinecone
Combine VectorCache's low-latency caching with Pinecone's durability:
```python
cache = VectorCache(
    dim=1536,
    primarydb="pinecone",
    api_key="pcn_...",
    index_name="my-index",
    cache_thresold=0.85,  # Only use cache if score > 0.85
    cacheCapacity=5000,   # Keep recent/hot results cached
)

# Search workflow:
# 1. Query the VectorCache in-memory cache
# 2. If cache miss or low score, query Pinecone
# 3. Asynchronously populate the cache with Pinecone results
results = await cache.search(query, top_k=10)
```
Best Practices:
- Keep `cache_thresold` high (0.8+) to minimize stale data
- Store lightweight metadata in the cache; keep large blobs in external storage
- Monitor the cache hit rate and adjust capacity based on workload (a rough client-side sketch follows below)
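The client does not document a built-in hit-rate metric, so here is a rough client-side approximation, following the same heuristic as the "Cache with Pinecone Backup" example further below (results scoring at or above the threshold are counted as cache hits):

```python
async def search_with_hit_rate(cache, query, top_k=10, threshold=0.85):
    # Hypothetical helper: counts results whose similarity score clears
    # the cache threshold and reports the approximate hit rate.
    results = await cache.search(query, top_k=top_k)
    hits = sum(1 for r in results if r.score >= threshold)
    return results, hits / max(len(results), 1)
```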
API Reference
VectorCache Methods
set(emb: List[float], data: Dict[str, Any]) -> SetResponseData
Store a vector in the cache.
```python
response = cache.set(
    emb=[0.1, 0.2, 0.3],
    data={"id": "doc1", "title": "My Document"}
)
print(response.uid)  # Unique ID for the vector
```
Parameters:
- `emb` (List[float]): The embedding vector. Must match the configured dimension.
- `data` (Dict[str, Any]): Arbitrary metadata (optional, defaults to `{}`).
Returns: SetResponseData with uid (assigned ID) and status ("success" or error).
search(emb: List[float], top_k: int = 5, namespace: str = "", filter: Optional[Dict] = None) -> List[RecordData]
Search for similar vectors.
```python
results = await cache.search(
    emb=[0.1, 0.2, 0.3],
    top_k=10,
    namespace="documents",       # For Pinecone filtering
    filter={"category": "news"}  # Metadata filter (Pinecone)
)

for record in results:
    print(f"Score: {record.score:.3f}, Data: {record.data}")
```
Parameters:
- `emb` (List[float]): Query vector.
- `top_k` (int, optional): Number of results. Default: 5.
- `namespace` (str, optional): Pinecone namespace filter. Default: `""`.
- `filter` (Optional[Dict], optional): Pinecone metadata filter. Default: `None`.
Returns: List of RecordData, sorted by similarity score (best first).
Performance Tips
Memory Optimization
```python
cache = VectorCache(
    cacheCapacity=5000,  # Smaller cache = lower memory
    dim=384,             # Smaller dimension = lower memory
    # Typical memory: 5000 × 384 × 4 bytes ≈ 7 MB
)
```
Latency Optimization
```python
cache = VectorCache(
    protocol=Protocol.GRPC,  # Binary protocol is faster
    cacheCapacity=50000,     # Larger cache = fewer fallbacks to the primary DB
    cache_thresold=0.9,      # More likely to use cached results
)
```
Batch Processing
```python
# Parallel searches for high throughput
queries = [embedding1, embedding2, embedding3, ...]
results = await asyncio.gather(
    *[cache.search(q, top_k=10) for q in queries]
)
```
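For very large batches, you may want to cap the number of in-flight requests. A minimal sketch using `asyncio.Semaphore` (the limit of 32 is an arbitrary assumption; tune it for your workload):

```python
import asyncio

async def bounded_search(cache, queries, top_k=10, max_concurrency=32):
    # Cap concurrent searches so a huge batch doesn't overwhelm the server
    sem = asyncio.Semaphore(max_concurrency)

    async def one(query):
        async with sem:
            return await cache.search(query, top_k=top_k)

    return await asyncio.gather(*(one(q) for q in queries))
```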
Examples
Real-World: Document Search with Metadata
```python
import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        dim=768,
        cacheCapacity=100000,
    )

    # Index documents
    documents = [
        {"id": "doc1", "title": "Python 101", "content": "..."},
        {"id": "doc2", "title": "Go Guide", "content": "..."},
        {"id": "doc3", "title": "Rust Book", "content": "..."},
    ]

    # Simulate embeddings (in reality, use a model)
    embeddings = [
        [0.1, 0.2, 0.3, ...],  # Python doc
        [0.4, 0.5, 0.6, ...],  # Go doc
        [0.7, 0.8, 0.9, ...],  # Rust doc
    ]

    for doc, emb in zip(documents, embeddings):
        cache.set(emb, {
            "id": doc["id"],
            "title": doc["title"],
            "preview": doc["content"][:100]
        })

    # Search
    query_embedding = [0.15, 0.25, 0.35, ...]  # Query for Python docs
    results = await cache.search(query_embedding, top_k=2)
    for r in results:
        print(f"Found: {r.data['title']} (similarity: {r.score:.2f})")

asyncio.run(main())
```
Advanced: Cache with Pinecone Backup
```python
import asyncio
from vector_cache import VectorCache, IndexType

async def hybrid_search():
    # Set up cache + Pinecone
    cache = VectorCache(
        dim=1536,
        indexType=IndexType.FLAT_IP,
        cacheCapacity=20000,
        cache_thresold=0.8,
        primarydb="pinecone",
        api_key="your-api-key",
        index_name="prod-index",
    )

    # Add to cache
    response = cache.set(
        [0.1, ...],
        {"doc_id": "123", "updated": "2024-01-01"}
    )

    # Search: uses cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=20)

    # Analyze results
    cache_hits = sum(1 for r in results if r.score > 0.8)
    print(f"Cache hits: {cache_hits}, Pinecone results: {len(results) - cache_hits}")

asyncio.run(hybrid_search())
```
Troubleshooting
Server Won't Start
Problem: "Failed to start VectorCache server"
Solution:
- Check if the port is already in use: `lsof -i :6379`
- Try a different port: `VectorCache(port=6380)`
- Verify the server binary is accessible: `echo $PATH`
High Memory Usage
Problem: Cache is using too much RAM
Solution:
- Reduce `cacheCapacity`
- Use a smaller embedding dimension if possible
- Enable log rotation for server logs
Slow Search
Problem: Searches are taking >100ms
Solution:
- Use `protocol=Protocol.GRPC` (faster than HTTP)
- Increase `cacheCapacity` to improve the hit rate
- Profile your query embedding dimensions
Made with ❤️ by Abhishek Maurya & Kinjal Raykarmakar