
Python client for VectorCache Go DB server

Reason this release was yanked:

cache server issue

Project description

VectorCache

VectorCache is a high-performance, in-memory vector database with gRPC and HTTP support. It allows storing, retrieving, and searching high-dimensional vector embeddings efficiently. The project provides a Go server and Python clients with type-safe APIs.

Features

  • ✅ Vector storage and retrieval with user-defined IDs and metadata
  • ✅ gRPC and HTTP clients for Python
  • ✅ Multiple index types: Flat L2, Flat Inner Product (cosine similarity)
  • ✅ Thread-safe in-memory database
  • ✅ Graceful shutdown and logging
  • ✅ Automatic server binary management for Python clients

Usage

Go Server

[!WARNING]
The Go server is not intended to be run standalone; the Python clients manage the server binary automatically.

go run cmd/main.go --port 8000 --indexType flatL2 --dim 3 --protocol grpc
Flag         Description                            Default
--port       Port to listen on                      8000
--indexType  Type of index (flatL2, flatIP)         flatL2
--dim        Vector dimension                       3
--protocol   Communication protocol (grpc, http)    grpc

Installation

Via PyPI (Recommended)

pip install vector-cache-memory

Quick Start

1. Standalone In-Memory Cache

Store and search vectors with minimal setup:

import asyncio
from vector_cache import VectorCache

async def main():
    # Initialize cache (server starts automatically)
    cache = VectorCache(dim=768)  # 768-dimensional vectors
    
    # Store embeddings with metadata
    response = cache.set(
        emb=[0.1, 0.2, 0.3, ...],  # Your embedding
        data={"title": "Hello World", "url": "https://example.com"}
    )
    print(f"Stored vector: {response.uid}")
    
    # Search for similar vectors
    results = await cache.search([0.1, 0.2, 0.3, ...], top_k=5)
    
    for record in results:
        print(f"Match: {record.data['title']} (score: {record.score:.3f})")

asyncio.run(main())

2. With Pinecone Integration

Use VectorCache as a high-speed cache layer in front of Pinecone:

import asyncio
from vector_cache import VectorCache, IndexType

async def main():
    cache = VectorCache(
        dim=1536,  # OpenAI embedding dimension
        indexType=IndexType.FLAT_IP,  # Cosine-like similarity
        cacheCapacity=10000,  # Keep 10k vectors in RAM
        cache_thresold=0.85,  # Use cache if score >= 0.85
        
        # Pinecone configuration
        primarydb="pinecone",
        api_key="your-pinecone-api-key",
        index_name="your-index"
    )
    
    # Search: checks cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=5)
    print(results)

asyncio.run(main())

3. Batch Operations (High Throughput)

import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        protocol="grpc",  # Faster than HTTP
        cacheCapacity=50000,
    )
    
    # Batch search multiple queries in parallel
    queries = [[0.1, 0.2, ...], [0.3, 0.4, ...], [0.5, 0.6, ...]]
    tasks = [cache.search(q, top_k=5) for q in queries]
    all_results = await asyncio.gather(*tasks)
    
    for results in all_results:
        print(f"Got {len(results)} matches")

asyncio.run(main())

Core Concepts

VectorCache Class

The main entry point. It manages:

  • The local vector database server
  • Client-side search/store operations
  • Automatic lifecycle management (start on init, stop on exit)

from vector_cache import VectorCache, Protocol, IndexType, EvictionPolicy

cache = VectorCache(
    port=6379,                          # Server port
    dim=512,                            # Vector dimensionality
    indexType=IndexType.FLAT_L2,        # L2 distance or FLAT_IP for cosine
    protocol=Protocol.GRPC,             # gRPC (faster) or HTTP
    eviction_policy=EvictionPolicy.FIFO,  # Cache eviction strategy
    cacheCapacity=1000,                 # Max vectors in memory
    cache_thresold=0.8,                 # Use cache if similarity > 0.8
    log="stdout",                       # Log output
)

RecordData

Returned by search(). Contains the matched vector and its metadata:

from vector_cache import RecordData

# structure:
# RecordData(
#   uid: str,              # Unique ID for this vector
#   emb: List[float],      # The stored embedding
#   data: Dict[str, Any],  # Your metadata
#   score: float           # Similarity score
# )

results = await cache.search(query, top_k=3)
for r in results:
    print(f"ID: {r.uid}, Score: {r.score:.3f}, Data: {r.data}")

SetResponseData

Returned by set(). Indicates insertion status:

from vector_cache import SetResponseData

response = cache.set(emb, data)
# response.uid: the assigned ID (save for later reference)
# response.status: "success" or error message

if response and response.status == "success":
    print(f"Stored with ID: {response.uid}")

Configuration Guide

Vector Dimensionality

Set dim to match your embeddings:

  • 768: BERT, sentence-transformers
  • 1536: OpenAI text-embedding-ada-002
  • 512: Custom embeddings (default)
  • 1024: Cohere, Jina embeddings
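
Rather than hard-coding the number, you can derive dim from an embedding your model actually produces (a trivial sketch; the VectorCache call is commented out since it needs a running setup):

```python
# Stand-in for real model output (e.g. a BERT-style 768-dim embedding)
sample_embedding = [0.0] * 768
dim = len(sample_embedding)
# cache = VectorCache(dim=dim)  # pass the measured dimension through
```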

Index Type

Choose based on your embedding type:

IndexType  Use Case                            Similarity Metric
FLAT_L2    Raw embeddings, image features      Euclidean distance
FLAT_IP    Normalized embeddings, cosine-like  Inner product

# For normalized text embeddings (cosine similarity)
cache = VectorCache(indexType=IndexType.FLAT_IP)

# For raw feature vectors (L2 distance)
cache = VectorCache(indexType=IndexType.FLAT_L2)
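
Note that FLAT_IP only equals cosine similarity when vectors are unit length. If your embeddings are not already normalized, a standard-library helper like this does the job (our sketch, not a library API):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

unit = l2_normalize([3.0, 4.0])  # norm was 5.0 -> [0.6, 0.8]
```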

Cache Capacity & Eviction

cache = VectorCache(
    cacheCapacity=10000,        # Keep up to 10k vectors
    # Memory ≈ vectors × dim × 4 bytes
    # 10k vectors × 768 dim × 4 = ~30 MB
    eviction_policy=EvictionPolicy.FIFO,  # Remove oldest when full
)
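
The memory estimate in the comments can be wrapped in a small helper for capacity planning (the helper name is ours, not part of the library; it counts raw float32 storage only, not metadata or index overhead):

```python
def estimate_cache_bytes(capacity: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw float32 vector storage: capacity x dim x 4 bytes (metadata excluded)."""
    return capacity * dim * bytes_per_value

estimate_cache_bytes(10_000, 768)  # 30_720_000 bytes, ~30 MB
```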

Protocol Selection

# gRPC (default, recommended for production)
cache = VectorCache(protocol=Protocol.GRPC)  # Lower latency, binary

# HTTP (good for debugging, firewall-friendly)
cache = VectorCache(protocol=Protocol.HTTP)  # Easy to inspect with curl

Integrations

Pinecone

Combine VectorCache's low-latency caching with Pinecone's durability:

cache = VectorCache(
    dim=1536,
    primarydb="pinecone",
    api_key="pcn_...",
    index_name="my-index",
    cache_thresold=0.85,  # Only use cache if score > 0.85
    cacheCapacity=5000,   # Keep recent/hot results cached
)

# Search workflow:
# 1. Query VectorCache in-memory cache
# 2. If cache miss or low score, query Pinecone
# 3. Asynchronously populate cache with Pinecone results
results = await cache.search(query, top_k=10)

Best Practices:

  • Keep cache_thresold high (0.8+) to minimize stale data
  • Store lightweight metadata in cache; keep large blobs in external storage
  • Monitor cache hit rate and adjust capacity based on workload
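
These docs do not show a built-in hit-rate metric, but you can approximate one client-side by counting searches whose top score clears cache_thresold. A hypothetical monitor:

```python
class HitRateMonitor:
    """Client-side tally of searches whose top score cleared the cache threshold."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.hits = 0
        self.total = 0

    def record(self, top_score: float) -> None:
        self.total += 1
        if top_score >= self.threshold:
            self.hits += 1

    @property
    def hit_rate(self) -> float:
        return self.hits / self.total if self.total else 0.0

monitor = HitRateMonitor(threshold=0.85)
for score in (0.91, 0.72, 0.88):
    monitor.record(score)  # 2 of 3 scores clear the threshold
```

Record each search's best score after calling cache.search, then lower cacheCapacity or raise cache_thresold based on the observed rate.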

API Reference

VectorCache Methods

set(emb: List[float], data: Dict[str, Any]) -> SetResponseData

Store a vector in the cache.

response = cache.set(
    emb=[0.1, 0.2, 0.3],
    data={"id": "doc1", "title": "My Document"}
)
print(response.uid)  # Unique ID for the vector

Parameters:

  • emb (List[float]): The embedding vector. Must match the configured dimension.
  • data (Dict[str, Any]): Arbitrary metadata (optional, defaults to {}).

Returns: SetResponseData with uid (assigned ID) and status ("success" or error).


search(emb: List[float], top_k: int = 5, namespace: str = "", filter: Optional[Dict] = None) -> List[RecordData]

Search for similar vectors.

results = await cache.search(
    emb=[0.1, 0.2, 0.3],
    top_k=10,
    namespace="documents",  # For Pinecone filtering
    filter={"category": "news"}  # Metadata filter (Pinecone)
)

for record in results:
    print(f"Score: {record.score:.3f}, Data: {record.data}")

Parameters:

  • emb (List[float]): Query vector.
  • top_k (int, optional): Number of results. Default: 5.
  • namespace (str, optional): Pinecone namespace filter. Default: "".
  • filter (Optional[Dict], optional): Pinecone metadata filter. Default: None.

Returns: List of RecordData, sorted by similarity score (best first).


Performance Tips

Memory Optimization

cache = VectorCache(
    cacheCapacity=5000,   # Smaller cache = lower memory
    dim=384,              # Smaller dimension = lower memory
    # Typical memory: 5000 × 384 × 4 bytes ≈ 7.7 MB
)

Latency Optimization

cache = VectorCache(
    protocol=Protocol.GRPC,  # Binary protocol is faster
    cacheCapacity=50000,     # Larger cache = fewer fallbacks to primary DB
    cache_thresold=0.9,      # More likely to use cached results
)

Batch Processing

# Parallel searches for high throughput
queries = [embedding1, embedding2, embedding3, ...]
results = await asyncio.gather(
    *[cache.search(q, top_k=10) for q in queries]
)
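
Firing thousands of searches through asyncio.gather at once can overwhelm the server. A generic limiter (our helper, not part of the library) caps how many requests are in flight at a time:

```python
import asyncio

async def bounded_gather(coros, limit: int = 32):
    """Run awaitables concurrently with at most `limit` in flight; preserves order."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

# usage with a hypothetical cache instance:
# results = asyncio.run(bounded_gather([cache.search(q, top_k=5) for q in queries], limit=16))
```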

Examples

Real-World: Document Search with Metadata

import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        dim=768,
        cacheCapacity=100000,
    )
    
    # Index documents
    documents = [
        {"id": "doc1", "title": "Python 101", "content": "..."},
        {"id": "doc2", "title": "Go Guide", "content": "..."},
        {"id": "doc3", "title": "Rust Book", "content": "..."},
    ]
    
    # Simulate embeddings (in reality, use a model)
    embeddings = [
        [0.1, 0.2, 0.3, ...],  # Python doc
        [0.4, 0.5, 0.6, ...],  # Go doc
        [0.7, 0.8, 0.9, ...],  # Rust doc
    ]
    
    for doc, emb in zip(documents, embeddings):
        cache.set(emb, {
            "id": doc["id"],
            "title": doc["title"],
            "preview": doc["content"][:100]
        })
    
    # Search
    query_embedding = [0.15, 0.25, 0.35, ...]  # Query for Python docs
    results = await cache.search(query_embedding, top_k=2)
    
    for r in results:
        print(f"Found: {r.data['title']} (similarity: {r.score:.2f})")

asyncio.run(main())

Advanced: Cache with Pinecone Backup

import asyncio
from vector_cache import VectorCache, IndexType

async def hybrid_search():
    # Setup cache + Pinecone
    cache = VectorCache(
        dim=1536,
        indexType=IndexType.FLAT_IP,
        cacheCapacity=20000,
        cache_thresold=0.8,
        primarydb="pinecone",
        api_key="your-api-key",
        index_name="prod-index",
    )
    
    # Add to cache
    response = cache.set(
        [0.1, ...],
        {"doc_id": "123", "updated": "2024-01-01"}
    )
    
    # Search: uses cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=20)
    
    # Analyze results
    cache_hits = sum(1 for r in results if r.score > 0.8)
    print(f"Cache hits: {cache_hits}, Pinecone results: {len(results) - cache_hits}")

asyncio.run(hybrid_search())

Troubleshooting

Server Won't Start

Problem: "Failed to start VectorCache server"

Solution:

  • Check if port is already in use: lsof -i :6379
  • Try a different port: VectorCache(port=6380)
  • Verify binary is accessible: echo $PATH

High Memory Usage

Problem: Cache is using too much RAM

Solution:

  • Reduce cacheCapacity
  • Use smaller embedding dimension if possible
  • Enable log rotation for server logs

Slow Search

Problem: Searches are taking >100ms

Solution:

  • Use protocol=Protocol.GRPC (faster than HTTP)
  • Increase cacheCapacity to improve hit rate
  • Profile query embedding dimensions
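
To confirm where time is going, measure latency client-side. A generic async timer (illustrative; it wraps any awaitable, such as cache.search(...)):

```python
import asyncio
import time

async def timed(awaitable):
    """Await anything and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = await awaitable
    return result, (time.perf_counter() - start) * 1000

# demo with a stand-in awaitable; swap in cache.search(query, top_k=5)
result, ms = asyncio.run(timed(asyncio.sleep(0.01, result="ok")))
```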

Made with ❤️ by Abhishek Maurya & Kinjal Raykarmakar

Download files


Source Distribution

vector_cache_memory-0.1.7.tar.gz (32.6 MB)

Uploaded Source

Built Distribution


vector_cache_memory-0.1.7-py3-none-any.whl (32.1 MB)

Uploaded Python 3

File details

Details for the file vector_cache_memory-0.1.7.tar.gz.

File metadata

  • Download URL: vector_cache_memory-0.1.7.tar.gz
  • Size: 32.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for vector_cache_memory-0.1.7.tar.gz
Algorithm Hash digest
SHA256 04e3aa4b4dbd5389719b661fb8dcfd6e0053cececd59b3730bff662016abf28f
MD5 79d4f0dee9827629e007576cd147f514
BLAKE2b-256 a55f024537613823194aa5a7eeb51d2da797b8437d618cb98407cfebdeb8e7b1


File details

Details for the file vector_cache_memory-0.1.7-py3-none-any.whl.

File metadata

File hashes

Hashes for vector_cache_memory-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 96ff2f6c5de3c20934137cbf743a0444b70e5e4461ee865a42273728db6f08c3
MD5 239bb58af3b54a203f57511756cdc61a
BLAKE2b-256 9b7f6f047bd85015085773ab63b35465939430bcb0acc978a3e56e8f65c454ed

