Python client for VectorCache Go DB server

Reason this release was yanked:

Binaries should be made executable manually

Project description

VectorCache

VectorCache is a high-performance, in-memory vector database with gRPC and HTTP support. It allows storing, retrieving, and searching high-dimensional vector embeddings efficiently. The project provides a Go server and Python clients with type-safe APIs.

Features

  • ✅ Vector storage and retrieval with user-defined IDs and metadata
  • ✅ gRPC and HTTP clients for Python
  • ✅ Multiple index types: Flat L2, Flat Inner Product (cosine similarity)
  • ✅ Thread-safe in-memory database
  • ✅ Graceful shutdown and logging
  • ✅ Automatic server binary management for Python clients

Usage

Go Server

[!WARNING]
The Go server is not intended for standalone use; the Python client starts and manages it automatically.

go run cmd/main.go --port 8000 --indexType flatL2 --dim 3 --protocol grpc

Flag          Description                             Default
--port        Port to listen on                       8000
--indexType   Type of index (flatL2, flatIP)          flatL2
--dim         Vector dimension                        3
--protocol    Communication protocol (grpc, http)     grpc

Installation

Via PyPI (Recommended)

pip install vector-cache-memory

Quick Start

1. Standalone In-Memory Cache

Store and search vectors with minimal setup:

import asyncio
from vector_cache import VectorCache

async def main():
    # Initialize cache (server starts automatically)
    cache = VectorCache(dim=768)  # 768-dimensional vectors
    
    # Store embeddings with metadata
    response = cache.set(
        emb=[0.1, 0.2, 0.3, ...],  # Your embedding
        data={"title": "Hello World", "url": "https://example.com"}
    )
    print(f"Stored vector: {response.uid}")
    
    # Search for similar vectors
    results = await cache.search([0.1, 0.2, 0.3, ...], top_k=5)
    
    for record in results:
        print(f"Match: {record.data['title']} (score: {record.score:.3f})")

asyncio.run(main())

2. With Pinecone Integration

Use VectorCache as a high-speed cache layer in front of Pinecone:

import asyncio
from vector_cache import VectorCache, IndexType

async def main():
    cache = VectorCache(
        dim=1536,  # OpenAI embedding dimension
        indexType=IndexType.FLAT_IP,  # Cosine-like similarity
        cacheCapacity=10000,  # Keep 10k vectors in RAM
        cache_thresold=0.85,  # Use cache if score >= 0.85
        
        # Pinecone configuration
        primarydb="pinecone",
        api_key="your-pinecone-api-key",
        index_name="your-index"
    )
    
    # Search: checks cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=5)
    print(results)

asyncio.run(main())

3. Batch Operations (High Throughput)

import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        protocol="grpc",  # Faster than HTTP
        cacheCapacity=50000,
    )
    
    # Batch search multiple queries in parallel
    queries = [[0.1, 0.2, ...], [0.3, 0.4, ...], [0.5, 0.6, ...]]
    tasks = [cache.search(q, top_k=5) for q in queries]
    all_results = await asyncio.gather(*tasks)
    
    for results in all_results:
        print(f"Got {len(results)} matches")

asyncio.run(main())

Core Concepts

VectorCache Class

The main entry point. It manages:

  • The local vector database server
  • Client-side search/store operations
  • Automatic lifecycle management (start on init, stop on exit)

from vector_cache import VectorCache, Protocol, IndexType, EvictionPolicy

cache = VectorCache(
    port=6379,                          # Server port
    dim=512,                            # Vector dimensionality
    indexType=IndexType.FLAT_L2,        # L2 distance or FLAT_IP for cosine
    protocol=Protocol.GRPC,             # gRPC (faster) or HTTP
    eviction_policy=EvictionPolicy.FIFO,  # Cache eviction strategy
    cacheCapacity=1000,                 # Max vectors in memory
    cache_thresold=0.8,                 # Use cache if similarity > 0.8
    log="stdout",                       # Log output
)

RecordData

Returned by search(). Contains the matched vector and its metadata:

from vector_cache import RecordData

# structure:
# RecordData(
#   uid: str,              # Unique ID for this vector
#   emb: List[float],      # The stored embedding
#   data: Dict[str, Any],  # Your metadata
#   score: float           # Similarity score
# )

results = await cache.search(query, top_k=3)
for r in results:
    print(f"ID: {r.uid}, Score: {r.score:.3f}, Data: {r.data}")

SetResponseData

Returned by set(). Indicates insertion status:

from vector_cache import SetResponseData

response = cache.set(emb, data)
# response.uid: the assigned ID (save for later reference)
# response.status: "success" or error message

if response and response.status == "success":
    print(f"Stored with ID: {response.uid}")

Configuration Guide

Vector Dimensionality

Set dim to match the model that produced your embeddings (a short sketch follows the list below):

  • 768: BERT, sentence-transformers
  • 1536: OpenAI text-embedding-ada-002
  • 512: Custom embeddings (default)
  • 1024: Cohere, Jina embeddings
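
As a sketch of keeping dim and your embedding model in sync (the MODEL_DIMS mapping and get_embedding helper below are hypothetical, not part of this package):

from vector_cache import VectorCache

# Hypothetical mapping from embedding model name to output dimension
MODEL_DIMS = {
    "text-embedding-ada-002": 1536,   # OpenAI
    "all-mpnet-base-v2": 768,         # sentence-transformers
}

model_name = "all-mpnet-base-v2"
cache = VectorCache(dim=MODEL_DIMS[model_name])

# Sanity check before inserting: the vector length must match the configured dim
emb = get_embedding("hello world")    # hypothetical embedding call
assert len(emb) == MODEL_DIMS[model_name], "embedding dimension mismatch"
cache.set(emb, {"text": "hello world"})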

Index Type

Choose based on your embedding type:

IndexType   Use Case                              Similarity Metric
FLAT_L2     Raw embeddings, image features        Euclidean distance
FLAT_IP     Normalized embeddings, cosine-like    Inner product

# For normalized text embeddings (cosine similarity)
cache = VectorCache(indexType=IndexType.FLAT_IP)

# For raw feature vectors (L2 distance)
cache = VectorCache(indexType=IndexType.FLAT_L2)

Cache Capacity & Eviction

cache = VectorCache(
    cacheCapacity=10000,        # Keep up to 10k vectors
    # Memory ≈ vectors × dim × 4 bytes
    # 10k vectors × 768 dim × 4 = ~30 MB
    eviction_policy=EvictionPolicy.FIFO,  # Remove oldest when full
)
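
The comments above give the rule of thumb; here is the same arithmetic as a tiny standalone sketch (plain Python, no package APIs involved) for sizing a capacity before you commit to it:

def estimate_cache_memory_mb(capacity: int, dim: int) -> float:
    # float32 vectors: capacity x dim x 4 bytes; metadata overhead not included
    return capacity * dim * 4 / (1024 ** 2)

print(estimate_cache_memory_mb(10_000, 768))    # ≈ 29.3 MB
print(estimate_cache_memory_mb(50_000, 1536))   # ≈ 293 MB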

Protocol Selection

# gRPC (default, recommended for production)
cache = VectorCache(protocol=Protocol.GRPC)  # Lower latency, binary

# HTTP (good for debugging, firewall-friendly)
cache = VectorCache(protocol=Protocol.HTTP)  # Easy to inspect with curl

Integrations

Pinecone

Combine VectorCache's low-latency caching with Pinecone's durability:

cache = VectorCache(
    dim=1536,
    primarydb="pinecone",
    api_key="pcn_...",
    index_name="my-index",
    cache_thresold=0.85,  # Only use cache if score > 0.85
    cacheCapacity=5000,   # Keep recent/hot results cached
)

# Search workflow:
# 1. Query VectorCache in-memory cache
# 2. If cache miss or low score, query Pinecone
# 3. Asynchronously populate cache with Pinecone results
results = await cache.search(query, top_k=10)

Best Practices:

  • Keep cache_thresold high (0.8+) to minimize stale data
  • Store lightweight metadata in the cache and keep large blobs in external storage (see the sketch after this list)
  • Monitor cache hit rate and adjust capacity based on workload
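
As a sketch of the metadata point above (the load_document and embed helpers and the S3 URL are hypothetical, not part of this package), keep only a pointer and a short preview in the cache and leave the full payload in external storage:

full_text = load_document("doc-42")   # hypothetical loader for the full payload
embedding = embed(full_text)          # hypothetical embedding call, length must match dim

response = cache.set(
    emb=embedding,
    data={
        "doc_id": "doc-42",
        "preview": full_text[:200],                      # small snippet only
        "blob_url": "s3://my-bucket/docs/doc-42.json",   # full document stays external
    },
)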

API Reference

VectorCache Methods

set(emb: List[float], data: Dict[str, Any]) -> SetResponseData

Store a vector in the cache.

response = cache.set(
    emb=[0.1, 0.2, 0.3],
    data={"id": "doc1", "title": "My Document"}
)
print(response.uid)  # Unique ID for the vector

Parameters:

  • emb (List[float]): The embedding vector. Must match the configured dimension.
  • data (Dict[str, Any]): Arbitrary metadata (optional, defaults to {}).

Returns: SetResponseData with uid (assigned ID) and status ("success" or error).


search(emb: List[float], top_k: int = 5, namespace: str = "", filter: Optional[Dict] = None) -> List[RecordData]

Search for similar vectors.

results = await cache.search(
    emb=[0.1, 0.2, 0.3],
    top_k=10,
    namespace="documents",  # For Pinecone filtering
    filter={"category": "news"}  # Metadata filter (Pinecone)
)

for record in results:
    print(f"Score: {record.score:.3f}, Data: {record.data}")

Parameters:

  • emb (List[float]): Query vector.
  • top_k (int, optional): Number of results. Default: 5.
  • namespace (str, optional): Pinecone namespace filter. Default: "".
  • filter (Optional[Dict], optional): Pinecone metadata filter. Default: None.

Returns: List of RecordData, sorted by similarity score (best first).


Performance Tips

Memory Optimization

cache = VectorCache(
    cacheCapacity=5000,   # Smaller cache = lower memory
    dim=384,              # Smaller dimension = lower memory
    # Typical memory: 5000 × 384 × 4 bytes ≈ 7 MB
)

Latency Optimization

cache = VectorCache(
    protocol=Protocol.GRPC,  # Binary protocol is faster
    cacheCapacity=50000,     # Larger cache = fewer fallbacks to primary DB
    cache_thresold=0.9,      # More likely to use cached results
)

Batch Processing

# Parallel searches for high throughput
queries = [embedding1, embedding2, embedding3, ...]
results = await asyncio.gather(
    *[cache.search(q, top_k=10) for q in queries]
)

Examples

Real-World: Document Search with Metadata

import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        dim=768,
        cacheCapacity=100000,
    )
    
    # Index documents
    documents = [
        {"id": "doc1", "title": "Python 101", "content": "..."},
        {"id": "doc2", "title": "Go Guide", "content": "..."},
        {"id": "doc3", "title": "Rust Book", "content": "..."},
    ]
    
    # Simulate embeddings (in reality, use a model)
    embeddings = [
        [0.1, 0.2, 0.3, ...],  # Python doc
        [0.4, 0.5, 0.6, ...],  # Go doc
        [0.7, 0.8, 0.9, ...],  # Rust doc
    ]
    
    for doc, emb in zip(documents, embeddings):
        cache.set(emb, {
            "id": doc["id"],
            "title": doc["title"],
            "preview": doc["content"][:100]
        })
    
    # Search
    query_embedding = [0.15, 0.25, 0.35, ...]  # Query for Python docs
    results = await cache.search(query_embedding, top_k=2)
    
    for r in results:
        print(f"Found: {r.data['title']} (similarity: {r.score:.2f})")

asyncio.run(main())

Advanced: Cache with Pinecone Backup

import asyncio
from vector_cache import VectorCache, IndexType

async def hybrid_search():
    # Setup cache + Pinecone
    cache = VectorCache(
        dim=1536,
        indexType=IndexType.FLAT_IP,
        cacheCapacity=20000,
        cache_thresold=0.8,
        primarydb="pinecone",
        api_key="your-api-key",
        index_name="prod-index",
    )
    
    # Add to cache
    response = cache.set(
        [0.1, ...],
        {"doc_id": "123", "updated": "2024-01-01"}
    )
    
    # Search: uses cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=20)
    
    # Analyze results
    cache_hits = sum(1 for r in results if r.score > 0.8)
    print(f"Cache hits: {cache_hits}, Pinecone results: {len(results) - cache_hits}")

asyncio.run(hybrid_search())

Troubleshooting

Server Won't Start

Problem: "Failed to start VectorCache server"

Solution:

  • Check whether the port is already in use: lsof -i :6379 (see the sketch after this list)
  • Try a different port: VectorCache(port=6380)
  • Verify binary is accessible: echo $PATH
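
If you want to confirm a port conflict from Python before creating the client, a small standard-library sketch (no VectorCache APIs involved) works:

import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when something is already listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0

if not port_is_free(6379):
    print("Port 6379 is busy; try VectorCache(port=6380)")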

High Memory Usage

Problem: Cache is using too much RAM

Solution:

  • Reduce cacheCapacity
  • Use smaller embedding dimension if possible
  • Enable log rotation for server logs

Slow Search

Problem: Searches are taking >100ms

Solution:

  • Use protocol=Protocol.GRPC (faster than HTTP)
  • Increase cacheCapacity to improve hit rate
  • Check that your embedding dimension is no larger than necessary, and time individual queries to see where the latency goes (see the sketch after this list)
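
To see where the time goes, a small timing wrapper around the documented search() call can help (a sketch using only the standard library and the client shown earlier):

import time

async def timed_search(cache, query, top_k=5):
    # Measure end-to-end latency of a single search call
    start = time.perf_counter()
    results = await cache.search(query, top_k=top_k)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"search returned {len(results)} results in {elapsed_ms:.1f} ms")
    return results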

Made with ❤️ by Abhishek Maurya & Kinjal Raykarmakar

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vector_cache_memory-0.1.8.tar.gz (32.6 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vector_cache_memory-0.1.8-py3-none-any.whl (32.1 MB)

Uploaded Python 3

File details

Details for the file vector_cache_memory-0.1.8.tar.gz.

File metadata

  • Download URL: vector_cache_memory-0.1.8.tar.gz
  • Upload date:
  • Size: 32.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for vector_cache_memory-0.1.8.tar.gz
Algorithm     Hash digest
SHA256        9f6e3174ab27ffcb7915f6338f9045041af33324db28841140fd9804f03b48cf
MD5           b4142e89d8224db93af102c6da84fc6d
BLAKE2b-256   40792db031db536520a83664f68801ec13ac50d347d23dd15d09475f7035d261

See more details on using hashes here.

File details

Details for the file vector_cache_memory-0.1.8-py3-none-any.whl.

File metadata

File hashes

Hashes for vector_cache_memory-0.1.8-py3-none-any.whl
Algorithm     Hash digest
SHA256        923497a0cf0cda2911a1e8e310e3bf4563498dd8d7a6e52d72e76d7a75419528
MD5           261418e4789b5d3b67f7aa73d0cd2ea7
BLAKE2b-256   78d5c35605956db1b6c9ce276122e2b1687eba8bbc12b7d84714646f8965108f

See more details on using hashes here.
