Python client for VectorCache Go DB server

Reason this release was yanked:

Binaries should be made executable manually

Project description

VectorCache

VectorCache is a high-performance, in-memory vector database with gRPC and HTTP support. It allows storing, retrieving, and searching high-dimensional vector embeddings efficiently. The project provides a Go server and Python clients with type-safe APIs.

Features

  • ✅ Vector storage and retrieval with user-defined IDs and metadata
  • ✅ gRPC and HTTP clients for Python
  • ✅ Multiple index types: Flat L2, Flat Inner Product (cosine similarity)
  • ✅ Thread-safe in-memory database
  • ✅ Graceful shutdown and logging
  • ✅ Automatic server binary management for Python clients

Usage

Go Server

[!WARNING]
The Go server is not intended for standalone use; the Python client starts and manages it automatically.

go run cmd/main.go --port 8000 --indexType flatL2 --dim 3 --protocol grpc

Flag          Description                             Default
--port        Port to listen on                       8000
--indexType   Type of index (flatL2, flatIP)          flatL2
--dim         Vector dimension                        3
--protocol    Communication protocol (grpc, http)     grpc

Installation

Via PyPI (Recommended)

pip install vector-cache-memory

Quick Start

1. Standalone In-Memory Cache

Store and search vectors with minimal setup:

import asyncio
from vector_cache import VectorCache

async def main():
    # Initialize cache (server starts automatically)
    cache = VectorCache(dim=768)  # 768-dimensional vectors
    
    # Store embeddings with metadata
    response = cache.set(
        emb=[0.1, 0.2, 0.3, ...],  # Your embedding
        data={"title": "Hello World", "url": "https://example.com"}
    )
    print(f"Stored vector: {response.uid}")
    
    # Search for similar vectors
    results = await cache.search([0.1, 0.2, 0.3, ...], top_k=5)
    
    for record in results:
        print(f"Match: {record.data['title']} (score: {record.score:.3f})")

asyncio.run(main())

2. With Pinecone Integration

Use VectorCache as a high-speed cache layer in front of Pinecone:

import asyncio
from vector_cache import VectorCache, IndexType

async def main():
    cache = VectorCache(
        dim=1536,  # OpenAI embedding dimension
        indexType=IndexType.FLAT_IP,  # Cosine-like similarity
        cacheCapacity=10000,  # Keep 10k vectors in RAM
        cache_thresold=0.85,  # Use cache if score >= 0.85
        
        # Pinecone configuration
        primarydb="pinecone",
        api_key="your-pinecone-api-key",
        index_name="your-index"
    )
    
    # Search: checks cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=5)
    print(results)

asyncio.run(main())

3. Batch Operations (High Throughput)

import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        protocol="grpc",  # Faster than HTTP
        cacheCapacity=50000,
    )
    
    # Batch search multiple queries in parallel
    queries = [[0.1, 0.2, ...], [0.3, 0.4, ...], [0.5, 0.6, ...]]
    tasks = [cache.search(q, top_k=5) for q in queries]
    all_results = await asyncio.gather(*tasks)
    
    for results in all_results:
        print(f"Got {len(results)} matches")

asyncio.run(main())

Core Concepts

VectorCache Class

The main entry point. It manages:

  • The local vector database server
  • Client-side search/store operations
  • Automatic lifecycle management (start on init, stop on exit)

from vector_cache import VectorCache, Protocol, IndexType, EvictionPolicy

cache = VectorCache(
    port=6379,                          # Server port
    dim=512,                            # Vector dimensionality
    indexType=IndexType.FLAT_L2,        # L2 distance or FLAT_IP for cosine
    protocol=Protocol.GRPC,             # gRPC (faster) or HTTP
    eviction_policy=EvictionPolicy.FIFO,  # Cache eviction strategy
    cacheCapacity=1000,                 # Max vectors in memory
    cache_thresold=0.8,                 # Use cache if similarity > 0.8
    log="stdout",                       # Log output
)

RecordData

Returned by search(). Contains the matched vector and its metadata:

from vector_cache import RecordData

# structure:
# RecordData(
#   uid: str,              # Unique ID for this vector
#   emb: List[float],      # The stored embedding
#   data: Dict[str, Any],  # Your metadata
#   score: float           # Similarity score
# )

results = await cache.search(query, top_k=3)
for r in results:
    print(f"ID: {r.uid}, Score: {r.score:.3f}, Data: {r.data}")

SetResponseData

Returned by set(). Indicates insertion status:

from vector_cache import SetResponseData

response = cache.set(emb, data)
# response.uid: the assigned ID (save for later reference)
# response.status: "success" or error message

if response and response.status == "success":
    print(f"Stored with ID: {response.uid}")

Configuration Guide

Vector Dimensionality

Set dim to match the model that produced your embeddings (a short sketch follows the list below):

  • 768: BERT, sentence-transformers
  • 1536: OpenAI text-embedding-ada-002
  • 512: Custom embeddings (default)
  • 1024: Cohere, Jina embeddings
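
As a sketch of keeping dim and your embedding model in sync (the MODEL_DIMS mapping and get_embedding helper below are hypothetical, not part of this package):

from vector_cache import VectorCache

# Hypothetical mapping from embedding model name to output dimension
MODEL_DIMS = {
    "text-embedding-ada-002": 1536,   # OpenAI
    "all-mpnet-base-v2": 768,         # sentence-transformers
}

model_name = "all-mpnet-base-v2"
cache = VectorCache(dim=MODEL_DIMS[model_name])

# Sanity check before inserting: the vector length must match the configured dim
emb = get_embedding("hello world")    # hypothetical embedding call
assert len(emb) == MODEL_DIMS[model_name], "embedding dimension mismatch"
cache.set(emb, {"text": "hello world"})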

Index Type

Choose based on your embedding type:

IndexType   Use Case                              Similarity Metric
FLAT_L2     Raw embeddings, image features        Euclidean distance
FLAT_IP     Normalized embeddings, cosine-like    Inner product

# For normalized text embeddings (cosine similarity)
cache = VectorCache(indexType=IndexType.FLAT_IP)

# For raw feature vectors (L2 distance)
cache = VectorCache(indexType=IndexType.FLAT_L2)

Cache Capacity & Eviction

cache = VectorCache(
    cacheCapacity=10000,        # Keep up to 10k vectors
    # Memory ≈ vectors × dim × 4 bytes
    # 10k vectors × 768 dim × 4 = ~30 MB
    eviction_policy=EvictionPolicy.FIFO,  # Remove oldest when full
)
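
The comments above give the rule of thumb; here is the same arithmetic as a tiny standalone sketch (plain Python, no package APIs involved) for sizing a capacity before you commit to it:

def estimate_cache_memory_mb(capacity: int, dim: int) -> float:
    # float32 vectors: capacity x dim x 4 bytes; metadata overhead not included
    return capacity * dim * 4 / (1024 ** 2)

print(estimate_cache_memory_mb(10_000, 768))    # ≈ 29.3 MB
print(estimate_cache_memory_mb(50_000, 1536))   # ≈ 293 MB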

Protocol Selection

# gRPC (default, recommended for production)
cache = VectorCache(protocol=Protocol.GRPC)  # Lower latency, binary

# HTTP (good for debugging, firewall-friendly)
cache = VectorCache(protocol=Protocol.HTTP)  # Easy to inspect with curl

Integrations

Pinecone

Combine VectorCache's low-latency caching with Pinecone's durability:

cache = VectorCache(
    dim=1536,
    primarydb="pinecone",
    api_key="pcn_...",
    index_name="my-index",
    cache_thresold=0.85,  # Only use cache if score > 0.85
    cacheCapacity=5000,   # Keep recent/hot results cached
)

# Search workflow:
# 1. Query VectorCache in-memory cache
# 2. If cache miss or low score, query Pinecone
# 3. Asynchronously populate cache with Pinecone results
results = await cache.search(query, top_k=10)

Best Practices:

  • Keep cache_thresold high (0.8+) to minimize stale data
  • Store lightweight metadata in the cache and keep large blobs in external storage (see the sketch after this list)
  • Monitor cache hit rate and adjust capacity based on workload
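
As a sketch of the metadata point above (the load_document and embed helpers and the S3 URL are hypothetical, not part of this package), keep only a pointer and a short preview in the cache and leave the full payload in external storage:

full_text = load_document("doc-42")   # hypothetical loader for the full payload
embedding = embed(full_text)          # hypothetical embedding call, length must match dim

response = cache.set(
    emb=embedding,
    data={
        "doc_id": "doc-42",
        "preview": full_text[:200],                      # small snippet only
        "blob_url": "s3://my-bucket/docs/doc-42.json",   # full document stays external
    },
)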

API Reference

VectorCache Methods

set(emb: List[float], data: Dict[str, Any]) -> SetResponseData

Store a vector in the cache.

response = cache.set(
    emb=[0.1, 0.2, 0.3],
    data={"id": "doc1", "title": "My Document"}
)
print(response.uid)  # Unique ID for the vector

Parameters:

  • emb (List[float]): The embedding vector. Must match the configured dimension.
  • data (Dict[str, Any]): Arbitrary metadata (optional, defaults to {}).

Returns: SetResponseData with uid (assigned ID) and status ("success" or error).


search(emb: List[float], top_k: int = 5, namespace: str = "", filter: Optional[Dict] = None) -> List[RecordData]

Search for similar vectors.

results = await cache.search(
    emb=[0.1, 0.2, 0.3],
    top_k=10,
    namespace="documents",  # For Pinecone filtering
    filter={"category": "news"}  # Metadata filter (Pinecone)
)

for record in results:
    print(f"Score: {record.score:.3f}, Data: {record.data}")

Parameters:

  • emb (List[float]): Query vector.
  • top_k (int, optional): Number of results. Default: 5.
  • namespace (str, optional): Pinecone namespace filter. Default: "".
  • filter (Optional[Dict], optional): Pinecone metadata filter. Default: None.

Returns: List of RecordData, sorted by similarity score (best first).


Performance Tips

Memory Optimization

cache = VectorCache(
    cacheCapacity=5000,   # Smaller cache = lower memory
    dim=384,              # Smaller dimension = lower memory
    # Typical memory: 5000 × 384 × 4 bytes ≈ 7 MB
)

Latency Optimization

cache = VectorCache(
    protocol=Protocol.GRPC,  # Binary protocol is faster
    cacheCapacity=50000,     # Larger cache = fewer fallbacks to primary DB
    cache_thresold=0.9,      # More likely to use cached results
)

Batch Processing

# Parallel searches for high throughput
queries = [embedding1, embedding2, embedding3, ...]
results = await asyncio.gather(
    *[cache.search(q, top_k=10) for q in queries]
)

Examples

Real-World: Document Search with Metadata

import asyncio
from vector_cache import VectorCache

async def main():
    cache = VectorCache(
        dim=768,
        cacheCapacity=100000,
    )
    
    # Index documents
    documents = [
        {"id": "doc1", "title": "Python 101", "content": "..."},
        {"id": "doc2", "title": "Go Guide", "content": "..."},
        {"id": "doc3", "title": "Rust Book", "content": "..."},
    ]
    
    # Simulate embeddings (in reality, use a model)
    embeddings = [
        [0.1, 0.2, 0.3, ...],  # Python doc
        [0.4, 0.5, 0.6, ...],  # Go doc
        [0.7, 0.8, 0.9, ...],  # Rust doc
    ]
    
    for doc, emb in zip(documents, embeddings):
        cache.set(emb, {
            "id": doc["id"],
            "title": doc["title"],
            "preview": doc["content"][:100]
        })
    
    # Search
    query_embedding = [0.15, 0.25, 0.35, ...]  # Query for Python docs
    results = await cache.search(query_embedding, top_k=2)
    
    for r in results:
        print(f"Found: {r.data['title']} (similarity: {r.score:.2f})")

asyncio.run(main())

Advanced: Cache with Pinecone Backup

import asyncio
from vector_cache import VectorCache, IndexType

async def hybrid_search():
    # Setup cache + Pinecone
    cache = VectorCache(
        dim=1536,
        indexType=IndexType.FLAT_IP,
        cacheCapacity=20000,
        cache_thresold=0.8,
        primarydb="pinecone",
        api_key="your-api-key",
        index_name="prod-index",
    )
    
    # Add to cache
    response = cache.set(
        [0.1, ...],
        {"doc_id": "123", "updated": "2024-01-01"}
    )
    
    # Search: uses cache first, falls back to Pinecone
    results = await cache.search([0.1, ...], top_k=20)
    
    # Analyze results
    cache_hits = sum(1 for r in results if r.score > 0.8)
    print(f"Cache hits: {cache_hits}, Pinecone results: {len(results) - cache_hits}")

asyncio.run(hybrid_search())

Troubleshooting

Server Won't Start

Problem: "Failed to start VectorCache server"

Solution:

  • Check whether the port is already in use: lsof -i :6379 (see the sketch after this list)
  • Try a different port: VectorCache(port=6380)
  • Verify binary is accessible: echo $PATH
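
If you want to confirm a port conflict from Python before creating the client, a small standard-library sketch (no VectorCache APIs involved) works:

import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when something is already listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0

if not port_is_free(6379):
    print("Port 6379 is busy; try VectorCache(port=6380)")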

High Memory Usage

Problem: Cache is using too much RAM

Solution:

  • Reduce cacheCapacity
  • Use smaller embedding dimension if possible
  • Enable log rotation for server logs

Slow Search

Problem: Searches are taking >100ms

Solution:

  • Use protocol=Protocol.GRPC (faster than HTTP)
  • Increase cacheCapacity to improve hit rate
  • Check that your embedding dimension is no larger than necessary, and time individual queries to see where the latency goes (see the sketch after this list)
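
To see where the time goes, a small timing wrapper around the documented search() call can help (a sketch using only the standard library and the client shown earlier):

import time

async def timed_search(cache, query, top_k=5):
    # Measure end-to-end latency of a single search call
    start = time.perf_counter()
    results = await cache.search(query, top_k=top_k)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"search returned {len(results)} results in {elapsed_ms:.1f} ms")
    return results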

Made with ❤️ by Abhishek Maurya & Kinjal Raykarmakar

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vector_cache_memory-0.1.8.tar.gz (32.6 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vector_cache_memory-0.1.8-py3-none-any.whl (32.1 MB)

Uploaded Python 3

File details

Details for the file vector_cache_memory-0.1.8.tar.gz.

File metadata

  • Download URL: vector_cache_memory-0.1.8.tar.gz
  • Upload date:
  • Size: 32.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for vector_cache_memory-0.1.8.tar.gz
Algorithm     Hash digest
SHA256        9f6e3174ab27ffcb7915f6338f9045041af33324db28841140fd9804f03b48cf
MD5           b4142e89d8224db93af102c6da84fc6d
BLAKE2b-256   40792db031db536520a83664f68801ec13ac50d347d23dd15d09475f7035d261

See more details on using hashes here.

File details

Details for the file vector_cache_memory-0.1.8-py3-none-any.whl.

File metadata

File hashes

Hashes for vector_cache_memory-0.1.8-py3-none-any.whl
Algorithm     Hash digest
SHA256        923497a0cf0cda2911a1e8e310e3bf4563498dd8d7a6e52d72e76d7a75419528
MD5           261418e4789b5d3b67f7aa73d0cd2ea7
BLAKE2b-256   78d5c35605956db1b6c9ce276122e2b1687eba8bbc12b7d84714646f8965108f

See more details on using hashes here.
