NashVec
Optimization of Latent-Space Compression using Game-Theoretic Techniques for Transformer-Based Vector Search
Overview
NashVec is a game-theoretic approach to optimizing vector search by balancing information preservation and retrieval efficiency. By formulating vector compression as a zero-sum game between an encoder (compressor) and retriever, NashVec achieves superior search performance with reduced storage requirements.
Key Features
- Game-Theoretic Optimization: Balances reconstruction quality with retrieval performance using a Nash equilibrium approach
- Hybrid Search: Combines compressed latent representations with HNSW indexing for fast, accurate retrieval
- Dual Backend Support: FAISS for baseline comparison and custom HNSW for hybrid search
- Production-Ready: Modular design with CLI tools, comprehensive tests, and extensive documentation
- Customizable: Adjustable loss weights, dimensions, and hyperparameters for different use cases
Installation
Requirements
- Python 3.8 or higher
- pip
Install from PyPI
pip install nashvec
Install from Source
git clone https://github.com/kushagraagrawal/NashVec.git
cd NashVec
pip install -e .
Install with Test Dependencies
pip install "nashvec[dev]"
Quick Start
Using the Python API
from nashvec import HybridSearcher
# Initialize hybrid searcher with game-theoretic compression
searcher = HybridSearcher(use_hybrid=True)
# Load data and train
searcher.load_and_train(limit=500, epochs=10)
# Search
results = searcher.search("Explain the process of photosynthesis", top_n=5)
for sentence, score in results:
    print(f"Score: {score:.4f}\n{sentence}\n")
Using the CLI
# Train a model
nashvec-train --epochs 10 --batch-size 32
# Query the model
nashvec-query "Explain the process of photosynthesis" --top-n 5
# Benchmark performance
nashvec-benchmark --limit 500 --epochs 10
Architecture
Game-Theoretic Framework
NashVec implements a two-player game:
- Encoder (Compressor): Minimizes reconstruction error to preserve information
- Retriever: Maximizes search efficiency by clustering similar items in latent space
The game-theoretic loss function:
L_total = L_reconstruction + λ × L_triplet
Where:
- L_reconstruction: MSE between original and reconstructed embeddings
- L_triplet: Triplet loss ensuring similar items are close in latent space
- λ: Balance parameter (typically 0.5)
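As an illustrative sketch (not the package's internal implementation), the combined objective can be written in a few lines of NumPy, assuming a standard hinge-style triplet loss on squared Euclidean distances:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: anchors should sit closer to their
    positives than to their negatives by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def total_loss(x, x_recon, anchor, positive, negative,
               lam=0.5, margin=0.2):
    """L_total = L_reconstruction (MSE) + lambda * L_triplet."""
    l_recon = float(np.mean((x - x_recon) ** 2))
    return l_recon + lam * triplet_loss(anchor, positive, negative, margin)
```

With perfect reconstruction and well-separated triplets both terms vanish; λ (`lam`) trades the encoder's reconstruction objective against the retriever's clustering objective.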
Component Overview
┌──────────────────────────────────────────────┐
│           Transformer Embeddings             │
│          (Sentence-BERT, 384-dim)            │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│         Game-Theoretic Autoencoder           │
│   ┌────────────┐        ┌────────────┐       │
│   │  Encoder   │───────▶│   Latent   │       │
│   │ (compress) │        │   Space    │       │
│   └────────────┘        │ (128-dim)  │       │
│                         └─────┬──────┘       │
│                               │              │
│                               ▼              │
│                            Decoder           │
│                        (reconstruct)         │
│                                              │
│          Triplet Loss  +  MSE Loss           │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│         HNSW Index (Fast Retrieval)          │
│                      +                       │
│         Re-ranking (Original Space)          │
└──────────────────────────────────────────────┘
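To make the two-stage pipeline concrete, here is a minimal sketch in NumPy. A brute-force cosine scan stands in for the HNSW index, and `encoder` is any compression function (in NashVec it would be the trained game-theoretic encoder):

```python
import numpy as np

def hybrid_search(query_emb, orig_db, encoder, top_n=5, n_candidates=50):
    """Stage 1: coarse candidate retrieval in the compressed latent space
    (HNSW stand-in). Stage 2: exact re-ranking in the original space."""
    latent_db = np.stack([encoder(v) for v in orig_db])
    q = encoder(query_emb)
    # Stage 1: cosine similarity against compressed vectors
    sims = latent_db @ q / (
        np.linalg.norm(latent_db, axis=1) * np.linalg.norm(q) + 1e-9)
    cand = np.argsort(-sims)[:n_candidates]
    # Stage 2: re-rank the candidates with full-precision embeddings
    exact = orig_db[cand] @ query_emb / (
        np.linalg.norm(orig_db[cand], axis=1) * np.linalg.norm(query_emb) + 1e-9)
    order = np.argsort(-exact)[:top_n]
    return cand[order].tolist(), exact[order].tolist()
```

The latent pass is cheap because vectors are 128-dim instead of 384-dim; the re-ranking pass restores full-precision accuracy on the small candidate set.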
Documentation
Core Modules
nashvec.data
Dataset loading and preprocessing:
from nashvec.data import load_alpaca_data, load_custom_dataset
# Load Alpaca dataset
instructions = load_alpaca_data(limit=500)
# Load custom dataset
texts = load_custom_dataset("data.csv", text_column="text")
nashvec.embedding
Sentence embedding generation:
from nashvec.embedding import SentenceEmbedder
embedder = SentenceEmbedder(model_name="all-MiniLM-L6-v2")
embeddings = embedder.encode_batch(texts, batch_size=32)
nashvec.autoencoder
Game-theoretic autoencoder:
from nashvec.autoencoder import GameTheoreticAutoencoder, build_encoder_decoder
encoder, decoder = build_encoder_decoder(input_dim=384, latent_dim=128)
model = GameTheoreticAutoencoder(encoder, decoder, lambda_retrieval=0.5)
model.compile(optimizer='adam', ae_loss_fn='mse')
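For intuition, the encoder/decoder pair can be pictured as two small dense networks that map between the 384-dim embedding space and the 128-dim latent space. The stand-in below is a hypothetical single-layer NumPy version, shown only to illustrate the shapes involved:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """One dense layer as plain NumPy (illustrative stand-in only)."""
    w = rng.normal(0.0, 0.02, (in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: x @ w + b

encoder = dense(384, 128)   # compress: 384-dim embedding -> 128-dim latent
decoder = dense(128, 384)   # reconstruct: latent -> embedding space

x = rng.normal(size=(4, 384))   # a batch of sentence embeddings
z = encoder(x)                  # latent codes, shape (4, 128)
x_hat = decoder(z)              # reconstructions, shape (4, 384)
```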
nashvec.search
High-level search interface:
from nashvec.search import HybridSearcher, compute_utility
searcher = HybridSearcher(use_hybrid=True)
results = searcher.search("query text", top_n=5)
utility = compute_utility(accuracy=0.95, query_time=0.01)
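The exact formula inside `compute_utility` is not shown here; one plausible form, treated purely as an assumption, rewards accuracy and penalizes latency linearly:

```python
def utility_sketch(accuracy, query_time, alpha=1.0):
    """Hypothetical utility: higher accuracy is better, slower queries
    are worse. (An assumed form, not necessarily nashvec's actual
    compute_utility implementation.)"""
    return accuracy - alpha * query_time
```

Under this assumed form, `utility_sketch(0.95, 0.01)` yields 0.94; `alpha` would control how aggressively latency is penalized relative to accuracy.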
Configuration
from nashvec.utils import NashVecConfig
config = NashVecConfig(
    latent_dim=128,
    epochs=10,
    lambda_retrieval=0.5,
    margin=0.2
)
Examples
Example 1: Basic Search
from nashvec import HybridSearcher
# Create searcher
searcher = HybridSearcher(use_hybrid=True)
# Train on dataset
searcher.load_and_train(limit=500, epochs=10)
# Query
results = searcher.search("How does photosynthesis work?", top_n=3)
for i, (text, score) in enumerate(results, 1):
    print(f"{i}. [{score:.4f}] {text}")
Example 2: Evaluation with Metrics
from nashvec import HybridSearcher
searcher = HybridSearcher(use_hybrid=True)
searcher.load_and_train(limit=500, epochs=10)
# Get evaluation metrics
metrics = searcher.evaluate("query text", top_n=5)
print(f"Query Time: {metrics['query_time']:.4f}s")
print(f"Avg Similarity: {metrics['avg_similarity']:.4f}")
print(f"Utility: {metrics['utility']:.4f}")
Example 3: Comparison Study
from nashvec import HybridSearcher
queries = [
    "Explain machine learning",
    "What is Python?",
    "Describe neural networks"
]
# Test hybrid system
hybrid_searcher = HybridSearcher(use_hybrid=True)
hybrid_searcher.load_and_train(limit=500, epochs=10)
# Test baseline
faiss_searcher = HybridSearcher(use_hybrid=False)
faiss_searcher.load_and_train(limit=500)
for query in queries:
    hybrid_results = hybrid_searcher.evaluate(query)
    faiss_results = faiss_searcher.evaluate(query)
    if hybrid_results['utility'] > faiss_results['utility']:
        print(f"✓ Hybrid wins for '{query}'")
    else:
        print(f"✗ Baseline wins for '{query}'")
Running Examples
# Run the demo
python examples/demo_search.py
# Run tests
pytest tests/
# Run specific test
pytest tests/test_autoencoder.py
API Reference
See full API documentation for detailed information about all modules, classes, and functions.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Citation
If you use NashVec in your research, please cite:
@misc{agrawal2025optimizationlatentspacecompressionusing,
title={Optimization of Latent-Space Compression using Game-Theoretic Techniques for Transformer-Based Vector Search},
author={Kushagra Agrawal and Nisharg Nargund and Oishani Banerjee},
year={2025},
eprint={2508.18877},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2508.18877},
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Sentence Transformers for embedding generation
- Facebook AI Research for FAISS
- Yury Malkov for HNSW algorithm
- Hugging Face for datasets
Support
For issues, questions, or contributions, please open an issue or pull request on the GitHub repository.
NashVec - Game-Theoretic Vector Search for the Modern Era
File details
Details for the file nashvec-0.1.0.tar.gz.
File metadata
- Download URL: nashvec-0.1.0.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2758e53b88d9b29ed45608b88f45686eb4d27f6789a8424d09e0b59680b9567d |
| MD5 | cc52189a27ce9ad2ff3d983519cdb488 |
| BLAKE2b-256 | 641bbc683ab1fb03619c7b76f92f0ec7eb35b3c6794fcd882e274633a85c9927 |
File details
Details for the file nashvec-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nashvec-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d14d3312f227310de8ab7b31385cb287a38cdcf6cd05c2dfb18acde93801de3 |
| MD5 | 586de34cda1e2cf250d10e98a79d0e07 |
| BLAKE2b-256 | 000eedb08f962ca4255bc5f4abc93021a2e0ffa10f6ae0b921cf784bb0a25f30 |