Modern ColBERT for Late Interaction with native multi-vector support
Lateness - Modern ColBERT for Late Interaction
A Python package for Modern ColBERT (late interaction) embeddings with native multi-vector support, for efficient retrieval using the Qdrant vector database.
Features
- Dual Backend Architecture: ONNX for fast retrieval, PyTorch for GPU indexing
- Native Multi-Vector Support: Optimized for Qdrant's MaxSim comparator
- Smart Installation: Lightweight retrieval or heavy indexing based on your needs
- Production Ready: Separate deployment targets for different workloads
Quick Start
Installation
# Lightweight retrieval (ONNX + Qdrant)
pip install lateness
# Heavy indexing (PyTorch + Transformers + ONNX + Qdrant)
pip install "lateness[index]"  # quotes stop shells such as zsh from treating [index] as a glob
Backend Selection
Basic Usage
Default Installation (ONNX Backend):
# pip install lateness
from lateness import ModernColBERT
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Output:
# 🚀 Using ONNX backend (default, for GPU accelerated indexing, install lateness[index] and set LATENESS_USE_TORCH=true)
# 🔄 Downloading model: prithivida/modern_colbert_base_en_v1
# ✅ ONNX ColBERT loaded with providers: ['CPUExecutionProvider']
# Query max length: 256, Document max length: 300
Index Installation (PyTorch Backend):
# pip install "lateness[index]"
import os
# Select the PyTorch backend; set the variable before importing lateness
os.environ['LATENESS_USE_TORCH'] = 'true'
from lateness import ModernColBERT
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Output:
# 🚀 Using PyTorch backend (LATENESS_USE_TORCH=true)
# 🔄 Downloading model: prithivida/modern_colbert_base_en_v1
# Loading model from: /root/.cache/huggingface/hub/models--prithivida--modern_colbert_base_en_v1/...
# ✅ PyTorch ColBERT loaded on cuda
# Query max length: 256, Document max length: 300
Complete Example with Qdrant:
For a complete working example with Qdrant integration, environment setup, and testing instructions, see the examples/qdrant folder.
The examples include:
- Environment setup and testing
- Local Qdrant server management
- Complete indexing and retrieval workflows
- Both ONNX and PyTorch backend examples
Architecture
Two Deployment Models
Retrieval Service (Lightweight)
pip install lateness
- ONNX backend (fast CPU inference)
- Qdrant integration
- ~50MB total dependencies
- Perfect for user-facing search APIs
Indexing Service (Heavy)
pip install "lateness[index]"
- PyTorch backend (GPU acceleration)
- Full Transformers support
- ~2GB+ dependencies
- Perfect for batch document processing
Backend Selection
The backend is selected with an environment variable:
- Default behavior → ONNX backend (CPU retrieval)
- LATENESS_USE_TORCH=true → PyTorch backend (GPU indexing)
Note: the PyTorch backend requires pip install "lateness[index]", which installs the PyTorch dependencies.
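When one codebase serves both deployment models, it can help to guard the switch so a retrieval-only install never asks for the PyTorch backend. The sketch below is a hypothetical helper, not part of lateness: configure_backend is an illustrative name, and it assumes, per the note above, that PyTorch is importable only when lateness[index] is installed. It sets LATENESS_USE_TORCH=true only when GPU indexing is wanted and torch is present, and should be called before importing lateness.

```python
import importlib.util
import os


def configure_backend(want_gpu_indexing: bool) -> str:
    """Hypothetical helper (not a lateness API): choose a backend via LATENESS_USE_TORCH.

    Returns "torch" if the PyTorch backend was requested and torch is installed,
    otherwise "onnx" (the package default).
    """
    torch_available = importlib.util.find_spec("torch") is not None
    if want_gpu_indexing and torch_available:
        os.environ["LATENESS_USE_TORCH"] = "true"
        return "torch"
    # Fall back to the default ONNX backend and clear any stale override.
    os.environ.pop("LATENESS_USE_TORCH", None)
    return "onnx"
```

For example, an indexing job would call configure_backend(want_gpu_indexing=True) at startup and only then run from lateness import ModernColBERT; on a machine without PyTorch it quietly stays on ONNX instead of failing.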
API Reference
ModernColBERT
from lateness import ModernColBERT
# Initialize
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Encode queries
query_embeddings = colbert.encode_queries(["What is AI?"])
# Encode documents
doc_embeddings = colbert.encode_documents(["AI is artificial intelligence"])
# Compute similarity
scores = ModernColBERT.compute_similarity(query_embeddings, doc_embeddings)
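For intuition about the similarity step, here is a sketch of ColBERT's standard late-interaction score (MaxSim): each query token embedding is matched against its most similar document token embedding, and those maxima are summed. We assume, but have not confirmed, that compute_similarity follows this scheme. The function names (dot, maxsim, score_matrix) and the toy 2-D vectors are illustrative, and the embeddings are assumed to be L2-normalized so the dot product equals cosine similarity.

```python
from typing import List, Sequence

Vector = Sequence[float]
TokenMatrix = Sequence[Vector]  # one embedding per token


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: TokenMatrix, doc: TokenMatrix) -> float:
    """Illustrative ColBERT MaxSim (not the lateness implementation):
    for each query token take its best dot product with any doc token, then sum."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def score_matrix(queries: Sequence[TokenMatrix],
                 docs: Sequence[TokenMatrix]) -> List[List[float]]:
    """scores[i][j] = maxsim(queries[i], docs[j])."""
    return [[maxsim(q, d) for d in docs] for q in queries]


# Toy, already-normalized 2-D token embeddings (illustrative only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, 1.0]]

scores = score_matrix([query], [doc_a, doc_b])
print(scores)
```

Here doc_a outranks doc_b because it has a close match for both query tokens, while doc_b only matches the second one. Scoring every token pair is what makes late interaction more expressive than a single pooled vector, at the cost of storing one vector per token.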
Qdrant Integration
from lateness import QdrantIndexer, QdrantRetriever
from qdrant_client import QdrantClient
client = QdrantClient(host="localhost", port=6333)  # local Qdrant server
# Indexing
indexer = QdrantIndexer(client, "documents")
indexer.create_collection()
# documents: your corpus, defined beforehand (see examples/qdrant for the expected format)
indexer.index_documents_simple(documents)
# Retrieval
retriever = QdrantRetriever(client, "documents")
results = retriever.search_simple("query", top_k=10)
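On a small corpus it can be useful to cross-check retrieval results against an exhaustive ranking. The sketch below is hypothetical (brute_force_top_k is not a lateness function): it scores every document with MaxSim, the comparator Qdrant applies to multi-vector fields, and returns the top_k (index, score) pairs. Whether search_simple returns exactly this ordering depends on collection settings such as the distance metric or quantization, which we have not verified.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]
TokenMatrix = Sequence[Vector]


def maxsim(query: TokenMatrix, doc: TokenMatrix) -> float:
    # Sum, over query tokens, of the highest dot product with any document token.
    return sum(max(sum(x * y for x, y in zip(q, d)) for d in doc) for q in query)


def brute_force_top_k(query: TokenMatrix, corpus: Sequence[TokenMatrix],
                      top_k: int = 10) -> List[Tuple[int, float]]:
    """Hypothetical reference search: score every document, return best first."""
    scored = ((i, maxsim(query, doc)) for i, doc in enumerate(corpus))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy normalized token embeddings (illustrative only).
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = [
    [[0.0, 1.0]],              # matches only the second query token
    [[1.0, 0.0], [0.6, 0.8]],  # strong match for both query tokens
    [[0.6, 0.8]],              # partial match for both query tokens
]
print(brute_force_top_k(query, corpus, top_k=2))
```

Exhaustive scoring is linear in corpus size and token count, so it only suits spot checks; the point of the Qdrant index is to avoid it at query time.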
License
Apache License 2.0
Contributing
Contributions welcome! Please check our contributing guidelines.
Download files
Source Distribution
Built Distribution
File details
Details for the file lateness-0.1.0.tar.gz.
File metadata
- Download URL: lateness-0.1.0.tar.gz
- Upload date:
- Size: 17.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe807e2e8cd6757f4acf78eb15dcfbdef21e6aa917bf5964143ae6703b5e18e7 |
| MD5 | f62faf8d16cd0194bcaeedbd55c87121 |
| BLAKE2b-256 | bdfd352f8bd8c48526cc37f7dc8614ec899cdd1624bf40b818856d5037767622 |
File details
Details for the file lateness-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lateness-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a36d2ac44be6b48ba06a2e9c8bd643318f947ba56cee24347cdcada034b6a580 |
| MD5 | c015281960e6d63d9169054611c47448 |
| BLAKE2b-256 | 0e1402351c6bdf42d9880e999ab5234bbd10f6b99b99b8674cd68b8e40b4b26f |
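To check a downloaded file against the SHA256 digests listed in the file details, a short standard-library script is enough. This is an illustrative sketch: EXPECTED_SHA256, sha256_of and verify_download are hypothetical names, and the digests are copied from the tables for lateness 0.1.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests for lateness 0.1.0, copied from the file details.
EXPECTED_SHA256 = {
    "lateness-0.1.0.tar.gz":
        "fe807e2e8cd6757f4acf78eb15dcfbdef21e6aa917bf5964143ae6703b5e18e7",
    "lateness-0.1.0-py3-none-any.whl":
        "a36d2ac44be6b48ba06a2e9c8bd643318f947ba56cee24347cdcada034b6a580",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

For example, verify_download(Path("lateness-0.1.0.tar.gz"), EXPECTED_SHA256["lateness-0.1.0.tar.gz"]) should return True for an intact copy of the source distribution.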