
Modern ColBERT for Late Interaction with native multi-vector support


Lateness - Modern ColBERT for Late Interaction

A Python package for Modern ColBERT (late interaction) embeddings with native multi-vector support for efficient retrieval using Qdrant vector database.

Features

  • Dual Backend Architecture: ONNX for fast retrieval, PyTorch for GPU indexing
  • Native Multi-Vector Support: Optimized for Qdrant's MaxSim comparator
  • Smart Installation: install only the lightweight retrieval stack, or add the heavy indexing stack when you need it
  • Production Ready: Separate deployment targets for different workloads
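Late interaction keeps one embedding per token instead of pooling a passage into a single vector. At search time, each query token vector is matched against its most similar document token vector, and those best matches are summed. This is the MaxSim operator that Qdrant's multi-vector comparator applies. Below is a minimal pure-Python sketch. It assumes dot-product similarity on already-normalized toy vectors, and `maxsim` is an illustrative name, not a lateness function:

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors: Sequence[Vector], doc_vectors: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token vector, take its best
    match among the document token vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy 2-dimensional "token embeddings" (real models use far larger dimensions).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, 1.0]]

print(maxsim(query, doc_a))  # best matches are 1.0 and 0.8, summed to 1.8
print(maxsim(query, doc_b))  # best matches are 0.0 and 1.0, summed to 1.0
```

Because every query token picks its own best match, a document can score well by covering different query terms with different tokens. A single pooled vector cannot express that.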

Quick Start

Installation

# Lightweight retrieval (ONNX + Qdrant)
pip install lateness

# Heavy indexing (PyTorch + Transformers + ONNX + Qdrant)
# Quoting stops shells such as zsh from treating the brackets as a glob pattern
pip install "lateness[index]"

Backend Selection

Basic Usage

Default Installation (ONNX Backend):

# pip install lateness
from lateness import ModernColBERT
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Output:
# 🚀 Using ONNX backend (default, for GPU accelerated indexing, install lateness[index] and set LATENESS_USE_TORCH=true)
# 🔄 Downloading model: prithivida/modern_colbert_base_en_v1
# ✅ ONNX ColBERT loaded with providers: ['CPUExecutionProvider']
# Query max length: 256, Document max length: 300

Index Installation (PyTorch Backend):

# pip install lateness[index]
import os
os.environ['LATENESS_USE_TORCH'] = 'true'
from lateness import ModernColBERT

colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")
# Output:
# 🚀 Using PyTorch backend (LATENESS_USE_TORCH=true)
# 🔄 Downloading model: prithivida/modern_colbert_base_en_v1
# Loading model from: /root/.cache/huggingface/hub/models--prithivida--modern_colbert_base_en_v1/...
# ✅ PyTorch ColBERT loaded on cuda
# Query max length: 256, Document max length: 300

Complete Example with Qdrant:

For a complete working example with Qdrant integration, environment setup, and testing instructions, see the examples/qdrant folder.

The examples include:

  • Environment setup and testing
  • Local Qdrant server management
  • Complete indexing and retrieval workflows
  • Both ONNX and PyTorch backend examples

Architecture

Two Deployment Models

Retrieval Service (Lightweight)

pip install lateness
  • ONNX backend (fast CPU inference)
  • Qdrant integration
  • ~50MB total dependencies
  • Perfect for user-facing search APIs

Indexing Service (Heavy)

pip install "lateness[index]"
  • PyTorch backend (GPU acceleration)
  • Full Transformers support
  • ~2GB+ dependencies
  • Perfect for batch document processing

Backend Selection

Backend selection is controlled by the LATENESS_USE_TORCH environment variable:

  • Default behavior → ONNX backend (CPU retrieval)
  • LATENESS_USE_TORCH=true → PyTorch backend (GPU indexing)

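Note: the PyTorch backend requires the PyTorch dependencies installed by pip install "lateness[index]". In the PyTorch example above, LATENESS_USE_TORCH is set before lateness is imported; follow the same order in your own code.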
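The switch described above can also be mirrored in your own deployment scripts. For example, a script can fail fast when LATENESS_USE_TORCH=true is set on a machine that only has the lightweight install. This is a sketch, assuming the variable is compared case-insensitively against "true" (the package's exact parsing is not documented here). `resolve_backend` and `torch_available` are hypothetical helpers, not lateness functions:

```python
import importlib.util
import os
from typing import Mapping, Optional


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "torch" or "onnx" following the documented rule:
    ONNX by default, PyTorch only when LATENESS_USE_TORCH=true.
    Case-insensitive matching is an assumption of this sketch."""
    env = os.environ if env is None else env
    if env.get("LATENESS_USE_TORCH", "").strip().lower() == "true":
        return "torch"
    return "onnx"


def torch_available() -> bool:
    """True if PyTorch is importable, i.e. the [index] extra is likely installed."""
    return importlib.util.find_spec("torch") is not None


backend = resolve_backend()
if backend == "torch" and not torch_available():
    raise RuntimeError(
        'LATENESS_USE_TORCH=true but PyTorch is missing; '
        'run: pip install "lateness[index]"'
    )
print(f"Expected backend: {backend}")
```

Running a check like this at service start-up surfaces a misconfigured indexing host before any model download begins.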

API Reference

ModernColBERT

from lateness import ModernColBERT

# Initialize
colbert = ModernColBERT("prithivida/modern_colbert_base_en_v1")

# Encode queries
query_embeddings = colbert.encode_queries(["What is AI?"])

# Encode documents  
doc_embeddings = colbert.encode_documents(["AI is artificial intelligence"])

# Compute similarity
scores = ModernColBERT.compute_similarity(query_embeddings, doc_embeddings)
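To score one query against several documents, compute a MaxSim score per document and sort. The sketch below does this in pure Python over toy vectors. It first L2-normalizes each token vector so that dot products equal cosine similarities, as ColBERT-style models typically do. It is an illustration only and assumes nothing about the return type of `ModernColBERT.compute_similarity`. `normalize`, `maxsim` and `rank_documents` are hypothetical names:

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim(query_vectors: Sequence[Vector], doc_vectors: Sequence[Vector]) -> float:
    """Sum over query tokens of the best dot product against any document token."""
    return sum(
        max(sum(q_i * d_i for q_i, d_i in zip(q, d)) for d in doc_vectors)
        for q in query_vectors
    )


def rank_documents(
    query_vectors: Sequence[Vector], docs: Sequence[Sequence[Vector]], top_k: int = 10
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted by descending MaxSim score."""
    query = [normalize(q) for q in query_vectors]
    scored = [
        (i, maxsim(query, [normalize(d) for d in doc])) for i, doc in enumerate(docs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


query = [[2.0, 0.0], [0.0, 3.0]]
docs = [
    [[0.0, 5.0]],              # matches only the second query token
    [[4.0, 0.0], [0.0, 1.0]],  # matches both query tokens
]
print(rank_documents(query, docs, top_k=2))  # [(1, 2.0), (0, 1.0)]
```

Normalization makes the raw vector magnitudes (2, 3, 4, 5 above) irrelevant, so the ranking depends only on direction.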

Qdrant Integration

from lateness import QdrantIndexer, QdrantRetriever
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)

# Indexing (`documents` is your corpus; see examples/qdrant for a working setup)
indexer = QdrantIndexer(client, "documents")
indexer.create_collection()
indexer.index_documents_simple(documents)

# Retrieval
retriever = QdrantRetriever(client, "documents")
results = retriever.search_simple("query", top_k=10)
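A multi-vector collection stores each document as a set of token vectors rather than as one vector, so it grows much faster than a single-vector index. Below is a back-of-the-envelope sketch of raw vector storage. It assumes float32 values and 128-dimensional token embeddings (check the dimension your model actually outputs), and it treats the 300-token document limit from the loading logs as a worst case. It ignores payloads, index structures and any quantization in Qdrant. `multivector_storage_bytes` is a hypothetical helper:

```python
def multivector_storage_bytes(
    num_docs: int, tokens_per_doc: int, dim: int, bytes_per_value: int = 4
) -> int:
    """Upper bound on raw vector bytes: one dim-sized vector per document token.
    bytes_per_value=4 corresponds to float32."""
    return num_docs * tokens_per_doc * dim * bytes_per_value


# 1M documents, worst case 300 tokens each, assumed 128-dim float32 vectors.
total = multivector_storage_bytes(1_000_000, 300, 128)
print(f"{total / 1024**3:.1f} GiB")  # about 143.1 GiB of raw vectors
```

The same corpus stored as one 128-dimensional vector per document would need 300 times less space. This gap is why sizing matters more for late-interaction indexes.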

License

Apache License 2.0

Contributing

Contributions welcome! Please check our contributing guidelines.



Download files


Source Distribution

lateness-0.1.0.tar.gz (17.5 kB)

Uploaded Source

Built Distribution


lateness-0.1.0-py3-none-any.whl (19.7 kB)

Uploaded Python 3

File details

Details for the file lateness-0.1.0.tar.gz.

File metadata

  • Download URL: lateness-0.1.0.tar.gz
  • Upload date:
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for lateness-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fe807e2e8cd6757f4acf78eb15dcfbdef21e6aa917bf5964143ae6703b5e18e7
MD5 f62faf8d16cd0194bcaeedbd55c87121
BLAKE2b-256 bdfd352f8bd8c48526cc37f7dc8614ec899cdd1624bf40b818856d5037767622
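To check a downloaded archive against the SHA256 digest listed above, hash the file locally and compare the two values (`pip hash <file>` prints the same SHA256 digest). Below is a minimal stdlib sketch. The local file name assumes you saved the archive under its published name in the current directory:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks and return its hex SHA256 digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


EXPECTED = "fe807e2e8cd6757f4acf78eb15dcfbdef21e6aa917bf5964143ae6703b5e18e7"
archive = Path("lateness-0.1.0.tar.gz")
if archive.exists():
    print("OK" if matches_published(archive, EXPECTED) else "MISMATCH")
```

Reading in chunks keeps memory use flat regardless of file size. The same function works for the wheel with its own published digest.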


File details

Details for the file lateness-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lateness-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for lateness-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a36d2ac44be6b48ba06a2e9c8bd643318f947ba56cee24347cdcada034b6a580
MD5 c015281960e6d63d9169054611c47448
BLAKE2b-256 0e1402351c6bdf42d9880e999ab5234bbd10f6b99b99b8674cd68b8e40b4b26f

