Fast Plaid.
FastPlaid
⭐️ Overview
Traditional vector search relies on single, fixed-size embeddings (dense vectors) for documents and queries. While powerful, this approach can lose nuanced, token-level details.
- Multi-vector search, used in models like ColBERT or ColPali, replaces a single document or image vector with a set of per-token vectors. This enables a "late interaction" mechanism, where fine-grained similarity is calculated term-by-term to boost retrieval accuracy.
- Higher Accuracy: By matching at a granular token level, FastPlaid captures subtle relevance that single-vector models miss.
- PLAID: the engine FastPlaid builds on. The name stands for Performance-optimized Late Interaction Driver.
- Blazing Performance: Engineered in Rust and optimized for GPUs.
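To make "late interaction" concrete, here is a minimal sketch of the MaxSim scoring rule used by ColBERT-style models: each query token keeps its best-matching document token, and those maxima are summed. It is written in plain Python for readability and is not FastPlaid's internal implementation; the function names and toy vectors are illustrative assumptions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction (MaxSim) score: for every query token keep its best
    match among the document tokens, then sum those maxima."""
    return sum(
        max(dot(q, d) for d in doc_tokens)
        for q in query_tokens
    )


# Toy 2-dimensional token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.6, 0.4], [0.5, 0.5]]

score_a = maxsim_score(query, doc_a)  # 0.9 + 0.8
score_b = maxsim_score(query, doc_b)  # 0.6 + 0.5
print(score_a > score_b)  # doc_a matches each query token more closely

# With PyTorch tensors q of shape (n_q, dim) and d of shape (n_d, dim),
# the same score is (q @ d.T).max(dim=1).values.sum().
```

A single-vector model would compare one pooled vector per side, so it could not reward doc_a for matching each query token separately.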
💻 Installation
pip install fast-plaid
⚡️ Quick Start
Get started with creating an index and performing a search in just a few lines of Python.
import torch
from fast_plaid import search

fast_plaid = search.FastPlaid(index="index")
embedding_dim = 128

# Index 100 documents, each with 300 tokens, each token is a 128-dim vector.
fast_plaid.create(
    documents_embeddings=[torch.randn(300, embedding_dim) for _ in range(100)]
)

# Search for 2 queries, each with 50 tokens, each token is a 128-dim vector.
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, embedding_dim),
    top_k=10,
)

print(scores)
The output will be a list of lists, where each inner list contains tuples of (document_index, similarity_score) for the top top_k results for each query:
[
    [
        (20, 1334.55),
        (91, 1299.57),
        (59, 1285.78),
        (10, 1273.53),
        (62, 1267.96),
        (44, 1265.55),
        (15, 1264.42),
        (34, 1261.19),
        (19, 1261.05),
        (86, 1260.94),
    ],
    [
        (58, 1313.85),
        (75, 1313.82),
        (79, 1305.32),
        (61, 1304.45),
        (64, 1303.67),
        (68, 1302.98),
        (66, 1301.23),
        (65, 1299.78),
    ],
]
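The result tuples carry only an integer document_index. Assuming that index is the document's position in the list passed to create (the sample output is consistent with this, but the source does not state it explicitly), a small helper can map results back to your own identifiers. The helper name and the doc_ids list below are hypothetical.

```python
def attach_ids(results, doc_ids):
    """Replace each positional document_index with the caller's own ID."""
    return [
        [(doc_ids[index], score) for index, score in per_query]
        for per_query in results
    ]


# Hypothetical external IDs, one per document, in the order given to create().
doc_ids = [f"doc-{i}" for i in range(100)]

# A shortened copy of the sample output above.
results = [
    [(20, 1334.55), (91, 1299.57)],
    [(58, 1313.85), (75, 1313.82)],
]

labelled = attach_ids(results, doc_ids)
print(labelled[0][0])  # ('doc-20', 1334.55)
```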
FastPlaid does not support index updates. Once an index is created, it is immutable. If you need to add or remove documents, you must create a new index. FastPlaid is optimized for GPUs but is compatible with CPUs.
📊 Benchmarks
FastPlaid matches the original PLAID engine's NDCG@10 to within 0.01 on every dataset below while answering queries 1.7x to 6.5x faster (+69% to +554% QPS). Indexing is also faster on seven of the eight datasets; arguana is the exception.
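| dataset | size | library | NDCG@10 | Indexing Time (s) | Queries per second (QPS) |
|---|---|---|---|---|---|
| arguana | 8674 | PLAID | 0.46 | 4.30 | 56.73 |
| arguana | 8674 | FastPlaid | 0.46 | 4.72 | 155.25 (+174%) |
| fiqa | 57638 | PLAID | 0.41 | 17.65 | 48.13 |
| fiqa | 57638 | FastPlaid | 0.41 | 12.62 | 146.62 (+205%) |
| nfcorpus | 3633 | PLAID | 0.37 | 2.30 | 78.31 |
| nfcorpus | 3633 | FastPlaid | 0.37 | 2.10 | 243.42 (+211%) |
| quora | 522931 | PLAID | 0.88 | 40.01 | 43.06 |
| quora | 522931 | FastPlaid | 0.87 | 11.23 | 281.51 (+554%) |
| scidocs | 25657 | PLAID | 0.19 | 13.32 | 57.17 |
| scidocs | 25657 | FastPlaid | 0.18 | 10.86 | 157.47 (+175%) |
| scifact | 5183 | PLAID | 0.74 | 3.43 | 67.66 |
| scifact | 5183 | FastPlaid | 0.75 | 3.16 | 190.08 (+181%) |
| trec-covid | 171332 | PLAID | 0.84 | 69.46 | 32.09 |
| trec-covid | 171332 | FastPlaid | 0.83 | 45.19 | 54.11 (+69%) |
| webis-touche2020 | 382545 | PLAID | 0.25 | 128.11 | 31.94 |
| webis-touche2020 | 382545 | FastPlaid | 0.24 | 74.50 | 70.15 (+120%) |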
All benchmarks were performed on an H100 GPU. It's important to note that PLAID relies on Just-In-Time (JIT) compilation. This means the very first execution can exhibit longer runtimes. To ensure our performance analysis is representative, we've excluded these initial JIT-affected runs from the reported results. In contrast, FastPlaid does not employ JIT compilation, so its performance on the first run is directly indicative of its typical execution speed.
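The warm-up exclusion and the percentage gains in the QPS column can be reproduced with a few lines. This is a sketch of the general measurement idea, not the authors' benchmark harness; measure_qps, relative_gain_percent, and their parameters are hypothetical names.

```python
import time


def measure_qps(run_queries, n_queries, warmup_runs=1, timed_runs=3):
    """Return queries per second, ignoring the first `warmup_runs` calls."""
    for _ in range(warmup_runs):
        run_queries()  # not timed: absorbs one-off costs such as JIT compilation
    start = time.perf_counter()
    for _ in range(timed_runs):
        run_queries()
    elapsed = time.perf_counter() - start
    return n_queries * timed_runs / elapsed


def relative_gain_percent(baseline_qps, new_qps):
    """Throughput gain of new_qps over baseline_qps, in percent."""
    return (new_qps / baseline_qps - 1.0) * 100.0


# arguana row of the table: PLAID 56.73 QPS, FastPlaid 155.25 QPS.
print(round(relative_gain_percent(56.73, 155.25)))  # 174
```

For an engine without JIT compilation, warmup_runs=0 would give the same picture, which is the point the paragraph above makes about FastPlaid.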
📝 Citation
FastPlaid builds upon the groundbreaking work of the original PLAID engine (Santhanam, Keshav, et al.).
You can cite FastPlaid in your work as follows:
@misc{fastplaid2025,
  author = {Sourty, Raphaël},
  title = {FastPlaid: A High-Performance Engine for Multi-Vector Search},
  year = {2025},
  url = {https://github.com/lightonai/fast-plaid}
}
And for the original PLAID research:
@inproceedings{santhanam2022plaid,
  title={{PLAID}: an efficient engine for late interaction retrieval},
  author={Santhanam, Keshav and Khattab, Omar and Potts, Christopher and Zaharia, Matei},
  booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
  pages={1747--1756},
  year={2022}
}
📖 FastPlaid Class
The FastPlaid class is the core component for building and querying multi-vector search indexes. It's designed for high performance, especially when leveraging GPUs.
Initialization
To create an instance of FastPlaid, you'll provide the directory where your index will be stored and specify the device(s) for computation.
class FastPlaid:
    def __init__(
        self,
        index: str,
        device: str | list[str] | None = None,
    ) -> None:

index: str
    The file path to the directory where your index will be saved or loaded from.

device: str | list[str] | None = None
    Specifies the device(s) to use for computation.
    - If None (default) and CUDA is available, it defaults to "cuda".
    - If CUDA is not available, it defaults to "cpu".
    - Can be a single device string (e.g., "cuda:0" or "cpu").
    - Can be a list of device strings (e.g., ["cuda:0", "cuda:1"]).
    - If multiple GPUs are specified and available, multiprocessing is automatically set up for parallel execution.
    Remember to include your code within an `if __name__ == "__main__":` block for proper multiprocessing behavior.
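The device rules above can be read as a small decision function. The sketch below mirrors the documented defaults only; resolve_devices is a hypothetical helper, not part of the FastPlaid API, and CUDA availability is passed in as a flag so the example runs on any machine (real code would typically obtain it from torch.cuda.is_available()). It also shows the `if __name__ == "__main__":` guard recommended for multi-GPU use.

```python
def resolve_devices(device, cuda_available):
    """Return the list of device strings implied by the documented rules."""
    if device is None:
        # Default: prefer CUDA when present, otherwise fall back to the CPU.
        return ["cuda"] if cuda_available else ["cpu"]
    if isinstance(device, str):
        return [device]  # a single explicit device
    return list(device)  # several devices, e.g. one entry per GPU


def main():
    print(resolve_devices(None, cuda_available=False))
    print(resolve_devices("cuda:0", cuda_available=True))
    print(resolve_devices(["cuda:0", "cuda:1"], cuda_available=True))


# The guard keeps worker processes from re-running main() when they
# import this module, which matters once multiprocessing is involved.
if __name__ == "__main__":
    main()
```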
Creating an Index
The create method builds the multi-vector index from your document embeddings. It uses K-means clustering to organize your data for efficient retrieval.
def create(
    self,
    documents_embeddings: list[torch.Tensor],
    kmeans_niters: int = 4,
    max_points_per_centroid: int = 256,
    nbits: int = 4,
) -> "FastPlaid":

documents_embeddings: list[torch.Tensor]
    A list where each element is a PyTorch tensor representing the multi-vector embedding for a single document.
    Each document's embedding should have a shape of `(num_tokens, embedding_dimension)`.

kmeans_niters: int = 4 (optional)
    The number of iterations for the K-means algorithm used during index creation.
    This influences the quality of the initial centroid assignments.

max_points_per_centroid: int = 256 (optional)
    The maximum number of points (token embeddings) that can be assigned to a single centroid during K-means.
    This helps in balancing the clusters.

nbits: int = 4 (optional)
    The number of bits to use for product quantization.
    This parameter controls the compression of your embeddings, impacting both index size and search speed.
    Lower values mean more compression and potentially faster searches but can reduce accuracy.
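To get a feel for what nbits buys, the arithmetic below estimates per-token storage. It rests on a loud assumption: that each embedding dimension is encoded with nbits bits, as in the residual compression described in the original PLAID paper. The source does not spell out FastPlaid's storage layout, and the estimate ignores centroid IDs and index metadata, so treat the numbers as rough. bytes_per_token is a hypothetical helper.

```python
def bytes_per_token(embedding_dim, nbits):
    """Bytes for one token's codes, ASSUMING nbits bits per dimension."""
    total_bits = embedding_dim * nbits
    return (total_bits + 7) // 8  # round up to whole bytes


dim = 128
float32_bytes = dim * 4  # uncompressed float32 storage for one token

for nbits in (2, 4, 8):
    compressed = bytes_per_token(dim, nbits)
    ratio = float32_bytes / compressed
    print(f"nbits={nbits}: {compressed} bytes per token, {ratio:.0f}x smaller than float32")
```

Under this assumption the default nbits=4 stores a 128-dimensional token in 64 bytes instead of 512.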
Searching the Index
The search method lets you query the created index with your query embeddings and retrieve the most relevant documents.
def search(
    self,
    queries_embeddings: torch.Tensor,
    top_k: int = 10,
    batch_size: int = 1 << 18,
    n_full_scores: int = 8192,
    n_ivf_probe: int = 8,
    show_progress: bool = True,
) -> list[list[tuple[int, float]]]:

queries_embeddings: torch.Tensor
    A PyTorch tensor representing the multi-vector embeddings of your queries.
    Its shape should be `(num_queries, num_tokens_per_query, embedding_dimension)`.

top_k: int = 10 (optional)
    The number of top-scoring documents to retrieve for each query.

batch_size: int = 1 << 18 (optional)
    The internal batch size used for processing queries.
    A larger batch size might improve throughput on powerful GPUs but can consume more memory.

n_full_scores: int = 8192 (optional)
    The number of candidate documents for which full (re-ranked) scores are computed.
    This is a crucial parameter for accuracy; higher values lead to more accurate results but increase computation.

n_ivf_probe: int = 8 (optional)
    The number of inverted file list "probes" to perform during the search.
    This parameter controls the number of clusters to search within the index for each query.
    Higher values improve recall but increase search time.

show_progress: bool = True (optional)
    If set to `True`, a progress bar will be displayed during the search operation.
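How n_ivf_probe, n_full_scores, and top_k interact is easier to see in a toy two-stage search: probe the closest centroids to collect candidates, keep a shortlist, score the shortlist exactly, and return the best top_k. This is a deliberately simplified, pure-Python illustration of the general idea. It is not FastPlaid's implementation (which works on compressed embeddings), and toy_search, the ivf mapping, and the proxy score are all assumptions made for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim(query, doc):
    """Exact late-interaction score used for the final re-ranking."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def toy_search(query, docs, centroids, ivf, top_k, n_ivf_probe, n_full_scores):
    """ivf maps a centroid index to the document indices stored under it."""
    # Stage 1: every query token probes its n_ivf_probe closest centroids.
    approx = {}
    for q in query:
        ranked = sorted(
            range(len(centroids)),
            key=lambda c: dot(q, centroids[c]),
            reverse=True,
        )
        for c in ranked[:n_ivf_probe]:
            sim = dot(q, centroids[c])
            for doc_index in ivf[c]:
                # Cheap proxy: best centroid similarity seen for this document.
                approx[doc_index] = max(approx.get(doc_index, float("-inf")), sim)
    # Stage 2: only the n_full_scores best candidates get an exact score.
    shortlist = sorted(approx, key=approx.get, reverse=True)[:n_full_scores]
    scored = sorted(
        ((i, maxsim(query, docs[i])) for i in shortlist),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
docs = [[[0.9, 0.1]], [[0.1, 0.9]], [[-0.9, 0.1]]]  # one token per document
ivf = {0: [0], 1: [1], 2: [2]}
query = [[1.0, 0.0]]

narrow = toy_search(query, docs, centroids, ivf, top_k=2, n_ivf_probe=1, n_full_scores=10)
wide = toy_search(query, docs, centroids, ivf, top_k=2, n_ivf_probe=2, n_full_scores=10)
print([i for i, _ in narrow])  # [0]
print([i for i, _ in wide])    # [0, 1]
```

With a single probe only document 0 is ever considered, so the second result slot stays empty; probing two centroids recovers document 1 at the cost of scoring more candidates.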