FastPlaid - A High-Performance Engine for Multi-Vector Search

 

⭐️ Overview

Traditional vector search relies on single, fixed-size embeddings (dense vectors) for documents and queries. While powerful, this approach can lose nuanced, token-level details.

  • Multi-vector search, used in models like ColBERT or ColPali, replaces a single document or image vector with a set of per-token vectors. This enables a "late interaction" mechanism, where fine-grained similarity is calculated term-by-term to boost retrieval accuracy.

  • Higher Accuracy: By matching at the token level, FastPlaid captures subtle relevance signals that single-vector models can miss.

  • PLAID: the engine FastPlaid builds on; in the original paper the name stands for Performance-optimized Late Interaction Driver.

  • Blazing Performance: Engineered in Rust and optimized for GPUs.
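The late interaction idea described above can be sketched in a few lines of plain Python. This is an illustration of the MaxSim scoring used by ColBERT-style models, not FastPlaid's internal code; the helper names `dot` and `maxsim_score` and the toy 2-dimensional embeddings are assumptions made for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those maxima over the query."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [0.9, 0.1]]              # only matches the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Because every query token contributes its own best match, `doc_a` outscores `doc_b` even though both contain an exact match for the first query token.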

 

💻 Installation

pip install fast-plaid

 

⚡️ Quick Start

Get started with creating an index and performing a search in just a few lines of Python.

import torch

from fast_plaid import search

fast_plaid = search.FastPlaid(index="index")

embedding_dim = 128

# Index 100 documents, each with 300 tokens, each token is a 128-dim vector.
fast_plaid.create(
    documents_embeddings=[torch.randn(300, embedding_dim) for _ in range(100)] 
)

# Search for 2 queries, each with 50 tokens, each token is a 128-dim vector
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, embedding_dim),
    top_k=10,
)

print(scores)

The output is a list of lists: one inner list per query, holding (document_index, similarity_score) tuples for that query's top_k results:

[
    [
        (20, 1334.55),
        (91, 1299.57),
        (59, 1285.78),
        (10, 1273.53),
        (62, 1267.96),
        (44, 1265.55),
        (15, 1264.42),
        (34, 1261.19),
        (19, 1261.05),
        (86, 1260.94),
    ],
    [
        (58, 1313.85),
        (75, 1313.82),
        (79, 1305.32),
        (61, 1304.45),
        (64, 1303.67),
        (68, 1302.98),
        (66, 1301.23),
        (65, 1299.78),
    ],
]
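A short sketch of how such a result might be consumed. The first three hits per query are copied from the sample output; the `doc_ids` mapping is hypothetical and assumes that document_index is the position of the document in the list passed to create.

```python
# First three hits per query, copied from the sample output.
scores = [
    [(20, 1334.55), (91, 1299.57), (59, 1285.78)],
    [(58, 1313.85), (75, 1313.82), (79, 1305.32)],
]

# Hypothetical mapping from index position to your own document IDs.
doc_ids = {20: "doc-020", 91: "doc-091", 59: "doc-059",
           58: "doc-058", 75: "doc-075", 79: "doc-079"}

best_per_query = []
for query_index, hits in enumerate(scores):
    # The sample output lists the highest score first.
    document_index, similarity = hits[0]
    best_per_query.append((query_index, doc_ids[document_index], similarity))

print(best_per_query)
```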

FastPlaid does not support index updates: once created, an index is immutable, so adding or removing documents means creating a new index. FastPlaid is optimized for GPUs but also runs on CPUs.

 

📊 Benchmarks

Across the eight datasets below, FastPlaid matches the original PLAID engine's NDCG@10 to within 0.01, indexes faster on every dataset except arguana, and answers between 69% and 554% more queries per second.

                                   NDCG@10  Indexing Time (s)  Queries per second (QPS)
dataset          size   library
arguana          8674   PLAID         0.46               4.30                     56.73
                        FastPlaid     0.46               4.72            155.25 (+174%)

fiqa             57638  PLAID         0.41              17.65                     48.13
                        FastPlaid     0.41              12.62            146.62 (+205%)

nfcorpus         3633   PLAID         0.37               2.30                     78.31
                        FastPlaid     0.37               2.10            243.42 (+211%)

quora            522931 PLAID         0.88              40.01                     43.06
                        FastPlaid     0.87              11.23            281.51 (+554%)

scidocs          25657  PLAID         0.19              13.32                     57.17
                        FastPlaid     0.18              10.86            157.47 (+175%)

scifact          5183   PLAID         0.74               3.43                     67.66
                        FastPlaid     0.75               3.16            190.08 (+181%)

trec-covid       171332 PLAID         0.84              69.46                     32.09
                        FastPlaid     0.83              45.19              54.11 (+69%)

webis-touche2020 382545 PLAID         0.25             128.11                     31.94
                        FastPlaid     0.24              74.50             70.15 (+120%)

All benchmarks were performed on an H100 GPU. PLAID relies on Just-In-Time (JIT) compilation, so its very first execution can take longer; to keep the comparison representative, these initial JIT-affected runs are excluded from the reported results. FastPlaid does not use JIT compilation, so its first-run performance reflects its typical execution speed.
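The percentages in the QPS column are relative throughput gains over PLAID. As a quick check, the arithmetic can be reproduced from three rows of the table; the helper name `relative_gain_percent` is made up for this example.

```python
# (PLAID QPS, FastPlaid QPS) pairs copied from the benchmark table.
qps = {
    "arguana": (56.73, 155.25),
    "quora": (43.06, 281.51),
    "trec-covid": (32.09, 54.11),
}


def relative_gain_percent(baseline, improved):
    """Relative throughput gain of `improved` over `baseline`, in percent."""
    return round((improved / baseline - 1.0) * 100)


gains = {name: relative_gain_percent(*pair) for name, pair in qps.items()}
print(gains)
```

The results agree with the +174%, +554%, and +69% annotations in the table.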

 

📝 Citation

FastPlaid builds upon the groundbreaking work of the original PLAID engine (Santhanam et al., 2022).

You can cite FastPlaid in your work as follows:

@misc{fastplaid2025,
  author = {Sourty, Raphaël},
  title = {FastPlaid: A High-Performance Engine for Multi-Vector Search},
  year = {2025},
  url = {https://github.com/lightonai/fast-plaid}
}

And for the original PLAID research:

@inproceedings{santhanam2022plaid,
  title={{PLAID}: an efficient engine for late interaction retrieval},
  author={Santhanam, Keshav and Khattab, Omar and Potts, Christopher and Zaharia, Matei},
  booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
  pages={1747--1756},
  year={2022}
}

 

📖 FastPlaid Class

The FastPlaid class is the core component for building and querying multi-vector search indexes. It's designed for high performance, especially when leveraging GPUs.

Initialization

To create an instance of FastPlaid, you'll provide the directory where your index will be stored and specify the device(s) for computation.

class FastPlaid:
    def __init__(
        self,
        index: str,
        device: str | list[str] | None = None,
    ) -> None:
index: str
    The file path to the directory where your index will be saved or loaded from.

device: str | list[str] | None = None
    Specifies the device(s) to use for computation.
    - If None (default) and CUDA is available, it defaults to "cuda".
    - If CUDA is not available, it defaults to "cpu".
    - Can be a single device string (e.g., "cuda:0" or "cpu").
    - Can be a list of device strings (e.g., ["cuda:0", "cuda:1"]).
    - If multiple GPUs are specified and available, multiprocessing is automatically set up for parallel execution.
      Remember to include your code within an `if __name__ == "__main__":` block for proper multiprocessing behavior.
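The device rules above can be summarized as a small decision function. This is a sketch of the documented behavior only: `resolve_devices` and `needs_multiprocessing` are hypothetical helpers, not part of the fast_plaid API, and CUDA availability is passed in as a plain flag so the example runs without PyTorch.

```python
def resolve_devices(device=None, cuda_available=False):
    """Hypothetical helper mirroring the documented defaults; returns a list
    of device strings."""
    if device is None:
        return ["cuda"] if cuda_available else ["cpu"]
    if isinstance(device, str):
        return [device]
    return list(device)


def needs_multiprocessing(devices):
    """Per the description above, several GPUs imply parallel workers."""
    return sum(d.startswith("cuda") for d in devices) > 1


if __name__ == "__main__":
    # The guard matters once multiple GPUs are requested, because worker
    # processes may re-import this module when they start.
    devices = resolve_devices(["cuda:0", "cuda:1"], cuda_available=True)
    print(devices, needs_multiprocessing(devices))
```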

Creating an Index

The create method builds the multi-vector index from your document embeddings. It uses K-means clustering to organize your data for efficient retrieval.

    def create(
        self,
        documents_embeddings: list[torch.Tensor],
        kmeans_niters: int = 4,
        max_points_per_centroid: int = 256,
        nbits: int = 4,
    ) -> "FastPlaid":
documents_embeddings: list[torch.Tensor]
    A list where each element is a PyTorch tensor representing the multi-vector embedding for a single document.
    Each document's embedding should have a shape of `(num_tokens, embedding_dimension)`.

kmeans_niters: int = 4 (optional)
    The number of iterations for the K-means algorithm used during index creation.
    This influences the quality of the initial centroid assignments.

max_points_per_centroid: int = 256 (optional)
    The maximum number of points (token embeddings) that can be assigned to a single centroid during K-means.
    This helps in balancing the clusters.

nbits: int = 4 (optional)
    The number of bits to use for product quantization.
    This parameter controls the compression of your embeddings, impacting both index size and search speed.
    Lower values mean more compression and potentially faster searches but can reduce accuracy.
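To make the nbits trade-off concrete, here is a toy scalar quantizer. It is not FastPlaid's codec: the uniform buckets over [-1, 1], the helper names, and the one-code-per-dimension storage estimate are all assumptions chosen to show that fewer bits mean a smaller index but a coarser reconstruction.

```python
def quantize(value, nbits, low=-1.0, high=1.0):
    """Map a float in [low, high] to one of 2**nbits integer codes."""
    levels = 2 ** nbits
    clipped = min(max(value, low), high)
    code = int((clipped - low) / (high - low) * levels)
    return min(code, levels - 1)  # keep value == high in the last bucket


def dequantize(code, nbits, low=-1.0, high=1.0):
    """Reconstruct the centre of the bucket that `code` refers to."""
    width = (high - low) / 2 ** nbits
    return low + (code + 0.5) * width


def max_error(values, nbits):
    """Worst-case reconstruction error over `values`."""
    return max(abs(v - dequantize(quantize(v, nbits), nbits)) for v in values)


values = [i / 50 - 1.0 for i in range(101)]  # evenly spaced points in [-1, 1]
error_2_bits = max_error(values, 2)
error_4_bits = max_error(values, 4)

# Assumed storage for one 128-dimensional token at nbits per dimension.
bytes_per_token = {nbits: 128 * nbits // 8 for nbits in (2, 4)}
print(error_2_bits, error_4_bits, bytes_per_token)
```

Under these assumptions, halving nbits from 4 to 2 halves the bytes per token and makes the worst-case error four times larger.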

Searching the Index

The search method lets you query the created index with your query embeddings and retrieve the most relevant documents.

    def search(
        self,
        queries_embeddings: torch.Tensor,
        top_k: int = 10,
        batch_size: int = 1 << 18,
        n_full_scores: int = 8192,
        n_ivf_probe: int = 8,
        show_progress: bool = True,
    ) -> list[list[tuple[int, float]]]:
queries_embeddings: torch.Tensor
    A PyTorch tensor representing the multi-vector embeddings of your queries.
    Its shape should be `(num_queries, num_tokens_per_query, embedding_dimension)`.

top_k: int = 10 (optional)
    The number of top-scoring documents to retrieve for each query.

batch_size: int = 1 << 18 (optional)
    The internal batch size used for processing queries.
    A larger batch size might improve throughput on powerful GPUs but can consume more memory.

n_full_scores: int = 8192 (optional)
    The number of candidate documents for which full (re-ranked) scores are computed.
    This is a crucial parameter for accuracy; higher values lead to more accurate results but increase computation.

n_ivf_probe: int = 8 (optional)
    The number of inverted file list "probes" to perform during the search.
    This parameter controls the number of clusters to search within the index for each query.
    Higher values improve recall but increase search time.

show_progress: bool = True (optional)
    If set to `True`, a progress bar will be displayed during the search operation.
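How n_ivf_probe, n_full_scores, and top_k interact can be illustrated with a toy, pure-Python pipeline in the style of PLAID: probe centroids, shortlist candidates with a cheap centroid-only score, then re-rank with exact scores. This is a simplified sketch under stated assumptions, not FastPlaid's implementation; `toy_search` and its helpers are hypothetical, and residual decompression and batching are left out.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query, vectors):
    """Sum over query tokens of the best dot product against `vectors`."""
    return sum(max(dot(q, v) for v in vectors) for q in query)


def toy_search(query, docs, centroids, top_k=2, n_full_scores=2, n_ivf_probe=1):
    """Toy three-stage search: probe, approximate shortlist, full re-rank."""
    ids = range(len(centroids))
    # Index time: assign each document token to its nearest centroid.
    codes = [
        [max(ids, key=lambda c: dot(tok, centroids[c])) for tok in doc]
        for doc in docs
    ]
    # Stage 1: each query token probes its n_ivf_probe closest centroids;
    # documents owning a token in a probed centroid become candidates.
    probed = set()
    for q in query:
        order = sorted(ids, key=lambda c: -dot(q, centroids[c]))
        probed.update(order[:n_ivf_probe])
    candidates = [d for d, doc_codes in enumerate(codes) if probed & set(doc_codes)]
    # Stage 2: rank candidates by a cheap centroid-only score and keep
    # the n_full_scores best for exact scoring.
    candidates.sort(key=lambda d: -maxsim(query, [centroids[c] for c in codes[d]]))
    shortlist = candidates[:n_full_scores]
    # Stage 3: exact late-interaction scores, best first, cut to top_k.
    hits = sorted(((d, maxsim(query, docs[d])) for d in shortlist), key=lambda h: -h[1])
    return hits[:top_k]


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
docs = [
    [[0.9, 0.1], [0.1, 0.9]],     # doc 0: close to both query tokens
    [[1.0, 0.0], [-0.8, 0.2]],    # doc 1: matches only the first query token
    [[-1.0, 0.0], [-0.9, -0.1]],  # doc 2: unrelated, never probed
]
query = [[1.0, 0.0], [0.0, 1.0]]

hits = toy_search(query, docs, centroids)
print(hits)
```

Raising n_ivf_probe widens the candidate pool and raising n_full_scores lets more candidates reach exact scoring, which is why both improve result quality at the cost of extra computation.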
