FastPlaid

FastPlaid - A High-Performance Engine for Multi-Vector Search

 

⭐️ Overview

Traditional vector search relies on single, fixed-size embeddings (dense vectors) for documents and queries. While powerful, this approach can lose nuanced, token-level details.

  • Multi-vector search, used in models like ColBERT or ColPali, replaces a single document or image vector with a set of per-token vectors. This enables a "late interaction" mechanism, where fine-grained similarity is calculated term-by-term to boost retrieval accuracy.

  • Higher Accuracy: By matching at the token level, FastPlaid captures subtle relevance signals that single-vector models miss.

  • PLAID: stands for Performance-optimized Late Interaction Driver, the retrieval engine introduced by Santhanam et al. (2022) that FastPlaid builds on.

  • Blazing Performance: Engineered in Rust and optimized for GPUs.
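
To make the "late interaction" idea concrete, here is a minimal pure-Python sketch of the MaxSim scoring commonly used by ColBERT-style models: each query token is matched to its best document token, and the per-token maxima are summed. The helper names (`dot`, `maxsim_score`) and the toy vectors are illustrative assumptions, not part of the FastPlaid API, and FastPlaid itself computes scores on tensors rather than Python lists.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.1, 0.1], [0.3, 0.2]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)  # doc_a has a close match for every query token
```

Because the score is a sum over query tokens, its magnitude grows with the number of query tokens and with the scale of the embeddings.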

 

💻 Installation

pip install fast-plaid

 

⚡️ Quick Start

Get started with creating an index and performing a search in just a few lines of Python.

import torch

from fast_plaid import search

fast_plaid = search.FastPlaid(index="index")

embedding_dim = 128

# Index 100 documents, each with 300 tokens, each token is a 128-dim vector.
fast_plaid.create(
    documents_embeddings=[torch.randn(300, embedding_dim) for _ in range(100)]
)

# Search for 2 queries, each with 50 tokens, each token is a 128-dim vector
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, embedding_dim),
    top_k=10,
)

print(scores)

The output is a list of lists, one per query. Each inner list holds (document_index, similarity_score) tuples for that query's top_k highest-scoring documents:

[
    [
        (20, 1334.55),
        (91, 1299.57),
        (59, 1285.78),
        (10, 1273.53),
        (62, 1267.96),
        (44, 1265.55),
        (15, 1264.42),
        (34, 1261.19),
        (19, 1261.05),
        (86, 1260.94),
    ],
    [
        (58, 1313.85),
        (75, 1313.82),
        (79, 1305.32),
        (61, 1304.45),
        (64, 1303.67),
        (68, 1302.98),
        (66, 1301.23),
        (65, 1299.78),
    ],
]
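
In practice you will usually want to map each document_index back to your own identifiers. The sketch below assumes that document_index is the position of the document in the list passed to create; the helper `attach_ids` and the `doc-N` identifiers are hypothetical and only serve the illustration.

```python
def attach_ids(results, doc_ids):
    """Replace each positional document index with the caller's own identifier."""
    return [
        [(doc_ids[index], score) for index, score in per_query]
        for per_query in results
    ]


# One identifier per indexed document, in the same order as the embeddings.
doc_ids = [f"doc-{i}" for i in range(100)]

# A shortened copy of the result structure shown above.
results = [
    [(20, 1334.55), (91, 1299.57)],
    [(58, 1313.85), (75, 1313.82)],
]

labelled = attach_ids(results, doc_ids)
print(labelled[0][0])
```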

FastPlaid does not support index updates. Once an index is created, it is immutable. If you need to add or remove documents, you must create a new index. FastPlaid is optimized for GPUs but is compatible with CPUs.
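
Since an index cannot be modified, one workable pattern is to keep the source embeddings around and rebuild from an edited copy. The sketch below is only an outline of that pattern: `rebuild_index` and its `build_index` callback are hypothetical names, and a stand-in builder is used so the example runs without a GPU or the library installed.

```python
def rebuild_index(build_index, embeddings, removed=(), added=()):
    """Build a fresh index from the kept embeddings plus the new ones.

    build_index is any callable that takes a list of per-document
    embeddings and returns an index object (hypothetical hook).
    """
    removed = set(removed)
    kept = [emb for position, emb in enumerate(embeddings) if position not in removed]
    updated = kept + list(added)
    return build_index(updated), updated


def fake_build(embeddings):
    """Stand-in for a real index build; it only reports the corpus size."""
    return f"index with {len(embeddings)} documents"


index, corpus = rebuild_index(
    fake_build,
    embeddings=["a", "b", "c", "d"],  # placeholders for per-document tensors
    removed=[1],
    added=["e", "f"],
)
print(index, corpus)
```

With the real library, `build_index` could wrap a call such as `search.FastPlaid(index=new_directory).create(documents_embeddings=embeddings)`. Note that positions shift after a rebuild, so any mapping from document_index to your own identifiers has to be regenerated as well.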

 

📊 Benchmarks

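FastPlaid significantly outperforms the original PLAID engine on query throughput across the datasets below, with QPS gains ranging from +69% to +554%. NDCG@10 stays within 0.01 of PLAID on every dataset, and indexing is faster on all datasets except arguana.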

                                   NDCG@10  Indexing Time (s)  Queries per second (QPS)
dataset          size   library
arguana          8674   PLAID         0.46               4.30                     56.73
                        FastPlaid     0.46               4.72            155.25 (+174%)

fiqa             57638  PLAID         0.41              17.65                     48.13
                        FastPlaid     0.41              12.62            146.62 (+205%)

nfcorpus         3633   PLAID         0.37               2.30                     78.31
                        FastPlaid     0.37               2.10            243.42 (+211%)

quora            522931 PLAID         0.88              40.01                     43.06
                        FastPlaid     0.87              11.23            281.51 (+554%)

scidocs          25657  PLAID         0.19              13.32                     57.17
                        FastPlaid     0.18              10.86            157.47 (+175%)

scifact          5183   PLAID         0.74               3.43                     67.66
                        FastPlaid     0.75               3.16            190.08 (+181%)

trec-covid       171332 PLAID         0.84              69.46                     32.09
                        FastPlaid     0.83              45.19              54.11 (+69%)

webis-touche2020 382545 PLAID         0.25             128.11                     31.94
                        FastPlaid     0.24              74.50             70.15 (+120%)

All benchmarks were performed on an H100 GPU. It's important to note that PLAID relies on Just-In-Time (JIT) compilation. This means the very first execution can exhibit longer runtimes. To ensure our performance analysis is representative, we've excluded these initial JIT-affected runs from the reported results. In contrast, FastPlaid does not employ JIT compilation, so its performance on the first run is directly indicative of its typical execution speed.
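
The percentages in the QPS column are relative gains over PLAID. As a quick check, the following snippet recomputes three of them from the throughput figures in the table; the function name `qps_gain_percent` is just an illustrative choice.

```python
def qps_gain_percent(baseline_qps, new_qps):
    """Relative throughput gain of new_qps over baseline_qps, in percent."""
    return (new_qps / baseline_qps - 1.0) * 100.0


# (PLAID QPS, FastPlaid QPS) copied from the benchmark table.
rows = {
    "arguana": (56.73, 155.25),
    "quora": (43.06, 281.51),
    "trec-covid": (32.09, 54.11),
}

gains = {name: round(qps_gain_percent(plaid, fast)) for name, (plaid, fast) in rows.items()}
print(gains)  # {'arguana': 174, 'quora': 554, 'trec-covid': 69}
```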

 

📝 Citation

FastPlaid builds upon the groundbreaking work of the original PLAID engine (Santhanam et al., 2022).

You can cite FastPlaid in your work as follows:

@misc{fastplaid2025,
  author = {Sourty, Raphaël},
  title = {FastPlaid: A High-Performance Engine for Multi-Vector Search},
  year = {2025},
  url = {https://github.com/lightonai/fast-plaid}
}

And for the original PLAID research:

@inproceedings{santhanam2022plaid,
  title={{PLAID}: an efficient engine for late interaction retrieval},
  author={Santhanam, Keshav and Khattab, Omar and Potts, Christopher and Zaharia, Matei},
  booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
  pages={1747--1756},
  year={2022}
}

 

📖 FastPlaid Class

The FastPlaid class is the core component for building and querying multi-vector search indexes. It's designed for high performance, especially when leveraging GPUs.

Initialization

To create an instance of FastPlaid, you'll provide the directory where your index will be stored and specify the device(s) for computation.

class FastPlaid:
    def __init__(
        self,
        index: str,
        device: str | list[str] | None = None,
    ) -> None:
index: str
    The file path to the directory where your index will be saved or loaded from.

device: str | list[str] | None = None
    Specifies the device(s) to use for computation.
    - If None (default) and CUDA is available, it defaults to "cuda".
    - If CUDA is not available, it defaults to "cpu".
    - Can be a single device string (e.g., "cuda:0" or "cpu").
    - Can be a list of device strings (e.g., ["cuda:0", "cuda:1"]).
    - If multiple GPUs are specified and available, multiprocessing is automatically set up for parallel execution.
      Remember to include your code within an `if __name__ == "__main__":` block for proper multiprocessing behavior.
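
The device rules above can be summarized in a few lines of plain Python. This is a sketch of the documented behavior, not the library's own code: `resolve_devices` and `needs_multiprocessing` are hypothetical helpers, and CUDA availability is passed in as a flag so the example runs anywhere.

```python
def resolve_devices(device=None, cuda_available=False):
    """Return the list of devices implied by the documented defaults."""
    if device is None:
        return ["cuda"] if cuda_available else ["cpu"]
    if isinstance(device, str):
        return [device]
    return list(device)


def needs_multiprocessing(devices):
    """Several devices imply parallel worker processes."""
    return len(devices) > 1


if __name__ == "__main__":
    # Code that may start worker processes belongs under this guard, so that
    # child processes re-importing the module do not run it again.
    devices = resolve_devices(["cuda:0", "cuda:1"], cuda_available=True)
    print(devices, needs_multiprocessing(devices))
```

In a real script, the FastPlaid constructor call and the create and search calls would sit inside that guarded block whenever more than one GPU is listed.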

Creating an Index

The create method builds the multi-vector index from your document embeddings. It uses K-means clustering to organize your data for efficient retrieval.

    def create(
        self,
        documents_embeddings: list[torch.Tensor],
        kmeans_niters: int = 4,
        max_points_per_centroid: int = 256,
        nbits: int = 4,
    ) -> "FastPlaid":
documents_embeddings: list[torch.Tensor]
    A list where each element is a PyTorch tensor representing the multi-vector embedding for a single document.
    Each document's embedding should have a shape of `(num_tokens, embedding_dimension)`.

kmeans_niters: int = 4 (optional)
    The number of iterations for the K-means algorithm used during index creation.
    This influences the quality of the initial centroid assignments.

max_points_per_centroid: int = 256 (optional)
    The maximum number of points (token embeddings) that can be assigned to a single centroid during K-means.
    This helps in balancing the clusters.

nbits: int = 4 (optional)
    The number of bits to use for product quantization.
    This parameter controls the compression of your embeddings, impacting both index size and search speed.
    Lower values mean more compression and potentially faster searches but can reduce accuracy.
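
To build intuition for the nbits trade-off, the sketch below quantizes values with a simple uniform per-dimension quantizer and measures the reconstruction error. This is an assumption made for illustration: it is not necessarily the codec FastPlaid uses, and the storage estimate ignores any per-token metadata such as centroid identifiers.

```python
def quantize(values, nbits, low=-1.0, high=1.0):
    """Map each value to one of 2**nbits equal-width buckets over [low, high]."""
    levels = 2 ** nbits
    step = (high - low) / levels
    codes = []
    for value in values:
        clipped = min(max(value, low), high)
        codes.append(min(int((clipped - low) / step), levels - 1))
    return codes


def dequantize(codes, nbits, low=-1.0, high=1.0):
    """Reconstruct each value as the centre of its bucket."""
    step = (high - low) / 2 ** nbits
    return [low + (code + 0.5) * step for code in codes]


def max_error(values, nbits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    restored = dequantize(quantize(values, nbits), nbits)
    return max(abs(a - b) for a, b in zip(values, restored))


def bytes_per_token(embedding_dim, nbits):
    """Storage for one token if every dimension takes nbits bits."""
    return embedding_dim * nbits / 8


values = [-0.9, -0.3, 0.0, 0.25, 0.7]
print(max_error(values, 2), max_error(values, 4))
print(bytes_per_token(128, 4), bytes_per_token(128, 32))
```

Under these assumptions, a 128-dimensional token takes 64 bytes at 4 bits per dimension versus 512 bytes as 32-bit floats, while the worst-case error per dimension is half a bucket width.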

Searching the Index

The search method lets you query the created index with your query embeddings and retrieve the most relevant documents.

    def search(
        self,
        queries_embeddings: torch.Tensor,
        top_k: int = 10,
        batch_size: int = 1 << 18,
        n_full_scores: int = 8192,
        n_ivf_probe: int = 8,
        show_progress: bool = True,
    ) -> list[list[tuple[int, float]]]:
queries_embeddings: torch.Tensor
    A PyTorch tensor representing the multi-vector embeddings of your queries.
    Its shape should be `(num_queries, num_tokens_per_query, embedding_dimension)`.

top_k: int = 10 (optional)
    The number of top-scoring documents to retrieve for each query.

batch_size: int = 1 << 18 (optional)
    The internal batch size used for processing queries.
    A larger batch size might improve throughput on powerful GPUs but can consume more memory.

n_full_scores: int = 8192 (optional)
    The number of candidate documents for which full (re-ranked) scores are computed.
    This is a crucial parameter for accuracy; higher values lead to more accurate results but increase computation.

n_ivf_probe: int = 8 (optional)
    The number of inverted file list "probes" to perform during the search.
    This parameter controls the number of clusters to search within the index for each query.
    Higher values improve recall but increase search time.

show_progress: bool = True (optional)
    If set to `True`, a progress bar will be displayed during the search operation.
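
The roles of n_ivf_probe, n_full_scores, and top_k are easier to see in a toy version of the general "probe clusters, shortlist, then fully score" pattern. Everything below is a simplified sketch under stated assumptions: the function `staged_search`, the dot-product centroid assignment, and the shortlist heuristic are illustrative and do not reproduce FastPlaid's actual implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim(query, doc):
    """Full late-interaction score: best document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def staged_search(query, docs, centroids, n_ivf_probe, n_full_scores, top_k):
    centroid_ids = range(len(centroids))

    # Index time: record which documents have a token closest to each centroid.
    inverted = {c: set() for c in centroid_ids}
    for doc_id, doc in enumerate(docs):
        for token in doc:
            nearest = max(centroid_ids, key=lambda c: dot(token, centroids[c]))
            inverted[nearest].add(doc_id)

    # Stage 1: each query token probes its n_ivf_probe most similar centroids.
    approx = {}
    for q in query:
        ranked = sorted(centroid_ids, key=lambda c: dot(q, centroids[c]), reverse=True)
        for c in ranked[:n_ivf_probe]:
            for doc_id in inverted[c]:
                approx[doc_id] = max(approx.get(doc_id, float("-inf")), dot(q, centroids[c]))

    # Stage 2: keep at most n_full_scores candidates by approximate score.
    shortlist = sorted(approx, key=approx.get, reverse=True)[:n_full_scores]

    # Stage 3: compute full scores for the shortlist and return the top_k.
    scored = sorted(((d, maxsim(query, docs[d])) for d in shortlist),
                    key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
docs = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.1, 0.9]],
    [[-0.9, 0.0]],
    [[0.7, 0.3], [0.2, 0.8]],
]
query = [[1.0, 0.0]]

narrow = staged_search(query, docs, centroids, n_ivf_probe=1, n_full_scores=3, top_k=3)
wide = staged_search(query, docs, centroids, n_ivf_probe=2, n_full_scores=3, top_k=3)
print([doc_id for doc_id, _ in narrow], [doc_id for doc_id, _ in wide])
```

In this toy, probing one centroid surfaces only two candidates, while probing two brings in a third document at the cost of more work.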
