
Fast, State of the Art Quantized Embedding Models

Project description

⚡️ What is FastEmbed?

FastEmbed is a lightweight, fast, easy-to-use Python library built for retrieval-augmented generation. The default embedding model supports "query" and "passage" prefixes for the input text.

  1. Light

    • Quantized model weights
    • ONNX Runtime for inference
    • No hidden dependency on PyTorch or TensorFlow pulled in via Huggingface Transformers
  2. Accuracy/Recall

    • Better than OpenAI Ada-002
    • Default is Flag Embedding, which is top of the MTEB leaderboard
  3. Fast

    • About 2x faster than Huggingface (PyTorch) Transformers on single queries
    • Much faster on batches (a quick timing sketch follows this list)
    • ONNX Runtime lets you use dedicated runtimes for even higher throughput and lower latency
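
If you want to check the batch-speed claim on your own hardware, here is a minimal timing sketch. It uses only the API shown in the Usage section below; the corpus and batch size are just example values:

import time
from typing import List

from fastembed.embedding import FlagEmbedding as Embedding

# Example corpus; any list of strings works.
documents: List[str] = ["passage: FastEmbed timing example."] * 256

model = Embedding(model_name="BAAI/bge-base-en", max_length=512)

start = time.perf_counter()
embeddings = list(model.embed(documents))  # embed() yields one vector per document
elapsed = time.perf_counter() - start

print(f"Embedded {len(embeddings)} documents in {elapsed:.2f}s")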

🚀 Installation

To install the FastEmbed library, pip is all you need:

pip install fastembed

📖 Usage

from typing import List

import numpy as np

from fastembed.embedding import FlagEmbedding as Embedding

documents: List[str] = [
    "passage: Hello, World!",
    "query: Hello, World!",  # these are two different embeddings
    "passage: This is an example passage.",
    # You can leave out the prefix, but it's recommended
    "fastembed is supported by and maintained by Qdrant.",
]
embedding_model = Embedding(model_name="BAAI/bge-base-en", max_length=512)
embeddings: List[np.ndarray] = list(embedding_model.embed(documents))
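
Each returned embedding is a plain NumPy vector, so a nearest-passage lookup is just a similarity computation. A minimal sketch, continuing from the block above (cosine similarity via dot products of explicitly L2-normalized vectors, since normalization behavior can vary by model):

import numpy as np

# Embed one query and the candidate passages separately; the "query:" and
# "passage:" prefixes matter for BGE-style models.
query_embedding = next(iter(embedding_model.embed(["query: Hello, World!"])))
passage_embeddings = np.stack(list(embedding_model.embed([
    "passage: Hello, World!",
    "passage: This is an example passage.",
])))

def normalize(v: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so dot products become cosine similarities.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(passage_embeddings) @ normalize(query_embedding)
best = int(np.argmax(scores))
print(f"Best passage index: {best} (score {scores[best]:.3f})")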

🚒 Under the hood

Why fast?

It's important that we justify the "fast" in FastEmbed. FastEmbed is fast because of:

  1. Quantized model weights
  2. ONNX Runtime, which allows inference on CPU, GPU, and other dedicated runtimes (a rough sketch of what that looks like follows)
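
For context, here is roughly what running a quantized transformer encoder through ONNX Runtime looks like. This is a hand-written sketch, not FastEmbed's internal code: the model path, input names, and shapes are assumptions about a typical exported BERT-style graph, and some exports also expect token_type_ids.

import numpy as np
import onnxruntime as ort

# Choose an execution provider; swap in "CUDAExecutionProvider" or another
# dedicated runtime if your onnxruntime build supports it.
session = ort.InferenceSession(
    "model_optimized.onnx",  # assumed path to a quantized ONNX encoder
    providers=["CPUExecutionProvider"],
)

# Dummy pre-tokenized batch; real code would run a tokenizer first.
inputs = {
    "input_ids": np.zeros((1, 512), dtype=np.int64),
    "attention_mask": np.ones((1, 512), dtype=np.int64),
}

outputs = session.run(None, inputs)  # list of output arrays, e.g. token embeddings
print(outputs[0].shape)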

Why light?

  1. No hidden dependency on PyTorch or TensorFlow pulled in via Huggingface Transformers

Why accurate?

  1. Better than OpenAI Ada-002
  2. Top of the embedding leaderboards, e.g. MTEB

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastembed-0.0.3.tar.gz (9.2 kB)

Uploaded Source

Built Distribution

fastembed-0.0.3-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file fastembed-0.0.3.tar.gz.

File metadata

  • Download URL: fastembed-0.0.3.tar.gz
  • Upload date:
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.9 Darwin/22.5.0

File hashes

Hashes for fastembed-0.0.3.tar.gz
  • SHA256: e37942a4474bd53919e7178b32f535d512b0e07c08ab07c85ed41257719932a0
  • MD5: 89e3b78f2b5944ecbf14f490d75173eb
  • BLAKE2b-256: 90abbab6a5b9949fbaab23923d2c68f0a25437d27868a0d7d723be22a90232c3

See more details on using hashes here.
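
If you want to check a downloaded file against the SHA256 digest above, here is a minimal verification with Python's standard library (assuming the sdist sits in the current directory):

import hashlib

# Expected SHA256 for fastembed-0.0.3.tar.gz, taken from the table above.
expected = "e37942a4474bd53919e7178b32f535d512b0e07c08ab07c85ed41257719932a0"

with open("fastembed-0.0.3.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

print("OK" if actual == expected else "Hash mismatch!")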

File details

Details for the file fastembed-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: fastembed-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.9 Darwin/22.5.0

File hashes

Hashes for fastembed-0.0.3-py3-none-any.whl
  • SHA256: de0b4ef7f564e0904989499db32ed29bc89637b115d7b5df7106ffb04cd971cf
  • MD5: ad2d4988cc4f4474d766790ac370b9fc
  • BLAKE2b-256: a98158de8b7caddc5eebccb361e33d5574712851985890f4f2643c148182c353

See more details on using hashes here.
