
Fast, State of the Art Quantized Embedding Models


⚡️ What is FastEmbed?

FastEmbed is an easy-to-use, lightweight, fast Python library built for retrieval-augmented generation. The default embedding model supports "query" and "passage" prefixes for the input text.

  1. Light

    • Quantized model weights
    • ONNX Runtime for inference
    • No hidden dependencies on PyTorch or TensorFlow via Huggingface Transformers
  2. Accuracy/Recall

    • Better than OpenAI Ada-002
    • Default is Flag Embedding, which is at the top of the MTEB leaderboard
  3. Fast

    • About 2x faster than Huggingface (PyTorch) transformers on single queries
    • A lot faster for batches!
    • ONNX Runtime allows you to use dedicated runtimes for even higher throughput and lower latency

🚀 Installation

Install the FastEmbed library with pip:

pip install fastembed

📖 Usage

from typing import List

import numpy as np
from fastembed.embedding import FlagEmbedding as Embedding

documents: List[str] = [
    "passage: Hello, World!",
    "query: Hello, World!",  # these two produce different embeddings
    "passage: This is an example passage.",
    # You can leave out the prefix, but including it is recommended
    "fastembed is supported by and maintained by Qdrant.",
]
embedding_model = Embedding(model_name="BAAI/bge-base-en", max_length=512)
embeddings: List[np.ndarray] = list(embedding_model.embed(documents))
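
In a retrieval setting, the embeddings are typically compared by cosine similarity so that passages can be ranked against a query. The following is a minimal sketch of that step, not part of FastEmbed: `cosine_similarity` and `rank_passages` are hypothetical helper names, and the toy 3-dimensional vectors stand in for real model output. The helpers accept any sequence of floats, so 1-D NumPy arrays such as those returned by `embed` should work as well.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, best match first."""
    scores = [
        (i, cosine_similarity(query_vec, p)) for i, p in enumerate(passage_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for one query embedding and three passage embeddings.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_passages(query, passages)
```

With real embeddings, `query` would be the vector for a "query: ..." input and `passages` the vectors for the "passage: ..." inputs.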

🚒 Under the hood

Why fast?

It's important we justify the "fast" in FastEmbed. FastEmbed is fast because:

  1. Quantized model weights
  2. ONNX Runtime, which allows inference on CPU, GPU, and other dedicated runtimes
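
To make the first point concrete, here is a generic sketch of symmetric 8-bit quantization. It is an illustration of the general idea only; the scheme FastEmbed's models actually use is not described here, and `quantize_int8` and `dequantize_int8` are hypothetical names. Each weight is stored as a small integer plus one shared scale factor, so it fits in one byte instead of the four bytes of a 32-bit float, at the cost of a bounded rounding error.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight differs from the original by at most half a step (scale / 2).
```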

Why light?

  1. No hidden dependencies on PyTorch or TensorFlow via Huggingface Transformers

Why accurate?

  1. Better than OpenAI Ada-002
  2. At the top of embedding leaderboards, e.g. MTEB


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastembed-0.0.4.tar.gz (9.4 kB view details)

Uploaded Source

Built Distribution

fastembed-0.0.4-py3-none-any.whl (9.8 kB view details)

Uploaded Python 3

File details

Details for the file fastembed-0.0.4.tar.gz.

File metadata

  • Download URL: fastembed-0.0.4.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.9 Darwin/22.5.0

File hashes

Hashes for fastembed-0.0.4.tar.gz
Algorithm     Hash digest
SHA256        dab2f0f0370e7cbd2965fe4206957a5dffffd0c8c07fa90bdfd50d7743062856
MD5           b0d1024f9291b016489c47f31fa8ef9e
BLAKE2b-256   f513f0a82b61c4f49890417d9ab346f074abfe8241e1b74cdb7ebebd6e638469

See more details on using hashes here.

File details

Details for the file fastembed-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: fastembed-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.10.9 Darwin/22.5.0

File hashes

Hashes for fastembed-0.0.4-py3-none-any.whl
Algorithm     Hash digest
SHA256        3ea83bf164f4e2875e53b2795ce79986d99f8ecbc62bfb26d82592fbe87e0858
MD5           54452d8b50b6c0577477231b406529b4
BLAKE2b-256   632e9f4697bf6ef11f4aa124474dc297f5ff4434c381008e32b2e8ab0bb61ed7

