Fast, light, accurate library built for retrieval embedding generation
⚡️ What is Cosdata-FastEmbed?
Notice: This is a maintained fork of Qdrant's fastembed library, now supported by Cosdata.
cosdata-fastembed will continue to evolve with planned improvements and integrations specific to Cosdata's ecosystem, while maintaining compatibility with the original fastembed project.
Credit: Huge thanks to the Qdrant team for their foundational work on fastembed. This fork would not exist without their efforts.
Cosdata-FastEmbed is a lightweight, fast Python library built for embedding generation. It supports a variety of popular text models. Please open a GitHub issue if you want us to add a new model or have feature requests.
The default text embedding (TextEmbedding) model is Flag Embedding, presented in the MTEB leaderboard. It supports "query" and "passage" prefixes for the input text. Here is an example for Retrieval Embedding Generation and how to use FastEmbed with Qdrant.
📈 Why FastEmbed?
- Light: FastEmbed is a lightweight library with few external dependencies. We don't require a GPU and don't download GBs of PyTorch dependencies, and instead use the ONNX Runtime. This makes it a great candidate for serverless runtimes like AWS Lambda.
- Fast: FastEmbed is designed for speed. We use the ONNX Runtime, which is faster than PyTorch. We also use data parallelism for encoding large datasets.
- Accurate: FastEmbed is better than OpenAI Ada-002. We also support an ever-expanding set of models, including a few multilingual models.
🚀 Installation
To install the Cosdata-FastEmbed library, pip works best. You can install it with or without GPU support:
pip install cosdata-fastembed
# or with GPU support
pip install cosdata-fastembed-gpu
📖 Quickstart
from fastembed import TextEmbedding
# Example list of documents
documents: list[str] = [
"This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
"fastembed is supported by and maintained by Qdrant.",
]
# This will trigger the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")
embeddings_generator = embedding_model.embed(documents)  # reminder: this is a generator
# you can also convert the generator to a list, and that to a numpy array
embeddings_list = list(embeddings_generator)
len(embeddings_list[0])  # Vector of 384 dimensions
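Once the vectors are generated, a common next step is comparing them. The following is a minimal sketch of cosine similarity in plain Python. The helper `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional stand-ins for real 384-dimensional embeddings (assumed data).
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.6, 0.8, 0.0]
similarity = cosine_similarity(doc_a, doc_b)  # approximately 0.6
```

The same helper should also accept the arrays in `embeddings_list`, for example `cosine_similarity(embeddings_list[0], embeddings_list[1])`, since it only iterates over the values.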
FastEmbed supports a variety of models for different tasks and modalities. The list of all the available models can be found here.
🎒 Dense text embeddings
from fastembed import TextEmbedding
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))
# [
# array([-0.1115, 0.0097, 0.0052, 0.0195, ...], dtype=float32),
# array([-0.1019, 0.0635, -0.0332, 0.0522, ...], dtype=float32)
# ]
Dense text embedding can also be extended with models which are not in the list of supported models.
from fastembed import TextEmbedding
from fastembed.common.model_description import PoolingType, ModelSource
TextEmbedding.add_custom_model(
model="intfloat/multilingual-e5-small",
pooling=PoolingType.MEAN,
normalization=True,
sources=ModelSource(hf="intfloat/multilingual-e5-small"), # can be used with an `url` to load files from a private storage
dim=384,
model_file="onnx/model.onnx", # can be used to load an already supported model with another optimization or quantization, e.g. onnx/model_O4.onnx
)
model = TextEmbedding(model_name="intfloat/multilingual-e5-small")
embeddings = list(model.embed(documents))
🔱 Sparse text embeddings
- SPLADE++
from fastembed import SparseTextEmbedding
model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))
# [
# SparseEmbedding(indices=[ 17, 123, 919, ... ], values=[0.71, 0.22, 0.39, ...]),
# SparseEmbedding(indices=[ 38, 12, 91, ... ], values=[0.11, 0.22, 0.39, ...])
# ]
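Sparse embeddings are usually compared with a dot product over the indices that two vectors share. The sketch below shows this in plain Python. `sparse_dot` and the toy indices and values are hypothetical, chosen only to illustrate the `indices`/`values` layout shown in the output above.

```python
from typing import Sequence


def sparse_dot(
    indices_a: Sequence[int],
    values_a: Sequence[float],
    indices_b: Sequence[int],
    values_b: Sequence[float],
) -> float:
    """Dot product of two sparse vectors given as parallel index/value lists."""
    # Map each index of the first vector to its weight for O(1) lookups.
    weights_a = dict(zip(indices_a, values_a))
    # Only indices present in both vectors contribute to the score.
    return sum(weights_a.get(i, 0.0) * v for i, v in zip(indices_b, values_b))


# Toy data (assumed): the two vectors overlap on indices 123 and 919.
query_indices, query_values = [17, 123, 919], [0.5, 0.25, 1.0]
doc_indices, doc_values = [38, 123, 919], [2.0, 4.0, 0.5]

score = sparse_dot(query_indices, query_values, doc_indices, doc_values)
# 0.25 * 4.0 + 1.0 * 0.5 = 1.5
```

With real output, the inputs would come from the `indices` and `values` fields of each `SparseEmbedding`.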
🦥 Late interaction models (aka ColBERT)
from fastembed import LateInteractionTextEmbedding
model = LateInteractionTextEmbedding(model_name="colbert-ir/colbertv2.0")
embeddings = list(model.embed(documents))
# [
# array([
# [-0.1115, 0.0097, 0.0052, 0.0195, ...],
# [-0.1019, 0.0635, -0.0332, 0.0522, ...],
# ]),
# array([
# [-0.9019, 0.0335, -0.0032, 0.0991, ...],
# [-0.2115, 0.8097, 0.1052, 0.0195, ...],
# ]),
# ]
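Late interaction models return one vector per token, so a query and a document are compared token by token. A common scoring rule for ColBERT-style models is MaxSim: for each query token, take the best dot product against any document token, then sum those maxima. The following is a plain-Python sketch under that assumption. `max_sim` and the toy matrices are illustrative names, not library functions.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def max_sim(query_tokens: Matrix, doc_tokens: Matrix) -> float:
    """Sum, over query tokens, of the maximum dot product with any document token."""
    if len(doc_tokens) == 0:
        raise ValueError("document must contain at least one token vector")
    total = 0.0
    for q in query_tokens:
        # Best-matching document token for this query token.
        total += max(sum(x * y for x, y in zip(q, d)) for d in doc_tokens)
    return total


# Toy 2-dimensional token vectors (assumed data).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]]

score = max_sim(query_tokens, doc_tokens)
# First query token matches best at 1.0, second at 0.5, so the score is 1.5
```

Ranking several documents for one query then amounts to computing this score for each document and sorting in descending order.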
🖼️ Image embeddings
from fastembed import ImageEmbedding
images = [
"./path/to/image1.jpg",
"./path/to/image2.jpg",
]
model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
embeddings = list(model.embed(images))
# [
# array([-0.1115, 0.0097, 0.0052, 0.0195, ...], dtype=float32),
# array([-0.1019, 0.0635, -0.0332, 0.0522, ...], dtype=float32)
# ]
Late interaction multimodal models (ColPali)
from fastembed import LateInteractionMultimodalEmbedding
doc_images = [
"./path/to/qdrant_pdf_doc_1_screenshot.jpg",
"./path/to/colpali_pdf_doc_2_screenshot.jpg",
]
query = "What is Qdrant?"
model = LateInteractionMultimodalEmbedding(model_name="Qdrant/colpali-v1.3-fp16")
doc_images_embeddings = list(model.embed_image(doc_images))
# shape (2, 1030, 128)
# [array([[-0.03353882, -0.02090454, ..., -0.15576172, -0.07678223]], dtype=float32)]
query_embedding = list(model.embed_text(query))
# shape (1, 20, 128)
# [array([[-0.00218201, 0.14758301, ..., -0.02207947, 0.16833496]], dtype=float32)]
🔄 Rerankers
from fastembed.rerank.cross_encoder import TextCrossEncoder
query = "Who is maintaining Qdrant?"
documents: list[str] = [
"This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
"fastembed is supported by and maintained by Qdrant.",
]
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank(query, documents))
# [-11.48061752319336, 5.472434997558594]
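The scores come back in the same order as the input documents, with higher values meaning a better match. A minimal sketch of turning them into a ranking, using the two scores from the example output above (the variable names here are illustrative):

```python
documents = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
# Scores copied from the example output above, one per document.
scores = [-11.48061752319336, 5.472434997558594]

# Pair each score with its document and sort from most to least relevant.
ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
best_score, best_document = ranked[0]
```

Here `best_document` is the second input, which is the one that mentions who maintains the project.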
Text cross encoders can also be extended with models which are not in the list of supported models.
from fastembed.rerank.cross_encoder import TextCrossEncoder
from fastembed.common.model_description import ModelSource
TextCrossEncoder.add_custom_model(
model="Xenova/ms-marco-MiniLM-L-4-v2",
model_file="onnx/model.onnx",
sources=ModelSource(hf="Xenova/ms-marco-MiniLM-L-4-v2"),
)
model = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-4-v2")
scores = list(model.rerank_pairs(
[("What is AI?", "Artificial intelligence is ..."), ("What is ML?", "Machine learning is ..."),]
))
⚡️ FastEmbed on a GPU
FastEmbed supports running on GPU devices.
It requires installation of the cosdata-fastembed-gpu package.
pip install cosdata-fastembed-gpu
Check our example for detailed instructions, CUDA 12.x support, and troubleshooting of common issues.
from fastembed import TextEmbedding
embedding_model = TextEmbedding(
model_name="BAAI/bge-small-en-v1.5",
providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")
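On machines where a GPU may or may not be present, it can help to choose the provider list at runtime. The sketch below assumes ONNX Runtime's `get_available_providers()` is used to discover what the installed build supports. `pick_providers` and `PREFERRED` are hypothetical helpers, and the CPU-only fallback when `onnxruntime` cannot be imported is only there to keep the snippet runnable.

```python
from typing import Iterable

# Providers in priority order: try CUDA first, then fall back to CPU.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Iterable[str]) -> list[str]:
    """Keep the preferred providers that are actually available, in priority order."""
    available_set = set(available)
    return [name for name in PREFERRED if name in available_set]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # Assumption for this sketch: without onnxruntime, pretend only CPU exists.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
```

The resulting list can be passed as `providers=providers` in the `TextEmbedding` call above. A provider being listed only means the installed ONNX Runtime build includes it; missing CUDA libraries can still cause a fallback at session creation.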