A gguf embeddings plugin for OVOS
GGUFTextEmbeddingsPlugin
The GGUFTextEmbeddingsPlugin is a plugin for generating and managing text embeddings.
It integrates with ovos-chromadb-embeddings-plugin for storing and retrieving text embeddings.
This plugin leverages the llama-cpp-python library to generate text embeddings.
GGUF models are used to keep third-party dependencies to a minimum, which keeps this plugin lightweight and suitable for low-powered hardware.
Features
- Text Embeddings Extraction: Converts text into embeddings using the llama_cpp model.
- Text Data Storage: Stores and retrieves text embeddings using ChromaEmbeddingsDB.
- Text Data Management: Allows for adding, querying, and deleting text embeddings associated with documents.
Suggested Models
You can specify a downloaded model path, or use one of the pre-defined model strings in the table below.
If needed, the model is downloaded automatically to ~/.cache/gguf_models.
Model Name | URL | Description | Suggested Use Cases |
---|---|---|---|
all-MiniLM-L6-v2 | Link | A sentence-transformers model that maps sentences & paragraphs to a 384-dimensional dense vector space. Fine-tuned on a 1B sentence pairs dataset using contrastive learning. Ideal for general-purpose tasks like information retrieval, clustering, and sentence similarity. | Suitable for tasks that require fast inference and can handle slightly less accuracy, such as real-time applications. |
all-MiniLM-L12-v2 | Link | A larger MiniLM model mapping sentences & paragraphs to a 384-dimensional dense vector space. Fine-tuned on a 1B sentence pairs dataset using contrastive learning. Provides higher accuracy for complex tasks. | Suitable for more complex NLP tasks requiring higher accuracy, such as detailed semantic analysis, document ranking, and clustering. |
multi-qa-MiniLM-L6-cos-v1 | Link | A sentence-transformers model mapping sentences & paragraphs to a 384-dimensional dense vector space, trained on 215M QA pairs. Designed for semantic search. | Best for semantic search, encoding queries/questions, and finding relevant documents or passages in QA tasks. |
gist-all-minilm-l6-v2 | Link | Enhanced version of all-MiniLM-L6-v2 using GISTEmbed method, improving in-batch negative selection during training. Demonstrates state-of-the-art performance on specific tasks with a focus on reducing data noise and improving model fine-tuning. | Ideal for high-accuracy retrieval tasks, semantic search, and applications requiring efficient smaller models with robust performance, such as resource-constrained environments. |
paraphrase-multilingual-minilm-l12-v2 | Link | A sentence-transformers model mapping sentences & paragraphs to a 384-dimensional dense vector space. Supports multiple languages, optimized for paraphrasing tasks. | Perfect for multilingual applications, translation services, and tasks requiring paraphrase detection and generation. |
e5-small-v2 | Link | Text Embeddings by Weakly-Supervised Contrastive Pre-training. This model has 12 layers and the embedding size is 384. Size is about 30MB. | Ideal for applications requiring efficient, small-sized models with robust text embeddings. |
gte-small | Link | General Text Embeddings (GTE) model. Trained using multi-stage contrastive learning by Alibaba DAMO Academy. Based on the BERT framework, it covers a wide range of domains and scenarios. About 30MB. | Suitable for information retrieval, semantic textual similarity, text reranking, and various other downstream tasks requiring text embeddings. |
gte-base | Link | Larger version of previous model, about 75 MB | |
gte-large | Link | Larger version of previous model, about 220 MB | |
snowflake-arctic-embed-l | Link | Part of the snowflake-arctic-embed suite, this model focuses on high-quality retrieval and achieves state-of-the-art performance on the MTEB/BEIR leaderboard. Trained using a multi-stage pipeline with a mix of public and proprietary data. About 215MB. | Optimized for high-performance text retrieval tasks and achieving top accuracy in retrieval benchmarks. |
snowflake-arctic-embed-m | Link | Based on the intfloat/e5-base-unsupervised model, this medium-sized model balances high retrieval performance with efficient inference. About 75MB | Ideal for general-purpose retrieval tasks requiring a balance between performance and efficiency. |
snowflake-arctic-embed-m.long | Link | Based on the nomic-ai/nomic-embed-text-v1-unsupervised model, this long-context variant supports up to 2048 tokens without RPE and up to 8192 tokens with RPE. Perfect for long-context workloads. About 90MB | Suitable for tasks requiring long-context embeddings, such as complex document analysis or extensive information retrieval. |
snowflake-arctic-embed-s | Link | Based on the intfloat/e5-small-unsupervised model, this small model offers high retrieval accuracy with only 33M parameters and 384 dimensions. | Suitable for applications needing efficient, high-accuracy retrieval in constrained environments. |
snowflake-arctic-embed-xs | Link | Based on the all-MiniLM-L6-v2 model, this tiny model has only 22M parameters and 384 dimensions, providing a balance of low latency and high retrieval accuracy. | Best for ultra-low latency applications with strict size and cost constraints. |
nomic-embed-text-v1.5 | Link | About 85MB. Resizable Production Embeddings with Matryoshka Representation Learning. The model is trained in two stages, starting with unsupervised contrastive learning on weakly related text pairs, followed by finetuning with high-quality labeled datasets. It is now multimodal, aligning with nomic-embed-vision-v1 | Ideal for applications requiring flexible embedding sizes and multimodal capabilities. |
uae-large-v1 | Link | Universal AnglE Embedding. AnglE-optimized Text Embeddings with a novel angle optimization approach. About 220MB. | Best for high-quality text embeddings in semantic textual similarity tasks, including short-text and long-text STS. |
labse | Link | A port of the LaBSE model. Maps 109 languages to a shared vector space, supports up to 512 tokens of context. The model is optimized for producing similar representations for bilingual sentence pairs. About 390MB. | Suitable for multilingual applications, translation mining, and cross-lingual text embedding tasks. |
bge-large-en-v1.5 | Link | The model is part of the BGE series and is designed for diverse retrieval tasks. Size is 216MB. | |
bge-base-en-v1.5 | Link | Medium version of the above. About 80MB | |
bge-small-en-v1.5 | Link | Small version of the above. About 30MB | |
gist-embedding-v0 | Link | GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning. Fine-tuned on top of the BAAI/bge-base-en-v1.5 using the MEDI dataset augmented with mined triplets from the MTEB Classification training dataset. | Ideal for applications requiring embeddings without crafting instructions for queries. |
gist-large-embedding-v0 | Link | Large version of the model above | |
gist-small-embedding-v0 | Link | Small version of the model above | |
mxbai-embed-large-v1 | Link | Trained with AnglE loss on high-quality, large-scale data. Achieves SOTA performance at BERT-large scale. About 220MB. | Best for tasks requiring high precision and detailed embeddings. Provides state-of-the-art performance among efficiently sized models. |
acge_text_embedding | Link | The ACGE model is developed by the Huhu Information Technology team on the TextIn platform. It is a general-purpose text encoding model that uses Matryoshka Representation Learning for variable-length vectorization. About 200MB. | Ideal for Chinese text. |
gte-Qwen2-7B-instruct | Link | The latest in the GTE model family, ranking No.1 in English and Chinese evaluations on the MTEB benchmark. Based on the Qwen2-7B LLM model, it integrates bidirectional attention mechanisms and instruction tuning, with comprehensive multilingual training. 4.68GB | Best for high-performance multilingual text embeddings and complex tasks requiring top-tier contextual understanding. |
gte-Qwen2-1.5B-instruct | Link | The latest model in the GTE (General Text Embedding) family. Built on the Qwen2-1.5B LLM, it uses the same training data and strategies as the gte-Qwen2-7B-instruct model. 1.12GB | |
By default, paraphrase-multilingual-minilm-l12-v2 is used if no model is specified.
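The rules above (a local path, a pre-defined name, or the default, with downloads cached under ~/.cache/gguf_models) can be sketched as a small resolver. This is an illustration only and not the plugin's actual code: `resolve_model`, `DEFAULT_MODEL`, `CACHE_DIR`, and the cache file naming are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Assumption: these names only mirror the documented behaviour; they are
# not taken from the plugin's source code.
DEFAULT_MODEL = "paraphrase-multilingual-minilm-l12-v2"
CACHE_DIR = Path.home() / ".cache" / "gguf_models"


def resolve_model(model: Optional[str] = None) -> Path:
    """Return the local path a model argument would map to (hypothetical helper)."""
    if model is None:
        # No model given: fall back to the documented default.
        model = DEFAULT_MODEL
    candidate = Path(model).expanduser()
    if candidate.is_file():
        # An existing file on disk is used as-is.
        return candidate
    # Otherwise treat the string as a pre-defined model name kept in the cache.
    # The "<name>.gguf" file name is a placeholder, not the real naming scheme.
    filename = model if model.endswith(".gguf") else f"{model}.gguf"
    return CACHE_DIR / filename
```

A real implementation would also need to download the file when it is missing from the cache; that step is left out here because the download URLs are not part of this text.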
Usage
Here is a quick example of how to use the GGUFTextEmbeddingsPlugin:
from ovos_gguf_embeddings import GGUFTextEmbeddingsStore
from ovos_chromadb_embeddings import ChromaEmbeddingsDB

# ChromaDB-backed store that holds the embeddings
db = ChromaEmbeddingsDB("./my_db")
# model may be a local GGUF file path or one of the pre-defined model names
gguf = GGUFTextEmbeddingsStore(db, model="all-MiniLM-L6-v2.Q4_K_M.gguf")

corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
    "a fish is a creature that lives in water and swims",
]
# Embed and store each sentence
for s in corpus:
    gguf.add_document(s)

# Retrieve the two stored documents closest to the query
docs = gguf.query_document("does the fish purr like a cat?", top_k=2)
print(docs)
# [('a cat is a feline and likes to purr', 0.6548102001030748),
#  ('a fish is a creature that lives in water and swims', 0.5436657174406345)]
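To give an intuition for the (document, score) pairs returned above, here is a minimal sketch of top-k retrieval by cosine similarity over toy vectors. It assumes the score is a cosine similarity, which this text does not confirm; the metric actually used depends on how the underlying store is configured. The functions `cosine_similarity` and `top_k` and the 3-dimensional vectors are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          docs: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors, invented for this example; real embeddings
# come from the GGUF model and have hundreds of dimensions.
toy_docs = [
    ("cat", [1.0, 0.0, 0.0]),
    ("fish", [0.0, 1.0, 0.0]),
    ("bird", [0.0, 0.0, 1.0]),
]
results = top_k([0.9, 0.4, 0.0], toy_docs, k=2)
print(results)
```

With these toy vectors the query points mostly along the "cat" axis and partly along the "fish" axis, so those two entries are returned in that order.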
CLI Interface
$ ovos-gguf-embeddings --help
Usage: ovos-gguf-embeddings [OPTIONS] COMMAND [ARGS]...
  CLI for interacting with the GGUF Text Embeddings Store.
Options:
  --help  Show this message and exit.
Commands:
  add-document     Add a document to the embeddings store.
  delete-document  Delete a document from the embeddings store.
  query-document   Query the embeddings store to find similar documents...
$ ovos-gguf-embeddings add-document --help
Usage: ovos-gguf-embeddings add-document [OPTIONS] DOCUMENT
  Add a document to the embeddings store.
  DOCUMENT: The document string or file path to be added to the store.
  FROM-FILE: Flag indicating whether the DOCUMENT argument is a file path. If
  set, the file is read and processed.
  USE-SENTENCES: Flag indicating whether to tokenize the document into
  sentences. If not set, the document is split into paragraphs.
  DATABASE: Path to the ChromaDB database where the embeddings are stored.
  (Required)
  MODEL: Name or URL of the model used for generating embeddings. (Defaults to
  'paraphrase-multilingual-minilm-l12-v2')
Options:
  --database TEXT  Path to the ChromaDB database where the embeddings are
                   stored.
  --model TEXT     Model name or URL used for generating embeddings. Defaults
                   to "paraphrase-multilingual-minilm-l12-v2".
  --from-file      Indicates if the document argument is a file path.
  --use-sentences  Indicates if the document should be tokenized into
                   sentences; otherwise, it is split into paragraphs.
  --help           Show this message and exit.
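The difference between the default paragraph splitting and --use-sentences can be illustrated with a small sketch. It assumes paragraphs are separated by blank lines and uses a naive punctuation rule for sentences; the tokenizer the CLI actually uses is not described here, and `split_document` is a hypothetical helper.

```python
import re
from typing import List


def split_document(text: str, use_sentences: bool = False) -> List[str]:
    """Split text into sentences or paragraphs (illustrative rules only)."""
    if use_sentences:
        # Naive rule: break after '.', '!' or '?' when followed by whitespace.
        chunks = re.split(r"(?<=[.!?])\s+", text)
    else:
        # Paragraphs are assumed to be separated by one or more blank lines.
        chunks = re.split(r"\n\s*\n", text)
    # Drop empty pieces and surrounding whitespace.
    return [c.strip() for c in chunks if c.strip()]


sample = "Cats purr. Dogs play!\n\nFish swim."
print(split_document(sample))                      # two paragraphs
print(split_document(sample, use_sentences=True))  # three sentences
```

Each resulting chunk would then be embedded and stored as its own entry, so sentence splitting yields more, finer-grained entries than paragraph splitting.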
$ ovos-gguf-embeddings query-document --help
Usage: ovos-gguf-embeddings query-document [OPTIONS] QUERY
  Query the embeddings store to find similar documents to the given query.
  QUERY: The query string used to search for similar documents.
  DATABASE: Path to the ChromaDB database where the embeddings are stored. Can
  be a full path or a simple string. If a simple string is provided,
  it will be saved in the XDG cache directory (~/.cache/chromadb/{database}).
  MODEL: Name or URL of the model used for generating embeddings. (Defaults to
  'paraphrase-multilingual-minilm-l12-v2')
  TOP-K: Number of top results to return. (Defaults to 5)
Options:
  --database TEXT  Path to the ChromaDB database where the embeddings are
                   stored.
  --model TEXT     Model name or URL used for generating embeddings. Defaults
                   to "paraphrase-multilingual-minilm-l12-v2".
  --top-k INTEGER  Number of top results to return. Defaults to 5.
  --help           Show this message and exit.
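The DATABASE rule described in the help text (a full path is used as given, a simple string is placed under the XDG cache directory) can be sketched as follows. This is a guess at the logic, not the plugin's code: `resolve_database_path` is a hypothetical helper, and treating "simple string" as "contains no path separator" is an assumption.

```python
import os
from pathlib import Path


def resolve_database_path(database: str) -> Path:
    """Map a --database value to a directory (hypothetical helper)."""
    expanded = Path(database).expanduser()
    # Assumption: anything absolute or containing a separator is a real path.
    if expanded.is_absolute() or "/" in database or os.sep in database:
        return expanded
    # A bare name goes under the XDG cache directory, ~/.cache by default.
    cache_home = Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")
    return cache_home / "chromadb" / database


print(resolve_database_path("my_db"))
print(resolve_database_path("./my_db"))
```

Under this reading, a value such as "./my_db" counts as a path and stays relative to the working directory, while "my_db" resolves into the cache directory.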
$ ovos-gguf-embeddings delete-document --help
Usage: ovos-gguf-embeddings delete-document [OPTIONS] DOCUMENT
  Delete a document from the embeddings store.
  DOCUMENT: The document string to be deleted from the store.
  DATABASE: Path to the ChromaDB database where the embeddings are stored. Can
  be a full path or a simple string. If a simple string is provided,
  it will be saved in the XDG cache directory (~/.cache/chromadb/{database}).
  MODEL: Name or URL of the model used for generating embeddings. (Defaults to
  'paraphrase-multilingual-minilm-l12-v2')
Options:
  --database TEXT  ChromaDB database where the embeddings are stored.
  --model TEXT     Model name or URL used for generating embeddings. Defaults
                   to "paraphrase-multilingual-minilm-l12-v2".
  --help           Show this message and exit.