Project description
Introduction
Implements a sentence-embedding retriever with a local cache built on top of the embedding store.
Features
- Embedding store abstraction class
- Jina client implementation of the embedding store
- LFU and LRU cache eviction policies for a bounded cache size; if no eviction policy is specified, none is applied
- Save the cache to a Parquet file
- Load the cache from an existing Parquet file
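To make the LRU policy above concrete, here is a minimal, self-contained sketch of an LRU-evicting cache (a hypothetical illustration, not the library's actual implementation; `max_size` mirrors the constructor argument used in the examples below):

```python
from collections import OrderedDict
from typing import List, Optional

class LRUEmbeddingCache:
    """Toy LRU cache: evicts the least recently used sentence when full."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, sentence: str) -> Optional[List[float]]:
        if sentence not in self._cache:
            return None
        self._cache.move_to_end(sentence)  # mark as most recently used
        return self._cache[sentence]

    def put(self, sentence: str, embedding: List[float]) -> None:
        if sentence in self._cache:
            self._cache.move_to_end(sentence)
        self._cache[sentence] = embedding
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # drop the least recently used entry

cache = LRUEmbeddingCache(max_size=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")         # "a" is now the most recently used
cache.put("c", [0.3])  # cache is full, so "b" (least recently used) is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # [0.1]
```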
Quick Start
Option 1. Using a Jina flow to serve the embedding model
- Installation
```shell
pip install "embestore[jina]"
```
- To start the Jina flow service with the sentence embedding model `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`, clone this GitHub repo and serve it with the Docker container:
```shell
git clone https://github.com/ycc789741ycc/sentence-embedding-dataframe-cache.git
cd sentence-embedding-dataframe-cache
make serve-jina-embedding
```
- Retrieve the embeddings

```python
from embestore.store.jina import JinaEmbeddingStore

JINA_EMBESTORE_GRPC = "grpc://0.0.0.0:54321"

query_sentences = ["I want to listen the music.", "Music don't want to listen me."]
jina_embestore = JinaEmbeddingStore(embedding_grpc=JINA_EMBESTORE_GRPC)
results = jina_embestore.retrieve_embeddings(sentences=query_sentences)
```
- Stop the Docker container

```shell
make stop-jina-embedding
```
Option 2. Using a local sentence embedding model
- Installation
```shell
pip install "embestore[sentence-transformers]"
```
- Serve the sentence embedding model `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` in memory:

```python
from embestore.store.torch import TorchEmbeddingStore

query_sentences = ["I want to listen the music.", "Music don't want to listen me."]
torch_embestore = TorchEmbeddingStore()
results = torch_embestore.retrieve_embeddings(sentences=query_sentences)
```
Option 3. Inherit from the abstraction class
- Installation
```shell
pip install embestore
```
```python
from typing import List, Text

import numpy as np
from sentence_transformers import SentenceTransformer

from embestore.store.base import EmbeddingStore

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2").eval()


class TorchEmbeddingStore(EmbeddingStore):
    def _retrieve_embeddings_from_model(self, sentences: List[Text]) -> np.ndarray:
        return model.encode(sentences)
```
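The same subclassing pattern can be exercised without downloading a model. The following is a hedged, self-contained sketch that re-creates a simplified version of the base class and plugs in a hash-based stub where `SentenceTransformer` would go (the names `EmbeddingStore` and `retrieve_embeddings` follow the README; the caching internals shown here are illustrative only, not the library's actual code):

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List, Text

class EmbeddingStore(ABC):
    """Simplified stand-in for the abstraction class: caches embeddings by sentence."""

    def __init__(self) -> None:
        self._cache: Dict[Text, List[float]] = {}

    @abstractmethod
    def _retrieve_embeddings_from_model(self, sentences: List[Text]) -> List[List[float]]:
        ...

    def retrieve_embeddings(self, sentences: List[Text]) -> List[List[float]]:
        # Only cache misses are sent to the model; hits come from the local cache.
        misses = [s for s in sentences if s not in self._cache]
        if misses:
            for s, vec in zip(misses, self._retrieve_embeddings_from_model(misses)):
                self._cache[s] = vec
        return [self._cache[s] for s in sentences]

class StubEmbeddingStore(EmbeddingStore):
    def _retrieve_embeddings_from_model(self, sentences: List[Text]) -> List[List[float]]:
        # Deterministic fake "embeddings" derived from a hash of each sentence.
        return [[b / 255 for b in hashlib.md5(s.encode()).digest()[:4]] for s in sentences]

store = StubEmbeddingStore()
first = store.retrieve_embeddings(["hello", "world"])   # two cache misses
second = store.retrieve_embeddings(["hello"])           # served from the cache
print(first[0] == second[0])  # True
```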
Save the cache

```python
torch_embestore.save("cache.parquet")
```

Load from the cache

```python
torch_embestore = TorchEmbeddingStore("cache.parquet")
```
Apply eviction policy

- LRU

```python
torch_embestore = TorchEmbeddingStore(max_size=100, eviction_policy="lru")
```

- LFU

```python
torch_embestore = TorchEmbeddingStore(max_size=100, eviction_policy="lfu")
```
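For contrast with LRU, a toy LFU policy evicts the sentence with the fewest lookups. This is again a hypothetical, self-contained sketch of the idea, not the library's implementation:

```python
from collections import Counter
from typing import Dict, List, Optional

class LFUEmbeddingCache:
    """Toy LFU cache: evicts the least frequently used sentence when full."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._cache: Dict[str, List[float]] = {}
        self._hits: Counter = Counter()

    def get(self, sentence: str) -> Optional[List[float]]:
        if sentence not in self._cache:
            return None
        self._hits[sentence] += 1  # count every successful lookup
        return self._cache[sentence]

    def put(self, sentence: str, embedding: List[float]) -> None:
        if len(self._cache) >= self.max_size and sentence not in self._cache:
            # Evict the entry with the lowest lookup count.
            victim = min(self._cache, key=lambda s: self._hits[s])
            del self._cache[victim]
            del self._hits[victim]
        self._cache[sentence] = embedding

cache = LFUEmbeddingCache(max_size=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a"); cache.get("a")  # "a" looked up twice, "b" never
cache.put("c", [0.3])           # evicts "b" (lowest frequency)
print(cache.get("b"))  # None
print(cache.get("a"))  # [0.1]
```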
Road Map
- [TODO] Documentation
- [TODO] Badges
Project details
Download files
Download the file for your platform.
Source Distribution

- embestore-0.2.0.tar.gz (6.1 kB)

Built Distribution

- embestore-0.2.0-py3-none-any.whl (6.7 kB)
File details
Details for the file embestore-0.2.0.tar.gz.
File metadata
- Download URL: embestore-0.2.0.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7ae8b780999b7e8947b2dbf865ed36ce09c6de2a8359e07c89a179ab2ee3b878
MD5 | e356e306a511fd94cd6a85fa2d8597ba
BLAKE2b-256 | b97428a1d8a04ca2a8cb638ecfe219f1126c2a201bd68c82d8bdb64833378e20
File details
Details for the file embestore-0.2.0-py3-none-any.whl.
File metadata
- Download URL: embestore-0.2.0-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | b2ed24424bcfdb4bb40cff01d200dde1edad3ea06a59c7edaf974a8e7bc397a4
MD5 | 50df2fd3a87f5ff9bab7da53d31dc7b8
BLAKE2b-256 | 26cd13e683035970f3c20414d7232dce232ab47b7c75c484bd67cca8591e682b