llm-elasticsearch-cache
A caching layer for LLMs that exploits Elasticsearch, fully compatible with LangChain caching, both for chat and embeddings models.
Install
pip install llm-elasticsearch-cache
Chat cache usage
The LangChain cache can be used similarly to the other cache integrations.
Basic example
from langchain.globals import set_llm_cache
from llmescache.langchain import ElasticsearchCache
from elasticsearch import Elasticsearch

es_client = Elasticsearch(hosts="http://localhost:9200")
set_llm_cache(
    ElasticsearchCache(
        es_client=es_client,
        es_index="llm-chat-cache",
        metadata={"project": "my_chatgpt_project"}
    )
)
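Once the cache is set globally, LangChain chat models use it transparently. A minimal usage sketch, assuming langchain_openai is installed and an OpenAI API key is configured (the chat model itself is not part of this library):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

# The first call hits the API and stores the generation in the "llm-chat-cache" index
llm.invoke("Tell me a joke about Elasticsearch")

# An identical call is then served from the Elasticsearch cache
llm.invoke("Tell me a joke about Elasticsearch")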
The es_index parameter can also take aliases. This allows using ILM: Manage the index lifecycle, which we suggest considering for managing retention and controlling cache growth (a possible setup is sketched below).
See the class docstring for all parameters.
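One possible way to wire the alias to ILM-managed indices is sketched below. This is plain Elasticsearch setup, not part of the library, and the policy, template, and index names are purely illustrative (assuming an Elasticsearch 8.x Python client):

# Illustrative ILM setup: roll the cache over periodically and delete old indices
es_client.ilm.put_lifecycle(
    name="llm-chat-cache-policy",
    policy={
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "7d"}}},
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    },
)
es_client.indices.put_index_template(
    name="llm-chat-cache-template",
    index_patterns=["llm-chat-cache-*"],
    template={
        "settings": {
            "index.lifecycle.name": "llm-chat-cache-policy",
            "index.lifecycle.rollover_alias": "llm-chat-cache",
        }
    },
)
# First backing index, with the alias passed as es_index above as its write alias
es_client.indices.create(
    index="llm-chat-cache-000001",
    aliases={"llm-chat-cache": {"is_write_index": True}},
)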
Index the generated text
The cached data won't be searchable by default. The developer can customize how the Elasticsearch document is built in order to add indexed text fields, for example to hold the text generated by the LLM.
This can be done by subclassing and overriding methods. The new cache class can also be applied to a pre-existing cache index:
from llmescache.langchain import ElasticsearchCache
from elasticsearch import Elasticsearch
from langchain_core.caches import RETURN_VAL_TYPE
from typing import Any, Dict, List
from langchain.globals import set_llm_cache
import json


class SearchableElasticsearchCache(ElasticsearchCache):
    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["parsed_llm_output"] = {
            "type": "text",
            "analyzer": "english",
        }
        return mapping

    def build_document(
        self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE
    ) -> Dict[str, Any]:
        body = super().build_document(prompt, llm_string, return_val)
        body["parsed_llm_output"] = self._parse_output(body["llm_output"])
        return body

    @staticmethod
    def _parse_output(data: List[str]) -> List[str]:
        return [
            json.loads(output)["kwargs"]["message"]["kwargs"]["content"]
            for output in data
        ]


es_client = Elasticsearch(hosts="http://localhost:9200")
set_llm_cache(SearchableElasticsearchCache(es_client=es_client, es_index="llm-chat-cache"))
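With the extra text field in place, cached generations become searchable with ordinary full-text queries. A minimal sketch (the query term is illustrative):

# Full-text search over the cached LLM generations
response = es_client.search(
    index="llm-chat-cache",
    query={"match": {"parsed_llm_output": "some text to look for"}},
)
for hit in response["hits"]["hits"]:
    print(hit["_source"]["parsed_llm_output"])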
Embeddings cache usage
Caching embeddings is obtained by using CacheBackedEmbeddings, in a slightly different way than described in the official documentation.
from llmescache.langchain import ElasticsearchStore
from elasticsearch import Elasticsearch
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings

es_client = Elasticsearch(hosts="http://localhost:9200")
underlying_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

store = ElasticsearchStore(
    es_client=es_client,
    es_index="llm-embeddings-cache",
    namespace=underlying_embeddings.model,
    metadata={"project": "my_llm_project"}
)

cached_embeddings = CacheBackedEmbeddings(
    underlying_embeddings,
    store
)
Similarly to the chat cache, one can subclass ElasticsearchStore in order to index vectors for search.
from llmescache.langchain import ElasticsearchStore
from typing import Any, Dict, List


class SearchableElasticsearchStore(ElasticsearchStore):
    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["vector"] = {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "dot_product",
        }
        return mapping

    def build_document(self, llm_input: str, vector: List[float]) -> Dict[str, Any]:
        body = super().build_document(llm_input, vector)
        body["vector"] = vector
        return body
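The subclass is used exactly like the base store; once vectors are indexed, they can be queried with a kNN search. A minimal sketch, assuming the mapping above and an Elasticsearch 8.x client (with dot_product similarity the vectors must be unit-normalized, as OpenAI embeddings are):

from elasticsearch import Elasticsearch
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings

es_client = Elasticsearch(hosts="http://localhost:9200")
underlying_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

store = SearchableElasticsearchStore(
    es_client=es_client,
    es_index="llm-embeddings-cache",
    namespace=underlying_embeddings.model,
)
cached_embeddings = CacheBackedEmbeddings(underlying_embeddings, store)
cached_embeddings.embed_documents(["a sentence to cache and index"])

# Approximate kNN search over the cached vectors ("vector" field from the mapping above)
response = es_client.search(
    index="llm-embeddings-cache",
    knn={
        "field": "vector",
        "query_vector": underlying_embeddings.embed_query("a sentence"),
        "k": 5,
        "num_candidates": 20,
    },
)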
Be aware that CacheBackedEmbeddings does not currently support caching queries; this means that text queries, for vector searches, won't be cached. However, by overriding the embed_query method one should be able to implement it easily.
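A possible sketch of such an override, assuming the underlying_embeddings and document_embedding_store attribute names from LangChain's CacheBackedEmbeddings (this subclass is not part of the library):

from typing import List

from langchain.embeddings import CacheBackedEmbeddings


class QueryCachedEmbeddings(CacheBackedEmbeddings):
    """Hypothetical subclass that also caches query embeddings in the same store."""

    def embed_query(self, text: str) -> List[float]:
        # Look the query text up in the store first
        (cached,) = self.document_embedding_store.mget([text])
        if cached is not None:
            return cached
        # Cache miss: compute the embedding and store it for next time
        vector = self.underlying_embeddings.embed_query(text)
        self.document_embedding_store.mset([(text, vector)])
        return vector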