

Project description

[!IMPORTANT]

This library is now part of LangChain. Follow the official documentation, e.g. for the LLM cache.

llm-elasticsearch-cache

A caching layer for LLMs that exploits Elasticsearch, fully compatible with LangChain caching, for both chat and embeddings models.

Install

pip install llm-elasticsearch-cache

Chat cache usage

The LangChain cache can be used similarly to the other cache integrations.

Basic example

from langchain.globals import set_llm_cache
from llmescache.langchain import ElasticsearchCache
from elasticsearch import Elasticsearch

es_client = Elasticsearch(hosts="http://localhost:9200")
set_llm_cache(
    ElasticsearchCache(
        es_client=es_client, 
        es_index="llm-chat-cache", 
        metadata={"project": "my_chatgpt_project"}
    )
)
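
Once the cache is set, LangChain chat models pick it up transparently. A minimal usage sketch, assuming an OpenAI chat model from the separate langchain-openai package (not part of this library):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
# The first call hits the API and stores the generation in the "llm-chat-cache" index;
# repeating the same prompt returns the cached generation instead of calling the API again.
llm.invoke("Tell me something about Elasticsearch")
llm.invoke("Tell me something about Elasticsearch")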

The es_index parameter can also take aliases. This allows you to use ILM: Manage the index lifecycle, which we suggest considering for managing retention and controlling cache growth.
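
A minimal sketch of how such an alias could be set up with ILM, using the standard Elasticsearch Python client (the policy name, retention values, and index names are illustrative, not part of this library):

# Illustrative ILM policy: roll the cache index over and delete old data after 30 days.
es_client.ilm.put_lifecycle(
    name="llm-cache-policy",
    policy={
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "7d"}}},
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    },
)
es_client.indices.put_index_template(
    name="llm-chat-cache-template",
    index_patterns=["llm-chat-cache-*"],
    template={
        "settings": {
            "index.lifecycle.name": "llm-cache-policy",
            "index.lifecycle.rollover_alias": "llm-chat-cache",
        }
    },
)
# Bootstrap the first index behind the write alias, then pass the alias as es_index.
es_client.indices.create(
    index="llm-chat-cache-000001",
    aliases={"llm-chat-cache": {"is_write_index": True}},
)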

Look at the class docstring for all parameters.

Index the generated text

The cached data is not searchable by default. You can customize how the Elasticsearch document is built in order to add indexed text fields, for example to store the text generated by the LLM.

This can be done by subclassing and overriding methods. The new cache class can also be applied to a pre-existing cache index:

from llmescache.langchain import ElasticsearchCache
from elasticsearch import Elasticsearch
from langchain_core.caches import RETURN_VAL_TYPE
from typing import Any, Dict, List
from langchain.globals import set_llm_cache
import json


class SearchableElasticsearchCache(ElasticsearchCache):

    @property
    def mapping(self) -> Dict[str, Any]:
        # Extend the default mapping with an indexed, analyzed text field.
        mapping = super().mapping
        mapping["mappings"]["properties"]["parsed_llm_output"] = {
            "type": "text",
            "analyzer": "english",
        }
        return mapping

    def build_document(
        self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE
    ) -> Dict[str, Any]:
        # Add the parsed generations to the default cache document.
        body = super().build_document(prompt, llm_string, return_val)
        body["parsed_llm_output"] = self._parse_output(body["llm_output"])
        return body

    @staticmethod
    def _parse_output(data: List[str]) -> List[str]:
        # Extract the message content from the serialized generations.
        return [
            json.loads(output)["kwargs"]["message"]["kwargs"]["content"]
            for output in data
        ]


es_client = Elasticsearch(hosts="http://localhost:9200")
set_llm_cache(SearchableElasticsearchCache(es_client=es_client, es_index="llm-chat-cache"))
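
With the extra field in place, cached generations become searchable with a regular full-text query. A hypothetical lookup against the index above (the query text is illustrative):

# Hypothetical full-text search over the cached LLM output.
response = es_client.search(
    index="llm-chat-cache",
    query={"match": {"parsed_llm_output": "elasticsearch"}},
)
for hit in response["hits"]["hits"]:
    print(hit["_source"]["parsed_llm_output"])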

Embeddings cache usage

Embeddings are cached using CacheBackedEmbeddings, in a slightly different way than shown in the official documentation.

from llmescache.langchain import ElasticsearchStore
from elasticsearch import Elasticsearch
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings

es_client = Elasticsearch(hosts="http://localhost:9200")

underlying_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = ElasticsearchStore(
    es_client=es_client, 
    es_index="llm-embeddings-cache",
    namespace=underlying_embeddings.model,
    metadata={"project": "my_llm_project"}
)
cached_embeddings = CacheBackedEmbeddings(
    underlying_embeddings, 
    store
)
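
The cached embedder can then be used wherever a LangChain Embeddings object is expected. A minimal usage sketch (the texts are illustrative):

# The first call computes the vectors and stores them in "llm-embeddings-cache";
# repeating it reads them back from the cache instead of calling the embeddings API.
vectors = cached_embeddings.embed_documents(["hello world", "hello elasticsearch"])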

Similarly to the chat cache, one can subclass ElasticsearchStore in order to index vectors for search.

from llmescache.langchain import ElasticsearchStore
from typing import Any, Dict, List

class SearchableElasticsearchStore(ElasticsearchStore):

    @property
    def mapping(self) -> Dict[str, Any]:
        # Extend the default mapping with an indexed dense_vector field.
        mapping = super().mapping
        mapping["mappings"]["properties"]["vector"] = {
            "type": "dense_vector",
            "dims": 1536,
            "index": True,
            "similarity": "dot_product",
        }
        return mapping

    def build_document(self, llm_input: str, vector: List[float]) -> Dict[str, Any]:
        # Store the embedding in the indexed field as well.
        body = super().build_document(llm_input, vector)
        body["vector"] = vector
        return body
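
Once the vectors are indexed this way, they can be queried with a standard kNN search. A hypothetical lookup, assuming the mapping above and the es_client and underlying_embeddings objects from the previous example:

# Hypothetical kNN search over the cached embeddings.
query_vector = underlying_embeddings.embed_query("what is a cache?")
response = es_client.search(
    index="llm-embeddings-cache",
    knn={"field": "vector", "query_vector": query_vector, "k": 5, "num_candidates": 50},
)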

Be aware that CacheBackedEmbeddings does not currently support caching queries, which means that text queries used for vector searches won't be cached. However, by overriding the embed_query method, one should be able to implement this easily.
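
One possible sketch of such an override, which simply routes single queries through the already-cached document path (the class name is hypothetical, not part of this library):

from typing import List
from langchain.embeddings import CacheBackedEmbeddings


class QueryCachingEmbeddings(CacheBackedEmbeddings):
    """Hypothetical variant that also caches query embeddings."""

    def embed_query(self, text: str) -> List[float]:
        # embed_documents goes through the cache store, so the query vector
        # is stored and reused like any document embedding.
        return self.embed_documents([text])[0]


cached_embeddings = QueryCachingEmbeddings(underlying_embeddings, store)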

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_elasticsearch_cache-0.2.6.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

llm_elasticsearch_cache-0.2.6-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file llm_elasticsearch_cache-0.2.6.tar.gz.

File metadata

  • Download URL: llm_elasticsearch_cache-0.2.6.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.6 Darwin/23.4.0

File hashes

Hashes for llm_elasticsearch_cache-0.2.6.tar.gz

  • SHA256: 8acd013ba7746177c72d36573efaad3a3547b351128a2d3f67deb4298aa6d07b
  • MD5: be97511b6b95c9b37028c90ba2b148b5
  • BLAKE2b-256: 2c7ff7fa66f58ac3817f5fb28bd7dfabc8698a059f409cb701e2683e1aea6d3c

See more details on using hashes here.

File details

Details for the file llm_elasticsearch_cache-0.2.6-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_elasticsearch_cache-0.2.6-py3-none-any.whl

  • SHA256: 987481231a3770512942d84dccbd010a664a896d05b95cff96caf51c65ab06b2
  • MD5: 1627eab02b194fed0cec45771099c4a7
  • BLAKE2b-256: ab4981551f5bee42d127668e9784c94d18b69cacd86f7b064e7706434e3ceef8

See more details on using hashes here.
