Semantic caching for LLM responses on SAP HANA Cloud

langchain-hana-cache

Stores prompt embeddings and LLM responses in HANA Cloud. When a semantically similar prompt arrives, the cache returns the stored response instead of calling the LLM, which saves tokens and reduces latency.

How it works

  1. The user sends a prompt to the LLM.
  2. The cache embeds the prompt using the configured embedding model.
  3. The cache searches HANA for stored entries using COSINE_SIMILARITY on a REAL_VECTOR column.
  4. If the similarity exceeds the threshold (default 0.95), the cache returns the stored response and no LLM call is made.
  5. If there is no match, the LLM is called as usual, the prompt embedding and response are cached, and the response is returned.
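
The five steps above can be sketched in plain Python. This is only an illustration of the control flow, not the package's implementation: InMemorySemanticCache, answer, and toy_embed are hypothetical names, the entries live in a Python list instead of a HANA table, and the toy embedding stands in for a real embedding model.

```python
import math
from typing import Callable, Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> list[float]:
    """Toy letter-count embedding, used only to make the demo runnable."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class InMemorySemanticCache:
    """Hypothetical stand-in for the HANA-backed cache (illustration only)."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []  # (embedding, response)

    def lookup(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)  # step 2: embed the prompt
        best_score, best_response = 0.0, None
        for vector, response in self.entries:  # step 3: similarity search
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # step 4: a hit only if the best similarity exceeds the threshold
        return best_response if best_score > self.threshold else None

    def update(self, prompt: str, response: str) -> None:
        # step 5: store the prompt embedding together with the response
        self.entries.append((self.embed(prompt), response))


def answer(prompt: str, cache: InMemorySemanticCache, llm: Callable[[str], str]) -> str:
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached  # cache hit: no LLM call
    response = llm(prompt)  # cache miss: call the LLM and remember the result
    cache.update(prompt, response)
    return response
```

Calling answer twice with the same prompt invokes the llm callable only once; the second call is served from the list. In the real package the linear scan would be replaced by a query against HANA.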

Installation

pip install langchain-hana-cache

Usage

As LangChain global cache

import hdbcli.dbapi
from langchain_hana_cache import HANASemanticLLMCache
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.globals import set_llm_cache

connection = hdbcli.dbapi.connect(
    address="your-host.hanacloud.ondemand.com",
    port=443,
    user="DBADMIN",
    password="your-password",
    encrypt=True,
)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

cache = HANASemanticLLMCache(
    connection=connection,
    embedding=embeddings,
    table_name="LLM_CACHE",
    similarity_threshold=0.95,
    ttl_seconds=86400,
)

set_llm_cache(cache)

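llm = ChatOpenAI(model="gpt-4o")
# First call: nothing is cached yet, so the LLM is called and the response is stored
response1 = llm.invoke("What are the reporting requirements for article 12?")
# Rephrased prompt: served from the cache if its similarity exceeds the threshold
response2 = llm.invoke("Tell me about article 12 reporting requirements")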

Manual usage

from langchain_core.outputs import Generation

# Store a response
cache.update(
    "What is the capital of France?",
    "gpt-4o",
    [Generation(text="The capital of France is Paris.")],
)

# Look up a similar prompt, passing the same LLM string used in update()
result = cache.lookup("Tell me the capital of France", "gpt-4o")
# On a cache hit, result is [Generation(text="The capital of France is Paris.")]
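
A lookup like this presumably ends in a similarity query against the cache table. The following is a minimal sketch of what such a query could look like, not the package's actual SQL. The column names RESPONSE and EMBEDDING and the helper build_lookup_query are hypothetical; COSINE_SIMILARITY, TO_REAL_VECTOR, and SELECT TOP are HANA Cloud SQL features, and ? is the placeholder style used by hdbcli.

```python
import re


def build_lookup_query(table_name: str, embedding: list[float]) -> tuple[str, list[str]]:
    """Return (sql, params) for a nearest-neighbour lookup; the schema is assumed."""
    # Identifiers cannot be bound as parameters, so validate the table name first.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # TO_REAL_VECTOR parses a textual vector such as '[0.1,0.2,0.3]'.
    vector_text = "[" + ",".join(repr(float(x)) for x in embedding) + "]"
    sql = (
        'SELECT TOP 1 "RESPONSE", '
        'COSINE_SIMILARITY("EMBEDDING", TO_REAL_VECTOR(?)) AS "SCORE" '
        f'FROM "{table_name}" '
        'ORDER BY "SCORE" DESC'
    )
    return sql, [vector_text]
```

With an open connection, the pair could be passed to cursor.execute(sql, params), and the returned SCORE compared against similarity_threshold in Python. A real implementation would likely also filter on the LLM string, which this sketch leaves out.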

Eviction

# Remove entries older than TTL
cache.evict_expired()

# Keep only the 1000 most recently accessed entries
cache.evict_lru(max_entries=1000)

# Clear all cached entries
cache.clear()
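
To make the difference between the two eviction policies concrete, here is a small in-memory model. It is a sketch under stated assumptions rather than the package's code: EvictionDemo, put, and touch are hypothetical names, and it assumes the TTL is measured from creation time while LRU eviction is ordered by last access time.

```python
import time
from typing import Optional


class EvictionDemo:
    """Hypothetical in-memory model of the cache table's bookkeeping."""

    def __init__(self, ttl_seconds: Optional[int] = None):
        self.ttl_seconds = ttl_seconds
        # key -> (created_at, last_accessed_at), both as Unix timestamps
        self.entries: dict[str, tuple[float, float]] = {}

    def put(self, key: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.entries[key] = (now, now)

    def touch(self, key: str, now: Optional[float] = None) -> None:
        """Record an access, as a cache hit would."""
        now = time.time() if now is None else now
        created, _ = self.entries[key]
        self.entries[key] = (created, now)

    def evict_expired(self, now: Optional[float] = None) -> int:
        """Drop entries older than the TTL; returns how many were removed."""
        if self.ttl_seconds is None:
            return 0  # no TTL configured, nothing ever expires
        now = time.time() if now is None else now
        expired = [
            key
            for key, (created, _) in self.entries.items()
            if now - created > self.ttl_seconds
        ]
        for key in expired:
            del self.entries[key]
        return len(expired)

    def evict_lru(self, max_entries: int) -> int:
        """Keep the max_entries most recently accessed; returns how many were removed."""
        by_recency = sorted(
            self.entries, key=lambda key: self.entries[key][1], reverse=True
        )
        stale = by_recency[max_entries:]
        for key in stale:
            del self.entries[key]
        return len(stale)
```

The optional now argument exists only so the behaviour can be checked with fixed timestamps instead of the wall clock.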

Parameters

Parameter             Type                     Default      Description
connection            hdbcli.dbapi.Connection  required     HANA database connection
embedding             Embeddings               required     LangChain embedding model for encoding prompts
table_name            str                      "LLM_CACHE"  Name of the cache table
similarity_threshold  float                    0.95         Minimum cosine similarity for a cache hit
ttl_seconds           int | None               None         Time-to-live in seconds (None = no expiry)
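
For a feel of what the values used in this README mean, the short calculation below converts the 0.95 threshold into the largest angle between two embeddings that still counts as a match, and the 86400-second TTL from the usage example into hours. It is plain arithmetic and does not touch the package.

```python
import math

similarity_threshold = 0.95
ttl_seconds = 86400

# Largest angle between two embeddings whose cosine similarity is still 0.95
max_angle_degrees = math.degrees(math.acos(similarity_threshold))
print(f"{max_angle_degrees:.1f} degrees")  # about 18.2 degrees

# The TTL from the usage example expressed in hours
ttl_hours = ttl_seconds / 3600
print(f"{ttl_hours:g} hours")  # 24 hours
```

A higher threshold narrows that angle, so fewer rephrasings count as a match; a lower one widens it and raises the risk of returning an answer to a different question.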

Development

git clone https://github.com/stubborncoder/langchain-hana-cache.git
cd langchain-hana-cache
pip install -e ".[dev]"

# Run unit tests
pytest tests/test_utils.py tests/test_llm_cache.py -v

# Run integration tests (requires HANA credentials in .env)
pytest tests/test_integration.py -v

License

MIT
