Semantic caching for LLM responses on SAP HANA Cloud
langchain-hana-cache
Stores prompt embeddings and LLM responses in HANA Cloud. When a semantically similar prompt comes in, it returns the cached response instead of calling the LLM — saving tokens and reducing latency.
How it works
- User sends a prompt to the LLM
- The cache embeds the prompt using the configured embedding model
- Searches HANA for cached entries using `COSINE_SIMILARITY` on a `REAL_VECTOR` column
- If similarity exceeds the threshold (default 0.95), returns the cached response without calling the LLM
- If no match, calls the LLM normally, caches the prompt embedding + response, returns the response
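The steps above can be sketched in plain Python. This is an in-memory illustration of the lookup-or-call flow, not the package's implementation: `InMemorySemanticCache`, `toy_embed`, and `get_or_call` are hypothetical names, the toy embedding stands in for a real embedding model, and treating a score equal to the threshold as a hit is an assumption.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical toy vocabulary; a real setup would use an embedding model.
VOCAB = ["capital", "france", "germany", "weather"]


def toy_embed(text: str) -> List[float]:
    """Count vocabulary words in the text (stand-in for a real embedding)."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(word)) for word in VOCAB]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemorySemanticCache:
    """Illustrative stand-in for the HANA-backed cache."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the best-matching cached response, or None on a miss."""
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Assumption: a score equal to the threshold counts as a hit.
        return best_response if best_score >= self.threshold else None

    def get_or_call(self, prompt: str, llm: Callable[[str], str]) -> str:
        """Serve from cache when possible; otherwise call the LLM and store."""
        cached = self.lookup(prompt)
        if cached is not None:
            return cached
        response = llm(prompt)
        self.entries.append((self.embed(prompt), response))
        return response
```

In the real package the linear scan is replaced by a similarity query executed inside HANA, so the candidate vectors never leave the database.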
Installation
pip install langchain-hana-cache
Usage
As LangChain global cache
import hdbcli.dbapi
from langchain_hana_cache import HANASemanticLLMCache
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.globals import set_llm_cache
connection = hdbcli.dbapi.connect(
    address="your-host.hanacloud.ondemand.com",
    port=443,
    user="DBADMIN",
    password="your-password",
    encrypt=True,
)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
cache = HANASemanticLLMCache(
    connection=connection,
    embedding=embeddings,
    table_name="LLM_CACHE",
    similarity_threshold=0.95,
    ttl_seconds=86400,
)
set_llm_cache(cache)
llm = ChatOpenAI(model="gpt-4o")
response1 = llm.invoke("What are the reporting requirements for article 12?")
response2 = llm.invoke("Tell me about article 12 reporting requirements") # cache hit
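The connection example above hard-codes credentials for brevity. One possible alternative is to read them from environment variables. The sketch below is only an illustration: the helper `hana_connect_kwargs` and the variable names `HANA_HOST`, `HANA_PORT`, `HANA_USER`, and `HANA_PASSWORD` are hypothetical and not defined by this package.

```python
import os
from typing import Dict, Mapping, Union


def hana_connect_kwargs(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[str, int, bool]]:
    """Build keyword arguments for hdbcli.dbapi.connect from environment variables.

    The variable names are hypothetical; adapt them to your deployment.
    """
    required = ("HANA_HOST", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {
        "address": env["HANA_HOST"],
        "port": int(env.get("HANA_PORT", "443")),  # 443 matches the example above
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        "encrypt": True,
    }


# Usage (requires hdbcli and a reachable HANA Cloud instance):
# connection = hdbcli.dbapi.connect(**hana_connect_kwargs())
```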
Manual usage
from langchain_core.outputs import Generation
# Store a response
cache.update(
    "What is the capital of France?",
    "gpt-4o",
    [Generation(text="The capital of France is Paris.")],
)
# Look up a similar prompt
result = cache.lookup("Tell me the capital of France", "gpt-4o")
# result = [Generation(text="The capital of France is Paris.")]
Eviction
# Remove entries older than TTL
cache.evict_expired()
# Keep only the 1000 most recently accessed entries
cache.evict_lru(max_entries=1000)
# Clear all cached entries
cache.clear()
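To make the difference between the two eviction policies concrete, here is a small in-memory sketch. It is not the package's code: `EvictionDemo`, `Entry`, `put`, and `touch` are hypothetical names, and it assumes that the TTL is measured from creation time and that LRU order follows the last access time. The package may define either differently.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    response: str
    created_at: float
    last_accessed: float


class EvictionDemo:
    """Illustrates TTL-based and LRU-based eviction on a plain dict."""

    def __init__(self, ttl_seconds: Optional[int] = None):
        self.ttl_seconds = ttl_seconds
        self.entries: Dict[str, Entry] = {}

    def put(self, key: str, response: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.entries[key] = Entry(response, created_at=now, last_accessed=now)

    def touch(self, key: str, now: Optional[float] = None) -> None:
        """Record an access, as a cache hit would."""
        self.entries[key].last_accessed = time.time() if now is None else now

    def evict_expired(self, now: Optional[float] = None) -> int:
        """Drop entries older than the TTL; no-op when ttl_seconds is None."""
        if self.ttl_seconds is None:
            return 0
        now = time.time() if now is None else now
        expired = [
            key for key, entry in self.entries.items()
            if now - entry.created_at > self.ttl_seconds
        ]
        for key in expired:
            del self.entries[key]
        return len(expired)

    def evict_lru(self, max_entries: int) -> int:
        """Keep only the max_entries most recently accessed entries."""
        excess = len(self.entries) - max_entries
        if excess <= 0:
            return 0
        oldest_first = sorted(
            self.entries, key=lambda key: self.entries[key].last_accessed
        )
        for key in oldest_first[:excess]:
            del self.entries[key]
        return excess
```

TTL eviction bounds how stale a response can be, while LRU eviction bounds the table size, so the two are often run together on a schedule.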
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `connection` | `hdbcli.dbapi.Connection` | required | HANA database connection |
| `embedding` | `Embeddings` | required | LangChain embedding model for encoding prompts |
| `table_name` | `str` | `"LLM_CACHE"` | Name of the cache table |
| `similarity_threshold` | `float` | `0.95` | Minimum cosine similarity for a cache hit |
| `ttl_seconds` | `int \| None` | `None` | Time-to-live in seconds (`None` = no expiry) |
Development
git clone https://github.com/stubborncoder/langchain-hana-cache.git
cd langchain-hana-cache
pip install -e ".[dev]"
# Run unit tests
pytest tests/test_utils.py tests/test_llm_cache.py -v
# Run integration tests (requires HANA credentials in .env)
pytest tests/test_integration.py -v
License
MIT