Cachly semantic cache integration for LangChain — pgvector-backed, zero dependencies, fleet-wide cache hits
langchain-cachly
Cachly semantic cache for LangChain — fleet-wide LLM response caching by meaning, not by exact string match. Zero extra dependencies.
The problem
Your users ask the same question dozens of different ways. Without semantic caching, every rephrasing hits your LLM and costs money. RedisCache only catches exact duplicates. RedisSemanticCache requires a local embedding model per instance.
langchain-cachly uses server-side similarity search — one managed service, shared across your whole fleet.
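To see why a similarity threshold succeeds where exact string matching fails, here is a minimal, self-contained sketch of threshold-based matching over embedding vectors. This is plain illustrative Python, not the package's API; the toy 3-dimensional vectors stand in for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def lookup(query_vec, cache, threshold=0.92):
    """Return the cached answer most similar to query_vec, if that
    similarity clears the threshold; otherwise None (a cache miss)."""
    best = max(cache, key=lambda e: cosine_similarity(query_vec, e[0]), default=None)
    if best is not None and cosine_similarity(query_vec, best[0]) >= threshold:
        return best[1]
    return None

# Two phrasings of the same question embed close together;
# an unrelated question does not.
cache = [([0.9, 0.1, 0.1], "Semantic caching matches by meaning.")]
print(lookup([0.88, 0.12, 0.09], cache))  # near-duplicate → cache hit
print(lookup([0.1, 0.9, 0.2], cache))     # unrelated → None
```

Exact-match caches would treat both queries as misses; a semantic cache keys on the geometry of the embedding space instead.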
Quick start
```shell
pip install langchain-cachly
```

```python
from langchain.globals import set_llm_cache
from langchain_cachly import CachlySemanticCache

set_llm_cache(CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.92,  # cosine similarity 0–1; 0.92 recommended
    ttl=86400,       # seconds; default 24 h
))

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
llm.invoke("What is semantic caching?")          # → LLM called, answer cached
llm.invoke("Can you explain semantic caching?")  # → cache HIT, $0 LLM cost
```
Get your CACHLY_VECTOR_URL from cachly.dev — free tier available.
Why CachlySemanticCache?
| Feature | RedisCache | RedisSemanticCache | CachlySemanticCache |
|---|---|---|---|
| Match type | Exact string | Per-instance semantic | Cluster-wide semantic |
| Infra needed | Redis | Redis + embedding model | None (managed) |
| Embeddings | ❌ | Local model | Managed (server-side) |
| Cross-instance sharing | ❌ | ❌ | ✅ |
| Extra dependencies | redis | redis + embedding lib | None (stdlib only) |
| Cold start | Fast | Slow (model load) | Fast |
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| `vector_url` | `str` | required | Full Cachly semantic URL incl. token |
| `threshold` | `float` | `0.92` | Minimum cosine similarity for a cache hit |
| `ttl` | `int` | `86400` | TTL in seconds (default 24 h) |
Async support
`CachlySemanticCache` implements the full `BaseCache` interface, including `alookup`, `aupdate`, and `aclear` via thread-pool delegation, and is compatible with LangChain's async chains out of the box.
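The thread-pool delegation pattern can be sketched generically. This is illustrative Python under stated assumptions, not the package's actual source; the in-memory dict stands in for the blocking HTTP similarity query the real cache would make.

```python
import asyncio

class SemanticCacheSketch:
    """Sketch: sync methods do the (blocking) work; the async
    counterparts delegate to a worker thread so the event loop
    stays responsive."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        # In a real cache this would be a blocking network call.
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, return_val):
        self._store[(prompt, llm_string)] = return_val

    async def alookup(self, prompt, llm_string):
        # asyncio.to_thread runs the blocking call in the default
        # thread pool instead of blocking the event loop.
        return await asyncio.to_thread(self.lookup, prompt, llm_string)

    async def aupdate(self, prompt, llm_string, return_val):
        await asyncio.to_thread(self.update, prompt, llm_string, return_val)

async def main():
    cache = SemanticCacheSketch()
    await cache.aupdate("q", "model", "answer")
    return await cache.alookup("q", "model")

print(asyncio.run(main()))  # → answer
```

Delegating to a thread pool is the standard way to expose a synchronous client through an async interface without rewriting the transport layer.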
Fail-open design
Network errors and HTTP failures return None (a cache miss) and are logged at WARNING level. The cache never raises in the hot path: your LLM calls always complete even if Cachly is temporarily unreachable.
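The fail-open behavior can be illustrated with a generic wrapper (a sketch, not the package's code; the URL and helper name are hypothetical): every transport error is logged at WARNING and converted into a miss.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("cachly.sketch")

def fail_open_lookup(url, timeout=2.0):
    """Return the cached payload, or None on any network/HTTP failure.

    Errors never propagate: a broken cache degrades to a miss, so the
    LLM call that follows still completes.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError, ValueError) as exc:
        logger.warning("cache lookup failed, treating as miss: %s", exc)
        return None

# An unreachable host yields a miss, not an exception.
print(fail_open_lookup("http://192.0.2.1:9/lookup", timeout=0.2))  # → None
```

Catching at the cache boundary and returning None keeps the failure mode identical to a legitimate miss, which is what lets callers stay oblivious to cache health.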
Links
- cachly.dev — Free signup, dashboard, pricing
- Docs — Full integration docs
- GitHub
File details
Details for the file langchain_cachly-0.1.0.tar.gz.
File metadata
- Download URL: langchain_cachly-0.1.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7cf0c2ec4ca85fae60dfc7d077100fe7391ffab66c05378b202ff81ed0f691d9` |
| MD5 | `c85e144d97a51eadf78daa66419c1389` |
| BLAKE2b-256 | `1e06fedba7e186d2a7e9b10b2441743a85aa40eeb5c00627fad1f282d8c7c321` |
File details
Details for the file langchain_cachly-0.1.0-py3-none-any.whl.
File metadata
- Download URL: langchain_cachly-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0bb569a59835dec4d6feaa5ba38b09192a38b206d9906c936e67f480d0760537` |
| MD5 | `4aca48713ff66107c6be99017f7ebfc3` |
| BLAKE2b-256 | `b9f5d052504b22695357cfe6f7f611867e18845907e601c98ad0f048eb96a2ed` |