Cachly semantic cache integration for LangChain — pgvector-backed, zero dependencies, fleet-wide cache hits

Project description

langchain-cachly

Cachly semantic cache for LangChain — fleet-wide LLM response caching by meaning, not by exact string match. Zero extra dependencies.

PyPI Python 3.10+ License: MIT


The problem

Your users ask the same question dozens of different ways. Without semantic caching, every rephrasing hits your LLM and costs money. RedisCache only catches exact duplicates. RedisSemanticCache requires a local embedding model per instance.

langchain-cachly uses server-side similarity search — one managed service, shared across your whole fleet.
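To see why matching by meaning catches rephrasings that an exact-string cache misses, here is a toy illustration in plain Python (bag-of-words vectors and cosine similarity — a deliberately crude stand-in for real embeddings, no Cachly involved):

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Toy bag-of-words vector: token -> count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

q1 = "what is semantic caching"
q2 = "can you explain semantic caching"

# An exact-string cache compares keys and sees two different questions:
print(q1 == q2)                              # False -> cache miss
# A similarity-based cache compares vectors and sees the overlap:
print(cosine(vectorize(q1), vectorize(q2)))  # ~0.447, well above zero
```

Real embedding models score these two questions far higher than this toy does, which is why a threshold like 0.92 works in practice.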

Quick start

pip install langchain-cachly

from langchain.globals import set_llm_cache
from langchain_cachly import CachlySemanticCache

set_llm_cache(CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.92,   # cosine similarity 0–1; 0.92 recommended
    ttl=86400,        # seconds; default 24 h
))

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")

llm.invoke("What is semantic caching?")           # → LLM called, answer cached
llm.invoke("Can you explain semantic caching?")   # → cache HIT, $0 LLM cost

Get your CACHLY_VECTOR_URL from cachly.dev — free tier available.

Why CachlySemanticCache?

Feature                  RedisCache        RedisSemanticCache        CachlySemanticCache
Match type               Exact             Per-instance embedding    Cluster-wide semantic
Infra needed             Redis             Redis + embedding model   None (managed)
Embeddings               None              Local model               Managed (server-side)
Cross-instance sharing   Exact keys only   Per-instance              Yes (fleet-wide)
Extra dependencies       redis             redis + embedding lib     None (stdlib only)
Cold start               Fast              Slow (model load)         Fast

Configuration

Parameter    Type    Default    Description
vector_url   str     required   Full Cachly semantic URL incl. token
threshold    float   0.92       Min cosine similarity for a cache hit
ttl          int     86400      TTL in seconds (24 h)
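Putting the table together, a typical production setup reads the token from the environment and tightens the defaults (the constructor arguments are exactly the three parameters above; the environment-variable name is the one the docs suggest):

```python
import os
from langchain_cachly import CachlySemanticCache

cache = CachlySemanticCache(
    vector_url=os.environ["CACHLY_VECTOR_URL"],  # keep the token out of source control
    threshold=0.95,  # stricter matching: fewer false hits, more LLM calls
    ttl=3600,        # expire entries after one hour instead of 24 h
)
```

Raising the threshold trades hit rate for precision: 0.95 will only reuse answers for very close paraphrases, while values below ~0.9 risk returning a cached answer to a genuinely different question.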

Async support

CachlySemanticCache implements the full BaseCache interface, including alookup, aupdate, and aclear, via thread-pool delegation — compatible with LangChain's async chains out of the box.
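Thread-pool delegation is the standard asyncio trick for wrapping blocking I/O: the sync method runs in a worker thread so the event loop is never blocked. A minimal sketch of the pattern (not the package's actual source; the class and return value here are illustrative stand-ins):

```python
import asyncio

class ThreadPoolDelegatingCache:
    """Sketch: each async method delegates to its sync twin in a worker thread."""

    def lookup(self, prompt: str, llm_string: str) -> str:
        # Stand-in for a blocking HTTP round-trip to the cache backend.
        return f"cached:{prompt}"

    async def alookup(self, prompt: str, llm_string: str) -> str:
        # asyncio.to_thread runs the sync method off the event loop,
        # so concurrent async chains keep making progress during I/O.
        return await asyncio.to_thread(self.lookup, prompt, llm_string)

result = asyncio.run(ThreadPoolDelegatingCache().alookup("hi", "gpt-4o"))
print(result)  # cached:hi
```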

Fail-open design

Network errors and HTTP failures return None (a cache miss) and are logged at WARNING level. The cache never raises in the hot path — your LLM calls always complete even if Cachly is temporarily unreachable.
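Fail-open caching follows a common pattern: wrap the network call, log the failure, and return None so the caller falls through to the LLM. A minimal sketch of the idea (the `_fetch` helper is hypothetical, not the package's actual code):

```python
import logging

logger = logging.getLogger("cachly")

def _fetch(url: str, payload: dict) -> str:
    # Stand-in for the real HTTP call; here it always fails,
    # simulating an unreachable backend.
    raise ConnectionError("backend unreachable")

def lookup(prompt: str):
    """Return a cached answer, or None (cache miss) on any failure."""
    try:
        return _fetch("https://api.cachly.dev/v1/sem/TOKEN", {"q": prompt})
    except Exception as exc:
        # Never raise in the hot path: log at WARNING and report a miss.
        logger.warning("cachly lookup failed, treating as miss: %s", exc)
        return None

print(lookup("What is semantic caching?"))  # None -> caller invokes the LLM
```

Catching broadly and downgrading to a miss is what makes the cache safe to deploy: the worst case is the cost profile you had before caching, never a new failure mode.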

Download files

Download the file for your platform.

Source Distribution

langchain_cachly-0.1.0.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

langchain_cachly-0.1.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file langchain_cachly-0.1.0.tar.gz.

File metadata

  • Download URL: langchain_cachly-0.1.0.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for langchain_cachly-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7cf0c2ec4ca85fae60dfc7d077100fe7391ffab66c05378b202ff81ed0f691d9
MD5 c85e144d97a51eadf78daa66419c1389
BLAKE2b-256 1e06fedba7e186d2a7e9b10b2441743a85aa40eeb5c00627fad1f282d8c7c321

File details

Details for the file langchain_cachly-0.1.0-py3-none-any.whl.

File hashes

Hashes for langchain_cachly-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0bb569a59835dec4d6feaa5ba38b09192a38b206d9906c936e67f480d0760537
MD5 4aca48713ff66107c6be99017f7ebfc3
BLAKE2b-256 b9f5d052504b22695357cfe6f7f611867e18845907e601c98ad0f048eb96a2ed
