
cachly-agents

Official cachly.dev adapters for AI agent frameworks – persistent memory and semantic cache for LangChain, AutoGen, and CrewAI agents.

GDPR-compliant (DSGVO) · German servers · 30-second setup

Installation

pip install cachly-agents                   # core + memory
pip install "cachly-agents[langchain]"      # + LangChain adapter
pip install "cachly-agents[autogen]"        # + AutoGen adapter
pip install "cachly-agents[crewai]"         # + CrewAI adapter
pip install "cachly-agents[all]"            # everything

Requires Python 3.10+ · a free cachly.dev instance · the core client (`pip install cachly`)


LangChain – Persistent Chat History

import os
from cachly_agents.langchain import CachlyChatMessageHistory
from langchain_openai import ChatOpenAI
from langchain_core.runnables.history import RunnableWithMessageHistory

def get_history(session_id: str) -> CachlyChatMessageHistory:
    return CachlyChatMessageHistory(
        session_id=session_id,
        redis_url=os.environ["CACHLY_URL"],
        ttl=3600,           # session expires after 1 h of inactivity
    )

chain = RunnableWithMessageHistory(
    ChatOpenAI(model="gpt-4o"),
    get_history,
)

# Messages persist across restarts, scale-out, and agent runs
response = chain.invoke(
    {"role": "user", "content": "What did we discuss last time?"},
    config={"configurable": {"session_id": "user-42"}},
)
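The factory pattern above keys all history to `session_id`. Here is a dependency-free sketch of that scoping, with a plain dict standing in for the Redis-backed store (for illustration only; `CachlyChatMessageHistory` persists server-side):

```python
# Each session_id maps to its own message list, mirroring how
# CachlyChatMessageHistory scopes messages per session.
_histories: dict = {}

def get_history(session_id: str) -> list:
    return _histories.setdefault(session_id, [])

get_history("user-42").append({"role": "user", "content": "Hello"})
get_history("user-42").append({"role": "assistant", "content": "Hi!"})

print(len(get_history("user-42")))  # → 2
print(len(get_history("user-7")))   # → 0  (separate session, separate history)
```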

AutoGen – Persistent Conversation Store

import os
from cachly_agents.autogen import CachlyMessageStore
from autogen import ConversableAgent

store = CachlyMessageStore(
    redis_url=os.environ["CACHLY_URL"],
    conversation_id="project-42",
    ttl=86400,          # 24 h
)

# Inject persisted history on agent start
assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful AI assistant.",
    initial_history=store.load(),
)

# Persist messages after each turn
store.append({"role": "user",      "content": "Draft a project plan."})
store.append({"role": "assistant", "content": "Sure! Here's the plan..."})
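The load/append round-trip above survives process restarts because the store lives server-side. A minimal file-backed stand-in illustrates the same contract (`CachlyMessageStore` itself persists to Redis; the JSON file here is only a stand-in):

```python
import json
import os
import tempfile

# A JSON file plays the role of the server-side conversation store
path = os.path.join(tempfile.gettempdir(), "conv-project-42.json")
if os.path.exists(path):
    os.remove(path)  # start fresh for the demo

def load() -> list:
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)

def append(message: dict) -> None:
    messages = load()
    messages.append(message)
    with open(path, "w") as f:
        json.dump(messages, f)

append({"role": "user", "content": "Draft a project plan."})
append({"role": "assistant", "content": "Sure! Here's the plan..."})
print(len(load()))  # → 2
```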

CrewAI – Persistent Memory Adapter

import os
from cachly_agents.crewai import CachlyCrewMemory
from crewai import Agent, Crew, Task

memory = CachlyCrewMemory(
    redis_url=os.environ["CACHLY_URL"],
    crew_id="research-team",
    ttl=604800,         # 7 days
)

# Store and recall facts
memory.save("last_topic", "LLM memory systems")
topic = memory.load("last_topic")          # → "LLM memory systems"

# Append-only research log
memory.log("research_log", {"topic": topic, "quality": "high"})
log = memory.get_log("research_log", limit=50)

researcher = Agent(
    role="Senior Research Analyst",
    goal=f"Continue research on {topic}",
    memory=True,
)
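`log()` / `get_log(limit=N)` implement an append-only log with tail reads. The semantics can be sketched with a plain list (no server involved):

```python
log: list = []

def log_entry(entry: dict) -> None:
    log.append(entry)  # append-only: entries are never rewritten

def get_log(limit=None) -> list:
    # Return the last `limit` entries, oldest first — or everything
    return log[-limit:] if limit else list(log)

for i in range(5):
    log_entry({"topic": f"topic-{i}"})

print([e["topic"] for e in get_log(limit=2)])  # → ['topic-3', 'topic-4']
```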

Generic Key-Value Memory (framework-agnostic)

import os
from cachly_agents.memory import CachlyMemory

with CachlyMemory(redis_url=os.environ["CACHLY_URL"], namespace="agent:planner") as mem:
    mem.remember("user_name", "Alice")
    mem.remember("budget", 50_000, ttl=3600)

    name   = mem.recall("user_name")      # → "Alice"
    budget = mem.recall("budget", 0)      # → 50000

    snapshot = mem.snapshot()             # full dict for checkpointing
    mem.clear()
    mem.restore(snapshot)                 # restore from checkpoint
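The checkpoint cycle (`snapshot()` → `clear()` → `restore()`) can be illustrated with an in-memory stand-in that mirrors the same method surface (TTLs are ignored in this sketch):

```python
class DictMemory:
    """Minimal in-memory stand-in mirroring the CachlyMemory surface."""

    def __init__(self):
        self._data = {}

    def remember(self, name, value, ttl=None):
        self._data[name] = value          # ttl ignored in this sketch

    def recall(self, name, default=None):
        return self._data.get(name, default)

    def snapshot(self):
        return dict(self._data)           # full copy for checkpointing

    def clear(self):
        self._data.clear()

    def restore(self, snapshot, ttl=None):
        self._data.update(snapshot)

mem = DictMemory()
mem.remember("user_name", "Alice")
snap = mem.snapshot()
mem.clear()
assert mem.recall("user_name") is None    # gone after clear()
mem.restore(snap)
print(mem.recall("user_name"))            # → Alice
```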

Semantic Long-Term Memory (NEW ✨)

Store and recall knowledge by meaning — not by exact key. Uses cachly's pgvector-backed semantic cache for similarity search.

import os

import openai  # reads OPENAI_API_KEY from the environment
from cachly_agents.semantic import CachlySemanticMemory

with CachlySemanticMemory(
    vector_url=os.environ["CACHLY_VECTOR_URL"],   # https://api.cachly.dev/v1/sem/{token}
    embed_fn=lambda text: openai.embeddings.create(
        input=text, model="text-embedding-3-small"
    ).data[0].embedding,
) as mem:
    # Store facts — they become vector embeddings
    mem.remember("cachly is a managed Valkey cache for AI apps")
    mem.remember("GDPR compliance is critical for EU companies")
    mem.remember("pgvector enables sub-millisecond ANN search", metadata={"topic": "tech"})

    # Recall by meaning — fuzzy, not exact
    results = mem.recall("What cache service works for AI?", top_k=3)
    for r in results:
        print(f"  [{r.similarity:.2f}] {r.text}")
        # → [0.94] cachly is a managed Valkey cache for AI apps

    # Streaming recall (SSE) — replay cached answer at LLM-like speed
    for chunk in mem.stream_recall("How does cachly handle GDPR?"):
        print(chunk, end="", flush=True)

    # Forget specific entries or flush all
    mem.forget(results[0].entry_id)
    mem.forget_all()
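The `similarity` scores returned by `recall()` are cosine similarities between the query embedding and each stored embedding. A stdlib-only sketch of that metric (toy 3-dimensional vectors; real embeddings from `text-embedding-3-small` are 1536-dimensional):

```python
import math

def cosine_similarity(a, b):
    # cos(θ) = (a · b) / (|a| · |b|) — 1.0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.0]   # toy "embedding" of the question
entry = [1.0, 0.0, 0.0]   # toy "embedding" of a stored fact
print(round(cosine_similarity(query, entry), 2))  # → 0.99
```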

Why Semantic Memory?

| Feature | Key-Value (`CachlyMemory`) | Semantic (`CachlySemanticMemory`) |
|---|---|---|
| Lookup method | Exact key match | Meaning-based similarity (cosine) |
| Data model | name → value dict | Free-text knowledge base |
| Best for | Structured config, user prefs | Unstructured knowledge, RAG, Q&A |
| Threshold | N/A | Adaptive, F1-calibrated (auto-tuned) |
| Streaming | No | Yes (SSE word-level chunks) |

Why cachly for AI agents?

| Problem | Without cachly | With cachly-agents |
|---|---|---|
| Agent memory | Ephemeral – lost on restart | Persistent across restarts, replicas, and agent runs |
| Multi-agent sessions | Each agent has isolated memory | Shared namespace, one source of truth |
| LLM costs | Every user message is re-generated | Semantic cache cuts up to 60% of LLM calls |
| Agent knowledge | Brittle keyword search | Semantic recall by meaning (pgvector HNSW) |
| GDPR / data residency | Cloud provider may store data in the US | German servers, EU data only |
| Scale-out | In-process dict breaks with replicas | Redis scales horizontally out of the box |

API Reference

CachlyMemory

| Method | Description |
|---|---|
| `remember(name, value, ttl?)` | Store any JSON-serialisable value |
| `recall(name, default?)` | Retrieve a value (returns default on miss) |
| `forget(name)` | Delete a single memory |
| `recall_all()` | Return all memories in this namespace |
| `snapshot()` | Alias for `recall_all()` – for checkpointing |
| `restore(snapshot, ttl?)` | Bulk-restore from a snapshot |
| `clear()` | Delete all memories in this namespace |

CachlyChatMessageHistory

| Member | Description |
|---|---|
| `messages` | List of `BaseMessage` objects (property) |
| `add_messages(messages)` | Append messages to history |
| `clear()` | Delete all messages for this session |

CachlyMessageStore

| Method | Description |
|---|---|
| `load()` | Load all messages as a list of dicts |
| `append(message)` | Append a single message |
| `append_many(messages)` | Bulk-append via pipeline |
| `clear()` | Delete all messages |

CachlyCrewMemory

| Method | Description |
|---|---|
| `save(name, value, ttl?)` | Store a named fact |
| `load(name, default?)` | Retrieve a named fact |
| `delete(name)` | Remove a fact |
| `log(log_name, entry)` | Append to an append-only log |
| `get_log(log_name, limit?)` | Retrieve the last N log entries |
| `clear_all()` | Delete everything for this crew |

CachlySemanticMemory

| Method | Description |
|---|---|
| `remember(text, metadata?, ttl?)` | Store text as a vector embedding |
| `recall(query, top_k?, threshold?)` | Retrieve similar entries by meaning |
| `stream_recall(query, top_k?, threshold?)` | SSE streaming recall (word-level chunks) |
| `forget(entry_id)` | Delete a specific entry by ID |
| `forget_all()` | Flush the entire namespace |
| `count()` | Number of live entries |

SemanticRecallResult

| Property | Description |
|---|---|
| `text` | The recalled text content |
| `entry_id` | UUID of the entry (use with `forget()`) |
| `similarity` | Cosine similarity score (0.0–1.0) |
| `threshold_used` | The threshold that was applied |
| `metadata` | Optional dict attached at storage time |

Environment Variables

CACHLY_URL=redis://:your-password@my-app.cachly.dev:30101
CACHLY_VECTOR_URL=https://api.cachly.dev/v1/sem/{your-vector-token}   # for CachlySemanticMemory

Get your connection string and vector token at cachly.dev/instances.
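`CACHLY_URL` is a standard `redis://` URL, so its parts parse with the stdlib (placeholder credentials from the example above):

```python
from urllib.parse import urlparse

# Same shape as the documented example — password is a placeholder
url = "redis://:your-password@my-app.cachly.dev:30101"
parts = urlparse(url)

print(parts.hostname)  # → my-app.cachly.dev
print(parts.port)      # → 30101
print(parts.password)  # → your-password
```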


Real-World Use Cases

1. Customer Support Bot (LangChain + Semantic Cache)

Problem: Your support bot calls GPT-4o for every question — even when users ask the same thing 50 times a day ("How do I reset my password?").

Solution: Cache LLM responses by meaning. The 51st user asking "password reset help" gets an instant cached answer.

from cachly_agents.semantic import CachlySemanticMemory

memory = CachlySemanticMemory(
    vector_url=os.environ["CACHLY_VECTOR_URL"],
    embed_fn=openai_embed,
    threshold=0.85,
)

# First user asks "How do I reset my password?" → LLM call → cached
# Second user asks "I forgot my password, help!" → cache HIT (0.92 similarity)
results = memory.recall("I forgot my password, help!")
if results:
    return results[0].text  # instant, $0.00

Impact: 60-80% fewer LLM calls → $500+/month saved on a typical support bot.


2. RAG Pipeline with Persistent Context (CrewAI)

Problem: Your research agents lose context between runs. Every restart means re-fetching and re-embedding documents.

Solution: Use cachly as the shared memory layer. Agents remember research across sessions.

from cachly_agents.crewai import CachlyCrewMemory
from cachly_agents.semantic import CachlySemanticMemory

memory = CachlyCrewMemory(
    redis_url=os.environ["CACHLY_URL"],
    crew_id="research-team",
)
knowledge = CachlySemanticMemory(
    vector_url=os.environ["CACHLY_VECTOR_URL"],
    embed_fn=openai_embed,
)

# Agent stores research findings
memory.save("research:market-analysis", {
    "finding": "EU AI Act requires audit trails for all AI decisions",
    "source": "Official Journal L 2024/1689",
})
knowledge.remember(
    "EU AI Act requires audit trails for all AI decisions",
    metadata={"source": "Official Journal L 2024/1689"},
)

# Next run — the agent recalls relevant findings by meaning
findings = knowledge.recall("What are the compliance requirements?")
# Returns the market-analysis finding — no re-research needed

3. Multi-Agent Research Team (AutoGen)

Problem: Multiple AutoGen agents working on the same topic duplicate LLM calls. The "Researcher" and "Writer" agents both ask similar questions to GPT-4o.

Solution: Shared semantic cache — when one agent gets an answer, all agents benefit.

from cachly_agents.autogen import CachlyAutoGenCache

cache = CachlyAutoGenCache(
    redis_url=os.environ["CACHLY_URL"],
    vector_url=os.environ["CACHLY_VECTOR_URL"],
    embed_fn=openai_embed,
)

# Researcher agent asks: "What is the market size for AI caching?"
# → LLM call, result cached semantically

# Writer agent later asks: "How big is the AI cache market?"
# → Cache HIT (similarity 0.93) → instant response, no LLM call
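The hit/miss decision in this flow reduces to a threshold check on the similarity score. A sketch (0.85 is an assumed default here, not necessarily the library's):

```python
def is_cache_hit(similarity: float, threshold: float = 0.85) -> bool:
    # A recall counts as a hit only when similarity clears the threshold
    return similarity >= threshold

print(is_cache_hit(0.93))  # → True  (Writer's rephrased question)
print(is_cache_hit(0.60))  # → False (unrelated question → fresh LLM call)
```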

4. E-Commerce Recommendation Engine

Problem: Your product recommendation agent makes expensive LLM calls to generate personalized suggestions. Similar customers get similar recommendations, but each call costs $0.03.

Solution: Cache recommendations by user-profile similarity.

# User profile embedding captures preferences
profile_embedding = embed(f"likes:{categories} budget:{range} style:{preferences}")

# Check if a similar user already got recommendations
cached = await memory.recall_by_embedding(profile_embedding, threshold=0.88)
if cached:
    return cached.recommendations  # instant, free

# No cache hit — generate fresh recommendations
recs = await llm.generate_recommendations(user_profile)
await memory.index(profile_embedding, recs, namespace="recommendations", ttl=3600)
return recs

Impact: At 10,000 users/day with 70% cache hit rate → 7,000 free responses daily.
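The arithmetic behind that impact estimate, assuming the $0.03-per-call figure above:

```python
users_per_day = 10_000
hit_rate = 0.70
cost_per_llm_call = 0.03  # dollars, per the example above

cache_hits = int(users_per_day * hit_rate)   # served from cache, free
llm_calls = users_per_day - cache_hits       # still hit the LLM
daily_savings = cache_hits * cost_per_llm_call

print(cache_hits)               # → 7000
print(llm_calls)                # → 3000
print(round(daily_savings, 2))  # → 210.0 (dollars/day)
```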


5. Code Review Assistant with Memory

Problem: Your AI code reviewer forgets patterns it already flagged. The same anti-pattern across 20 PRs triggers 20 identical LLM analyses.

Solution: Semantic cache for code patterns + persistent memory for project conventions.

from cachly_agents.semantic import CachlySemanticMemory

review_memory = CachlySemanticMemory(
    vector_url=os.environ["CACHLY_VECTOR_URL"],
    embed_fn=code_embed,  # code-specific embedding model
    threshold=0.90,
    namespace="code-review",
)

# First PR with "SELECT * FROM users WHERE id = " + user_input
# → LLM: "SQL injection vulnerability, use parameterized queries"
# → Cached

# 15th PR with similar pattern
# → Cache HIT → instant review comment, consistent feedback

License

MIT © cachly.dev
