Redis integrations for OpenAI Agents SDK - session management, vector search, caching, and observability
Project description
Redis OpenAI Agents
Production-ready Redis integrations for the OpenAI Agents SDK
Introduction
Redis OpenAI Agents is a production-ready Python library that provides Redis-powered infrastructure for the OpenAI Agents SDK. Replace 5+ separate systems with a single Redis deployment.
| Sessions & Memory | Caching & Search | Streaming & Coordination |
|---|---|---|
| AgentSession: persistent conversation storage | SemanticCache: reduce LLM costs by 25%+ | RedisStreamTransport: reliable, replayable streaming |
| JSONSession: complex nested data storage | RedisVectorStore: fast similarity search | AgentCoordinator: multi-agent orchestration |
| SemanticRouter: intent-based agent routing | HybridSearchService: BM25 + vector combined | RobustStreamProcessor: consumer groups & replay |
Built for OpenAI Agents SDK
- Drop-in Session Storage → Replace SQLite with distributed Redis sessions
- Cost Reduction → Semantic caching reduces LLM API calls by 25%+
- Production Streaming → Redis Streams for reliable token delivery
- Multi-Agent Systems → Coordinate agents with atomic operations
Getting Started
Installation
Install redis-openai-agents into your Python (>=3.10) environment:
pip install redis-openai-agents
Redis
Choose from multiple Redis deployment options:
- Redis Cloud: Managed cloud database (free tier available)
- Redis (Docker): The official redis:8 image ships with Search, JSON, Time Series, and Bloom filters built in - no separate stack image required.

  docker run -d --name redis -p 6379:6379 redis:8

- Redis Enterprise: Commercial, self-hosted database

Want a GUI? Run Redis Insight separately:

docker run -d --name redisinsight -p 5540:5540 redis/redisinsight:latest
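To sanity-check the connection before trying the examples below, a plain redis-py ping is enough; this sketch assumes the redis package is installed in the same environment and Redis is listening on localhost:6379:

```python
import redis

# Minimal connectivity check against the container started above.
client = redis.Redis.from_url("redis://localhost:6379")
print(client.ping())  # True when the server is reachable
```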
Overview
Agent Sessions
Replace SQLite sessions with Redis for persistent, distributed conversation storage:
from agents import Agent, Runner
from redis_openai_agents import AgentSession
# Create a session
session = AgentSession.create(
    user_id="user_123",
    redis_url="redis://localhost:6379"
)
# Define your agent
agent = Agent(name="assistant", instructions="You are a helpful assistant.")
# Run the agent
result = await Runner.run(agent, input="Hello!")
# Store the conversation
session.store_agent_result(result)
# Later: Load and continue the conversation
session = AgentSession.load(
    conversation_id=session.conversation_id,
    user_id="user_123",
    redis_url="redis://localhost:6379"
)
# Get conversation history in SDK format
history = session.to_agent_inputs()
result = await Runner.run(agent, input=history + [{"role": "user", "content": "Follow up"}])
An async-compatible JSON session is also available: JSONSession, for complex nested data.
Semantic Caching
Reduce LLM costs by caching responses for similar queries:
from redis_openai_agents import SemanticCache
cache = SemanticCache(
    redis_url="redis://localhost:6379",
    distance_threshold=0.1,  # Similarity threshold (lower = stricter)
    ttl=3600  # 1 hour TTL
)
# Check cache before calling LLM
result = cache.check(query="What is the capital of France?")
if result:
    print(f"Cache hit: {result.response}")
else:
    # Call LLM and store result
    response = "Paris is the capital of France."
    cache.store(query="What is the capital of France?", response=response)
Learn more about semantic caching.
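Putting the two paths together around a real agent call might look like the sketch below; the cache calls are the ones shown above, and result.final_output is the OpenAI Agents SDK accessor for a run's output:

```python
from agents import Agent, Runner

agent = Agent(name="assistant", instructions="Answer concisely.")
query = "What is the capital of France?"

cached = cache.check(query=query)
if cached:
    answer = cached.response  # served from Redis, no LLM call
else:
    result = await Runner.run(agent, input=query)
    answer = result.final_output
    cache.store(query=query, response=answer)

print(answer)
```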
Semantic Routing
Route queries to the appropriate agent using vector similarity - no LLM calls required:
from redis_openai_agents import SemanticRouter, Route
router = SemanticRouter(
    name="support-router",
    redis_url="redis://localhost:6379",
    routes=[
        Route(
            name="billing",
            references=["payment issue", "invoice", "refund request"],
            metadata={"agent": "billing_agent"},
            distance_threshold=0.3
        ),
        Route(
            name="technical",
            references=["bug report", "error message", "not working"],
            metadata={"agent": "tech_agent"},
            distance_threshold=0.3
        ),
    ]
)
# Route a query (vector lookup, not LLM call)
match = router.route("I need help with my payment")
print(f"Route to: {match.name}") # "billing"
Learn more about semantic routing.
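A typical next step is dispatching on the matched route name. The mapping from route names to agents is plain application code, and this sketch assumes route() returns no match when nothing clears its distance threshold:

```python
from agents import Agent, Runner

# Map each route name defined above to the agent that should handle it.
handlers = {
    "billing": Agent(name="billing_agent", instructions="You resolve billing issues."),
    "technical": Agent(name="tech_agent", instructions="You debug technical problems."),
}
fallback = Agent(name="general_agent", instructions="You handle everything else.")

query = "I need help with my payment"
match = router.route(query)
target = handlers.get(match.name, fallback) if match else fallback
result = await Runner.run(target, input=query)
```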
Vector Search (RAG)
Build RAG applications with Redis vector search:
from redis_openai_agents import RedisVectorStore
store = RedisVectorStore(
    name="knowledge-base",
    redis_url="redis://localhost:6379"
)
# Add documents
store.add_documents([
    {"content": "Redis is an in-memory data store.", "source": "docs"},
    {"content": "Python is a programming language.", "source": "wiki"},
])
# Search with metadata filtering
results = store.search(
    query="What is Redis?",
    k=5,
    filter={"source": "docs"}
)
for result in results:
    print(f"{result.content} (score: {result.score})")
Hybrid Search
Combine vector similarity with BM25 full-text search for better retrieval:
from redis_openai_agents import HybridSearchService
search = HybridSearchService(
    name="hybrid-search",
    redis_url="redis://localhost:6379"
)
# Search with both vector and text matching
results = search.search(
    query="Redis performance optimization",
    k=10,
    vector_weight=0.7,  # 70% vector similarity
    text_weight=0.3  # 30% BM25 text match
)
Token Streaming
Reliable, replayable token streaming via Redis Streams:
from redis_openai_agents import RedisStreamTransport, RobustStreamProcessor
# Publisher side
transport = RedisStreamTransport(
    stream_name="agent-output",
    redis_url="redis://localhost:6379"
)
await transport.publish({"type": "token", "data": {"text": "Hello"}})
await transport.publish({"type": "token", "data": {"text": " world!"}})
await transport.publish({"type": "complete", "data": {}})
# Consumer side with automatic recovery
processor = RobustStreamProcessor(
    stream_name="agent-output",
    consumer_group="clients",
    redis_url="redis://localhost:6379"
)
async for event in processor.process():
    if event["type"] == "token":
        print(event["data"]["text"], end="")
Supports consumer groups, automatic acknowledgment, and replay from any position.
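On the producer side, a natural token source is the SDK's own streaming runner. The sketch below forwards its text deltas through the transport created above; Runner.run_streamed and stream_events() are OpenAI Agents SDK APIs, and the event filtering follows the SDK's usual streaming pattern:

```python
from agents import Agent, Runner
from openai.types.responses import ResponseTextDeltaEvent

agent = Agent(name="assistant", instructions="You are a helpful assistant.")
run = Runner.run_streamed(agent, input="Explain Redis Streams in one paragraph.")

# Forward each text delta to the Redis stream, then mark the run complete.
async for event in run.stream_events():
    if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
        await transport.publish({"type": "token", "data": {"text": event.data.delta}})
await transport.publish({"type": "complete", "data": {}})
```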
Agent Coordination
Coordinate multiple agents with Redis pub/sub and atomic operations:
from redis_openai_agents import AgentCoordinator, EventType
coordinator = AgentCoordinator(
    session_id="multi-agent-session",
    redis_url="redis://localhost:6379"
)
# Agent 1: Signal handoff ready
await coordinator.publish(EventType.HANDOFF_READY, {
    "from_agent": "triage",
    "to_agent": "specialist",
    "context": {"topic": "billing"}
})
# Agent 2: Listen for handoffs
async for event in coordinator.subscribe():
    if event.type == EventType.HANDOFF_READY:
        print(f"Handoff from {event.data['from_agent']}")
Middleware for the Model Call
Compose cross-cutting concerns around the agent's LLM call with an
around-style middleware protocol modelled on LangChain's AgentMiddleware:
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel
from openai import AsyncOpenAI
from redis_openai_agents import (
    MiddlewareStack, Route, SemanticCache, SemanticRouter,
)
from redis_openai_agents.middleware import (
    SemanticCacheMiddleware, SemanticRouterMiddleware,
)
router = SemanticRouter(
    name="support-router", redis_url="redis://localhost:6379",
    routes=[Route(name="greeting", references=["hello", "hi"])],
)
router_mw = SemanticRouterMiddleware(router=router, responses={"greeting": "Hi!"})
cache = SemanticCache(redis_url="redis://localhost:6379", similarity_threshold=0.92)
cache_mw = SemanticCacheMiddleware(cache=cache)
stack = MiddlewareStack(
    model=OpenAIResponsesModel(model="gpt-4o-mini", openai_client=AsyncOpenAI()),
    middlewares=[router_mw, cache_mw],  # outer-to-inner
)
agent = Agent(name="assistant", instructions="Be concise.", model=stack)
result = await Runner.run(agent, "hello") # short-circuited by router
Ships with SemanticCacheMiddleware, SemanticRouterMiddleware, and
ConversationMemoryMiddleware. Write your own: any object with an async
awrap_model_call(request, handler) coroutine is a middleware.
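As a sketch of that protocol, here is a dependency-free timing middleware reusing the stack from the example above. The only assumption beyond the stated contract is that handler takes the (possibly modified) request and returns the model response:

```python
import time

class TimingMiddleware:
    """Logs how long the wrapped model call takes."""

    async def awrap_model_call(self, request, handler):
        start = time.perf_counter()
        try:
            return await handler(request)  # delegate to the next middleware / the model
        finally:
            print(f"model call took {time.perf_counter() - start:.3f}s")

# Outermost position means it also measures time spent in the inner middlewares.
stack = MiddlewareStack(
    model=OpenAIResponsesModel(model="gpt-4o-mini", openai_client=AsyncOpenAI()),
    middlewares=[TimingMiddleware(), router_mw, cache_mw],
)
```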
Tool Result Caching
Memoize a tool's Python callable in Redis, keyed by a hash of its arguments. Tools whose names carry side-effect prefixes (send_, delete_, ...) and calls that include volatile args (timestamp, now, ...) bypass the cache automatically.
from agents import function_tool
from redis_openai_agents import cached_tool
@function_tool
@cached_tool(name="lookup_company", redis_url="redis://localhost:6379", ttl=3600)
async def lookup_company(ticker: str) -> str:
    return await _hit_paid_api(ticker)
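For contrast, a tool whose name carries one of those side-effect prefixes runs on every call even when decorated; the helper below is hypothetical:

```python
@function_tool
@cached_tool(name="send_welcome_email", redis_url="redis://localhost:6379")
async def send_welcome_email(address: str) -> str:
    # The send_ prefix marks this as side-effecting, so the result is never
    # served from the cache. _deliver_email is a hypothetical helper.
    return await _deliver_email(address)
```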
Metrics & Observability
Built-in observability with RedisTimeSeries and Prometheus:
from redis_openai_agents import AgentMetrics, PrometheusExporter
metrics = AgentMetrics(redis_url="redis://localhost:6379")
# Record metrics
await metrics.record_latency("agent_run", 150.5)
await metrics.record_tokens("gpt-4", input_tokens=100, output_tokens=50)
await metrics.record_cache_hit("semantic_cache")
# Get statistics
stats = await metrics.get_stats("latency", aggregation="avg", time_range="1h")
# Prometheus export (http://localhost:9090/metrics)
exporter = PrometheusExporter(metrics)
await exporter.start_server(port=9090)
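Wrapping a real run with those calls is straightforward. The sketch below times a Runner.run call with the standard library and records it through the AgentMetrics instance created above; how you count tokens is up to your own accounting:

```python
import time
from agents import Agent, Runner

agent = Agent(name="assistant", instructions="You are a helpful assistant.")

start = time.perf_counter()
result = await Runner.run(agent, input="Hello!")
elapsed_ms = (time.perf_counter() - start) * 1000

await metrics.record_latency("agent_run", elapsed_ms)
```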
Components
Sessions & Memory
| Component | Description |
|---|---|
| AgentSession | Hash-based session storage built on RedisVL MessageHistory |
| JSONSession | JSON document storage for complex nested session data |
| SemanticRouter | Vector-based intent routing without LLM calls |
Caching & Search
| Component | Description |
|---|---|
| SemanticCache | Two-level cache (exact match + semantic similarity) |
| RedisCachingModel | Model wrapper with automatic response caching |
| RedisVectorStore | HNSW vector search for RAG applications |
| RedisFullTextSearch | BM25 full-text search with filters |
| HybridSearchService | Combined vector + text search with configurable weights |
Streaming & Coordination
| Component | Description |
|---|---|
| RedisStreamTransport | Redis Streams-based event transport |
| RobustStreamProcessor | Consumer groups with automatic recovery |
| ResumableStreamRunner | Checkpoint-based stream resumption |
| AgentCoordinator | Multi-agent coordination via pub/sub |
| AtomicOperations | Lua script-based atomic Redis operations |
Observability
| Component | Description |
|---|---|
| AgentMetrics | RedisTimeSeries metrics collection |
| PrometheusExporter | Prometheus metrics endpoint |
| RedisTracingProcessor | SDK-compatible trace storage in Redis Streams |
SDK Integration
| Component | Description |
|---|---|
| RedisAgentRunner | Enhanced runner with caching and metrics |
| RedisFileSearchTool | Drop-in replacement for OpenAI file search |
| RedisRateLimitGuardrail | SDK guardrail with Redis-backed rate limiting |
| MiddlewareStack | Around-style middleware wrapping the SDK Model interface |
| SemanticCacheMiddleware | Cache LLM responses by input similarity |
| SemanticRouterMiddleware | Short-circuit matched intents with canned responses |
| ConversationMemoryMiddleware | Inject semantically relevant past messages |
| cached_tool | Decorator that memoizes a tool callable's result in Redis |
Advanced Features
| Component | Description |
|---|---|
| RankedOperations | Sorted set rankings for agents and tools |
| DeduplicationService | Bloom filter request deduplication |
| RedisConnectionPool | Connection pooling with retry logic |
Examples
| Example | Description |
|---|---|
| 01-routing-agents | Multi-agent routing with handoffs |
| 02-semantic-cache | Reduce LLM costs with caching |
| 03-vector-search | Build RAG applications |
| 04-full-text-search | BM25 full-text search |
| 05-token-streaming | Real-time streaming with Redis Streams |
| 06-time-series-metrics | Observability with TimeSeries |
| 07-full-stack-integration | Complete integration example |
| 08-runner-integration | RedisAgentRunner usage |
| 09-hybrid-search | Combined vector + full-text search |
| 10-agent-ranking | Sorted set rankings |
| 11-deduplication | Bloom filter deduplication |
| 12-agent-coordinator | Multi-agent orchestration |
| 13-robust-streaming | Consumer groups & recovery |
| 14-atomic-operations | Lua script atomicity |
| 15-semantic-router | Intent-based routing |
| 16-middleware | Cache + router + composition around the Model |
| 17-tool-caching | @cached_tool for idempotent tools |
Why Redis OpenAI Agents?
| Challenge | Without Redis | With Redis OpenAI Agents |
|---|---|---|
| Session Storage | SQLite (single-node) | Distributed Redis sessions |
| Caching | None or external service | Built-in semantic cache |
| Vector Search | Pinecone, Qdrant ($70+/mo) | Redis Vector Search (free) |
| Streaming | Custom WebSocket code | Redis Streams (reliable) |
| Metrics | Prometheus + Grafana setup | Built-in TimeSeries |
| Total Services | 5+ separate systems | 1 Redis deployment |
Development
This project uses uv for dependency management.
# Install dependencies
uv sync --all-extras --group dev
# Run tests
uv run pytest --run-api-tests
# Format and lint
make format
make lint
# Type check
make mypy
# Build documentation
make docs
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ by Redis for the OpenAI Agents SDK community
File details
Details for the file redis_openai_agents-0.1.0.tar.gz.
File metadata
- Download URL: redis_openai_agents-0.1.0.tar.gz
- Upload date:
- Size: 448.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d665b0426b43368f9582462a624e1dea50d9a541ee0a48bc39729f8392abd6c |
| MD5 | fc0298d64e9406b7bcadea090052844f |
| BLAKE2b-256 | 11edac3af3cb505e8b5323b337b07e83f1b766a342a3df71ecf7a90665701440 |
File details
Details for the file redis_openai_agents-0.1.0-py3-none-any.whl.
File metadata
- Download URL: redis_openai_agents-0.1.0-py3-none-any.whl
- Upload date:
- Size: 97.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8d3441a8393a3a1a4c8122028a846c996e66133b387f8e105dccdf7bd62adfd |
| MD5 | 5c67261b5f3434578f5a019b1ce87c46 |
| BLAKE2b-256 | 56b203c2bf222e9b01ee7a3b51d9460eccd9d3a363e9ee35e8da114202343bd4 |