Redis integrations for OpenAI Agents SDK - session management, vector search, caching, and observability


Redis OpenAI Agents

Production-ready Redis integrations for the OpenAI Agents SDK


Introduction

Redis OpenAI Agents is a production-ready Python library that provides Redis-powered infrastructure for the OpenAI Agents SDK. Replace 5+ separate systems with a single Redis deployment.

Sessions & Memory
  • AgentSession: Persistent conversation storage
  • JSONSession: Complex nested data storage
  • SemanticRouter: Intent-based agent routing

Caching & Search
  • SemanticCache: Reduce LLM costs by 25%+
  • RedisVectorStore: Fast similarity search
  • HybridSearchService: BM25 + vector combined

Streaming & Coordination
  • RedisStreamTransport: Reliable, replayable streaming
  • AgentCoordinator: Multi-agent orchestration
  • RobustStreamProcessor: Consumer groups & replay

Built for OpenAI Agents SDK

  • Drop-in Session Storage → Replace SQLite with distributed Redis sessions
  • Cost Reduction → Semantic caching reduces LLM API calls by 25%+
  • Production Streaming → Redis Streams for reliable token delivery
  • Multi-Agent Systems → Coordinate agents with atomic operations

Getting Started

Installation

Install redis-openai-agents into your Python (>=3.10) environment:

pip install redis-openai-agents

Redis

Choose from multiple Redis deployment options:

  1. Redis Cloud: Managed cloud database (free tier available)

  2. Redis (Docker): The official redis:8 image ships with Search, JSON, Time Series, and Bloom filters built in - no separate stack image required.

    docker run -d --name redis -p 6379:6379 redis:8
    
  3. Redis Enterprise: Commercial, self-hosted database

Want a GUI? Run Redis Insight separately:

    docker run -d --name redisinsight -p 5540:5540 redis/redisinsight:latest


Overview

Agent Sessions

Replace SQLite sessions with Redis for persistent, distributed conversation storage:

from agents import Agent, Runner
from redis_openai_agents import AgentSession

# Create a session
session = AgentSession.create(
    user_id="user_123",
    redis_url="redis://localhost:6379"
)

# Define your agent
agent = Agent(name="assistant", instructions="You are a helpful assistant.")

# Run the agent (the bare await assumes an async context, e.g. inside async def main())
result = await Runner.run(agent, input="Hello!")

# Store the conversation
session.store_agent_result(result)

# Later: Load and continue the conversation
session = AgentSession.load(
    conversation_id=session.conversation_id,
    user_id="user_123",
    redis_url="redis://localhost:6379"
)

# Get conversation history in SDK format
history = session.to_agent_inputs()
result = await Runner.run(agent, input=history + [{"role": "user", "content": "Follow up"}])

For complex nested data, JSONSession provides an async-compatible session backed by JSON documents.

Semantic Caching

Reduce LLM costs by caching responses for similar queries:

from redis_openai_agents import SemanticCache

cache = SemanticCache(
    redis_url="redis://localhost:6379",
    distance_threshold=0.1,  # Similarity threshold (lower = stricter)
    ttl=3600                  # 1 hour TTL
)

# Check cache before calling LLM
result = cache.check(query="What is the capital of France?")
if result:
    print(f"Cache hit: {result.response}")
else:
    # Call LLM and store result
    response = "Paris is the capital of France."
    cache.store(query="What is the capital of France?", response=response)

Learn more about semantic caching.

Semantic Routing

Route queries to the appropriate agent using vector similarity - no LLM calls required:

from redis_openai_agents import SemanticRouter, Route

router = SemanticRouter(
    name="support-router",
    redis_url="redis://localhost:6379",
    routes=[
        Route(
            name="billing",
            references=["payment issue", "invoice", "refund request"],
            metadata={"agent": "billing_agent"},
            distance_threshold=0.3
        ),
        Route(
            name="technical",
            references=["bug report", "error message", "not working"],
            metadata={"agent": "tech_agent"},
            distance_threshold=0.3
        ),
    ]
)

# Route a query (vector lookup, not LLM call)
match = router.route("I need help with my payment")
print(f"Route to: {match.name}")  # "billing"

Learn more about semantic routing.

Vector Search (RAG)

Build RAG applications with Redis vector search:

from redis_openai_agents import RedisVectorStore

store = RedisVectorStore(
    name="knowledge-base",
    redis_url="redis://localhost:6379"
)

# Add documents
store.add_documents([
    {"content": "Redis is an in-memory data store.", "source": "docs"},
    {"content": "Python is a programming language.", "source": "wiki"},
])

# Search with metadata filtering
results = store.search(
    query="What is Redis?",
    k=5,
    filter={"source": "docs"}
)

for result in results:
    print(f"{result.content} (score: {result.score})")

Hybrid Search

Combine vector similarity with BM25 full-text search for better retrieval:

from redis_openai_agents import HybridSearchService

search = HybridSearchService(
    name="hybrid-search",
    redis_url="redis://localhost:6379"
)

# Search with both vector and text matching
results = search.search(
    query="Redis performance optimization",
    k=10,
    vector_weight=0.7,  # 70% vector similarity
    text_weight=0.3     # 30% BM25 text match
)
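
One plausible reading of vector_weight and text_weight is a weighted sum of the two scores after each is rescaled to a common range. The sketch below illustrates that reading only; the README does not state how HybridSearchService fuses scores, and min_max_normalize and fuse are hypothetical helper names.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so vector and BM25 scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    vector_scores: dict[str, float],
    text_scores: dict[str, float],
    vector_weight: float = 0.7,
    text_weight: float = 0.3,
) -> list[tuple[str, float]]:
    """Rank documents by a weighted sum of normalized vector and text scores."""
    v = min_max_normalize(vector_scores)
    t = min_max_normalize(text_scores)
    fused = {
        doc: vector_weight * v.get(doc, 0.0) + text_weight * t.get(doc, 0.0)
        for doc in set(v) | set(t)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Under this reading, raising text_weight favours documents that share exact terms with the query, while raising vector_weight favours documents that are close in meaning.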

Token Streaming

Reliable, replayable token streaming via Redis Streams:

from redis_openai_agents import RedisStreamTransport, RobustStreamProcessor

# Publisher side
transport = RedisStreamTransport(
    stream_name="agent-output",
    redis_url="redis://localhost:6379"
)

await transport.publish({"type": "token", "data": {"text": "Hello"}})
await transport.publish({"type": "token", "data": {"text": " world!"}})
await transport.publish({"type": "complete", "data": {}})

# Consumer side with automatic recovery
processor = RobustStreamProcessor(
    stream_name="agent-output",
    consumer_group="clients",
    redis_url="redis://localhost:6379"
)

async for event in processor.process():
    if event["type"] == "token":
        print(event["data"]["text"], end="")

RobustStreamProcessor supports consumer groups, automatic acknowledgment, and replay from any position.
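
To make those three ideas concrete, here is a small in-memory model of the semantics that Redis Streams provides through commands such as XREADGROUP and XACK. It is a teaching sketch under simplifying assumptions (one consumer, integer IDs, no persistence) and does not reflect how RobustStreamProcessor is implemented; ToyStream and ToyConsumerGroup are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Append-only log; an entry's ID is its position in the list."""

    entries: list[dict] = field(default_factory=list)

    def publish(self, event: dict) -> int:
        self.entries.append(event)
        return len(self.entries) - 1

    def replay(self, start: int = 0) -> list[tuple[int, dict]]:
        """Re-read entries from any position; reading never removes them."""
        return list(enumerate(self.entries))[start:]


@dataclass
class ToyConsumerGroup:
    """Tracks a delivery cursor plus the entries delivered but not yet acknowledged."""

    stream: ToyStream
    next_id: int = 0
    pending: set[int] = field(default_factory=set)

    def read(self, count: int = 10) -> list[tuple[int, dict]]:
        """Deliver up to `count` new entries and mark them as pending."""
        batch = self.stream.replay(self.next_id)[:count]
        self.pending.update(entry_id for entry_id, _ in batch)
        self.next_id += len(batch)
        return batch

    def ack(self, entry_id: int) -> None:
        """Acknowledge an entry so it is not redelivered."""
        self.pending.discard(entry_id)

    def recover(self) -> list[tuple[int, dict]]:
        """Redeliver entries that were read but never acknowledged."""
        return [(i, self.stream.entries[i]) for i in sorted(self.pending)]
```

If a consumer crashes between read and ack, the unacknowledged entries remain pending and can be recovered, which is what gives at-least-once delivery.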

Agent Coordination

Coordinate multiple agents with Redis pub/sub and atomic operations:

from redis_openai_agents import AgentCoordinator, EventType

coordinator = AgentCoordinator(
    session_id="multi-agent-session",
    redis_url="redis://localhost:6379"
)

# Agent 1: Signal handoff ready
await coordinator.publish(EventType.HANDOFF_READY, {
    "from_agent": "triage",
    "to_agent": "specialist",
    "context": {"topic": "billing"}
})

# Agent 2: Listen for handoffs
async for event in coordinator.subscribe():
    if event.type == EventType.HANDOFF_READY:
        print(f"Handoff from {event.data['from_agent']}")

Middleware for the Model Call

Compose cross-cutting concerns around the agent's LLM call with an around-style middleware protocol modelled on LangChain's AgentMiddleware:

from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel
from openai import AsyncOpenAI

from redis_openai_agents import (
    MiddlewareStack, Route, SemanticCache, SemanticRouter,
)
from redis_openai_agents.middleware import (
    SemanticCacheMiddleware, SemanticRouterMiddleware,
)

router = SemanticRouter(
    name="support-router", redis_url="redis://localhost:6379",
    routes=[Route(name="greeting", references=["hello", "hi"])],
)
router_mw = SemanticRouterMiddleware(router=router, responses={"greeting": "Hi!"})

cache = SemanticCache(redis_url="redis://localhost:6379", similarity_threshold=0.92)
cache_mw = SemanticCacheMiddleware(cache=cache)

stack = MiddlewareStack(
    model=OpenAIResponsesModel(model="gpt-4o-mini", openai_client=AsyncOpenAI()),
    middlewares=[router_mw, cache_mw],  # outer-to-inner
)

agent = Agent(name="assistant", instructions="Be concise.", model=stack)
result = await Runner.run(agent, "hello")  # short-circuited by router

Ships with SemanticCacheMiddleware, SemanticRouterMiddleware, and ConversationMemoryMiddleware. Write your own: any object with an async awrap_model_call(request, handler) coroutine is a middleware.
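
As a sketch of what a custom middleware could look like, the class below times each model call. Only the awrap_model_call(request, handler) method name comes from this README; the shapes of request and handler are not documented here, so the example assumes handler is an awaitable callable that takes the request, and fake_model is a hypothetical stand-in for the wrapped model.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

# Assumed handler shape: an async callable that takes the request.
Handler = Callable[[Any], Awaitable[Any]]


class TimingMiddleware:
    """Records how long each wrapped model call takes."""

    def __init__(self) -> None:
        self.durations: list[float] = []

    async def awrap_model_call(self, request: Any, handler: Handler) -> Any:
        start = time.perf_counter()
        try:
            # Delegate to the next middleware, or to the model itself.
            return await handler(request)
        finally:
            # Runs whether the call succeeded or raised.
            self.durations.append(time.perf_counter() - start)


async def fake_model(request: Any) -> str:
    """Hypothetical stand-in for the innermost model call."""
    return f"echo: {request}"


async def main() -> str:
    mw = TimingMiddleware()
    return await mw.awrap_model_call("hello", fake_model)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the middleware decides whether and when to call handler, the same around-style shape covers short-circuiting (returning without calling it), retries, and post-processing.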

Tool Result Caching

Memoize a tool's Python callable in Redis, keyed by argument hash. Side-effect prefixes (send_, delete_, ...) and volatile args (timestamp, now, ...) bypass the cache automatically.

from agents import function_tool
from redis_openai_agents import cached_tool


@function_tool
@cached_tool(name="lookup_company", redis_url="redis://localhost:6379", ttl=3600)
async def lookup_company(ticker: str) -> str:
    return await _hit_paid_api(ticker)

Metrics & Observability

Built-in observability with RedisTimeSeries and Prometheus:

from redis_openai_agents import AgentMetrics, PrometheusExporter

metrics = AgentMetrics(redis_url="redis://localhost:6379")

# Record metrics
await metrics.record_latency("agent_run", 150.5)
await metrics.record_tokens("gpt-4", input_tokens=100, output_tokens=50)
await metrics.record_cache_hit("semantic_cache")

# Get statistics
stats = await metrics.get_stats("latency", aggregation="avg", time_range="1h")

# Prometheus export (http://localhost:9090/metrics)
exporter = PrometheusExporter(metrics)
await exporter.start_server(port=9090)

Components

Sessions & Memory

Component       Description
AgentSession    Hash-based session storage built on RedisVL MessageHistory
JSONSession     JSON document storage for complex nested session data
SemanticRouter  Vector-based intent routing without LLM calls

Caching & Search

Component            Description
SemanticCache        Two-level cache (exact match + semantic similarity)
RedisCachingModel    Model wrapper with automatic response caching
RedisVectorStore     HNSW vector search for RAG applications
RedisFullTextSearch  BM25 full-text search with filters
HybridSearchService  Combined vector + text search with configurable weights

Streaming & Coordination

Component              Description
RedisStreamTransport   Redis Streams-based event transport
RobustStreamProcessor  Consumer groups with automatic recovery
ResumableStreamRunner  Checkpoint-based stream resumption
AgentCoordinator       Multi-agent coordination via pub/sub
AtomicOperations       Lua script-based atomic Redis operations

Observability

Component              Description
AgentMetrics           RedisTimeSeries metrics collection
PrometheusExporter     Prometheus metrics endpoint
RedisTracingProcessor  SDK-compatible trace storage in Redis Streams

SDK Integration

Component                     Description
RedisAgentRunner              Enhanced runner with caching and metrics
RedisFileSearchTool           Drop-in replacement for OpenAI file search
RedisRateLimitGuardrail       SDK guardrail with Redis-backed rate limiting
MiddlewareStack               Around-style middleware wrapping the SDK Model interface
SemanticCacheMiddleware       Cache LLM responses by input similarity
SemanticRouterMiddleware      Short-circuit matched intents with canned responses
ConversationMemoryMiddleware  Inject semantically relevant past messages
cached_tool                   Decorator that memoizes a tool callable's result in Redis

Advanced Features

Component             Description
RankedOperations      Sorted set rankings for agents and tools
DeduplicationService  Bloom filter request deduplication
RedisConnectionPool   Connection pooling with retry logic

Examples

Example                    Description
01-routing-agents          Multi-agent routing with handoffs
02-semantic-cache          Reduce LLM costs with caching
03-vector-search           Build RAG applications
04-full-text-search        BM25 full-text search
05-token-streaming         Real-time streaming with Redis Streams
06-time-series-metrics     Observability with TimeSeries
07-full-stack-integration  Complete integration example
08-runner-integration      RedisAgentRunner usage
09-hybrid-search           Combined vector + full-text search
10-agent-ranking           Sorted set rankings
11-deduplication           Bloom filter deduplication
12-agent-coordinator       Multi-agent orchestration
13-robust-streaming        Consumer groups & recovery
14-atomic-operations       Lua script atomicity
15-semantic-router         Intent-based routing
16-middleware              Cache + router + composition around the Model
17-tool-caching            @cached_tool for idempotent tools

Why Redis OpenAI Agents?

Challenge        Without Redis               With Redis OpenAI Agents
Session Storage  SQLite (single-node)        Distributed Redis sessions
Caching          None or external service    Built-in semantic cache
Vector Search    Pinecone, Qdrant ($70+/mo)  Redis Vector Search (free)
Streaming        Custom WebSocket code       Redis Streams (reliable)
Metrics          Prometheus + Grafana setup  Built-in TimeSeries
Total Services   5+ separate systems         1 Redis deployment

Development

This project uses uv for dependency management.

# Install dependencies
uv sync --all-extras --group dev

# Run tests
uv run pytest --run-api-tests

# Format and lint
make format
make lint

# Type check
make mypy

# Build documentation
make docs

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with ❤️ by Redis for the OpenAI Agents SDK community


