Reusable session management and multi-layer Redis caching for conversational AI
lix_open_cache
Standalone multi-layer caching and session management for conversational AI. Extracted from lixSearch into a reusable, pip-installable package.
Drop it into any chatbot, search assistant, or RAG pipeline to get production-grade session memory, semantic caching, and compressed disk archival out of the box.
pip install lix-open-cache
Research Paper
This library is described in detail in our research paper:
A Three-Layer Caching Architecture for Low-Latency LLM Web Search on Commodity CPU Hardware. Ayushman Bhattacharya (Pollinations.ai), 2026. Read the paper (PDF): https://github.com/pollinations/lixsearch/blob/main/docs/paper/lix_cache_paper.pdf
The paper covers the origin story (building a cost-effective alternative to SearchGPT), the architecture and design decisions behind each caching layer, production evaluation on an 8-vCPU server (89.3% hit rate, 0.1ms latency, 1,000x cost reduction), and the Huffman compression scheme for conversation archival.
If you use this library in your research, please cite:
@article{bhattacharya2026lixcache,
title={A Three-Layer Caching Architecture for Low-Latency LLM Web Search on Commodity CPU Hardware},
author={Bhattacharya, Ayushman},
year={2026},
url={https://github.com/pollinations/lixsearch/blob/main/docs/paper/lix_cache_paper.pdf},
note={Licensed under CC BY-NC-ND 4.0}
}
What it solves
| Problem | Layer | Solution |
|---|---|---|
| "What did we just talk about?" | Session Context Window (Redis DB 2) | Rolling window of 20 messages in Redis, overflow to Huffman-compressed disk |
| "Didn't we already answer this?" | Semantic Query Cache (Redis DB 0) | Cache LLM responses keyed by embedding similarity (cosine ≥ 0.90) |
| "We already embedded this URL" | URL Embedding Cache (Redis DB 1) | Global cache of URL → embedding vector, shared across sessions |
Architecture
User message arrives
│
├─ ① SessionContextWindow (Redis DB 2)
│   ├─ get_context() → last 20 messages from Redis
│   ├─ If Redis empty → load from .huff archive → re-hydrate
│   └─ Inject into LLM prompt as conversation history
│
├─ ② SemanticCacheRedis (Redis DB 0)
│   ├─ Compute query embedding vector
│   ├─ cosine_similarity(cached, new) ≥ 0.90?
│   │   ├─ HIT → return cached response (skip LLM)
│   │   └─ MISS → continue pipeline
│   └─ After LLM: cache (embedding, response) for 5 min
│
├─ ③ URLEmbeddingCache (Redis DB 1)
│   ├─ Before embedding a URL: check Redis
│   │   ├─ HIT → use cached vector (~0ms vs ~200ms)
│   │   └─ MISS → compute, cache for 24h
│   └─ Global (shared across all sessions)
│
└─ HybridConversationCache (backing store)
    ├─ Hot: Redis ordered list (LPUSH/RPOP, 20-msg window)
    ├─ Cold: Huffman-compressed .huff files on disk
    ├─ Overflow: oldest messages spill hot → cold
    └─ LRU daemon: idle 2h → migrate all to disk, free Redis
Package structure
lix_open_cache/
├── pyproject.toml
├── README.md
└── lix_open_cache/
    ├── __init__.py              # public API
    ├── config.py                # CacheConfig dataclass
    ├── redis_pool.py            # Connection-pooled Redis factory
    ├── huffman_codec.py         # Canonical Huffman encoder/decoder
    ├── conversation_archive.py  # .huff disk persistence
    ├── hybrid_cache.py          # Redis hot + disk cold + LRU eviction
    ├── semantic_cache.py        # SemanticCacheRedis + URLEmbeddingCache
    ├── context_window.py        # SessionContextWindow (wraps hybrid_cache)
    └── coordinator.py           # CacheCoordinator (orchestrates all 3)
Installation
From PyPI:
pip install lix-open-cache
From source:
git clone https://github.com/pollinations/lixsearch.git
cd lixsearch/lix_open_cache
pip install -e .
Dependencies:
| Package | Version | Why |
|---|---|---|
| redis | ≥ 5.0 | All three cache layers |
| numpy | ≥ 1.24 | Embedding vectors, cosine similarity |
| loguru | ≥ 0.7 | Structured logging |
| lz4 (optional) | ≥ 4.0 | Alternative compression method |
Quick start
Full 3-layer setup
import numpy as np
from lix_open_cache import CacheConfig, CacheCoordinator
config = CacheConfig(
    redis_host="localhost",
    redis_port=6379,
    redis_key_prefix="mychat",
    archive_dir="./data/conversations",
)
cache = CacheCoordinator(session_id="user-abc", config=config)
# Store messages
cache.add_message_to_context("user", "What's the weather in Tokyo?")
cache.add_message_to_context("assistant", "It's 22°C and sunny.")
# Retrieve context for next LLM call
history = cache.get_context_messages()
# Check semantic cache before calling LLM.
# The random vector is only a placeholder: use your embedding model's output in practice.
query_embedding = np.random.rand(384).astype(np.float32)
cached = cache.get_semantic_response("https://weather.com", query_embedding)
if cached:
    print("Cache hit: skip LLM")
else:
    response = {"answer": "22°C and sunny", "sources": ["..."]}
    cache.cache_semantic_response("https://weather.com", query_embedding, response)
Session memory only (no semantic cache)
from lix_open_cache import HybridConversationCache, CacheConfig
config = CacheConfig(redis_host="localhost", redis_port=6379)
cache = HybridConversationCache("session-123", config=config)
cache.add_message("user", "hello")
cache.add_message("assistant", "hey there!")
messages = cache.get_context() # last 20 from Redis
# Smart retrieval: recent + semantically relevant from disk
# your_embedding: the query's embedding vector, produced by your own embedding model
context = cache.smart_context(
    query="what did we talk about yesterday?",
    query_embedding=your_embedding,
    recent_k=10,
    disk_k=5,
)
# → {"recent": [...last 10...], "relevant": [...5 from disk archive...]}
Disk-only (no Redis)
from lix_open_cache import ConversationArchive
archive = ConversationArchive("./data/chats", session_ttl_days=30)
archive.append_turn("sess-1", {"role": "user", "content": "hello"})
archive.append_turn("sess-1", {"role": "assistant", "content": "hi!"})
turns = archive.load_all("sess-1")
recent = archive.load_recent("sess-1", 5)
results = archive.search_by_text("sess-1", "hello", top_k=3)
archive.cleanup_expired()
Just the Huffman codec
from lix_open_cache import HuffmanCodec
from lix_open_cache.huffman_codec import encode_str, decode_bytes
text = "The quick brown fox jumps over the lazy dog" * 100
compressed = encode_str(text)
restored = decode_bytes(compressed)
assert restored == text
print(f"{len(text)}B → {len(compressed)}B ({len(compressed)/len(text)*100:.0f}%)")
Configuration
All tunables live in a single CacheConfig dataclass. No global state, no scattered constants.
from lix_open_cache import CacheConfig
config = CacheConfig(
# Redis connection
redis_host="redis.internal",
redis_port=6379,
redis_password="secret",
redis_key_prefix="mychat",
redis_pool_size=50,
# Session context window (Redis DB 2)
session_redis_db=2,
session_ttl_seconds=86400, # 24h
hot_window_size=20, # messages kept in Redis
session_max_tokens=None, # no token limit
# Semantic query cache (Redis DB 0)
semantic_redis_db=0,
semantic_ttl_seconds=300, # 5 min
semantic_similarity_threshold=0.90, # cosine similarity threshold
semantic_max_items_per_url=50,
# URL embedding cache (Redis DB 1)
url_cache_redis_db=1,
url_cache_ttl_seconds=86400, # 24h
# Disk archive
archive_dir="./data/conversations",
disk_ttl_days=14, # purge after 14 days
# LRU eviction
evict_after_minutes=120, # 2h idle → migrate to disk
)
# Or from environment variables (12-factor apps):
# Reads MYAPP_REDIS_HOST, MYAPP_REDIS_PORT, MYAPP_SEMANTIC_TTL_SECONDS, etc.
config = CacheConfig.from_env("MYAPP")
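The exact parsing rules of CacheConfig.from_env are not spelled out here. The sketch below shows one plausible version of the pattern, assuming each field maps to {PREFIX}_{FIELD_NAME_UPPERCASED} and that each string value is coerced using the type of the field's default. MiniConfig is a hypothetical stand-in with a handful of fields, not the library's class.

```python
import os
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MiniConfig:
    """Hypothetical stand-in for a few CacheConfig fields (illustration only)."""

    redis_host: str = "localhost"
    redis_port: int = 6379
    redis_password: Optional[str] = None
    semantic_ttl_seconds: int = 300
    semantic_similarity_threshold: float = 0.90

    @classmethod
    def from_env(cls, prefix: str) -> "MiniConfig":
        """Build a config from {PREFIX}_{FIELD_NAME} variables, keeping defaults for unset ones."""
        overrides = {}
        for f in fields(cls):
            raw = os.environ.get(f"{prefix}_{f.name.upper()}")
            if raw is None:
                continue  # variable not set: keep the dataclass default
            # Coerce the string using the type of the field's default value
            if isinstance(f.default, int):
                overrides[f.name] = int(raw)
            elif isinstance(f.default, float):
                overrides[f.name] = float(raw)
            else:
                overrides[f.name] = raw  # str and Optional[str] fields stay as text
        return cls(**overrides)
```

Keeping every tunable in one dataclass means the environment mapping can be derived from the field list instead of being maintained by hand.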
Redis DB layout
Three logical databases on a single Redis server:
| DB | Layer | TTL | Scope | What it stores |
|---|---|---|---|---|
| 0 | Semantic query cache | 5 min | Per-session | (query_embedding, LLM response) pairs per URL |
| 1 | URL embedding cache | 24h | Global | URL → float32 embedding vector |
| 2 | Session context window | 24h | Per-session | Last 20 conversation messages |
The layers use separate DBs rather than key prefixes, so you can FLUSHDB one layer without touching the others and monitor each one independently with DBSIZE.
How each layer works
Session Context Window
add_message("user", "hello")
│
├─ LPUSH message_id to Redis ordered list
├─ SETEX message JSON with TTL
│
└─ Window > 20?
    ├─ Yes → RPOP oldest
    │   ├─ Append to .huff disk archive
    │   └─ DELETE from Redis
    └─ No → done

get_context()
│
├─ Redis has messages?
│   ├─ Yes → return them, refresh all TTLs
│   └─ No → session was evicted
│       ├─ Load from .huff archive
│       ├─ Re-hydrate Redis with last 20
│       └─ Return full history
│
└─ Redis down?
    └─ Read everything from disk (graceful fallback)
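The overflow and re-hydration flow above can be modeled in memory to see the ordering. In this minimal sketch a deque stands in for the Redis list (new items pushed at the head like LPUSH, oldest popped from the tail like RPOP) and a plain list stands in for the .huff archive. MiniHybridWindow and evict_all are hypothetical names. Moving turns back out of the cold list on re-hydration is a simplification to avoid duplicates; the library may keep the disk copy.

```python
from collections import deque
from typing import Deque, Dict, List


class MiniHybridWindow:
    """Hypothetical in-memory model: `hot` ~ Redis list, `cold` ~ .huff archive."""

    def __init__(self, hot_window_size: int = 20) -> None:
        self.hot_window_size = hot_window_size
        self.hot: Deque[Dict[str, str]] = deque()  # index 0 = newest (LPUSH side)
        self.cold: List[Dict[str, str]] = []       # oldest first, append-only

    def add_message(self, role: str, content: str) -> None:
        self.hot.appendleft({"role": role, "content": content})  # like LPUSH
        while len(self.hot) > self.hot_window_size:
            self.cold.append(self.hot.pop())  # like RPOP: spill the oldest to cold

    def evict_all(self) -> None:
        """Move every hot message to cold, as an idle-eviction pass would."""
        while self.hot:
            self.cold.append(self.hot.pop())

    def get_context(self) -> List[Dict[str, str]]:
        if not self.hot and self.cold:
            # Re-hydrate: bring the newest hot_window_size turns back into the hot tier
            n = min(self.hot_window_size, len(self.cold))
            restored = self.cold[-n:]
            del self.cold[-n:]
            for turn in restored:
                self.hot.appendleft(turn)
        return list(reversed(self.hot))  # oldest -> newest, ready for the prompt
```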
LRU eviction daemon: Background thread, checks every 60s. Session idle > evict_after_minutes → migrate all Redis messages to disk, free memory. When user returns, get_context() re-hydrates transparently.
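An idle sweeper of this kind can be sketched as a last-seen timestamp map plus a periodic sweep. The 60 s interval and the evict_after_minutes threshold come from the description above; IdleEvictor, touch, sweep, start and the migrate_to_disk callback are hypothetical names, not the library's API. sweep accepts an explicit `now` so the logic can be exercised without waiting.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class IdleEvictor:
    """Hypothetical sketch: evict sessions idle longer than evict_after_minutes."""

    def __init__(self, evict_after_minutes: float, migrate_to_disk: Callable[[str], None]) -> None:
        self.idle_limit = evict_after_minutes * 60.0  # seconds
        self.migrate_to_disk = migrate_to_disk
        self.last_seen: Dict[str, float] = {}
        self._lock = threading.Lock()

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        """Record activity for a session (call on every read or write)."""
        with self._lock:
            self.last_seen[session_id] = time.time() if now is None else now

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Evict every session idle past the limit; return the evicted IDs."""
        now = time.time() if now is None else now
        with self._lock:
            idle = [s for s, t in self.last_seen.items() if now - t > self.idle_limit]
            for s in idle:
                del self.last_seen[s]
        for s in idle:
            self.migrate_to_disk(s)  # outside the lock: may do slow disk I/O
        return idle

    def start(self, stop: threading.Event, interval_seconds: float = 60.0) -> threading.Thread:
        """Run sweep() every interval_seconds on a daemon thread until stop is set."""
        def loop() -> None:
            while not stop.wait(interval_seconds):  # wait() returns True once stop is set
                self.sweep()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread
```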
smart_context(): Returns {"recent": [...], "relevant": [...]} — recent messages from Redis plus semantically relevant messages from the disk archive (matched by embedding cosine similarity).
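The merge behind a result shaped like that can be sketched in a few lines: take the newest recent_k hot messages as-is, then rank archived turns by cosine similarity to the query embedding and keep the top disk_k. This sketch assumes hot messages are ordered oldest to newest and that archived turns carry an "embedding" key; smart_context_sketch is a hypothetical function, not the library's implementation.

```python
from typing import Dict, List

import numpy as np


def smart_context_sketch(
    hot: List[Dict],
    archived: List[Dict],
    query_embedding: np.ndarray,
    recent_k: int = 10,
    disk_k: int = 5,
) -> Dict[str, List[Dict]]:
    """Hypothetical: newest recent_k hot messages + top disk_k archived turns by cosine."""
    recent = hot[-recent_k:] if recent_k > 0 else []
    q = np.asarray(query_embedding, dtype=np.float32)
    q = q / (np.linalg.norm(q) + 1e-12)  # unit vector, so a dot product is the cosine
    scored = []
    for turn in archived:
        emb = turn.get("embedding")
        if emb is None:
            continue  # turns without an embedding cannot be ranked
        e = np.asarray(emb, dtype=np.float32)
        scored.append((float(q @ (e / (np.linalg.norm(e) + 1e-12))), turn))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return {"recent": recent, "relevant": [turn for _, turn in scored[:disk_k]]}
```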
Semantic Query Cache
Keyed by (session_id, URL, query_embedding). Each URL stores up to 50 (embedding, response) pairs.
On lookup: compute cosine similarity between the new query embedding and all cached embeddings for that URL. If any exceed 0.90 → cache hit, return the cached response, skip the LLM.
- Per-session isolation (privacy)
- 5-minute TTL (freshness)
- Catches rephrasings: "weather Tokyo" vs "Tokyo weather forecast" → cosine ~0.94 → HIT
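The lookup described above can be sketched as an in-memory cache keyed by (session_id, URL), holding up to 50 normalized (embedding, response) pairs per key. MiniSemanticCache is hypothetical. It returns the best match if that match clears the threshold (equivalent to "any exceeds 0.90" for hit/miss purposes), drops the oldest pair when the cap is exceeded (an assumed policy), and omits the 5-minute TTL that Redis would enforce.

```python
from typing import Any, Dict, List, Optional, Tuple

import numpy as np


class MiniSemanticCache:
    """Hypothetical in-memory sketch of a per-(session, URL) semantic lookup."""

    def __init__(self, threshold: float = 0.90, max_items_per_url: int = 50) -> None:
        self.threshold = threshold
        self.max_items_per_url = max_items_per_url
        self.entries: Dict[Tuple[str, str], List[Tuple[np.ndarray, Any]]] = {}

    @staticmethod
    def _unit(v) -> np.ndarray:
        v = np.asarray(v, dtype=np.float32)
        return v / (np.linalg.norm(v) + 1e-12)  # normalize once so dot = cosine

    def get(self, session_id: str, url: str, query_embedding) -> Optional[Any]:
        items = self.entries.get((session_id, url), [])
        if not items:
            return None
        q = self._unit(query_embedding)
        sims = np.array([float(e @ q) for e, _ in items])
        best = int(np.argmax(sims))
        return items[best][1] if sims[best] >= self.threshold else None

    def set(self, session_id: str, url: str, query_embedding, response: Any) -> None:
        items = self.entries.setdefault((session_id, url), [])
        items.append((self._unit(query_embedding), response))
        if len(items) > self.max_items_per_url:
            del items[0]  # drop the oldest pair to respect the per-URL cap
```

Keying by session as well as URL is what keeps one user's cached answers from leaking to another.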
URL Embedding Cache
Global (shared across all sessions), 24h TTL. Maps URL → raw float32 bytes in Redis.
Computing embeddings costs ~200ms per URL. This cache means the embedding model only runs once per URL per day, regardless of how many sessions fetch it.
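Storing "raw float32 bytes" amounts to a `tobytes()` / `np.frombuffer()` round trip. The sketch below models the global cache in memory with an explicit expiry time; MiniURLEmbeddingCache and get_or_compute are hypothetical names, and embed_fn stands in for whatever embedding model you use.

```python
import time
from typing import Callable, Dict, Optional, Tuple

import numpy as np


class MiniURLEmbeddingCache:
    """Hypothetical in-memory stand-in for a global URL -> float32 embedding cache."""

    def __init__(self, ttl_seconds: float = 86400.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (raw bytes, expires_at)

    def set(self, url: str, embedding: np.ndarray, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        raw = np.asarray(embedding, dtype=np.float32).tobytes()  # raw float32 bytes
        self._store[url] = (raw, now + self.ttl)

    def get(self, url: str, now: Optional[float] = None) -> Optional[np.ndarray]:
        now = time.time() if now is None else now
        hit = self._store.get(url)
        if hit is None or now >= hit[1]:
            self._store.pop(url, None)  # expired or missing
            return None
        return np.frombuffer(hit[0], dtype=np.float32)  # read-only view of the bytes


def get_or_compute(cache: MiniURLEmbeddingCache, url: str,
                   embed_fn: Callable[[str], np.ndarray]) -> np.ndarray:
    """Run the (expensive) embedding model only on a cache miss."""
    vec = cache.get(url)
    if vec is None:
        vec = embed_fn(url)
        cache.set(url, vec)
    return vec
```

Against a real Redis, the same bytes could be written with redis-py's `set(key, raw, ex=86400)` so that the server handles expiry. The key naming is up to you.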
Hybrid storage: hot + cold
The two-tier architecture:
Hot (Redis): Ordered list of message IDs. Each message stored as a separate key with TTL. Fast reads (~1ms). Limited to hot_window_size messages per session.
Cold (Disk): Huffman-compressed .huff files. One file per session at {archive_dir}/{session_id}.huff. Self-contained binary format with a 24-byte header you can read without decompressing.
.huff file format
| Offset | Size | Field |
|---|---|---|
| 0 | 4 B | Magic: "CAv1" |
| 4 | 8 B | created_at (float64 LE, Unix timestamp) |
| 12 | 8 B | updated_at (float64 LE, Unix timestamp) |
| 20 | 4 B | num_turns (uint32 LE) |
| 24 | variable | Huffman-compressed JSON array of turn objects |
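With a fixed little-endian layout like this, the header can be read with Python's struct module without touching the compressed payload, which is what makes a metadata read cheap. The sketch below follows the offsets in the table; the function and type names are hypothetical, and the library's own reader may differ.

```python
import struct
from typing import NamedTuple

HEADER_FORMAT = "<4sddI"  # little-endian, no padding: magic, created_at, updated_at, num_turns
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 8 + 8 + 4 = 24 bytes
MAGIC = b"CAv1"


class HuffHeader(NamedTuple):
    created_at: float
    updated_at: float
    num_turns: int


def pack_header(created_at: float, updated_at: float, num_turns: int) -> bytes:
    return struct.pack(HEADER_FORMAT, MAGIC, created_at, updated_at, num_turns)


def read_header(blob: bytes) -> HuffHeader:
    """Parse the first 24 bytes; the compressed body is never decoded."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("truncated .huff header")
    magic, created, updated, turns = struct.unpack_from(HEADER_FORMAT, blob, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    return HuffHeader(created, updated, turns)


def read_header_from_file(path: str) -> HuffHeader:
    with open(path, "rb") as f:
        return read_header(f.read(HEADER_SIZE))  # read only the header bytes
```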
Why Huffman over gzip?
Conversation text has very skewed byte frequencies (~18% spaces, ~13% 'e', ~0.07% 'z'). Huffman assigns shorter bit codes to frequent bytes. For small payloads (<100KB), this beats gzip because there's no dictionary overhead. ~54% compression ratio on typical conversation text. Pure Python, zero native dependencies.
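The codec's actual on-disk encoding is not documented here, but the canonical Huffman technique itself is compact. In the sketch below, code lengths come from a heap-based merge of the least frequent symbols, and codes are then assigned canonically by sorting on (length, byte value), so the decoder can rebuild every code from the lengths alone. The container (a 256-byte length table, a 4-byte bit count, then the packed bits) and the function names are hypothetical; a real implementation would likely store the table more compactly.

```python
import heapq
from collections import Counter
from typing import Dict, Tuple


def code_lengths(data: bytes) -> Dict[int, int]:
    """Huffman code length (in bits) for every byte value present in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}  # a lone symbol still needs a 1-bit code
    # Heap items: (weight, unique tiebreak, {byte: depth so far})
    heap = [(w, sym, {sym: 0}) for sym, w in freq.items()]
    heapq.heapify(heap)
    tiebreak = 256
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def canonical_codes(lengths: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Sort by (length, byte), count upward, shift left whenever the length grows."""
    codes: Dict[int, Tuple[int, int]] = {}
    code, prev_len = 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = (code, length)
        code += 1
        prev_len = length
    return codes


def huff_encode_sketch(data: bytes) -> bytes:
    """Hypothetical container: 256-byte length table + 4-byte bit count + packed bits."""
    lengths = code_lengths(data)
    codes = canonical_codes(lengths)
    table = bytes(lengths.get(s, 0) for s in range(256))
    bits = "".join(format(codes[b][0], f"0{codes[b][1]}b") for b in data)
    padded = bits + "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    payload = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return table + len(bits).to_bytes(4, "big") + payload


def huff_decode_sketch(blob: bytes) -> bytes:
    lengths = {sym: n for sym, n in enumerate(blob[:256]) if n}
    nbits = int.from_bytes(blob[256:260], "big")
    lookup = {(n, code): sym for sym, (code, n) in canonical_codes(lengths).items()}
    bits = "".join(format(byte, "08b") for byte in blob[260:])[:nbits]
    out = bytearray()
    code, n = 0, 0
    for bit in bits:
        code = (code << 1) | (bit == "1")
        n += 1
        sym = lookup.get((n, code))
        if sym is not None:  # prefix-free codes: the first match is the symbol
            out.append(sym)
            code, n = 0, 0
    return bytes(out)
```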
Connection pooling
create_redis_client() maintains a global pool keyed by (host, port, db):
from lix_open_cache import create_redis_client, CacheConfig
config = CacheConfig(redis_host="localhost", redis_port=6379)
# First call: creates ConnectionPool, pings, returns client
client = create_redis_client(host="localhost", port=6379, db=2, config=config)
# Same (host, port, db): reuses existing pool
client = create_redis_client(host="localhost", port=6379, db=2, config=config)
It handles authentication gracefully: it tries the configured password first and falls back to an unauthenticated connection on AuthenticationError.
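The "one pool per (host, port, db)" idea is a keyed registry guarded by a lock. The sketch below is hypothetical (PoolRegistry is not the library's API) and takes the pool factory as a callable so it runs without a Redis server.

```python
import threading
from typing import Callable, Dict, Generic, Tuple, TypeVar

P = TypeVar("P")


class PoolRegistry(Generic[P]):
    """Hypothetical sketch: create at most one shared pool per (host, port, db) key."""

    def __init__(self, make_pool: Callable[[str, int, int], P]) -> None:
        self._make_pool = make_pool
        self._pools: Dict[Tuple[str, int, int], P] = {}
        self._lock = threading.Lock()

    def get(self, host: str, port: int, db: int) -> P:
        key = (host, port, db)
        with self._lock:  # stops two threads from creating the same pool concurrently
            pool = self._pools.get(key)
            if pool is None:
                pool = self._make_pool(host, port, db)
                self._pools[key] = pool
            return pool
```

With redis-py, the factory could be `lambda h, p, d: redis.ConnectionPool(host=h, port=p, db=d)`, and each client would then be built as `redis.Redis(connection_pool=registry.get(host, port, db))`.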
API reference
CacheConfig
| Method | Description |
|---|---|
| `CacheConfig(**kwargs)` | Create config with explicit values |
| `CacheConfig.from_env(prefix)` | Load from env vars: `{PREFIX}_REDIS_HOST`, etc. |
CacheCoordinator
| Method | Description |
|---|---|
| `__init__(session_id, config?)` | Initialize all 3 layers |
| `add_message_to_context(role, content, metadata?)` | Add to session window |
| `get_context_messages()` | Get rolling window |
| `get_formatted_context(max_lines?)` | Get as formatted string |
| `get_semantic_response(url, query_embedding)` | Check semantic cache |
| `cache_semantic_response(url, query_embedding, response)` | Store in semantic cache |
| `get_url_embedding(url)` | Get cached URL embedding |
| `cache_url_embedding(url, embedding)` | Cache URL embedding |
| `batch_cache_url_embeddings(dict)` | Batch cache |
| `clear_session_cache()` | Clear semantic + context |
| `clear_context()` | Clear context only |
| `get_stats()` | Stats from all 3 layers |
SessionContextWindow
| Method | Description |
|---|---|
| `__init__(session_id, config?, **kwargs)` | Create context window |
| `add_message(role, content, metadata?)` | Add a message |
| `get_context()` | Get hot window messages |
| `get_full_history()` | All messages (Redis + disk) |
| `smart_context(query, embedding?, recent_k?, disk_k?)` | Recent + relevant from disk |
| `get_formatted_context(max_lines?)` | As formatted string |
| `flush_to_disk()` | Force migrate Redis → disk |
| `clear()` | Wipe Redis hot window |
| `get_stats()` | Session statistics |
HybridConversationCache
| Method | Description |
|---|---|
| `__init__(session_id, config?, **kwargs)` | Create hybrid cache |
| `add_message(role, content, metadata?, embedding?)` | Add message (auto-evicts overflow) |
| `get_context()` | Hot window (auto re-hydrates from disk) |
| `get_full()` | Merge hot + cold |
| `smart_context(query, embedding?, recent_k?, disk_k?)` | Recent + relevant |
| `flush_to_disk()` | Migrate Redis → disk |
| `clear()` | Clear Redis keys |
| `delete_session()` | Delete from Redis + disk |
| `get_stats()` | Hot count, disk turns, sizes |
ConversationArchive
| Method | Description |
|---|---|
| `__init__(archive_dir, session_ttl_days?)` | Create archive |
| `append_turn(session_id, turn)` | Append single turn |
| `append_turns(session_id, turns)` | Batch append |
| `load_all(session_id)` | Load all turns |
| `load_recent(session_id, n)` | Load last N turns |
| `search_by_text(session_id, query, top_k?)` | Text overlap search |
| `search_by_embedding(session_id, embedding, top_k?)` | Cosine similarity search |
| `delete_session(session_id)` | Delete archive file |
| `session_exists(session_id)` | Check if .huff exists |
| `get_metadata(session_id)` | Read header without decompressing |
| `cleanup_expired()` | Purge sessions older than TTL |
| `list_sessions()` | List all archived sessions |
SemanticCacheRedis
| Method | Description |
|---|---|
| `__init__(session_id, config?, **kwargs)` | Create semantic cache |
| `get(url, query_embedding)` | Check for cached response |
| `set(url, query_embedding, response)` | Cache a response |
| `clear_session()` | Delete all entries for this session |
| `get_stats()` | Cache statistics |
URLEmbeddingCache
| Method | Description |
|---|---|
| `__init__(session_id, config?, **kwargs)` | Create embedding cache |
| `get(url)` | Get cached embedding (np.ndarray or None) |
| `set(url, embedding)` | Cache an embedding |
| `batch_set(url_embeddings)` | Batch cache |
| `get_stats()` | Cache statistics |
HuffmanCodec
| Method | Description |
|---|---|
| `HuffmanCodec.encode(data: bytes)` | Compress bytes → bytes |
| `HuffmanCodec.decode(data: bytes)` | Decompress bytes → bytes |
| `encode_str(text: str)` | Compress string → bytes |
| `decode_bytes(data: bytes)` | Decompress bytes → string |
Publishing to PyPI
cd lix_open_cache
pip install build twine
# Build
python -m build
# Test on TestPyPI first
twine upload --repository testpypi dist/*
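# --extra-index-url lets dependencies such as numpy resolve from the real PyPI
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ lix-open-cache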
# Publish to production PyPI
twine upload dist/*
For CI/CD, add a GitHub Actions workflow triggered on release:
name: Publish to PyPI
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: cd lix_open_cache && python -m build
      - run: cd lix_open_cache && twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
License
MIT — same as lixSearch.
File details
Details for the file lix_open_cache-2.1.4.tar.gz.
- Download URL: lix_open_cache-2.1.4.tar.gz
- Size: 23.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

| Algorithm | Hash digest |
|---|---|
| SHA256 | f999ff74fa8b7a40b0deb974d8a0935876d963bdf82ac5a613304a7854de3d09 |
| MD5 | 7394aeca7f189f310827d8eb3678d587 |
| BLAKE2b-256 | b486c74547d818c0cefba3175a467f2e9eb88229e116dbe380316c954c120b7d |
File details
Details for the file lix_open_cache-2.1.4-py3-none-any.whl.
- Download URL: lix_open_cache-2.1.4-py3-none-any.whl
- Size: 20.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a14050287210cad64fe1b643073c43a41aaeb6cf8c8d4eb4410fc29012ae5c9 |
| MD5 | 658099674a6a9d4b0aa84e34947a5646 |
| BLAKE2b-256 | 1cd1f905c9bc7823b204fdffc00ca7548b82e29a325dd96406ce9b6677111b28 |