Skip to main content

Reusable session management and multi-layer Redis caching for conversational AI

Project description

lix_open_cache

Standalone multi-layer caching and session management for conversational AI. Extracted from lixSearch into a reusable, pip-installable package.

Drop it into any chatbot, search assistant, or RAG pipeline to get production-grade session memory, semantic caching, and compressed disk archival out of the box.

pip install lix-open-cache

Research Paper

This library is described in detail in our research paper:

A Three-Layer Caching Architecture for Low-Latency LLM Web Search on Commodity CPU Hardware Ayushman Bhattacharya (Pollinations.ai), 2026 Read the paper (PDF)

The paper covers the origin story (building a cost-effective alternative to SearchGPT), the architecture and design decisions behind each caching layer, production evaluation on an 8-vCPU server (89.3% hit rate, 0.1ms latency, 1,000x cost reduction), and the Huffman compression scheme for conversation archival.

If you use this library in your research, please cite:

@article{bhattacharya2026lixcache,
  title={A Three-Layer Caching Architecture for Low-Latency LLM Web Search on Commodity CPU Hardware},
  author={Bhattacharya, Ayushman},
  year={2026},
  url={https://github.com/pollinations/lixsearch/blob/main/docs/paper/lix_cache_paper.pdf},
  note={Licensed under CC BY-NC-ND 4.0}
}

What it solves

Problem Layer Solution
"What did we just talk about?" Session Context Window (Redis DB 2) Rolling window of 20 messages in Redis, overflow to Huffman-compressed disk
"Didn't we already answer this?" Semantic Query Cache (Redis DB 0) Cache LLM responses keyed by embedding similarity (cosine ≥ 0.90)
"We already embedded this URL" URL Embedding Cache (Redis DB 1) Global cache of URL → embedding vector, shared across sessions

Architecture

User message arrives
│
├─ ① SessionContextWindow (Redis DB 2)
│   ├─ get_context() → last 20 messages from Redis
│   ├─ If Redis empty → load from .huff archive → re-hydrate
│   └─ Inject into LLM prompt as conversation history
│
├─ ② SemanticCacheRedis (Redis DB 0)
│   ├─ Compute query embedding vector
│   ├─ cosine_similarity(cached, new) ≥ 0.90?
│   │   ├─ HIT  → return cached response (skip LLM)
│   │   └─ MISS → continue pipeline
│   └─ After LLM: cache (embedding, response) for 5 min
│
├─ ③ URLEmbeddingCache (Redis DB 1)
│   ├─ Before embedding a URL: check Redis
│   │   ├─ HIT  → use cached vector (~0ms vs ~200ms)
│   │   └─ MISS → compute, cache for 24h
│   └─ Global (shared across all sessions)
│
└─ HybridConversationCache (backing store)
    ├─ Hot: Redis ordered list (LPUSH/RPOP, 20-msg window)
    ├─ Cold: Huffman-compressed .huff files on disk
    ├─ Overflow: oldest messages spill hot → cold
    └─ LRU daemon: idle 2h → migrate all to disk, free Redis

Package structure

lix_open_cache/
├── pyproject.toml
├── README.md
└── lix_open_cache/
    ├── __init__.py              # public API
    ├── config.py                # CacheConfig dataclass
    ├── redis_pool.py            # Connection-pooled Redis factory
    ├── huffman_codec.py         # Canonical Huffman encoder/decoder
    ├── conversation_archive.py  # .huff disk persistence
    ├── hybrid_cache.py          # Redis hot + disk cold + LRU eviction
    ├── semantic_cache.py        # SemanticCacheRedis + URLEmbeddingCache
    ├── context_window.py        # SessionContextWindow (wraps hybrid_cache)
    └── coordinator.py           # CacheCoordinator (orchestrates all 3)

Installation

From PyPI (once published):

pip install lix-open-cache

From source:

git clone https://github.com/pollinations/lixsearch.git
cd lixsearch/lix_open_cache
pip install -e .

Dependencies:

Package Version Why
redis ≥ 5.0 All three cache layers
numpy ≥ 1.24 Embedding vectors, cosine similarity
loguru ≥ 0.7 Structured logging
lz4 (optional) ≥ 4.0 Alternative compression method

Quick start

Full 3-layer setup

from lix_open_cache import CacheConfig, CacheCoordinator

config = CacheConfig(
    redis_host="localhost",
    redis_port=6379,
    redis_key_prefix="mychat",
    archive_dir="./data/conversations",
)

cache = CacheCoordinator(session_id="user-abc", config=config)

# Store messages
cache.add_message_to_context("user", "What's the weather in Tokyo?")
cache.add_message_to_context("assistant", "It's 22°C and sunny.")

# Retrieve context for next LLM call
history = cache.get_context_messages()

# Check semantic cache before calling LLM
import numpy as np
query_embedding = np.random.rand(384).astype(np.float32)
cached = cache.get_semantic_response("https://weather.com", query_embedding)
if cached:
    print("Cache hit — skip LLM")
else:
    response = {"answer": "22°C and sunny", "sources": ["..."]}
    cache.cache_semantic_response("https://weather.com", query_embedding, response)

Session memory only (no semantic cache)

from lix_open_cache import HybridConversationCache, CacheConfig

config = CacheConfig(redis_host="localhost", redis_port=6379)
cache = HybridConversationCache("session-123", config=config)

cache.add_message("user", "hello")
cache.add_message("assistant", "hey there!")

messages = cache.get_context()  # last 20 from Redis

# Smart retrieval: recent + semantically relevant from disk
context = cache.smart_context(
    query="what did we talk about yesterday?",
    query_embedding=your_embedding,
    recent_k=10,
    disk_k=5,
)
# → {"recent": [...last 10...], "relevant": [...5 from disk archive...]}

Disk-only (no Redis)

from lix_open_cache import ConversationArchive

archive = ConversationArchive("./data/chats", session_ttl_days=30)

archive.append_turn("sess-1", {"role": "user", "content": "hello"})
archive.append_turn("sess-1", {"role": "assistant", "content": "hi!"})

turns = archive.load_all("sess-1")
recent = archive.load_recent("sess-1", 5)
results = archive.search_by_text("sess-1", "hello", top_k=3)

archive.cleanup_expired()

Just the Huffman codec

from lix_open_cache import HuffmanCodec
from lix_open_cache.huffman_codec import encode_str, decode_bytes

text = "The quick brown fox jumps over the lazy dog" * 100
compressed = encode_str(text)
restored = decode_bytes(compressed)
assert restored == text
print(f"{len(text)}B → {len(compressed)}B ({len(compressed)/len(text)*100:.0f}%)")

Configuration

All tunables live in a single CacheConfig dataclass. No global state, no scattered constants.

from lix_open_cache import CacheConfig

config = CacheConfig(
    # Redis connection
    redis_host="redis.internal",
    redis_port=6379,
    redis_password="secret",
    redis_key_prefix="mychat",
    redis_pool_size=50,

    # Session context window (Redis DB 2)
    session_redis_db=2,
    session_ttl_seconds=86400,          # 24h
    hot_window_size=20,                 # messages kept in Redis
    session_max_tokens=None,            # no token limit

    # Semantic query cache (Redis DB 0)
    semantic_redis_db=0,
    semantic_ttl_seconds=300,           # 5 min
    semantic_similarity_threshold=0.90, # cosine similarity threshold
    semantic_max_items_per_url=50,

    # URL embedding cache (Redis DB 1)
    url_cache_redis_db=1,
    url_cache_ttl_seconds=86400,        # 24h

    # Disk archive
    archive_dir="./data/conversations",
    disk_ttl_days=14,                   # purge after 14 days

    # LRU eviction
    evict_after_minutes=120,            # 2h idle → migrate to disk
)

# Or from environment variables (12-factor apps):
# Reads MYAPP_REDIS_HOST, MYAPP_REDIS_PORT, MYAPP_SEMANTIC_TTL_SECONDS, etc.
config = CacheConfig.from_env("MYAPP")

Redis DB layout

Three logical databases on a single Redis server:

DB Layer TTL Scope What it stores
0 Semantic query cache 5 min Per-session (query_embedding, LLM response) pairs per URL
1 URL embedding cache 24h Global URL → float32 embedding vector
2 Session context window 24h Per-session Last 20 conversation messages

Separate DBs instead of key prefixes so you can FLUSHDB one layer without touching others, and monitor each independently via DBSIZE.

How each layer works

Session Context Window

add_message("user", "hello")
│
├─ LPUSH message_id to Redis ordered list
├─ SETEX message JSON with TTL
│
└─ Window > 20?
    ├─ Yes → RPOP oldest
    │        ├─ Append to .huff disk archive
    │        └─ DELETE from Redis
    └─ No → done
get_context()
│
├─ Redis has messages?
│   ├─ Yes → return them, refresh all TTLs
│   └─ No → session was evicted
│           ├─ Load from .huff archive
│           ├─ Re-hydrate Redis with last 20
│           └─ Return full history
│
└─ Redis down?
    └─ Read everything from disk (graceful fallback)

LRU eviction daemon: Background thread, checks every 60s. Session idle > evict_after_minutes → migrate all Redis messages to disk, free memory. When user returns, get_context() re-hydrates transparently.

smart_context(): Returns {"recent": [...], "relevant": [...]} — recent messages from Redis plus semantically relevant messages from the disk archive (matched by embedding cosine similarity).

Semantic Query Cache

Keyed by (session_id, URL, query_embedding). Each URL stores up to 50 (embedding, response) pairs.

On lookup: compute cosine similarity between the new query embedding and all cached embeddings for that URL. If any exceed 0.90 → cache hit, return the cached response, skip the LLM.

  • Per-session isolation (privacy)
  • 5-minute TTL (freshness)
  • Catches rephrasings: "weather Tokyo" vs "Tokyo weather forecast" → cosine ~0.94 → HIT

URL Embedding Cache

Global (shared across all sessions), 24h TTL. Maps URL → raw float32 bytes in Redis.

Computing embeddings costs ~200ms per URL. This cache means the embedding model only runs once per URL per day, regardless of how many sessions fetch it.

Hybrid storage: hot + cold

The two-tier architecture:

Hot (Redis): Ordered list of message IDs. Each message stored as a separate key with TTL. Fast reads (~1ms). Limited to hot_window_size messages per session.

Cold (Disk): Huffman-compressed .huff files. One file per session at {archive_dir}/{session_id}.huff. Self-contained binary format with a 24-byte header you can read without decompressing.

.huff file format

Offset  Size    Field
0       4B      Magic: "CAv1"
4       8B      created_at (float64 LE, unix timestamp)
12      8B      updated_at (float64 LE, unix timestamp)
20      4B      num_turns (uint32 LE)
24      var     Huffman-compressed JSON array of turn objects

Why Huffman over gzip?

Conversation text has very skewed byte frequencies (~18% spaces, ~13% 'e', ~0.07% 'z'). Huffman assigns shorter bit codes to frequent bytes. For small payloads (<100KB), this beats gzip because there's no dictionary overhead. ~54% compression ratio on typical conversation text. Pure Python, zero native dependencies.

Connection pooling

create_redis_client() maintains a global pool keyed by (host, port, db):

from lix_open_cache import create_redis_client, CacheConfig

config = CacheConfig(redis_host="localhost", redis_port=6379)

# First call: creates ConnectionPool, pings, returns client
client = create_redis_client(host="localhost", port=6379, db=2, config=config)

# Same (host, port, db): reuses existing pool
client = create_redis_client(host="localhost", port=6379, db=2, config=config)

Handles auth gracefully — tries with password first, falls back to no-auth on AuthenticationError.

API reference

CacheConfig

Method Description
CacheConfig(**kwargs) Create config with explicit values
CacheConfig.from_env(prefix) Load from env vars: {PREFIX}_REDIS_HOST, etc.

CacheCoordinator

Method Description
__init__(session_id, config?) Initialize all 3 layers
add_message_to_context(role, content, metadata?) Add to session window
get_context_messages() Get rolling window
get_formatted_context(max_lines?) Get as formatted string
get_semantic_response(url, query_embedding) Check semantic cache
cache_semantic_response(url, query_embedding, response) Store in semantic cache
get_url_embedding(url) Get cached URL embedding
cache_url_embedding(url, embedding) Cache URL embedding
batch_cache_url_embeddings(dict) Batch cache
clear_session_cache() Clear semantic + context
clear_context() Clear context only
get_stats() Stats from all 3 layers

SessionContextWindow

Method Description
__init__(session_id, config?, **kwargs) Create context window
add_message(role, content, metadata?) Add a message
get_context() Get hot window messages
get_full_history() All messages (Redis + disk)
smart_context(query, embedding?, recent_k?, disk_k?) Recent + relevant from disk
get_formatted_context(max_lines?) As formatted string
flush_to_disk() Force migrate Redis → disk
clear() Wipe Redis hot window
get_stats() Session statistics

HybridConversationCache

Method Description
__init__(session_id, config?, **kwargs) Create hybrid cache
add_message(role, content, metadata?, embedding?) Add message (auto-evicts overflow)
get_context() Hot window (auto re-hydrates from disk)
get_full() Merge hot + cold
smart_context(query, embedding?, recent_k?, disk_k?) Recent + relevant
flush_to_disk() Migrate Redis → disk
clear() Clear Redis keys
delete_session() Delete from Redis + disk
get_stats() Hot count, disk turns, sizes

ConversationArchive

Method Description
__init__(archive_dir, session_ttl_days?) Create archive
append_turn(session_id, turn) Append single turn
append_turns(session_id, turns) Batch append
load_all(session_id) Load all turns
load_recent(session_id, n) Load last N turns
search_by_text(session_id, query, top_k?) Text overlap search
search_by_embedding(session_id, embedding, top_k?) Cosine similarity search
delete_session(session_id) Delete archive file
session_exists(session_id) Check if .huff exists
get_metadata(session_id) Read header without decompressing
cleanup_expired() Purge sessions older than TTL
list_sessions() List all archived sessions

SemanticCacheRedis

Method Description
__init__(session_id, config?, **kwargs) Create semantic cache
get(url, query_embedding) Check for cached response
set(url, query_embedding, response) Cache a response
clear_session() Delete all entries for this session
get_stats() Cache statistics

URLEmbeddingCache

Method Description
__init__(session_id, config?, **kwargs) Create embedding cache
get(url) Get cached embedding (np.ndarray or None)
set(url, embedding) Cache an embedding
batch_set(url_embeddings) Batch cache
get_stats() Cache statistics

HuffmanCodec

Method Description
HuffmanCodec.encode(data: bytes) Compress bytes → bytes
HuffmanCodec.decode(data: bytes) Decompress bytes → bytes
encode_str(text: str) Compress string → bytes
decode_bytes(data: bytes) Decompress bytes → string

Publishing to PyPI

cd lix_open_cache
pip install build twine

# Build
python -m build

# Test on TestPyPI first
twine upload --repository testpypi dist/*
pip install --index-url https://test.pypi.org/simple/ lix-open-cache

# Publish to production PyPI
twine upload dist/*

For CI/CD, add a GitHub Actions workflow triggered on release:

name: Publish to PyPI
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: cd lix_open_cache && python -m build
      - run: cd lix_open_cache && twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}

License

MIT — same as lixSearch.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lix_open_cache-2.1.4.tar.gz (23.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lix_open_cache-2.1.4-py3-none-any.whl (20.8 kB view details)

Uploaded Python 3

File details

Details for the file lix_open_cache-2.1.4.tar.gz.

File metadata

  • Download URL: lix_open_cache-2.1.4.tar.gz
  • Upload date:
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for lix_open_cache-2.1.4.tar.gz
Algorithm Hash digest
SHA256 f999ff74fa8b7a40b0deb974d8a0935876d963bdf82ac5a613304a7854de3d09
MD5 7394aeca7f189f310827d8eb3678d587
BLAKE2b-256 b486c74547d818c0cefba3175a467f2e9eb88229e116dbe380316c954c120b7d

See more details on using hashes here.

File details

Details for the file lix_open_cache-2.1.4-py3-none-any.whl.

File metadata

  • Download URL: lix_open_cache-2.1.4-py3-none-any.whl
  • Upload date:
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for lix_open_cache-2.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 6a14050287210cad64fe1b643073c43a41aaeb6cf8c8d4eb4410fc29012ae5c9
MD5 658099674a6a9d4b0aa84e34947a5646
BLAKE2b-256 1cd1f905c9bc7823b204fdffc00ca7548b82e29a325dd96406ce9b6677111b28

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page