Reusable session management and multi-layer Redis caching for conversational AI

Project description

lix_open_cache

Standalone multi-layer caching and session management for conversational AI. Extracted from lixSearch into a reusable, pip-installable package.

Drop it into any chatbot, search assistant, or RAG pipeline to get production-grade session memory, semantic caching, and compressed disk archival out of the box.

pip install lix-open-cache

Research Paper

This library is described in detail in our research paper:

A Three-Layer Caching Architecture for Low-Latency LLM Web Search on Commodity CPU Hardware Ayushman Bhattacharya (Pollinations.ai), 2026 Read the paper (PDF)

The paper covers the origin story (building a cost-effective alternative to SearchGPT), the architecture and design decisions behind each caching layer, production evaluation on an 8-vCPU server (89.3% hit rate, 0.1ms latency, 1,000x cost reduction), and the Huffman compression scheme for conversation archival.

If you use this library in your research, please cite:

@article{bhattacharya2026lixcache,
  title={A Three-Layer Caching Architecture for Low-Latency LLM Web Search on Commodity CPU Hardware},
  author={Bhattacharya, Ayushman},
  year={2026},
  url={https://github.com/pollinations/lixsearch/blob/main/docs/paper/lix_cache_paper.pdf},
  note={Licensed under CC BY-NC-ND 4.0}
}

What it solves

Problem	Layer	Solution
"What did we just talk about?"	Session Context Window (Redis DB 2)	Rolling window of 20 messages in Redis, overflow to Huffman-compressed disk
"Didn't we already answer this?"	Semantic Query Cache (Redis DB 0)	Cache LLM responses keyed by embedding similarity (cosine ≥ 0.90)
"We already embedded this URL"	URL Embedding Cache (Redis DB 1)	Global cache of URL → embedding vector, shared across sessions

Architecture

User message arrives
│
├─ ① SessionContextWindow (Redis DB 2)
│   ├─ get_context() → last 20 messages from Redis
│   ├─ If Redis empty → load from .huff archive → re-hydrate
│   └─ Inject into LLM prompt as conversation history
│
├─ ② SemanticCacheRedis (Redis DB 0)
│   ├─ Compute query embedding vector
│   ├─ cosine_similarity(cached, new) ≥ 0.90?
│   │   ├─ HIT  → return cached response (skip LLM)
│   │   └─ MISS → continue pipeline
│   └─ After LLM: cache (embedding, response) for 5 min
│
├─ ③ URLEmbeddingCache (Redis DB 1)
│   ├─ Before embedding a URL: check Redis
│   │   ├─ HIT  → use cached vector (~0ms vs ~200ms)
│   │   └─ MISS → compute, cache for 24h
│   └─ Global (shared across all sessions)
│
└─ HybridConversationCache (backing store)
    ├─ Hot: Redis ordered list (LPUSH/RPOP, 20-msg window)
    ├─ Cold: Huffman-compressed .huff files on disk
    ├─ Overflow: oldest messages spill hot → cold
    └─ LRU daemon: idle 2h → migrate all to disk, free Redis

Package structure

lix_open_cache/
├── pyproject.toml
├── README.md
└── lix_open_cache/
    ├── __init__.py              # public API
    ├── config.py                # CacheConfig dataclass
    ├── redis_pool.py            # Connection-pooled Redis factory
    ├── huffman_codec.py         # Canonical Huffman encoder/decoder
    ├── conversation_archive.py  # .huff disk persistence
    ├── hybrid_cache.py          # Redis hot + disk cold + LRU eviction
    ├── semantic_cache.py        # SemanticCacheRedis + URLEmbeddingCache
    ├── context_window.py        # SessionContextWindow (wraps hybrid_cache)
    └── coordinator.py           # CacheCoordinator (orchestrates all 3)

Installation

From PyPI (once published):

pip install lix-open-cache

From source:

git clone https://github.com/pollinations/lixsearch.git
cd lixsearch/lix_open_cache
pip install -e .

Dependencies:

Package	Version	Why
redis	≥ 5.0	All three cache layers
numpy	≥ 1.24	Embedding vectors, cosine similarity
loguru	≥ 0.7	Structured logging
lz4 (optional)	≥ 4.0	Alternative compression method

Quick start

Full 3-layer setup

from lix_open_cache import CacheConfig, CacheCoordinator

config = CacheConfig(
    redis_host="localhost",
    redis_port=6379,
    redis_key_prefix="mychat",
    archive_dir="./data/conversations",
)

cache = CacheCoordinator(session_id="user-abc", config=config)

# Store messages
cache.add_message_to_context("user", "What's the weather in Tokyo?")
cache.add_message_to_context("assistant", "It's 22°C and sunny.")

# Retrieve context for next LLM call
history = cache.get_context_messages()

# Check semantic cache before calling LLM
import numpy as np
query_embedding = np.random.rand(384).astype(np.float32)
cached = cache.get_semantic_response("https://weather.com", query_embedding)
if cached:
    print("Cache hit — skip LLM")
else:
    response = {"answer": "22°C and sunny", "sources": ["..."]}
    cache.cache_semantic_response("https://weather.com", query_embedding, response)

Session memory only (no semantic cache)

from lix_open_cache import HybridConversationCache, CacheConfig

config = CacheConfig(redis_host="localhost", redis_port=6379)
cache = HybridConversationCache("session-123", config=config)

cache.add_message("user", "hello")
cache.add_message("assistant", "hey there!")

messages = cache.get_context()  # last 20 from Redis

# Smart retrieval: recent + semantically relevant from disk
context = cache.smart_context(
    query="what did we talk about yesterday?",
    query_embedding=your_embedding,
    recent_k=10,
    disk_k=5,
)
# → {"recent": [...last 10...], "relevant": [...5 from disk archive...]}

Disk-only (no Redis)

from lix_open_cache import ConversationArchive

archive = ConversationArchive("./data/chats", session_ttl_days=30)

archive.append_turn("sess-1", {"role": "user", "content": "hello"})
archive.append_turn("sess-1", {"role": "assistant", "content": "hi!"})

turns = archive.load_all("sess-1")
recent = archive.load_recent("sess-1", 5)
results = archive.search_by_text("sess-1", "hello", top_k=3)

archive.cleanup_expired()

Just the Huffman codec

from lix_open_cache import HuffmanCodec
from lix_open_cache.huffman_codec import encode_str, decode_bytes

text = "The quick brown fox jumps over the lazy dog" * 100
compressed = encode_str(text)
restored = decode_bytes(compressed)
assert restored == text
print(f"{len(text)}B → {len(compressed)}B ({len(compressed)/len(text)*100:.0f}%)")

Configuration

All tunables live in a single CacheConfig dataclass. No global state, no scattered constants.

from lix_open_cache import CacheConfig

config = CacheConfig(
    # Redis connection
    redis_host="redis.internal",
    redis_port=6379,
    redis_password="secret",
    redis_key_prefix="mychat",
    redis_pool_size=50,

    # Session context window (Redis DB 2)
    session_redis_db=2,
    session_ttl_seconds=86400,          # 24h
    hot_window_size=20,                 # messages kept in Redis
    session_max_tokens=None,            # no token limit

    # Semantic query cache (Redis DB 0)
    semantic_redis_db=0,
    semantic_ttl_seconds=300,           # 5 min
    semantic_similarity_threshold=0.90, # cosine similarity threshold
    semantic_max_items_per_url=50,

    # URL embedding cache (Redis DB 1)
    url_cache_redis_db=1,
    url_cache_ttl_seconds=86400,        # 24h

    # Disk archive
    archive_dir="./data/conversations",
    disk_ttl_days=14,                   # purge after 14 days

    # LRU eviction
    evict_after_minutes=120,            # 2h idle → migrate to disk
)

# Or from environment variables (12-factor apps):
# Reads MYAPP_REDIS_HOST, MYAPP_REDIS_PORT, MYAPP_SEMANTIC_TTL_SECONDS, etc.
config = CacheConfig.from_env("MYAPP")

Redis DB layout

Three logical databases on a single Redis server:

DB	Layer	TTL	Scope	What it stores
0	Semantic query cache	5 min	Per-session	`(query_embedding, LLM response)` pairs per URL
1	URL embedding cache	24h	Global	URL → float32 embedding vector
2	Session context window	24h	Per-session	Last 20 conversation messages

Separate DBs instead of key prefixes so you can FLUSHDB one layer without touching others, and monitor each independently via DBSIZE.

How each layer works

Session Context Window

add_message("user", "hello")
│
├─ LPUSH message_id to Redis ordered list
├─ SETEX message JSON with TTL
│
└─ Window > 20?
    ├─ Yes → RPOP oldest
    │        ├─ Append to .huff disk archive
    │        └─ DELETE from Redis
    └─ No → done

get_context()
│
├─ Redis has messages?
│   ├─ Yes → return them, refresh all TTLs
│   └─ No → session was evicted
│           ├─ Load from .huff archive
│           ├─ Re-hydrate Redis with last 20
│           └─ Return full history
│
└─ Redis down?
    └─ Read everything from disk (graceful fallback)

LRU eviction daemon: Background thread, checks every 60s. Session idle > evict_after_minutes → migrate all Redis messages to disk, free memory. When user returns, get_context() re-hydrates transparently.

smart_context(): Returns {"recent": [...], "relevant": [...]} — recent messages from Redis plus semantically relevant messages from the disk archive (matched by embedding cosine similarity).

Semantic Query Cache

Keyed by (session_id, URL, query_embedding). Each URL stores up to 50 (embedding, response) pairs.

On lookup: compute cosine similarity between the new query embedding and all cached embeddings for that URL. If any exceed 0.90 → cache hit, return the cached response, skip the LLM.

Per-session isolation (privacy)
5-minute TTL (freshness)
Catches rephrasings: "weather Tokyo" vs "Tokyo weather forecast" → cosine ~0.94 → HIT

URL Embedding Cache

Global (shared across all sessions), 24h TTL. Maps URL → raw float32 bytes in Redis.

Computing embeddings costs ~200ms per URL. This cache means the embedding model only runs once per URL per day, regardless of how many sessions fetch it.

Hybrid storage: hot + cold

The two-tier architecture:

Hot (Redis): Ordered list of message IDs. Each message stored as a separate key with TTL. Fast reads (~1ms). Limited to hot_window_size messages per session.

Cold (Disk): Huffman-compressed .huff files. One file per session at {archive_dir}/{session_id}.huff. Self-contained binary format with a 24-byte header you can read without decompressing.

.huff file format

Offset  Size    Field
0       4B      Magic: "CAv1"
4       8B      created_at (float64 LE, unix timestamp)
12      8B      updated_at (float64 LE, unix timestamp)
20      4B      num_turns (uint32 LE)
24      var     Huffman-compressed JSON array of turn objects

Why Huffman over gzip?

Conversation text has very skewed byte frequencies (~18% spaces, ~13% 'e', ~0.07% 'z'). Huffman assigns shorter bit codes to frequent bytes. For small payloads (<100KB), this beats gzip because there's no dictionary overhead. ~54% compression ratio on typical conversation text. Pure Python, zero native dependencies.

Connection pooling

create_redis_client() maintains a global pool keyed by (host, port, db):

from lix_open_cache import create_redis_client, CacheConfig

config = CacheConfig(redis_host="localhost", redis_port=6379)

# First call: creates ConnectionPool, pings, returns client
client = create_redis_client(host="localhost", port=6379, db=2, config=config)

# Same (host, port, db): reuses existing pool
client = create_redis_client(host="localhost", port=6379, db=2, config=config)

Handles auth gracefully — tries with password first, falls back to no-auth on AuthenticationError.

API reference

CacheConfig

Method	Description
`CacheConfig(**kwargs)`	Create config with explicit values
`CacheConfig.from_env(prefix)`	Load from env vars: `{PREFIX}_REDIS_HOST`, etc.

CacheCoordinator

Method	Description
`__init__(session_id, config?)`	Initialize all 3 layers
`add_message_to_context(role, content, metadata?)`	Add to session window
`get_context_messages()`	Get rolling window
`get_formatted_context(max_lines?)`	Get as formatted string
`get_semantic_response(url, query_embedding)`	Check semantic cache
`cache_semantic_response(url, query_embedding, response)`	Store in semantic cache
`get_url_embedding(url)`	Get cached URL embedding
`cache_url_embedding(url, embedding)`	Cache URL embedding
`batch_cache_url_embeddings(dict)`	Batch cache
`clear_session_cache()`	Clear semantic + context
`clear_context()`	Clear context only
`get_stats()`	Stats from all 3 layers

SessionContextWindow

Method	Description
`__init__(session_id, config?, **kwargs)`	Create context window
`add_message(role, content, metadata?)`	Add a message
`get_context()`	Get hot window messages
`get_full_history()`	All messages (Redis + disk)
`smart_context(query, embedding?, recent_k?, disk_k?)`	Recent + relevant from disk
`get_formatted_context(max_lines?)`	As formatted string
`flush_to_disk()`	Force migrate Redis → disk
`clear()`	Wipe Redis hot window
`get_stats()`	Session statistics

HybridConversationCache

Method	Description
`__init__(session_id, config?, **kwargs)`	Create hybrid cache
`add_message(role, content, metadata?, embedding?)`	Add message (auto-evicts overflow)
`get_context()`	Hot window (auto re-hydrates from disk)
`get_full()`	Merge hot + cold
`smart_context(query, embedding?, recent_k?, disk_k?)`	Recent + relevant
`flush_to_disk()`	Migrate Redis → disk
`clear()`	Clear Redis keys
`delete_session()`	Delete from Redis + disk
`get_stats()`	Hot count, disk turns, sizes

ConversationArchive

Method	Description
`__init__(archive_dir, session_ttl_days?)`	Create archive
`append_turn(session_id, turn)`	Append single turn
`append_turns(session_id, turns)`	Batch append
`load_all(session_id)`	Load all turns
`load_recent(session_id, n)`	Load last N turns
`search_by_text(session_id, query, top_k?)`	Text overlap search
`search_by_embedding(session_id, embedding, top_k?)`	Cosine similarity search
`delete_session(session_id)`	Delete archive file
`session_exists(session_id)`	Check if .huff exists
`get_metadata(session_id)`	Read header without decompressing
`cleanup_expired()`	Purge sessions older than TTL
`list_sessions()`	List all archived sessions

SemanticCacheRedis

Method	Description
`__init__(session_id, config?, **kwargs)`	Create semantic cache
`get(url, query_embedding)`	Check for cached response
`set(url, query_embedding, response)`	Cache a response
`clear_session()`	Delete all entries for this session
`get_stats()`	Cache statistics

URLEmbeddingCache

Method	Description
`__init__(session_id, config?, **kwargs)`	Create embedding cache
`get(url)`	Get cached embedding (np.ndarray or None)
`set(url, embedding)`	Cache an embedding
`batch_set(url_embeddings)`	Batch cache
`get_stats()`	Cache statistics

HuffmanCodec

Method	Description
`HuffmanCodec.encode(data: bytes)`	Compress bytes → bytes
`HuffmanCodec.decode(data: bytes)`	Decompress bytes → bytes
`encode_str(text: str)`	Compress string → bytes
`decode_bytes(data: bytes)`	Decompress bytes → string

Publishing to PyPI

cd lix_open_cache
pip install build twine

# Build
python -m build

# Test on TestPyPI first
twine upload --repository testpypi dist/*
pip install --index-url https://test.pypi.org/simple/ lix-open-cache

# Publish to production PyPI
twine upload dist/*

For CI/CD, add a GitHub Actions workflow triggered on release:

name: Publish to PyPI
on:
  release:
    types: [published]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: cd lix_open_cache && python -m build
      - run: cd lix_open_cache && twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}

License

MIT — same as lixSearch.

Project details

Release history Release notifications | RSS feed

2.1.6

Mar 27, 2026

2.1.5

Mar 26, 2026

This version

2.1.4

Mar 26, 2026

2.1.3

Mar 26, 2026

2.1.2

Mar 26, 2026

0.1.0

Mar 26, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lix_open_cache-2.1.4.tar.gz (23.1 kB view details)

Uploaded Mar 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lix_open_cache-2.1.4-py3-none-any.whl (20.8 kB view details)

Uploaded Mar 26, 2026 Python 3

File details

Details for the file lix_open_cache-2.1.4.tar.gz.

File metadata

Download URL: lix_open_cache-2.1.4.tar.gz
Upload date: Mar 26, 2026
Size: 23.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for lix_open_cache-2.1.4.tar.gz
Algorithm	Hash digest
SHA256	`f999ff74fa8b7a40b0deb974d8a0935876d963bdf82ac5a613304a7854de3d09`
MD5	`7394aeca7f189f310827d8eb3678d587`
BLAKE2b-256	`b486c74547d818c0cefba3175a467f2e9eb88229e116dbe380316c954c120b7d`

See more details on using hashes here.

File details

Details for the file lix_open_cache-2.1.4-py3-none-any.whl.

File metadata

Download URL: lix_open_cache-2.1.4-py3-none-any.whl
Upload date: Mar 26, 2026
Size: 20.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for lix_open_cache-2.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6a14050287210cad64fe1b643073c43a41aaeb6cf8c8d4eb4410fc29012ae5c9`
MD5	`658099674a6a9d4b0aa84e34947a5646`
BLAKE2b-256	`1cd1f905c9bc7823b204fdffc00ca7548b82e29a325dd96406ce9b6677111b28`

See more details on using hashes here.

lix-open-cache 2.1.4

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

lix_open_cache

Research Paper

What it solves

Architecture

Package structure

Installation

Quick start

Full 3-layer setup

Session memory only (no semantic cache)

Disk-only (no Redis)

Just the Huffman codec

Configuration

Redis DB layout

How each layer works

Session Context Window

Semantic Query Cache

URL Embedding Cache

Hybrid storage: hot + cold

.huff file format

Why Huffman over gzip?

Connection pooling

API reference

CacheConfig

CacheCoordinator

SessionContextWindow

HybridConversationCache

ConversationArchive

SemanticCacheRedis

URLEmbeddingCache

HuffmanCodec

Publishing to PyPI

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes