LatticeMemory — E8 lattice semantic cache and LLM proxy. Calibrated Hamming routing, zero-false-positive intent caching, compliance mode.
LatticeMemory
Semantic cache, dedup, and hybrid memory — 32× compressed E8 keys for instant repeat-query hits, dense fallback for novel retrieval.
LatticeMemory uses the E8 lattice — the densest sphere packing in 8 dimensions — as a deterministic address space for text embeddings. Every 1024-dim embedding snaps to a 128-byte E8 key. Identical or near-identical text lands on the same key; novel queries fall through to a dense float32/Int8 fallback.
What it's for
| Workload | E8 path | Fallback needed? |
|---|---|---|
| Repeat / paraphrase LLM queries (cache) | ✅ O(1) exact or Hamming hit | No |
| Semantic deduplication, near-duplicate detection | ✅ Key collision = duplicate | No |
| Dataset quality filtering, semantic sharding | ✅ Stable cluster addresses | No |
| IoT/command normalization (symmetric vocab) | ✅ Fixed command set → fixed keys | No |
| Asymmetric QA/passage search (RAG) | ❌ Query ≠ passage in E8 space | Yes — Int8 or float32 required |
E8 keys route fast for content that is semantically identical or near-identical. They are not a replacement for vector search on asymmetric workloads where the query text and the correct passage are structurally different.
Benchmarks
Compression (bge-large 1024-dim):
| Method | Compression | Index / 1M docs | Retrieval p50 @ 100K docs |
|---|---|---|---|
| Float32 | 1× | 4.1 GB | 20.8 ms |
| LatticeMemory E8 keys | 32× | 0.13 GB | O(1) on key hit |
Fallback quality (1K docs, 100 paraphrase queries, recall vs float32):
| Fallback | Compression vs float32 | Recall@10 overlap | Top-1 agreement | Search p50 |
|---|---|---|---|---|
| Float32 | 1× | 100.0% | 100.0% | 0.14 ms |
| Int8 | 4× | 95.1% | 91.0% | 1.97 ms |
| Int4 | 8× | 12.1% | 1.0% | 4.21 ms |
- Int8 is the recommended fallback for RAG/QA: 4× smaller than float32, with 95.1% Recall@10 overlap.
- STS quality: bge-large-e8-snap scores 0.8714 vs 0.8637 for the float baseline (+0.0077).
Compression basis: 1 address byte per 8-dim block × 128 blocks = 128 bytes for 1024-dim vs 4,096 bytes float32 = 32×. This applies to E8 key storage only; hybrid mode also stores the dense index.
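The arithmetic behind the 32× figure and the 1M-doc index sizes can be rechecked in a few lines. This is plain Python with no LatticeMemory code involved. The 384-byte figure uses the optional 2-byte per-block scale described under How It Works.

```python
# Storage arithmetic for a 1024-dim embedding (restates the numbers above).
DIM = 1024                # bge-large embedding width
BLOCK = 8                 # E8 operates on 8-dim blocks
blocks = DIM // BLOCK     # 128 blocks

float32_bytes = DIM * 4               # 4 bytes per float32 component
e8_key_bytes = blocks * 1             # 1 address byte per block (240 shell-1 points fit in a byte)
quantized_bytes = blocks * (1 + 2)    # address byte + optional 2-byte scale per block
ratio = float32_bytes // e8_key_bytes

docs = 1_000_000
print(f"float32 index: {docs * float32_bytes / 1e9:.1f} GB")  # 4.1 GB
print(f"E8 key index:  {docs * e8_key_bytes / 1e9:.2f} GB")   # 0.13 GB
print(f"compression:   {ratio}x")                             # 32x
print(f"quantized:     {quantized_bytes} bytes per vector")   # 384 bytes
```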
Install
```bash
pip install lattice-memory-e8
```
The PyPI distribution is named lattice-memory-e8 (the plain latticememory name
collides with an unrelated existing package on PyPI) — the import name is unaffected:
import latticememory works exactly as shown throughout this README.
Optional extras:
```bash
pip install 'lattice-memory-e8[proxy]'   # FastAPI proxy server (fastapi, uvicorn, httpx)
pip install 'lattice-memory-e8[redis]'   # Redis backend for multi-instance caches
pip install 'lattice-memory-e8[hf]'      # HuggingFace datasets integration
pip install 'lattice-memory-e8[faiss]'   # FAISS vector fallback
```
Quickstart
Semantic cache (the primary use case)
```python
from latticememory import LatticeIndex

index = LatticeIndex()  # downloads dfrokido/bge-large-e8-snap on first run (~500MB)
index.add([
    "What is the refund policy?",
    "How do I reset my password?",
    "Where is my order?",
])

# Exact text → guaranteed O(1) lattice_exact hit
result = index.search("What is the refund policy?", top_k=1)
print(result[0].retrieval_path)  # lattice_exact

# Near-paraphrase → lattice_exact or Hamming hit (same E8 neighborhood)
result2 = index.search("What's your return policy?", top_k=1)
print(result2[0].retrieval_path)  # lattice_exact or lattice_hamming

print(index.stats())
```
Semantic cache with answer lookup
```python
from latticememory import RFSnapSemanticCache, RFSnapTextMemory, RFSnapLatticeMemory
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("dfrokido/bge-large-e8-snap")
lm = RFSnapLatticeMemory(d_model=1024)
rt = RFSnapTextMemory(encoder=encoder, d_model=1024, memory=lm)
cache = RFSnapSemanticCache(runtime=rt)

cache.put("What is the refund policy?", value="30-day returns, full refund.")

result = cache.get("What's your return policy?")  # paraphrase hit
print(result.hit)    # True
print(result.value)  # "30-day returns, full refund."
```
Hybrid RAG / document search
For asymmetric search (user questions against document passages), use hybrid mode — E8 for cache hits, dense fallback for novel queries:
```python
from latticememory import LatticeIndex

index = LatticeIndex(mode="hybrid")  # Int8 fallback enabled automatically
index.add([
    "The refund window is 30 days from purchase date.",
    "Password resets are sent to your registered email.",
    "Orders ship within 2 business days.",
])

# Novel query → routes through E8, misses, falls back to Int8 dense search
result = index.search("Can I return something after a month?", top_k=1)
print(result[0].retrieval_path)  # fallback
print(result[0].text)            # The refund window is 30 days...
```
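The two-tier idea (exact key lookup first, dense search only on a miss) can be sketched without the library. This is a hypothetical illustration, not LatticeMemory's implementation: `HybridSketch`, `quantize_int8`, and the sign-pattern `key_fn` used in the test are stand-ins chosen for brevity, and the real system uses 128-byte E8 keys and 1024-dim embeddings. Only the retrieval-path labels mirror the ones printed above.

```python
import math
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

def quantize_int8(vec: Sequence[float]) -> List[int]:
    """Symmetric per-vector Int8 codes in [-127, 127]. The scale factor is not kept
    because it cancels out of cosine similarity."""
    peak = max(abs(x) for x in vec) or 1.0
    return [round(x * 127 / peak) for x in vec]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class HybridSketch:
    """Hypothetical hybrid index: exact key match first, Int8 cosine fallback second."""

    def __init__(self, key_fn: Callable[[Sequence[float]], Hashable]):
        self.key_fn = key_fn                          # embedding -> key (stand-in for an E8 key)
        self.by_key: dict = {}                        # key -> position in self.docs
        self.docs: List[Tuple[str, List[int]]] = []   # (text, Int8 codes)

    def add(self, text: str, embedding: Sequence[float]) -> None:
        self.by_key.setdefault(self.key_fn(embedding), len(self.docs))
        self.docs.append((text, quantize_int8(embedding)))

    def search(self, embedding: Sequence[float]) -> Tuple[Optional[str], str]:
        pos = self.by_key.get(self.key_fn(embedding))   # O(1) dict lookup on the key
        if pos is not None:
            return self.docs[pos][0], "lattice_exact"
        if not self.docs:
            return None, "miss"
        q = quantize_int8(embedding)                    # brute-force dense fallback
        best = max(self.docs, key=lambda doc: cosine(q, doc[1]))
        return best[0], "fallback"
```

The dense tier here is a linear scan for clarity; with the `[faiss]` extra, an ANN index would take that role.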
HammingRouter — Catch Paraphrases at Scale
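HammingRouter caches full Q&A pairs and matches incoming queries by Hamming distance on their E8 keys, i.e. the number of blocks (out of 128) whose addresses differ. Thresholds between 70 and 111 blocks catch paraphrases while controlling false positives; a higher threshold raises recall at the cost of more false positives.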
```python
from latticememory import HammingRouter

router = HammingRouter(threshold=100)  # tune per domain

# Index known Q&A pairs
router.add("What is your cancellation policy?", answer="Cancel anytime, no fee.", intent="cancel")
router.add("How do I cancel my subscription?", answer="Cancel anytime, no fee.", intent="cancel")

# Match a paraphrase
match = router.match("Can I cancel at any time?")
if match:
    print(match.answer)            # "Cancel anytime, no fee."
    print(match.hamming_distance)  # e.g. 97
```
Threshold guidance (BANKING77 benchmark):
| Threshold | Recall | FP rate | Use case |
|---|---|---|---|
| 70 | 4.5% | 0.0% | Proxy default — zero false positives |
| 100 | 52.5% | 0.0% | Practical helpdesk operating point |
| 111 | 84.0% | 4.5% | Router default — calibrate per domain |
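A minimal sketch of threshold matching over byte keys, assuming a query matches when its distance to a stored key is at most the threshold (consistent with recall rising as the threshold grows in the table above). `HammingRouterSketch` and `Match` are hypothetical names, the keys are passed in directly instead of being computed from text, and the lookup is a linear scan for clarity rather than the O(1) structure the library describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Match:
    answer: str
    intent: str
    hamming_distance: int

def hamming_blocks(a: bytes, b: bytes) -> int:
    """Number of 8-dim blocks whose 1-byte E8 address differs."""
    if len(a) != len(b):
        raise ValueError("keys must have equal length")
    return sum(x != y for x, y in zip(a, b))

class HammingRouterSketch:
    """Hypothetical router: returns the closest stored entry within the threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.entries: List[Tuple[bytes, str, str]] = []  # (key, answer, intent)

    def add(self, key: bytes, answer: str, intent: str) -> None:
        self.entries.append((key, answer, intent))

    def match(self, key: bytes) -> Optional[Match]:
        best: Optional[Match] = None
        for stored, answer, intent in self.entries:
            d = hamming_blocks(key, stored)
            if d <= self.threshold and (best is None or d < best.hamming_distance):
                best = Match(answer, intent, d)
        return best
```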
LLM Cache Proxy
Drop-in OpenAI-compatible HTTP proxy. Same prompt or near-paraphrase returns the cached response without hitting the upstream model.
pip install 'lattice-memory-e8[proxy]'
```bash
lattice serve --key sk-... --cache helpdesk.db --miss-log misses.jsonl --port 8000
```
Or with Docker:
```bash
OPENAI_API_KEY=sk-... docker-compose up
```
Point your OpenAI client at http://localhost:8000 — no other code changes needed.
Features:
- `X-Lattice-Cache: HIT/MISS` and `X-Lattice-Savings-USD` on every response
- Streaming SSE + non-streaming JSON
- SQLite persistence (survives process restart)
- HammingRouter approximate cache in `shadow` or `serve` mode
- TTL per-entry expiry
- Compliance mode: only serve pre-approved responses (for regulated industries)
- Admin CRUD API gated by `X-Lattice-Admin-Key`
- Warm-start from CSV/JSON/JSONL
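SQLite persistence, per-entry TTL, and the compliance-mode gate can be pictured with the standard-library `sqlite3` module. This is a hypothetical sketch: the `TTLCacheSketch` name, the table layout, and the `approved` flag are assumptions for illustration rather than the proxy's actual schema, and the string key stands in for whatever key the proxy derives from a prompt.

```python
import sqlite3
import time
from typing import Callable, Optional

class TTLCacheSketch:
    """Hypothetical persistent cache with per-entry expiry and a compliance gate."""

    def __init__(self, path: str = ":memory:", compliance: bool = False,
                 clock: Callable[[], float] = time.time):
        self.db = sqlite3.connect(path)   # a file path survives process restarts
        self.clock = clock
        self.compliance = compliance
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            " key TEXT PRIMARY KEY, response TEXT NOT NULL,"
            " expires_at REAL, approved INTEGER NOT NULL DEFAULT 0)"
        )

    def put(self, key: str, response: str, ttl: Optional[float] = None,
            approved: bool = False) -> None:
        expires = self.clock() + ttl if ttl is not None else None
        with self.db:  # the connection context manager commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                (key, response, expires, int(approved)),
            )

    def get(self, key: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT response, expires_at, approved FROM entries WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        response, expires_at, approved = row
        if expires_at is not None and expires_at <= self.clock():
            return None  # expired entry: treat as a miss
        if self.compliance and not approved:
            return None  # compliance mode: never serve unapproved responses
        return response
```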
LangChain Integration
```bash
pip install lattice-memory-e8 langchain-core langchain-openai
```
```python
from langchain_openai import ChatOpenAI
from langchain_core.globals import set_llm_cache
from latticememory.integrations.langchain import LatticeMemoryCache

set_llm_cache(LatticeMemoryCache())
llm = ChatOpenAI(model="gpt-4o")

llm.invoke("What is the capital of France?")   # miss: calls API
llm.invoke("What is the capital of France?")   # hit: O(1) key match
llm.invoke("Which city is France's capital?")  # likely hit: same E8 neighborhood
```
Deduplication
```python
from latticememory import LatticeTrainingCleaner, RFSnapSemanticCache

# `cache` is an RFSnapSemanticCache, built as in the answer-lookup example above

# batch dedup
cleaner = LatticeTrainingCleaner(cache)
result = cleaner.clean([
    "The quick brown fox jumps over the lazy dog.",
    "A fast brown fox leaped over a sleeping dog.",  # near-duplicate
    "Machine learning is a branch of artificial intelligence.",
])
print(result.kept_count)       # 2
print(result.duplicate_count)  # 1
print(result.dedup_rate)       # 0.333...

# streaming dedup (generator); `large_corpus` and `process` are your own data and handler
for unique_text in cleaner.stream(iter(large_corpus)):
    process(unique_text)
```
Or via CLI:
```bash
lattice dedup corpus.jsonl --text-col text --output corpus_deduped.jsonl
```
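Key-collision dedup reduces to one set lookup per item, which is where the O(N) pass comes from. The sketch below is hypothetical: `stream_dedup` and `toy_key` are illustrative names, and `toy_key` (case- and whitespace-normalized text) only stands in for the E8 key. Unlike a real E8 key, it cannot catch paraphrases such as the fox example above.

```python
from typing import Callable, Hashable, Iterable, Iterator

def stream_dedup(texts: Iterable[str], key_fn: Callable[[str], Hashable]) -> Iterator[str]:
    """Yield each text whose key has not been seen before (one set lookup per item)."""
    seen = set()
    for text in texts:
        key = key_fn(text)
        if key in seen:
            continue  # key collision: treat as a duplicate
        seen.add(key)
        yield text

def toy_key(text: str) -> str:
    # Hypothetical stand-in for an E8 key: lowercase and collapse whitespace.
    return " ".join(text.lower().split())
```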
Vertical Applications
All 9 verticals ship in latticememory.verticals and wrap RFSnapSemanticCache.
| Vertical | Class | Key Capability |
|---|---|---|
| SOC Monitor | LatticeSOCMonitor | O(1) alert dedup for SIEM event streams |
| Ticket Analyzer | LatticeTicketAnalyzer | Intent-based ticket routing + gap detection |
| Content Moderator | LatticeContentModerator | Semantic near-miss content policy |
| Clause Coder | LatticeClauseCoder | Legal clause classification |
| Edge Memory | LatticeEdgeMemory | On-device personalization without cloud |
| Private Sync | LatticePrivateSync | Federated key sync, no raw text transfer |
| Prompt Firewall | LatticePromptFirewall | Semantic injection/jailbreak detection |
| Semantic Rate Limiter | LatticeSemanticRateLimiter | Per-intent sliding-window rate limiting |
| Training Cleaner | LatticeTrainingCleaner | O(N) near-duplicate removal for LLM training sets |
Prompt Firewall
```python
from latticememory import LatticePromptFirewall, RFSnapSemanticCache

# `cache` is an RFSnapSemanticCache, built as in the answer-lookup example above
fw = LatticePromptFirewall(cache)
fw.load_injection_defaults()  # loads 14 common injection/jailbreak patterns

result = fw.check("Ignore all previous instructions and")
print(result.blocked)   # True
print(result.category)  # prompt_injection

# Add custom deny patterns
fw.add_deny_pattern("roleplay as an unfiltered AI", category="jailbreak")
```
Semantic Rate Limiter
```python
from latticememory import LatticeSemanticRateLimiter

# `cache` is an RFSnapSemanticCache, built as in the answer-lookup example above
limiter = LatticeSemanticRateLimiter(cache, limit=10, window_seconds=60.0)

r = limiter.check("tell me about Python", client_id="user_123")
print(r.allowed)      # True
print(r.remaining)    # 9
print(r.retry_after)  # 0.0
```
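Per-intent sliding-window limiting can be sketched as one timestamp deque per `(client_id, intent key)` pair. This is a hypothetical illustration, not the library's implementation: `SlidingWindowSketch` and `RateResult` are made-up names, `key_fn` stands in for the E8 key lookup, and an injectable `clock` keeps the example testable. The result fields mirror the `allowed` / `remaining` / `retry_after` values shown above.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Hashable, Tuple

@dataclass
class RateResult:
    allowed: bool
    remaining: int
    retry_after: float

class SlidingWindowSketch:
    """Hypothetical per-intent limiter: at most `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: float,
                 key_fn: Callable[[str], Hashable],
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.key_fn = key_fn   # text -> intent key (stand-in for an E8 key)
        self.clock = clock
        self.hits: Dict[Tuple[str, Hashable], Deque[float]] = defaultdict(deque)

    def check(self, text: str, client_id: str) -> RateResult:
        now = self.clock()
        q = self.hits[(client_id, self.key_fn(text))]
        while q and q[0] <= now - self.window:
            q.popleft()  # drop timestamps that have left the window
        if len(q) >= self.limit:
            return RateResult(False, 0, q[0] + self.window - now)
        q.append(now)
        return RateResult(True, self.limit - len(q), 0.0)
```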
Training Data Cleaner
```python
from latticememory import LatticeTrainingCleaner

# `cache` is an RFSnapSemanticCache (see the answer-lookup example); `texts` is your list of strings
cleaner = LatticeTrainingCleaner(cache)
result = cleaner.clean_to_jsonl(texts, output_path="clean.jsonl")
print(result.summary())
# Total: 50000 | Kept: 43217 | Duplicates removed: 6783 (13.6%)
```
Agent Memory Sync
AgentMemorySync lets agents in a swarm share only the E8 keys they are missing — no embedding transfer, just 128-byte addresses.
```python
from latticememory import AgentMemorySync

# Two independent agents; rt_a and rt_b are separate runtimes
# (e.g. RFSnapTextMemory instances, as in the answer-lookup example)
agent_a = AgentMemorySync(runtime=rt_a)
agent_b = AgentMemorySync(runtime=rt_b)

# Register peers
agent_a.register_peer(agent_b)

# Pull-sync: B gets everything A knows
agent_b.sync_from_peer(agent_a)

# Push-broadcast: A broadcasts a new key to all registered peers
new_key = next(iter(agent_a.get_known_keys()))
agent_a.share(new_key)  # agent_b receives it immediately

# Diff: check what each side is missing
diff = agent_a.diff(agent_b.get_known_keys())
# {"extra": set(), "missing": set()}  ← fully in sync
```
See examples/agent_swarm_demo.py for a complete end-to-end scenario.
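Because only keys move between agents, the diff and pull steps reduce to set arithmetic on 128-byte addresses. A hypothetical sketch: `diff_keys` and `pull_sync` are illustrative names, and the reading of `"extra"` as keys this side has that the peer lacks (and `"missing"` as the reverse) is an assumption consistent with the empty-sets-when-synced output above.

```python
from typing import Dict, Set

def diff_keys(mine: Set[bytes], theirs: Set[bytes]) -> Dict[str, Set[bytes]]:
    """Assumed semantics: 'extra' = only on my side, 'missing' = only on the peer's side."""
    return {"extra": mine - theirs, "missing": theirs - mine}

def pull_sync(local: Set[bytes], remote: Set[bytes]) -> Set[bytes]:
    """Copy the keys `local` lacks from `remote`; returns the keys transferred."""
    missing = remote - local
    local |= missing  # in-place update: only key bytes move, no embeddings
    return missing
```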
Active Learning Flywheel
Every proxy cache miss can be logged. LatticeFlywheel clusters miss logs by E8 key proximity to surface emerging intent gaps — groups of queries the cache doesn't cover yet.
```python
from latticememory import LatticeFlywheel

fw = LatticeFlywheel("misses.jsonl")

# From your proxy, log each miss (e8_key: hex string of the query's E8 key):
fw.log_miss("How do I bulk export my contacts?", e8_key_hex=e8_key)

# Detect drifting intents (new query patterns emerging):
drifting = fw.detect_drift(window_seconds=7*86400, min_delta=5)
for cluster in drifting:
    print(f"+{cluster['delta']} queries: {cluster['representative']!r}")

# Check if re-training is warranted:
if fw.should_finetune():
    print("Recommend: add Q&A pairs for these new intent clusters")
```
Or via CLI:
```bash
lattice drift --log misses.jsonl --window 604800 --export drift_report.json
```
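One simple way to picture drift detection is to compare per-cluster miss counts in the current window against the previous window of equal length. This is a hypothetical sketch, not `LatticeFlywheel`'s algorithm: it groups misses by exact key (the flywheel clusters by key proximity), and the `ts` field and record layout are assumptions. Only `e8_key_hex`, `delta`, and `representative` echo names used above.

```python
import json
from collections import Counter
from typing import Dict, List

def detect_drift_sketch(records: List[dict], now: float,
                        window_seconds: float, min_delta: int) -> List[dict]:
    """Clusters whose miss count grew by at least `min_delta` vs the previous window."""
    current: Counter = Counter()
    previous: Counter = Counter()
    representative: Dict[str, str] = {}
    for rec in records:
        age = now - rec["ts"]          # assumed: Unix timestamp of the miss
        key = rec["e8_key_hex"]        # simplification: exact key = cluster
        if 0 <= age < window_seconds:
            current[key] += 1
            representative.setdefault(key, rec["query"])
        elif window_seconds <= age < 2 * window_seconds:
            previous[key] += 1
    drifting = [
        {"key": k, "delta": n - previous[k], "representative": representative[k]}
        for k, n in current.items()
        if n - previous[k] >= min_delta
    ]
    return sorted(drifting, key=lambda c: c["delta"], reverse=True)

def read_miss_log(path: str) -> List[dict]:
    """Load a JSONL miss log (one JSON object per line)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```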
CLI Reference
| Command | What it does |
|---|---|
| `lattice populate` | Load Q&A pairs from CSV/JSON into a SQLite cache |
| `lattice inspect` | Print cache statistics |
| `lattice export` | Export all cache entries to a portable JSONL file |
| `lattice import` | Re-import a JSONL export into a new cache |
| `lattice gaps` | Show top miss clusters (unmet query intents) |
| `lattice drift` | Detect drifting intents + finetune recommendation |
| `lattice dedup` | Deduplicate a text file using E8 lattice hashing |
| `lattice serve` | Start the proxy server |
| `lattice analytics` | Fetch live analytics from a running proxy |
CLI IDE
lattice ide opens a local terminal command center for BYOK AI chat, cache operations,
proxy diagnostics, vertical discovery, and VS Code CLI bridging.
```bash
export LATTICE_IDE_BASE_URL=https://api.openai.com/v1
export LATTICE_IDE_MODEL=gpt-4o-mini
export LATTICE_IDE_API_KEY=sk-...

lattice ide chat "Summarize the current cache analytics"
lattice ide cache inspect --cache helpdesk.db
lattice ide proxy doctor --port 8000
lattice ide verticals list
lattice ide vscode status
```
Run `lattice ide` with no arguments for an interactive `lm>` shell. The initial IDE release uses OpenAI-compatible chat endpoints, so it works with OpenAI and with compatible BYOK gateways. VS Code integration uses the installed `code` command and does not require a VS Code extension.
How It Works
```text
float32 embedding [1024-dim]
  → 128 blocks of 8 floats
  → each block → nearest E8 Shell-1 point (240 possible addresses)
  → 1-byte address per block = 128-byte E8 key   ← used for cache routing (32× vs float32)
  → optional 2-byte scale per block = full 384-byte quantized representation

query → same key         → O(1) lattice_exact lookup
query → Hamming-N neighbor → O(1) HammingRouter lookup
query → no neighbor found  → dense fallback (Int8 or float32 ANN)
```
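The E8 key is a deterministic function of the embedding: the same text produces the same key every time, with no cosine threshold to tune. Texts whose embeddings snap to the same E8 point in every block share a key exactly; close paraphrases that differ in some blocks are caught by Hamming distance instead.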
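The snapping step in the pipeline above can be sketched in pure Python: enumerate the 240 shell-1 points (minimal vectors) of E8 and give each 8-dim block the address of its nearest point. This is an illustration under stated assumptions: the address ordering, any per-block normalization, and tie-breaking are guesses rather than LatticeMemory's actual address table, the optional 2-byte scale is omitted, and the fine-tuned bge-large-e8-snap encoder is not involved.

```python
import itertools
from typing import List, Sequence, Tuple

def e8_shell1_roots() -> List[Tuple[float, ...]]:
    """The 240 minimal vectors of the E8 lattice (squared norm 2)."""
    roots = []
    # 112 roots: exactly two nonzero coordinates, each +1 or -1
    for i, j in itertools.combinations(range(8), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    # 128 roots: every coordinate +-1/2, with an even number of minus signs
    for signs in itertools.product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            roots.append(signs)
    return roots

# Hypothetical address table: list position 0..239 is the 1-byte address.
ROOTS = e8_shell1_roots()

def snap_block(block: Sequence[float]) -> int:
    """Address of the nearest shell-1 point. All roots share the same norm, so
    minimizing Euclidean distance is equivalent to maximizing the dot product."""
    return max(range(len(ROOTS)), key=lambda a: sum(x * r for x, r in zip(block, ROOTS[a])))

def e8_key(embedding: Sequence[float]) -> bytes:
    """One address byte per 8-dim block: 1024 dims -> 128-byte key."""
    if len(embedding) % 8:
        raise ValueError("embedding length must be a multiple of 8")
    return bytes(snap_block(embedding[k:k + 8]) for k in range(0, len(embedding), 8))
```

Identical embeddings always yield identical keys, which is what makes the exact-hit path a plain dictionary lookup.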
Redis Backend
For multi-instance deployments sharing a single cache:
```python
from latticememory import LatticeRedisStore, RFSnapSemanticCache, patch_cache_with_redis

cache = RFSnapSemanticCache(...)
patch_cache_with_redis(cache, redis_url="redis://localhost:6379", namespace="helpdesk")
# Now cache._entries reads/writes Redis instead of the in-memory dict
```
Test Suite
508 tests, all passing:
```bash
python -m pytest tests/ -q
# 508 passed in ~70s
```
Design Partners
We're looking for 3 teams with high-repetition LLM workloads (support bots, document QA, internal search) to pilot semantic cache + dedup at no cost.
License
MIT
Download files
File details
Details for the file lattice_memory_e8-0.2.0.tar.gz.
File metadata
- Download URL: lattice_memory_e8-0.2.0.tar.gz
- Size: 240.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4dfff19e1c2df59cf053aac1d5d5c01e349e8b53f638c1ef18f7afc08dfffe5 |
| MD5 | 951391e526e67cb401c2ff9591963eb9 |
| BLAKE2b-256 | 5272e2aabb918f7a2608342dd57f8e746c9530856eb60f0fc1d3b1b905bc3880 |
File details
Details for the file lattice_memory_e8-0.2.0-py3-none-any.whl.
File metadata
- Download URL: lattice_memory_e8-0.2.0-py3-none-any.whl
- Size: 194.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f2d1291cf679ca9b56202dbd141c95dfafaf401a35ac65fe2f835868cfd12e7 |
| MD5 | 367d4654a0ca4b7316955c81c91693d3 |
| BLAKE2b-256 | 5ac952dbe625d37d8017a4659d78714d7efb93d7c04cd0423deb5b77e850e7bc |