KV cache fingerprinting for persistent cross-session LLM memory. Fourier decomposition achieves 98% Recall@1 at 51µs.

ENGRAM Protocol

KV cache fingerprinting for persistent cross-session semantic retrieval.

ENGRAM extracts Fourier fingerprints from LLM KV caches, stores them as compact binary certificates (.eng files), and retrieves them via HNSW approximate nearest neighbor search. This enables cross-session memory, semantic deduplication, and KV cache restoration for large language models.
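The description above does not spell out how a fingerprint is computed. The following is a minimal sketch, assuming that "f0+f1" refers to the two lowest DFT frequency bins taken along the layer axis of a per-layer pooled cache vector. The function name, the input shape, and the normalization are all hypothetical and are not taken from the ENGRAM source.

```python
import cmath
import math


def fourier_fingerprint(layers: list[list[float]]) -> list[float]:
    """Illustrative f0+f1 fingerprint; NOT the ENGRAM implementation.

    `layers` holds one pooled vector per transformer layer (assumed shape:
    n_layers x dim). For every feature we take the DFT along the layer
    axis and keep the magnitudes of frequency bins 0 and 1.
    """
    n_layers = len(layers)
    dim = len(layers[0])
    f0, f1 = [], []
    for d in range(dim):
        column = [layers[layer][d] for layer in range(n_layers)]
        # DFT bin k is sum_l x[l] * exp(-2*pi*i*k*l / n_layers).
        bin0 = sum(column)
        bin1 = sum(x * cmath.exp(-2j * math.pi * layer / n_layers)
                   for layer, x in enumerate(column))
        f0.append(abs(bin0) / n_layers)
        f1.append(abs(bin1) / n_layers)
    vec = f0 + f1
    # Unit-normalize so that an inner product behaves like cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Four layers, two features: output has 2 * dim = 4 entries.
fp = fourier_fingerprint([[1.0, 0.0], [1.0, 2.0], [1.0, 0.0], [1.0, 2.0]])
```

Under these assumptions the fingerprint length is twice the feature dimension, and unit normalization lets an inner-product index rank by cosine similarity.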

Key Features

  • Fourier fingerprinting of LLM KV caches (f0+f1 DFT decomposition)
  • 4-stage geodesic retrieval pipeline with confidence scoring
  • HNSW index (faiss) for sub-millisecond search at scale
  • Multi-architecture support: Llama, Gemma, Gemma 4 (ISWA), Phi, Qwen, Mistral
  • EIGENGRAM binary format (v1.2) - portable, versioned .eng certificates
  • MCP server for Claude Code session memory integration
  • Knowledge index for semantic search over markdown documentation

Metrics

Metric              Value
Recall@1 (N=200)    100.0%
HNSW speedup        5.7x vs brute-force
Tests               220 passing
Architectures       llama, gemma, gemma4/ISWA, phi, qwen, mistral
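For readers unfamiliar with the metric, Recall@1 is commonly taken to mean the fraction of queries whose top-ranked result is the expected entry. The sketch below computes it with a brute-force inner-product search over toy vectors. The function name and data are illustrative only and do not reproduce the project's benchmark.

```python
def recall_at_1(queries, index, expected_ids):
    """Fraction of queries whose best inner-product match is the expected id.

    queries: list of vectors
    index: dict mapping id -> stored vector
    expected_ids: the id each query is supposed to retrieve
    """
    hits = 0
    for query, expected in zip(queries, expected_ids):
        best_id = max(
            index,
            key=lambda i: sum(q * v for q, v in zip(query, index[i])),
        )
        hits += best_id == expected
    return hits / len(queries)


index = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
# The third query is closest to "a" but expects "b", so it counts as a miss.
score = recall_at_1(queries, index, ["a", "b", "b"])
```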

Quick Start

# Clone and setup
git clone https://github.com/engram-protocol/engram.git
cd engram
./scripts/setup.sh

# Run tests
source .venv/bin/activate
KMP_DUPLICATE_LIB_OK=TRUE OMP_NUM_THREADS=1 PYTHONPATH=. pytest tests/ -x -q

# Start the API server
cp .env.template .env
# Edit .env with your model path
engram-server

Installation

From source (recommended)

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install core dependencies
pip install -e .

# Optional: sentence-transformers embedder (recommended)
pip install -e ".[sbert]"

# Optional: MCP server for Claude Code
pip install -e ".[mcp]"

# Optional: development tools
pip install -e ".[dev]"

Requirements

  • Python >= 3.11
  • PyTorch >= 2.3.0
  • faiss-cpu >= 1.8.0 (install with pip, not from conda-forge)

Project Structure

kvcos/                     Core library
  core/                    Foundation: types, parsing, fingerprinting, extraction
    types.py               Type system (ModelCacheSpec, CacheSection, AttentionType)
    blob_parser.py         llama.cpp state blob parser (standard + ISWA)
    fingerprint.py         Fourier fingerprint computation (v1, v2, ISWA)
    cache_spec.py          Model registry (Llama, Gemma, Phi, Qwen, Mistral)
    state_extractor.py     MAR state extraction (SVD, mean_pool, xKV)
    manifold_index.py      FAISS IndexFlatIP wrapper
    retriever.py           EGR retrieval pipeline orchestrator
    serializer.py          EIGENGRAM compression codec
    config.py              Centralized pydantic-settings config
  engram/                  High-level retrieval and session memory
    retrieval.py           4-stage geodesic retrieval pipeline
    hnsw_index.py          HNSW index wrapper (EngramIndex)
    index_c.py             SQLite confidence history (IndexC)
    knowledge_index.py     HNSW over knowledge .eng files
    embedder.py            Unified fingerprint: llama_cpp > sbert > hash
    format.py              EIGENGRAM binary format v1.2 codec
    chunker.py             Markdown-aware semantic chunker
    manifest.py            Knowledge index manifest registry
    session_propagator.py  Session lifecycle manager
    metadata_disambiguate.py  Stage 4 metadata tiebreaker
  api/                     FastAPI REST API
    server.py              Application factory + lifespan
    routes.py              API route handlers
    schemas.py             Pydantic request/response models
  client/                  Python client library
    python_client.py       Sync HTTP client (EngramClient)
  storage/                 Storage backends
    local.py               Local filesystem backend
integrations/              External LLM runtime bridges
  llama_cpp_bridge.py      llama-cpp-python bridge (KV extraction + injection)
mcp/                       MCP server for Claude Code
  engram_memory.py         7 tools: session + knowledge memory
scripts/                   CLI utilities
  setup.sh                 One-command environment setup
  index_knowledge.py       Batch markdown indexer
  demo_agent_session.py    End-to-end demo
tests/                     220 tests (pytest)

Retrieval Pipeline

ENGRAM uses a 4-stage geodesic retrieval pipeline with confidence scoring, preceded by a Stage 0 pre-check:

Stage 0: Prior preemption       IndexC chronic failure -> skip HNSW
Stage 1: HNSW search            -> HIGH / MEDIUM confidence
Stage 2: Trajectory correction  -> MEDIUM (interpolation w=0.3)
Stage 3: Negative constraints   -> LOW (apophatic layer)
Stage 4: Metadata disambig      -> LOW + stage4_used=True

Entry point: geodesic_retrieve_stage4() in kvcos/engram/retrieval.py

Configuration

Copy .env.template to .env and configure:

ENGRAM_PORT=8080                # API server port
ENGRAM_DATA_DIR=~/.engram/data  # Storage directory
ENGRAM_MODEL_PATH=              # Path to GGUF model file
ENGRAM_N_CTX=16384              # Context window size

All configuration uses the ENGRAM_ prefix via pydantic-settings.
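To illustrate what prefix-based loading means in practice, here is a standard-library stand-in that maps ENGRAM_-prefixed variables onto typed fields. It is not the project's config.py. pydantic-settings provides this behaviour through its env_prefix setting, and the class name, field names, and defaults below are assumptions derived from the template values shown above.

```python
import os
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class EngramSettings:
    # Defaults mirror the .env.template values; treating them as code
    # defaults is an assumption.
    port: int = 8080
    data_dir: Path = Path("~/.engram/data")
    model_path: str = ""
    n_ctx: int = 16384


def load_settings(environ=None, prefix="ENGRAM_"):
    """Build settings from PREFIX + FIELD_NAME variables, e.g. ENGRAM_PORT."""
    env = os.environ if environ is None else environ
    overrides = {}
    for field in fields(EngramSettings):
        raw = env.get(prefix + field.name.upper())
        if raw:  # unset or empty values keep the default
            # Cast the string to the same type as the field's default.
            overrides[field.name] = type(field.default)(raw)
    return EngramSettings(**overrides)


settings = load_settings({"ENGRAM_PORT": "9000", "ENGRAM_DATA_DIR": "/tmp/engram"})
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in tests.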

MCP Integration (Claude Code)

ENGRAM includes an MCP server for persistent session memory:

# Register globally
claude mcp add --scope user engram-memory \
  -e ENGRAM_SESSIONS_DIR=~/.engram/sessions \
  -- python3 mcp/engram_memory.py

Tools: write_session_engram, get_last_session, retrieve_relevant_sessions, get_relevant_context, list_indexed, index_knowledge

Multi-Architecture Support

ENGRAM supports standard and ISWA (Interleaved Sliding Window Attention) models:

Architecture    Type                 Fingerprint Dim
Llama 3.x       Standard             2048
Gemma 2         Standard             2048
Gemma 4 26B     ISWA (dual-cache)    6144
Phi 3 Mini      Standard             768
Qwen 2.5        Standard             256
Mistral 7B      Standard             2048
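Because fingerprint dimensions differ between architectures, it can help to validate a vector's length before indexing it. The sketch below copies the dimensions from the table above. The dictionary and the check_fingerprint helper are hypothetical and not part of the ENGRAM API.

```python
# Dimensions copied from the table above.
FINGERPRINT_DIMS = {
    "Llama 3.x": 2048,
    "Gemma 2": 2048,
    "Gemma 4 26B": 6144,
    "Phi 3 Mini": 768,
    "Qwen 2.5": 256,
    "Mistral 7B": 2048,
}


def check_fingerprint(architecture: str, vector: list[float]) -> list[float]:
    """Return the vector unchanged if its length matches the architecture."""
    expected = FINGERPRINT_DIMS[architecture]
    if len(vector) != expected:
        raise ValueError(
            f"{architecture} fingerprints have {expected} dims, got {len(vector)}"
        )
    return vector


ok = check_fingerprint("Qwen 2.5", [0.0] * 256)
```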

License

Apache 2.0
