Skip to main content

High-recall conversational memory retrieval. 98% R@5 on LongMemEval, 94% on LoCoMo — no LLM required. Local-first, cloud-ready.

Project description

Engram logo

Engram

High-recall memory for Conversational agents
93.9% LoCoMo • Zero LLM calls • Local-first, cloud-ready

Website PyPI CI License Python


Why Engram

LLMs are getting better — but memory is still broken.

Even the best agents:

  • forget past interactions
  • lose long-term context
  • rely on expensive reprocessing

Engram fixes this at the infrastructure layer.

No LLM calls at query time. No summarization. No paraphrasing. Your exact words, retrieved with state-of-the-art recall.

Without memory infrastructure With Engram
✕ Forgets past turns ✓ 93.9% recall across sessions
✕ Re-embeds or paraphrases on every call ✓ Exact words, retrieved verbatim
✕ $ per query, rate limits, prompt drift ✓ $0 per query, deterministic, reproducible

Benchmark Results

Tested on two major benchmarks — no LLM required, zero cost per query.

LongMemEval (500 questions)

Metric Score
R@5 98.4% (492/500)
R@10 99.4%
NDCG@5 0.934
Question Type R@5
knowledge-update 98.7%
multi-session 99.2%
single-session-assistant 100.0%
single-session-user 100.0%
temporal-reasoning 97.0%
single-session-preference 93.3%

LoCoMo (1982 questions, 10 conversations)

Metric Score
R@5 93.9% (1862/1982)
R@10 95.0%
NDCG@5 0.894
Category R@5 R@10
Single-hop (factual) 90.4% 93.3%
Temporal (dates) 93.1% 94.7%
Multi-hop (inference) 75.0% 78.3%
Contextual (details) 97.1% 97.5%
Adversarial (speaker) 94.6% 94.8%

Reported with --mode rerank (chunking + cross-encoder reranker + speaker-name injection).

What It Does

Engram stores conversation history and retrieves it with state-of-the-art accuracy. It uses a three-stage retrieval pipeline — dense embeddings, sparse keyword matching, and cross-encoder reranking — to achieve higher recall than systems relying on LLM-based extraction or summarization.

Nothing is summarized. Nothing is paraphrased. Your exact words are stored and returned.

How It Compares

LoCoMo Benchmark Comparison

Disclaimer: Results are compiled from multiple papers and evaluation reports. They are not directly comparable due to differences in backbone LLMs, prompting strategies, and evaluation setups.

System LoCoMo Accuracy LLM Required Open Source Source
Engram 93.9% (R@5) No Yes (MIT) This repo (reproducible)
EverMemOS 86.76% – 93.05% Yes No arXiv:2601.02163
Zep 85.22% Yes Partial EverMemOS evaluation
MemOS 80.76% Yes Partial EverMemOS evaluation
Mem0 64.20% Yes Partial EverMemOS evaluation
MemU 61.15% Yes Partial arXiv:2601.02163
Other LLM-based systems (Hindsight, MemGPT, Letta) ~83 – 92% Yes Varies Secondary reports
Non-LLM systems (SLM variants) ~74 – 75% No Yes Secondary reports

Engram is the top-performing system on LoCoMo — and the only one in the top tier with zero LLM calls at query time.

LongMemEval

Engram MemPalace Mem0
R@5 (LongMemEval) 98.4% 96.6%
Embedding model bge-large (1024d) all-MiniLM (384d) Varies
Sparse retrieval BM25 + RRF fusion Ad-hoc keyword overlap N/A
Reranking Cross-encoder (free) LLM call ($0.001/q) N/A
Indexing User + assistant + preference docs User turns only LLM-extracted facts
Cloud deployment Qdrant backend No Yes
LLM required No No (optional rerank) Yes

Install

pip install engram-search

Optional extras:

# With cloud backend (Qdrant)
pip install engram-search[cloud]

# With cross-encoder reranker
pip install engram-search[rerank]

# Everything (dev + cloud + rerank)
pip install engram-search[all]

Quickstart — CLI

# Initialize a memory store
engram init ./my_memories

# Ingest conversations
engram ingest conversations.json --store ./my_memories

# Search
engram search "why did we switch to GraphQL" --store ./my_memories

Quickstart — Python API

from engram.backends.faiss_backend import FaissBackend
from engram.backends.base import Document
from engram.ingestion.parser import session_to_documents
from engram.retrieval.embedder import Embedder
from engram.retrieval.pipeline import RetrievalPipeline

# Initialize
embedder = Embedder("bge-large")
backend = FaissBackend(path="./my_memories", dimension=1024)
pipeline = RetrievalPipeline(embedder=embedder)

# Ingest a conversation
turns = [
    {"role": "user", "content": "I'm switching our API from REST to GraphQL."},
    {"role": "assistant", "content": "What's driving the switch?"},
    {"role": "user", "content": "Too many round trips. Our mobile app makes 12 calls per screen."},
]
docs = session_to_documents(turns, session_id="session_1", timestamp="2025-01-15")
texts = [d["text"] for d in docs]
embeddings = embedder.encode_documents(texts)

documents = [
    Document(id=d["id"], text=d["text"], embedding=e.tolist(), metadata=d["metadata"])
    for d, e in zip(docs, embeddings)
]
backend.add(documents)

# Search
results = pipeline.search("why did we switch to GraphQL", documents=documents, top_k=3)
for r in results:
    print(r.text)

Quickstart — Cloud Mode

# Set up Qdrant (managed or self-hosted)
export ENGRAM_BACKEND=qdrant
export ENGRAM_QDRANT_URL=https://your-cluster.qdrant.io:6333
export ENGRAM_QDRANT_API_KEY=your-api-key

# Start the API server
pip install fastapi uvicorn
uvicorn engram.server:app --host 0.0.0.0 --port 8000

API Endpoints

Method Endpoint Description
POST /ingest Add conversations
POST /search Search memories
GET /health Health check
GET /stats Store statistics

Quickstart — MCP Server

Expose Engram as a Model Context Protocol tool for Claude Desktop, Cursor, Windsurf, Zed, and other MCP clients.

pip install "engram-search[mcp]"
engram init ./engram_store   # create a store

Add to claude_desktop_config.json (or your MCP client's equivalent):

{
  "mcpServers": {
    "engram": {
      "command": "engram-mcp",
      "env": {
        "ENGRAM_STORE_PATH": "/absolute/path/to/engram_store"
      }
    }
  }
}

Restart the client. Engram exposes three tools:

Tool Description
search_memory(query, top_k, min_score) Retrieve relevant memories
add_memory(text, metadata) Store a new memory fact
memory_stats() Count documents in the store

Use Cases

  • AI assistants with long-term memory — recall user preferences, past decisions, and prior context across sessions
  • Customer support agents — pull a customer's full history on every interaction without re-feeding transcripts to an LLM
  • Agent memory layer — give autonomous agents persistent memory across runs without blowing up the context window
  • Multi-session chatbots — resolve references to prior conversations ("like we discussed last week") without re-embedding history
  • RAG over conversations — index dialogues, meeting transcripts, or support tickets with higher recall than vanilla semantic search

Examples

Check out the interactive notebooks in examples/:

Notebook Description
Getting Started Ingest conversations, search memories, understand hybrid retrieval
Customer Support Build a support agent with full customer history recall
Personal Assistant AI assistant with long-term memory across conversations

Docker

# Local mode
docker compose up

# Or build and run directly
docker build -t engram .
docker run -p 8000:8000 -v engram_data:/data engram

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Engram                               │
│                                                             │
│  ┌────────────┐  ┌─────────────┐  ┌───────────────────┐    │
│  │ Ingestion  │  │   Index     │  │    Retrieval      │    │
│  │            │→ │             │→ │                   │    │
│  │ user+asst  │  │ FAISS (local│  │ 1. Dense (bi-enc) │    │
│  │ turns      │  │  or Qdrant  │  │ 2. BM25 (sparse)  │    │
│  │ preference │  │ (cloud)     │  │ 3. RRF fusion     │    │
│  │ extraction │  │             │  │ 4. Cross-encoder   │    │
│  └────────────┘  └─────────────┘  └───────────────────┘    │
│                                                             │
│  Local: FAISS + SQLite    Cloud: Qdrant + REST API          │
└─────────────────────────────────────────────────────────────┘

Run Benchmarks

LongMemEval

# Download dataset
curl -fsSL -o data/longmemeval_s_cleaned.json \
  https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json

pip install engram-search[all]

python benchmarks/longmemeval_bench.py data/longmemeval_s_cleaned.json --mode hybrid

LoCoMo

# Download dataset (from Snap Research)
curl -fsSL -o data/locomo10.json \
  https://raw.githubusercontent.com/snap-research/locomo/main/data/locomo10.json

python benchmarks/locomo_bench.py data/locomo10.json --mode rerank

Requirements

  • Python 3.9+
  • ~1.3 GB disk for bge-large embedding model (downloaded on first use)
  • No API keys required for local mode

Roadmap

  • LangChain + LlamaIndex integrations — drop-in memory modules for existing agent stacks
  • MCP server — expose Engram as a Model Context Protocol tool for Claude, Cursor, and other MCP clients
  • Streaming ingestion — append turns to a live session without re-indexing
  • Multi-tenant isolation — per-user namespaces for hosted deployments
  • Async API — non-blocking ingest/search for high-throughput workloads
  • More backends — pgvector, Weaviate, Pinecone adapters
  • Temporal reasoning boost — improved date-grounding for "when did we..." queries
  • Benchmark expansion — add MSC, DialogSum, and custom domain benchmarks

Have a use case we're missing? Open an issue.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

engram_search-0.1.6.tar.gz (329.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

engram_search-0.1.6-py3-none-any.whl (31.9 kB view details)

Uploaded Python 3

File details

Details for the file engram_search-0.1.6.tar.gz.

File metadata

  • Download URL: engram_search-0.1.6.tar.gz
  • Upload date:
  • Size: 329.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for engram_search-0.1.6.tar.gz
Algorithm Hash digest
SHA256 aa12042d2135d9fef195ed5bb8aafeaf8ee5fcfa87cb1002404d5a3c183c8d0e
MD5 72e6bac411bd9a5b5a59963d3c5bcb0b
BLAKE2b-256 7fa66ff4ddd84fbf1a5388fa8f7f15e3b2dbaf82b17c94344bc53f1258f5c43e

See more details on using hashes here.

Provenance

The following attestation bundles were made for engram_search-0.1.6.tar.gz:

Publisher: publish.yml on Nitin-Gupta1109/engram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file engram_search-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: engram_search-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 31.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for engram_search-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 cfd3187358f939cc18b1758928b4016187dc1c075a37cfb1522345db5c94d779
MD5 e6c9b1537b186276e32e605ce9f8017d
BLAKE2b-256 c72572f04978167bbb70a2ce270a5c657fc86ee2cc8f77853738f50ab329e7e4

See more details on using hashes here.

Provenance

The following attestation bundles were made for engram_search-0.1.6-py3-none-any.whl:

Publisher: publish.yml on Nitin-Gupta1109/engram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page