High-recall conversational memory retrieval. 98% R@5 on LongMemEval, 94% on LoCoMo — no LLM required. Local-first, cloud-ready.
Engram
High-recall memory for conversational agents
93.9% LoCoMo • Zero LLM calls • Local-first, cloud-ready
Why Engram
LLMs are getting better — but memory is still broken.
Even the best agents:
- forget past interactions
- lose long-term context
- rely on expensive reprocessing
Engram fixes this at the infrastructure layer.
No LLM calls at query time. No summarization. No paraphrasing. Your exact words, retrieved with state-of-the-art recall.
| Without memory infrastructure | With Engram |
|---|---|
| ✕ Forgets past turns | ✓ 93.9% recall across sessions |
| ✕ Re-embeds or paraphrases on every call | ✓ Exact words, retrieved verbatim |
| ✕ $ per query, rate limits, prompt drift | ✓ $0 per query, deterministic, reproducible |
Benchmark Results
Tested on two major benchmarks — no LLM required, zero cost per query.
LongMemEval (500 questions)
| Metric | Score |
|---|---|
| R@5 | 98.4% (492/500) |
| R@10 | 99.4% |
| NDCG@5 | 0.934 |
| Question Type | R@5 |
|---|---|
| knowledge-update | 98.7% |
| multi-session | 99.2% |
| single-session-assistant | 100.0% |
| single-session-user | 100.0% |
| temporal-reasoning | 97.0% |
| single-session-preference | 93.3% |
LoCoMo (1982 questions, 10 conversations)
| Metric | Score |
|---|---|
| R@5 | 93.9% (1862/1982) |
| R@10 | 95.0% |
| NDCG@5 | 0.894 |
| Category | R@5 | R@10 |
|---|---|---|
| Single-hop (factual) | 90.4% | 93.3% |
| Temporal (dates) | 93.1% | 94.7% |
| Multi-hop (inference) | 75.0% | 78.3% |
| Contextual (details) | 97.1% | 97.5% |
| Adversarial (speaker) | 94.6% | 94.8% |
Reported with --mode rerank (chunking + cross-encoder reranker + speaker-name injection).
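For readers unfamiliar with the metrics in the tables above, here is a minimal sketch of how R@k and NDCG@k are commonly computed for retrieval. It assumes a hit-based recall (a question counts as a hit if any relevant document appears in the top k) and binary relevance; the function names are illustrative and the exact definitions used by the benchmark scripts in this repo may differ.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant document is in the top k, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG@k: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: two questions, each with a ranked result list and a gold set.
questions = [
    (["a", "b", "c"], {"b"}),  # relevant doc at rank 2
    (["x", "y", "z"], {"q"}),  # relevant doc never retrieved
]
mean_recall = sum(recall_at_k(r, g, 2) for r, g in questions) / len(questions)
mean_ndcg = sum(ndcg_at_k(r, g, 2) for r, g in questions) / len(questions)

# The headline percentages are hit counts over question counts.
longmemeval_r5 = round(100 * 492 / 500, 1)
locomo_r5 = round(100 * 1862 / 1982, 1)
```

Averaging the per-question values over the whole benchmark gives the reported score, which is why the tables can show a fraction such as 492/500 next to the percentage.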
What It Does
Engram stores conversation history and retrieves it with state-of-the-art accuracy. It uses a three-stage retrieval pipeline — dense embeddings, sparse keyword matching, and cross-encoder reranking — to achieve higher recall than systems relying on LLM-based extraction or summarization.
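To illustrate how dense and sparse results can be combined before reranking, here is a minimal sketch of reciprocal rank fusion (RRF). The function name `rrf_fuse` and the constant `k=60` are assumptions for illustration (60 is a commonly used default for RRF); this is not Engram's internal code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents are returned best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a dense retriever and a sparse (keyword) retriever.
dense_ranking = ["t3", "t1", "t2"]
sparse_ranking = ["t1", "t4", "t3"]

fused = rrf_fuse([dense_ranking, sparse_ranking])
print(fused)  # ['t1', 't3', 't4', 't2']
```

Because RRF only looks at ranks, it needs no score normalization between the dense and sparse retrievers, which is one reason it is a popular choice for hybrid search.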
Nothing is summarized. Nothing is paraphrased. Your exact words are stored and returned.
How It Compares
LoCoMo Benchmark Comparison
Disclaimer: Results are compiled from multiple papers and evaluation reports. They are not directly comparable due to differences in backbone LLMs, prompting strategies, and evaluation setups.
| System | LoCoMo Accuracy | LLM Required | Open Source | Source |
|---|---|---|---|---|
| Engram | 93.9% (R@5) | No | Yes (MIT) | This repo (reproducible) |
| EverMemOS | 86.76% – 93.05% | Yes | No | arXiv:2601.02163 |
| Zep | 85.22% | Yes | Partial | EverMemOS evaluation |
| MemOS | 80.76% | Yes | Partial | EverMemOS evaluation |
| Mem0 | 64.20% | Yes | Partial | EverMemOS evaluation |
| MemU | 61.15% | Yes | Partial | arXiv:2601.02163 |
| Other LLM-based systems (Hindsight, MemGPT, Letta) | ~83 – 92% | Yes | Varies | Secondary reports |
| Non-LLM systems (SLM variants) | ~74 – 75% | No | Yes | Secondary reports |
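Among the results compiled here, Engram reports the highest LoCoMo score and is the only top-tier entry with zero LLM calls at query time. Its figure is retrieval recall (R@5), so the caveats in the disclaimer above apply to any direct ranking.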
LongMemEval
| | Engram | MemPalace | Mem0 |
|---|---|---|---|
| R@5 (LongMemEval) | 98.4% | 96.6% | — |
| Embedding model | bge-large (1024d) | all-MiniLM (384d) | Varies |
| Sparse retrieval | BM25 + RRF fusion | Ad-hoc keyword overlap | N/A |
| Reranking | Cross-encoder (free) | LLM call ($0.001/q) | N/A |
| Indexing | User + assistant + preference docs | User turns only | LLM-extracted facts |
| Cloud deployment | Qdrant backend | No | Yes |
| LLM required | No | No (optional rerank) | Yes |
Install
pip install engram-search
Optional extras:
# With cloud backend (Qdrant)
pip install "engram-search[cloud]"
# With cross-encoder reranker
pip install "engram-search[rerank]"
# Everything (dev + cloud + rerank)
pip install "engram-search[all]"
Quickstart — CLI
# Initialize a memory store
engram init ./my_memories
# Ingest conversations
engram ingest conversations.json --store ./my_memories
# Search
engram search "why did we switch to GraphQL" --store ./my_memories
Quickstart — Python API
from engram.backends.faiss_backend import FaissBackend
from engram.backends.base import Document
from engram.ingestion.parser import session_to_documents
from engram.retrieval.embedder import Embedder
from engram.retrieval.pipeline import RetrievalPipeline
# Initialize the embedder, the local FAISS store, and the retrieval pipeline
embedder = Embedder("bge-large")
backend = FaissBackend(path="./my_memories", dimension=1024)  # bge-large vectors are 1024-d
pipeline = RetrievalPipeline(embedder=embedder)
# Ingest a conversation
turns = [
    {"role": "user", "content": "I'm switching our API from REST to GraphQL."},
    {"role": "assistant", "content": "What's driving the switch?"},
    {"role": "user", "content": "Too many round trips. Our mobile app makes 12 calls per screen."},
]
# Convert the turns into documents and embed their text
docs = session_to_documents(turns, session_id="session_1", timestamp="2025-01-15")
texts = [d["text"] for d in docs]
embeddings = embedder.encode_documents(texts)
documents = [
    Document(id=d["id"], text=d["text"], embedding=e.tolist(), metadata=d["metadata"])
    for d, e in zip(docs, embeddings)
]
backend.add(documents)
# Search and print the retrieved turns verbatim
results = pipeline.search("why did we switch to GraphQL", documents=documents, top_k=3)
for r in results:
    print(r.text)
Quickstart — Cloud Mode
# Set up Qdrant (managed or self-hosted)
export ENGRAM_BACKEND=qdrant
export ENGRAM_QDRANT_URL=https://your-cluster.qdrant.io:6333
export ENGRAM_QDRANT_API_KEY=your-api-key
# Start the API server
pip install fastapi uvicorn
uvicorn engram.server:app --host 0.0.0.0 --port 8000
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /ingest | Add conversations |
| POST | /search | Search memories |
| GET | /health | Health check |
| GET | /stats | Store statistics |
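As a quick way to confirm the server is reachable, here is a minimal sketch of a client for the two GET endpoints using only the standard library. It assumes the server is listening on `http://localhost:8000` (the port from the `uvicorn` command above) and that responses are JSON; both the helper names and the response format are assumptions, and the request bodies for /ingest and /search are not shown because they are not documented in this section.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: matches the uvicorn --port above


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path, base_url=BASE_URL, timeout=5.0):
    """Send a GET request and return (HTTP status, parsed JSON body).

    Assumes the endpoint answers with a JSON document.
    """
    with urllib.request.urlopen(endpoint_url(path, base_url), timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


# Usage, once the server is running:
#   status, body = get_json("/health")
#   status, stats = get_json("/stats")
```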
Quickstart — MCP Server
Expose Engram as a Model Context Protocol tool for Claude Desktop, Cursor, Windsurf, Zed, and other MCP clients.
pip install "engram-search[mcp]"
engram init ./engram_store # create a store
Add to claude_desktop_config.json (or your MCP client's equivalent):
{
  "mcpServers": {
    "engram": {
      "command": "engram-mcp",
      "env": {
        "ENGRAM_STORE_PATH": "/absolute/path/to/engram_store"
      }
    }
  }
}
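Since `ENGRAM_STORE_PATH` is shown as an absolute path, a small helper can generate the config entry above from a relative store directory. This is a sketch only: `mcp_config_entry` is a hypothetical helper name, and it simply reproduces the JSON structure shown in this section.

```python
import json
from pathlib import Path


def mcp_config_entry(store_dir):
    """Build the mcpServers entry with the store path resolved to an absolute path."""
    store_path = Path(store_dir).expanduser().resolve()
    return {
        "mcpServers": {
            "engram": {
                "command": "engram-mcp",
                "env": {"ENGRAM_STORE_PATH": str(store_path)},
            }
        }
    }


# Print JSON that can be merged into the MCP client's config file.
print(json.dumps(mcp_config_entry("./engram_store"), indent=2))
```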
Restart the client. Engram exposes three tools:
| Tool | Description |
|---|---|
| search_memory(query, top_k, min_score) | Retrieve relevant memories |
| add_memory(text, metadata) | Store a new memory fact |
| memory_stats() | Count documents in the store |
Use Cases
- AI assistants with long-term memory — recall user preferences, past decisions, and prior context across sessions
- Customer support agents — pull a customer's full history on every interaction without re-feeding transcripts to an LLM
- Agent memory layer — give autonomous agents persistent memory across runs without blowing up the context window
- Multi-session chatbots — resolve references to prior conversations ("like we discussed last week") without re-embedding history
- RAG over conversations — index dialogues, meeting transcripts, or support tickets with higher recall than vanilla semantic search
Examples
Check out the interactive notebooks in examples/:
| Notebook | Description |
|---|---|
| Getting Started | Ingest conversations, search memories, understand hybrid retrieval |
| Customer Support | Build a support agent with full customer history recall |
| Personal Assistant | AI assistant with long-term memory across conversations |
Docker
# Local mode
docker compose up
# Or build and run directly
docker build -t engram .
docker run -p 8000:8000 -v engram_data:/data engram
Architecture
┌────────────────────────────────────────────────────────────┐
│                           Engram                           │
│                                                            │
│  ┌────────────┐  ┌───────────────┐  ┌───────────────────┐  │
│  │ Ingestion  │  │     Index     │  │     Retrieval     │  │
│  │            │→ │               │→ │                   │  │
│  │ user+asst  │  │ FAISS (local) │  │ 1. Dense (bi-enc) │  │
│  │ turns      │  │ or Qdrant     │  │ 2. BM25 (sparse)  │  │
│  │ preference │  │ (cloud)       │  │ 3. RRF fusion     │  │
│  │ extraction │  │               │  │ 4. Cross-encoder  │  │
│  └────────────┘  └───────────────┘  └───────────────────┘  │
│                                                            │
│  Local: FAISS + SQLite          Cloud: Qdrant + REST API   │
└────────────────────────────────────────────────────────────┘
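To make the sparse stage in the diagram concrete, here is a minimal sketch of BM25 scoring in plain Python. The whitespace tokenizer and the parameters `k1=1.5` and `b=0.75` are common defaults chosen for illustration, not values taken from Engram, and the sketch assumes at least one non-empty document.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


turns = [
    "we switched the api to graphql",
    "the mobile app makes twelve calls per screen",
    "lunch was great today",
]
scores = bm25_scores("graphql api", turns)
best = max(range(len(turns)), key=lambda i: scores[i])
print(turns[best])
```

Keyword scoring like this catches exact terms (names, identifiers, dates) that a dense embedding can blur, which is why the two rankings are fused in step 3 of the diagram.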
Run Benchmarks
LongMemEval
# Download dataset
curl -fsSL -o data/longmemeval_s_cleaned.json \
https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
pip install "engram-search[all]"
python benchmarks/longmemeval_bench.py data/longmemeval_s_cleaned.json --mode hybrid
LoCoMo
# Download dataset (from Snap Research)
curl -fsSL -o data/locomo10.json \
https://raw.githubusercontent.com/snap-research/locomo/main/data/locomo10.json
python benchmarks/locomo_bench.py data/locomo10.json --mode rerank
Requirements
- Python 3.9+
- ~1.3 GB disk for bge-large embedding model (downloaded on first use)
- No API keys required for local mode
Roadmap
- LangChain + LlamaIndex integrations — drop-in memory modules for existing agent stacks
- MCP server — expose Engram as a Model Context Protocol tool for Claude, Cursor, and other MCP clients
- Streaming ingestion — append turns to a live session without re-indexing
- Multi-tenant isolation — per-user namespaces for hosted deployments
- Async API — non-blocking ingest/search for high-throughput workloads
- More backends — pgvector, Weaviate, Pinecone adapters
- Temporal reasoning boost — improved date-grounding for "when did we..." queries
- Benchmark expansion — add MSC, DialogSum, and custom domain benchmarks
Have a use case we're missing? Open an issue.
License
MIT