Hierarchical memory with consolidation and principled decay for LLM agents and assistants.
Engram
Engram is a memory layer for LLM systems that does what existing memory libraries don't: it consolidates. Raw events get abstracted into general patterns, redundant or contradicted memories decay, and retrieval reads across a hierarchy from specific episodes to compressed knowledge — the way human memory actually works.
It is designed as a single primitive that serves both agents (procedural memory: "in situations like this, this approach worked") and assistants (semantic memory: "the user has a golden retriever they care about"), with the same algorithm and the same API.
Why Engram exists
Every production LLM system today has a memory problem. The complaint is universal — "it doesn't remember me" — and the typical solution is some variation of "dump everything into a vector store and search by cosine similarity."
That isn't memory. It's a logbook with a search bar.
Real memory does three things that current systems don't:
- Consolidates. Many specific events become one general principle. "User mentioned dog in conv 3, vet in conv 7, kibble in conv 12" becomes "user has a dog they actively care about."
- Forgets selectively. Routine, redundant, or contradicted memories decay; surprising, frequently-used, or recently-relevant ones strengthen.
- Reads across abstractions. Retrieval sometimes wants the general pattern, sometimes the specific episode, sometimes both. Flat stores can't do this cleanly.
Engram is built around these three principles.
The core idea
Engram organizes memory as a three-level hierarchy, with a decay function acting on every level:
┌──────────────────────────────┐
│  Consolidated abstractions   │  ← retrieved when general
│   (semantic / procedural)    │    patterns suffice
└──────────────▲───────────────┘
               │ consolidation
               │ (clustering + abstraction)
┌──────────────┴───────────────┐
│     Mid-level summaries      │
│      (episode clusters)      │
└──────────────▲───────────────┘
               │
┌──────────────┴───────────────┐
│        Raw event log         │  ← retrieved when
│      (episodic memory)       │    specifics matter
└──────────────────────────────┘
               │
               ▼
         decay function
    (recency, reinforcement,
  corroboration, contradiction)
The four moving parts:
- Event log. Every observation lands here first, with provenance and timestamp.
- Consolidation pass. Periodically (or on trigger), the system clusters related events, extracts abstractions, and links them to their supporting evidence.
- Decay. Memories at every level have weights driven by reinforcement (was this retrieved? did it lead to a successful outcome?), corroboration (how many independent events support it?), contradiction (was it ever overruled?), and recency.
- Hierarchical retrieval. Queries read across the whole hierarchy, preferring abstractions when they suffice and drilling into specifics when they don't.
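One way to picture the parts listed above is as a small data model. The sketch below is illustrative only: the names `Level`, `MemoryItem`, `item_id`, and `supported_by` as a field are assumptions made for this example, not Engram's actual internals.

```python
from dataclasses import dataclass, field
from enum import IntEnum
import time


class Level(IntEnum):
    """Hierarchy levels, ordered from most specific to most abstract."""
    EVENT = 0        # raw event log (episodic memory)
    SUMMARY = 1      # mid-level summaries (episode clusters)
    ABSTRACTION = 2  # consolidated abstractions


@dataclass
class MemoryItem:
    """Hypothetical record for a memory at any level of the hierarchy."""
    item_id: str
    level: Level
    content: str
    weight: float = 1.0  # decay weight in [0, 1]
    created_at: float = field(default_factory=time.time)  # timestamp
    supported_by: list[str] = field(default_factory=list)  # provenance links


# Three raw events and one abstraction that cites them as evidence.
events = [
    MemoryItem("e1", Level.EVENT, "User mentioned their dog."),
    MemoryItem("e2", Level.EVENT, "User mentioned a vet visit."),
    MemoryItem("e3", Level.EVENT, "User asked about kibble."),
]
abstraction = MemoryItem(
    "a1",
    Level.ABSTRACTION,
    "User has a dog they actively care about.",
    supported_by=[e.item_id for e in events],
)
```

Keeping the same record shape at every level is what lets a single weight field and a single provenance field serve events, summaries, and abstractions alike.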
Quick start
pip install engrampy
The PyPI distribution name is engrampy. The Python import name is just engram (so from engram import Memory works as expected). The bare engram and engram-memory names on PyPI are squatted by unrelated parties; PEP 541 reclaim requests are pending.
from engram import Memory
memory = Memory(
    backend="sqlite",  # or "postgres", "duckdb"
    embedding_model="text-embedding-3-small",
    consolidation_model="gpt-4o-mini",  # used for abstraction extraction
    consolidation_interval=50,  # consolidate every 50 events
)
That's the whole setup. Engram is model-agnostic — bring whichever LLM and embedding model you want for the consolidation and retrieval steps.
Usage
As assistant memory
# Observe events as they happen
memory.observe("User mentioned they have a golden retriever named Max.")
memory.observe("User asked about senior dog food.")
memory.observe("User said Max is 9 years old and slowing down.")
# ... many sessions later ...
context = memory.retrieve("what should I know about the user's pets?")
# Returns:
# [
#   {
#     "level": "abstraction",
#     "content": "User has an aging golden retriever (Max, ~9yo) and is
#                 actively researching senior dog care.",
#     "confidence": 0.91,
#     "supported_by": [event_ids...],
#   },
#   ... lower-level episodes available if needed
# ]
As agent procedural memory
# Record what the agent tried and what happened
memory.observe({
    "type": "procedure",
    "situation": "API returned 429 rate limit",
    "action": "exponential backoff with jitter, retry up to 5x",
    "outcome": "success",
})
# Later, in a new situation
procedures = memory.retrieve_procedures(
    situation="hitting 503 errors from downstream service"
)
# Returns consolidated procedures from analogous past situations,
# ranked by how often they worked.
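
To make "ranked by how often they worked" concrete, here is a minimal, standalone sketch of one possible ranking rule. It does not call Engram. The function `rank_procedures` and the smoothing prior are assumptions for illustration; the source only says that procedures are ranked by success frequency.

```python
from collections import defaultdict


def rank_procedures(records, prior_successes=1, prior_trials=2):
    """Rank actions by smoothed success rate over observed records.

    The prior (an assumption of this sketch) keeps an action that was
    tried once from outranking one that worked many times.
    """
    stats = defaultdict(lambda: [0, 0])  # action -> [successes, trials]
    for rec in records:
        counts = stats[rec["action"]]
        counts[1] += 1
        if rec["outcome"] == "success":
            counts[0] += 1

    scored = [
        (action, (succ + prior_successes) / (trials + prior_trials))
        for action, (succ, trials) in stats.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


records = (
    [{"action": "exponential backoff", "outcome": "success"}] * 3
    + [{"action": "retry immediately", "outcome": "failure"}] * 2
    + [{"action": "retry immediately", "outcome": "success"}]
)
ranked = rank_procedures(records)
```

Here "exponential backoff" ranks first because it succeeded in all three of its recorded trials, while "retry immediately" succeeded in one of three.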
Manual consolidation and inspection
# Trigger consolidation explicitly (e.g. during downtime)
memory.consolidate()
# Inspect what's been consolidated
memory.summary()
# Resolve a contradiction manually
memory.reconcile(memory_id="...", resolution="prefer_recent")
How it works under the hood
Consolidation
When triggered, the consolidation pass:
- Clusters recent unconsolidated events using embedding similarity, with a configurable cohesion threshold.
- Extracts abstractions from each cluster using a cheap LLM call. The prompt is structured to produce generalizations, not summaries.
- Links abstractions to their supporting events (provenance is always preserved).
- Detects contradictions with existing abstractions and flags them for resolution.
- Promotes stable, frequently-corroborated abstractions to higher levels of the hierarchy.
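The clustering step in the list above can be sketched as follows. This is a minimal illustration, assuming a greedy centroid-based scheme with cosine similarity; the function names and the default threshold are hypothetical, and Engram's real clustering may differ. The abstraction-extraction LLM call is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_events(embeddings, cohesion_threshold=0.8):
    """Greedy single-pass clustering over event embeddings.

    Each event joins the most similar existing cluster whose centroid
    similarity meets the threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of event indices.
    """
    clusters = []  # each entry: {"members": [indices], "centroid": [floats]}
    for idx, vec in enumerate(embeddings):
        best, best_sim = None, cohesion_threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [idx], "centroid": list(vec)})
        else:
            best["members"].append(idx)
            n = len(best["members"])
            # Incremental mean keeps the centroid up to date.
            best["centroid"] = [
                c + (v - c) / n for c, v in zip(best["centroid"], vec)
            ]
    return [c["members"] for c in clusters]


# Toy 2-D "embeddings": two events point one way, two point another.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
groups = cluster_events(toy)
```

A higher cohesion threshold produces more, tighter clusters; a lower one merges loosely related events into the same abstraction candidate.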
Decay
Each memory item carries a weight $w \in [0, 1]$. The weight evolves as:
$$ w_{t+1} = w_t \cdot \alpha^{\Delta t} + \beta \cdot r_t + \gamma \cdot c_t - \delta \cdot x_t $$
Where $r_t$ is reinforcement (was it retrieved and useful?), $c_t$ is new corroboration, $x_t$ is contradiction, and $\alpha, \beta, \gamma, \delta$ are tunable. Items below a threshold are pruned (or kept as cold storage, depending on configuration).
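A direct transcription of the update rule into code might look like the sketch below. The parameter defaults are arbitrary illustrative values, not Engram's. The formula as written does not say how $w$ is kept inside $[0, 1]$, so this sketch clamps the result; that clamp is an assumption.

```python
def update_weight(w, dt, r=0.0, c=0.0, x=0.0,
                  alpha=0.99, beta=0.1, gamma=0.05, delta=0.2):
    """One step of the decay rule.

    w     : current weight
    dt    : elapsed time since the last update (Delta t)
    r     : reinforcement signal (retrieved and useful)
    c     : new corroboration
    x     : contradiction
    alpha, beta, gamma, delta : tunable coefficients (illustrative defaults)
    """
    new_w = w * alpha ** dt + beta * r + gamma * c - delta * x
    # Assumption: clamp so the weight stays in [0, 1].
    return min(1.0, max(0.0, new_w))


def should_prune(w, threshold=0.05):
    """Items below the threshold are pruned (or moved to cold storage)."""
    return w < threshold


idle = update_weight(0.8, dt=10)            # pure time decay
reinforced = update_weight(0.8, dt=10, r=1)  # decay plus one reinforcement
contradicted = update_weight(0.1, dt=0, x=1)  # contradiction drives it to 0
```

With no reinforcement, corroboration, or contradiction, the weight decays geometrically at rate $\alpha$ per unit time; each signal then shifts it additively.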
Retrieval
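Retrieval is coarse-to-fine by default: search abstractions first, then drill into supporting episodes only if the query demands specifics or the top-level results are low-confidence. The aim is retrieval that is both faster than flat vector search, because many queries can be answered from the abstraction layer alone, and more grounded, because every abstraction links back to its supporting events.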
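The control flow of coarse-to-fine retrieval can be sketched in a few lines. Everything here is hypothetical: `coarse_to_fine`, the `want_specifics` flag, the `min_confidence` cutoff, and the toy word-overlap scorer (standing in for embedding similarity) are assumptions for illustration, not Engram's API.

```python
def word_overlap(query, text):
    """Toy relevance score: fraction of query words present in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def coarse_to_fine(query, abstractions, events_by_id, score=word_overlap,
                   min_confidence=0.7, want_specifics=False, top_k=3):
    """Search abstractions first; drill into events only when needed."""
    ranked = sorted(
        abstractions, key=lambda a: score(query, a["content"]), reverse=True
    )[:top_k]
    results = [{"level": "abstraction", "content": a["content"]} for a in ranked]

    # Drill down if the caller asks for specifics or the best hit is weak.
    low_confidence = not ranked or ranked[0]["confidence"] < min_confidence
    if want_specifics or low_confidence:
        seen = set()
        for a in ranked:
            for event_id in a["supported_by"]:
                if event_id not in seen:  # avoid duplicate evidence
                    seen.add(event_id)
                    results.append(
                        {"level": "event", "content": events_by_id[event_id]}
                    )
    return results


abstractions = [{
    "content": "User has an aging golden retriever named Max",
    "confidence": 0.91,
    "supported_by": ["e1", "e2"],
}]
events_by_id = {
    "e1": "User mentioned they have a golden retriever named Max.",
    "e2": "User said Max is 9 years old and slowing down.",
}

general = coarse_to_fine("golden retriever", abstractions, events_by_id)
detailed = coarse_to_fine("golden retriever", abstractions, events_by_id,
                          want_specifics=True)
```

In this toy run, `general` contains only the abstraction, while `detailed` also includes the two supporting events reached through the provenance links.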
What makes Engram different
| Feature | Flat vector store | mem0 / similar | Engram |
|---|---|---|---|
| Stores raw events | ✓ | ✓ | ✓ |
| Summarization | — | ✓ | ✓ |
| Multi-level hierarchy | — | — | ✓ |
| Principled decay | — | partial | ✓ |
| Contradiction handling | — | — | ✓ |
| Provenance tracking | — | partial | ✓ |
| Procedural memory | — | — | ✓ |
| Coarse-to-fine retrieval | — | — | ✓ |
The headline difference is that Engram treats memory as a living hierarchy that changes over time, not as a static append-only store with a search index. The downstream effects — better recall on long conversations, transferable agent procedures, graceful handling of stale or contradictory information — fall out of that.
Benchmarks
The success criterion for Engram is beating state-of-the-art on long-horizon memory benchmarks, not just being correct in principle.
Tracked suites:
- LongMemEval — long-horizon conversational memory.
- LoCoMo — multi-session dialogue with memory recall (especially the temporal and adversarial splits, where flat RAG breaks).
- Custom procedural transfer benchmark — does an agent with Engram do better on tasks it has seen analogues of? (Constructed from agent traces.)
Tracked baselines: Chroma (flat dense), Chroma + BM25 (hybrid), Letta / MemGPT, Zep / Graphiti, Cognee, HippoRAG, mem0, A-MEM, full-context (as upper bound).
The full plan — targets, why-we-think-we-can-win, and the reproducibility discipline — is in benchmarks/SOTA.md. The running comparison is in benchmarks/SCOREBOARD.md. A claim of "we beat X" requires a committed manifest in benchmarks/runs/; without one it doesn't count.
Roadmap
The stage-by-stage breakdown, including cross-cutting standards on speed, quality, and security, is in ROADMAP.md. High-level milestones:
- v0.1: Core primitive. Event log, basic consolidation, decay, coarse-to-fine retrieval. SQLite backend. Reference benchmarks against flat vector store.
- v0.2: Procedural memory. First-class support for action/situation/outcome triples. Procedure retrieval API. Agent-framework integrations (LangGraph, LlamaIndex, raw OpenAI).
- v0.3: Contradiction and temporal reasoning. Trust-weighted conflict resolution, temporal segmentation ("X was true until March"), explicit invalidation.
- v0.4: Multi-tenant and production. Postgres backend, async API, observability, memory inspector UI.
- v1.0: Paper + stable API. Frozen public API, full benchmark suite, peer-reviewed paper.
Research
Engram is an applied research project as much as a library. The paper-track contributions:
- A formal framing of memory as a hierarchical decay process with measurable consolidation quality.
- Algorithmic choices (when to consolidate, what to abstract, how to decay) with ablations.
- A unified primitive for episodic→semantic consolidation and episodic→procedural abstraction.
- Benchmarks against existing memory libraries and flat baselines.
Drafts and notes live in /research.
Citation
If you use Engram in research, please cite (citation will be added when the paper is on arXiv):
@misc{engram2026,
title = {Engram: Hierarchical Memory with Consolidation and Decay for LLM Systems},
author = {[your name]},
year = {2026},
note = {Preprint forthcoming},
}
Contributing
Engram is early. The most useful contributions right now:
- Benchmark runs — reproducing baselines, finding failure modes.
- Algorithmic experiments — alternative consolidation strategies, decay functions, retrieval policies.
- Integrations — bindings for popular agent/RAG frameworks.
- Edge cases — adversarial conversations or agent traces that break the current implementation.
See CONTRIBUTING.md for setup and conventions.
License
MIT. See LICENSE.
Acknowledgments
Engram draws on ideas from cognitive neuroscience (complementary learning systems, episodic-to-semantic consolidation, Ebbinghaus decay), spaced repetition systems, and prior memory libraries (mem0, Letta/MemGPT, Zep). Standing on shoulders.