Engram

Hierarchical memory with consolidation and principled decay for LLM agents and assistants.

Engram is a memory layer for LLM systems that does something most existing memory libraries do not: it consolidates. Raw events get abstracted into general patterns, redundant or contradicted memories decay, and retrieval reads across a hierarchy from specific episodes to compressed knowledge, loosely following how human memory is thought to work.

It is designed as a single primitive that serves both agents (procedural memory: "in situations like this, this approach worked") and assistants (semantic memory: "the user has a golden retriever they care about"), with the same algorithm and the same API.


Why Engram exists

Every production LLM system today has a memory problem. The complaint is universal — "it doesn't remember me" — and the typical solution is some variation of "dump everything into a vector store and search by cosine similarity."

That isn't memory. It's a logbook with a search bar.

Real memory does three things that current systems don't:

  1. Consolidates. Many specific events become one general principle. "User mentioned dog in conv 3, vet in conv 7, kibble in conv 12" becomes "user has a dog they actively care about."
  2. Forgets selectively. Routine, redundant, or contradicted memories decay; surprising, frequently-used, or recently-relevant ones strengthen.
  3. Reads across abstractions. Retrieval sometimes wants the general pattern, sometimes the specific episode, sometimes both. Flat stores can't do this cleanly.

Engram is built around these three principles.


The core idea

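Engram organizes memory as a three-level hierarchy. Consolidation moves information upward, retrieval reads from whichever level fits the query, and a decay function applies at every level: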

                      ┌──────────────────────────────┐
                      │   Consolidated abstractions  │   ← retrieved when general
                      │   (semantic / procedural)    │      patterns suffice
                      └──────────────▲───────────────┘
                                     │  consolidation
                                     │  (clustering + abstraction)
                      ┌──────────────┴───────────────┐
                      │   Mid-level summaries        │
                      │   (episode clusters)         │
                      └──────────────▲───────────────┘
                                     │
                      ┌──────────────┴───────────────┐
                      │   Raw event log              │   ← retrieved when
                      │   (episodic memory)          │      specifics matter
                      └──────────────────────────────┘
                                     │
                                     ▼
                              decay function
                          (recency, reinforcement,
                           corroboration, contradiction)

The four moving parts:

  • Event log. Every observation lands here first, with provenance and timestamp.
  • Consolidation pass. Periodically (or on trigger), the system clusters related events, extracts abstractions, and links them to their supporting evidence.
  • Decay. Memories at every level have weights driven by reinforcement (was this retrieved? did it lead to a successful outcome?), corroboration (how many independent events support it?), contradiction (was it ever overruled?), and recency.
  • Hierarchical retrieval. Queries read across the whole hierarchy, preferring abstractions when they suffice and drilling into specifics when they don't.
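
To make the four parts concrete, here is a minimal sketch of what a single record in such a hierarchy could look like. The names `Level`, `MemoryItem`, and all field names are hypothetical and chosen for illustration; they are not Engram's actual schema. The initial weight of 1.0 is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Level(Enum):
    EVENT = "event"              # raw event log (episodic memory)
    SUMMARY = "summary"          # mid-level episode clusters
    ABSTRACTION = "abstraction"  # consolidated semantic / procedural knowledge


@dataclass
class MemoryItem:
    """Hypothetical record for one memory at any level of the hierarchy."""
    item_id: str
    level: Level
    content: str
    source: str = "unknown"  # provenance: where the observation came from
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    weight: float = 1.0  # assumed starting weight; updated by the decay function
    supported_by: list[str] = field(default_factory=list)  # ids of supporting items


# An event lands in the log first, then an abstraction links back to it.
event = MemoryItem("e1", Level.EVENT, "User asked about senior dog food.", source="chat")
abstraction = MemoryItem(
    "a1", Level.ABSTRACTION, "User has an aging dog.", supported_by=[event.item_id]
)
```

Keeping `supported_by` on every item is what lets retrieval drill from an abstraction back down to the evidence behind it.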

Quick start

pip install engrampy

The PyPI distribution name is engrampy. The Python import name is just engram (so from engram import Memory works as expected). The bare engram and engram-memory names on PyPI are squatted by unrelated parties — PEP 541 reclaim requests are pending.

from engram import Memory

memory = Memory(
    backend="sqlite",                    # or "postgres", "duckdb"
    embedding_model="text-embedding-3-small",
    consolidation_model="gpt-4o-mini",   # used for abstraction extraction
    consolidation_interval=50,           # consolidate every 50 events
)

That's the whole setup. Engram is model-agnostic — bring whichever LLM and embedding model you want for the consolidation and retrieval steps.


Usage

As assistant memory

# Observe events as they happen
memory.observe("User mentioned they have a golden retriever named Max.")
memory.observe("User asked about senior dog food.")
memory.observe("User said Max is 9 years old and slowing down.")

# ... many sessions later ...

context = memory.retrieve("what should I know about the user's pets?")
# Returns:
# [
#   {
#     "level": "abstraction",
#     "content": "User has an aging golden retriever (Max, ~9yo) and is
#                 actively researching senior dog care.",
#     "confidence": 0.91,
#     "supported_by": [event_ids...],
#   },
#   ... lower-level episodes available if needed
# ]
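
One way to consume a result of this shape is to keep only high-confidence abstractions for the prompt and hold the rest in reserve. The sketch below works on a hand-written stand-in list, not real library output. The helper `split_by_confidence`, the 0.8 cutoff, and the "episode" level label are assumptions.

```python
def split_by_confidence(results, min_confidence=0.8):
    """Separate high-confidence abstractions from everything else."""
    trusted, rest = [], []
    for item in results:
        is_abstraction = item.get("level") == "abstraction"
        if is_abstraction and item.get("confidence", 0.0) >= min_confidence:
            trusted.append(item)
        else:
            rest.append(item)
    return trusted, rest


# Hand-written stand-in mirroring the result shape shown above.
sample = [
    {
        "level": "abstraction",
        "content": "User has an aging golden retriever.",
        "confidence": 0.91,
        "supported_by": ["e1", "e2"],
    },
    {
        "level": "episode",  # assumed label for lower-level items
        "content": "User asked about senior dog food.",
        "confidence": 0.55,
        "supported_by": [],
    },
]

trusted, rest = split_by_confidence(sample)
prompt_context = "\n".join(item["content"] for item in trusted)
```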

As agent procedural memory

# Record what the agent tried and what happened
memory.observe({
    "type": "procedure",
    "situation": "API returned 429 rate limit",
    "action": "exponential backoff with jitter, retry up to 5x",
    "outcome": "success",
})

# Later, in a new situation
procedures = memory.retrieve_procedures(
    situation="hitting 503 errors from downstream service"
)
# Returns consolidated procedures from analogous past situations,
# ranked by how often they worked.
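
"Ranked by how often they worked" can be read in more than one way. The sketch below shows one plausible reading: group recorded procedures by action and sort by a smoothed success rate. The Laplace smoothing and the function name `rank_actions` are choices made for this example, not a description of Engram's internal ranking.

```python
from collections import defaultdict


def rank_actions(records):
    """Rank actions by smoothed success rate, best first."""
    counts = defaultdict(lambda: [0, 0])  # action -> [successes, attempts]
    for rec in records:
        counts[rec["action"]][1] += 1
        if rec["outcome"] == "success":
            counts[rec["action"]][0] += 1
    # Laplace smoothing, (s + 1) / (n + 2), keeps a single lucky attempt
    # from outranking an action with a long track record.
    scored = [(action, (s + 1) / (n + 2)) for action, (s, n) in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


records = (
    [{"action": "exponential backoff with jitter", "outcome": "success"}] * 3
    + [{"action": "immediate retry", "outcome": "failure"}] * 3
    + [{"action": "immediate retry", "outcome": "success"}]
)
ranked = rank_actions(records)
```

With these toy records, backoff scores (3 + 1) / (3 + 2) and immediate retry scores (1 + 1) / (4 + 2), so backoff ranks first.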

Manual consolidation and inspection

# Trigger consolidation explicitly (e.g. during downtime)
memory.consolidate()

# Inspect what's been consolidated
memory.summary()

# Resolve a contradiction manually
memory.reconcile(memory_id="...", resolution="prefer_recent")

How it works under the hood

Consolidation

When triggered, the consolidation pass:

  1. Clusters recent unconsolidated events using embedding similarity, with a configurable cohesion threshold.
  2. Extracts abstractions from each cluster using a cheap LLM call. The prompt is structured to produce generalizations, not summaries.
  3. Links abstractions to their supporting events (provenance is always preserved).
  4. Detects contradictions with existing abstractions and flags them for resolution.
  5. Promotes stable, frequently-corroborated abstractions to higher levels of the hierarchy.
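
Step 1 above can be sketched with a greedy single-pass clustering over embeddings. This is an illustrative stand-in under stated assumptions: the README does not say which clustering algorithm Engram uses, the three-dimensional vectors are toy values rather than real embeddings, and `cluster_events` and `cohesion_threshold=0.8` are hypothetical names and defaults.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_events(embeddings, cohesion_threshold=0.8):
    """Assign each event to the most similar cluster at or above the threshold."""
    clusters = []  # each entry: {"ids": [...], "vectors": [...]}
    for event_id, vec in embeddings.items():
        best, best_sim = None, cohesion_threshold
        for cluster in clusters:
            sim = cosine(vec, centroid(cluster["vectors"]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # Nothing is cohesive enough, so this event starts a new cluster.
            clusters.append({"ids": [event_id], "vectors": [vec]})
        else:
            best["ids"].append(event_id)
            best["vectors"].append(vec)
    return [c["ids"] for c in clusters]


toy = {
    "dog": [1.0, 0.1, 0.0],
    "vet": [0.9, 0.2, 0.0],
    "kibble": [0.95, 0.0, 0.1],
    "flight": [0.0, 0.1, 1.0],
}
groups = cluster_events(toy)
```

Here the three pet-related events end up in one cluster and the unrelated one in its own. Each resulting cluster would then be handed to the abstraction step.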

Decay

Each memory item carries a weight $w \in [0, 1]$. The weight evolves as:

$$ w_{t+1} = w_t \cdot \alpha^{\Delta t} + \beta \cdot r_t + \gamma \cdot c_t - \delta \cdot x_t $$

Here $\Delta t$ is the time elapsed since the last update, so $w_t \cdot \alpha^{\Delta t}$ is the recency term; $r_t$ is reinforcement (was it retrieved and useful?), $c_t$ is new corroboration, and $x_t$ is contradiction. The coefficients $\alpha, \beta, \gamma, \delta$ are tunable. Items below a threshold are pruned (or kept as cold storage, depending on configuration).
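
A direct transcription of the update rule into Python looks like the sketch below. Two things are assumptions rather than documented behavior: the result is clipped so that the weight stays in $[0, 1]$ (the raw formula can leave that range), and the coefficient defaults are arbitrary illustrative values. The function name `update_weight` is also hypothetical.

```python
def update_weight(w, dt, r=0.0, c=0.0, x=0.0,
                  alpha=0.99, beta=0.10, gamma=0.05, delta=0.20):
    """One step of w' = w * alpha**dt + beta*r + gamma*c - delta*x.

    The result is clipped to [0, 1]; clipping is an assumption of this sketch.
    """
    raw = w * alpha ** dt + beta * r + gamma * c - delta * x
    return min(1.0, max(0.0, raw))


# Same starting weight and elapsed time, three different histories.
idle = update_weight(0.8, dt=30)                  # never touched
reinforced = update_weight(0.8, dt=30, r=1.0)     # retrieved and useful
contradicted = update_weight(0.8, dt=30, x=1.0)   # overruled once
```

With these defaults the contradicted item ends up below the idle one, and the reinforced item above it, which is the ordering the decay rule is meant to produce.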

Retrieval

Retrieval is coarse-to-fine by default: search abstractions first, then drill into supporting episodes only if the query demands specifics or the top-level results are low-confidence. The aim is to be both faster and more grounded than flat vector search, since abstractions are fewer than raw events and each one links back to its supporting evidence.
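
The coarse-to-fine policy can be sketched in a few lines. This is a toy under stated assumptions: relevance is scored by keyword overlap instead of embeddings, and the names `coarse_to_fine`, `keyword_overlap`, `need_specifics`, and the 0.7 confidence cutoff are all hypothetical.

```python
def keyword_overlap(query):
    """Toy relevance scorer: counts words a text shares with the query."""
    terms = set(query.lower().split())
    return lambda text: len(terms & set(text.lower().split()))


def coarse_to_fine(score, abstractions, events, min_confidence=0.7,
                   need_specifics=False, top_k=1):
    """Rank abstractions first, then add supporting events only if needed."""
    ranked = sorted(abstractions, key=lambda a: score(a["content"]), reverse=True)
    results = []
    for abstraction in ranked[:top_k]:
        results.append(abstraction)
        # Drill down when the caller wants specifics or confidence is low.
        if need_specifics or abstraction["confidence"] < min_confidence:
            results.extend(
                events[eid] for eid in abstraction["supported_by"] if eid in events
            )
    return results


abstractions = [
    {"content": "user has an aging dog", "confidence": 0.9, "supported_by": ["e1"]},
    {"content": "user may be planning a trip", "confidence": 0.4, "supported_by": ["e2"]},
]
events = {
    "e1": {"content": "user asked about senior dog food"},
    "e2": {"content": "user asked about flights to Lisbon"},
}

confident = coarse_to_fine(keyword_overlap("dog"), abstractions, events)
uncertain = coarse_to_fine(keyword_overlap("trip"), abstractions, events)
```

Here `confident` holds only the abstraction, while `uncertain` also carries the supporting event because that abstraction's confidence is below the cutoff.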


What makes Engram different

Capabilities compared across flat vector stores, mem0 / similar libraries, and Engram:

  • Stores raw events
  • Summarization
  • Multi-level hierarchy
  • Principled decay (partial)
  • Contradiction handling
  • Provenance tracking (partial)
  • Procedural memory
  • Coarse-to-fine retrieval

The headline difference is that Engram treats memory as a living hierarchy that changes over time, not as a static append-only store with a search index. The downstream effects — better recall on long conversations, transferable agent procedures, graceful handling of stale or contradictory information — fall out of that.


Benchmarks

The success criterion for Engram is beating state-of-the-art on long-horizon memory benchmarks, not just being correct in principle.

Tracked suites:

  • LongMemEval — long-horizon conversational memory.
  • LoCoMo — multi-session dialogue with memory recall (especially the temporal and adversarial splits, where flat RAG breaks).
  • Custom procedural transfer benchmark — does an agent with Engram do better on tasks it has seen analogues of? (Constructed from agent traces.)

Tracked baselines: Chroma (flat dense), Chroma + BM25 (hybrid), Letta / MemGPT, Zep / Graphiti, Cognee, HippoRAG, mem0, A-MEM, full-context (as upper bound).

The full plan — targets, why-we-think-we-can-win, and the reproducibility discipline — is in benchmarks/SOTA.md. The running comparison is in benchmarks/SCOREBOARD.md. A claim of "we beat X" requires a committed manifest in benchmarks/runs/; without one it doesn't count.


Roadmap

The stage-by-stage breakdown, including cross-cutting standards on speed, quality, and security, is in ROADMAP.md. High-level milestones:

v0.1 — Core primitive. Event log, basic consolidation, decay, coarse-to-fine retrieval. SQLite backend. Reference benchmarks against flat vector store.

v0.2 — Procedural memory. First-class support for action/situation/outcome triples. Procedure retrieval API. Agent-framework integrations (LangGraph, LlamaIndex, raw OpenAI).

v0.3 — Contradiction and temporal reasoning. Trust-weighted conflict resolution, temporal segmentation ("X was true until March"), explicit invalidation.

v0.4 — Multi-tenant and production. Postgres backend, async API, observability, memory inspector UI.

v1.0 — Paper + stable API. Frozen public API, full benchmark suite, peer-reviewed paper.


Research

Engram is an applied research project as much as a library. The paper-track contributions:

  • A formal framing of memory as a hierarchical decay process with measurable consolidation quality.
  • Algorithmic choices (when to consolidate, what to abstract, how to decay) with ablations.
  • A unified primitive for episodic→semantic consolidation and episodic→procedural abstraction.
  • Benchmarks against existing memory libraries and flat baselines.

Drafts and notes live in /research.


Citation

If you use Engram in research, please cite it. The entry below is a placeholder and will be updated when the paper is on arXiv:

@misc{engram2026,
  title  = {Engram: Hierarchical Memory with Consolidation and Decay for LLM Systems},
  author = {[your name]},
  year   = {2026},
  note   = {Preprint forthcoming},
}

Contributing

Engram is early. The most useful contributions right now:

  • Benchmark runs — reproducing baselines, finding failure modes.
  • Algorithmic experiments — alternative consolidation strategies, decay functions, retrieval policies.
  • Integrations — bindings for popular agent/RAG frameworks.
  • Edge cases — adversarial conversations or agent traces that break the current implementation.

See CONTRIBUTING.md for setup and conventions.


License

MIT. See LICENSE.


Acknowledgments

Engram draws on ideas from cognitive neuroscience (complementary learning systems, episodic-to-semantic consolidation, Ebbinghaus decay), spaced repetition systems, and prior memory libraries (mem0, Letta/MemGPT, Zep). Standing on shoulders.
