Hierarchical memory with consolidation and principled decay for LLM agents and assistants.

Engram

Engram is a memory layer for LLM systems that does what existing memory libraries don't: it consolidates. Raw events get abstracted into general patterns, redundant or contradicted memories decay, and retrieval reads across a hierarchy from specific episodes to compressed knowledge — the way human memory actually works.

It is designed as a single primitive that serves both agents (procedural memory: "in situations like this, this approach worked") and assistants (semantic memory: "the user has a golden retriever they care about"), with the same algorithm and the same API.


Why Engram exists

Every production LLM system today has a memory problem. The complaint is universal — "it doesn't remember me" — and the typical solution is some variation of "dump everything into a vector store and search by cosine similarity."

That isn't memory. It's a logbook with a search bar.

Real memory does three things that current systems don't:

  1. Consolidates. Many specific events become one general principle. "User mentioned dog in conv 3, vet in conv 7, kibble in conv 12" becomes "user has a dog they actively care about."
  2. Forgets selectively. Routine, redundant, or contradicted memories decay; surprising, frequently-used, or recently-relevant ones strengthen.
  3. Reads across abstractions. Retrieval sometimes wants the general pattern, sometimes the specific episode, sometimes both. Flat stores can't do this cleanly.

Engram is built around these three principles.


The core idea

Engram organizes memory as a three-level hierarchy, with consolidation moving information upward and a decay function acting on every level:

                      ┌──────────────────────────────┐
                      │   Consolidated abstractions  │   ← retrieved when general
                      │   (semantic / procedural)    │      patterns suffice
                      └──────────────▲───────────────┘
                                     │  consolidation
                                     │  (clustering + abstraction)
                      ┌──────────────┴───────────────┐
                      │   Mid-level summaries        │
                      │   (episode clusters)         │
                      └──────────────▲───────────────┘
                                     │
                      ┌──────────────┴───────────────┐
                      │   Raw event log              │   ← retrieved when
                      │   (episodic memory)          │      specifics matter
                      └──────────────────────────────┘
                                     │
                                     ▼
                              decay function
                          (recency, reinforcement,
                           corroboration, contradiction)

The four moving parts:

  • Event log. Every observation lands here first, with provenance and timestamp.
  • Consolidation pass. Periodically (or on trigger), the system clusters related events, extracts abstractions, and links them to their supporting evidence.
  • Decay. Memories at every level have weights driven by reinforcement (was this retrieved? did it lead to a successful outcome?), corroboration (how many independent events support it?), contradiction (was it ever overruled?), and recency.
  • Hierarchical retrieval. Queries read across the whole hierarchy, preferring abstractions when they suffice and drilling into specifics when they don't.
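
As a rough illustration of how the levels and their provenance links could be represented, here is a minimal sketch. The `Level` enum, the `MemoryItem` class, and all of its field names are hypothetical and are not taken from Engram's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Level(Enum):
    EVENT = 0        # raw event log (episodic memory)
    SUMMARY = 1      # mid-level summaries (episode clusters)
    ABSTRACTION = 2  # consolidated semantic / procedural knowledge


@dataclass
class MemoryItem:
    """Hypothetical record for one memory at any level of the hierarchy."""
    item_id: str
    level: Level
    content: str
    source: str = "unknown"  # provenance: where the observation came from
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    weight: float = 1.0      # decay weight, kept in [0, 1]
    supported_by: list[str] = field(default_factory=list)  # ids of supporting items


# Two raw events and one abstraction that links back to them.
events = [
    MemoryItem("e1", Level.EVENT, "User mentioned a golden retriever named Max.", source="conv-3"),
    MemoryItem("e2", Level.EVENT, "User asked about senior dog food.", source="conv-7"),
]
abstraction = MemoryItem(
    "a1",
    Level.ABSTRACTION,
    "User has a dog they actively care about.",
    supported_by=[e.item_id for e in events],
)
```

Keeping `supported_by` as a list of ids rather than nested objects is one way to let an abstraction outlive the pruning of individual events while still recording where it came from.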

Quick start

pip install engrampy

The PyPI distribution name is engrampy. The Python import name is just engram (so from engram import Memory works as expected). The bare engram and engram-memory names on PyPI are squatted by unrelated parties — PEP 541 reclaim requests are pending.

from engram import Memory

memory = Memory(
    backend="sqlite",                    # or "postgres", "duckdb"
    embedding_model="text-embedding-3-small",
    consolidation_model="gpt-4o-mini",   # used for abstraction extraction
    consolidation_interval=50,           # consolidate every 50 events
)

That's the whole setup. Engram is model-agnostic — bring whichever LLM and embedding model you want for the consolidation and retrieval steps.


Usage

As assistant memory

# Observe events as they happen
memory.observe("User mentioned they have a golden retriever named Max.")
memory.observe("User asked about senior dog food.")
memory.observe("User said Max is 9 years old and slowing down.")

# ... many sessions later ...

context = memory.retrieve("what should I know about the user's pets?")
# Returns:
# [
#   {
#     "level": "abstraction",
#     "content": "User has an aging golden retriever (Max, ~9yo) and is
#                 actively researching senior dog care.",
#     "confidence": 0.91,
#     "supported_by": [event_ids...],
#   },
#   ... lower-level episodes available if needed
# ]

As agent procedural memory

# Record what the agent tried and what happened
memory.observe({
    "type": "procedure",
    "situation": "API returned 429 rate limit",
    "action": "exponential backoff with jitter, retry up to 5x",
    "outcome": "success",
})

# Later, in a new situation
procedures = memory.retrieve_procedures(
    situation="hitting 503 errors from downstream service"
)
# Returns consolidated procedures from analogous past situations,
# ranked by how often they worked.
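
One way to read "ranked by how often they worked" is a smoothed success rate per action. The sketch below is an illustration only: `rank_procedures` is a hypothetical helper, not part of Engram's API, and the Laplace smoothing is an assumption added so that an action tried once does not outrank one that succeeded many times. It also skips the step of matching analogous situations.

```python
from collections import defaultdict


def rank_procedures(records):
    """Rank actions by Laplace-smoothed success rate (hypothetical helper)."""
    stats = defaultdict(lambda: [0, 0])  # action -> [successes, attempts]
    for rec in records:
        counts = stats[rec["action"]]
        counts[1] += 1
        if rec["outcome"] == "success":
            counts[0] += 1
    # (successes + 1) / (attempts + 2) pulls small samples toward 0.5.
    scored = [(action, (s + 1) / (n + 2)) for action, (s, n) in stats.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


records = [
    {"action": "exponential backoff with jitter", "outcome": "success"},
    {"action": "exponential backoff with jitter", "outcome": "success"},
    {"action": "exponential backoff with jitter", "outcome": "success"},
    {"action": "immediate retry", "outcome": "success"},
    {"action": "immediate retry", "outcome": "failure"},
]
ranked = rank_procedures(records)
```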

Manual consolidation and inspection

# Trigger consolidation explicitly (e.g. during downtime)
memory.consolidate()

# Inspect what's been consolidated
memory.summary()

# Resolve a contradiction manually
memory.reconcile(memory_id="...", resolution="prefer_recent")

How it works under the hood

Consolidation

When triggered, the consolidation pass:

  1. Clusters recent unconsolidated events using embedding similarity, with a configurable cohesion threshold.
  2. Extracts abstractions from each cluster using a cheap LLM call. The prompt is structured to produce generalizations, not summaries.
  3. Links abstractions to their supporting events (provenance is always preserved).
  4. Detects contradictions with existing abstractions and flags them for resolution.
  5. Promotes stable, frequently-corroborated abstractions to higher levels of the hierarchy.
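
Step 1 can be sketched with a simple greedy pass over the embeddings. This is a stand-in under stated assumptions: the text above does not say which clustering algorithm Engram uses, and `cluster_events`, `cosine`, and the default `cohesion_threshold` value are hypothetical. Steps 2 to 5 (abstraction, linking, contradiction detection, promotion) are not covered.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_events(embeddings, cohesion_threshold=0.8):
    """Greedy clustering: each event joins the most similar cluster whose
    centroid clears the threshold, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"members": [indices], "centroid": [floats]}
    for idx, vec in enumerate(embeddings):
        best, best_sim = None, cohesion_threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [idx], "centroid": list(vec)})
        else:
            best["members"].append(idx)
            n = len(best["members"])
            # Running mean keeps the centroid up to date without re-summing.
            best["centroid"] = [(c * (n - 1) + v) / n for c, v in zip(best["centroid"], vec)]
    return [cluster["members"] for cluster in clusters]


# Toy 2-D "embeddings": two events point one way, two point the other.
groups = cluster_events([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
```

A higher threshold yields more, tighter clusters; a lower one merges loosely related events into fewer groups, which makes the subsequent abstraction step broader and more error-prone.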

Decay

Each memory item carries a weight $w \in [0, 1]$. The weight evolves as:

$$ w_{t+1} = w_t \cdot \alpha^{\Delta t} + \beta \cdot r_t + \gamma \cdot c_t - \delta \cdot x_t $$

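where $r_t$ is reinforcement (was it retrieved and useful?), $c_t$ is new corroboration, $x_t$ is contradiction, $\Delta t$ is the time elapsed since the previous update, and $\alpha, \beta, \gamma, \delta$ are tunable coefficients, with $\alpha$ acting as the per-unit-time retention factor. Items whose weight falls below a threshold are pruned (or kept as cold storage, depending on configuration).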
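
The update rule translates directly into a few lines of code. In this sketch the function name `update_weight` and all default coefficient values are illustrative assumptions, not Engram's defaults. The final clamp is also an assumption: the formula as written can leave $[0, 1]$, so something has to bring the weight back into range.

```python
def update_weight(w, dt, r=0.0, c=0.0, x=0.0,
                  alpha=0.99, beta=0.1, gamma=0.05, delta=0.3):
    """One decay step: w * alpha**dt + beta*r + gamma*c - delta*x.

    w     current weight in [0, 1]
    dt    time elapsed since the previous update
    r     reinforcement signal for this step
    c     new corroboration for this step
    x     contradiction signal for this step
    """
    raw = w * alpha ** dt + beta * r + gamma * c - delta * x
    # Assumption: clamp so the weight stays inside [0, 1].
    return min(1.0, max(0.0, raw))


idle = update_weight(0.8, dt=10)                # pure time decay
reinforced = update_weight(0.8, dt=10, r=1.0)   # retrieved and useful
contradicted = update_weight(0.8, dt=10, x=1.0) # overruled by a newer event
```

With these placeholder coefficients a single contradiction costs more than a single reinforcement gains, which biases the store toward dropping stale claims quickly; the right balance is something to tune per application.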

Retrieval

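Retrieval is coarse-to-fine by default: search abstractions first, then drill into supporting episodes only if the query demands specifics or the top-level results are low-confidence. The goal is retrieval that is both faster and better grounded than flat vector search: most queries are answered from the abstraction level alone, and every abstraction links back to the events that support it.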
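
A minimal sketch of the coarse-to-fine policy follows. Everything here is hypothetical: `retrieve_coarse_to_fine`, the `min_confidence` cutoff, and the toy `word_overlap` scorer (a stand-in for embedding similarity) are not Engram's implementation, and only the low-confidence trigger is modeled, not the "query demands specifics" one.

```python
def word_overlap(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve_coarse_to_fine(query, abstractions, events_by_id,
                            min_confidence=0.7, top_k=3, score=word_overlap):
    """Return the top abstractions, adding supporting events only when an
    abstraction's confidence is below min_confidence."""
    ranked = sorted(abstractions, key=lambda a: score(query, a["content"]), reverse=True)
    results = []
    for a in ranked[:top_k]:
        results.append({"level": "abstraction", "content": a["content"],
                        "confidence": a["confidence"]})
        if a["confidence"] < min_confidence:
            # Low confidence: drill into the evidence behind the abstraction.
            for event_id in a["supported_by"]:
                results.append({"level": "event", "content": events_by_id[event_id]})
    return results


abstractions = [
    {"content": "User has an aging dog named Max", "confidence": 0.91,
     "supported_by": ["e1", "e2"]},
    {"content": "User may prefer tea", "confidence": 0.40,
     "supported_by": ["e3"]},
]
events_by_id = {
    "e1": "User mentioned they have a golden retriever named Max.",
    "e2": "User said Max is 9 years old and slowing down.",
    "e3": "User ordered tea once.",
}

confident = retrieve_coarse_to_fine("dog", abstractions, events_by_id, top_k=1)
uncertain = retrieve_coarse_to_fine("tea", abstractions, events_by_id, top_k=1)
```

In this toy run the "dog" query stops at the abstraction, while the "tea" query also returns the single event behind a weakly supported claim so the caller can judge it directly.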


What makes Engram different

Features compared across a flat vector store, mem0 / similar, and Engram:

  • Stores raw events
  • Summarization
  • Multi-level hierarchy
  • Principled decay (partial)
  • Contradiction handling
  • Provenance tracking (partial)
  • Procedural memory
  • Coarse-to-fine retrieval

The headline difference is that Engram treats memory as a living hierarchy that changes over time, not as a static append-only store with a search index. The downstream effects — better recall on long conversations, transferable agent procedures, graceful handling of stale or contradictory information — fall out of that.


Benchmarks

The success criterion for Engram is beating state-of-the-art on long-horizon memory benchmarks, not just being correct in principle.

Tracked suites:

  • LongMemEval — long-horizon conversational memory.
  • LoCoMo — multi-session dialogue with memory recall (especially the temporal and adversarial splits, where flat RAG breaks).
  • Custom procedural transfer benchmark — does an agent with Engram do better on tasks it has seen analogues of? (Constructed from agent traces.)

Tracked baselines: Chroma (flat dense), Chroma + BM25 (hybrid), Letta / MemGPT, Zep / Graphiti, Cognee, HippoRAG, mem0, A-MEM, full-context (as upper bound).

The full plan — targets, why-we-think-we-can-win, and the reproducibility discipline — is in benchmarks/SOTA.md. The running comparison is in benchmarks/SCOREBOARD.md. A claim of "we beat X" requires a committed manifest in benchmarks/runs/; without one it doesn't count.


Roadmap

Stage-by-stage breakdown — including cross-cutting standards on speed, quality, and security — in ROADMAP.md. High-level milestones:

v0.1 — Core primitive. Event log, basic consolidation, decay, coarse-to-fine retrieval. SQLite backend. Reference benchmarks against flat vector store.

v0.2 — Procedural memory. First-class support for action/situation/outcome triples. Procedure retrieval API. Agent-framework integrations (LangGraph, LlamaIndex, raw OpenAI).

v0.3 — Contradiction and temporal reasoning. Trust-weighted conflict resolution, temporal segmentation ("X was true until March"), explicit invalidation.

v0.4 — Multi-tenant and production. Postgres backend, async API, observability, memory inspector UI.

v1.0 — Paper + stable API. Frozen public API, full benchmark suite, peer-reviewed paper.


Research

Engram is an applied research project as much as a library. The paper-track contributions:

  • A formal framing of memory as a hierarchical decay process with measurable consolidation quality.
  • Algorithmic choices (when to consolidate, what to abstract, how to decay) with ablations.
  • A unified primitive for episodic→semantic consolidation and episodic→procedural abstraction.
  • Benchmarks against existing memory libraries and flat baselines.

Drafts and notes live in /research.


Citation

If you use Engram in research, please cite (citation will be added when the paper is on arXiv):

@misc{engram2026,
  title  = {Engram: Hierarchical Memory with Consolidation and Decay for LLM Systems},
  author = {[your name]},
  year   = {2026},
  note   = {Preprint forthcoming},
}

Contributing

Engram is early. The most useful contributions right now:

  • Benchmark runs — reproducing baselines, finding failure modes.
  • Algorithmic experiments — alternative consolidation strategies, decay functions, retrieval policies.
  • Integrations — bindings for popular agent/RAG frameworks.
  • Edge cases — adversarial conversations or agent traces that break the current implementation.

See CONTRIBUTING.md for setup and conventions.


License

MIT. See LICENSE.


Acknowledgments

Engram draws on ideas from cognitive neuroscience (complementary learning systems, episodic-to-semantic consolidation, Ebbinghaus decay), spaced repetition systems, and prior memory libraries (mem0, Letta/MemGPT, Zep). Standing on shoulders.
