
Engram

Durable memory for AI agents. Multi-tenant by default. Local-first or cloud. Benchmarked, not vapor.


from engram import Engram

async with await Engram.open(":memory:") as memory:
    await memory.record("Alice prefers espresso over drip.", user_id="alice", role="user")
    await memory.record("Alice's brother lives in Tokyo.",   user_id="alice", role="user")

    results = await memory.recall(query="what does alice drink?", user_id="alice", top_k=1)
    print(results[0].fact.text)
    #  Alice prefers espresso over drip.

That's the whole shape. Record what users (or agents) say; ask questions later; get back the relevant facts, all with per-user isolation, temporal grounding, and a recommendation mode that goes beyond simple recall.


Why Engram?

There are several memory libraries for agents already (Mem0, Letta, Zep, and the published AgentMemory). Engram makes a few specific bets:

1. Mode-aware reading. Most memory layers do one thing: retrieve facts that match your query. Engram ships two distinct read paths: Reader(mode="recall") for "what did the user say about X?" and Reader(mode="synthesis") for "recommend something based on the user's preferences." On LongMemEval-S, this lifts single-session-preference accuracy from 29% to 71%.

2. Multi-tenant by default. Every API takes user_id + org_id (a Scope). Facts, vectors, and chat messages are partitioned at the SQL and HNSW levels — no cross-tenant leakage by construction. There is no "admin" scope that can read across tenants.

3. Local-first OR cloud. Pluggable providers for both LLM (Ollama / OpenAI / Anthropic) and embeddings (Ollama / OpenAI / synthetic). Run completely offline with nomic-embed-text + llama3.2:3b, or swap in cloud providers when you need throughput.

4. Active fact versioning. Engram.supersede(old_id, new_id) marks a fact as replaced. Recall filters superseded facts by default, so when a user changes a preference, the old one doesn't poison future answers (see the sketch after this list).

5. Benchmarked, not vapor. Reproducible LongMemEval-S runs in benchmarks/. We publish overall and per-category numbers (currently 71% overall, see below). The 25-point gap to AgentMemory's frontier 96.2% is a public roadmap item — not hidden.

6. MCP-native. Ships an MCP server so Claude Code, Cursor, and other agentic tools can use Engram memory directly. Three tools: memory_record, memory_recall, memory_context.
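
The versioning flow from point 4, as a minimal sketch. One assumption to flag: that record() returns the stored fact with an .id attribute — only supersede(old_id, new_id) itself is documented above.

old = await memory.record("user prefers oat milk", user_id="alice")
new = await memory.record("user switched to almond milk", user_id="alice")
await memory.supersede(old.id, new.id)  # assumed: record() returns a Fact with .id

results = await memory.recall(query="milk", user_id="alice")
# only the almond-milk fact comes back; superseded facts are filtered by default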


Install

uv add jamjet-engram                  # or: pip install jamjet-engram
uv add 'jamjet-engram[rerank]'        # cross-encoder reranker (recommended)

Requires Python 3.12+. SQLite (stdlib) and hnswlib (auto-installed) are the only persistence dependencies.


60-second tour

Record + recall

import asyncio
from engram import Engram

async def main():
    async with await Engram.open(":memory:") as memory:
        await memory.record("user prefers oat milk", user_id="alice")
        await memory.record("user drinks coffee twice a day", user_id="alice")
        await memory.record("user's dog is named Whiskey", user_id="alice")

        results = await memory.recall(query="coffee", user_id="alice", top_k=3)
        for r in results:
            print(f"  [{r.score:.2f}]  {r.fact.text}")

asyncio.run(main())

Build a context window for an LLM

ctx = await memory.context(
    query="what should I order at the cafe?",
    user_id="alice",
    token_budget=2000,
)
# ctx is a newline-joined string ready to drop into a system prompt
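
For example, with any OpenAI-style chat API (illustrative glue code, not Engram API):

messages = [
    {"role": "system", "content": "Known facts about the user:\n" + ctx},
    {"role": "user", "content": "what should I order at the cafe?"},
]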

Per-category routing (the differentiator)

For preference/recommendation questions, route to synthesis mode — the reader generates a recommendation grounded in stored user preferences instead of just listing facts.

from engram import Engram, Reader, RuleBasedClassifier, is_preference_question
from engram.llm.tier import ModelTier

tier = ModelTier.default()  # gpt-4o-mini
classifier = RuleBasedClassifier()

async with await Engram.open(":memory:", tier=tier) as memory:
    # conversation_history: a list of {"role": ..., "content": ...} chat turns
    for turn in conversation_history:
        await memory.record(text=turn["content"], role=turn["role"], user_id="alice")

    question = "Recommend a movie for tonight"
    qt = await classifier.classify(question)

    if is_preference_question(question, qt):
        ctx = await memory.context(query=question, user_id="alice", role_filter=("user",))
        reader = Reader(tier.reader, mode="synthesis")
    else:
        ctx = await memory.context(query=question, user_id="alice", classifier=classifier)
        reader = Reader(tier.reader)

    res = await reader.read(question=question, context=ctx)
    print(res.answer)

Full walk-through: docs/quickstart.md. Working example: examples/05_preference_synthesis.py.

Run as an HTTP server

import asyncio
import uvicorn
from engram import Engram
from engram.server.http import build_http_app

async def main():
    memory = await Engram.open("./engram.db")  # open inside the event loop
    server = uvicorn.Server(uvicorn.Config(build_http_app(memory), host="127.0.0.1", port=19090))
    await server.serve()

asyncio.run(main())

Then hit it with curl:

curl -sX POST localhost:19090/v1/memory/record \
  -H 'Content-Type: application/json' \
  -d '{"text": "alice prefers espresso", "user_id": "alice"}'

Endpoints: record, extract, recall, search, context, raw_facts, sessions, messages, healthz.
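
A recall call looks similar. The JSON field names below mirror the Python recall() signature; treat the exact schema as an assumption and check the endpoint docs:

curl -sX POST localhost:19090/v1/memory/recall \
  -H 'Content-Type: application/json' \
  -d '{"query": "what does alice drink?", "user_id": "alice", "top_k": 3}'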

Run as an MCP server (for Claude Code, Cursor, etc.)

import asyncio
import mcp.server.stdio
from engram import Engram
from engram.server.mcp import build_mcp_server

async def main():
    memory = await Engram.open("./engram.db")
    server = build_mcp_server(memory)
    async with mcp.server.stdio.stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())

asyncio.run(main())

Core concepts

| Concept | What it is | Why you care |
|---|---|---|
| Scope | (org_id, user_id) pair attached to every fact | Multi-tenancy. No cross-tenant leakage at the SQL or vector index layer. |
| Fact | Pydantic model: text, scope, validity window, confidence, optional event_date / session_id / role / metadata | The unit of memory. Versioned via supersede, scored at retrieval, governed by per-tenant scope. |
| Engram | Async facade over store + embedder + retriever + (optional) extractor + (optional) tier | The thing you actually call. record(), recall(), context(), extract(). |
| Reader | LLM-driven answer generator with two modes: "recall" (verifier-gated fact recall) and "synthesis" (recommendation grounded in user preferences) | The mode you pick changes the whole pipeline: verifier on or off, tool loop or not, recall prompt or synthesis prompt. |
| RuleBasedClassifier | Maps a question to one of the 6 LongMemEval question types | Used for per-category token budgets and to decide between recall vs synthesis. Override with your own QuestionClassifier if you need different categories. |
| MemoryTier | WORKING / EPISODIC / SEMANTIC enum on each fact | Optional structuring of memory by time horizon. The store doesn't enforce it; your code does. |
| Tool + ToolRegistry | The reader's tool-use protocol (text-marker [TOOL_USE]{...}[/TOOL_USE]) | When mode="recall", the reader can call tools mid-generation. Six built-in tools (search_facts, search_events, solve_temporal, count_between, add_days, days_between). Bring your own. |

All importable from the top-level package: from engram import Engram, Scope, Fact, Reader, ....
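
As an orientation sketch for the tool surface. Hedged heavily: the Tool base-class attributes and the run()/register() names below are assumptions, not documented API; src/engram/tools/ has the real pattern.

from engram import Tool, ToolRegistry

class SearchCalendar(Tool):
    # Hypothetical tool; the attribute names (name, description) and the
    # async run() entry point are assumptions about the Tool interface.
    name = "search_calendar"
    description = "Look up the user's calendar events for a date range."

    async def run(self, start: str, end: str) -> str:
        return "no events found"  # stub; query your real calendar backend here

registry = ToolRegistry()
registry.register(SearchCalendar())  # assumed registration method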


Architecture (one paragraph + a diagram)

Three layers under the Engram facade. Storage = SqliteStore (FTS5 for keyword) + HnswVectorStore (cosine similarity). Retrieval = HybridRetriever blends vector / keyword / temporal scoring with optional cross-encoder rerank. Reading = Reader (verifier-gated recall) or synthesis-mode (verifier off, recommendation prompt). Optional extras: ExtractionPipeline (LLM extracts facts from chat turns), EventExtractor (SVO event calendar for temporal solvers), QueryDecomposer (splits compound questions), and a tool registry the reader can call mid-generation. Everything is async; everything is scope-isolated.

                   ┌──────────────────────────────────────────────┐
                   │  Engram (async facade)                       │
                   │  record  •  recall  •  context  •  extract  │
                   │  supersede  •  record_message               │
                   └───────────┬───────────┬──────────┬──────────┘
              ┌────────────────┘           │          └──────────────┐
              ▼                            ▼                         ▼
   ┌──────────────────┐          ┌──────────────────┐      ┌──────────────────┐
   │  SqliteStore     │          │  HnswVectorStore │      │  Reader          │
   │  facts + msgs    │          │  per-scope HNSW  │      │  recall mode     │
   │  FTS5 keyword    │          │  cosine          │      │   ↳ verifier     │
   │  scope-isolated  │          │  deterministic   │      │   ↳ tool loop    │
   └──────────────────┘          └──────────────────┘      │   ↳ escalation   │
              │                            │                │  synthesis mode  │
              └─────────────┬──────────────┘                │   ↳ direct LLM   │
                            ▼                                └──────────────────┘
                  ┌──────────────────┐
                  │  HybridRetriever │  vector + keyword + temporal scoring
                  │  + reranker      │  cross-encoder rerank (optional)
                  └──────────────────┘

Files to start exploring: src/engram/engram.py is the facade. src/engram/read/reader.py is where the two modes live. benchmarks/smoke_runner.py is the LongMemEval harness.


Benchmarks — LongMemEval-S

Engram is benchmarked against LongMemEval-S (Wu et al., 2024) — 500 questions about a long synthetic chat history, judged by gpt-4o-mini.

Latest result (v0.1.0)

71.0% on a 100-question stratified subset (gpt-4o-mini reader, --decompose --tools, preference-aware routing on).

| category | score | n |
|---|---|---|
| single-session-assistant | 88% | 17 |
| temporal-reasoning | 75% | 16 |
| single-session-preference | 71% | 17 |
| knowledge-update | 71% | 17 |
| single-session-user | 69% | 16 |
| multi-session | 59% | 17 |
| overall | 71% | 100 |

Frontier comparison: AgentMemory reports 96.2% on the full 500 questions. Engram is currently ~25pp behind; the gap-closing work is documented in the Roadmap.

Reproduce

set -a && source /path/to/.env && set +a    # OPENAI_API_KEY required
export LONGMEMEVAL_ORACLE=/path/to/longmemeval_oracle.json
uv run python -m benchmarks.smoke_runner --n 100 --decompose --tools

The runner writes a per-question JSONL trace + a markdown report to benchmarks/reports/. Set PYTHONHASHSEED=42 for fully deterministic insertion order.

What we tried, what worked, what didn't

The benchmark history (visible in CHANGELOG.md) documents two ablation programmes: a Tier 2-3 batch (April 2026, 64% → 68%, 8 techniques ablated independently) and a preference uplift (May 2026, 29% → 71% on single-session-preference, after a failed first attempt led to a redesign). We publish the negative results too: --ft-cross-encoder looked promising on IR metrics but lost 7pp downstream because its training labels were misaligned with the multi-session task structure; --reextract and --self-consistency didn't help because the verifier short-circuited them. Honest history makes the benchmark numbers credible.


Running in production

Scope isolation. Every API call takes user_id + org_id (combined into a Scope). Facts, vectors, and chat-message storage are partitioned by scope at the SQL and HNSW levels — no cross-tenant leakage by construction. There is no "admin" scope that can read all data.
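
A quick way to convince yourself, using only the documented record/recall API (a minimal sketch; the empty-list return for a no-hit recall is an assumption):

await memory.record("alice's billing plan is Enterprise", user_id="alice")
leaked = await memory.recall(query="billing plan", user_id="bob")
assert leaked == []  # bob's scope never sees alice's facts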

Embedding provider tradeoffs. OllamaEmbedding runs entirely local (private, free, slower) — recommended for sensitive data or offline usage. OpenAIEmbedding is faster and more accurate but sends every recorded text to OpenAI. SyntheticEmbedding is for tests only.
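
A hedged wiring sketch: the import path and the Engram.open() keyword for injecting a provider are assumptions; only the provider class names come from this section.

from engram import Engram
from engram.embedding import OllamaEmbedding  # assumed import path

async def open_offline():
    # assumed constructor keyword; check src/engram/embedding/ for the real signature
    embedder = OllamaEmbedding(model="nomic-embed-text")
    return await Engram.open("./engram.db", embedder=embedder)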

LLM API keys. Engram never stores API keys. They're read from the environment by the OpenAILLM / AnthropicLLM / OllamaLLM clients. Use a secrets manager in production.

Rate limits. The default extraction pipeline calls one LLM per session ingested. Bulk imports of large chat histories should chunk + rate-limit to fit your provider's tier. The OpenAILLM and AnthropicLLM clients use the official SDKs which handle retries with exponential backoff.
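
One simple pattern, as a sketch (chunk sizes and pauses are illustrative; tune them to your provider tier):

import asyncio

async def bulk_import(memory, turns, user_id, chunk_size=20, pause_s=1.0):
    """Ingest a large chat history without tripping provider rate limits."""
    for i in range(0, len(turns), chunk_size):
        for turn in turns[i : i + chunk_size]:
            await memory.record(text=turn["content"], role=turn["role"], user_id=user_id)
        await asyncio.sleep(pause_s)  # breathe between chunks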

Determinism. HnswVectorStore defaults to random_seed=42 for reproducible benchmark runs. Set PYTHONHASHSEED=42 and use the same insertion order for byte-for-byte reproducibility.

Per-category routing. For preference/recommendation questions, route to Reader(mode="synthesis") with Engram.context(role_filter=("user",)). Other categories take the default fact-recall reader. See docs/quickstart.md for the canonical pattern.


Contributing

We genuinely welcome contributions. The repo is set up so a new contributor can ship a meaningful change in a couple of hours.

Quick start

git clone https://github.com/jamjet-labs/engram.git
cd engram
uv sync --all-extras
uv run pytest                # 309 tests, ~50s
uv run ruff check && uv run ruff format --check && uv run mypy src/engram

CI gates on all three. Coverage floor is 79% (--cov-fail-under=79) and a downstream py.typed job builds the wheel and verifies mypy --strict passes for library users.

Full developer guide: CONTRIBUTING.md. Security policy: SECURITY.md.

Where contributions are most welcome

| Area | What you'd do | Difficulty |
|---|---|---|
| New embedding provider | Implement EmbeddingProvider for Cohere / Voyage / local-sbert / etc. Pattern in src/engram/embedding/. | Easy — single file + tests |
| New LLM provider | Implement LLMClient for Vertex / Together / etc. Pattern in src/engram/llm/. | Easy |
| New tool for the reader | Implement Tool for domain-specific work (e.g., search_calendar, query_database). Pattern in src/engram/tools/. | Easy |
| Tighten preference predicate recall | Currently 76% on LongMemEval-S — misses "what do you think?" decision-help shapes. ~5 questions worth of headroom. See src/engram/read/preference_gate.py. | Medium — requires benchmark validation |
| Postgres backend | Implement EngramStore against Postgres. Schema in src/engram/store/. | Medium — schema + tests + CI matrix entry |
| Performance benchmarks | Recall latency / throughput at scale. There's no baseline yet — open territory. | Medium |
| Migration guide from Rust v0.5.x | Document the API mapping between the Rust runtime and the Python rewrite. | Easy if you've used both |
| Move LongMemEval ≥80% | The 25pp gap to AgentMemory's 96.2% is the headline number. Pick a category (multi-session at 59% is the lowest) and find a +5pp improvement. | Hard — but high-impact |

We tag issues with good first issue for the first three categories. Open an issue before starting a large change so we can sanity-check the approach.

What makes a good PR here

  • A failing test first (TDD is enforced by the CI checks)
  • For benchmark-affecting changes, an n=100 smoke result vs the published baseline (within ±2pp on non-target categories, real lift on the target)
  • A CHANGELOG.md entry under [Unreleased]
  • Conventional commit message: feat(scope): description or fix(scope): description

How decisions get made

Architectural changes go through a brainstorm → spec → plan → implementation cycle. The specs and plans are local-only (gitignored under docs/superpowers/); the outcomes land in CHANGELOG entries with PR references. We document negative results too — see the "Preference uplift v1 (rolled back)" entry for a worked example of a strategy that didn't move the metric and what we learned.


Roadmap

In rough priority order:

  • LongMemEval-S ≥85% — the top headline. Currently 71%; AgentMemory frontier is 96.2%. Multi-session at 59% is the lowest-hanging fruit.
  • Postgres backend — currently SQLite-only. Schema is straightforward; needs an EngramStore Postgres implementation + CI matrix entry.
  • Sonnet / Haiku reader compatibility — the v0.1.0 synthesis prompt is calibrated against gpt-4o-mini. Sonnet/Haiku regress on tool-protocol parsing. Native Anthropic tool-use API path would unlock +5–10pp on tool-augmented runs.
  • Performance benchmarks — recall latency, throughput, memory footprint at scale. No published baseline yet.
  • Per-category synthesis modes for non-preference categories — only single-session-preference currently uses synthesis mode. Multi-session might benefit from a different tailored prompt.
  • Migration guide from Rust v0.5.x — the Rust runtime stays in maintenance mode at jamjet-labs/jamjet; a side-by-side API map would help users migrate.
  • Cross-language client SDKs — Engram is Python today. The HTTP API is stable enough that a TypeScript / Go client is mostly mechanical.

If any of these are interesting, open an issue and let's talk.


Project meta


Engram is built by JamJet Labs. The benchmark targets and design decisions are public; the failures (and there have been a few) are documented in CHANGELOG. If you'd like to help us push past 71% — please do.
