Official Python client for Engram — durable, explainable memory for AI agents.

lumetra-engram

Zero runtime dependencies (uses the standard library's urllib).
Fully typed (py.typed, TypedDict response shapes, IDE-friendly).
Python 3.9+.

The TypeScript twin lives at lumetra-io/engram-js.

Install

pip install lumetra-engram
# or
uv add lumetra-engram
# or
poetry add lumetra-engram

Quickstart

from lumetra_engram import EngramClient

engram = EngramClient(api_key="eng_live_...")  # or set ENGRAM_API_KEY and omit api_key

# Store a fact
engram.store_memory("User prefers dark mode.", "user-123")

# Recall — returns a synthesized answer plus the memories that contributed
result = engram.query(
    "What are this user's UI preferences?",
    buckets=["user-123"],
)

print(result["answer"])
print(result.get("explanation", {}).get("retrieved_memories", []))

Configuration

EngramClient(
    api_key="eng_live_...",            # or ENGRAM_API_KEY env var
    base_url="https://api.lumetra.io", # or ENGRAM_BASE_URL env var
    timeout_seconds=30.0,              # default 30s
    max_retries_on_429=3,              # auto-retry on per-tenant rate limit; 0 disables
)

Automatic 429 retry

The Engram API enforces a per-tenant concurrent-request cap and returns 429 Too Many Requests with a Retry-After header when you exceed it. The client honors that header automatically (up to max_retries_on_429 attempts, capped at 30s per sleep) so bursty workloads don't fail on the first contention spike. Pass max_retries_on_429=0 to opt out and surface 429 as EngramError immediately.
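The retry policy described above can be sketched with the standard library: honor Retry-After, cap each sleep at 30 s, and stop after max_retries_on_429 retries. This is an illustration, not the client's actual internals. `RateLimited`, `retry_delay`, and `call_with_429_retry` are hypothetical names; the sketch only parses the delay-seconds form of Retry-After, and the 1 s fallback for a missing or unparseable header is an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

MAX_SLEEP_SECONDS = 30.0  # per-sleep cap described above
FALLBACK_SECONDS = 1.0    # assumption: used when Retry-After is missing or unparseable


class RateLimited(Exception):
    """Hypothetical error carrying the raw Retry-After header of a 429 response."""

    def __init__(self, retry_after: Optional[str] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.status = 429
        self.retry_after = retry_after


def retry_delay(retry_after: Optional[str]) -> float:
    """Seconds to sleep for a Retry-After value (delay-seconds form only), capped at 30 s."""
    try:
        delay = float(retry_after) if retry_after is not None else FALLBACK_SECONDS
    except ValueError:
        delay = FALLBACK_SECONDS
    return min(max(delay, 0.0), MAX_SLEEP_SECONDS)


def call_with_429_retry(
    fn: Callable[[], T],
    max_retries_on_429: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping and retrying on RateLimited up to max_retries_on_429 times."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimited as err:
            if attempt >= max_retries_on_429:
                raise  # out of retries (or retries disabled with 0): surface the 429
            sleep(retry_delay(err.retry_after))
            attempt += 1
```

Injecting `sleep` keeps the policy testable without real waits; passing `max_retries_on_429=0` re-raises on the first 429, mirroring the opt-out described above.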

Thread safety

EngramClient is effectively stateless — every call opens its own urllib request and the client doesn't share mutable state across calls. You can reuse one client across threads, or instantiate per-request; both are safe.
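Because one client can be shared, a thread pool is a natural way to fan out independent queries. The sketch below accepts any object with a `query(question, buckets=...)` method, so it works with an EngramClient as well as an offline stub. `fan_out_queries` and `StubClient` are hypothetical names, not part of the package.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def fan_out_queries(
    client: Any, questions: List[str], buckets: List[str], max_workers: int = 4
) -> List[Dict[str, Any]]:
    """Run client.query(question, buckets=...) for each question on ONE shared client.

    Results come back in the same order as `questions`; an exception raised by
    any call is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(client.query, q, buckets=buckets) for q in questions]
        return [f.result() for f in futures]


class StubClient:
    """Offline stand-in for EngramClient so the sketch runs without an API key."""

    def query(self, question: str, *, buckets: List[str]) -> Dict[str, Any]:
        return {"answer": f"{question} @ {', '.join(buckets)}"}


results = fan_out_queries(StubClient(), ["theme?", "language?"], ["user-123"])
```

With a real client, pass `EngramClient()` instead of the stub. Keep `max_workers` modest: the per-tenant concurrency cap still applies, so excess requests will come back as 429s (retried automatically by default).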

BYOK reminder. Engram is bring-your-own-key end-to-end. Configure an OpenAI / Anthropic / Groq / Together / Fireworks key on the Lumetra portal before your first call, or store_memory / query will raise EngramError with status == 412.

API surface

Memories

store_memory(content, bucket="default", *, dedup=None) — store a single fact. dedup is one of "off", "loose", "strict"; None (default) uses the server's policy. See Dedup below.
store_memories(contents, bucket="default") — batched store
list_memories(bucket="default", *, limit=20, offset=0) — paginated list
delete_memory(memory_id, bucket="default") — delete one memory
clear_memories(bucket) — delete every memory in a bucket. No default — explicit bucket required (prevents accidental wipes).
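list_memories pages with limit/offset, so walking a whole bucket means advancing offset until a short page comes back. The response shape of list_memories isn't documented here, so this sketch takes a hypothetical `fetch_page(limit, offset)` callable that you would adapt to return the list of memories for one page; `iter_all` and the fake data are illustrative only.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = List[Dict[str, Any]]


def iter_all(fetch_page: Callable[[int, int], Page], page_size: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield every item, advancing offset until a page shorter than page_size arrives."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return  # short (or empty) page: nothing left
        offset += page_size


# Offline stand-in for a bucket holding 45 memories.
_FAKE = [{"id": i} for i in range(45)]


def fake_fetch(limit: int, offset: int) -> Page:
    return _FAKE[offset:offset + limit]


all_ids = [m["id"] for m in iter_all(fake_fetch)]
```

Offset pagination is not a snapshot: if memories are stored or deleted in the bucket during the walk, items can be skipped or seen twice.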

Query knobs

query and query_stream accept these tuning knobs (all optional):

| Kwarg | Type | What it does |
|---|---|---|
| `max_tokens` | `int` | Caps synthesis output. Lower it for agent loops or cost control. |
| `min_similarity_threshold` | `float` | Drops retrieved chunks below this raw cosine similarity; use it when you need citation-grade precision. |
| `top_k_per_bucket` | `int \| dict` | Per-bucket retrieval depth. `{"edgar_AAPL": 20, "prices_AAPL": 4}` expresses "deep here, shallow there." |
| `return_format` | `"prose" \| "json"` | When `"json"`, the server returns JSON and the result includes the parsed `answer_json`. |
| `response_schema` | `dict` (JSON Schema) | Hints the model toward a target shape. Best-effort only; validate client-side if you need strict conformance. |

Example — agent loop with terse, structured output over an asymmetric bucket set:

r = engram.query(
    "Apple's active legal proceedings",
    buckets=["edgar_AAPL", "patents_AAPL"],
    top_k_per_bucket={"edgar_AAPL": 20, "patents_AAPL": 5},
    max_tokens=400,
    return_format="json",
    response_schema={
        "type": "array",
        "items": {"type": "object", "properties": {
            "case_name": {"type": "string"},
            "jurisdiction": {"type": "string"},
            "status": {"type": "string"},
        }},
    },
)
for case in r["answer_json"] or []:
    print(case)
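Since response_schema is only a hint, check answer_json yourself when the structure must be guaranteed. Below is a hand-rolled sketch for the case-list shape used above; `validate_cases` is a hypothetical helper, and it treats all three fields as required, which is stricter than the schema above (that schema has no `required` list). For general JSON Schema you would use a dedicated validator library instead.

```python
from typing import Any, Dict, List

REQUIRED_STRING_FIELDS = ("case_name", "jurisdiction", "status")


def validate_cases(answer_json: Any) -> List[Dict[str, str]]:
    """Return answer_json unchanged if it is a list of objects with string case fields.

    Raises ValueError describing the first violation found.
    """
    if not isinstance(answer_json, list):
        raise ValueError(f"expected a JSON array, got {type(answer_json).__name__}")
    for i, item in enumerate(answer_json):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")
        for field in REQUIRED_STRING_FIELDS:
            if not isinstance(item.get(field), str):
                raise ValueError(f"item {i}: {field!r} missing or not a string")
    return answer_json
```

In the loop above you would then iterate over `validate_cases(r["answer_json"])` and handle ValueError, for example by retrying the query or falling back to the prose answer.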

Query

query(question, *, buckets=None, top_k=8, skip_synthesis=False, return_explanation=True)
- buckets: the buckets to search in one call; results from all of them are fused. Defaults to ["default"].
- skip_synthesis=True returns retrieval results only, with no server-side LLM call.
- response shape: {"answer", "memories_found", "explanation": {"retrieved_memories", "graph_facts", "entity_matches", "context_tokens", "profile"}, "usage"}. Each graph_facts[i] includes memory_id so you can match it against retrieved_memories[].memory_id and render the citing memory.
query_stream(question, *, buckets=None, top_k=8, skip_synthesis=False, return_explanation=True) — same args, streams the answer as it's generated
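The memory_id link between graph_facts and retrieved_memories is what lets you render citations. A minimal sketch of that join, where `cite_graph_facts` is a hypothetical helper and the example result is made up (its `content` and `fact` keys are assumptions; adjust to the actual payload):

```python
from typing import Any, Dict, List, Optional, Tuple

Memory = Dict[str, Any]


def cite_graph_facts(result: Dict[str, Any]) -> List[Tuple[Dict[str, Any], Optional[Memory]]]:
    """Pair each explanation.graph_facts entry with the retrieved memory it cites (None if absent)."""
    explanation = result.get("explanation") or {}
    by_id = {m.get("memory_id"): m for m in explanation.get("retrieved_memories") or []}
    return [(fact, by_id.get(fact.get("memory_id"))) for fact in explanation.get("graph_facts") or []]


# Made-up result for illustration; real entries carry more fields.
example = {
    "answer": "The user prefers dark mode.",
    "explanation": {
        "retrieved_memories": [{"memory_id": "m1", "content": "User prefers dark mode."}],
        "graph_facts": [{"memory_id": "m1", "fact": "user prefers dark mode"}, {"memory_id": "m9"}],
    },
}

for fact, memory in cite_graph_facts(example):
    print(fact.get("memory_id"), "->", memory["content"] if memory else "(not in retrieved set)")
```

A fact whose memory was not among the retrieved memories comes back paired with None, so the UI can show it without a citation instead of failing.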

Dedup

The server runs a similarity check before storing. By default ("loose", similarity ≥ 0.95) it collapses near-duplicate writes into the existing memory so re-ingesting the same source doesn't bloat the bucket. For most narrative content this is what you want.

For templated time-series content (financial filings, daily metrics, log rows) where rows are structurally similar but each carries unique values, the default collapses real data. Use dedup="off" to disable.

Every store_memory response includes a status field. When status == "merged", the write was absorbed into an existing memory and three extra fields are present (deduped_into, merge_reason, similarity_score):

r = engram.store_memory("Acme Q1 revenue: $245M", bucket="finance")
if r["status"] == "merged":
    print(f"merged into {r['deduped_into']} ({r['merge_reason']}, sim={r['similarity_score']:.3f})")

merge_reason is one of:

content_hash — byte-identical content
embedding_similarity — vector similarity ≥ threshold
conflict_keep_existing — LLM conflict resolver chose the existing memory
concurrent_insert_race — another worker stored identical content first

Opt out for time-series ingest:

for row in monthly_prices:  # monthly_prices: your iterable of row strings
    engram.store_memory(row, bucket="prices_AAPL", dedup="off")

"strict" is a middle ground — only collapses near-identical content (≥ 0.99). Useful when you want a safety net against exact re-ingest but expect distinct-but-similar rows to coexist.
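To make the three modes concrete, here is a toy model of the similarity check using the thresholds quoted above (loose ≥ 0.95, strict ≥ 0.99, off never merges). It is only an illustration: the server also matches content hashes, runs an LLM conflict resolver, and handles concurrent inserts, none of which is modeled. `cosine` and `would_merge` are hypothetical helpers.

```python
import math
from typing import Optional, Sequence

# Thresholds as quoted in this README; None means "never merge".
THRESHOLDS = {"loose": 0.95, "strict": 0.99, "off": None}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def would_merge(similarity: float, dedup: str = "loose") -> bool:
    """True if a write this similar to its nearest existing memory would be collapsed into it."""
    threshold: Optional[float] = THRESHOLDS[dedup]
    return threshold is not None and similarity >= threshold
```

A similarity of 0.97 merges under "loose" but not under "strict". Two templated rows that differ only in a figure (Q1 $245M vs Q2 $251M) may embed very close together, which is how the default can collapse real data.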

Streaming

For broad questions, synthesis can take 10–25 seconds. query_stream yields the answer incrementally so you can render it as it's produced instead of waiting for the full response:

from lumetra_engram import EngramClient

engram = EngramClient()

for event in engram.query_stream("Summarize what I worked on this week", buckets=["work"]):
    if event["type"] == "delta":
        print(event["content"], end="", flush=True)
    elif event["type"] == "done":
        print()
        print(f"\nUsed {event['usage']['output_tokens']} tokens")

Two frame types:

{"type": "delta", "content": str} — incremental synthesis output, in order. Zero or more.
{"type": "done", "answer": str, "usage": {...}, "synthesis_usage": {...}, "explanation": {...}} — emitted exactly once at the end with the assembled answer and final usage/explanation.

Break out of the loop early to abort the request and close the connection.
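If you also need the full text after streaming (for logging, say), a small consumer can accumulate the delta frames and return them alongside the done frame, so you can compare the streamed text with the assembled answer. It works on any iterable of the two frame shapes documented above, so it can be exercised offline with a list. `collect_stream` and its `on_delta` callback are hypothetical, not part of the package.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def collect_stream(
    events: Iterable[Dict[str, Any]],
    on_delta: Optional[Callable[[str], None]] = None,
) -> Dict[str, Any]:
    """Consume delta/done frames; return the done frame plus the joined deltas under 'streamed'."""
    parts: List[str] = []
    for event in events:
        if event["type"] == "delta":
            parts.append(event["content"])
            if on_delta is not None:
                on_delta(event["content"])  # e.g. render incrementally
        elif event["type"] == "done":
            return {**event, "streamed": "".join(parts)}
    raise RuntimeError("stream ended without a 'done' frame")
```

With a live client this would be called as `collect_stream(engram.query_stream(...), on_delta=print)`; the RuntimeError flags a stream that was cut off before its final frame.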

Buckets

list_buckets() — all buckets in your tenant
create_bucket(name, description=None)
delete_bucket(bucket) — No default — explicit bucket required (prevents accidental wipes).

Profile

get_profile(bucket="default") — the canonical profile prepended to recall
regenerate_profile(bucket="default") — rebuild from current memories

Errors

All non-2xx HTTP responses raise EngramError:

from lumetra_engram import EngramClient, EngramError

engram = EngramClient()

try:
    engram.store_memory("User prefers dark mode.", "user-123")
except EngramError as err:
    if err.status == 412:
        print("BYOK not configured — set an LLM provider key in the Lumetra portal.")
    elif err.status == 429:
        print("Rate limited — back off and retry.")
    else:
        print(f"Engram {err.status}: {err}")
        print("Body:", err.body)

err.status is the HTTP status code (0 for connection failures), and err.body is the parsed JSON body when the server returned one.

Async usage

This client is synchronous. For async code, wrap calls in asyncio.to_thread:

import asyncio
from lumetra_engram import EngramClient

engram = EngramClient()

async def recall(question: str):
    return await asyncio.to_thread(engram.query, question, buckets=["user-123"])

A dedicated async client may land later; until then, the thread wrapper is the recommended pattern.
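asyncio.to_thread also combines with asyncio.gather to run several blocking calls concurrently. The sketch below uses a hypothetical `blocking_query` stand-in (it sleeps briefly instead of calling Engram) so it runs offline; with a real client you would pass `engram.query` instead. `recall_many` is likewise a hypothetical name.

```python
import asyncio
import time
from typing import Any, Callable, Dict, List


def blocking_query(question: str, *, buckets: List[str]) -> Dict[str, Any]:
    """Stand-in for EngramClient.query: blocks the calling thread like a network call would."""
    time.sleep(0.05)
    return {"answer": f"answer to {question!r}", "buckets": buckets}


async def recall_many(
    query: Callable[..., Dict[str, Any]], questions: List[str], buckets: List[str]
) -> List[Dict[str, Any]]:
    """Run each blocking query in the default thread pool; gather preserves input order."""
    return list(await asyncio.gather(
        *(asyncio.to_thread(query, q, buckets=buckets) for q in questions)
    ))


results = asyncio.run(recall_many(blocking_query, ["a", "b"], ["user-123"]))
```

Concurrency is bounded by the default executor's worker count, and the per-tenant request cap still applies, so large fan-outs may trigger 429 retries.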

Type hints

Return shapes are declared as TypedDict in lumetra_engram.types. They behave as ordinary dicts at runtime (JSON-serialize them freely) but give mypy and pyright the same level of detail the TypeScript client exposes through its interfaces.

from lumetra_engram import QueryResult

def summarize(result: QueryResult) -> str:
    return result.get("answer", "")
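To see what "ordinary dict at runtime" means, here is a hypothetical local TypedDict modeled on part of the documented query response. It is not the package's actual definition in lumetra_engram.types, and the field types (`int` for memories_found and context_tokens) are assumptions.

```python
import json
from typing import Any, Dict, List, TypedDict


class ExplanationSketch(TypedDict, total=False):
    retrieved_memories: List[Dict[str, Any]]
    graph_facts: List[Dict[str, Any]]
    context_tokens: int  # assumed type


class QueryResultSketch(TypedDict, total=False):
    answer: str
    memories_found: int  # assumed type
    explanation: ExplanationSketch


result: QueryResultSketch = {"answer": "Prefers dark mode.", "memories_found": 1}
print(type(result) is dict)  # True: TypedDict adds no runtime wrapper
print(json.dumps(result))    # serializes like any other dict
```

A type checker will flag a misspelled key such as `result["anser"]`, while at runtime nothing differs from a plain dict.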

License

MIT — see LICENSE.

Release history

0.5.1 (May 22, 2026)
0.5.0 (May 21, 2026)
0.4.1 (May 17, 2026)
0.4.0 (May 17, 2026)
0.3.0 (May 17, 2026)
0.2.2 (May 16, 2026)
0.2.1 (May 16, 2026)
0.2.0 (May 16, 2026)
0.1.1 (May 14, 2026)
0.1.0 (May 14, 2026)

Download files

Source Distribution

lumetra_engram-0.5.0.tar.gz (13.6 kB)

Uploaded May 21, 2026 Source

Built Distribution

lumetra_engram-0.5.0-py3-none-any.whl (16.2 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file lumetra_engram-0.5.0.tar.gz.

File metadata

Download URL: lumetra_engram-0.5.0.tar.gz
Upload date: May 21, 2026
Size: 13.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for lumetra_engram-0.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `33843612efaaf43aa9c2da71d1bf80e9c145d5fb3f35dd33e246dc88c27aa09e` |
| MD5 | `ba1be6191eab314cb3c938946ea1e71e` |
| BLAKE2b-256 | `8cabb89ab9ef92412429a99e671112ca9245bcb851c2a340aedb30eea0315554` |

File details

Details for the file lumetra_engram-0.5.0-py3-none-any.whl.

File metadata

Download URL: lumetra_engram-0.5.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 16.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for lumetra_engram-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `770b60cbe2bb4d80eb141ee43c03dfa7ddc67144a209d9e36886e54fbe4addd8` |
| MD5 | `40031c0a61535c117186d525b221ca8b` |
| BLAKE2b-256 | `933bb3f1a3de44bea4fb86edaf3dc1237f926626da036338528ca85b75961268` |
