lumetra-engram

Official Python client for Engram — durable, explainable memory for AI agents.

  • Zero runtime dependencies (uses the standard library's urllib).
  • Fully typed (py.typed, TypedDict response shapes, IDE-friendly).
  • Python 3.9+.

The TypeScript twin lives at lumetra-io/engram-js.

Install

pip install lumetra-engram
# or
uv add lumetra-engram
# or
poetry add lumetra-engram

Quickstart

from lumetra_engram import EngramClient

engram = EngramClient(api_key="eng_live_...")  # or set ENGRAM_API_KEY and omit

# Store a fact
engram.store_memory("User prefers dark mode.", "user-123")

# Recall — returns a synthesized answer plus the memories that contributed
result = engram.query(
    "What are this user's UI preferences?",
    buckets=["user-123"],
)

print(result["answer"])
print(result.get("explanation", {}).get("retrieved_memories", []))

Configuration

EngramClient(
    api_key="eng_live_...",            # or ENGRAM_API_KEY env var
    base_url="https://api.lumetra.io", # or ENGRAM_BASE_URL env var
    timeout_seconds=30.0,              # default 30s
    max_retries_on_429=3,              # auto-retry on per-tenant rate limit; 0 disables
)

Automatic 429 retry

The Engram API enforces a per-tenant cap on concurrent requests and returns 429 Too Many Requests with a Retry-After header when you exceed it. The client honors that header automatically, retrying up to max_retries_on_429 times and sleeping at most 30 seconds per retry, so bursty workloads don't fail on the first contention spike. Pass max_retries_on_429=0 to opt out and surface the 429 as an EngramError immediately.
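The retry behaviour described above can be sketched in a few lines. This is an illustration of the general Retry-After pattern, not the client's actual implementation: the `send` callable, the `(status, headers, body)` tuple, and the 1-second fallback when the header is missing or unparsable are all assumptions made for the example.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def send_with_429_retry(
    send: Callable[[], Response],
    max_retries_on_429: int = 3,
    max_sleep_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on 429, waiting as long as Retry-After asks."""
    attempt = 0
    while True:
        status, headers, body = send()
        # Return on any non-429 response, or once the retry budget is spent.
        if status != 429 or attempt >= max_retries_on_429:
            return status, headers, body
        try:
            # Only the delta-seconds form of Retry-After is handled here;
            # the HTTP-date form falls through to the assumed 1s default.
            wait = float(headers.get("Retry-After", "1"))
        except ValueError:
            wait = 1.0
        # Clamp the wait to the range [0, max_sleep_seconds].
        sleep(min(max(wait, 0.0), max_sleep_seconds))
        attempt += 1
```

Injecting `sleep` keeps the loop easy to test without real delays. With max_retries_on_429=0 the first 429 is returned to the caller unchanged, which mirrors the opt-out described above.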

BYOK reminder. Engram is bring-your-own-key end-to-end. Configure an OpenAI / Anthropic / Groq / Together / Fireworks key on the Lumetra portal before your first call, or store_memory / query will raise EngramError with status == 412.

API surface

Memories

  • store_memory(content, bucket="default") — store a single fact
  • store_memories(contents, bucket="default") — batched store
  • list_memories(bucket="default", *, limit=20, offset=0) — paginated list
  • delete_memory(memory_id, bucket="default") — delete one memory
  • clear_memories(bucket) — delete every memory in a bucket. No default — explicit bucket required (prevents accidental wipes).
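Because list_memories is paginated through limit and offset, walking a whole bucket means looping until a short page comes back. The helper below is a sketch of that loop. The exact return shape of list_memories is not documented here, so the helper takes a hypothetical `items_of` callable that extracts the list of items from one page; by default it assumes the page is already a sequence.

```python
from typing import Any, Callable, Iterator, Sequence


def iter_memories(
    list_page: Callable[..., Any],
    bucket: str = "default",
    page_size: int = 20,
    items_of: Callable[[Any], Sequence[Any]] = lambda page: page,
) -> Iterator[Any]:
    """Yield every item in a bucket by advancing offset one page at a time."""
    offset = 0
    while True:
        # Matches the documented call shape: list_memories(bucket, *, limit, offset).
        page = list_page(bucket, limit=page_size, offset=offset)
        items = items_of(page)
        yield from items
        # A page shorter than page_size (including an empty one) is the last.
        if len(items) < page_size:
            return
        offset += page_size
```

Typical use would be `for memory in iter_memories(engram.list_memories, "user-123"): ...`. If the response wraps the items in a dict, pass an `items_of` that unwraps it.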

Query

  • query(question, *, buckets=None, top_k=8, skip_synthesis=False, return_explanation=True)
    • buckets lists the buckets to search; results are fused across all of them in a single call. Defaults to ["default"].
    • skip_synthesis=True returns retrieval results only, with no server-side LLM call.
    • response shape: {"answer", "explanation": {"retrieved_memories", "profile", "graph_facts"}, "usage"}
  • query_stream(question, *, buckets=None, top_k=8, skip_synthesis=False, return_explanation=True) — same args, streams the answer as it's generated
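Since the explanation block is optional in the response shape above, code that reads retrieved memories should tolerate its absence. The small helper below is a sketch of that defensive access. It assumes only the key names listed in the response shape and treats a missing or null "explanation" as "no memories", which is an assumption about how the server behaves when return_explanation=False.

```python
from typing import Any, Dict, List


def retrieved_memories(result: Dict[str, Any]) -> List[Any]:
    """Return the memories that contributed to a query, or [] if none were reported."""
    # "explanation" may be missing (or None) when no explanation was requested.
    explanation = result.get("explanation") or {}
    return list(explanation.get("retrieved_memories") or [])
```

For retrieval-only use, this would pair with a call such as `engram.query("UI preferences?", buckets=["user-123", "team"], skip_synthesis=True)`, passing the result to `retrieved_memories` instead of reading "answer".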

Streaming

For broad questions, synthesis can take 10–25 seconds. query_stream yields the answer incrementally so you can render it as it's produced instead of waiting for the full response:

from lumetra_engram import EngramClient

engram = EngramClient()

for event in engram.query_stream("Summarize what I worked on this week", buckets=["work"]):
    if event["type"] == "delta":
        print(event["content"], end="", flush=True)
    elif event["type"] == "done":
        print()
        print(f"\nUsed {event['usage']['output_tokens']} tokens")

Two frame types:

  • {"type": "delta", "content": str} — incremental synthesis output, in order. Zero or more.
  • {"type": "done", "answer": str, "usage": {...}, "synthesis_usage": {...}, "explanation": {...}} — emitted exactly once at the end with the assembled answer and final usage/explanation.

Break out of the loop early to abort the request and close the connection.
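The two frame types can be consumed with one small helper that forwards deltas as they arrive and hands back the final frame. This is a sketch that relies only on the frame shapes listed above; `collect_stream` and its `on_delta` callback are names invented for the example, not part of the client.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Frame = Dict[str, Any]


def collect_stream(
    events: Iterable[Frame],
    on_delta: Optional[Callable[[str], None]] = None,
) -> Tuple[str, Optional[Frame]]:
    """Consume stream frames; return (answer_text, done_frame_or_None)."""
    parts: List[str] = []
    done: Optional[Frame] = None
    for event in events:
        if event["type"] == "delta":
            parts.append(event["content"])
            if on_delta is not None:
                on_delta(event["content"])  # e.g. render the chunk immediately
        elif event["type"] == "done":
            done = event
            break  # "done" is emitted exactly once, at the end
    joined = "".join(parts)
    # Prefer the assembled answer from the done frame; fall back to the
    # concatenated deltas if the stream ended without one.
    text = done.get("answer", joined) if done is not None else joined
    return text, done
```

Usage might look like `text, done = collect_stream(engram.query_stream("...", buckets=["work"]), on_delta=lambda s: print(s, end="", flush=True))`. A `None` done frame signals that the stream ended before the final frame arrived.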

Buckets

  • list_buckets() — all buckets in your tenant
  • create_bucket(name, description=None)
  • delete_bucket(bucket) — no default; explicit bucket required (prevents accidental wipes).

Profile

  • get_profile(bucket="default") — the canonical profile prepended to recall
  • regenerate_profile(bucket="default") — rebuild from current memories

Errors

All non-2xx HTTP responses raise EngramError:

from lumetra_engram import EngramClient, EngramError

engram = EngramClient()

try:
    engram.store_memory("User prefers dark mode.", "user-123")
except EngramError as err:
    if err.status == 412:
        print("BYOK not configured — set an LLM provider key in the Lumetra portal.")
    elif err.status == 429:
        print("Rate limited — back off and retry.")
    else:
        print(f"Engram {err.status}: {err}")
        print("Body:", err.body)

err.status is the HTTP status code (or 0 for connection failures); err.body is the parsed JSON body when one was returned.

Async usage

This client is synchronous. For async code, wrap calls in asyncio.to_thread:

import asyncio
from lumetra_engram import EngramClient

engram = EngramClient()

async def recall(question: str):
    return await asyncio.to_thread(engram.query, question, buckets=["user-123"])

A dedicated async client may land later; until then, the thread wrapper is the recommended pattern.
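When several questions need answering at once, the thread wrapper combines naturally with a semaphore so the fan-out stays under the per-tenant concurrency cap. The sketch below shows one way to do that. `recall_many` is a hypothetical helper, and the default `max_concurrency=4` is an arbitrary illustrative value, not the actual cap.

```python
import asyncio
from typing import Any, Callable, List, Sequence


async def recall_many(
    query: Callable[..., Any],
    questions: Sequence[str],
    *,
    max_concurrency: int = 4,  # illustrative; tune to your tenant's limit
    **query_kwargs: Any,
) -> List[Any]:
    """Run a blocking query function for each question, in bounded parallel."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(question: str) -> Any:
        # At most max_concurrency calls run in worker threads at a time.
        async with semaphore:
            return await asyncio.to_thread(query, question, **query_kwargs)

    # gather preserves input order, so results[i] answers questions[i].
    return list(await asyncio.gather(*(one(q) for q in questions)))
```

It would be called as `await recall_many(engram.query, questions, buckets=["user-123"])`. Keeping the limit at or below the server's cap avoids leaning on 429 retries for routine batches.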

Type hints

Return shapes are declared as TypedDict in lumetra_engram.types. They behave as ordinary dict at runtime — JSON-serialize freely — but give mypy and pyright the same level of detail the TypeScript client exposes via interface.

from lumetra_engram import QueryResult

def summarize(result: QueryResult) -> str:
    return result.get("answer", "")

License

MIT — see LICENSE.

Download files

Version 0.2.0 is published as a source distribution and a built distribution, both uploaded via twine/6.2.0 on CPython/3.11.2 (not through Trusted Publishing).

lumetra_engram-0.2.0.tar.gz (source distribution, 9.9 kB)

  • SHA256: b2e404dc77af8f7cc06ad8a1eae8f87dc5e34eb1c20b2adb8fae871b0750380b
  • MD5: 9cddc96a2e1e11fe1829c20397a1d033
  • BLAKE2b-256: 1e85eac3991e51d20d3a639c30020244a494347ce6747e36c19b1c4a87f13b74

lumetra_engram-0.2.0-py3-none-any.whl (built distribution, Python 3, 12.3 kB)

  • SHA256: 5cd6dfd2c38dcbbf412a3b6e3baba2d1bf3a88020a37dab119f0029f6209f24f
  • MD5: 4915b079d21316e6d7ed261f8350c412
  • BLAKE2b-256: 513d9a6b73a5941cbfddcc7fa1d2ac2b340a0d89700b25e673a1e6c8b6fcbe67
