Production-grade memory infrastructure for multi-agent systems. Namespace isolation, RBAC, provenance, ranked retrieval.


§ 00 · MASTHEAD · FILED UNDER INFRASTRUCTURE · BY SURENDRA SINGH · — FOR PUBLICATION —

ATTESTOR — A MEMORY JOURNAL FOR AGENTIC SYSTEMS · VOL. 02 · REV. 0.1 · EST. 2026 · NEW YORK · MIT

Attestor

The memory layer for agent teams.

Self‑hosted · Deterministic retrieval · No LLM in the critical path


Problem ↓ · Multi‑Agent ↓ · Pipeline ↓ · Deploy ↓ · Principles ↓ · Reference ↓


Six stars out of eight hundred light up — Attestor recalls exactly what this moment needs

Attestor doesn’t search. It remembers.


The problem

Agent prototypes don’t survive production. Memory is why. Single agents rediscover the same facts every run. Multi‑agent pipelines are worse — the planner’s decisions never reach the executor, the researcher’s findings never reach the reviewer. Teams paper over it by stuffing giant prompts between agents, burning tokens on stale context. That’s a workaround, not an architecture.

The solution — at a glance

Write · Write · Write · Write ... Read — memories accumulate across sessions, load_context primes the next task

Memory accumulates. Load primes. Four writes across Mon–Thu land as persisted memories. Friday morning a fresh Portfolio Planner wakes up to a new task, calls mem.load_context(), and Attestor ranks + dedupes + budget‑fits all four back into the context window. The agent resumes with full continuity — earnings signal, risk cap, prior stance, compliance precedent — not RAG over documents, but the agents’ own history replayed into a fresh context. Zero LLM calls in the critical path.
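The rank‑dedupe‑pack behaviour described above can be sketched with a toy stand‑in. This is not the library’s implementation — the real entry point is mem.load_context(), and the recency‑only ranking and word‑count tokenizer here are simplifying assumptions — but it mirrors the flow: newest memories win the budget, duplicates are dropped, and nothing over the ceiling is packed.

```python
# Toy stand-in for the load_context flow: rank by recency, drop exact
# duplicates, greedily pack under a token budget. Word counts stand in
# for real token counts; Attestor's actual ranking is richer.
def load_context(memories, budget, tokens=lambda m: len(m.split())):
    seen, packed, used = set(), [], 0
    # newest first, so fresher facts win the budget
    for ts, text in sorted(memories, reverse=True):
        if text in seen:
            continue
        cost = tokens(text)
        if used + cost > budget:
            continue
        seen.add(text)
        packed.append(text)
        used += cost
    return packed

week = [
    (1, "Mon: earnings signal suggests JPM beat"),
    (2, "Tue: risk cap raised to 8 percent stop"),
    (3, "Wed: planner stance: add 2 percent JPM"),
    (4, "Thu: compliance precedent logged for position limit"),
]
context = load_context(week, budget=30)  # all four fit under this budget
```

Everything here is plain sorting and set logic — the point of the design is that no inference call is needed to decide what the agent sees on Friday morning.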


Production‑grade memory infrastructure for multi‑agent systems.
The memory tier your agents need when they leave your laptop and start running in production.

Namespace isolation · RBAC · Provenance tracking · Temporal correctness · Ranked retrieval · Token budgets — built for orchestrator‑worker and planner‑executor pipelines. Python library, REST API, or containerized service. No SaaS middleman, no per‑seat fees, no vendor lock‑in.

from attestor import AgentMemory

mem = AgentMemory("./store")
mem.add("Order service uses event sourcing", entity="order-service", tags=["arch"])
mem.recall("how is the order service structured?", budget=2000)

poetry add attestor

MIT · Python 3.10–3.14 · Production deploy in one command

§ Spec Sheet
Storage Roles: Doc · Vector · Graph
Interfaces: Python · REST · MCP
Retrieval Layers: 5
RBAC Roles: 6
Cloud Targets: Amazon Web Services · Microsoft Azure · Google Cloud Platform
License: MIT

§ 01 — The Problem

Why agent prototypes don't survive production


What we hear from teams building agent pipelines:

We had a planner, a coder, a reviewer, a deployer — four agents in a pipeline. None of them knew what the others learned. We were passing giant prompts between them and burning tokens on stale information.

Without Attestor → With Attestor
01 — Each agent starts blind, no knowledge of what others learned → Shared memory: planner writes, coder reads, reviewer sees both
02 — Giant prompts passed between agents burn context tokens → Token‑budget recall: each agent pulls only what fits
03 — No access control, any agent can overwrite any state → Six RBAC roles, namespace isolation, write quotas per agent
04 — Contradicting facts from different agents go undetected → Contradictions auto‑resolved: newer facts supersede older ones
05 — Session ends, everything learned is gone forever → Persistent across sessions, pipelines, and agent restarts

More agents, more sessions, more memories — retrieval gets better while context cost stays flat.


§ 02 — Multi-Agent Systems

Orchestrator · Planner · Executor · Reviewer

Not a chatbot plugin. Infrastructure for agent teams.

Every recall and write is scoped to an AgentContext — a lightweight dataclass carrying identity, role, namespace, parent trail, token budget, write quota, and visibility. Contexts are immutable; spawning a sub‑agent returns a new context with inherited provenance.
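A minimal sketch of such a context, using the fields the prose names (identity, role, namespace, parent trail, token budget, write quota, visibility). The exact dataclass in the library may differ; this only illustrates the immutability and inherited-provenance behaviour described above.

```python
# Sketch of an AgentContext: frozen dataclass, spawn() returns a new
# context whose parent trail extends the caller's. Field names follow
# the prose; the real class may differ.
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # immutable: every call sees a fixed scope
class AgentContext:
    agent_id: str
    role: str                 # one of the six RBAC roles
    namespace: str
    parent_trail: tuple = ()  # provenance chain of ancestor agents
    token_budget: int = 2000
    write_quota: int = 100
    visibility: str = "namespace"

    def spawn(self, agent_id: str, role: str) -> "AgentContext":
        # child inherits namespace, budget, quota; trail extends
        return replace(self, agent_id=agent_id, role=role,
                       parent_trail=self.parent_trail + (self.agent_id,))

planner = AgentContext("planner-1", "PLANNER", "fund-alpha")
executor = planner.spawn("executor-7", "EXECUTOR")
```

Because the dataclass is frozen, a sub‑agent can never mutate its parent’s scope — it can only receive a new context with the provenance trail already attached.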

# Primitive What it does
01 Namespace isolation Every agent, project, or tenant gets its own namespace. Planner writes, coder reads, reviewer sees both. Isolated by default, shared when you configure it.
02 Six RBAC roles Orchestrator, Planner, Executor, Researcher, Reviewer, Monitor. Read‑only observers to full admins.
03 Provenance tracking Know which agent wrote which memory, when, and under which parent session. The reviewer can trace a decision back to the planner three sessions ago.
04 Cross‑agent contradiction resolution Agent A learns "user works at Google." Agent B learns "user works at Meta." Attestor auto‑supersedes. Full history preserved. Zero inference calls in the critical path.
05 Token budgets per agent recall(query, budget=2000) — a summarizer uses 500 tokens; a deep reasoner uses 5,000. Each agent receives exactly what fits in its context window.
06 Write quotas & review flags Rate‑limit writes per namespace, flag writes for human review, add compliance tags for audit.
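Primitive 06 in miniature: a per‑agent write counter checked before any write lands. Attestor’s real enforcement lives server‑side and the class below is purely illustrative, but it captures the rate‑limit behaviour the table describes.

```python
# Hedged sketch of a per-agent write quota: reject the write once the
# agent's counter reaches its limit. Toy stand-in, not the library's
# enforcement code.
from collections import Counter

class QuotaError(Exception):
    pass

class QuotaGuard:
    def __init__(self, quota: int):
        self.quota = quota
        self.writes = Counter()

    def check(self, agent_id: str) -> None:
        if self.writes[agent_id] >= self.quota:
            raise QuotaError(f"{agent_id} exceeded {self.quota} writes")
        self.writes[agent_id] += 1

guard = QuotaGuard(quota=2)
guard.check("executor-7")
guard.check("executor-7")
try:
    guard.check("executor-7")   # third write from the same agent
    blocked = False
except QuotaError:
    blocked = True              # rejected before touching storage
```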

§ 03 — The Retrieval Pipeline

Five layers · zero inference calls

Five layers. No LLM. Everything deterministic.

When an agent calls recall(query, budget), five cooperating layers find, fuse, score, and fit the most relevant memories into the requested token ceiling. The store can hold ten million memories; the context window never sees more than the budget.

# Layer Backend Mechanism
01 Tag Match SQLite Tag index, FTS, exact + partial hits
02 Graph Expansion NetworkX / AGE Multi‑hop BFS (depth 2)
03 Vector Search ChromaDB / pgvector Cosine similarity
04 Fusion + Rank In‑process RRF (k=60) + PageRank + confidence decay
05 Diversity + Fit In‑process MMR (λ=0.7) + greedy token‑budget pack
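Layer 04 can be shown concretely. Reciprocal Rank Fusion is a standard formula (score = Σ 1/(k + rank) across the input rankings), so the sketch below uses the table’s k=60 over three toy candidate lists; the PageRank boost and confidence decay are omitted, and the memory IDs are invented for illustration.

```python
# RRF in miniature: fuse the ranked ID lists from tag match, graph
# expansion, and vector search. Deterministic -- ties break by ID,
# and no LLM is consulted.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda m: (-scores[m], m))

tag_hits    = ["m3", "m1", "m7"]   # layer 01 output
graph_hits  = ["m1", "m9"]         # layer 02 output
vector_hits = ["m1", "m3", "m5"]   # layer 03 output
fused = rrf([tag_hits, graph_hits, vector_hits])
# m1 appears in all three lists, so it fuses to the top
```

Note that only IDs travel through this step — hydrating the memory text from the document store happens after ranking.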

Storage roles

Storage roles — document store, vector store, graph store

Every memory is persisted across three complementary stores. Every supported backend combination is just a different technology choice for one or more of these roles.

Role What it stores Why it exists
Document store The source of truth — content, tags, entity, category, timestamps, provenance, confidence Where add() commits; where recall() hydrates final memory text
Vector store Dense embedding per memory, keyed by memory ID Finds memories by meaning when no tag or word overlaps the query
Graph store Entity nodes + typed edges (uses, authored-by, supersedes) Connects memories indirectly — query “Python” can surface “Django” via the graph

Ingestion flow — what happens on add()

Example framing — a market intelligence system feeding a financial advisor pipeline. Every signal the desk cares about lands here.

flowchart TB
    N1["<b>News wires</b><br/>Reuters &bull; Bloomberg<br/><i>breaking headlines</i>"]
    N2["<b>Market data</b><br/>ticks &bull; OHLC<br/><i>prices &bull; volumes</i>"]
    N3["<b>Earnings reports</b><br/>10-K &bull; 10-Q &bull; 8-K<br/><i>guidance &bull; surprises</i>"]
    N4["<b>Leadership changes</b><br/>CEO &bull; CFO &bull; Board<br/><i>appointments &bull; exits</i>"]
    N5["<b>Geopolitical events</b><br/>tariffs &bull; sanctions<br/><i>policy &bull; conflict</i>"]

    subgraph SOURCES ["&sect; MARKET INTELLIGENCE SOURCES"]
        direction LR
        N1 ~~~ N2 ~~~ N3 ~~~ N4 ~~~ N5
    end

    SOURCES ==>|"<b>mem.add(content, tags, entity, provenance, ts)</b>"| API{{"<b>INGEST API</b>"}}

    D1["<b>Document store</b><br/>insert row<br/>content &bull; tags &bull; entity<br/>ts &bull; source &bull; confidence"]
    D2["<b>Vector store</b><br/>embed text<br/>&rarr; 384-d vector<br/>keyed by memory ID"]
    D3["<b>Graph store</b><br/>extract entities + edges<br/>issuer &bull; sector &bull; person<br/>country &bull; event"]

    subgraph WRITE ["&sect; PARALLEL WRITES &mdash; one logical transaction"]
        direction LR
        D1 ~~~ D2 ~~~ D3
    end

    API --> D1
    API --> D2
    API --> D3

    CD["<b>Contradiction check</b><br/>per entity &bull; per field<br/>e.g. JPM CFO is X vs new JPM CFO is Y"]

    D1 ==> CD
    D2 ==> CD
    D3 ==> CD

    CD ==>|newer fact wins| SUP["<b>Supersede older fact</b><br/>keep in timeline for audit"]
    SUP ==> DONE["<b>Committed &bull; recallable</b>"]

    style SOURCES fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style WRITE   fill:#FBF8F1,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style API     fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style CD      fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style SUP     fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style DONE    fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style N1      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N2      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N3      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N4      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style N5      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style D1      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style D2      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style D3      fill:#FBF8F1,stroke:#1A1614,color:#1A1614

The three writes commit as one logical transaction. On SQL backends it’s a real DB transaction; on distributed backends it’s sequenced with best-effort rollback. Contradictions don’t overwrite — older facts are superseded and retained in the timeline so auditors can reconstruct what the desk knew, and when.
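What “one logical transaction” means in practice can be sketched as compensating actions: fan the write out to the three stores and, if any leg fails, undo the legs already committed. The store classes below are toy stand‑ins, not Attestor’s backends — on SQL targets this whole fan‑out is a real DB transaction instead.

```python
# Best-effort rollback sketch: write to doc, vector, and graph stores
# in sequence; on failure, issue compensating deletes for the legs
# that already committed.
class MemStore:
    def __init__(self):
        self.rows = {}
    def write(self, memory):
        self.rows[memory["id"]] = memory
    def delete(self, memory_id):
        self.rows.pop(memory_id, None)

def add(memory, stores):
    committed = []
    try:
        for store in stores:               # doc, vector, graph
            store.write(memory)
            committed.append(store)
    except Exception:
        for store in reversed(committed):  # compensating deletes
            store.delete(memory["id"])
        raise
    return memory["id"]

doc, vec, graph = MemStore(), MemStore(), MemStore()
add({"id": "mem_42", "content": "JPM names Jane Doe CFO"}, [doc, vec, graph])

class DownStore(MemStore):                 # simulate a failing backend
    def write(self, memory):
        raise RuntimeError("backend down")

doc2 = MemStore()
try:
    add({"id": "mem_43", "content": "partial"}, [doc2, DownStore()])
except RuntimeError:
    pass                                   # doc2 was rolled back
```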


Ingestion flow — variant B • agent-to-agent conversation capture

A different kind of ingestion: the memories are not external feeds — they are the agents’ own conversation as they debate a trade proposal. Every turn is captured with speaker, timestamp, entity, and a decision marker.

flowchart TB
    T1["<b>Portfolio Planner</b><br/><i>&ldquo;Proposing to add 2% JPM<br/>ahead of Q3 print&rdquo;</i>"]
    T2["<b>Market Researcher</b><br/><i>&ldquo;Consensus EPS is +4% QoQ,<br/>whisper number suggests beat&rdquo;</i>"]
    T3["<b>Risk Analyst</b><br/><i>&ldquo;Tariff headline risk this week<br/>&mdash; raise stop to 8%&rdquo;</i>"]
    T4["<b>Compliance Reviewer</b><br/><i>&ldquo;Ok with position limit.<br/>Logging rationale.&rdquo;</i>"]
    DEC["<b>Decision reached</b><br/>buy 2% JPM &bull; stop 8% &bull; pre-earnings"]

    subgraph CONV ["&sect; AGENT-TO-AGENT CONVERSATION &mdash; trade proposal thread"]
        direction LR
        T1 --> T2
        T2 --> T3
        T3 --> T4
        T4 --> DEC
    end

    CAP["<b>Turn-level capture</b><br/>speaker &bull; utterance &bull; ts<br/>entity: JPM &bull; topic: position sizing<br/>kind: <i>conversation</i>"]

    CONV ==>|auto-capture hook<br/>every turn| CAP
    CAP ==>|"<b>mem.add(content, speaker, thread_id, kind=&quot;chat&quot;)</b>"| API{{"<b>INGEST API</b>"}}

    DOC[("<b>Document store</b><br/>turn rows, thread_id,<br/>speaker, ts, decision flag")]
    VEC[("<b>Vector store</b><br/>embedding per turn")]
    GR[("<b>Graph store</b><br/>edges: agent &rarr; entity<br/>agent &rarr; decision")]

    API --> DOC
    API --> VEC
    API --> GR

    DONE["<b>Thread memorialised</b><br/>replayable &bull; attributable &bull; auditable"]
    DOC ==> DONE
    VEC ==> DONE
    GR ==> DONE

    style CONV fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style CAP  fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style API  fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style DONE fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style T1   fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style T2   fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style T3   fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style T4   fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style DEC  fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style DOC  fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style VEC  fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style GR   fill:#FBF8F1,stroke:#1A1614,color:#1A1614

Different from feed ingestion: the source is the agents themselves, not the outside world. Every turn is attributed to a speaker, tied to a thread, and flagged if it contained a decision. Nothing is paraphrased — the verbatim utterance is preserved so the reasoning can be reconstructed under audit.
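A sketch of the turn‑level capture hook, assuming the record shape the diagram shows (speaker, thread_id, timestamp, decision flag, kind). The function name and signature are illustrative, not the library’s confirmed API — the point is that the utterance is stored verbatim.

```python
# Hedged sketch of auto-capture: every turn becomes a record with
# speaker, thread, timestamp, and a decision marker. Content is kept
# verbatim so reasoning can be replayed under audit.
import time

def capture_turn(log, thread_id, speaker, utterance, decision=False):
    log.append({
        "thread_id": thread_id,
        "speaker": speaker,
        "content": utterance,        # never paraphrased
        "ts": time.time(),
        "decision": decision,
        "kind": "chat",
    })

thread = []
capture_turn(thread, "t-jpm-q3", "planner",
             "Proposing to add 2% JPM ahead of Q3 print")
capture_turn(thread, "t-jpm-q3", "risk",
             "Tariff headline risk this week, raise stop to 8%")
capture_turn(thread, "t-jpm-q3", "compliance",
             "Ok with position limit. Logging rationale.", decision=True)

decisions = [t for t in thread if t["decision"]]
```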

Recall flow — what happens on recall()

Same market intelligence system — now the Portfolio Planner asks a real question ahead of the morning call.

flowchart TB
    U["<b>Financial Advisor</b><br/>typing into chat UI<br/>ahead of the 8am call"]
    BUBBLE["<b>Chat message</b><br/><i>What do we know about JPM&rsquo;s CFO transition<br/>and the fallout for US regional banks?</i>"]

    subgraph CHAT ["&sect; ADVISOR CHAT INTERFACE &mdash; human in the loop"]
        direction LR
        U ==> BUBBLE
    end

    CHAT ==>|routed to| AGENT(["<b>Portfolio Planner agent</b><br/>decomposes intent &bull; issues recall"])

    Q1["<i>JPM CFO transition</i>"]
    Q2["<i>Semiconductor supply-chain<br/>risk after latest tariff move</i>"]
    Q3["<i>Earnings surprises in<br/>US regional banks, last 90 days</i>"]

    subgraph QUERIES ["&sect; DECOMPOSED RECALL QUERIES &mdash; what the agent actually asks attestor"]
        direction LR
        Q1 ~~~ Q2 ~~~ Q3
    end

    AGENT ==> QUERIES
    QUERIES ==>|"<b>mem.recall(query, budget=2000)</b>"| API{{"<b>RECALL API</b>"}}

    L1["<b>01 &bull; Tag Match</b><br/>&rarr; document store<br/>FTS on JPM, CFO,<br/>tariff, earnings"]
    L2["<b>02 &bull; Graph Expansion</b><br/>&rarr; graph store<br/>BFS: JPM &rarr; CFO<br/>&rarr; Jeremy Barnum"]
    L3["<b>03 &bull; Vector Search</b><br/>&rarr; vector store<br/>cosine on query<br/>top-K nearest embeddings"]

    subgraph SOURCES ["&sect; STAGE A &mdash; parallel sources &bull; fan-out across 3 indexes"]
        direction LR
        L1 ~~~ L2 ~~~ L3
    end

    API --> L1
    API --> L2
    API --> L3

    IDS[("Candidate memory IDs<br/>~100s &bull; deduped")]

    L1 --> IDS
    L2 --> IDS
    L3 --> IDS

    L4["<b>04 &bull; Fusion &amp; Rank</b><br/>RRF k=60 &bull; PageRank boost on central entities<br/>&bull; confidence decay on stale prints"]
    L5["<b>05 &bull; Diversity &amp; Fit</b><br/>MMR &lambda;=0.7 &mdash; drop near-duplicate news wires<br/>&bull; greedy pack under 2,000 tokens"]
    OUT["<b>Portfolio Planner context</b>"]
    REPLY["<b>Chat reply to advisor</b><br/>sourced &bull; dated &bull; auditable"]

    IDS ==>|hydrate from doc store| L4
    L4 ==> L5
    L5 ==>|ranked memories &le; budget<br/>zero LLM calls in the path| OUT
    OUT ==>|grounded answer streamed to chat| REPLY

    style CHAT    fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style U       fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style BUBBLE  fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style AGENT   fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style REPLY   fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style QUERIES fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style SOURCES fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style API     fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style L1      fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style L2      fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style L3      fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style IDS     fill:#F5F1E8,stroke:#1A1614,color:#1A1614
    style L4      fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style L5      fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style OUT     fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style Q1      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style Q2      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style Q3      fill:#FBF8F1,stroke:#1A1614,color:#1A1614

Only memory IDs travel between layers until the hydrate step. A store with ten million market-intel rows still returns a tight result set inside the caller’s token ceiling. Graph expansion is the step that lets “tariff” surface memories about “TSMC” and “Nvidia” without either word appearing in the query.
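Layer 05 can be made concrete. MMR scores each remaining candidate as λ·relevance − (1−λ)·similarity‑to‑already‑chosen, which is what drops near‑duplicate wires. The sketch below uses the documented λ=0.7 but substitutes a toy Jaccard word overlap for the real embedding similarity, and word counts for real token costs.

```python
# MMR (lambda=0.7) plus greedy token-budget pack, sketched with toy
# similarity. Near-duplicate headlines lose to a less relevant but
# novel memory once the first copy is chosen.
def jaccard(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def mmr_pack(candidates, budget, lam=0.7):
    # candidates: (relevance, text) pairs
    chosen, used = [], 0
    pool = sorted(candidates, reverse=True)
    while pool:
        best = max(pool, key=lambda c: lam * c[0] - (1 - lam) *
                   max((jaccard(c[1], s) for _, s in chosen), default=0.0))
        pool.remove(best)
        cost = len(best[1].split())        # toy token count
        if used + cost <= budget:
            chosen.append(best)
            used += cost
    return [text for _, text in chosen]

wires = [
    (0.95, "JPM names Jane Doe CFO"),
    (0.94, "JPM names Jane Doe CFO today"),   # near-duplicate wire
    (0.60, "tariff move hits chip supply chain"),
]
picked = mmr_pack(wires, budget=12)
# the duplicate is penalised; the novel tariff memory makes the cut
```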


Recall flow — variant B • agent context recall

A different kind of recall: no human in the loop. An agent resuming a task — or handing off to a peer — pulls back its own prior working context: earlier decisions, peer rationale, what was true at the last checkpoint.

flowchart TB
    RESUME["<b>Risk Analyst</b> resuming mid-task<br/>or <b>Compliance Reviewer</b> taking handoff<br/><i>no user prompt &mdash; agent self-initiated</i>"]

    INTENT["<b>Context intent</b><br/>&ldquo;what did the desk already decide<br/>about JPM this week, and why?&rdquo;"]

    RESUME ==> INTENT
    INTENT ==>|"<b>mem.recall_context(thread_id, entity=&quot;JPM&quot;, since=7d)</b>"| API{{"<b>CONTEXT RECALL API</b>"}}

    F1["<b>Thread filter</b><br/>same thread_id<br/>or same namespace"]
    F2["<b>Entity filter</b><br/>entity = JPM<br/>and related via graph"]
    F3["<b>Temporal filter</b><br/>within last 7d<br/>supersedes resolved"]
    F4["<b>Speaker filter</b><br/>peer agents in role<br/>RBAC-visible only"]

    subgraph FILTERS ["&sect; CONTEXT FILTERS &mdash; tighter than open-query recall"]
        direction LR
        F1 ~~~ F2 ~~~ F3 ~~~ F4
    end

    API --> FILTERS

    TURNS["<b>Prior conversation turns</b><br/>Planner proposal &bull; Researcher consensus<br/>Analyst stop raise &bull; Compliance sign-off"]
    DECS["<b>Prior decisions</b><br/>buy 2% JPM &bull; stop 8%<br/>decided 3 days ago"]
    DELTA["<b>What changed since</b><br/>new tariff headline today<br/>stop needs re-evaluation"]

    FILTERS ==> TURNS
    FILTERS ==> DECS
    FILTERS ==> DELTA

    PACK["<b>Context pack</b><br/>chronologically ordered<br/>&bull; speaker-attributed<br/>&bull; fit to agent token budget"]

    TURNS ==> PACK
    DECS ==> PACK
    DELTA ==> PACK

    PACK ==>|loaded into agent<br/>working memory| RESUMED["<b>Agent resumes task</b><br/>with full prior context<br/>&bull; no re-asking peers<br/>&bull; no lost decisions"]

    style RESUME   fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style INTENT   fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style API      fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style FILTERS  fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style F1       fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style F2       fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style F3       fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style F4       fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style TURNS    fill:#F5F1E8,stroke:#1A1614,color:#1A1614
    style DECS     fill:#F5F1E8,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style DELTA    fill:#F5F1E8,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style PACK     fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style RESUMED  fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8

Differs from open-query recall in three ways: (1) the caller is an agent, not a human — triggered by resume / handoff, not by a chat message; (2) the filters are tighter — thread, namespace, RBAC, time window — not just semantic similarity; (3) the output preserves chronology and attribution rather than ranking purely by relevance. This is how a long-running pipeline stays coherent across restarts, handoffs, and multi-day workflows.
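The three differences can be sketched as filters over stored turns. The function below is an illustration of recall_context’s described behaviour — the parameter names mirror the diagram but are not a confirmed signature — and the key detail is the final sort: chronology, not relevance.

```python
# Hedged sketch of context recall: thread, entity, and time-window
# filters, output in chronological order with speaker attribution.
def recall_context(rows, thread_id, entity, since_ts):
    hits = [r for r in rows
            if r["thread_id"] == thread_id
            and r["entity"] == entity
            and r["ts"] >= since_ts]
    return sorted(hits, key=lambda r: r["ts"])  # chronology preserved

rows = [
    {"thread_id": "t1", "entity": "JPM",  "ts": 3, "speaker": "risk",
     "content": "raise stop to 8%"},
    {"thread_id": "t1", "entity": "JPM",  "ts": 1, "speaker": "planner",
     "content": "add 2% JPM"},
    {"thread_id": "t1", "entity": "TSMC", "ts": 2, "speaker": "research",
     "content": "chip supply risk"},
    {"thread_id": "t2", "entity": "JPM",  "ts": 4, "speaker": "monitor",
     "content": "other thread"},
]
pack = recall_context(rows, "t1", "JPM", since_ts=0)
# planner's proposal comes back first, the risk follow-up second
```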


Isolation — namespace & RBAC boundary

Multi-tenant by construction. Every memory lives inside a namespace; every agent is bound to one of six roles (ORCHESTRATOR, PLANNER, EXECUTOR, RESEARCHER, REVIEWER, MONITOR); every call is authorised before it touches storage.

flowchart TB
    CALLER["<b>Incoming call</b><br/>agent identity &bull; API key / JWT<br/>claims: namespace, role, thread"]

    AUTH["<b>AuthZ gate</b><br/>verify signature &bull; resolve role<br/>&bull; check write quota per agent"]

    CALLER ==> AUTH

    R1["<b>ORCHESTRATOR</b><br/>spawns sub-agents<br/>full read/write"]
    R2["<b>PLANNER</b><br/>decomposes tasks<br/>writes plans + decisions"]
    R3["<b>EXECUTOR</b><br/>runs the work<br/>add + recall own thread"]
    R4["<b>RESEARCHER</b><br/>gathers facts<br/>often read-only"]
    R5["<b>REVIEWER</b><br/>audits decisions<br/>read + flag-for-review"]
    R6["<b>MONITOR</b><br/>observability only<br/>read + timeline"]

    subgraph ROLES ["&sect; RBAC ROLES &mdash; 6 built-in, customisable"]
        direction LR
        R1 ~~~ R2 ~~~ R3 ~~~ R4 ~~~ R5 ~~~ R6
    end

    AUTH ==> ROLES

    NS1["<b>namespace: fund-alpha</b><br/>portfolio planner threads<br/>risk analyst threads"]
    NS2["<b>namespace: fund-beta</b><br/>separate data plane<br/>no cross-read"]
    NS3["<b>namespace: research</b><br/>shared feed ingest<br/>read-only for funds"]

    subgraph TENANTS ["&sect; NAMESPACE BOUNDARIES &mdash; hard isolation"]
        direction LR
        NS1 ~~~ NS2 ~~~ NS3
    end

    ROLES ==>|"filtered by namespace + role"| TENANTS

    STORE[("<b>Doc &bull; Vector &bull; Graph</b><br/>row-level tenant column<br/>filtered on every query")]

    TENANTS ==> STORE

    style CALLER  fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style AUTH    fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style ROLES   fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style TENANTS fill:#F5F1E8,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style STORE   fill:#FBF8F1,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style R1      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style R2      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style R3      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style R4      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style R5      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style R6      fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style NS1     fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style NS2     fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style NS3     fill:#FBF8F1,stroke:#C15F3C,color:#1A1614

Namespaces are enforced at the row level (tenant column on every row, filtered on every query), not just in application code. Cross-namespace reads require an ORCHESTRATOR or REVIEWER context. Every write records the agent’s trail (parent chain, session id, namespace) as provenance metadata so feed ingests cannot impersonate a peer.
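The gate can be sketched as a capability check that runs before any query touches storage. The capability table below is an illustration assembled from the prose (cross‑namespace reads need ORCHESTRATOR or REVIEWER; RESEARCHER is often read‑only) — the shipped policy may assign capabilities differently.

```python
# Hedged sketch of the authz gate: role capabilities plus a hard
# namespace boundary, checked before storage is touched.
CAPS = {
    "ORCHESTRATOR": {"read", "write", "cross_namespace"},
    "PLANNER":      {"read", "write"},
    "EXECUTOR":     {"read", "write"},
    "RESEARCHER":   {"read"},                      # often read-only
    "REVIEWER":     {"read", "cross_namespace"},
    "MONITOR":      {"read"},
}

def authorize(role, action, caller_ns, target_ns):
    caps = CAPS[role]
    if action not in caps:
        return False
    if caller_ns != target_ns and "cross_namespace" not in caps:
        return False   # hard namespace boundary
    return True

assert authorize("PLANNER", "write", "fund-alpha", "fund-alpha")
assert not authorize("RESEARCHER", "write", "fund-alpha", "fund-alpha")
assert not authorize("PLANNER", "read", "fund-alpha", "fund-beta")
assert authorize("REVIEWER", "read", "fund-alpha", "fund-beta")
```

In the real system the same decision is then re‑applied at the row level via the tenant column, so a bug in application code cannot leak another namespace’s rows.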


Temporal — timeline & supersession

Attestor doesn’t overwrite — it supersedes. Every fact has a validity window, and the timeline is replayable to any point in the past. Auditors can answer not just “what does the desk know?” but “what did the desk know on 2026-04-10 at 08:00?”

flowchart LR
    F1["<b>fact v1</b><br/><i>JPM CFO is Jeremy Barnum</i><br/>valid_from: 2022-05<br/>valid_to: <b>&infin;</b><br/>confidence: 0.95"]
    F2["<b>fact v2</b><br/><i>JPM CFO is Jane Doe</i><br/>valid_from: 2026-04-11<br/>valid_to: <b>&infin;</b><br/>confidence: 0.98<br/>supersedes: v1"]
    F3["<b>fact v3</b><br/><i>JPM CFO appointment delayed</i><br/>valid_from: 2026-04-12<br/>valid_to: <b>&infin;</b><br/>confidence: 0.90<br/>supersedes: v2"]

    subgraph TL ["&sect; TIMELINE &mdash; same entity, multiple states, none deleted"]
        direction LR
        F1 ==>|"new fact ingested<br/>contradiction detected"| F2
        F2 ==>|"corrected headline<br/>contradiction detected"| F3
    end

    Q1["<b>recall today</b><br/><i>who is JPM CFO?</i>"]
    Q2["<b>recall as-of 2026-04-11</b><br/><i>who was JPM CFO yesterday?</i>"]
    Q3["<b>auditor replay</b><br/><i>show full timeline</i>"]

    Q1 -->|"valid_to = &infin;<br/>latest confident fact"| A1["<b>v3 only</b><br/>&ldquo;appointment delayed&rdquo;"]
    Q2 -->|"filter: valid_at 2026-04-11"| A2["<b>v2 only</b><br/>&ldquo;Jane Doe&rdquo;"]
    Q3 -->|"no filter<br/>full chain"| A3["<b>v1 &rarr; v2 &rarr; v3</b><br/>with supersession edges"]

    TL ==> Q1
    TL ==> Q2
    TL ==> Q3

    style TL fill:#F5F1E8,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style F1 fill:#FBF8F1,stroke:#6B5F4F,color:#1A1614
    style F2 fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style F3 fill:#FBF8F1,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style Q1 fill:#1A1614,stroke:#C15F3C,color:#F5F1E8
    style Q2 fill:#1A1614,stroke:#C15F3C,color:#F5F1E8
    style Q3 fill:#1A1614,stroke:#C15F3C,color:#F5F1E8
    style A1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style A2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style A3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614

Supersession is a graph edge, not a delete. The document store keeps every version; the graph store links v1 —supersedes→ v2. Recall defaults to “latest confident fact,” but any call can pass as_of to replay the past, which is how regulatory audit and post-mortem reconstruction both work on the same primitive.
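The as_of replay reduces to a filter over validity windows. The sketch below stores the diagram’s three fact versions and answers both the “today” and the “as‑of” query from the same timeline; field names follow the diagram, and the flat list stands in for the document store.

```python
# Supersession as data: nothing is deleted, recall picks the latest
# fact already valid at the requested moment. ISO date strings
# compare correctly as plain strings.
facts = [
    {"v": 1, "content": "JPM CFO is Jeremy Barnum",
     "valid_from": "2022-05-01", "superseded_by": 2},
    {"v": 2, "content": "JPM CFO is Jane Doe",
     "valid_from": "2026-04-11", "superseded_by": 3},
    {"v": 3, "content": "JPM CFO appointment delayed",
     "valid_from": "2026-04-12", "superseded_by": None},
]

def recall_as_of(timeline, as_of=None):
    live = [f for f in timeline
            if as_of is None or f["valid_from"] <= as_of]
    # latest fact that was already true at as_of
    return max(live, key=lambda f: f["valid_from"])

today = recall_as_of(facts)                    # v3: appointment delayed
yesterday = recall_as_of(facts, "2026-04-11")  # v2: Jane Doe
```

The auditor’s full‑chain view is just the unfiltered list plus the superseded_by edges — same primitive, no filter.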


Provenance — source-to-citation chain

Every sentence the agent writes back to the advisor is traceable to its source. Not a paraphrase of a paraphrase — a cryptographic chain from raw feed ingest to grounded answer.

flowchart LR
    RAW["<b>Raw signal</b><br/><i>Reuters wire, 08:14 UTC</i><br/>&ldquo;JPM names Jane Doe CFO&rdquo;<br/>sha256: a1b2c3&hellip;"]

    ING["<b>Ingest</b><br/>parse &bull; entity extract<br/>attach provenance envelope<br/>source_id &bull; source_ts &bull; hash"]

    MEM["<b>Memory row</b><br/>id: mem_42<br/>content, entity=JPM<br/>provenance: {source_id, hash,<br/>ingest_ts, confidence: 0.98}"]

    RECALL["<b>Retrieval</b><br/>5-layer pipeline<br/>returns mem_42 among others"]

    ANS["<b>Agent answer</b><br/><i>&ldquo;Jane Doe was named CFO on<br/>2026-04-11 [source: Reuters]&rdquo;</i><br/>citations: [mem_42]"]

    AUDIT["<b>Auditor click-through</b><br/>mem_42 &rarr; raw Reuters wire<br/>with timestamp + hash check"]

    RAW ==> ING
    ING ==> MEM
    MEM ==> RECALL
    RECALL ==> ANS
    ANS -.->|citation link<br/>resolves to| AUDIT
    AUDIT -.->|verifies hash<br/>against raw signal| RAW

    style RAW    fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style ING    fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style MEM    fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style RECALL fill:#FBF8F1,stroke:#C15F3C,color:#1A1614
    style ANS    fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style AUDIT  fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614

The citation in the agent’s reply is not a string the LLM chose to emit — it’s the memory_id carried through the retrieval pipeline. An auditor clicks the citation and lands on the raw wire with timestamp and content hash. If the upstream signal was tampered with, the hash check fails. This is what distinguishes grounded from plausible.
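The hash check at the end of the chain is ordinary sha256, which the diagram names explicitly; the envelope field names below are illustrative. Ingest hashes the raw wire, the memory row carries the digest, and the auditor recomputes it on click‑through.

```python
# Provenance envelope sketch: hash the raw signal at ingest, verify
# it at audit time. A tampered upstream signal fails the comparison.
import hashlib

def ingest(raw_text, source_id, source_ts):
    return {
        "id": "mem_42",
        "content": raw_text,
        "provenance": {
            "source_id": source_id,
            "source_ts": source_ts,
            "hash": hashlib.sha256(raw_text.encode()).hexdigest(),
        },
    }

def audit(memory, raw_text):
    # recompute and compare -- this is the auditor's click-through
    digest = hashlib.sha256(raw_text.encode()).hexdigest()
    return digest == memory["provenance"]["hash"]

wire = "JPM names Jane Doe CFO"
mem = ingest(wire, "reuters:0814", "2026-04-11T08:14:00Z")
assert audit(mem, wire)                 # untouched signal verifies
assert not audit(mem, wire + " (edited)")  # tampering fails the check
```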


§ 04 — Deployment Matrix

Your cloud · your infrastructure · Terraform templates included

Same API. Every backend. Your infrastructure, not theirs.

Attestor ships as a Python library, a REST API, or a containerized service. Reference Terraform templates for Amazon Web Services (ECS Fargate + ArangoDB), Microsoft Azure (Container Apps + Cosmos DB), and Google Cloud Platform (Cloud Run + AlloyDB) live under attestor/infra/. Clone, set your own variables, terraform apply in your account — no SaaS middleman, no per-seat fees, no vendor lock-in.

$ pip install attestor
$ attestor api --host 0.0.0.0 --port 8080

Starlette ASGI on http://localhost:8080. SQLite + ChromaDB + NetworkX provision automatically under ~/.attestor. Point every agent in your stack at the same URL — they share memory instantly. No Docker. No API keys. Air‑gap it behind your firewall and walk away.

# Target Notes
01 AWS — ECS Fargate + ArangoDB ECS task with ArangoDB sidecar behind an ALB. Terraform template included — clone, adapt, terraform apply in your account.
02 Azure — Container Apps + Cosmos DB Cosmos DB DiskANN for doc + vector; NetworkX for graph. Terraform template included.
03 Google Cloud Platform — Cloud Run + AlloyDB AlloyDB (PostgreSQL + ScaNN + pgvector + AGE) behind Cloud Run. Terraform template included.
04 PostgreSQL backend pgvector + Apache AGE. Neon serverless or any Postgres 16. Doc · Vector · Graph
05 ArangoDB backend Multi‑model: graph + document + vector in one engine. Oasis or self‑hosted.
06 Local / On‑Prem SQLite + ChromaDB + NetworkX. Air‑gapped deployments. No network egress.

Same container, pluggable stores

Deployment matrix — one Attestor container, six targets, three storage roles swap per target

Every deployment is the same Python library wrapped in the same Starlette ASGI container. DocumentStore, VectorStore, and GraphStore are three interfaces; each column above is one implementation of each. Same API, same retrieval behavior, your infrastructure.

Runtime topology — three integration modes

The same Attestor engine runs in three shapes — same storage, same retrieval, different coupling. Pick by latency budget and blast radius.

flowchart TB
    subgraph M1 ["&sect; MODE A &mdash; EMBEDDED LIBRARY &bull; lowest latency"]
        direction LR
        AG1["<b>Agent process</b><br/>Python &bull; import attestor"]
        MW1["<b>Attestor in-proc</b><br/>AgentMemory('./store')"]
        ST1[("<b>Local stores</b><br/>SQLite · Chroma · NetworkX<br/>on local disk")]
        AG1 ==> MW1 ==> ST1
    end

    subgraph M2 ["&sect; MODE B &mdash; SIDECAR CONTAINER &bull; process isolation"]
        direction LR
        AG2["<b>Agent container</b><br/>any language<br/>HTTP client"]
        MW2["<b>Attestor sidecar</b><br/>attestor api<br/>on localhost:8080"]
        ST2[("<b>Shared volume</b><br/>or managed backends")]
        AG2 ==>|"HTTP / MCP<br/>same pod"| MW2 ==> ST2
    end

    subgraph M3 ["&sect; MODE C &mdash; SHARED SERVICE &bull; multi-agent mesh"]
        direction LR
        AF1["<b>Agent A</b>"]
        AF2["<b>Agent B</b>"]
        AF3["<b>Agent C</b>"]
        MW3["<b>Attestor service</b><br/>App Runner · Cloud Run<br/>· Container Apps"]
        ST3[("<b>Managed backends</b><br/>Postgres · ArangoDB · Cosmos")]
        AF1 ==> MW3
        AF2 ==> MW3
        AF3 ==> MW3
        MW3 ==> ST3
    end

    style M1 fill:#F5F1E8,stroke:#1A1614,stroke-width:2px,color:#1A1614
    style M2 fill:#F5F1E8,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style M3 fill:#F5F1E8,stroke:#C15F3C,stroke-width:3px,color:#1A1614
    style AG1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style MW1 fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style ST1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style AG2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style MW2 fill:#FBF8F1,stroke:#C15F3C,stroke-width:2px,color:#1A1614
    style ST2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style AF1 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style AF2 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style AF3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614
    style MW3 fill:#1A1614,stroke:#C15F3C,stroke-width:2px,color:#F5F1E8
    style ST3 fill:#FBF8F1,stroke:#1A1614,color:#1A1614

Mode A is sub-millisecond for a single agent prototyping on a laptop. Mode B adds language independence — a Go or Rust agent can call the sidecar over HTTP without Python in its image. Mode C is the production shape: one Attestor service in front of a multi-agent mesh, with managed storage behind. Code path is identical across all three — only configuration changes.

Promotion path

Promotion path — laptop to dev VM to managed cloud, same API throughout

Prototype on a laptop. Promote to Docker Compose on a VM without rewriting a single line. Promote to managed container runtime by swapping the storage URLs. The code never learns which backend it’s talking to.


§ 05 — Principles

What we won't compromise on

| # | Principle | What it means |
|---|-----------|---------------|
| 01 | Self‑hosted by default | Your data stays in your infrastructure. No SaaS middleman, no per‑seat fees, no lock‑in. Run on a laptop, a VM, or any cloud. |
| 02 | Deterministic retrieval | Tag match, graph traversal, vector search, RRF fusion, MMR diversity — all deterministic. No LLM judges. No hidden inference calls in the critical path. |
| 03 | One API, every backend | Same mem.recall() call whether the store is SQLite on a laptop or ArangoDB behind a Cloud Run service. Swap backends without rewriting agents. |
| 04 | Agent teams are first‑class | Namespaces, roles, quotas, and provenance are not bolt‑ons. The primitives were designed for orchestrator–worker pipelines from day one. |
| 05 | Boring where it counts | Postgres, SQLite, ChromaDB, NetworkX. Proven, debuggable, no magic. Terraform templates, not a hosted console. |

Install attestor and point your agents at one URL. They share memory instantly.

↓ Reference documentation follows


Reference

Everything below is the technical manual. If you are evaluating, the pitch ended at § 05.

Table of Contents


Quick Start

poetry add attestor

from attestor import AgentMemory

mem = AgentMemory("./store")
mem.add("Architecture decision: event sourcing for order service",
        category="technical", entity="order-service", tags=["arch", "decision"])
results = mem.recall("how is the order service structured?", budget=2000)

For REST API self-host, MCP integration, and cloud deploy — see REST API, MCP Integration, Cloud Deployment. attestor doctor ~/.attestor verifies all four components (Document Store, Vector Store, Graph Store, Retrieval Pipeline).


Architecture

Attestor Architecture

The diagram above shows how a call to AgentMemory.recall() flows: the top-level API in core.py fans out across the three storage roles (document, vector, graph), the retrieval orchestrator fuses and ranks their results, and the scorer packs them into the caller’s token budget. The tree below enumerates every module referenced in the diagram.

Component Overview

attestor/
├── core.py                    # AgentMemory — public API surface (add / recall / search / timeline / health)
├── models.py                  # Memory + RetrievalResult dataclasses
├── context.py                 # AgentContext — multi-agent provenance, RBAC, token budgets
├── client.py                  # MemoryClient — drop-in HTTP client that mirrors AgentMemory
├── cli.py                     # CLI entry point (22 subcommands — add, recall, doctor, api, mcp, hook, ...)
├── api.py                     # Starlette ASGI REST API (8 routes)
├── locomo.py                  # LOCOMO benchmark runner (evaluation, not runtime)
├── mab.py                     # MAB benchmark runner (evaluation, not runtime)
├── store/
│   ├── base.py                # Abstract interfaces: DocumentStore, VectorStore, GraphStore
│   ├── registry.py            # Backend factory + selection by config
│   ├── connection.py          # Shared connection helpers
│   ├── embeddings.py          # Provider auto-detect (local / OpenAI / Bedrock / Vertex AI / Azure OpenAI)
│   ├── sqlite_store.py        # SQLite storage (WAL, 17 columns, 6 indexes)
│   ├── chroma_store.py        # ChromaDB vector search (local sentence-transformers)
│   ├── schema.sql             # SQLite schema definition
│   ├── postgres_backend.py    # PostgreSQL (pgvector + Apache AGE)
│   ├── arango_backend.py      # ArangoDB (native doc + vector + graph)
│   ├── aws_backend.py         # Amazon Web Services (DynamoDB + OpenSearch Serverless + Neptune)
│   ├── azure_backend.py       # Microsoft Azure (Cosmos DB DiskANN + NetworkX persisted to Cosmos)
│   └── gcp_backend.py         # Google Cloud Platform AlloyDB (PostgreSQL + ScaNN + Vertex AI embeddings)
├── graph/
│   ├── networkx_graph.py      # NetworkX MultiDiGraph with PageRank + multi-hop BFS
│   └── extractor.py           # Entity/relation extraction (50+ known tools)
├── retrieval/
│   ├── orchestrator.py        # 5-layer cascade with RRF fusion
│   ├── tag_matcher.py         # Stop-word filtered tag extraction
│   └── scorer.py              # Temporal + entity + PageRank boosts, confidence decay
├── temporal/
│   └── manager.py             # Contradiction detection + supersession
├── extraction/
│   ├── extractor.py           # Memory extraction orchestrator
│   ├── rule_based.py          # Deterministic rule-based extractor
│   └── llm_extractor.py       # Optional LLM-backed extractor
├── mcp/
│   └── server.py              # MCP server (8 tools, 2 resources, 2 prompts)
├── hooks/
│   ├── session_start.py       # Context injection at session start
│   ├── post_tool_use.py       # Auto-capture from Write/Edit/Bash
│   └── stop.py                # Session summary generation
├── utils/
│   ├── config.py              # MemoryConfig dataclass + load/save
│   └── tokens.py              # Token-budget helpers
└── infra/                     # Reference Terraform templates (you supply state + credentials)
    └── aws_openarangodb/      # VPC + ECS Fargate + ArangoDB sidecar + ALB (validated end-to-end)

core.py is the only module intended as a public API — every other path is internal and may change between releases. Agents running in-process import AgentMemory; agents talking to a remote Attestor service import MemoryClient from client.py (same method surface, HTTP transport).

Three Storage Roles

Every backend implements one or more of these roles:

| Role | Purpose | Local Default | Cloud Options |
|------|---------|---------------|---------------|
| Document | Core storage, CRUD, filtering | SQLite | PostgreSQL, AlloyDB, ArangoDB, DynamoDB, Cosmos DB |
| Vector | Semantic similarity search | ChromaDB | pgvector, ScaNN (AlloyDB), ArangoDB, OpenSearch Serverless, Cosmos DiskANN |
| Graph | Entity relationships, multi-hop BFS traversal | NetworkX | Apache AGE (Postgres/AlloyDB), ArangoDB, Neptune, NetworkX-on-Cosmos (Azure) |

Cloud backends fill all three roles in a single service. Degradation is explicit and tiered: if the vector store is unreachable, retrieval falls back to tag + graph layers; if the graph store is unreachable, retrieval falls back to tag + vector; the document store is the only hard dependency. Non-fatal errors in vector or graph operations are caught and logged — the SQLite / document path never breaks.
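The tiered fallback can be sketched in plain Python (hypothetical store interfaces; the real orchestration is internal to the retrieval pipeline):

```python
def recall_with_degradation(query, doc_store, vector_store=None, graph_store=None):
    """Tiered fallback: the document store is the only hard dependency;
    vector and graph layers are best-effort."""
    candidates = list(doc_store.search(query))  # raises if unavailable

    for name, store in (("vector", vector_store), ("graph", graph_store)):
        if store is None:
            continue
        try:
            candidates.extend(store.search(query))
        except ConnectionError as exc:
            # Non-fatal: note the outage and continue with the layers that answered
            print(f"{name} layer unavailable, degrading: {exc}")
    return candidates
```

If the vector layer raises, the tag and graph results still come back; only a document-store failure aborts the recall.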


How It Works

Memory is infrastructure, not a prompt attachment

Attestor runs as a separate tier — a library, a container, or a cloud service — that agents query on demand. Stored memories never enter the context window until an agent explicitly calls recall() with a token budget. Retrieval cost stays constant as the store grows from 100 to 5,000,000 memories; only the ranking candidate pool expands.

Token cost is bounded by budget, not store size

Naive context-injection approach:
  Month 1:   2K tokens loaded every message
  Month 6:  15K tokens loaded every message  ← context crowded

Attestor:
  Month 1:   ≤2K tokens returned per recall  (ranked from 100 memories)
  Month 6:   ≤2K tokens returned per recall  (ranked from 5,000 memories)
                                             ← bounded cost, deeper recall

How a recall works

When an agent calls memory_recall("deployment setup", budget=2000):

Store: 5,000 memories

  Tag search finds:     15 memories tagged "deployment"
  Graph search finds:    8 memories linked to "AWS", "Docker" entities
  Vector search finds:  20 semantically similar memories

  After dedup + RRF fusion:  30 unique candidates, scored and ranked

  Budget fitting (2,000 tokens):
    Memory A (score 0.95):  500 tokens → in   (total: 500)
    Memory B (score 0.90):  600 tokens → in   (total: 1,100)
    Memory C (score 0.88):  400 tokens → in   (total: 1,500)
    Memory D (score 0.85):  300 tokens → in   (total: 1,800)
    Memory E (score 0.80):  400 tokens → SKIP (exceeds 2,000)

  Result: 4 memories, 1,800 tokens. 4,996 memories never entered context.
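The budget-fitting step above is a greedy pack by score. A minimal sketch reproducing the walkthrough (the real selection logic lives in retrieval/scorer.py):

```python
def fit_to_budget(scored_memories, budget):
    """Greedy token-budget packing: walk candidates in score order,
    keep each memory whose token count still fits, skip the rest."""
    selected, used = [], 0
    for mem in sorted(scored_memories, key=lambda m: m["score"], reverse=True):
        if used + mem["tokens"] <= budget:
            selected.append(mem["id"])
            used += mem["tokens"]
    return selected, used

candidates = [
    {"id": "A", "score": 0.95, "tokens": 500},
    {"id": "B", "score": 0.90, "tokens": 600},
    {"id": "C", "score": 0.88, "tokens": 400},
    {"id": "D", "score": 0.85, "tokens": 300},
    {"id": "E", "score": 0.80, "tokens": 400},
]
chosen, used = fit_to_budget(candidates, budget=2000)
# A, B, C and D fit (1,800 tokens); E would push the total past 2,000 and is skipped
```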

MCP Integration

Attestor ships an MCP server so any MCP-compatible client (Claude Code, Cursor, Windsurf, custom agents) can store and retrieve memories. Start it with attestor mcp.

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| memory_add | Store a fact | content, tags[], category, entity, namespace, event_date, confidence |
| memory_recall | Smart multi-layer retrieval | query, budget (default: 2000), namespace |
| memory_search | Filter with date ranges | query, category, entity, namespace, status, after, before, limit |
| memory_get | Fetch by ID | memory_id |
| memory_forget | Archive (soft delete) | memory_id |
| memory_timeline | Chronological entity history | entity, namespace |
| memory_stats | Store size, counts | — |
| memory_health | Health check (call first!) | — |

Categories

core_belief · preference · career · project · technical · personal · location · relationship · event · session · general

MCP Resources

  • attestor://entity/{name} — Entity details + related entities from graph
  • attestor://memory/{id} — Full memory object

MCP Prompts

  • recall — Search memories for relevant context
  • timeline — Chronological history of an entity

Retrieval Pipeline

The retrieval system uses a 5-layer cascade with multi-signal fusion:

Query: "deployment setup"
  │
  ├─ Layer 0: Graph Expansion
  │  Extract entities from query → BFS traversal (depth=2)
  │  "deployment" → finds "AWS", "Docker", "Terraform" connections
  │
  ├─ Layer 1: Tag Match (SQLite)
  │  extract_tags(query) → tag_search() → score 1.0
  │
  ├─ Layer 2: Entity-Field Search
  │  Memories about graph-connected entities → score 0.5
  │
  ├─ Layer 3: Vector Search (ChromaDB)
  │  Semantic similarity → score = 1 - cosine_distance
  │
  ├─ Layer 4: Graph Relation Triples
  │  Inject relationship context → score 0.6
  │
  ▼ FUSION
  ├─ Reciprocal Rank Fusion (RRF, k=60)
  │  score = Σ 1/(k + rank_in_source)
  │  OR Graph Blend: 0.7 * norm_vector + 0.3 * norm_pagerank
  │
  ▼ SCORING
  ├─ Temporal Boost: +0.2 * max(0, 1 - age_days/90)
  ├─ Entity Boost:   +0.30 exact match, +0.15 substring
  ├─ PageRank Boost:  +0.3 * entity_pagerank_score
  │
  ▼ DIVERSITY
  ├─ MMR Rerank: λ*relevance - (1-λ)*max_jaccard_similarity (λ=0.7)
  │
  ▼ CONFIDENCE
  ├─ Time Decay:    -0.001 per hour since last access
  ├─ Access Boost:  +0.03 per access_count
  ├─ Clamp:         [0.1, 1.0]
  │
  ▼ BUDGET
  └─ Greedy selection by score until token budget filled

Querying "Python" also finds memories about "FastAPI" if they're connected in the entity graph. Multi-hop reasoning through relationship traversal.
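The fusion and scoring formulas above can be sketched in plain Python. These are minimal versions of the RRF and temporal-boost steps only; the full implementations live in retrieval/orchestrator.py and retrieval/scorer.py:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(id) = sum over sources of 1 / (k + rank)."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

def temporal_boost(age_days):
    """+0.2 * max(0, 1 - age_days / 90): full boost when fresh, zero after 90 days."""
    return 0.2 * max(0.0, 1.0 - age_days / 90.0)
```

A memory ranked first by both tag and vector search fuses to 2/61 ≈ 0.033, beating any single-source hit; RRF needs no score normalization across sources, which is why it is parameter-free apart from k.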


Python API

Basic Usage

from attestor import AgentMemory

mem = AgentMemory("./my-agent")  # auto-provisions all backends

# Store
mem.add("User prefers Python over Java",
        tags=["preference", "coding"],
        category="preference",
        entity="Python")

# Recall with token budget
results = mem.recall("what language?", budget=2000)

# Formatted context for prompt injection
context = mem.recall_as_context("user background", budget=4000)

# Search with filters
memories = mem.search(category="project", entity="Python", limit=10)

# Timeline
history = mem.timeline("Python")

# Contradiction handling — automatic
mem.add("User works at Google", tags=["career"], category="career", entity="Google")
mem.add("User works at Meta", tags=["career"], category="career", entity="Meta")
# ^ Google memory auto-superseded

# Namespace isolation
mem.add("Team standup at 9am", namespace="team:alpha")
results = mem.recall("standup time", namespace="team:alpha")

# Maintenance
mem.forget(memory_id)             # Archive
mem.forget_before("2025-01-01")   # Archive old memories
mem.compact()                     # Permanently delete archived
mem.export_json("backup.json")    # Export
mem.import_json("backup.json")    # Import (dedup by content hash)

# Health & stats
mem.health()  # → {sqlite: ok, chroma: ok, networkx: ok, retrieval: ok}
mem.stats()   # → {total: 500, active: 480, ...}

# Context manager
with AgentMemory("./store") as mem:
    mem.add("auto-closed on exit")
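The automatic supersession shown above (the Google fact giving way to the Meta fact) can be sketched like this. The Fact type and conflicts_with predicate are hypothetical stand-ins; the real logic lives in temporal/manager.py:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fact:
    content: str
    category: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "active"
    superseded_by: Optional[str] = None

def add_with_supersession(store, new_fact, conflicts_with):
    """When a new fact contradicts an active fact in the same category,
    mark the old one superseded and link it to its replacement."""
    for old in store:
        if (old.status == "active"
                and old.category == new_fact.category
                and conflicts_with(old, new_fact)):
            old.status = "superseded"
            old.superseded_by = new_fact.id
    store.append(new_fact)
```

The superseded fact is never deleted: it keeps its id, gains a superseded_by pointer, and stays queryable for timeline reconstruction.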

Memory Object

@dataclass
class Memory:
    id: str                    # UUID
    content: str               # The actual fact/observation
    tags: List[str]            # Searchable tags
    category: str              # Classification (preference, career, project, ...)
    entity: str                # Primary entity (company, tool, person)
    namespace: str             # Isolation key (default: "default")
    created_at: str            # ISO timestamp
    event_date: str            # When the fact occurred
    valid_from: str            # Temporal validity start
    valid_until: str           # Set when superseded
    superseded_by: str         # ID of replacement memory
    confidence: float          # 0.0-1.0
    status: str                # active | superseded | archived
    access_count: int          # Times recalled
    last_accessed: str         # Last recall timestamp
    content_hash: str          # SHA-256 for dedup
    metadata: Dict[str, Any]   # Arbitrary JSON
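import_json dedupes by the content_hash field above. A minimal sketch, assuming SHA-256 over whitespace-stripped content (the exact normalization is an assumption):

```python
import hashlib

def content_hash(content: str) -> str:
    """SHA-256 over the stripped content (normalization is an assumption)."""
    return hashlib.sha256(content.strip().encode("utf-8")).hexdigest()

def dedup_import(existing_hashes: set, incoming: list) -> list:
    """Keep only memories whose content hash is not already in the store."""
    fresh = []
    for content in incoming:
        h = content_hash(content)
        if h not in existing_hashes:
            existing_hashes.add(h)
            fresh.append(content)
    return fresh
```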

Multi-Agent Systems

Attestor is built for production multi-agent pipelines — orchestrator-worker, planner-executor, researcher-reviewer, and hierarchical swarms. Every recall and write is scoped to an AgentContext that carries identity, role, namespace, parent trail, token budget, write quota, and visibility policy. Contexts are immutable; spawning a sub-agent returns a new context with inherited provenance.

from attestor.context import AgentContext, AgentRole, Visibility

# Create a root context
ctx = AgentContext.from_env(
    agent_id="orchestrator",
    namespace="project:acme",
    role=AgentRole.ORCHESTRATOR,
    token_budget=20000,
)

# Spawn child contexts for sub-agents (immutable — returns new instance)
planner = ctx.as_agent("planner", role=AgentRole.PLANNER, token_budget=5000)
researcher = ctx.as_agent("researcher", role=AgentRole.RESEARCHER, read_only=True)

# Provenance tracking — metadata auto-enriched
planner.add_memory("Architecture decision: use event sourcing",
                   category="technical", visibility=Visibility.TEAM)
# metadata includes: _agent_id, _session_id, _namespace, _visibility, _role

# Recall is scoped to namespace + cached within session
results = researcher.recall("architecture decisions")

# Token budget tracked
print(researcher.token_budget - researcher.token_budget_used)

# Governance
researcher.flag_for_review("Need human approval for deployment plan")
researcher.add_compliance_tag("SOC2")

# Session introspection
summary = ctx.session_summary()
# → {agent_trail, memories_written, memories_recalled, token_usage, review_flags}

AgentContext Features

| Feature | Description |
|---------|-------------|
| Namespace isolation | Each agent/project gets an isolated memory partition |
| RBAC roles | ORCHESTRATOR, PLANNER, EXECUTOR, RESEARCHER, REVIEWER, MONITOR |
| Read-only mode | Agents can recall but not write |
| Write quotas | max_writes_per_agent (default: 100) |
| Token budgets | Per-agent budget tracking |
| Recall cache | Dedup redundant queries within a session |
| Scratchpad | Inter-agent data passing |
| Provenance | Agent trail, parent tracking, visibility levels |
| Compliance | Review flags, compliance tags for audit |
| Distributed mode | Set memory_url to use HTTP client instead of local |
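The immutability guarantee, where spawning returns a new context with inherited provenance, can be sketched with a frozen dataclass. This Ctx type is a hypothetical stand-in, not the real AgentContext:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Ctx:
    agent_id: str
    namespace: str
    trail: tuple = ()          # provenance: chain of ancestor agent ids
    token_budget: int = 20000
    read_only: bool = False

    def as_agent(self, agent_id: str, **overrides) -> "Ctx":
        """Spawning never mutates the parent: return a new context with
        the parent appended to the provenance trail."""
        return replace(self, agent_id=agent_id,
                       trail=self.trail + (self.agent_id,), **overrides)
```

Because the instance is frozen, a misbehaving sub-agent cannot widen its own budget or escape its namespace; every derived context carries the full ancestor trail.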

Cloud Backends

Each cloud backend fills all three roles (document, vector, graph) in a single service:

PostgreSQL (Neon, Cloud SQL, self-hosted)

Uses pgvector for vectors and Apache AGE for graph. AGE is optional; without it, the graph role degrades gracefully and retrieval falls back to the remaining layers.

mem = AgentMemory("./store", config={
    "backends": ["postgres"],
    "postgres": {"url": "postgresql://user:pass@host:5432/attestor"}
})

ArangoDB (ArangoGraph Cloud, Docker)

Native document, vector, and graph support in one database.

mem = AgentMemory("./store", config={
    "backends": ["arangodb"],
    "arangodb": {"url": "https://instance.arangodb.cloud:8529", "database": "attestor"}
})

Azure (Cosmos DB)

Cosmos DB with DiskANN vector indexing. Graph via NetworkX persisted to Cosmos containers.

mem = AgentMemory("./store", config={
    "backends": ["azure"],
    "azure": {"cosmos_endpoint": "https://account.documents.azure.com:443/"}
})

Google Cloud Platform (AlloyDB)

Extends PostgreSQL backend with AlloyDB Connector (IAM auth) and Vertex AI embeddings (768D).

mem = AgentMemory("./store", config={
    "backends": ["gcp"],
    "gcp": {"project_id": "my-project", "cluster": "attestor", "instance": "primary"}
})

Installing cloud extras

poetry add "attestor[postgres]"    # PostgreSQL
poetry add "attestor[arangodb]"    # ArangoDB
poetry add "attestor[aws]"         # AWS (DynamoDB + OpenSearch + Neptune)
poetry add "attestor[azure]"       # Azure Cosmos DB
poetry add "attestor[gcp]"         # Google Cloud Platform AlloyDB + Vertex AI
poetry add "attestor[docker]"      # Local Docker auto-start for ArangoDB
poetry add "attestor[all]"         # Everything

The docker extra is opt-in: it auto-starts a local ArangoDB container when backend.docker = true and mode = "local". It works with pipx too: pipx install "attestor[docker]".

Cloud Deployment

The attestor/infra/ directory ships reference Terraform templates — not push-button deploy scripts. Clone the template that matches your target cloud, set your own variables, supply your own credentials, and terraform apply from your own workstation. We have validated each template end-to-end; you keep ownership of state, secrets, and account.

cd attestor/infra/aws_openarangodb
cp variables.tf my.tfvars            # edit: arango_password, region, project_name
terraform init
terraform apply -var-file=my.tfvars
# tear down
terraform destroy -var-file=my.tfvars

Prerequisites: Docker (to build the image), Terraform ≥ 1.5, the relevant cloud CLI (aws/gcloud/az) authenticated to your account.

| Cloud | Reference template | What it provisions | Status |
|-------|--------------------|--------------------|--------|
| Amazon Web Services | attestor/infra/aws_openarangodb/ | VPC + ECR + ECS Fargate task (attestor + ArangoDB sidecar) + ALB | Validated end-to-end |
| Microsoft Azure | Forthcoming (backend ships in store/azure_backend.py) | Container Apps + Cosmos DB | Backend ready, template pending |
| Google Cloud Platform | Forthcoming (backend ships in store/gcp_backend.py) | Cloud Run + AlloyDB | Backend ready, template pending |

Today only the AWS template is shipped. Azure and Google Cloud Platform users can run the backend directly against their existing managed services (Cosmos DB / AlloyDB) while the deploy templates are being finalised.

Sensitive variables (passwords, API keys, account IDs) are declared sensitive = true in the templates and read from your .tfvars file. Never commit .tfvars or *.tfstate*; both are already in .gitignore.

REST API Endpoints

All deployments expose the same Starlette ASGI API:

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Component health check |
| GET | /stats | Store statistics |
| POST | /add | Add a memory |
| POST | /recall | Smart retrieval with budget |
| POST | /search | Filtered search |
| POST | /timeline | Entity chronological history |
| POST | /forget | Archive a memory |
| GET | /memory/{id} | Get memory by ID |

Response envelope: {"ok": true, "data": {...}} or {"ok": false, "error": "message"}
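A client can unwrap this envelope with a small helper. A sketch (raising on the error branch is an assumption about how callers want to handle failures):

```python
def unwrap(envelope: dict):
    """Return data from a successful {"ok": true, "data": ...} envelope,
    raise with the server's message on {"ok": false, "error": ...}."""
    if envelope.get("ok"):
        return envelope.get("data")
    raise RuntimeError(envelope.get("error", "unknown error"))
```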


Embedding Providers

Attestor auto-detects the best available embedding provider:

| Priority | Provider | Model | Dimensions | Trigger |
|----------|----------|-------|------------|---------|
| 1 | Cloud-native | Bedrock Titan / Azure OpenAI / Vertex AI | 768-1536 | Cloud backend configured |
| 2 | OpenAI / OpenRouter | text-embedding-3-small | 1536 | OPENAI_API_KEY or OPENROUTER_API_KEY set |
| 3 | Local (default) | all-MiniLM-L6-v2 | 384 | Always available, no API key |

The local fallback downloads ~90MB on first use. All providers implement the same interface — switching is transparent.
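The priority order can be sketched as follows (a simplification; the real auto-detection lives in store/embeddings.py):

```python
import os

def pick_embedding_provider(cloud_backend_configured: bool, env=None):
    """Priority: cloud-native > OpenAI/OpenRouter > local fallback."""
    env = os.environ if env is None else env
    if cloud_backend_configured:
        return "cloud-native"
    if env.get("OPENAI_API_KEY") or env.get("OPENROUTER_API_KEY"):
        return "openai"
    return "local"  # all-MiniLM-L6-v2, 384 dims, no API key
```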


CLI Reference

The canonical CLI is attestor. The legacy aliases memwright and agent-memory still work for backward compatibility.

MCP Server

attestor mcp                          # Start MCP server (uses ~/.attestor)
attestor mcp --path /custom/path      # Custom store location

Memory Operations

attestor add ./store "User prefers Python" --tags "pref,coding" --category preference
attestor recall ./store "what language?" --budget 4000
attestor search ./store --category project --entity Python --limit 20
attestor list ./store --status active --category technical
attestor timeline ./store --entity Python
attestor get ./store <memory-id>
attestor forget ./store <memory-id>

Maintenance

attestor doctor ~/.attestor            # Health check (SQLite, ChromaDB, NetworkX, Retrieval)
attestor stats ./store                 # Memory counts, DB size, breakdowns
attestor export ./store -o backup.json
attestor import ./store backup.json
attestor compact ./store               # Permanently delete archived memories
attestor inspect ./store               # Raw DB inspection

Lifecycle Hooks

attestor hook session-start           # Inject context at agent session start
attestor hook post-tool-use           # Auto-capture tool observations
attestor hook stop                    # Generate session summary on exit

Hooks integrate with any harness that supports session lifecycle callbacks.


Configuration

Store location

Default: ~/.attestor/. Configurable with --path on any CLI command.

~/.attestor/
├── memory.db        # SQLite database (core storage)
├── config.json      # Retrieval tuning parameters
├── graph.json       # NetworkX entity graph
└── chroma/          # ChromaDB vector store + embeddings

config.json

All fields optional. Defaults apply if the file doesn't exist:

{
  "default_token_budget": 2000,
  "min_results": 3,
  "backends": ["sqlite", "chroma", "networkx"],
  "enable_mmr": true,
  "mmr_lambda": 0.7,
  "fusion_mode": "rrf",
  "confidence_gate": 0.0,
  "confidence_decay_rate": 0.001,
  "confidence_boost_rate": 0.03
}

| Parameter | Default | Description |
|-----------|---------|-------------|
| default_token_budget | 2000 | Max tokens returned per recall |
| min_results | 3 | Minimum results to return |
| enable_mmr | true | Maximal Marginal Relevance diversity reranking |
| mmr_lambda | 0.7 | Relevance vs diversity balance (0=diverse, 1=relevant) |
| fusion_mode | "rrf" | "rrf" (parameter-free) or "graph_blend" (weighted) |
| confidence_decay_rate | 0.001 | Score penalty per hour since last access |
| confidence_boost_rate | 0.03 | Score boost per access count |
| confidence_gate | 0.0 | Minimum confidence threshold to include in results |

Environment Variables

| Variable | Purpose |
|----------|---------|
| ATTESTOR_PATH | Default store path |
| ATTESTOR_URL | Remote API URL (distributed mode) |
| ATTESTOR_NAMESPACE | Default namespace |
| ATTESTOR_TOKEN_BUDGET | Default token budget |
| ATTESTOR_SESSION_ID | Session ID for provenance tracking |

Testing

Running Tests

# All unit tests — no Docker, no API keys
poetry run pytest tests/ -v

# With coverage
poetry run pytest tests/ -v --cov=attestor --cov-report=term-missing

# Live integration tests (need credentials)
NEON_DATABASE_URL='postgresql://...' poetry run pytest tests/test_postgres_live.py -v
AZURE_COSMOS_ENDPOINT='https://...' poetry run pytest tests/test_azure_live.py -v

Test Coverage

  • 607 unit tests covering all backends, retrieval, config, embeddings, and CLI
  • 14 live integration tests per cloud backend (Neon, Azure, ArangoDB)
  • Mock tests for every cloud backend — no cloud account needed
  • All unit tests run without Docker or API keys

Compatibility

MCP Clients

| Client | Config File |
|--------|-------------|
| Any MCP client | Standard MCP stdio transport |
| Claude Code | .mcp.json (project) or ~/.claude/.mcp.json (global) |
| Cursor | .cursor/mcp.json |
| Windsurf | MCP config in settings |

Same attestor mcp command for every client.
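For example, a minimal Claude Code .mcp.json entry might look like this (the server name attestor is arbitrary, and other clients use the same command/args shape in their own config files; exact keys may vary by client version):

```json
{
  "mcpServers": {
    "attestor": {
      "command": "attestor",
      "args": ["mcp"]
    }
  }
}
```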

Python

  • Python 3.10, 3.11, 3.12, 3.13, 3.14

Uninstall

1. Remove MCP server config (if used)

Delete the memory entry from your MCP client's config file.

2. Uninstall the package

poetry remove attestor

3. Delete stored memories (optional)

# Export first if you want a backup
attestor export ~/.attestor -o attestor-backup.json

# Then delete
rm -rf ~/.attestor

License

MIT


mcp-name: io.github.bolnet/attestor
