Layered memory platform for AI agents: log-first, RL-scored, consolidation-backed

🧠 extremis

Your AI agent finally remembers. No more re-explaining your project.

Session 1: you describe your stack, your auth setup, your preferences.
Session 2: your agent already knows, because it learned.


Works with: Claude Code · Cursor · Windsurf · Claude Desktop · OpenAI · Gemini CLI · any MCP agent


Deploy to Render

One click · auto-provisions Postgres · memory persists across restarts


Benchmark results

Evaluated on LongMemEval-S: 500 QA instances, each backed by ~53 timestamped conversation sessions.

| Metric | Score | What it measures |
|---|---|---|
| Retrieval R@5 | 94.4% | Top-5 recalled memories include the answer session |
| QA accuracy | 38.8% | Correct answer given recalled context (claude-haiku-4-5 judge) |

The retrieval number is the meaningful one: it measures what extremis controls. QA accuracy is a joint score with the downstream LLM; replacing Haiku with a stronger model raises it significantly.
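
To make the R@5 metric concrete, here is a minimal sketch of how a per-question "hit in the top 5" indicator can be averaged over a set of questions. The function names and the toy session ids are illustrative assumptions; the project's actual benchmark script may compute the score differently.

```python
def recall_at_k(ranked_session_ids, answer_session_ids, k=5):
    """Return 1.0 if any answer session appears in the top-k results, else 0.0."""
    top_k = ranked_session_ids[:k]
    return 1.0 if any(s in answer_session_ids for s in top_k) else 0.0


def mean_recall_at_k(instances, k=5):
    """Average the per-question hit indicator over all QA instances."""
    hits = [recall_at_k(ranked, answers, k) for ranked, answers in instances]
    return sum(hits) / len(hits)


# Two toy questions (hypothetical ids): the first hits at rank 2,
# the second only finds the answer session at rank 6, outside the top 5.
instances = [
    (["s7", "s3", "s9", "s1", "s4"], {"s3"}),
    (["s2", "s8", "s5", "s6", "s0", "s3"], {"s3"}),
]
print(mean_recall_at_k(instances))  # 0.5
```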

Raw results (500 instances) · Benchmark script · Reproduce it yourself


Quickest start: wrap your existing LLM client

Change one import. Get persistent, learning memory for free.

Claude (Anthropic)

# Before
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")

# After: one line change, nothing else in your app changes
from extremis.wrap import Anthropic
from extremis import Extremis

client = Anthropic(api_key="sk-ant-...", memory=Extremis())

# Your existing code works unchanged
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's my name?"}]
)
# extremis recalled context before the call, saved the conversation after

OpenAI

from extremis.wrap import OpenAI
from extremis import Extremis

client = OpenAI(api_key="sk-...", memory=Extremis())
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did we discuss last time?"}]
)

Install the extra for the client you wrap:

pip3.11 install "extremis[wrap-anthropic]"   # for Claude
pip3.11 install "extremis[wrap-openai]"      # for OpenAI

The problem

You're building a coding agent. Session 1: you explain your project structure, your auth setup using jose middleware, your preferred patterns. Session 2: the agent asks you to explain everything again.

Or you're building a support agent. It learned a customer hates terse responses. Next conversation: terse again. The learning is gone.

Every team building an AI agent hits the same wall.

Your agent forgets everything the moment a session ends. So you add memory. You set up a vector database, write chunking logic, figure out retrieval ranking, handle stale entries, add multi-user isolation. Three weeks later you've built a half-working RAG pipeline and still haven't shipped the actual feature.

And even when you ship it, it doesn't learn. Every memory is treated identically. The fact your agent recalled a hundred times and the user loved sits next to one it got wrong once. Nothing improves. There's no feedback loop. You're running the same dumb cosine search forever.

The other problem is lock-in. Your vectors are in Pinecone. Moving them means re-embedding everything, rewriting your retrieval logic, and hoping nothing breaks.

extremis solves all three.


What makes extremis different

1. Memory that forgets intelligently

Every competitor focuses on storing memory. Nobody talks about forgetting.

Human memory doesn't keep everything forever: unimportant things fade, important things strengthen. Agents with infinite, flat memory become slow and noisy over time. Intelligent forgetting is the hard problem nobody is solving.

extremis does two things here: recency decay (old memories rank lower automatically) and asymmetric RL weighting (negative feedback counts 1.5× as much as positive feedback, because mistakes should leave a stronger mark). The result is a memory that naturally surfaces what matters and buries what doesn't.

mem = Extremis(config=Config(
    recency_half_life_days=30,  # episodic memories halve in rank every 30 days
    rl_alpha=0.8,               # strong RL signal: useful things stick, useless things fade
))

# This memory will rank lower in every future search
mem.report_outcome([bad_memory_id], success=False, weight=1.0)
# → score decreases by 1.5 (not 1.0; the asymmetry is intentional)
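
The asymmetric update described above can be sketched in a few lines. This is an illustration of the stated rule only (positive feedback adds the weight, negative feedback subtracts 1.5 times the weight); `apply_outcome` and `NEGATIVE_MULTIPLIER` are hypothetical names, not part of the extremis API.

```python
NEGATIVE_MULTIPLIER = 1.5  # from the text: negative feedback weighs 1.5x


def apply_outcome(score: float, success: bool, weight: float = 1.0) -> float:
    """Return the new utility score after one feedback signal."""
    if success:
        return score + weight
    return score - NEGATIVE_MULTIPLIER * weight


score = 0.0
score = apply_outcome(score, success=True)   # one success:  0.0 -> 1.0
score = apply_outcome(score, success=False)  # one failure:  1.0 -> -0.5
print(score)  # -0.5
```

One success followed by one failure leaves the memory below where it started, which is the intended bias.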

2. Memory that explains itself

Agents make decisions based on memory. But why did it recall that specific memory? Without explainability you're guessing, debugging is painful, and auditing is impossible.

Every recall() result includes a plain-English reason:

results = mem.recall("what does the user prefer?")

for r in results:
    print(r.memory.content)
    print(r.reason)

# "User prefers concise answers, no filler words"
# → "similarity 0.91 · score +4.0 · used 8× · 3d old"

# "User prefers dark mode in all UIs"
# → "semantic (always included) · similarity 0.73 · score +1.0 · used 3× · 12d old"

# "User once mentioned preferring email over Slack"
# → "similarity 0.54 · score -1.5 · first recall · 45d old"

The reason tells you: how semantically relevant it was, how much feedback has validated it, how many times it's been used, and how old it is. Auditable. Debuggable.
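
As a rough sketch, a reason string in this shape can be assembled from the four signals. The `explain` helper is hypothetical, and treating a use count of zero as "first recall" is an assumption based on the sample output above.

```python
def explain(similarity: float, score: float, uses: int, age_days: int) -> str:
    """Format the four ranking signals as a short, human-readable reason."""
    parts = [
        f"similarity {similarity:.2f}",
        f"score {score:+.1f}",  # always show the sign, e.g. +4.0 or -1.5
        "first recall" if uses == 0 else f"used {uses}×",
        f"{age_days}d old",
    ]
    return " · ".join(parts)


print(explain(0.91, 4.0, 8, 3))
print(explain(0.54, -1.5, 0, 45))
```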


3. Cross-agent shared memory

Right now memory is per-agent. But the next wave of AI is agent teams: a research agent, a writing agent, a review agent, all working together. They need a shared brain.

extremis's namespace model already supports this. Multiple agents can read from and write to the same memory pool:

# All three agents share the same memory namespace
research = Extremis(config=Config(namespace="team_alpha"))
writer   = Extremis(config=Config(namespace="team_alpha"))
reviewer = Extremis(config=Config(namespace="team_alpha"))

# Research agent stores what it found
research.remember("GPT-4 outperforms Claude on math benchmarks by 12%")
research.remember("Source: Stanford HAI report, April 2026")

# Writing agent recalls it without any extra wiring
results = writer.recall("GPT-4 performance data")
# → [GPT-4 outperforms Claude on math benchmarks by 12%]
# → [Source: Stanford HAI report, April 2026]

# Knowledge graph is shared too
research.kg_add_entity("Stanford HAI", EntityType.ORG)
research.kg_add_relationship("Stanford HAI", "HAI Report", "published")
print(writer.kg_query("Stanford HAI"))  # same graph

4. No RAG pipeline to build

One pip install. Two lines of config. extremis handles embedding, storage, retrieval ranking, consolidation, and the knowledge graph. You call remember() and recall().

# Local: zero infra
from extremis import Extremis
mem = Extremis()

# Your existing vector store
mem = Extremis(config=Config(store="pinecone", pinecone_api_key="..."))

# Self-hosted server: no model download on the client
from extremis import HostedClient
mem = HostedClient(api_key="extremis_sk_...", base_url="http://your-server:8000")

# Same three lines work for all three
mem.remember("User is building a WhatsApp AI", conversation_id="c1")
results = mem.recall("what is the user building?")
mem.report_outcome([r.memory.id for r in results], success=True)

5. Backend portability: no lock-in

Your vectors are in Pinecone. Your team moves to Chroma. Your product needs Postgres. One command migrates everything, and re-embeds automatically if you're switching models:

extremis-migrate --from pinecone --to postgres \
  --source-pinecone-api-key pk_... \
  --dest-postgres-url postgresql://...

# Switching to OpenAI embeddings at the same time
extremis-migrate --from sqlite --to chroma \
  --dest-embedder text-embedding-3-small

Coming soon

Memory health dashboard: freshness score, contradiction count, retrieval hit rate, coverage gaps. Memory observability nobody is building yet.

Domain profiles: pre-built memory configurations for common agent types:

# Coming in v0.2
from extremis.profiles import SalesAgent, CodingAgent, SupportAgent

mem = Extremis(profile=SalesAgent())
# Knows to remember: customer names, deal stage, objections, preferences
# Knows to forget: small talk after 7 days, meeting logistics after 24h
# Attention: high for "budget", "decision maker", "timeline"

How extremis compares

| Feature | extremis | Mem0 | LangChain | Zep | Raw Pinecone |
|---|---|---|---|---|---|
| Self-hostable | ✅ | ❌ cloud only | ✅ | ✅ | ✅ |
| Backend-agnostic | ✅ 4 backends | ❌ | ⚠️ manual | ❌ | n/a |
| RL-scored retrieval | ✅ | ❌ | ❌ | ❌ | ❌ |
| Asymmetric feedback (1.5×) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Intelligent forgetting | ✅ | ❌ | ❌ | ❌ | ❌ |
| Knowledge graph | ✅ | ❌ | ❌ | ✅ | ❌ |
| 5-layer memory | ✅ | ⚠️ basic | ⚠️ basic | ⚠️ basic | ❌ |
| Log-first durability | ✅ | ❌ | ❌ | ❌ | ❌ |
| Migration CLI | ✅ | ❌ | ❌ | ❌ | n/a |
| MCP server (Claude Code / Cursor) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Open source | ✅ MIT | ⚠️ partial | ✅ | ✅ | n/a |

How it works

The intelligence layer

extremis sits above your vector store. RL scoring, the knowledge graph, consolidation, and attention scoring are all backend-independent: they work the same whether your vectors are in SQLite, Pinecone, or Chroma.

┌─────────────────────────────────────────────────────────────────┐
│                     YOUR APP / AGENT                            │
│      remember() · recall() · report_outcome() · kg_*()          │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                  EXTREMIS INTELLIGENCE LAYER                    │
│   RL scoring · Knowledge graph · Consolidation · Observer       │
│   Attention scorer · Namespace isolation · Log durability       │
└────┬──────────────┬─────────────┬──────────────┬────────────────┘
     │              │             │              │
┌────▼───┐   ┌──────▼──┐   ┌──────▼───┐   ┌──────▼──┐
│SQLite  │   │Postgres │   │  Chroma  │   │Pinecone │
│(local) │   │+pgvector│   │ (local)  │   │(hosted) │
└────────┘   └─────────┘   └──────────┘   └─────────┘

The memory flow

Every conversation
──────────────────
  remember("user said X")     ──▶  fsync to JSONL log (durable)
                                    + episodic memory (embedded + stored)

  recall("topic")             ──▶  embed query
                                    → identity + procedural  (always included)
                                    → semantic + episodic    (ranked by score)
                                    ← ranked results

  report_outcome(ids, +1/-1)  ──▶  adjust utility scores
                                    negative gets 1.5× weight (human memory bias)

Periodically
────────────
  consolidate()               ──▶  read log since last checkpoint
                                    → Claude Haiku extracts facts
                                    → semantic/procedural memories written
                                    → checkpoint advanced (safe to re-run)
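
The log-first idea in the flow above (append durably, then process everything after a checkpoint) can be sketched with the standard library. This is an assumption-laden illustration: the entry schema, the helper names, and the use of a byte offset as the checkpoint are hypothetical, and the real FileLogStore may work differently.

```python
import json
import os
import tempfile


def append_entry(log_path, entry):
    """Append one JSON line and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable once this returns


def read_since(log_path, checkpoint):
    """Return (entries, new_checkpoint) for everything after a byte offset."""
    with open(log_path, "rb") as f:
        f.seek(checkpoint)
        data = f.read()
        new_checkpoint = f.tell()
    entries = [json.loads(line) for line in data.decode("utf-8").splitlines() if line]
    return entries, new_checkpoint


log_path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
append_entry(log_path, {"text": "user said X"})
first, checkpoint = read_since(log_path, 0)

append_entry(log_path, {"text": "user said Y"})
second, checkpoint = read_since(log_path, checkpoint)
print(first, second)
```

Because the checkpoint only advances after a successful read, re-running a pass with the same checkpoint sees the same entries again, which is what makes the step safe to repeat.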

Retrieval ranking

Every recalled memory gets a final_rank that balances three signals:

final_rank = cosine_similarity
           × (1 + α · tanh(utility_score))      ← learned from feedback
           × exp(−ln2 · age_days / half_life)   ← recency decay

A memory that has proven useful (+1 feedback) ranks above an equally similar but unvalidated memory. Negative signals apply 1.5× weight, the same asymmetry human threat-learning uses.
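
The ranking formula translates directly to Python. The sketch below uses the documented defaults (α = 0.5, half-life = 90 days) as parameter defaults; the function signature itself is an assumption for illustration, not the library's internal code.

```python
import math


def final_rank(similarity, utility_score, age_days, alpha=0.5, half_life_days=90.0):
    """cosine similarity x learned boost x recency decay, as in the formula above."""
    rl_boost = 1.0 + alpha * math.tanh(utility_score)
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return similarity * rl_boost * recency


fresh_validated = final_rank(0.80, utility_score=1.0, age_days=0)
fresh_unvalidated = final_rank(0.80, utility_score=0.0, age_days=0)
old_unvalidated = final_rank(0.80, utility_score=0.0, age_days=90)

print(round(fresh_validated, 3), round(fresh_unvalidated, 3), round(old_unvalidated, 3))
```

With equal similarity, the validated memory ranks highest, and an unvalidated memory exactly one half-life old ranks at half its fresh value. Because tanh saturates, the feedback factor stays between 1 − α and 1 + α however much feedback accumulates.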

Memory layers

| Layer | What it holds | Written by | Always recalled? |
|---|---|---|---|
| identity | Who the user fundamentally is | Human review only | ✅ Always |
| procedural | Behavioural rules: "ask about deadline first" | Consolidator | ✅ Always |
| semantic | Durable facts: "user is a solo Python developer" | Consolidator | By relevance |
| episodic | Timestamped conversation events | remember() | By relevance |
| working | Session-scoped, expires on a set datetime | remember_now() | By relevance |
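
A minimal sketch of what "always recalled" versus "by relevance" can mean in practice: identity and procedural memories are pinned, and the rest compete on rank. The `Candidate` type and the choice to apply `limit` only to the ranked layers are assumptions made for this example.

```python
from dataclasses import dataclass

ALWAYS_INCLUDED = {"identity", "procedural"}


@dataclass
class Candidate:
    content: str
    layer: str
    rank: float  # e.g. a final_rank-style score


def layered_recall(candidates, limit=5):
    """Pin always-included layers, then add the top-ranked remaining memories."""
    pinned = [c for c in candidates if c.layer in ALWAYS_INCLUDED]
    ranked = sorted(
        (c for c in candidates if c.layer not in ALWAYS_INCLUDED),
        key=lambda c: c.rank,
        reverse=True,
    )
    return pinned + ranked[:limit]


candidates = [
    Candidate("asked about the weather", "episodic", 0.20),
    Candidate("ask about deadline first", "procedural", 0.10),
    Candidate("user is a solo Python developer", "semantic", 0.70),
]
for c in layered_recall(candidates, limit=1):
    print(c.layer, "-", c.content)
```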

Knowledge graph

Beyond vectors, extremis maintains a structured graph that answers structural questions semantic search can't:

mem.kg_add_entity("Alice", EntityType.PERSON)
mem.kg_add_entity("Acme Corp", EntityType.ORG)
mem.kg_add_relationship("Alice", "Acme Corp", "works_at", weight=0.95)
mem.kg_add_attribute("Alice", "timezone", "Asia/Dubai")
mem.kg_add_attribute("Alice", "tone", "formal")

# "Who does Alice work for?" โ€” can't answer with cosine similarity alone
result = mem.kg_query("Alice")
# โ†’ Entity + all relationships + all attributes + BFS traverse

# Two-hop traverse
graph = mem.kg_traverse("Alice", depth=2)
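
For readers unfamiliar with depth-limited BFS, here is a self-contained sketch of a two-hop traverse over a small edge list. It is not the extremis implementation; the edge format, the undirected treatment of edges, and the "Dubai" and "UAE" entities are assumptions for illustration.

```python
from collections import deque


def traverse(edges, start, depth=2):
    """Return {entity: hops} for every entity within `depth` hops of `start`."""
    neighbours = {}
    for src, dst, _label in edges:
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, []).append(src)  # treat edges as undirected

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the depth limit
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


edges = [
    ("Alice", "Acme Corp", "works_at"),
    ("Acme Corp", "Dubai", "based_in"),
    ("Dubai", "UAE", "located_in"),
]
print(traverse(edges, "Alice", depth=2))
# {'Alice': 0, 'Acme Corp': 1, 'Dubai': 2}
```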

Attention scoring

Before deciding how much to engage with an incoming message, score it. Scoring is free: zero LLM cost.

score = sender_score + channel_score + content_score + context_score  (0–100)

full      ≥ 75  → engage fully
standard  ≥ 50  → balanced response
minimal   ≥ 25  → brief acknowledgement
ignore    < 25  → skip
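
The threshold table maps to a short function. The defaults below mirror the documented thresholds (75 / 50 / 25); the function name is a hypothetical stand-in and how the four component scores are computed is not shown here.

```python
def attention_level(score, full=75, standard=50, minimal=25):
    """Map a 0-100 attention score to an engagement level."""
    if score >= full:
        return "full"
    if score >= standard:
        return "standard"
    if score >= minimal:
        return "minimal"
    return "ignore"


for s in (85, 50, 25, 24):
    print(s, attention_level(s))
```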

Observer (log compression)

Compresses raw log entries into priority-tagged observations. No LLM is involved, so it runs instantly:

🔴 CRITICAL  decisions, errors, deadlines, shipped/launched, reward signals
🟡 CONTEXT   reasons, insights, learnings, "because", "discovered"
🟢 INFO      everything else
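
A keyword heuristic of this kind can be sketched as follows. The word lists are illustrative guesses drawn from the categories above, not the actual rules used by the observer.

```python
# Hypothetical keyword lists, loosely based on the categories listed above.
CRITICAL_WORDS = ("decided", "error", "deadline", "shipped", "launched")
CONTEXT_WORDS = ("because", "discovered", "learned", "insight")


def tag_entry(text):
    """Assign a priority tag using substring checks only (no LLM call)."""
    lowered = text.lower()
    if any(word in lowered for word in CRITICAL_WORDS):
        return "CRITICAL"
    if any(word in lowered for word in CONTEXT_WORDS):
        return "CONTEXT"
    return "INFO"


for line in (
    "We shipped the auth fix",
    "Chose Postgres because of pgvector",
    "Lunch at noon",
):
    print(tag_entry(line), "-", line)
```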

Install

Requires Python 3.11+

If pip install extremis says "no matching distribution found", your default pip points to a Python older than 3.11 (often 3.9). This is common on macOS.

Check your version: python3 --version

| Platform | Fix |
|---|---|
| macOS | brew install python@3.11 then use pip3.11 |
| Linux | sudo apt install python3.11 python3.11-pip |
| Windows | python.org/downloads |

# Confirm you have Python 3.11+
python3.11 --version

# Core: SQLite + local sentence-transformers (no API key needed)
pip3.11 install extremis

# + MCP server (Claude Desktop / Code)
pip3.11 install "extremis[mcp]"

# + Postgres backend
pip3.11 install "extremis[postgres]"

# + Chroma backend
pip3.11 install "extremis[chroma]"

# + Pinecone backend
pip3.11 install "extremis[pinecone]"

# + OpenAI embeddings (swap out the 90 MB model download)
pip3.11 install "extremis[openai]"

# + LLM client wrappers (Claude / OpenAI: automatic memory, one import change)
pip3.11 install "extremis[wrap-anthropic]"   # for Claude
pip3.11 install "extremis[wrap-openai]"      # for OpenAI

# + Hosted API server
pip3.11 install "extremis[server]"

# + Python SDK for hosted cloud
pip3.11 install "extremis[client]"

# Everything
pip3.11 install "extremis[all]"

First run note: sentence-transformers downloads all-MiniLM-L6-v2 (~90 MB) on first use. This happens once; the model is cached in ~/.cache/huggingface/. To skip it, use OpenAI embeddings: EXTREMIS_EMBEDDER=text-embedding-3-small.


SDKs

| Language | Package | Source |
|---|---|---|
| Python | pip install extremis | src/extremis/ |
| TypeScript | npm install @extremis/sdk | sdk/typescript/ |

All SDKs talk to the same /v1/* HTTP API and expose the same hallucination-detection signals (effective_confidence, verification.verdict, and per-issue recommendations) as first-class typed fields.

TypeScript quick start

import { ExtremisClient } from "@extremis/sdk";

const mem = new ExtremisClient({ apiKey: "extremis_sk_..." });
await mem.remember("User is building a WhatsApp AI product");
const results = await mem.recall("WhatsApp product");

for (const r of results) {
  console.log(r.memory.content, r.effective_confidence);
  for (const rec of r.sources?.recommendations ?? []) {
    console.warn(`[${rec.severity}] ${rec.issue}: ${rec.action}`);
  }
}

Zero runtime dependencies. Runs on Node 18+, Bun, Deno, Cloudflare Workers, and browsers. Full TypeScript SDK docs →


Hallucination detection

Production memory systems need to know when an extracted "fact" is actually a hallucination. extremis runs a three-layer detection stack at consolidation time:

  1. NLI faithfulness: a local cross-encoder checks whether each extracted memory is entailed by the source conversation
  2. LLM-as-judge: escalates borderline (0.5–0.85) scores to a Claude Haiku judge for a structured verdict
  3. Self-consistency: for IDENTITY and SEMANTIC layers, samples the extractor N times and keeps only claims that converge in embedding space

Failing memories are tagged and downranked, never silently dropped. Every flagged memory carries actionable recommendations, what to do now (action) and how to fix the cause (suggestion), surfaced through both Python and TypeScript SDKs.

results = mem.recall("Where does the user work?")
for r in results:
    for rec in r.sources["recommendations"]:
        print(f"[{rec['severity']}] {rec['issue']}")
        print("  Action:    ", rec["action"])
        print("  Suggestion:", rec["suggestion"])

Install with pip install "extremis[verification]" to enable the local NLI check. Without the extra, the stack falls back to judge-only.
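
The escalation step in layer 2 amounts to routing on the NLI score. The sketch below assumes that scores at or above 0.85 pass without a judge and scores below 0.5 are flagged directly; the source only states that the 0.5–0.85 band is escalated, so the handling of the two outer bands and of the exact boundaries is a guess, and `route_by_nli` is a hypothetical name.

```python
def route_by_nli(nli_score, low=0.5, high=0.85):
    """Decide what happens to an extracted memory given its entailment score."""
    if nli_score >= high:
        return "accept"              # assumed: confidently entailed
    if nli_score >= low:
        return "escalate_to_judge"   # borderline band from the text
    return "flag_and_downrank"       # assumed: tagged, never silently dropped


for s in (0.93, 0.70, 0.31):
    print(s, route_by_nli(s))
```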


Full API quick start

from extremis import Extremis, MemoryLayer
from extremis.types import EntityType

mem = Extremis()  # ~/.extremis/ by default

# ── Remember ──────────────────────────────────────────────────
mem.remember("User is building a WhatsApp AI", conversation_id="conv_001")
mem.remember("User prefers concise answers", conversation_id="conv_001")

# Skip the log for time-sensitive or high-confidence facts
mem.remember_now(
    "Flight departs Thursday at 06:00",
    layer=MemoryLayer.EPISODIC,
    confidence=0.99,
)

# ── Recall ────────────────────────────────────────────────────
results = mem.recall("what product is the user building?", limit=5)
for r in results:
    print(f"[{r.memory.layer.value}] {r.memory.content}  rank={r.final_rank:.3f}")

# ── Feedback → memories get smarter over time ─────────────────
mem.report_outcome([r.memory.id for r in results[:2]], success=True)

# ── Knowledge graph ───────────────────────────────────────────
mem.kg_add_entity("User", EntityType.PERSON)
mem.kg_add_entity("Friday", EntityType.PROJECT)
mem.kg_add_relationship("User", "Friday", "building")
mem.kg_add_attribute("User", "timezone", "Asia/Dubai")

print(mem.kg_query("User"))

# ── Attention scoring ─────────────────────────────────────────
result = mem.score_attention("URGENT: the API is down!", channel="dm")
print(result.level)   # → "full"
print(result.score)   # → 85

# ── Consolidation (nightly / on-demand) ───────────────────────
from extremis.consolidation import LLMConsolidator
consolidator = LLMConsolidator(mem._config, mem._embedder)
r = consolidator.run_pass(mem.get_log(), mem.get_local_store(), mem.get_local_store())
print(f"{r.memories_created} facts extracted from logs")

Storage backends

All backends share the same API. Swap with one env var.

Don't want anything stored locally?

Four options, all of which work out of the box:

| Option | Local footprint | Cost |
|---|---|---|
| Postgres on Supabase / Neon | None | Free tier available |
| Pinecone | RL score sidecar only (~KB) | Free tier available |
| Amazon S3 Vectors | RL score sidecar only (~KB) | Pay-per-use, cheap at scale |
| HostedClient (your own server) | None at all | Your hosting cost |

Quickest: free Postgres on Supabase

# 1. Create project at supabase.com, grab the connection string
# 2. Enable pgvector: run "CREATE EXTENSION vector;" in the SQL editor
pip3.11 install "extremis[postgres]"
EXTREMIS_STORE=postgres EXTREMIS_POSTGRES_URL=postgresql://... python3.11 your_app.py

Zero footprint: HostedClient

from extremis import HostedClient
# deploy extremis-server on Railway/Fly/Render, point at it
mem = HostedClient(api_key="extremis_sk_...", base_url="https://your-server.railway.app")
# nothing written locally, not even the embedding model

SQLite: default, zero infrastructure

EXTREMIS_STORE=sqlite
EXTREMIS_FRIDAY_HOME=~/.extremis   # DB at ~/.extremis/local.db

Postgres + pgvector: production scale, ranking in SQL

pip3.11 install "extremis[postgres]"
EXTREMIS_STORE=postgres
EXTREMIS_POSTGRES_URL=postgresql://user:pass@host/extremis

Requires CREATE EXTENSION vector; in your database. Schema migrates automatically on first start.

Chroma: local vector DB, great for teams

pip3.11 install "extremis[chroma]"
EXTREMIS_STORE=chroma
EXTREMIS_CHROMA_PATH=~/.extremis/chroma

Pinecone: serverless hosted vectors

pip3.11 install "extremis[pinecone]"
EXTREMIS_STORE=pinecone
EXTREMIS_PINECONE_API_KEY=pk_...
EXTREMIS_PINECONE_INDEX=extremis

Create the index first (dimension must match your embedder):

from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="pk_...")
pc.create_index("extremis", dimension=384, metric="cosine",
                spec=ServerlessSpec(cloud="aws", region="us-east-1"))

Amazon S3 Vectors: cheap, durable, large-scale archival tier

pip3.11 install "extremis[s3-vectors]"
EXTREMIS_STORE=s3_vectors
EXTREMIS_S3_VECTORS_BUCKET=extremis-vectors
EXTREMIS_S3_VECTORS_INDEX=extremis
EXTREMIS_S3_VECTORS_REGION=us-east-1   # optional; AWS_REGION also works

Credentials come from the standard AWS boto3 chain (env vars / ~/.aws / IAM role), so there is no API key flag. Create the vector bucket + index once:

aws s3vectors create-vector-bucket --vector-bucket-name extremis-vectors
aws s3vectors create-index \
    --vector-bucket-name extremis-vectors --index-name extremis \
    --data-type float32 --dimension 384 --distance-metric cosine \
    --metadata-configuration 'nonFilterableMetadataKeys=["content","extra_metadata","source_memory_ids","confidence","created_at","validity_start","last_accessed_at","access_count","do_not_consolidate"]'

Best for cold/archival or extreme-scale workloads: query latency is in the hundreds of milliseconds, versus tens of milliseconds for Pinecone. Pair it with a hot tier when you need chat-rate recall.

OpenAI embeddings: no model download

pip3.11 install "extremis[openai]"
EXTREMIS_EMBEDDER=text-embedding-3-small
OPENAI_API_KEY=sk-...
EXTREMIS_EMBEDDING_DIM=1536

Works with any storage backend. Removes the 90 MB local model download.


Migrating backends

Move all memories between backends in one command. extremis re-embeds automatically if the source and destination use different embedding models.

pip3.11 install "extremis[chroma,pinecone]"

# Escape Pinecone lock-in → local SQLite
extremis-migrate --from pinecone --to sqlite \
  --source-pinecone-api-key pk_... \
  --source-pinecone-index my-index

# Local SQLite → Postgres (upgrade to production)
extremis-migrate --from sqlite --to postgres \
  --dest-postgres-url postgresql://...

# Switch to OpenAI embeddings while migrating
extremis-migrate --from sqlite --to chroma \
  --dest-embedder text-embedding-3-small

# Tier down to S3 Vectors for cheap, durable archival
extremis-migrate --from pinecone --to s3_vectors \
  --source-pinecone-api-key pk_... --source-pinecone-index my-index \
  --dest-s3-vectors-bucket extremis-vectors \
  --dest-s3-vectors-index extremis --dest-s3-vectors-region us-east-1

# Dry run: count what would be migrated
extremis-migrate --from sqlite --to chroma --dry-run
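
Conceptually, "re-embed only when the model changes" is a small loop. The sketch below is not the Migrator's code: the record shape, `migrate_records`, and the toy embedding function are all hypothetical stand-ins.

```python
def migrate_records(records, source_embedder, dest_embedder, embed):
    """Copy records, re-embedding only when the embedding model changes."""
    reembed = source_embedder != dest_embedder
    migrated = []
    for rec in records:
        vector = embed(rec["content"]) if reembed else rec["vector"]
        migrated.append({**rec, "vector": vector})
    return migrated


def toy_embed(text):
    # Stand-in for a real embedding model: a 2-d vector from simple text stats.
    return [float(len(text)), float(text.count(" "))]


records = [{"id": "m1", "content": "User prefers concise answers", "vector": [0.1, 0.2]}]
same = migrate_records(records, "all-MiniLM-L6-v2", "all-MiniLM-L6-v2", toy_embed)
switched = migrate_records(records, "all-MiniLM-L6-v2", "text-embedding-3-small", toy_embed)
print(same[0]["vector"], switched[0]["vector"])
```

Vectors from different models are not comparable, and often differ in dimension (384 versus 1536 in the settings shown in this document), which is why copying them across an embedder change would not work.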

Hosted API

Run extremis as a service: your users call it with an API key, and all compute happens server-side. No model download on the client. No local database.

Status: The server is fully built and self-hostable today. A managed cloud at api.extremis.com is in progress; join the waitlist.

One-click deploy to Render (memory lives in Render Postgres)

Deploy to Render

Clicking this button deploys extremis-server and provisions a free Postgres database automatically via render.yaml. Memory lives in Render's managed Postgres, so it persists across restarts and redeploys.

Getting your API key: check the logs, it's already there.

On first startup, extremis auto-generates a key and prints it in the server logs. In Render:

  1. Click your extremis service → Logs tab
  2. Look for the block that says extremis - FIRST START
  3. Copy the key that starts with extremis_sk_...

============================================================
  extremis - FIRST START
============================================================
  No API keys found. Generated your first key:

  extremis_sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

  Namespace: default
  Store this key - it will NOT be shown again.
============================================================

Connect from anywhere with zero local footprint:

from extremis import HostedClient
mem = HostedClient(api_key="extremis_sk_...", base_url="https://your-app.onrender.com")

To create additional keys (e.g. per user/namespace), use Render's Shell tab:

extremis-server create-key --namespace alice --label "alice prod"

Deploy to Railway (manual, 3 steps)

⚠️ Don't use SQLite on Railway. Container filesystems are ephemeral, so memories are lost on every restart. Always use Railway Postgres.

  1. Create a new project on railway.app → Deploy from GitHub repo → select extremis
  2. Add a Postgres plugin: + New → Database → PostgreSQL
  3. Set these environment variables on the extremis service:
    EXTREMIS_STORE=postgres
    EXTREMIS_POSTGRES_URL=${{Postgres.DATABASE_URL}}
    

Railway injects the URL automatically. Memory now lives in Railway's managed Postgres.

Self-host locally in 2 minutes

pip3.11 install "extremis[server]"

# Generate an API key
extremis-server create-key --namespace alice --label "prod"
# → extremis_sk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  (shown once, store it)

# Start the server
extremis-server serve --host 0.0.0.0 --port 8000

# Or with Docker (bundles Postgres + pgvector)
docker compose up

Connect from Python

from extremis import HostedClient

# Point at your self-hosted server
mem = HostedClient(api_key="extremis_sk_...", base_url="http://your-server:8000")

# Exact same API as Memory: nothing else changes
mem.remember("User is building a WhatsApp AI", conversation_id="c1")
results = mem.recall("WhatsApp")
mem.report_outcome([r.memory.id for r in results], success=True)

API endpoints

POST /v1/memories/remember     append to log + episodic store
POST /v1/memories/recall       semantic search, layered retrieval
POST /v1/memories/report       RL signal (+1/−1)
POST /v1/memories/store        direct write to any layer
POST /v1/memories/consolidate  LLM consolidation pass
GET  /v1/memories/observe      priority-tagged log compression
POST /v1/kg/write              add entity / relationship / attribute
POST /v1/kg/query              query + BFS graph traverse
POST /v1/attention/score       0–100 message priority score
GET  /v1/health

All requests require Authorization: Bearer extremis_sk_.... Namespace is derived from the key.
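
For clients that do not use an SDK, the bearer header can be attached with the standard library. The sketch builds (but does not send) a request to the health endpoint listed above; the base URL and key are placeholders, and request bodies for the other endpoints are not shown because their schemas are not documented here.

```python
import urllib.request


def health_request(base_url, api_key):
    """Build a GET /v1/health request carrying the bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/health",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = health_request("http://localhost:8000", "extremis_sk_example")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
# To actually send it against a running server: urllib.request.urlopen(req)
```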

Key management

extremis-server create-key --namespace prod_user_123 --label "production"
extremis-server list-keys
extremis-server list-keys --namespace prod_user_123
extremis-server revoke-key --key-hash abc123...

Deploy to production

Railway / Render (fastest, about 10 minutes):

  1. Point at the Dockerfile
  2. Set EXTREMIS_STORE=postgres and EXTREMIS_POSTGRES_URL
  3. Deploy

Fly.io:

fly launch
fly secrets set EXTREMIS_STORE=postgres EXTREMIS_POSTGRES_URL=postgresql://...
fly deploy

Self-hosted Docker:

docker build -t extremis-server .
docker run -p 8000:8000 \
  -e EXTREMIS_STORE=postgres \
  -e EXTREMIS_POSTGRES_URL=postgresql://... \
  -v lore_data:/data \
  extremis-server

MCP setup

Claude Desktop

pip3.11 install "extremis[mcp]"

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "extremis": {
      "command": "extremis-mcp",
      "env": {
        "EXTREMIS_FRIDAY_HOME": "~/.extremis",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Restart Claude Desktop. Nine tools appear automatically.

Claude Code

claude mcp add extremis extremis-mcp \
  --env EXTREMIS_FRIDAY_HOME=~/.extremis \
  --env ANTHROPIC_API_KEY=sk-ant-...

SSE / HTTP mode

extremis-mcp --transport sse --port 8765

MCP tools

| Tool | What it does | LLM cost |
|---|---|---|
| memory_remember | Append to log + episodic store | None |
| memory_recall | Semantic search, identity+procedural always included | None |
| memory_report_outcome | +1/−1 RL signal on recalled memories | None |
| memory_remember_now | Direct write to any layer (bypass log) | None |
| memory_consolidate | Distil logs into semantic/procedural memories | Haiku |
| memory_kg_write | Add entity / relationship / attribute | None |
| memory_kg_query | Query entity + BFS graph traverse | None |
| memory_observe | Compress log into 🔴🟡🟢 observations | None |
| memory_score_attention | Score a message 0–100 | None |

Multi-user / namespace isolation

Two isolation models:

Instance-level: each user gets their own process and EXTREMIS_FRIDAY_HOME. This is what Claude Desktop does naturally.

Namespace-level: one deployment, many users. All memories, logs, and graph data are scoped per namespace. Zero leakage.

EXTREMIS_NAMESPACE=alice extremis-mcp   # Alice's memory
EXTREMIS_NAMESPACE=bob   extremis-mcp   # Bob's: completely separate, same DB

mem_alice = Extremis(config=Config(namespace="alice"))
mem_bob   = Extremis(config=Config(namespace="bob"))
# same DB file, zero crossover
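
One common way to get this kind of isolation in a single database is to store the namespace on every row and filter on it in every query. The sketch below shows the idea with an in-memory SQLite table; the schema and the `NamespacedStore` class are assumptions for illustration, not the extremis storage layer.

```python
import sqlite3


class NamespacedStore:
    """Toy store: one shared table, every read and write scoped by namespace."""

    def __init__(self, conn, namespace):
        self.conn = conn
        self.namespace = namespace
        conn.execute(
            "CREATE TABLE IF NOT EXISTS memories "
            "(namespace TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def remember(self, content):
        self.conn.execute(
            "INSERT INTO memories (namespace, content) VALUES (?, ?)",
            (self.namespace, content),
        )

    def all(self):
        rows = self.conn.execute(
            "SELECT content FROM memories WHERE namespace = ? ORDER BY rowid",
            (self.namespace,),
        )
        return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
alice = NamespacedStore(conn, "alice")
bob = NamespacedStore(conn, "bob")
alice.remember("prefers dark mode")
bob.remember("prefers email")
print(alice.all(), bob.all())
```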

Configuration

All settings via EXTREMIS_ environment variables or a .env file:

| Variable | Default | Description |
|---|---|---|
| EXTREMIS_STORE | sqlite | Backend: sqlite · postgres · chroma · pinecone · s3_vectors |
| EXTREMIS_NAMESPACE | default | User/agent isolation scope |
| EXTREMIS_FRIDAY_HOME | ~/.extremis | Base dir for logs and SQLite DB |
| EXTREMIS_POSTGRES_URL | (empty) | Postgres DSN (required when store=postgres) |
| EXTREMIS_CHROMA_PATH | ~/.extremis/chroma | ChromaDB persistence directory |
| EXTREMIS_PINECONE_API_KEY | (empty) | Pinecone API key |
| EXTREMIS_PINECONE_INDEX | extremis | Pinecone index name |
| EXTREMIS_EMBEDDER | all-MiniLM-L6-v2 | Model name (sentence-transformers or OpenAI) |
| EXTREMIS_EMBEDDING_DIM | 384 | Vector dimension (must match model) |
| EXTREMIS_OPENAI_API_KEY | (empty) | OpenAI key (required for OpenAI embedders) |
| EXTREMIS_CONSOLIDATION_MODEL | claude-haiku-4-5-20251001 | LLM for consolidation |
| EXTREMIS_RL_ALPHA | 0.5 | Utility score weight in retrieval ranking |
| EXTREMIS_RECENCY_HALF_LIFE_DAYS | 90 | Recency decay half-life |
| EXTREMIS_ATTENTION_FULL_THRESHOLD | 75 | Score ≥ this → full attention |
| EXTREMIS_ATTENTION_STANDARD_THRESHOLD | 50 | Score ≥ this → standard |
| EXTREMIS_ATTENTION_MINIMAL_THRESHOLD | 25 | Score ≥ this → minimal |
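
To show how environment-driven settings with these defaults behave, here is a small stand-alone sketch for four of the variables. It is not the project's Config class; the `Settings` dataclass and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    store: str
    namespace: str
    rl_alpha: float
    recency_half_life_days: float


def load_settings(env=None):
    """Read EXTREMIS_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        store=env.get("EXTREMIS_STORE", "sqlite"),
        namespace=env.get("EXTREMIS_NAMESPACE", "default"),
        rl_alpha=float(env.get("EXTREMIS_RL_ALPHA", "0.5")),
        recency_half_life_days=float(env.get("EXTREMIS_RECENCY_HALF_LIFE_DAYS", "90")),
    )


defaults = load_settings({})
tuned = load_settings({"EXTREMIS_STORE": "postgres", "EXTREMIS_RL_ALPHA": "0.8"})
print(defaults)
print(tuned)
```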

Project structure

extremis/
├── src/extremis/
│   ├── api.py              ← Memory: the local API
│   ├── client.py           ← HostedClient: the cloud API (same interface)
│   ├── config.py           ← Config (EXTREMIS_ env vars)
│   ├── types.py            ← Memory, Entity, Observation, AttentionResult, ...
│   ├── interfaces.py       ← LogStore, MemoryStore, Embedder protocols
│   ├── migrate.py          ← Migrator + extremis-migrate CLI
│   ├── storage/
│   │   ├── sqlite.py       ← SQLiteMemoryStore
│   │   ├── postgres.py     ← PostgresMemoryStore (pgvector, ranking in SQL)
│   │   ├── chroma.py       ← ChromaMemoryStore
│   │   ├── pinecone_store.py ← PineconeMemoryStore
│   │   ├── kg.py           ← SQLiteKGStore
│   │   ├── log.py          ← FileLogStore (JSONL, fsync, checkpoints)
│   │   └── score_index.py  ← SQLiteScoreIndex (RL scores for external backends)
│   ├── embeddings/
│   │   ├── sentence_transformers.py
│   │   └── openai.py
│   ├── consolidation/
│   │   ├── consolidator.py ← LLMConsolidator (log → Claude Haiku → memories)
│   │   └── prompts.py
│   ├── observer/
│   │   └── observer.py     ← HeuristicObserver (🔴🟡🟢)
│   ├── scorer/
│   │   └── attention.py    ← AttentionScorer (0–100)
│   ├── mcp/
│   │   └── server.py       ← FastMCP server (9 tools)
│   └── server/
│       ├── app.py          ← FastAPI hosted API
│       ├── auth.py         ← API key management
│       ├── deps.py         ← FastAPI dependencies
│       └── routes/         ← memories, kg, health
├── Dockerfile
├── docker-compose.yml
└── tests/                  ← 50 test files, no LLM calls

Contributing

See CONTRIBUTING.md. The quickest contribution is a new storage backend: implement the MemoryStore protocol in storage/ and add tests. We'll merge it.

Security

See SECURITY.md for reporting vulnerabilities.

License

MIT · Built by Ashwani Jha


If extremis saves you from building another RAG pipeline, a ⭐ goes a long way.
