
ALMA - Agent Learning Memory Architecture


Your AI forgets everything. ALMA fixes that.

Give any AI agent permanent memory that learns and improves over time.

pip install alma-memory — 5 minutes to persistent memory. Free forever on SQLite.

Documentation | Benchmark Report | Setup Guide | PyPI


"But Claude Code Already Has Memory..."

Yes. Claude Code, OpenClaw, ChatGPT, and Gemini all have built-in memory now. So why would you need ALMA?

Because their memory is a notepad. ALMA is a learning system.

| | Built-in Memory (Claude, ChatGPT, OpenClaw) | ALMA |
| --- | --- | --- |
| What it stores | Facts and preferences — "user likes dark mode" | Outcomes — what strategies worked, failed, and why |
| Does it learn? | No. It remembers what you told it. | Yes. After 3+ similar outcomes, it auto-creates reusable strategies. |
| Does it warn you? | No. | Yes. Anti-patterns track what NOT to do, with why_bad + better_alternative. |
| Cross-platform? | No. Claude doesn't know what ChatGPT learned. | Yes. One memory layer shared across every AI tool. |
| Multi-agent? | No. Each session is isolated. | Yes. Junior agents inherit from senior agents. |
| Scoring? | Basic relevance or "most recent" | 4-factor: similarity + recency + success rate + confidence |
| Lifecycle? | Grows until you delete things | Automatic: decay, compression, consolidation, archival |
| Your data? | Stored on their servers | Your database. SQLite, PostgreSQL, Qdrant — you choose. |
| Benchmark? | Not benchmarked | R@5 = 0.964 on LongMemEval (500 questions) |

The key insight: Built-in memory makes your AI remember. ALMA makes your AI learn.

An agent with Claude's memory knows "user prefers TypeScript." An agent with ALMA knows "when deploying to production, blue-green deployment worked 8 out of 10 times and rolling updates caused 2 incidents — avoid rolling updates for this service, and here's why."

How ALMA Works With Built-in Memory (Not Against It)

ALMA doesn't replace Claude Code's memory or OpenClaw's memory — it sits underneath as a deeper layer. Use built-in memory for quick preferences. Use ALMA for:

  • Strategy tracking — which approaches worked for which problems
  • Failure prevention — anti-patterns that stop your agent from repeating mistakes
  • Team knowledge — sharing lessons across multiple agents and platforms
  • Workflow continuity — checkpoints and state that survive across sessions
  • Measurable retrieval — benchmarked at R@5=0.964, not "trust me it works"

from alma import ALMA

alma = ALMA.from_config(".alma/config.yaml")

# Before task: What strategies worked for this type of problem?
memories = alma.retrieve(task="Deploy auth service", agent="backend-dev")
# Returns: heuristics, past outcomes, anti-patterns, domain knowledge

# After task: Record what happened so next time is better
alma.learn(agent="backend-dev", task="Deploy auth service",
           outcome="success", strategy_used="Blue-green deployment")

That's it. Next time the backend agent deploys — on Claude, ChatGPT, or any platform — it already knows blue-green works and rolling updates don't.


Proven: #1 on LongMemEval

ALMA is benchmarked against LongMemEval (ICLR 2025) — the standard benchmark for AI agent memory. 500 questions, ~53 conversation sessions each.

[Figure: ALMA Benchmark Comparison]

| System | LongMemEval | API Keys | Memory Types | Feedback Loop |
| --- | --- | --- | --- | --- |
| ALMA | R@5 = 0.964 | None | 5 | Yes (v1.0) |
| Mem0 | ~49% acc.* | GPT-4o | 2 | No |
| Zep | 71.2% acc.* | GPT-4o | 1 | No |
| Letta | Not published | GPT-4o | 2 | No |
| Beads | Not published | None | N/A (tasks) | No |
| RuVector | Not published | None | N/A (vectors) | Self-learning |

*Accuracy (end-to-end with LLM) vs ALMA's Recall@5 (retrieval-only). Different metrics — not directly comparable.

R@5 = 0.964 means when your agent asks "what did we discuss about X?", the correct answer is in the top 5 results 96.4% of the time. No cloud APIs. Runs entirely on your machine.
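
The metric itself is simple to compute. A minimal illustration (not the actual benchmark harness):

def recall_at_5(results):
    """results: one (ranked_memory_ids, gold_id) pair per benchmark question."""
    hits = sum(1 for ranked, gold in results if gold in ranked[:5])
    return hits / len(results)

# 0.964 over 500 questions means the right memory was in the top 5 for 482 of them.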

v1.0 absorbs ideas from the open-source community: RuVector (MIT, 3.8k stars) inspired ALMA's retrieval feedback loop, where memories that agents actually use get scored higher. Beads' (MIT, 20.7k stars) task-dependency concepts are planned for a future release.

Reproduce it yourself in 3 commands
pip install alma-memory[local] sentence-transformers
curl -fsSL -o /tmp/longmemeval.json \
  https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned/resolve/main/longmemeval_s_cleaned.json
python -m benchmarks.longmemeval.runner --data /tmp/longmemeval.json

Full methodology: BENCHMARK-REPORT.md


How It Works

[Figure: ALMA Retrieval Pipeline]

Retrieve: Your agent asks ALMA for relevant memories. ALMA searches using FAISS vector similarity, scores each result on four factors (similarity, recency, success rate, confidence), and returns the most useful context.
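
As a rough illustration of how four such factors can combine into one score, here is a sketch with hypothetical weights and half-life — not ALMA's actual internals:

import math
import time

# Hypothetical weights, illustrative only.
W_SIM, W_REC, W_SUCC, W_CONF = 0.4, 0.2, 0.2, 0.2

def score_memory(similarity, last_used_ts, successes, failures, confidence):
    """Blend the four factors described above into a single retrieval score."""
    age_days = (time.time() - last_used_ts) / 86400
    recency = math.exp(-math.log(2) * age_days / 30)  # halves every 30 days (assumed)
    attempts = successes + failures
    success_rate = successes / attempts if attempts else 0.5  # neutral prior
    return (W_SIM * similarity + W_REC * recency
            + W_SUCC * success_rate + W_CONF * confidence)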

Learn: After the task, ALMA records what happened — success or failure, what strategy was used, how long it took.

Improve: After 3+ similar outcomes, ALMA automatically creates reusable heuristics. After 2+ similar failures, it creates anti-patterns. Your agent gets smarter without any manual work.
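
The promotion rule above fits in a few lines. The thresholds come from the text; the data shapes are assumptions, not ALMA's internals:

from collections import Counter

HEURISTIC_AFTER = 3     # 3+ similar successes -> reusable heuristic
ANTI_PATTERN_AFTER = 2  # 2+ similar failures  -> anti-pattern

def consolidate(outcomes):
    """outcomes: list of (task_signature, result), result is 'success' or 'failure'."""
    promoted = []
    for (signature, result), count in Counter(outcomes).items():
        if result == "success" and count >= HEURISTIC_AFTER:
            promoted.append(("heuristic", signature))
        elif result == "failure" and count >= ANTI_PATTERN_AFTER:
            promoted.append(("anti_pattern", signature))
    return promoted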

[Figure: ALMA Learning Loop]


What Makes ALMA Different

1. It learns, not just stores

Other memory systems are databases. ALMA is a learning system.

| Other systems | ALMA |
| --- | --- |
| Store text, retrieve similar | Store outcomes, learn patterns, track what works |
| All memories equal | Confidence scoring — proven strategies rank higher |
| No concept of mistakes | Anti-patterns: what NOT to do, why, and what to do instead |
| Grows forever | Memories decay — unused knowledge fades, reinforced knowledge strengthens |
| No usage tracking | Retrieval feedback loop — tracks which memories agents actually use, adjusts future scores |

2. Five memory types (not just embeddings)

[Figure: Five Memory Types]

| Type | What it stores | Example |
| --- | --- | --- |
| Heuristic | Strategies that work | "For forms with >5 fields, validate incrementally" |
| Outcome | Task results | "Login test passed using JWT — 340ms" |
| Anti-Pattern | What NOT to do | "Don't use sleep() for async waits — causes flaky tests" |
| Domain Knowledge | Facts | "Auth uses OAuth 2.0, tokens expire in 24h" |
| User Preference | Your constraints | "Prefer verbose output, Python 3.12, dark theme" |
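
As a rough picture of what these typed records carry, here is a plain-dataclass sketch. Apart from why_bad and better_alternative (named earlier in this page), the field names are assumptions, not ALMA's schema:

from dataclasses import dataclass

@dataclass
class Heuristic:
    strategy: str        # "For forms with >5 fields, validate incrementally"
    confidence: float    # rises as the strategy keeps working

@dataclass
class AntiPattern:
    pattern: str             # "sleep() for async waits"
    why_bad: str             # "causes flaky tests"
    better_alternative: str  # "explicit wait/poll conditions"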

3. Multi-agent knowledge sharing

[Figure: Multi-Agent Memory Sharing]

Junior agents inherit from senior agents. Teams share across roles.

agents:
  senior_dev:
    share_with: [junior_dev, qa_agent]
  junior_dev:
    inherit_from: [senior_dev]
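
A minimal sketch of how that inheritance could resolve at retrieval time (hypothetical helper, not ALMA's API):

def readable_agents(agent, agents_config):
    """Which agents' memories `agent` can read, per the YAML above."""
    inherits = agents_config.get(agent, {}).get("inherit_from", [])
    return [agent, *inherits]

config = {
    "senior_dev": {"share_with": ["junior_dev", "qa_agent"]},
    "junior_dev": {"inherit_from": ["senior_dev"]},
}
print(readable_agents("junior_dev", config))  # ['junior_dev', 'senior_dev']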

4. Token-efficient context loading

[Figure: ALMA 4-Layer MemoryStack]

Only load what you need: Identity (~100 tokens) + Essential Story (~800 tokens) at wake-up. On-demand and deep search activate when needed. 95% of your context window stays free.
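
In budget terms (layer names and token counts from the text above; the loading policy shown is a sketch, not ALMA's API):

# Loaded every session at wake-up:
WAKEUP_LAYERS = {"identity": 100, "essential_story": 800}  # ~900 tokens total

# Activated only when a task needs them:
LAZY_LAYERS = ("on_demand", "deep_search")

wakeup_cost = sum(WAKEUP_LAYERS.values())
print(f"Up-front cost: ~{wakeup_cost} tokens")  # the rest of the window stays free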

5. Your data, your infrastructure

ALMA is a library, not a service. Your database, your rules.

| Backend | Best For | Cost |
| --- | --- | --- |
| SQLite + FAISS | Local dev, getting started | $0 |
| PostgreSQL + pgvector | Production | $0 (Supabase free tier) |
| Qdrant / Pinecone / Chroma | Managed vector DB | Varies |
| Azure Cosmos DB | Enterprise | Azure pricing |

Install

pip install alma-memory[local]   # Includes SQLite + FAISS + local embeddings

Other backends:
pip install alma-memory[postgres]  # PostgreSQL + pgvector
pip install alma-memory[qdrant]    # Qdrant
pip install alma-memory[pinecone]  # Pinecone
pip install alma-memory[chroma]    # ChromaDB
pip install alma-memory[azure]     # Azure Cosmos DB
pip install alma-memory[all]       # Everything

Quick Start

1. Config file:

# .alma/config.yaml
alma:
  project_id: "my-project"
  storage: sqlite
  embedding_provider: local
  storage_dir: .alma
  db_name: alma.db
  embedding_dim: 384

2. Use it:

from alma import ALMA

alma = ALMA.from_config(".alma/config.yaml")

# Retrieve what the agent learned
memories = alma.retrieve(task="Fix the login bug", agent="developer", top_k=5)

# Inject into your prompt
prompt = f"## Context from past runs\n{memories.to_prompt()}\n\n## Task\nFix the login bug"

# After the task, learn from the outcome
alma.learn(agent="developer", task="Fix login bug",
           outcome="success", strategy_used="Cleared session cache")

3. That's it. Every run gets smarter. No manual prompt engineering needed.

4. Close the feedback loop (v1.0):

from alma.types import MemoryType, FeedbackSignal

# Record which memories were useful
alma.record_usage(
    retrieved_memory_ids=["m1", "m2", "m3"],
    used_memory_ids=["m1", "m3"],
    memory_type=MemoryType.HEURISTIC,
    agent="dev-agent",
)

# Explicit feedback
alma.record_feedback(
    memory_id="m1",
    memory_type=MemoryType.HEURISTIC,
    signal=FeedbackSignal.THUMBS_UP,
    agent="dev-agent",
)

Memories that agents actually use get scored higher on future retrievals. Memories that get skipped decay faster. Your agent's recall improves automatically over time.
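
One way such a loop can behave, as illustrative math (not ALMA's actual update rule):

def nudge_confidence(confidence, times_retrieved, times_used, lr=0.1):
    """Drift a memory's confidence toward its observed usage rate."""
    usage_rate = times_used / times_retrieved if times_retrieved else 0.0
    return confidence + lr * (usage_rate - confidence)

c = 0.50
c = nudge_confidence(c, times_retrieved=3, times_used=2)  # used 2 of 3 times -> ~0.517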


Bootstrap From Existing Knowledge

Already have conversations, project files, or chat exports? ALMA doesn't just dump them into a vector database like RAG. It reads, classifies, and structures them into the 5 memory types:

  • Decisions you made → DomainKnowledge (retrievable facts)
  • Preferences you stated → UserPreference (constraints agents respect)
  • Things that worked → Outcomes (success records with strategies)
  • Problems you hit → AntiPatterns (mistakes agents won't repeat)
  • Raw content → DomainKnowledge (searchable context)

from alma.ingestion import ingest_directory, ingest_conversations

# Ingest project files — auto-classifies into memory types
result = ingest_directory("/path/to/project", agent="dev", project_id="myapp")
# result.domain_knowledge: 47 facts extracted
# result.user_preferences: 12 preferences found
# result.anti_patterns: 3 problems identified
# result.outcomes: 8 milestones recorded

# Ingest chat exports (6 formats supported)
result = ingest_conversations("/path/to/chats", agent="dev", project_id="myapp")

This is not RAG. RAG retrieves text chunks by similarity. ALMA retrieves classified, scored, typed memories that improve over time. The ingestion step is how you bootstrap — after that, ALMA learns from real outcomes.

Supported formats: Claude Code JSONL, ChatGPT JSON, Claude.ai JSON, Codex JSONL, Slack JSON, plain text.


Claude Code / MCP Integration

Connect ALMA directly to Claude with 22 MCP tools:

{
  "mcpServers": {
    "alma-memory": {
      "command": "python",
      "args": ["-m", "alma.mcp", "--config", ".alma/config.yaml"]
    }
  }
}

At a Glance

| Metric | Value |
| --- | --- |
| LongMemEval R@5 | 0.964 (#1 open-source) |
| Tests passing | 2,121 |
| Storage backends | 7 |
| Graph backends | 4 |
| MCP tools | 22 |
| Memory types | 5 |
| Chat formats ingested | 6 |
| Monthly cost (local) | $0.00 |
| API keys needed | None |
| Time to first memory | < 5 minutes |


Support

If ALMA helps your AI agents get smarter, consider supporting the project.


Your AI should not treat you like a stranger every morning.


License

ALMA-memory is released under the MIT License. Free to use, modify, and distribute — forever.

Created by @RBKunnela
