
antaris-memory

Production-ready file-based persistent memory for AI agents. Zero dependencies (core).

Store, search, decay, and consolidate agent memories using only the Python standard library. Sharded storage for scalability, fast search indexes, namespace isolation, memory types, retrieval feedback loops, MCP server support, and automatic schema migration. No vector databases, no infrastructure, no API keys.

Requires Python 3.9+.


Install

pip install antaris-memory

Quick Start

from antaris_memory import MemorySystem

mem = MemorySystem("./workspace", half_life=7.0)
mem.load()  # No-op on first run; auto-migrates old formats

# Store memories
mem.ingest("Decided to use PostgreSQL for the database.",
           source="meeting-notes", category="strategic")

# Typed helpers
mem.ingest_fact("PostgreSQL supports JSON natively")
mem.ingest_preference("User prefers concise explanations")
mem.ingest_mistake("Forgot to close DB connections in worker threads",
                   correction="Use context managers for all DB connections",
                   root_cause="Manual connection management in worker pool")
mem.ingest_procedure("Deploy: push to main, CI runs, auto-deploy to staging")

# Input gating — drops ephemeral noise (P3) before storage
mem.ingest_with_gating("Decided to switch to Redis for caching", source="chat")
mem.ingest_with_gating("thanks for the update!", source="chat")  # dropped (P3)

# Search (BM25; hybrid BM25+cosine if embedding fn set)
for r in mem.search("database decision"):
    print(f"[{r.confidence:.2f}] {r.content}")

# Save
mem.save()

Namespaces

Every MemorySystem instance doubles as a namespace manager. Call mem.namespace("name") to get a fully isolated memory proxy — search in one namespace never returns results from another.

from antaris_memory import MemorySystem

mem = MemorySystem("./workspace")

# Create and use isolated namespaces
alpha = mem.namespace("project-alpha")
alpha.ingest("Alpha uses PostgreSQL for the primary database", source="infra")
alpha.ingest_fact("Alpha API runs on port 8080")

beta = mem.namespace("project-beta")
beta.ingest("Beta uses SQLite for local storage", source="infra")

# Search is scoped — alpha never sees beta's data
results = alpha.search("database")    # only PostgreSQL result
results = beta.search("database")     # only SQLite result

# Namespace lifecycle
mem.create_namespace("staging")
mem.archive_namespace("project-beta")
mem.delete_namespace("staging", delete_data=True)
all_ns = mem.list_namespaces()  # [{"name": "default", ...}, {"name": "project-alpha", ...}]

Standard key-prefix constants for multi-tenant scoping:

from antaris_memory.namespace import TENANT_ID, AGENT_ID, CONVERSATION_ID

tenant_ns = mem.namespace(f"{TENANT_ID}-acme")         # "tenant-acme"
agent_ns = mem.namespace(f"{AGENT_ID}-researcher-01")   # "agent-researcher-01"
conv_ns = mem.namespace(f"{CONVERSATION_ID}-abc123")     # "conversation-abc123"

Each namespace has its own workspace directory, shards, indexes, and WAL — full hard isolation.


Memory Types

Every memory entry has a memory_type field that controls decay rate, importance boost, and recall priority.

# Explicit memory_type on ingest()
mem.ingest("Deploy: push to main, CI runs, auto-deploy to staging",
           memory_type="procedural")

# Typed helpers set memory_type automatically
mem.ingest_fact("PostgreSQL supports JSONB indexing")          # memory_type="fact"
mem.ingest_preference("User prefers Python examples")          # memory_type="preference"
mem.ingest_procedure("Run pytest from venv, not global pip")   # memory_type="procedure"
mem.ingest_mistake(
    what_happened="Forgot to handle connection timeout",
    correction="Add timeout=30 to all HTTP calls",
    root_cause="Default timeout is infinite in requests lib",
    severity="high",
    tags=["http", "reliability"],
)  # memory_type="mistake"

# Filter search by type
procedures = mem.search("deploy process", memory_type="procedure")
mistakes = mem.search("timeout", memory_type="mistake")

Type        Use for                             Decay       Importance  Recall priority
episodic    Events, decisions, meeting notes    Normal      1.0x        Normal
fact        Facts, concepts, general knowledge  Normal      1.2x        High
preference  User preferences, style notes       3x slower   1.2x        High
procedure   How-to steps, runbooks              3x slower   1.3x        High
mistake     Errors to avoid, lessons learned    10x slower  2.0x        Highest
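
As a quick check on the table, the effective half-lives at the default half_life=7.0 work out as below (the multiplier mapping is illustrative shorthand for the table, not a library API):

MULTIPLIER = {"episodic": 1, "fact": 1, "preference": 3, "procedure": 3, "mistake": 10}
effective_days = {t: 7.0 * m for t, m in MULTIPLIER.items()}
# {'episodic': 7.0, 'fact': 7.0, 'preference': 21.0, 'procedure': 21.0, 'mistake': 70.0}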

Retrieval Feedback Loop

Record whether retrieved memories led to good or bad outcomes. The system adapts memory importance in real time — good outcomes boost importance (×1.2), bad outcomes reduce it (×0.8).

# 1. Search for relevant memories
results = mem.search("database migration strategy", explain=True)

# 2. Use results to generate a response...
ids = [r.entry.hash for r in results]

# 3. Record the outcome
mem.record_outcome(ids, "good")     # boost importance of helpful memories
mem.record_outcome(ids, "bad")      # reduce importance of unhelpful memories
mem.record_outcome(ids, "neutral")  # no change

# Router integration — log routing decisions alongside retrieval outcomes
mem.record_routing_outcome(model="claude-haiku-3-5", outcome="good")

# View aggregate stats
stats = mem.feedback_stats()
# {"total": 42, "good": 30, "bad": 5, "neutral": 7, "routing": 12, "retrieval": 30}

Feedback is persisted as outcomes.jsonl in the workspace directory and survives restarts.
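
Because the log is plain JSONL, the standard library is enough to inspect it; the field names below are an assumption — check your own outcomes.jsonl for the actual schema.

import json
from collections import Counter
from pathlib import Path

# Tally outcomes straight from the feedback log (field names are assumptions)
counts = Counter()
for line in Path("./workspace/outcomes.jsonl").read_text().splitlines():
    counts[json.loads(line).get("outcome", "unknown")] += 1
print(counts)  # e.g. Counter({'good': 30, 'neutral': 7, 'bad': 5})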


Bulk Ingest

bulk_ingest() uses O(1) deferred index rebuild — a single rebuild_indexes() at the end instead of one per WAL flush. Benchmarked at 12,041 items/s with 1M entries ingested in ~86s on Apple M4, with near-flat scaling.

# List-based bulk ingest
count = mem.bulk_ingest([
    "The API gateway handles authentication for all services.",
    {"content": "Deploy using helm upgrade --install", "memory_type": "procedure"},
    {"content": "PostgreSQL is the primary database", "source": "infra-docs"},
    {"content": "Never use SELECT * in production queries", "memory_type": "mistake",
     "source": "code-review"},
])

# Context manager for existing ingest() call sites
with mem.bulk_mode(), open("corpus.txt") as f:
    for line in f:
        mem.ingest(line.strip(), source="corpus")
    # Index rebuilt exactly once when the block exits

# Generator-based 1M ingest
def corpus_generator():
    for i in range(1_000_000):
        yield f"Memory entry {i} with substantive content for indexing"

mem.bulk_ingest(corpus_generator())  # all entries written to shards on disk

At runtime, a safety limit caps the active in-memory set to 20,000 entries by default; a UserWarning is emitted when the limit is hit. Typical agent workloads keep search latency in the low milliseconds — see the Benchmarks section for measured figures at each scale.
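
To detect the cap programmatically, the standard warnings machinery works — nothing antaris-specific is assumed beyond the documented UserWarning:

import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mem.bulk_ingest(corpus_generator())
if any(issubclass(w.category, UserWarning) for w in caught):
    print("Active in-memory set hit the 20,000-entry cap")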


Production Cleanup API

Four methods for production maintenance — bulk removal, index repair, and WAL management without manual shard surgery.

purge() — Bulk removal with glob patterns

# Remove all memories from a specific pipeline session
result = mem.purge(source="pipeline:pipeline_abc123")
print(f"Removed {result['removed']} memories, {result['wal_removed']} WAL entries")

# Glob pattern — remove ALL pipeline sessions at once
result = mem.purge(source="pipeline:pipeline_*")

# Remove by content substring (case-insensitive)
result = mem.purge(content_contains="context_packet")

# Combine criteria (OR logic — removes if ANY criterion matches)
result = mem.purge(
    source="openclaw:auto",
    content_contains="symlink mismatch",
)

# Always persist after purge
mem.save()

Return value:

{
    "removed": 10,        # from in-memory set
    "wal_removed": 2,     # from WAL file
    "total": 12,
    "audit": {
        "operation": "purge",
        "count": 12,
        "sources": ["pipeline:pipeline_abc123"],
        "timestamp": "2026-02-19T..."
    }
}

rebuild_indexes() — Repair search indexes after bulk operations

result = mem.rebuild_indexes()
print(f"Indexed {result['memories']} memories, {result['words_indexed']} words")
# {"memories": 9990, "words_indexed": 5800, "tags": 24}

wal_flush() — Force-flush WAL to shard files

flushed = mem.wal_flush()
print(f"Flushed {flushed} pending WAL entries to shards")

wal_inspect() — Health check without mutating state

status = mem.wal_inspect()
# {"pending_entries": 14, "size_bytes": 8192, "sample": ["content preview 1...", ...]}
print(f"WAL pending: {status['pending_entries']} entries ({status['size_bytes']} bytes)")

Typical production maintenance flow

from antaris_memory import MemorySystem

mem = MemorySystem("./workspace")
mem.load()

# 1. Inspect WAL health
status = mem.wal_inspect()
if status["pending_entries"] > 100:
    print(f"WAL has {status['pending_entries']} pending — flushing...")
    mem.wal_flush()

# 2. Purge stale/unwanted data
result = mem.purge(source="pipeline:pipeline_old_session_*")
print(f"Purged {result['total']} stale entries")

# 3. Rebuild indexes after purge
index_result = mem.rebuild_indexes()
print(f"Re-indexed {index_result['memories']} memories")

# 4. Persist
mem.save()

BM25 Search

Full-text search uses BM25 with IDF weighting and field boosting. Zero dependencies, zero API calls.
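
For reference, a textbook BM25 term score — the k1/b values below are common defaults, not necessarily the library's internal parameters:

import math

# tf = term frequency in the doc, df = docs containing the term, N = total docs,
# dl = doc length, avgdl = average doc length across the corpus
def bm25_term(tf: float, df: int, N: int, dl: float, avgdl: float,
              k1: float = 1.5, b: float = 0.75) -> float:
    idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avgdl))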

# Basic search — ranked by BM25 score × decay × access frequency
results = mem.search("database migration")

# With filters
results = mem.search(
    "deploy process",
    category="strategic",
    memory_type="procedure",
    min_confidence=0.3,
    limit=10,
)

# Explain mode — returns SearchResult objects with score breakdowns
results = mem.search("authentication flow", explain=True)
for r in results:
    print(f"[{r.score:.3f}] {r.entry.content[:80]}")
    print(f"  matched: {r.matched_terms}  |  {r.explanation}")

Hybrid Semantic Search

Plug in any embedding function to activate BM25+cosine hybrid scoring (40% BM25, 60% semantic):

import openai

def my_embed(text: str) -> list[float]:
    resp = openai.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

mem.set_embedding_fn(my_embed)  # hybrid activates automatically

# Or use a local model
import ollama
mem.set_embedding_fn(
    lambda text: ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
)

When no embedding function is set, search uses BM25 only (zero API calls).
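
The documented blend is a weighted sum. A minimal sketch, assuming both components are normalized to [0, 1] (only the 40/60 split comes from the docs):

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_score(bm25_norm: float, cosine_sim: float) -> float:
    return 0.4 * bm25_norm + 0.6 * cosine_sim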


WAL Journaling

Every ingest() call appends to a write-ahead log before touching shard files. On crash, pending entries are replayed automatically on the next load().

mem.ingest("Important decision about the API design", source="meeting")
# Entry is now in the WAL — crash-safe

# WAL auto-flushes every 50 appends or at 1 MB
# Force-flush before backups:
mem.wal_flush()

# Check WAL health:
status = mem.wal_inspect()
print(f"{status['pending_entries']} entries pending ({status['size_bytes']} bytes)")

Sharding

Memories are stored in date/category shards — plain JSON files you can inspect with any text editor.

workspace/
├── shards/
│   ├── 2026-02-strategic.json
│   ├── 2026-02-operational.json
│   └── 2026-01-tactical.json
├── indexes/
│   ├── search_index.json
│   ├── tag_index.json
│   └── date_index.json
├── .wal/
│   └── pending.jsonl          # Write-ahead log (auto-managed)
├── namespaces/
│   ├── project-alpha/         # Isolated namespace workspace
│   └── project-beta/
├── namespace_manifest.json
├── access_counts.json
├── outcomes.jsonl             # Retrieval feedback log
├── migrations/history.json
└── memory_audit.jsonl         # Deletion audit trail (GDPR)
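
Everything above is plain JSON/JSONL, so the standard library is enough to inspect a workspace:

from pathlib import Path

# List shards by name and size — each is an ordinary JSON file
for shard in sorted(Path("./workspace/shards").glob("*.json")):
    print(f"{shard.name}: {shard.stat().st_size / 1024:.1f} KB")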

Decay

Retrieval scores combine recency × importance × access frequency using Ebbinghaus-inspired forgetting curves, with a configurable half-life (default 7 days).

mem = MemorySystem("./workspace", half_life=14.0)  # 14-day half-life

# Memory types override decay rates:
# - procedure/preference: 3x half-life (21 days with default)
# - mistake: 10x half-life (70 days with default)

# Compact to remove fully decayed entries
result = mem.compact()
print(f"Compacted {result['entries_before']}{result['entries_after']} entries")
print(f"Freed {result['space_freed_mb']:.1f} MB")

MCP Server

Expose your memory workspace as MCP tools for Claude Desktop, Cursor, or any MCP-compatible host.

from antaris_memory import create_mcp_server  # pip install mcp

server = create_mcp_server("./workspace")
server.run()  # Stdio transport

MCP tools exposed: memory_search, memory_ingest, memory_consolidate, memory_stats.

# Or run directly from the CLI
antaris-memory-mcp --workspace ./workspace

Input Gating (P0-P3)

mem.ingest_with_gating("CRITICAL: API key compromised", source="alerts")
# P0 (critical) — stored with confidence 0.9

mem.ingest_with_gating("Decided to switch to PostgreSQL", source="meeting")
# P1 (operational) — stored

mem.ingest_with_gating("thanks for the update!", source="chat")
# P3 (ephemeral) — dropped silently

Level  Category     Stored  Examples
P0     Strategic    Yes     Security alerts, errors, deadlines
P1     Operational  Yes     Decisions, assignments, technical choices
P2     Tactical     Yes     Background info, research
P3     Ephemeral    No      Greetings, acknowledgments, filler

Classification: keyword and pattern matching — no LLM calls. 0.177ms avg per input.
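
For intuition, a minimal sketch of keyword/pattern gating — the actual rules inside InputGate are internal and will differ:

import re

# Illustrative priority rules only — not antaris-memory's real keyword lists
P0 = [r"\bcritical\b", r"\bcompromised\b", r"\bsecurity\b"]
P3 = [r"^thanks\b", r"^ok\b", r"\bgot it\b"]

def classify(text: str) -> str:
    t = text.lower()
    if any(re.search(p, t) for p in P0):
        return "P0"  # strategic — stored with high confidence
    if any(re.search(p, t) for p in P3):
        return "P3"  # ephemeral — dropped at intake
    return "P1"      # operational default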

Note: ingest() silently drops content shorter than 15 characters. Single-concept memories ("Use Redis", "Done") fall below this threshold. Store them with a brief qualifier: "Prefer Redis for caching" (24 chars).


Context Packets (Sub-Agent Injection)

# Single-query context packet
packet = mem.build_context_packet(
    task="Debug the authentication flow",
    tags=["auth", "security"],
    max_memories=10,
    max_tokens=2000,
    include_mistakes=True,
)
print(packet.render("markdown"))  # structured markdown for prompt injection

# Multi-query with deduplication
packet = mem.build_context_packet_multi(
    task="Fix performance issues",
    queries=["database bottleneck", "slow queries", "caching strategy"],
    max_tokens=3000,
)
packet.trim(max_tokens=1500)

Selective Forgetting (GDPR-ready)

audit = mem.forget(entity="John Doe")       # Remove by entity
audit = mem.forget(topic="project alpha")    # Remove by topic
audit = mem.forget(before_date="2025-01-01") # Remove old entries
# Audit trail written to memory_audit.jsonl

Shared Memory Pools

from antaris_memory import SharedMemoryPool, AgentPermission

pool = SharedMemoryPool("./shared", pool_name="team-alpha")
pool.grant("agent-1", AgentPermission.READ_WRITE)
pool.grant("agent-2", AgentPermission.READ_ONLY)

mem_1 = pool.open("agent-1")
mem_1.ingest("Deployed new API endpoint")

mem_2 = pool.open("agent-2")
results = mem_2.search("API deployment")

Concurrency

import json
from antaris_memory import FileLock

# Exclusive write access (atomic on all platforms, including network filesystems)
with FileLock("/path/to/shard.json", timeout=10.0):
    with open("/path/to/shard.json") as f:
        data = json.load(f)
    # ... modify data ...
    with open("/path/to/shard.json", "w") as f:
        json.dump(data, f)

OpenClaw Integration

antaris-memory ships as a native OpenClaw plugin:

openclaw plugins enable antaris-memory

Once enabled, the plugin fires automatically before and after each agent turn:

  • before_agent_start — searches memory for relevant context and injects it into the agent prompt
  • agent_end — ingests the turn into persistent memory

What It Does

  • Sharded storage for production scalability (10,000+ memories, sub-second search)
  • Fast search indexes (full-text, tags, dates) stored as transparent JSON files
  • WAL journaling for crash-safe ingestion with automatic replay
  • Namespace isolation with hard boundaries between tenants, agents, or projects
  • Memory types with distinct decay, importance, and recall behaviour
  • Retrieval feedback loop adapts memory importance based on outcome signals
  • Bulk ingest at 12,041 items/s with deferred O(1) index rebuild
  • Automatic schema migration from single-file to sharded format with rollback
  • Multi-agent shared memory pools with access controls
  • Retrieval weighted by recency × importance × access frequency (Ebbinghaus-inspired decay)
  • Input gating classifies incoming content by priority (P0-P3) and drops noise at intake
  • Detects contradictions between stored memories using deterministic rule-based comparison
  • Runs fully offline — zero network calls, zero tokens, zero API keys

What It Doesn't Do

  • Not a vector database — no embeddings by default. Core search uses BM25 keyword ranking. Semantic search requires you to supply an embedding function (set_embedding_fn(fn)).
  • Not a knowledge graph — flat memory store with metadata indexing. No entity relationships or graph traversal.
  • Not semantic by default — contradiction detection uses explicit conflict rules, not inference.
  • Not LLM-dependent — all operations are deterministic. No model calls, no prompt engineering.
  • Not infinitely scalable — JSON file storage works well up to ~50,000 memories per workspace.

Benchmarks

Measured on Apple M4, Python 3.14.

Memories  Ingest                   Search (avg)  Search (p99)  Consolidate  Disk
100       5.3ms (0.053ms/entry)    0.40ms        0.65ms        4.2ms        117KB
500       16.8ms (0.034ms/entry)   1.70ms        2.51ms        84.3ms       575KB
1,000     33.2ms (0.033ms/entry)   3.43ms        5.14ms        343.3ms      1.1MB
5,000     173.7ms (0.035ms/entry)  17.10ms       25.70ms       4.3s         5.6MB

Bulk ingest: 12,041 items/s | 1M entries in ~86s | near-flat scaling via deferred index rebuild

Input gating classification: 0.177ms avg per input.


Architecture

MemorySystemV4
├── ShardManager         — Date/topic sharding
├── IndexManager         — Full-text, tag, and date indexes
│   ├── SearchIndex      — BM25 inverted index
│   ├── TagIndex         — Tag → hash mapping
│   └── DateIndex        — Date range queries
├── SearchEngine         — BM25 + optional cosine hybrid
├── WALManager           — Write-ahead log (crash-safe ingestion)
├── ReadCache            — LRU search result cache
├── AccessTracker        — Per-entry access-count boosting
├── PerformanceMonitor   — Timing/counter stats
├── MigrationManager     — Schema versioning with rollback
├── InputGate            — P0-P3 classification at intake
├── DecayEngine          — Ebbinghaus forgetting curves
├── ConsolidationEngine  — Dedup, clustering, contradiction detection
├── ForgettingEngine     — Selective deletion with audit
├── RetrievalFeedback    — Outcome tracking + importance adaptation
├── SharedMemoryPool     — Multi-agent coordination
├── NamespaceManager     — Multi-tenant isolation
└── ContextPacketBuilder — Sub-agent context injection

Running Tests

git clone https://github.com/Antaris-Analytics/antaris-memory.git
cd antaris-memory
python -m pytest tests/ -v

384 tests. All pass with zero external dependencies.


Migrating from Earlier Versions

No breaking changes from v2.x. All new APIs (bulk_ingest, record_outcome, feedback_stats, etc.) are additive. Existing workspaces load automatically — no migration required.

For pre-3.0 stores using MD5 hashing, run tools/migrate_hashes.py to upgrade to BLAKE2b-128.

pip install --upgrade antaris-memory

Zero Dependencies (Core)

The core package uses only the Python standard library. Optional extras:

  • pip install mcp — enables create_mcp_server()
  • Supply your own embedding function to set_embedding_fn() — any callable returning list[float] works (OpenAI, Ollama, sentence-transformers, etc.); one hookup is sketched below
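
For example, a local sentence-transformers hookup (the model name is illustrative; any callable returning list[float] satisfies the contract):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
mem.set_embedding_fn(lambda text: model.encode(text).tolist())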

Part of the Antaris Analytics Suite — v3.1.0

License

Apache 2.0 — see LICENSE for details.


Built by Antaris Analytics — deterministic infrastructure for AI agents
