antaris-memory
File-based persistent memory for AI agents. Zero dependencies (stdlib only), crash-safe writes, file-based JSON storage.
Installation
pip install antaris-memory
Zero dependencies. No API keys. No external services.
Core Features
- BM25 search with TF-IDF scoring - Full-text search with keyword relevance ranking
- Write-ahead log (WAL) - Crash-safe writes with automatic recovery
- Sharding - Horizontal scaling for large memory stores (10,000+ entries)
- Decay engine - Memories fade over time unless reinforced (Ebbinghaus curves)
- Sentiment tagging - Automatic emotional context detection
- Temporal awareness - Time-based queries and chronological context
- Confidence scoring - Reliability metrics for stored information
- Compression and consolidation - Automatic deduplication and clustering
- Forgetting engine - Selective deletion with audit trails
- Input gating - P0-P3 priority classification at intake
- Recovery presets - Context restoration after compaction/restart
- Thread-safe operations - FileLock using os.mkdir() atomicity
- MCP server integration - Expose as MCP tools (optional)
- Export/Import (v4.2.0) - Serialize all memories to JSON; import with merge and deduplication
- GCS Backend Stub (v4.2.0) - Interface defined for Google Cloud Storage backend (full implementation in v4.3)
Quick Start
from antaris_memory import MemorySystem
# Initialize — agent_name personalizes logs and namespace labels
mem = MemorySystem("./workspace", half_life=7.0, agent_name="MyBot")
mem.load()
# Store memories
mem.ingest("PostgreSQL chosen for primary database",
           category="technical", memory_type="episodic")
mem.ingest("API costs exceed $500/month budget",
           category="operational", confidence=0.9)
# Search with BM25 ranking (searches all memories by default)
results = mem.search("database decision")
for r in results:
    print(f"[{r.confidence:.2f}] {r.content}")
# Multi-tenant: scope search to a specific session
results = mem.search("database decision", session_id="session-abc")
# Save to disk
mem.save()
API Reference
Core Exports
from antaris_memory import (
MemorySystem, # Main interface (aliases MemorySystemV4)
MemoryEntry, # Individual memory record
SearchResult, # Search result with metadata
RecoveryManager, # Post-compaction context restoration
RecoveryConfig, # Recovery configuration presets
# Engine Components
DecayEngine, # Time-based memory degradation
SentimentTagger, # Emotional context detection
TemporalEngine, # Time-aware queries
ConfidenceEngine, # Reliability scoring
CompressionEngine, # Deduplication and clustering
ForgettingEngine, # Selective deletion
ConsolidationEngine, # Memory optimization
InputGate, # Priority classification
)
MemorySystem
Primary interface for all memory operations:
mem = MemorySystem(
workspace_path="./memory",
half_life=7.0, # Days until 50% decay
enable_wal=True, # Write-ahead logging
shard_threshold=1000, # Entries per shard
recovery_config=RecoveryConfig() # Post-restart recovery
)
Core Methods
# Loading and saving
mem.load() # Load from disk
mem.save() # Save to disk with WAL
# Memory ingestion
mem.ingest(content, source="", category="", memory_type="episodic",
confidence=1.0, tags=[], metadata={})
# Typed ingestion helpers
mem.ingest_fact("PostgreSQL supports JSON columns")
mem.ingest_preference("User prefers concise responses")
mem.ingest_mistake("Connection pool not closed properly", "Use context managers or explicit close() in finally block")
mem.ingest_procedure("Deploy: git push → CI → staging → prod")
# Input gating (P0-P3 classification)
mem.ingest_with_gating("Critical security alert", source="monitoring")
mem.ingest_with_gating("Thanks for the update", source="chat") # Dropped (P3)
# Search
results = mem.search(query, limit=10, explain=False)
results = mem.search(query, tags=["technical", "database"])
results = mem.between("2026-02-01", "2026-02-28")
# Search with instrumentation context (primary API for OpenClaw plugin / agent use)
results, ctx = mem.search_with_context(query, limit=5)
mem.mark_used([r.id for r in results], ctx) # Boosts relevance of retrieved memories
# Temporal queries
recent = mem.on_date("2026-02-14")
this_week = mem.between("2026-02-17", "2026-02-24")
story = mem.narrative(topic="database migration")
# Maintenance
report = mem.consolidate() # Dedup and optimize
mem.forget(entity="John Doe") # GDPR deletion with audit
mem.compact() # Archive old shards
BM25 Search Engine
Full-text search using BM25 algorithm with TF-IDF scoring:
# Basic search
results = mem.search("database performance issues")
# Search with explanation
results = mem.search("postgres slow", explain=True)
for r in results:
    print(f"[{r.relevance:.2f}] {r.content[:60]}")
    print(f"  Explanation: {r.explanation}")
# Advanced parameters
results = mem.search(
query="API optimization",
limit=20,
min_confidence=0.7,
memory_types=["procedural", "episodic"],
categories=["technical"]
)
Search scoring combines:
- BM25 keyword relevance
- Temporal decay (recent memories score higher)
- Access frequency (frequently accessed memories get a boost)
- Memory type boost (procedural > episodic for how-to queries)
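As a rough sketch, these factors can be modeled as a product. The function below is illustrative only — the name `combined_score`, the log-based frequency term, and the specific weights are assumptions, not the library's actual internals:

```python
import math

def combined_score(bm25, age_days, access_count, type_boost, half_life=7.0):
    """Illustrative sketch: BM25 relevance weighted by temporal decay,
    access frequency, and memory-type boost. Not the library's exact formula."""
    decay = math.exp(-math.log(2) * age_days / half_life)  # Ebbinghaus half-life
    frequency = 1.0 + math.log1p(access_count)             # diminishing-returns boost
    return bm25 * decay * frequency * type_boost

# A recent, frequently accessed procedural memory outranks a stale episodic one
recent = combined_score(bm25=2.0, age_days=1, access_count=5, type_boost=2.5)
stale = combined_score(bm25=2.0, age_days=30, access_count=0, type_boost=1.0)
assert recent > stale
```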
Write-Ahead Log (WAL)
Crash-safe writes with automatic recovery:
# WAL enabled by default
mem = MemorySystem("./workspace", enable_wal=True)
# Writes go through WAL first
mem.ingest("Important data") # Written to WAL, then committed
mem.save() # Flush WAL to main storage
# Automatic recovery on next load
mem.load() # Replays uncommitted WAL entries
WAL format:
workspace/
├── wal/
│ ├── 20260224_143022_001.wal
│ ├── 20260224_143022_002.wal
│ └── current.wal
Sharding
Horizontal scaling for large memory stores:
# Auto-sharding at 1000 entries per shard (default)
mem = MemorySystem("./workspace", shard_threshold=1000)
# Custom sharding strategy
mem = MemorySystem("./workspace", shard_strategy="temporal") # By date
mem = MemorySystem("./workspace", shard_strategy="semantic") # By topic
Shard structure:
workspace/
├── shards/
│ ├── 2026-02-technical.json # 847 entries
│ ├── 2026-02-operational.json # 1,203 entries
│ └── 2026-01-tactical.json # 512 entries
├── indexes/
│ ├── global_search.json
│ └── shard_manifest.json
Decay Engine
Memories fade over time unless reinforced:
# Configure decay parameters
mem = MemorySystem("./workspace", half_life=7.0) # 7-day half-life
# Query with/without decay consideration
recent = mem.search("performance", use_decay=True)
historical = mem.search("performance", use_decay=False)
# Decay statistics
stats = mem.get_stats()
print(f"Total memories: {stats['total']}")
print(f"Average confidence: {stats['avg_confidence']:.3f}")
Decay follows Ebbinghaus forgetting curve:
strength = initial_strength * exp(-ln(2) * age_days / half_life)
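A quick check of the curve with the default 7-day half-life (`decay_strength` here is a standalone helper for illustration, not a library API):

```python
import math

def decay_strength(initial, age_days, half_life=7.0):
    # Exponential decay: strength halves every `half_life` days
    return initial * math.exp(-math.log(2) * age_days / half_life)

print(decay_strength(1.0, 7))   # one half-life  → ≈ 0.5
print(decay_strength(1.0, 14))  # two half-lives → ≈ 0.25
```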
Sentiment Tagging
Automatic emotional context detection:
# Automatic sentiment detection
mem.ingest("The deployment failed catastrophically")
# → Tagged with sentiment: negative, intensity: 0.8
mem.ingest("Successfully migrated all user data")
# → Tagged with sentiment: positive, intensity: 0.7
# Query by sentiment via search filter
positive_memories = mem.search("", sentiment_filter="positive")
issues = mem.search("", sentiment_filter="negative", category="technical")
# Access sentiment metadata
for result in mem.search("deployment"):
    if result.metadata.get("sentiment"):
        sent = result.metadata["sentiment"]
        print(f"Sentiment: {sent['polarity']} ({sent['intensity']:.2f})")
Sentiment classification uses lexicon-based analysis (no model calls).
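A toy version of the lexicon approach, to make the mechanism concrete — the word list and averaging scheme below are invented for illustration; the library's lexicon and intensity weighting are richer:

```python
# Minimal lexicon: word → polarity weight in [-1, 1] (illustrative values)
LEXICON = {
    "failed": -0.8, "catastrophically": -0.9, "error": -0.6,
    "successfully": 0.7, "migrated": 0.2, "great": 0.8,
}

def tag_sentiment(text: str) -> dict:
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not scores:
        return {"polarity": "neutral", "intensity": 0.0}
    mean = sum(scores) / len(scores)
    return {"polarity": "positive" if mean > 0 else "negative",
            "intensity": round(abs(mean), 2)}

print(tag_sentiment("The deployment failed catastrophically"))
# → negative polarity, high intensity
```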
Temporal Engine
Time-aware queries and chronological context:
# Date range queries
Q1_memories = mem.between("2026-01-01", "2026-03-31")
# Single-day queries
yesterday_memories = mem.on_date("2026-02-23")
# Chronological narrative (returns formatted string)
story = mem.narrative(topic="database migration")
# Time-filtered search
recent_deployments = mem.search(
"deployment",
date_range=("2026-02-01", "2026-02-28")
)
Confidence Engine
Track reliability of stored information:
# Store with confidence scores
mem.ingest("PostgreSQL handles 10K QPS", confidence=0.95) # Measured
mem.ingest("MongoDB might be faster", confidence=0.3) # Speculation
# Filter by confidence
reliable = mem.search("database performance", min_confidence=0.8)
# Confidence statistics
stats = mem.get_stats()
print(f"Total memories: {stats['total']}")
print(f"Average confidence: {stats['avg_confidence']:.3f}")
Compression and Consolidation
Automatic deduplication and memory optimization:
# Manual consolidation — deduplicates and clusters similar memories
report = mem.consolidate()
print(f"Consolidated: {report}")
# Auto-consolidation at ingest (configurable threshold)
mem = MemorySystem("./workspace", auto_consolidate_threshold=5000)
# Compact — archives old shards and frees memory
report = mem.compact()
print(f"Compacted: {report}")
Forgetting Engine
Selective deletion with audit trails:
# GDPR deletion by entity
result = mem.forget(entity="John Doe")
print(f"Removed {len(result['removed'])} entries")
# Time-based cleanup
mem.forget(before_date="2025-01-01")
# Topic-based deletion
mem.forget(topic="staging")
# Audit trail (returned by forget())
result = mem.forget(entity="Jane Smith")
audit = result["audit"]
Input Gating
Priority classification at intake (P0-P3):
Note: ingest_with_gating() automatically filters low-signal content (P3 ephemeral). Dropped items are logged at DEBUG level. Use mem.ingest() directly if you want to store everything without filtering.
# Automatic classification
mem.ingest_with_gating("URGENT: API down", source="alerts")
# → P0 (critical) → stored immediately
mem.ingest_with_gating("Decided on PostgreSQL", source="meeting")
# → P1 (operational) → stored
mem.ingest_with_gating("sounds good!", source="chat")
# → P3 (ephemeral) → dropped (logged at DEBUG)
# Gating statistics
stats = mem.get_stats()
print(f"Total stored: {stats['total']}")
Priority levels:
- P0 (Critical): Security alerts, errors, deadlines, financial data
- P1 (Operational): Decisions, technical choices, assignments
- P2 (Tactical): Background info, research, general discussion
- P3 (Ephemeral): Greetings, acknowledgments, social noise (dropped)
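The scheme can be sketched with a simple keyword gate — a hypothetical stand-in for illustration; the library's actual classifier and signal lists are more sophisticated:

```python
# Illustrative signal lists, not the library's real lexicon
P0_SIGNALS = ("urgent", "security", "error", "deadline", "outage")
P1_SIGNALS = ("decided", "decision", "chose", "assigned")
P3_SIGNALS = ("thanks", "sounds good", "ok!", "hi there")

def classify(text: str) -> str:
    lowered = text.lower()
    if any(s in lowered for s in P0_SIGNALS):
        return "P0"  # critical: stored immediately
    if any(s in lowered for s in P1_SIGNALS):
        return "P1"  # operational: stored
    if any(s in lowered for s in P3_SIGNALS):
        return "P3"  # ephemeral: dropped by the gate
    return "P2"      # default: background/tactical

assert classify("URGENT: API down") == "P0"
assert classify("Decided on PostgreSQL") == "P1"
assert classify("sounds good!") == "P3"
```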
Recovery Manager
Context restoration after compaction or restart:
# Default smart recovery
config = RecoveryConfig() # 50 memories, 24h window
mem = MemorySystem("./workspace", recovery_config=config)
# Minimal recovery for token efficiency
config = RecoveryConfig(recovery_mode="minimal") # 10 memories, session only
# Custom recovery
config = RecoveryConfig(
recovery_search_limit=100,
recovery_time_window="48h",
recovery_channels="current",
recovery_inject="cache"
)
mem.load()
mem.recover_memories() # Automatic on load
# Access recovered context
recovery_mgr = mem.recovery_manager
cached = recovery_mgr.get_cached_memories()
context_block = recovery_mgr.inject_into_context()
Recovery presets:
| Mode | Memories | Window | Tokens | Use Case |
|---|---|---|---|---|
| smart | 50 | 24h | 5-10K | Balanced recovery |
| minimal | 10 | session | 1-2K | Token-constrained |
Context Packets
Package relevant memories for sub-agent injection:
# Single query packet
packet = mem.build_context_packet(
task="Debug authentication flow",
max_memories=10,
max_tokens=2000,
include_mistakes=True,
tags=["auth", "security"]
)
# Render for injection
markdown_context = packet.render("markdown")
xml_context = packet.render("xml")
# Multi-query with deduplication
packet = mem.build_context_packet_multi(
task="Performance optimization",
queries=["slow queries", "database bottleneck", "caching"],
max_tokens=3000
)
# Token budget management
packet.trim(max_tokens=1500)
print(f"Final token count: {packet.token_count}")
MCP Server Integration
Expose memory as MCP tools:
# Requires: pip install mcp
from antaris_memory import create_mcp_server
# Create server
server = create_mcp_server(workspace="./memory")
# Run with stdio transport
server.run() # Connect from Claude Desktop, Cursor, etc.
# Available MCP tools:
# - memory_search(query, limit)
# - memory_ingest(content, category, memory_type)
# - memory_consolidate()
# - memory_stats()
Thread Safety
Multiple processes can safely access the same workspace:
from antaris_memory import FileLock
# Exclusive write lock
with FileLock("/path/to/shard.json", timeout=10.0):
    data = load_shard()
    modify_data(data)
    save_shard(data)
# Optimistic concurrency for reads
from antaris_memory import VersionTracker
tracker = VersionTracker()
version = tracker.snapshot("/path/to/data.json")
data = load_data()
process_data(data)
tracker.check(version) # Raises ConflictError if modified
FileLock uses os.mkdir() for cross-platform atomic operations.
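The trick works because os.mkdir() either creates the directory or raises FileExistsError, atomically, on every major platform. A simplified sketch of the idea — MkdirLock below is a stand-in for illustration, not the library's FileLock implementation:

```python
import os
import tempfile
import time

class MkdirLock:
    """Simplified mkdir-based lock sketch: atomic create-or-fail."""
    def __init__(self, path, timeout=10.0, poll=0.05):
        self.lock_dir = path + ".lock"
        self.timeout = timeout
        self.poll = poll

    def __enter__(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                os.mkdir(self.lock_dir)  # atomic: succeeds for exactly one process
                return self
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not acquire {self.lock_dir}")
                time.sleep(self.poll)    # another process holds the lock; retry

    def __exit__(self, *exc):
        os.rmdir(self.lock_dir)          # release

demo_path = os.path.join(tempfile.gettempdir(), "demo.json")
with MkdirLock(demo_path):
    pass  # critical section: safe to read-modify-write demo.json here
```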
Export and Import (v4.2.0)
Serialize all memories to a portable JSON snapshot, then restore or merge them into any workspace.
from antaris_memory import MemorySystem
memory = MemorySystem(workspace="./data")
memory.load()
# Export all memories
count = memory.export("./backup.json")
print(f"Exported {count} memories")
# Import with merge (deduplicates by content hash)
imported = memory.import_from("./backup.json", merge=True)
print(f"Imported {imported} new memories")
export(output_path) writes a single JSON file containing every memory entry, shard manifests, and index metadata. Returns the total number of exported memories.
import_from(input_path, merge=True) reads the snapshot and merges entries into the current workspace, skipping exact duplicates (by content hash). Returns the count of newly added memories. Set merge=False to overwrite instead.
GCS Backend Stub (v4.2.0)
The Google Cloud Storage backend interface is now defined and importable. The full cloud-backed implementation ships in v4.3.
from antaris_memory.backends.gcs import GCSMemoryBackend
backend = GCSMemoryBackend(bucket="my-agent-memories", prefix="prod/")
# Interface is complete; persistence calls are stubbed until v4.3
Provides the same load(), save(), search(), and ingest() API as the default file backend, making it a drop-in replacement once fully implemented.
Benchmarks
Tested on Apple M4, Python 3.14, SSD storage.
Search Performance
| Memories | Ingest (avg) | Search (avg) | Search (p99) | Memory (MB) |
|---|---|---|---|---|
| 100 | 0.053ms | 0.40ms | 0.65ms | 8 |
| 1,000 | 0.033ms | 3.43ms | 5.14ms | 45 |
| 10,000 | 0.035ms | 24.7ms | 38.2ms | 180 |
| 50,000 | 0.041ms | 127ms | 195ms | 850 |
Comparison with Other Libraries
Search performance against existing memory libraries:
| Library | 1K memories | 10K memories | Dependencies |
|---|---|---|---|
| antaris-memory | 3.4ms | 24.7ms | 0 (stdlib) |
| mem0 | 610ms | 1,507,000ms | Redis + Vector DB |
| langchain-memory | 185ms | 4,460ms | Multiple |
Result: 61,030x faster than mem0 at scale, 180x faster at small scale.
Note: These benchmarks compare antaris-memory's local file-based storage against mem0's default networked backend (Qdrant vector DB + Redis). antaris-memory is designed as a zero-infrastructure local solution; mem0's strength is cloud-scale distributed search. The speed advantage comes from eliminating network round-trips and serialization overhead.
Input Gating Performance
P0-P3 classification speed:
| Metric | Value |
|---|---|
| Average classification | 0.177ms |
| P99 classification | 0.45ms |
| Throughput | 5,650 classifications/sec |
Storage Efficiency
| Memories | Raw JSON | Compressed | Compression Ratio |
|---|---|---|---|
| 1,000 | 1.1MB | 340KB | 3.2:1 |
| 10,000 | 11.2MB | 2.8MB | 4.0:1 |
| 50,000 | 56.8MB | 12.1MB | 4.7:1 |
Storage Format
Plain JSON files for transparency and debuggability:
workspace/
├── shards/
│ ├── 2026-02-technical.json # Technical memories
│ ├── 2026-02-operational.json # Operational decisions
│ └── 2026-01-archive.json # Archived memories
├── indexes/
│ ├── search_index.json # BM25 inverted index
│ ├── tag_index.json # Tag mappings
│ ├── date_index.json # Temporal index
│ └── confidence_index.json # Confidence levels
├── namespaces/ # Isolated namespace stores
│ └── project-alpha/
│ ├── shards/
│ ├── indexes/
│ └── ...
├── wal/
│ ├── current.wal # Active write-ahead log
│ └── 20260224_143022.wal # Rotated WAL files
├── audit/
│ └── deletions.json # GDPR audit trail
└── config.json # Workspace configuration
Architecture
MemorySystem (v4.2.0)
├── Core Components
│ ├── ShardManager # Horizontal scaling
│ ├── IndexManager # Search indexes
│ └── WALManager # Write-ahead logging
├── Search Engine
│ ├── BM25Engine # Keyword ranking
│ ├── TFIDFScorer # Term frequency scoring
│ └── TemporalRanker # Time-based relevance
├── Memory Processing
│ ├── DecayEngine # Ebbinghaus forgetting
│ ├── SentimentTagger # Emotional context
│ ├── ConfidenceEngine # Reliability scoring
│ ├── CompressionEngine # Deduplication
│ ├── ConsolidationEngine # Memory optimization
│ └── ForgettingEngine # Selective deletion
├── Input Processing
│ ├── InputGate # P0-P3 classification
│ └── MemoryTyper # Episodic/semantic/procedural
├── Recovery System
│ ├── RecoveryManager # Post-restart context
│ └── RecoveryConfig # Smart/minimal presets
├── Concurrency
│ ├── FileLock # Cross-platform locking
│ └── VersionTracker # Optimistic concurrency
├── Integration
│ ├── MCPServer # Model Context Protocol
│ └── ContextPacketBuilder # Sub-agent injection
└── Backends (v4.2.0)
├── FileBackend # Default local JSON storage
├── ExportImport # JSON snapshot export/import
└── GCSMemoryBackend # Google Cloud Storage stub (full impl v4.3)
Memory Types
Store memories with type-specific optimizations:
# Episodic: events, decisions, meeting notes
mem.ingest("Decided to migrate to PostgreSQL in Q2 meeting", memory_type="episodic")
# Semantic: facts, concepts, general knowledge
mem.ingest("PostgreSQL supports ACID transactions", memory_type="semantic")
# Procedural: how-to steps, runbooks, processes (shorthand helper)
mem.ingest_procedure("Deploy: git push → CI → staging → production")
# Preference: user preferences, style notes (shorthand helper)
mem.ingest_preference("User prefers Python code examples over pseudocode")
# Mistake: errors to avoid, lessons learned (shorthand helper)
mem.ingest_mistake("Forgot to close database connections in worker threads", "Use context managers or explicit close() in finally block")
Type-specific recall boosts:
- Procedural: 2.5x boost for how-to queries
- Preference: 2.0x boost for style/format queries
- Mistake: 1.8x boost for troubleshooting queries
- Semantic: 1.2x boost for factual queries
- Episodic: Baseline (1.0x)
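The boost table above can be expressed as a simple multiplier lookup. This is an illustrative sketch: how the library detects query intent (how-to vs. factual vs. troubleshooting) and applies the multiplier internally is an assumption here.

```python
# Multipliers from the list above; applied to the base relevance score
TYPE_BOOSTS = {
    "procedural": 2.5,  # how-to queries
    "preference": 2.0,  # style/format queries
    "mistake": 1.8,     # troubleshooting queries
    "semantic": 1.2,    # factual queries
    "episodic": 1.0,    # baseline
}

def boosted(base_score: float, memory_type: str) -> float:
    return base_score * TYPE_BOOSTS.get(memory_type, 1.0)

assert boosted(1.0, "procedural") == 2.5  # how-to content surfaces first
```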
Namespace Isolation
Multi-tenant workspaces with hard boundaries:
from antaris_memory import NamespacedMemory, NamespaceManager
# Create isolated namespaces
manager = NamespaceManager("./workspace")
agent_a = manager.create_namespace("agent-a")
agent_b = manager.create_namespace("agent-b")
# Each namespace is fully isolated
agent_a.ingest("Agent A decision")
agent_b.ingest("Agent B decision")
# Search within namespace only
results_a = agent_a.search("decision") # Only sees agent A memories
results_b = agent_b.search("decision") # Only sees agent B memories
# Cross-namespace operations (explicit)
all_decisions = manager.search_across_namespaces("decision",
namespaces=["agent-a", "agent-b"])
Testing
Run the full test suite:
git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite/antaris-memory
python -m pytest tests/ -v
# Run specific test categories
python -m pytest tests/test_search.py -v # Search engine
python -m pytest tests/test_wal.py -v # Write-ahead log
python -m pytest tests/test_decay.py -v # Memory decay
python -m pytest tests/test_concurrency.py -v # Thread safety
564 tests pass with zero external dependencies.
Migration from v3.x
Automatic schema migration on first load:
# v3.x workspaces load automatically
mem = MemorySystem("./existing_v3_workspace")
mem.load() # Auto-detects v3 format, migrates to v4
# New v4 features available immediately
mem.ingest_with_gating("Test message", source="migration")
results = mem.search("test", explain=True)
Limitations
We are honest about what this library cannot do:
- Storage scale: JSON files work well up to ~50,000 memories. Beyond that, you need a database.
- Semantic understanding: Core search is keyword-based. Add your own embedding function for semantic search.
- Graph relationships: Flat memory store. No entity relationships or graph traversal.
- Real-time updates: File-based storage has write latency. Not suitable for real-time applications.
- Distributed systems: Single-machine only. No clustering or distributed consensus.
When you hit these limits, you know it's time for a more complex solution.
License
Licensed under the Apache License 2.0. See LICENSE for details.
Related Packages
- antaris-router - Adaptive model routing with SLA enforcement
- antaris-guard - Security and prompt injection detection
- antaris-context - Context window optimization
- antaris-pipeline - Agent orchestration pipeline