NEXUS Memory
A neuro-inspired long-term memory architecture for AI agents.
NEXUS combines a capacity-bounded Working Memory, a graph-based Semantic Palace, and asynchronous background consolidation to give LLM agents persistent, scalable memory — without blocking real-time interactions.
📄 Paper: NEXUS: A Scalable, Neuro-Inspired Architecture for Long-Term Event Memory in LLM Agents — Shivam Tyagi, 2025 — DOI: 10.13140/RG.2.2.25477.82407
Architecture
```
                          ┌─────────────────────────────────┐
                          │   Asynchronous Consolidation    │
                          │    (8 Background Processes)     │
                          │  • Chunking       • Cross-Ref.  │
                          │  • Conflict Res.  • Skill Ext.  │
                          │  • Forgetting     • Spaced Rep. │
                          │  • Reflection     • Defragment. │
                          └────────────────┬────────────────┘
                                           │ background
┌──────────┐   ┌───────────┐   ┌───────────▼─────────┐   ┌──────────┐
│  Input   │──▶│ Attention │──▶│   Episode Buffer    │──▶│ Semantic │
│  Text    │   │   Gate    │   │  (append-only log)  │   │  Palace  │
└──────────┘   │ (salience │   └─────────────────────┘   │  Graph   │
               │  filter)  │                             │ G=(V,E)  │
               └───────────┘                             └────┬─────┘
                                                              │
┌──────────┐   ┌───────────┐   ┌───────────────────┐         │
│  Query   │──▶│ Retrieval │──▶│  Working Memory   │◀─────────┘
│          │   │  Engine   │   │  (7 ± 2 slots)    │
└──────────┘   │  Q(v) =   │   └───────────────────┘
               │ β₁cos +   │
               │ β₂decay+  │   ┌───────────────────┐
               │ β₃freq +  │──▶│    Meta-Memory    │
               │ β₄sal     │   │ (confidence map)  │
               └───────────┘   └───────────────────┘
```
Core idea: Inspired by the dual-process theory of human cognition (Daniel Kahneman's *Thinking, Fast and Slow*), NEXUS decouples memory operations into two pathways:
- System 1 (Fast & Heuristic): Real-time ingestion. Routes interactions to the short-term Episode Buffer in milliseconds without blocking the agent.
- System 2 (Slow & Analytical): Background consolidation. Uses LLM reasoning to chunk, organize, and abstract semantic knowledge asynchronously while the agent is idle.
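To make the split concrete, here is a minimal sketch of the two pathways using a plain queue and a daemon thread. The class and method names are hypothetical, not NEXUS internals: the point is that the fast path only appends, and all LLM work happens off the hot path.

```python
import queue
import threading

class DualProcessMemorySketch:
    """Illustrative only: fast ingestion decoupled from slow consolidation."""

    def __init__(self) -> None:
        self._episodes: queue.Queue[str] = queue.Queue()  # append-only buffer
        # System 2 lives in a daemon thread, so it never blocks the agent loop.
        threading.Thread(target=self._consolidation_loop, daemon=True).start()

    def encode(self, text: str) -> None:
        """System 1: heuristic, millisecond-scale ingestion (no LLM call here)."""
        self._episodes.put(text)

    def _consolidation_loop(self) -> None:
        """System 2: analytical consolidation, run while the agent is idle."""
        while True:
            episode = self._episodes.get()     # blocks until work arrives
            self._chunk_and_abstract(episode)  # stand-in for LLM reasoning

    def _chunk_and_abstract(self, episode: str) -> None:
        pass  # chunking, cross-referencing, conflict resolution, etc.
```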
Quick Start — Claude Code (MCP)
The fastest way to use NEXUS is as a persistent memory layer for Claude Code. One command, and your AI remembers you across every session.
Run the install script:
```bash
bash <(curl -s https://raw.githubusercontent.com/shivamtyagi18/nexus-memory/main/install_nexus_mcp.sh)
```
The script will:
- Create a dedicated venv at `~/.nexus/venv`
- Install `nexus-memory` into it
- Prompt for your LLM choice and API key
- Register the MCP server in `~/.claude.json`
- Optionally configure automatic memory hooks
Then restart Claude Code and verify with `/mcp`; `nexus` should appear as connected.
Available tools (10):
| Tool | Description |
|---|---|
| `nexus_encode` | Store information in long-term memory |
| `nexus_recall` | Retrieve memories by natural-language query |
| `nexus_get_context` | Inject working memory into the current prompt |
| `nexus_how_well_do_i_know` | Confidence check on a topic |
| `nexus_knowledge_gaps` | List topics NEXUS knows it doesn't know |
| `nexus_pin` | Mark a memory as permanent (never decayed) |
| `nexus_forget` | Archive a memory |
| `nexus_consolidate` | Run a consolidation cycle |
| `nexus_stats` | System-wide statistics |
| `nexus_get_suggestions` | Proactive insights from background consolidation |
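For orientation, these tools are thin wrappers over the Python API documented below. Here is a hedged sketch of how two of them could be exposed with the MCP Python SDK's FastMCP helper; it is illustrative only, since the shipped server lives in `nexus/integrations/mcp_server.py` and may be wired differently.

```python
from mcp.server.fastmcp import FastMCP
from nexus import NEXUS, NexusConfig

mcp = FastMCP("nexus")
memory = NEXUS(config=NexusConfig(storage_path="./nexus_data"))

@mcp.tool()
def nexus_encode(content: str, context: str = "") -> str:
    """Store information in long-term memory."""
    memory.encode(content, context=context or None)
    return "stored"

@mcp.tool()
def nexus_recall(query: str, top_k: int = 5) -> list[str]:
    """Retrieve memories by natural-language query."""
    return [m.content for m in memory.recall(query, top_k=top_k)]

if __name__ == "__main__":
    mcp.run()  # stdio transport, as Claude Code expects
```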
LLM options — set during install or via environment variables:
| Model | Provider | Requires |
|---|---|---|
| `mistral` (default) | Local Ollama | `ollama pull mistral` |
| `claude-*` | Anthropic | `NEXUS_LLM_API_KEY` |
| `gpt-*` | OpenAI | `NEXUS_LLM_API_KEY` |
| `gemini*` | Google Gemini | `NEXUS_LLM_API_KEY` |
Installation (Python Library)
```bash
pip install nexus-memory
```
With optional FAISS-accelerated vector search:
```bash
pip install "nexus-memory[faiss]"
```
Or install from source:
```bash
git clone https://github.com/shivamtyagi18/nexus-memory.git
cd nexus-memory
pip install -e .
```
Prerequisites
NEXUS uses an LLM for reasoning tasks (consolidation, reflection, skill extraction). By default it connects to a local Ollama instance:
```bash
ollama pull mistral
```
Alternatively, you can use OpenAI, Anthropic, or Google Gemini — see Using Cloud LLM Providers below.
Using Cloud LLM Providers
NEXUS is provider-agnostic. Just change `llm_model` and pass your API key:
```python
from nexus import NEXUS, NexusConfig

# ── OpenAI ──────────────────────────────────────────────
config = NexusConfig(
    llm_model="gpt-4o",
    openai_api_key="sk-...",
)

# ── Anthropic ───────────────────────────────────────────
config = NexusConfig(
    llm_model="claude-3-5-sonnet-20241022",
    anthropic_api_key="sk-ant-...",
)

# ── Google Gemini ───────────────────────────────────────
config = NexusConfig(
    llm_model="gemini-1.5-flash",
    gemini_api_key="AIza...",
)

# ── Local Ollama (default) ──────────────────────────────
config = NexusConfig(
    llm_model="mistral",  # or llama3, codellama, phi3, etc.
)

memory = NEXUS(config=config)
```
Routing is automatic based on the model-name prefix: `gpt-*` → OpenAI, `claude*` → Anthropic, `gemini*` → Gemini, everything else → Ollama.
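A minimal sketch of that dispatch rule (the real implementation lives in `nexus/llm_interface.py`; this is just the documented behavior restated as code):

```python
def route_provider(llm_model: str) -> str:
    """Map a model name to a provider by prefix, per the rule above."""
    if llm_model.startswith("gpt-"):
        return "openai"
    if llm_model.startswith("claude"):
        return "anthropic"
    if llm_model.startswith("gemini"):
        return "gemini"
    return "ollama"  # mistral, llama3, codellama, phi3, ...

assert route_provider("gpt-4o") == "openai"
assert route_provider("claude-3-5-sonnet-20241022") == "anthropic"
assert route_provider("gemini-1.5-flash") == "gemini"
assert route_provider("mistral") == "ollama"
```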
Quick Start
```python
from nexus import NEXUS, NexusConfig

# Initialize
config = NexusConfig(
    storage_path="./my_agent_memory",
    llm_model="mistral",
)
memory = NEXUS(config=config)

# Encode information
memory.encode("User prefers Python for backend development.")
memory.encode("User is allergic to shellfish.", context="medical")

# Recall by natural-language query
results = memory.recall("What language does the user prefer?")
for mem in results:
    print(f"  [{mem.strength:.2f}] {mem.content}")

# Check what you know (and don't know)
confidence = memory.how_well_do_i_know("programming languages")
print(f"Confidence: {confidence.overall:.0%}")

# Run background consolidation
memory.consolidate()

# Persist to disk
memory.save()
```
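Because state is persisted under `storage_path`, a later process can pick up where this one left off. A short sketch, assuming NEXUS reloads saved state when constructed over the same path (if the library requires an explicit load call instead, adjust accordingly):

```python
from nexus import NEXUS, NexusConfig

# A later session, same storage path (assumes state is reloaded on init)
config = NexusConfig(storage_path="./my_agent_memory", llm_model="mistral")
memory = NEXUS(config=config)
for mem in memory.recall("food allergies"):
    print(mem.content)  # should surface the shellfish allergy encoded earlier
```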
Framework Integrations
NEXUS can be used natively inside standard agent frameworks.
LangChain
Use `NexusLangChainMemory` to replace `ConversationBufferMemory`. This gives your agent the cost savings of a capacity-bounded Working Memory while asynchronously archiving the conversation into the Semantic Palace.
```python
from langchain.chains import ConversationChain
from nexus import NEXUS
from nexus.integrations.langchain_memory import NexusLangChainMemory

# 1. Initialize NEXUS
nexus_engine = NEXUS(storage_path="./langchain_nexus_db")

# 2. Wrap it for LangChain
nexus_memory = NexusLangChainMemory(nexus_client=nexus_engine, top_k=3)

# 3. Plug it into standard chains
conversation = ConversationChain(
    llm=my_llm,
    memory=nexus_memory,
)
conversation.predict(input="I prefer using PyTorch.")
```
See `examples/langchain_agent.py` or `examples/quickstart.py` for complete working code.
Claude Code (MCP Server)
See Quick Start — Claude Code (MCP) above for one-command setup.
Key API
| Method | Description |
|---|---|
| `encode(content, context, source)` | Ingest new information through the Attention Gate |
| `recall(query, top_k)` | Retrieve relevant memories via graph traversal |
| `how_well_do_i_know(topic)` | Meta-memory confidence check |
| `consolidate(depth)` | Run background consolidation (`"full"`, `"light"`, or `"defer"`) |
| `save()` | Persist all state to disk |
| `pin(memory_id)` | Mark a memory as permanent |
| `forget(memory_id)` | Gracefully forget a memory (leaves a tombstone) |
| `stats()` | System-wide statistics |
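A short sketch chaining several of these calls. The return value of `encode()` is not specified in this README, so treating it as a memory id is an assumption:

```python
from nexus import NEXUS, NexusConfig

memory = NEXUS(config=NexusConfig(storage_path="./nexus_data"))

# Assumption: encode() returns an id usable with pin()/forget().
mem_id = memory.encode("Production DB is db-prod-3.", context="ops")
memory.pin(mem_id)            # exempt from decay
memory.consolidate("light")   # quick consolidation pass
print(memory.stats())         # system-wide statistics
memory.forget(mem_id)         # archived gracefully, leaving a tombstone
memory.save()
```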
Configuration
All parameters are optional and have sensible defaults:
```python
from nexus import NexusConfig

config = NexusConfig(
    # Working Memory
    working_memory_slots=7,           # Miller's Law: 7 ± 2

    # Retrieval scoring weights
    recency_weight=0.2,
    relevance_weight=0.4,
    strength_weight=0.2,
    salience_weight=0.2,

    # Forgetting
    decay_rate=0.99,                  # per-day temporal decay
    strength_hard_threshold=0.05,     # below this → forget

    # Palace graph
    room_merge_threshold=0.85,        # similarity to auto-merge rooms

    # LLM provider (pick one)
    llm_model="mistral",                        # Ollama (default)
    # llm_model="gpt-4o",                       # OpenAI
    # llm_model="claude-3-5-sonnet-20241022",   # Anthropic
    # llm_model="gemini-1.5-flash",             # Google
    ollama_base_url="http://localhost:11434",

    # Storage
    storage_path="./nexus_data",
)
```
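Two of these knobs are easy to sanity-check by hand. With the default `decay_rate=0.99` per day, an untouched memory crosses the 0.05 hard threshold after roughly ln(0.05)/ln(0.99) ≈ 298 days, unless recall, pinning, or consolidation refreshes it. And the four weights combine as in the architecture diagram's Q(v) score; a small illustrative calculation (the library's internal normalization may differ):

```python
import math

# Days until an untouched memory falls below the hard forget threshold
decay_rate, threshold = 0.99, 0.05
print(math.log(threshold) / math.log(decay_rate))  # ≈ 298.1 days

# Illustrative multi-factor retrieval score with the default weights
# (relevance, recency, strength, salience), mirroring Q(v) above
def q_score(relevance: float, recency: float,
            strength: float, salience: float) -> float:
    return 0.4 * relevance + 0.2 * recency + 0.2 * strength + 0.2 * salience

print(q_score(relevance=0.82, recency=0.9, strength=0.5, salience=0.7))  # 0.748
```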
What's New in v1.0.0
- Consolidation robustness overhaul — fixed a critical bug where singleton episodes lingered in the buffer indefinitely, causing consolidation to report "no significant memories" even when important facts were present
- Smarter salience scoring — the heuristic scorer now differentiates content types (personal facts, knowledge updates, instructions) instead of scoring everything the same
- Better contradiction detection — Mistral no longer incorrectly discards memories that agree with existing ones
- Validated across 4 models — benchmarked with gpt-4o-mini, Mistral 7B, CodeLlama 7B, and Llama 3.2 3B
See CHANGELOG.md for full details.
Benchmarks
LoCoMo (Multi-System Comparison)
NEXUS was benchmarked against four baseline architectures on the LoCoMo long-sequence dataset (28 dialog turns, 15 evaluation questions, consolidation enabled):
| System | F1 Score | Latency | Tokens/Query | Consolidation |
|---|---|---|---|---|
| FullContext | 0.345 | 1147ms | 550 | — |
| MemGPT-style | 0.334 | 1397ms | 478 | — |
| NaiveRAG | 0.312 | 1387ms | 145 | — |
| NEXUS v2 | 0.279 | 1317ms | 146 | 41.2s (async) |
| Mem0-style | 0.235 | 1088ms | 106 | — |
Results with GPT-4o-mini. NEXUS consolidation runs asynchronously and does not block queries.
Local Model Comparison (v1.0.0)
All runs use the fixed consolidation pipeline with heuristic scoring:
| Model | F1 Score | Exact Match | Latency | Best Category |
|---|---|---|---|---|
| CodeLlama 7B | 0.317 | 0.200 | 5634ms | Temporal (0.682) |
| Mistral 7B | 0.284 | 0.067 | 3181ms | Knowledge Update (0.516) |
| gpt-4o-mini | 0.262 | 0.000 | 1271ms | Single-hop (0.350) |
| Llama 3.2 3B | 0.184 | 0.067 | 1446ms | Multi-hop (0.134) |
Key finding: CodeLlama 7B outperforms all models on temporal reasoning (F1=0.682) and achieves the highest exact-match rate (20%). Mistral 7B remains the best all-rounder with strong knowledge-update handling.
LongMemEval (Long-Term Interactive Memory)
NEXUS integrates an evaluation harness for the LongMemEval benchmark to test retrieval over 50+ chat sessions:
| System Configuration | Exact Match Accuracy | Average Query Latency |
|---|---|---|
| Baseline (Full Context) | 100.0% | 11.98s |
| NEXUS Dual-Process | 80.0% | 0.98s |
NEXUS restricts the LLM context to the 5 most relevant memories, resulting in a >12× latency reduction compared to context-stuffing.
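The speedup follows directly from prompt size: rather than stuffing 50+ sessions into the context, only the retrieved top-5 memories reach the LLM. Schematically (the prompt wording here is illustrative, not the harness's actual template):

```python
def build_prompt(memory, question: str, top_k: int = 5) -> str:
    """Context restriction: only the top-k retrieved memories reach the LLM."""
    hits = memory.recall(question, top_k=top_k)
    context = "\n".join(f"- {m.content}" for m in hits)
    return f"Relevant memories:\n{context}\n\nQuestion: {question}"
```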
Vector Search Backend
NEXUS supports two vector search backends. FAISS is auto-detected when installed:
| Backend | 1K vectors | 10K vectors | 100K vectors | Memory (100K) |
|---|---|---|---|---|
| NumPy | 22 µs | 179 µs | 2.75 ms | 146.5 MB |
| FAISS | 28 µs | 200 µs | 2.24 ms | 979 B |
At scale, FAISS is 1.2× faster with 150,000× less memory.
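Auto-detection is the standard optional-dependency import guard. A hedged sketch of such a fallback (the library's actual logic lives in `nexus/vector_store.py` and may differ):

```python
import numpy as np

try:
    import faiss  # available after: pip install "nexus-memory[faiss]"
    HAS_FAISS = True
except ImportError:
    HAS_FAISS = False

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    """Inner-product nearest neighbors: FAISS if installed, else NumPy brute force."""
    if HAS_FAISS:
        index = faiss.IndexFlatIP(vectors.shape[1])
        index.add(vectors.astype(np.float32))
        scores, ids = index.search(query.astype(np.float32).reshape(1, -1), k)
        return ids[0], scores[0]
    scores = vectors @ query
    ids = np.argsort(scores)[::-1][:k]
    return ids, scores[ids]
```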
Reproducing Benchmarks
```bash
pip install -e ".[benchmarks]"

# Multi-system comparison (requires API key)
python benchmarks/run_benchmark.py --model gpt-4o-mini --systems nexus --consolidate --dataset locomo

# Local model comparison (requires Ollama)
python benchmarks/run_benchmark.py --model mistral --systems nexus --consolidate --dataset locomo
python benchmarks/run_benchmark.py --model codellama --systems nexus --consolidate --dataset locomo

# Vector backend comparison
python benchmarks/vector_benchmark.py
```
Project Structure
```
nexus-memory/
├── nexus/                     # Core library
│   ├── __init__.py
│   ├── core.py                # NEXUS orchestrator
│   ├── models.py              # Data models & NexusConfig
│   ├── palace.py              # Semantic Palace graph
│   ├── episode_buffer.py      # Append-only temporal log
│   ├── working_memory.py      # Capacity-bounded priority queue
│   ├── attention_gate.py      # Salience filter
│   ├── retrieval.py           # Multi-factor retrieval engine
│   ├── consolidation.py       # Async background processes
│   ├── meta_memory.py         # Confidence mapping
│   ├── vector_store.py        # Vector persistence
│   ├── llm_interface.py       # Multi-provider LLM connector (Ollama/OpenAI/Anthropic/Gemini)
│   ├── metrics.py             # Observability: counters, gauges, histograms, Prometheus export
│   └── integrations/          # Framework adapters
│       ├── langchain_memory.py  # LangChain BaseMemory component
│       └── mcp_server.py        # Claude Code MCP server (10 tools)
├── install_nexus_mcp.sh       # One-command Claude Code setup
├── tests/                     # 190 tests across 14 files
├── baselines/                 # Baseline implementations for comparison
├── benchmarks/                # Benchmark harness & scripts
├── examples/                  # Usage examples
├── paper/                     # IEEE research paper (LaTeX + Markdown)
│   └── figures/               # Benchmark charts and UI diagrams
├── pyproject.toml
├── CHANGELOG.md
├── LICENSE
└── README.md
```
Citation
If you use NEXUS in your research, please cite:
```bibtex
@article{tyagi2025nexus,
  title={NEXUS: A Scalable, Neuro-Inspired Architecture for Long-Term Event Memory in LLM Agents},
  author={Tyagi, Shivam},
  year={2025},
  doi={10.13140/RG.2.2.25477.82407}
}
```
License
MIT — see LICENSE for details.