A scalable memory system for AI agents using graph-based sharding and hierarchical clustering.
Lazzaro
Scalable Memory System Library for AI Agents
Lazzaro is a Python library that provides AI agents with long-term, scalable, and structured memory. Moving beyond simple vector databases, Lazzaro implements a graph-based memory architecture featuring semantic sharding, hierarchical clustering, and biologically inspired decay. It simulates human memory by maintaining an active context buffer, consolidating interactions into persistent structures, and evolving a multi-domain user profile.
Installation
Install the core library:
pip install lazzaro
Optional Dependencies
Enable specific providers or features:
- Google Gemini: pip install google-generativeai
- Together AI: pip install together
- LangChain: pip install langchain-core
- Autogen: pip install pyautogen
- Visualization: pip install matplotlib plotly
Core Architecture
Lazzaro manages memory through a multi-layered graph system.
1. Memory Shards (Topic-Based Isolation)
Unlike traditional database sharding by ID, Lazzaro shards memories semantically. Each MemoryShard acts as an independent subgraph containing nodes and edges related to a specific topic (e.g., "coding", "personal health", "travel").
- Shard Inference: When new facts are extracted, Lazzaro uses an LLM to categorize them into existing or new shards.
- Retrieval Heuristic: To maintain low latency, Lazzaro prioritizes recently accessed shards and those with higher node density for initial search.
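The retrieval heuristic above can be illustrated with a small ranking function. This is a minimal sketch, not Lazzaro's implementation: `ShardStats`, `rank_shards`, the half-life, and the 0.7/0.3 weighting are all hypothetical, and "node density" is approximated here by a shard's node count relative to the largest shard.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard bookkeeping; not Lazzaro's actual MemoryShard."""
    topic: str
    node_count: int
    last_access: float  # timestamp in seconds


def rank_shards(shards, now, top_k=2, half_life=3600.0, recency_weight=0.7):
    """Return the top_k shards by a blended recency and density score."""
    if not shards:
        return []
    max_nodes = max(s.node_count for s in shards) or 1

    def score(shard):
        age = max(0.0, now - shard.last_access)
        recency = 0.5 ** (age / half_life)      # 1.0 when just accessed, halves every half_life
        density = shard.node_count / max_nodes  # 1.0 for the largest shard
        return recency_weight * recency + (1.0 - recency_weight) * density

    return sorted(shards, key=score, reverse=True)[:top_k]


shards = [
    ShardStats("coding", node_count=120, last_access=1000.0),
    ShardStats("travel", node_count=15, last_access=4500.0),
    ShardStats("personal health", node_count=40, last_access=100.0),
]
ranked = rank_shards(shards, now=4600.0)
print([s.topic for s in ranked])
```

With these made-up numbers, the recently touched "travel" shard outranks the larger but older "coding" shard, which shows how the recency weight trades off against size.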
2. Vector Storage (LanceDB Persistence)
Lazzaro utilizes LanceDB as its primary persistence layer and high-performance vector engine.
- Full State Retention: Unlike basic vector stores, LanceDB persists the entire memory graph, including nodes, edges, and the evolved user profile.
- Fast Retrieval: Sub-millisecond vector search across thousands of nodes.
- Scalable Architecture: Optimized on-disk storage that enables multi-process synchronization and reliable data integrity.
- Automatic Sync: LanceDB is automatically synchronized during all graph operations, ensuring no data loss between sessions.
3. The Buffer Graph
The BufferGraph manages the global state of all shards and super-nodes. It handles:
- Node Integrity: Maintaining content, embeddings, salience, and access metrics.
- Edge Weighting: Tracking the strength of associations between memories.
4. Hierarchical Clustering (Super-Nodes)
When a shard grows beyond a configurable threshold, Lazzaro creates "Super-Nodes". These are synthetic nodes that represent the aggregate content of a cluster.
- Accelerated Search: Retrieval begins at the super-node level to quickly narrow down relevant subgraphs.
- Abstract Reasoning: Super-nodes allow agents to access high-level summaries of broad topics without loading every individual memory.
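To make the two-stage lookup concrete, here is a minimal sketch under stated assumptions: each super-node is represented by the centroid of its members' embeddings (the library may instead build an LLM summary), and `build_super_nodes` and `hierarchical_search` are hypothetical names, not part of Lazzaro's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_super_nodes(clusters):
    """Map each cluster name to the centroid of its members' embeddings."""
    return {name: centroid([emb for _, emb in members])
            for name, members in clusters.items()}


def hierarchical_search(query, clusters, super_nodes, top_k=1):
    # Stage 1: choose the cluster whose super-node is closest to the query.
    best = max(super_nodes, key=lambda name: cosine(query, super_nodes[name]))
    # Stage 2: rank only the members of that cluster.
    members = sorted(clusters[best], key=lambda m: cosine(query, m[1]), reverse=True)
    return best, [text for text, _ in members[:top_k]]


clusters = {
    "rust": [("prefers async-std", [0.9, 0.1, 0.0]),
             ("uses cargo workspaces", [0.8, 0.2, 0.1])],
    "travel": [("visited Lisbon", [0.0, 0.1, 0.9]),
               ("likes night trains", [0.1, 0.2, 0.8])],
}
super_nodes = build_super_nodes(clusters)
print(hierarchical_search([1.0, 0.0, 0.0], clusters, super_nodes))
```

Only one cluster's members are compared against the query in stage 2, which is where the search saving comes from as clusters grow.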
The Memory Lifecycle
Stage 1: Short-Term Buffer
Every interaction (user message and assistant response) is initially cached in a short-term episodic buffer. This provides immediate context for the current conversation.
Stage 2: Asynchronous Consolidation
Lazzaro runs a multi-stage background process to move buffer data into long-term storage:
- Atomic Fact Extraction: An LLM extracts discrete facts from the conversation stream.
- Deduplication: New facts are compared against the entire memory base using LanceDB vector search. If a near-identical match (similarity > 0.95) is found, the existing node's salience and access count are boosted.
- Graph Linking: New nodes are linked to each other (episodic link) and to semantically related existing nodes (associative link).
- Profile Update: Relevant facts are used to refine the multi-domain User Profile.
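The deduplication step can be sketched as follows. This is an illustration only: a linear scan with cosine similarity stands in for the LanceDB vector search, the node dictionary layout and the `boost` amount are hypothetical, and only the 0.95 threshold comes from the description above.

```python
import math

DEDUP_THRESHOLD = 0.95  # similarity threshold quoted in the text above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate_fact(nodes, text, embedding, boost=0.1):
    """Boost a near-duplicate node if one exists, otherwise append a new node."""
    best, best_sim = None, 0.0
    for node in nodes:
        sim = cosine(node["embedding"], embedding)
        if sim > best_sim:
            best, best_sim = node, sim
    if best is not None and best_sim > DEDUP_THRESHOLD:
        best["salience"] = min(1.0, best["salience"] + boost)  # cap salience at 1.0
        best["access_count"] += 1
        return best
    node = {"text": text, "embedding": embedding, "salience": 0.5, "access_count": 0}
    nodes.append(node)
    return node


nodes = []
consolidate_fact(nodes, "User prefers async-std", [1.0, 0.0])
consolidate_fact(nodes, "User likes async-std", [0.99, 0.05])  # near-duplicate, merged
consolidate_fact(nodes, "User lives in Lisbon", [0.0, 1.0])    # unrelated, stored as new
print(len(nodes), nodes[0]["access_count"])
```

The second fact never becomes its own node; it only strengthens the first one, which keeps the graph from filling up with paraphrases.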
Stage 3: Temporal Decay and Pruning
Lazzaro prevents memory bloat through biologically inspired pruning:
- Sigmoidal Decay: Node salience and edge weights decrease over time. The decay follows a non-linear curve that flattens at 0.2, ensuring important memories persist longer while weak associations fade.
- Weak Edge Pruning: Edges with weights falling below a threshold (default 0.5) are automatically removed.
- Buffer Enforcement: If the total node count exceeds max_buffer_size, the system archives the least salient nodes to maintain performance.
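The three mechanisms above can be sketched together. The exact decay formula is not given in this document, so the logistic-shaped curve below is an assumption chosen only because it starts at the original value and flattens at 0.2; `decayed`, `prune_edges`, `enforce_buffer`, and the `rate` parameter are hypothetical names.

```python
import math

DECAY_FLOOR = 0.2      # value at which the curve flattens, per the text above
PRUNE_THRESHOLD = 0.5  # default edge threshold, per the text above


def decayed(value, age, rate=1.0):
    """Logistic-shaped decay from `value` toward DECAY_FLOOR as `age` grows."""
    if value <= DECAY_FLOOR:
        return value
    exponent = min(rate * age, 700.0)           # clamp to avoid float overflow in exp
    factor = 2.0 / (1.0 + math.exp(exponent))   # 1.0 at age 0, approaches 0 for large age
    return DECAY_FLOOR + (value - DECAY_FLOOR) * factor


def prune_edges(edges, threshold=PRUNE_THRESHOLD):
    """Keep only edges whose weight is at or above the threshold."""
    return {pair: weight for pair, weight in edges.items() if weight >= threshold}


def enforce_buffer(salience, max_buffer_size):
    """Split node ids into (kept, archived), keeping the most salient nodes."""
    ranked = sorted(salience, key=salience.get, reverse=True)
    return ranked[:max_buffer_size], ranked[max_buffer_size:]


edges = {("a", "b"): 0.9, ("b", "c"): 0.55}
aged = {pair: decayed(weight, age=1.0) for pair, weight in edges.items()}
print(prune_edges(aged))
print(enforce_buffer({"a": 0.9, "b": 0.3, "c": 0.6}, max_buffer_size=2))
```

In this sketch the strong edge survives one unit of aging while the weaker one drops below 0.5 and is pruned, and the least salient node is the one archived when the buffer is over capacity.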
User Profile Evolution
Lazzaro maintains a structured Profile across five key domains:
- Preferences: Specific likes, dislikes, and technical choices.
- Personality Traits: The user's observed demeanor and values.
- Knowledge Domains: Areas where the user exhibits expertise or deep interest.
- Interaction Style: How the user prefers to communicate (e.g., concise, formal, technical).
- Key Experiences: Significant life events or project milestones.
Updates occur during consolidation, where an LLM synthesizes new interactions into existing profile fields.
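As a rough picture of the data involved, the five domains could be held in a structure like the one below. This is a sketch only: `UserProfile` and its `merge` method are hypothetical, and the merge here simply appends already-extracted entries, whereas the library is described as using an LLM to synthesize updates.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class UserProfile:
    """Hypothetical container mirroring the five domains listed above."""
    preferences: list[str] = field(default_factory=list)
    personality_traits: list[str] = field(default_factory=list)
    knowledge_domains: list[str] = field(default_factory=list)
    interaction_style: list[str] = field(default_factory=list)
    key_experiences: list[str] = field(default_factory=list)

    def merge(self, updates: dict[str, list[str]]) -> None:
        """Append new, non-duplicate entries per domain; ignore unknown domains."""
        for domain, items in updates.items():
            current = getattr(self, domain, None)
            if not isinstance(current, list):
                continue  # not one of the five list-valued domains
            for item in items:
                if item not in current:
                    current.append(item)


profile = UserProfile()
profile.merge({"preferences": ["async-std"], "knowledge_domains": ["Rust"]})
profile.merge({"preferences": ["async-std", "concise answers"], "unknown": ["x"]})
print(asdict(profile))
```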
Retrieval Engine
Retrieval is optimized for both speed and relevance:
- Shard Selection: Only the most relevant shards are searched based on the query.
- Hybrid Search: Combines LanceDB vector search for semantic relevance with hierarchical pathing and recency weighting.
- Associative Boosting: When a node is retrieved, its immediate neighbors in the graph receive a temporary "accessibility boost," pulling related memories into the current context.
- Query Caching: Frequent queries are cached to minimize LLM and embedding overhead.
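Two of the steps above, associative boosting and query caching, can be sketched in a few lines. Everything here is illustrative: the toy `embed` function stands in for a real embedding provider, `boost_neighbors` and the `boost` factor are hypothetical, and the standard library's `functools.lru_cache` is used as one possible cache, not necessarily the one Lazzaro uses.

```python
from functools import lru_cache

calls = {"embed": 0}  # counts how often the (pretend) embedding provider is hit


@lru_cache(maxsize=256)
def embed(query: str) -> tuple:
    """Toy embedding: counts of the letters 'a', 'r', 't' in the query."""
    calls["embed"] += 1
    return tuple(float(query.count(ch)) for ch in "art")


def boost_neighbors(retrieved, graph, boost=0.2):
    """Return temporary boosts for direct neighbours of the retrieved nodes."""
    boosts = {}
    for node in retrieved:
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor in retrieved:
                continue  # already in context, no boost needed
            # Stronger edges pass on a larger share of the boost.
            boosts[neighbor] = max(boosts.get(neighbor, 0.0), boost * weight)
    return boosts


graph = {
    "rust": {"async-std": 0.9, "cargo": 0.6},
    "travel": {"lisbon": 0.8},
}
embed("rust project")
embed("rust project")  # served from the cache, provider not called again
boosts = boost_neighbors({"rust"}, graph)
print(calls["embed"], sorted(boosts))
```

The boosts are returned as a separate mapping rather than written into the graph, which keeps them temporary, as the description above suggests.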
Usage
Provider Configuration
from lazzaro.core.memory_system import MemorySystem
from lazzaro.core.providers import GeminiLLM, GeminiEmbedder

# Initialize providers
llm = GeminiLLM(api_key="API_KEY", model="gemini-1.5-flash")
embedder = GeminiEmbedder(api_key="API_KEY")

# Initialize Memory System
ms = MemorySystem(
    llm_provider=llm,
    embedding_provider=embedder,
    enable_sharding=True,
    enable_hierarchy=True,
    max_buffer_size=100,
)

# Chat with built-in memory retrieval
ms.start_conversation()
response = ms.chat("I'm working on a Rust project and I prefer using async-std.")
print(response)

# Finalize and trigger background consolidation
print(ms.end_conversation())
Visual Dashboard
For a high-fidelity, interactive experience, Lazzaro includes a custom web-based dashboard:
lazzaro-dashboard
The dashboard will be available at http://localhost:5299 and features:
- Live Force-Graph: Interactive visualization of your memory shards and node relationships.
- Real-time Metrics: Monitor LLM calls, embedding costs, and retrieval latency.
- Profile Explorer: View your evolved user persona domains in a sleek side drawer.
Integrations
LangChain
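from lazzaro.integrations import LazzaroLangChainMemory
# ConversationChain is provided by the langchain package, not langchain-core
from langchain.chains import ConversationChain

# ms is the MemorySystem created above; chat_model is any LangChain chat model you have configured
memory = LazzaroLangChainMemory(memory_system=ms)
chain = ConversationChain(llm=chat_model, memory=memory)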
LangGraph
from lazzaro.integrations import LazzaroLangGraph

# ms is the MemorySystem created above; builder is your existing LangGraph graph builder
lg = LazzaroLangGraph(ms)
builder.add_node("retrieve", lg.get_memory_node())
builder.add_node("record", lg.get_record_node())
CLI Reference
Launch the interactive shell:
lazzaro-cli
Command Table
| Command | Description |
|---|---|
| /start | Manual session initialization. |
| /end | Manual session termination and consolidation trigger. |
| /stats | Display node counts, shard density, and performance metrics. |
| /profile | View evolved user profile data. |
| /memories [n] | Inspect the n most recent memory nodes. |
| /consolidate | Force immediate graph-wide consolidation. |
| /merge | Manually trigger semantic deduplication of similar nodes. |
| /prune [t] | Remove edges with weights below threshold t (default: 0.5). |
| /config | View and modify runtime parameters. |
| /save [file] | Export current state to JSON. |
| /load [file] | Import state from JSON. |
Parameter Reference
| Parameter | Default | Description |
|---|---|---|
| auto_consolidate | True | Extract facts after every N conversations. |
| consolidate_every | 3 | Conversation frequency for consolidation. |
| max_buffer_size | 10 | Total nodes allowed before archiving. |
| enable_async | True | Background thread processing for consolidation. |
| enable_sharding | True | Use topic-based subgraph isolation. |
| prune_threshold | 0.5 | Minimum weight to retain an edge. |
| load_from_disk | True | Automatically restore state from LanceDB on startup. |
| db_dir | "db" | Directory for LanceDB persistence. |
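One way to keep overrides honest is to validate them against the documented defaults before use. The helper below is a hypothetical sketch, not part of Lazzaro: `DEFAULTS` copies the table above, and `make_config` is an illustrative name.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "auto_consolidate": True,
    "consolidate_every": 3,
    "max_buffer_size": 10,
    "enable_async": True,
    "enable_sharding": True,
    "prune_threshold": 0.5,
    "load_from_disk": True,
    "db_dir": "db",
}


def make_config(**overrides):
    """Return the defaults with overrides applied; reject misspelled parameter names."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown parameter(s): {', '.join(unknown)}")
    return {**DEFAULTS, **overrides}


config = make_config(max_buffer_size=100, enable_async=False)
print(config["max_buffer_size"], config["consolidate_every"])
```

The usage example earlier passes enable_sharding and max_buffer_size as keyword arguments to MemorySystem; whether every parameter in the table is accepted the same way is an assumption to verify against the library before unpacking this dictionary into the constructor.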
Persistence and Safety
- LanceDB Native Persistence: Lazzaro maintains its entire state (Graph + Vector + Profile) within LanceDB tables inside the db/ directory.
- Atomic Updates: Database operations are atomic, preventing state corruption during unexpected shutdowns.
- Version Control: LanceDB's internal versioning allows for reliable multi-process access and synchronization.
- JSON Export: Human-readable snapshots can be exported using the /save command or the save_state() method for easy debugging and porting.
Development
Run tests:
pytest tests/
License
This project is licensed under the MIT License.