A scalable memory system for AI agents using graph-based sharding and hierarchical clustering.

Project description

Lazzaro

Scalable Memory System Library for AI Agents

Lazzaro is a Python library that provides AI agents with long-term, scalable, and structured memory. Moving beyond simple vector databases, Lazzaro implements a graph-based memory architecture featuring semantic sharding, hierarchical clustering, and biologically inspired decay. It simulates human memory by maintaining an active context buffer, consolidating interactions into persistent structures, and evolving a multi-domain user profile.

Installation

Install the core library:

pip install lazzaro

Optional Dependencies

Enable specific providers or features:

Google Gemini: pip install google-generativeai
Together AI: pip install together
LangChain: pip install langchain-core
Autogen: pip install pyautogen
Visualization: pip install matplotlib plotly

Core Architecture

Lazzaro manages memory through a multi-layered graph system.

1. Memory Shards (Topic-Based Isolation)

Unlike traditional database sharding by ID, Lazzaro shards memories semantically. Each MemoryShard acts as an independent subgraph containing nodes and edges related to a specific topic (e.g., "coding", "personal health", "travel").

Shard Inference: When new facts are extracted, Lazzaro uses an LLM to categorize them into existing or new shards.
Retrieval Heuristic: To maintain low latency, Lazzaro prioritizes recently accessed shards and those with higher node density for initial search.
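
The retrieval heuristic above can be sketched as a simple scoring function. This is a minimal illustration, not Lazzaro's implementation: the `ShardStats` record, the equal weighting of recency and density, and the one-hour half-life are all assumptions, and "node density" is read here as a shard's node count relative to the largest shard.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard bookkeeping; not Lazzaro's actual class."""
    topic: str
    node_count: int
    last_access: float  # Unix timestamp of the most recent access


def rank_shards(shards, now, half_life_s=3600.0, top_k=2):
    """Return the top_k shards, scored by recency and relative node count."""
    max_nodes = max((s.node_count for s in shards), default=0) or 1

    def score(shard):
        age = max(0.0, now - shard.last_access)
        recency = 0.5 ** (age / half_life_s)  # 1.0 when just accessed, halves every half_life_s
        density = shard.node_count / max_nodes
        return 0.5 * recency + 0.5 * density  # equal weighting is an assumption

    return sorted(shards, key=score, reverse=True)[:top_k]


now = 10_000.0
shards = [
    ShardStats("coding", node_count=40, last_access=now - 60),
    ShardStats("travel", node_count=5, last_access=now - 86_400),
    ShardStats("personal health", node_count=20, last_access=now - 7_200),
]
selected = [s.topic for s in rank_shards(shards, now)]
print(selected)
```

Only the selected shards would then be searched in the first pass; the remaining shards could serve as a fallback if nothing relevant is found.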

2. Vector Storage (LanceDB Persistence)

Lazzaro utilizes LanceDB as its primary persistence layer and high-performance vector engine.

Full State Retention: Rather than storing only vectors, Lazzaro persists the entire memory graph in LanceDB, including nodes, edges, and the evolved user profile.
Fast Retrieval: Sub-millisecond vector search across thousands of nodes.
Scalable Architecture: Optimized on-disk storage that enables multi-process synchronization and reliable data integrity.
Automatic Sync: LanceDB is automatically synchronized during all graph operations, ensuring no data loss between sessions.

3. The Buffer Graph

The BufferGraph manages the global state of all shards and super-nodes. It handles:

Node Integrity: Maintaining content, embeddings, salience, and access metrics.
Edge Weighting: Tracking the strength of associations between memories.
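
To make the two responsibilities concrete, here is a toy data model for nodes and weighted edges. It is a sketch under assumptions: `MemoryNode`, `ToyBufferGraph`, their field names, and the rule that repeated links accumulate weight are illustrative and do not describe the real `BufferGraph` API.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Illustrative node record; field names are assumptions."""
    node_id: str
    content: str
    embedding: list
    salience: float = 0.5
    access_count: int = 0


@dataclass
class ToyBufferGraph:
    """Hypothetical stand-in for a graph of memory nodes and weighted edges."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # (id_a, id_b) with sorted ids -> weight

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def link(self, a, b, weight):
        """Create or strengthen an undirected edge between two existing nodes."""
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        key = tuple(sorted((a, b)))
        self.edges[key] = self.edges.get(key, 0.0) + weight

    def neighbors(self, node_id):
        """Return {neighbor_id: weight} for every edge touching node_id."""
        out = {}
        for (a, b), weight in self.edges.items():
            if a == node_id:
                out[b] = weight
            elif b == node_id:
                out[a] = weight
        return out


graph = ToyBufferGraph()
graph.add_node(MemoryNode("n1", "User is building a Rust project", [1.0, 0.0]))
graph.add_node(MemoryNode("n2", "User prefers async-std", [0.9, 0.1]))
graph.link("n2", "n1", 0.4)
graph.link("n1", "n2", 0.3)  # same pair, so the weight accumulates on one edge
print(graph.neighbors("n1"))
```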

4. Hierarchical Clustering (Super-Nodes)

When a shard grows beyond a configurable threshold, Lazzaro creates "Super-Nodes". These are synthetic nodes that represent the aggregate content of a cluster.

Accelerated Search: Retrieval begins at the super-node level to quickly narrow down relevant subgraphs.
Abstract Reasoning: Super-nodes allow agents to access high-level summaries of broad topics without loading every individual memory.
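
The two-stage lookup described above might look like the following sketch. It assumes a super-node can be represented by the centroid of its members' embeddings; the text only says super-nodes represent aggregate content, so the centroid, the toy 2-D vectors, and all function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy clusters: cluster name -> {memory id: embedding}
clusters = {
    "rust": {"m1": [1.0, 0.0], "m2": [0.9, 0.1]},
    "travel": {"m3": [0.0, 1.0], "m4": [0.1, 0.9]},
}
# One synthetic super-node per cluster, here just the centroid embedding.
super_nodes = {name: centroid(list(members.values())) for name, members in clusters.items()}


def hierarchical_search(query, top_k=1):
    """Stage 1: pick the closest super-node. Stage 2: rank only that cluster's members."""
    best = max(super_nodes, key=lambda name: cosine(query, super_nodes[name]))
    members = clusters[best]
    ranked = sorted(members, key=lambda m: cosine(query, members[m]), reverse=True)
    return best, ranked[:top_k]


result = hierarchical_search([0.2, 0.8])
print(result)
```

With this layout, the number of similarity comparisons per query is the number of super-nodes plus the size of one cluster, rather than the size of the whole shard.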

The Memory Lifecycle

Stage 1: Short-Term Buffer

Every interaction (user message and assistant response) is initially cached in a short-term episodic buffer. This provides immediate context for the current conversation.
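
A short-term buffer of this kind can be approximated with a bounded queue. The sketch below is illustrative only: the `EpisodicBuffer` class, its `max_turns` bound, and the `drain` step are assumptions rather than Lazzaro's actual structures.

```python
from collections import deque


class EpisodicBuffer:
    """Hypothetical bounded buffer of (user, assistant) turns."""

    def __init__(self, max_turns=20):
        self._turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add(self, user, assistant):
        self._turns.append({"user": user, "assistant": assistant})

    def recent(self, n):
        """Return up to n of the latest turns, oldest first."""
        if n <= 0:
            return []
        return list(self._turns)[-n:]

    def drain(self):
        """Hand all buffered turns to consolidation and empty the buffer."""
        turns = list(self._turns)
        self._turns.clear()
        return turns


buffer = EpisodicBuffer(max_turns=2)
buffer.add("I'm working on a Rust project.", "Noted.")
buffer.add("I prefer async-std.", "Got it.")
buffer.add("What did I say about runtimes?", "You prefer async-std.")
print(buffer.recent(2))
```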

Stage 2: Asynchronous Consolidation

Lazzaro runs a multi-stage background process to move buffer data into long-term storage:

Atomic Fact Extraction: An LLM extracts discrete facts from the conversation stream.
Deduplication: New facts are compared against the entire memory base using LanceDB vector search. If a near-identical match (similarity > 0.95) is found, the existing node's salience and access count are boosted.
Graph Linking: New nodes are linked to each other (episodic link) and to semantically related existing nodes (associative link).
Profile Update: Relevant facts are used to refine the multi-domain User Profile.
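
The deduplication step can be sketched as follows. This is a simplified illustration: it uses a brute-force cosine comparison over an in-memory list in place of LanceDB vector search, and the node fields, the initial salience of 0.5, and the boost of 0.1 are assumptions. Only the 0.95 similarity threshold comes from the description above.

```python
import math

SIMILARITY_THRESHOLD = 0.95  # near-identical match threshold from the text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def upsert_fact(store, content, embedding, boost=0.1):
    """Insert a new fact, or boost the closest node if it is a near-duplicate.

    Returns (node, created) where created is False when an existing node was boosted.
    """
    best = max(store, key=lambda n: cosine(n["embedding"], embedding), default=None)
    if best is not None and cosine(best["embedding"], embedding) > SIMILARITY_THRESHOLD:
        best["salience"] = min(1.0, best["salience"] + boost)
        best["access_count"] += 1
        return best, False
    node = {"content": content, "embedding": embedding, "salience": 0.5, "access_count": 0}
    store.append(node)
    return node, True


store = []
upsert_fact(store, "User prefers async-std", [1.0, 0.0])
duplicate, created = upsert_fact(store, "User likes async-std", [0.99, 0.05])
upsert_fact(store, "User visited Lisbon", [0.0, 1.0])
print(len(store), created, duplicate["access_count"])
```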

Stage 3: Temporal Decay and Pruning

Lazzaro prevents memory bloat through biologically inspired pruning:

Sigmoidal Decay: Node salience and edge weights decrease over time. The decay follows a non-linear curve that flattens at 0.2, ensuring important memories persist longer while weak associations fade.
Weak Edge Pruning: Edges with weights falling below a threshold (default 0.5) are automatically removed.
Buffer Enforcement: If the total node count exceeds max_buffer_size, the system archives the least salient nodes to maintain performance.
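
The three mechanisms above can be sketched together. The exact decay curve is not specified, so this example takes one possible reading: a logistic retention factor that starts near 1.0 and flattens toward a floor of 0.2. The `midpoint` and `steepness` parameters and all function names are assumptions; the 0.5 edge threshold and `max_buffer_size` come from the text.

```python
import math


def decay_factor(age_days, floor=0.2, midpoint=7.0, steepness=1.0):
    """Logistic retention multiplier: close to 1.0 for fresh items, flattening at `floor`."""
    x = min(steepness * (age_days - midpoint), 700.0)  # clamp to avoid overflow in exp
    return floor + (1.0 - floor) / (1.0 + math.exp(x))


def prune_edges(edges, threshold=0.5):
    """Keep only edges whose weight is at or above the threshold."""
    return {pair: w for pair, w in edges.items() if w >= threshold}


def enforce_buffer(salience, max_buffer_size):
    """Split node ids into (kept, archived), keeping the most salient nodes."""
    ranked = sorted(salience, key=salience.get, reverse=True)
    return ranked[:max_buffer_size], ranked[max_buffer_size:]


edges = {("n1", "n2"): 0.9, ("n1", "n3"): 0.6}
ages = {("n1", "n2"): 1.0, ("n1", "n3"): 30.0}
decayed = {pair: w * decay_factor(ages[pair]) for pair, w in edges.items()}
remaining = prune_edges(decayed)
kept, archived = enforce_buffer({"n1": 0.9, "n2": 0.1, "n3": 0.5}, max_buffer_size=2)
print(remaining, kept, archived)
```

In this run the 30-day-old edge decays to roughly a fifth of its weight and falls below the threshold, while the one-day-old edge is almost unchanged.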

User Profile Evolution

Lazzaro maintains a structured Profile across five key domains:

Preferences: Specific likes, dislikes, and technical choices.
Personality Traits: The user's observed demeanor and values.
Knowledge Domains: Areas where the user exhibits expertise or deep interest.
Interaction Style: How the user prefers to communicate (e.g., concise, formal, technical).
Key Experiences: Significant life events or project milestones.

Updates occur during consolidation, where an LLM synthesizes new interactions into existing profile fields.
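
As a rough picture of the data involved, the five domains could be held in a structure like the one below. This is a sketch only: the `ProfileSketch` class and `merge_facts` helper are hypothetical, and a plain append-if-new merge stands in for the LLM synthesis step described above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ProfileSketch:
    """Hypothetical container mirroring the five profile domains."""
    preferences: list = field(default_factory=list)
    personality_traits: list = field(default_factory=list)
    knowledge_domains: list = field(default_factory=list)
    interaction_style: list = field(default_factory=list)
    key_experiences: list = field(default_factory=list)


def merge_facts(profile, updates):
    """Append new facts per domain, skipping exact duplicates.

    Raises AttributeError if `updates` names a domain the profile does not have.
    """
    for domain, facts in updates.items():
        current = getattr(profile, domain)
        for fact in facts:
            if fact not in current:
                current.append(fact)
    return profile


profile = ProfileSketch()
merge_facts(profile, {"preferences": ["async-std"], "knowledge_domains": ["Rust"]})
merge_facts(profile, {"preferences": ["async-std", "concise answers"]})
print(asdict(profile))
```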

Retrieval Engine

Retrieval is optimized for both speed and relevance:

Shard Selection: Only the most relevant shards are searched based on the query.
Hybrid Search: Combines LanceDB vector search for semantic relevance with hierarchical pathing and recency weighting.
Associative Boosting: When a node is retrieved, its immediate neighbors in the graph receive a temporary "accessibility boost," pulling related memories into the current context.
Query Caching: Frequent queries are cached to minimize LLM and embedding overhead.
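
Associative boosting from the list above can be illustrated with a small function. The boost formula (a fixed fraction of the edge weight added to each neighbor's score) and the names used here are assumptions made for the sake of the example, not the library's scoring rule.

```python
def retrieve_with_boost(scores, edges, top_k=1, boost=0.2):
    """Pick the top_k hits, then raise the scores of their graph neighbors.

    scores: node_id -> relevance score for the current query
    edges:  (node_a, node_b) -> association weight (undirected)
    Returns (hits, boosted_scores).
    """
    hits = sorted(scores, key=scores.get, reverse=True)[:top_k]
    boosted = dict(scores)
    for (a, b), weight in edges.items():
        for source, target in ((a, b), (b, a)):
            if source in hits and target not in hits:
                boosted[target] = boosted.get(target, 0.0) + boost * weight
    return hits, boosted


scores = {"rust-project": 0.9, "async-std": 0.3, "paris-trip": 0.4}
edges = {("rust-project", "async-std"): 0.8}
hits, boosted = retrieve_with_boost(scores, edges)
context_order = sorted(boosted, key=boosted.get, reverse=True)
print(hits, context_order)
```

Here the memory about async-std scores below the unrelated travel memory on similarity alone, but its link to the retrieved Rust node lifts it ahead when the context is assembled.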

Usage

Provider Configuration

from lazzaro.core.memory_system import MemorySystem
from lazzaro.core.providers import GeminiLLM, GeminiEmbedder

# Initialize providers
llm = GeminiLLM(api_key="API_KEY", model="gemini-1.5-flash")
embedder = GeminiEmbedder(api_key="API_KEY")

# Initialize Memory System
ms = MemorySystem(
    llm_provider=llm, 
    embedding_provider=embedder,
    enable_sharding=True,
    enable_hierarchy=True,
    max_buffer_size=100
)

# Chat with built-in memory retrieval
ms.start_conversation()
response = ms.chat("I'm working on a Rust project and I prefer using async-std.")
print(response)

# Finalize and trigger background consolidation
print(ms.end_conversation())

Visual Dashboard

Lazzaro includes a web-based dashboard for exploring memory interactively. Start it with:

lazzaro-dashboard

The dashboard will be available at http://localhost:5299 and features:

Live Force-Graph: Interactive visualization of your memory shards and node relationships.
Real-time Metrics: Monitor LLM calls, embedding costs, and retrieval latency.
Profile Explorer: View your evolved user persona domains in a sleek side drawer.

Integrations

LangChain

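from lazzaro.integrations import LazzaroLangChainMemory
from langchain.chains import ConversationChain

# `ms` is the MemorySystem created under Provider Configuration;
# `chat_model` is any LangChain chat model you have already instantiated.
memory = LazzaroLangChainMemory(memory_system=ms)
chain = ConversationChain(llm=chat_model, memory=memory)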

LangGraph

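from lazzaro.integrations import LazzaroLangGraph

# `ms` is the MemorySystem created under Provider Configuration;
# `builder` is assumed to be an existing LangGraph graph builder (e.g. a StateGraph).
lg = LazzaroLangGraph(ms)
builder.add_node("retrieve", lg.get_memory_node())
builder.add_node("record", lg.get_record_node())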

CLI Reference

Launch the interactive shell:

lazzaro-cli

Command Table

Command	Description
`/start`	Manual session initialization.
`/end`	Manual session termination and consolidation trigger.
`/stats`	Display node counts, shard density, and performance metrics.
`/profile`	View evolved user profile data.
`/memories [n]`	Inspect the `n` most recent memory nodes.
`/consolidate`	Force immediate graph-wide consolidation.
`/merge`	Manually trigger semantic deduplication of similar nodes.
`/prune [t]`	Remove edges with weights below threshold `t` (default: 0.5).
`/config`	View and modify runtime parameters.
`/save [file]`	Export current state to JSON.
`/load [file]`	Import state from JSON.

Parameter Reference

Parameter	Default	Description
`auto_consolidate`	`True`	Automatically consolidate (extract facts) every `consolidate_every` conversations.
`consolidate_every`	`3`	Number of conversations between automatic consolidations.
`max_buffer_size`	`10`	Total nodes allowed before archiving.
`enable_async`	`True`	Background thread processing for consolidation.
`enable_sharding`	`True`	Use topic-based subgraph isolation.
`prune_threshold`	`0.5`	Minimum weight to retain an edge.
`load_from_disk`	`True`	Automatically restore state from LanceDB on startup.
`db_dir`	`"db"`	Directory for LanceDB persistence.
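
The defaults in the table can be collected in one place, as in this sketch. The `ConfigSketch` dataclass and the `should_consolidate` helper are hypothetical and only illustrate how `auto_consolidate` and `consolidate_every` might interact; the actual parameters are passed to `MemorySystem`.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical container holding the documented defaults."""
    auto_consolidate: bool = True
    consolidate_every: int = 3
    max_buffer_size: int = 10
    enable_async: bool = True
    enable_sharding: bool = True
    prune_threshold: float = 0.5
    load_from_disk: bool = True
    db_dir: str = "db"


def should_consolidate(cfg, conversations_finished):
    """True when automatic consolidation is due after this many finished conversations."""
    return (
        cfg.auto_consolidate
        and conversations_finished > 0
        and conversations_finished % cfg.consolidate_every == 0
    )


cfg = ConfigSketch()
due = [n for n in range(1, 7) if should_consolidate(cfg, n)]
print(due)
```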

Persistence and Safety

LanceDB Native Persistence: Lazzaro maintains its entire state (Graph + Vector + Profile) within LanceDB tables inside the db/ directory.
Atomic Updates: Database operations are atomic, preventing state corruption during unexpected shutdowns.
Version Control: LanceDB's internal versioning allows for reliable multi-process access and synchronization.
JSON Export: Human-readable snapshots can be exported using the /save command or save_state() method for easy debugging and porting.
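
To show what a human-readable snapshot might involve, the sketch below round-trips a toy graph through JSON. The file layout and the `export_snapshot` / `import_snapshot` helpers are assumptions for illustration; they are not the format produced by /save or `save_state()`.

```python
import json
import os
import tempfile


def export_snapshot(path, nodes, edges):
    """Write nodes and edges to JSON; tuple edge keys become explicit records."""
    payload = {
        "nodes": nodes,
        "edges": [
            {"source": a, "target": b, "weight": w}
            for (a, b), w in sorted(edges.items())
        ],
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)


def import_snapshot(path):
    """Read a snapshot back into (nodes, edges) with tuple edge keys restored."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    edges = {(e["source"], e["target"]): e["weight"] for e in payload["edges"]}
    return payload["nodes"], edges


nodes = {"n1": {"content": "User prefers async-std", "salience": 0.6}}
edges = {("n1", "n2"): 0.7}
with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = os.path.join(tmp, "snapshot.json")
    export_snapshot(snapshot_path, nodes, edges)
    restored_nodes, restored_edges = import_snapshot(snapshot_path)
print(restored_nodes == nodes, restored_edges == edges)
```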

Development

Run tests:

pytest tests/

License

This project is licensed under the MIT License.

Project details

Release history

0.2.5.3: Dec 25, 2025
0.2.5.2: Dec 25, 2025
0.2.5.1: Dec 25, 2025
0.2.5 (this version): Dec 25, 2025
0.2.4: Dec 25, 2025
0.2.3: Dec 25, 2025
0.2.2: Dec 25, 2025
0.2.1: Dec 25, 2025
0.2.0: Dec 25, 2025
0.1.2.3: Dec 25, 2025
0.1.2.2: Dec 25, 2025
0.1.2.1: Dec 24, 2025
0.1.2: Dec 24, 2025
0.1.1: Dec 24, 2025
0.1.0: Dec 24, 2025

