A scalable memory system for AI agents using graph-based sharding and hierarchical clustering.

Lazzaro

Scalable Memory System Library

Lazzaro is a Python library designed to give AI agents long-term, scalable, and structured memory. Unlike simple vector databases, Lazzaro uses a Graph-based approach combined with Memory Sharding and Hierarchical Clustering to mimic how human memory works: storing active context in a buffer, consolidating short-term interactions into long-term structures, and forgetting irrelevant details over time.

Installation

pip install lazzaro

How It Works

Lazzaro operates on a few core principles to manage memory scalability and relevance:

1. Architecture

  • Sharding: Memories are automatically categorized into shards (e.g., work, personal, health) based on content. This allows the system to retrieve only relevant slices of memory, keeping searches fast.
  • Buffer Graph: Active memories live in a dynamic graph structure where nodes are facts/thoughts and edges are relationships (associations).
  • Persistence: State is automatically persisted to local disk (db/lazzaro.pkl) using fast binary serialization.
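The sharding idea above can be sketched in a few lines. This toy router and its shard names are purely illustrative, not Lazzaro's internals: Lazzaro categorizes by content (the README implies an LLM/embedding step), while this stand-in just matches keywords to show the shape of the data flow.

```python
# Route each memory into a topical shard so retrieval only scans a
# relevant slice. Shard names and the keyword router are illustrative.
from collections import defaultdict

SHARD_KEYWORDS = {
    "work": {"project", "deadline", "meeting"},
    "health": {"doctor", "sleep", "exercise"},
}

def route_to_shard(text: str) -> str:
    words = set(text.lower().split())
    for shard, keywords in SHARD_KEYWORDS.items():
        if words & keywords:
            return shard
    return "personal"  # fallback shard

shards = defaultdict(list)
for fact in ["Project deadline is Friday.", "Slept 8 hours."]:
    shards[route_to_shard(fact)].append(fact)

print(dict(shards))
```

The payoff is that a query about work only has to search the "work" shard, so search cost grows with shard size rather than total memory size.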

2. Memory Lifecycle

  1. Short-Term Memory (STM): Every user interaction is initially stored in a temporary list.
  2. Consolidation: When a conversation ends (or periodically), Lazzaro runs a background process to:
    • Extract atomic facts from the conversation using an LLM.
    • Embed these facts and insert them into the appropriate Shard.
    • Link new memories to existing related memories (Graph edges).
  3. Forgetting: The buffer enforces a strict size limit. Old, unused, or low-salience memories are "pruned" (archived or deleted) to keep the active graph lightweight.
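The lifecycle above can be illustrated with a toy buffer: interactions accumulate in short-term memory, a consolidation step moves them to long-term storage, and a size limit triggers pruning. Real Lazzaro extracts atomic facts with an LLM and links them into a graph; here consolidation just copies text, so treat this as a sketch of the structure, not the behavior.

```python
# Toy short-term / long-term lifecycle with a hard buffer limit.
MAX_BUFFER = 3

short_term: list[str] = []
long_term: list[str] = []

def remember(interaction: str) -> None:
    short_term.append(interaction)

def consolidate() -> None:
    # Stand-in for LLM fact extraction + embedding + graph linking.
    long_term.extend(short_term)
    short_term.clear()
    # Forgetting: keep only the newest MAX_BUFFER entries.
    del long_term[:-MAX_BUFFER]

for msg in ["a", "b", "c", "d"]:
    remember(msg)
consolidate()
print(long_term)  # → ['b', 'c', 'd']: the oldest entry was pruned
```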

3. Hierarchy & Super-Nodes

When a shard grows too large, Lazzaro automatically clusters related nodes under a Super-Node. This creates a hierarchical index, allowing retrieval to scan high-level topics first before diving into granular details, significantly improving retrieval performance at scale.
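The two-pass retrieval this enables can be sketched as follows. Word overlap here is a crude stand-in for the embedding similarity a real system would use, and the summaries are hand-written rather than clustered automatically; the point is only the scan-summaries-first, then-drill-down pattern.

```python
# Two-pass retrieval over a super-node hierarchy: score the cluster
# summaries first, then rank only the winning cluster's members.
def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

super_nodes = {
    "physics engine work": ["collision solver rewrite", "physics engine profiling"],
    "health routines": ["morning exercise plan", "sleep tracking notes"],
}

def retrieve(query: str) -> list[str]:
    # Pass 1: pick the best-matching super-node summary.
    best = max(super_nodes, key=lambda s: overlap(s, query))
    # Pass 2: rank only that cluster's members.
    return sorted(super_nodes[best], key=lambda m: overlap(m, query), reverse=True)

print(retrieve("profiling the physics engine"))
```

With N memories split into roughly sqrt(N) clusters, each query scans on the order of sqrt(N) summaries plus one cluster instead of all N nodes.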

Usage

CLI (Interactive Mode)

The easiest way to use Lazzaro is via the command-line interface.

lazzaro-cli

Common Commands:

  • /start: Begin a new conversation session.
  • /end: End the current session and trigger background consolidation.
  • /stats: View current graph size, cache hit rates, and retrieval latency.
  • /set <param> <value>: Update configuration (e.g., /set max_buffer_size 50).
  • /save <filename>: Export current state to a JSON file.

Python API

Integrate Lazzaro into your own applications:

from lazzaro import MemorySystem
import os

# Initialize the system
# It will automatically load previous state from db/lazzaro.pkl if it exists
ms = MemorySystem(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    enable_async=True,
    auto_consolidate=True
)

# 1. Start a session
ms.start_conversation()

# 2. Chat with memory context
# The system retrieves relevant memories and injects them into the context
# Use chat_stream to get a streaming response iterator
print("Assistant: ", end="", flush=True)
for token in ms.chat_stream("I'm working on the new physics engine today."):
    if token['type'] == 'token':
        print(token['content'], end="", flush=True)
print()

# 3. Add explicit memories (optional)
ms.add_to_short_term("Project deadline is next Friday.", memory_type="fact")

# 4. End session to trigger consolidation
# This extracts facts, updates the graph, and saves to disk
print(ms.end_conversation())

Framework Integration

Using LangChain

Lazzaro allows you to bring your own LLM backend. Here is how to use ChatOpenAI (or any other LangChain chat model) as the reasoning engine for Lazzaro.

from lazzaro import MemorySystem
from lazzaro.core.interfaces import LLMProvider
from langchain_openai import ChatOpenAI
from typing import List, Dict

class LangChainAdapter(LLMProvider):
    def __init__(self, model_name: str = "gpt-4"):
        self.model = ChatOpenAI(model=model_name, temperature=0.7)
    
    def completion(self, messages: List[Dict[str, str]], response_format: dict | None = None) -> str:
        # 1. Convert Lazzaro messages ({'role': '...', 'content': '...'}) 
        #    to LangChain format if necessary, or pass a simple prompt.
        #    For robust chat, we just use the last user message as the prompt here,
        #    but you could build a full ChatPromptTemplate.
        last_message = messages[-1]['content']
        
        # 2. Handle JSON enforcement if requested (Lazzaro uses this for extraction)
        if response_format and response_format.get("type") == "json_object":
             # In a real app, use .with_structured_output() or prompt engineering
             last_message += "\nIMPORTANT: Return valid JSON only."

        # 3. Invoke the LangChain model
        response = self.model.invoke(last_message)
        return response.content
    
    def completion_stream(self, messages: List[Dict[str, str]], response_format: dict | None = None):
        # Implement streaming if desired; yielding nothing keeps callers
        # that iterate over the result from crashing.
        yield from ()

# Initialize Lazzaro with your custom adapter
ms = MemorySystem(
    openai_api_key="...",  # Required for default EmbeddingProvider (unless replaced)
    llm_provider=LangChainAdapter(model_name="gpt-4-turbo"),
    # embedding_provider=MyEmbeddingAdapter()  # Optional: Replace embedder too
)

ms.start_conversation()
print(ms.chat("Hello! I'm using LangChain under the hood."))

Configuration

Lazzaro is highly configurable. You can adjust these settings during initialization or via the CLI.

  • auto_consolidate (default: True): Automatically extract facts and update the graph when conversations end.
  • consolidate_every (default: 3): Frequency of full consolidation runs, in number of conversations.
  • max_buffer_size (default: 10): Maximum number of active nodes in the graph before older ones are pruned.
  • enable_async (default: True): Run consolidation and embedding tasks in background threads for responsiveness.
  • enable_sharding (default: True): Organize memories into semantic topics (work, personal) or date-based shards.
  • enable_hierarchy (default: True): Create "Super-Nodes" to summarize large clusters of memories.
  • load_from_disk (default: True): Automatically reload the last saved state on initialization.
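Assuming these parameters map directly to MemorySystem keyword arguments, as the earlier usage examples suggest for enable_async and auto_consolidate, a fully explicit initialization would look like this (the defaults shown match the list above, so passing them is optional):

```python
import os
from lazzaro import MemorySystem

# Sketch only: parameter names are taken from the configuration list
# above and assumed to be accepted as keyword arguments.
ms = MemorySystem(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    auto_consolidate=True,
    consolidate_every=3,
    max_buffer_size=10,
    enable_async=True,
    enable_sharding=True,
    enable_hierarchy=True,
    load_from_disk=True,
)
```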
