A scalable memory system for AI agents using graph-based sharding and hierarchical clustering.

Project description

Lazzaro

Scalable Memory System Library

Lazzaro is a Python library designed to give AI agents long-term, scalable, and structured memory. Unlike simple vector databases, Lazzaro uses a Graph-based approach combined with Memory Sharding and Hierarchical Clustering to mimic how human memory works: storing active context in a buffer, consolidating short-term interactions into long-term structures, and forgetting irrelevant details over time.

Installation

pip install lazzaro

How It Works

Lazzaro operates on a few core principles to manage memory scalability and relevance:

1. Architecture

  • Sharding: Memories are automatically categorized into shards (e.g., work, personal, health) based on content. This allows the system to retrieve only relevant slices of memory, keeping searches fast.
  • Buffer Graph: Active memories live in a dynamic graph structure where nodes are facts/thoughts and edges are relationships (associations).
  • Persistence: State is automatically persisted to local disk (db/lazzaro.pkl) using fast binary serialization.
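The sharded buffer graph described above can be pictured with a minimal sketch. This is a toy illustration, not Lazzaro's actual implementation: the `BufferGraph`, `Shard`, and `MemoryNode` names and the integer node IDs are hypothetical, chosen only to show how facts land in topical shards and get linked by association edges.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """A single fact/thought in the buffer graph."""
    text: str
    edges: set = field(default_factory=set)  # IDs of associated nodes

@dataclass
class Shard:
    """A topical slice of memory (e.g. 'work', 'personal')."""
    name: str
    nodes: dict = field(default_factory=dict)  # node_id -> MemoryNode

class BufferGraph:
    """Toy sharded graph: store facts per shard, link related ones."""
    def __init__(self):
        self.shards = {}
        self._next_id = 0

    def add(self, shard_name, text, related=()):
        shard = self.shards.setdefault(shard_name, Shard(shard_name))
        node_id = self._next_id
        self._next_id += 1
        shard.nodes[node_id] = MemoryNode(text)
        for other in related:
            # Associations are undirected: link both ways
            shard.nodes[node_id].edges.add(other)
            shard.nodes[other].edges.add(node_id)
        return node_id

g = BufferGraph()
a = g.add("work", "Working on the physics engine")
b = g.add("work", "Deadline is Friday", related=[a])
```

Because retrieval only has to search the relevant shard, a query about work never touches the "personal" nodes at all.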

2. Memory Lifecycle

  1. Short-Term Memory (STM): Every user interaction is initially stored in a temporary list.
  2. Consolidation: When a conversation ends (or periodically), Lazzaro runs a background process to:
    • Extract atomic facts from the conversation using an LLM.
    • Embed these facts and insert them into the appropriate Shard.
    • Link new memories to existing related memories (Graph edges).
  3. Forgetting: A buffer limit enforces strict discipline. Old, unused, or low-salience memories are "pruned" (archived/deleted) to keep the active graph lightweight.
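The three lifecycle steps above can be sketched in a few lines. This is a deliberately crude stand-in, not the library's code: `extract_facts` replaces the LLM extraction step, `classify` replaces embedding-based shard routing, and the `MAX_BUFFER_SIZE` pruning keeps only the newest facts per shard.

```python
MAX_BUFFER_SIZE = 3  # prune beyond this many facts per shard

short_term = []   # raw interaction log (STM)
long_term = {}    # shard_name -> list of consolidated facts

def extract_facts(turns):
    # Stand-in for LLM-based atomic fact extraction: one fact per turn.
    return [t.strip() for t in turns if t.strip()]

def classify(fact):
    # Stand-in for content-based shard routing.
    return "work" if "project" in fact.lower() else "personal"

def consolidate():
    """Move STM into long-term shards, then prune the oldest entries."""
    for fact in extract_facts(short_term):
        long_term.setdefault(classify(fact), []).append(fact)
    short_term.clear()
    # Forgetting: keep each shard within the buffer limit.
    for shard, facts in long_term.items():
        del facts[:-MAX_BUFFER_SIZE]

short_term.extend([
    "The project deadline moved to Friday.",
    "I went hiking on Sunday.",
])
consolidate()
```

In the real system this runs in the background after `/end` (or periodically), and pruning considers salience and recency rather than simple insertion order.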

3. Hierarchy & Super-Nodes

When a shard grows too large, Lazzaro automatically clusters related nodes under a Super-Node. This creates a hierarchical index, allowing retrieval to scan high-level topics first before diving into granular details, significantly improving retrieval performance at scale.
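The two-level retrieval idea can be illustrated with a toy index. This is a sketch under stated assumptions, not Lazzaro's implementation: the real system clusters embeddings, whereas the hypothetical `build_super_nodes` below groups facts by their first word purely to show the shape of "scan topics first, then drill into one cluster".

```python
from collections import defaultdict

def build_super_nodes(facts, key=lambda f: f.split()[0].lower()):
    """Cluster facts under a coarse key (a stand-in for embedding clustering)."""
    clusters = defaultdict(list)
    for f in facts:
        clusters[key(f)].append(f)
    return dict(clusters)

def retrieve(clusters, query_key):
    # Scan the high-level index first; only the matching cluster is searched.
    return clusters.get(query_key, [])

facts = ["physics engine uses Verlet integration",
         "physics deadline is Friday",
         "vacation booked for June"]
index = build_super_nodes(facts)
```

With a hierarchy like this, a query touches one cluster's members instead of every node in the shard, which is what makes retrieval cheap as the graph grows.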

Usage

CLI (Interactive Mode)

The easiest way to use Lazzaro is via the command-line interface.

lazzaro-cli

Common Commands:

  • /start: Begin a new conversation session.
  • /end: End the current session and trigger background consolidation.
  • /stats: View current graph size, cache hit rates, and retrieval latency.
  • /set <param> <value>: Update configuration (e.g., /set max_buffer_size 50).
  • /save <filename>: Export current state to a JSON file.

Python API

Integrate Lazzaro into your own applications:

from lazzaro import MemorySystem
import os

# Initialize the system
# It will automatically load previous state from db/lazzaro.pkl if it exists
ms = MemorySystem(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    enable_async=True,
    auto_consolidate=True
)

# 1. Start a session
ms.start_conversation()

# 2. Chat with memory context
# The system retrieves relevant memories and injects them into the context
# Use chat_stream to get a streaming response iterator
print("Assistant: ", end="", flush=True)
for token in ms.chat_stream("I'm working on the new physics engine today."):
    if token['type'] == 'token':
        print(token['content'], end="", flush=True)
print()

# 3. Add explicit memories (optional)
ms.add_to_short_term("Project deadline is next Friday.", memory_type="fact")

# 4. End session to trigger consolidation
# This extracts facts, updates the graph, and saves to disk
print(ms.end_conversation())

Framework Integration

Using LangChain

Lazzaro allows you to bring your own LLM backend. Here is how to use ChatOpenAI (or any other LangChain chat model) as the reasoning engine for Lazzaro.

from lazzaro import MemorySystem
from lazzaro.core.interfaces import LLMProvider
from langchain_openai import ChatOpenAI
from typing import List, Dict

class LangChainAdapter(LLMProvider):
    def __init__(self, model_name: str = "gpt-4"):
        self.model = ChatOpenAI(model=model_name, temperature=0.7)
    
    def completion(self, messages: List[Dict[str, str]], response_format: Dict = None) -> str:
        # 1. Convert Lazzaro messages ({'role': '...', 'content': '...'}) 
        #    to LangChain format if necessary, or pass a simple prompt.
        #    For robust chat, we just use the last user message as the prompt here,
        #    but you could build a full ChatPromptTemplate.
        last_message = messages[-1]['content']
        
        # 2. Handle JSON enforcement if requested (Lazzaro uses this for extraction)
        if response_format and response_format.get("type") == "json_object":
             # In a real app, use .with_structured_output() or prompt engineering
             last_message += "\nIMPORTANT: Return valid JSON only."

        # 3. Invoke the LangChain model
        response = self.model.invoke(last_message)
        return response.content
    
    def completion_stream(self, messages: List[Dict[str, str]], response_format: Dict = None):
        # Stream tokens as they arrive from the LangChain model.
        # (Assumes Lazzaro consumes an iterator of token strings.)
        last_message = messages[-1]['content']
        for chunk in self.model.stream(last_message):
            yield chunk.content

# Initialize Lazzaro with your custom adapter
ms = MemorySystem(
    openai_api_key="...",  # Required for default EmbeddingProvider (unless replaced)
    llm_provider=LangChainAdapter(model_name="gpt-4-turbo"),
    # embedding_provider=MyEmbeddingAdapter()  # Optional: Replace embedder too
)

ms.start_conversation()
print(ms.chat("Hello! I'm using LangChain under the hood."))

Configuration

Lazzaro is highly configurable. You can adjust these settings during initialization or via the CLI.

  • auto_consolidate (default: True): Automatically extract facts and update the graph after conversations end.
  • consolidate_every (default: 3): Frequency of full consolidation runs, in number of conversations.
  • max_buffer_size (default: 10): Maximum number of active nodes in the graph before older ones are pruned.
  • enable_async (default: True): Run consolidation and embedding tasks in background threads for responsiveness.
  • enable_sharding (default: True): Organize memories into semantic topics (work, personal) or date-based shards.
  • enable_hierarchy (default: True): Create "Super-Nodes" to summarize large clusters of memories.
  • load_from_disk (default: True): Automatically reload the last saved state on initialization.
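All of the settings above are keyword arguments to `MemorySystem`, so an explicit initialization with the documented defaults looks like this (the parameter names come from the table; the `OPENAI_API_KEY` environment variable is assumed to be set):

```python
from lazzaro import MemorySystem
import os

ms = MemorySystem(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    auto_consolidate=True,   # extract facts and update the graph automatically
    consolidate_every=3,     # full consolidation every 3 conversations
    max_buffer_size=10,      # prune the active graph beyond 10 nodes
    enable_async=True,       # consolidate/embed in background threads
    enable_sharding=True,    # route memories into topical shards
    enable_hierarchy=True,   # build Super-Nodes over large clusters
    load_from_disk=True,     # reload db/lazzaro.pkl if present
)
```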

Download files


Source Distribution

lazzaro-0.1.2.tar.gz (24.1 kB)

Uploaded Source

Built Distribution


lazzaro-0.1.2-py3-none-any.whl (23.1 kB)

Uploaded Python 3

File details

Details for the file lazzaro-0.1.2.tar.gz.

File metadata

  • Download URL: lazzaro-0.1.2.tar.gz
  • Size: 24.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for lazzaro-0.1.2.tar.gz
Algorithm Hash digest
SHA256 4bee159bf264ad627b93e56ba26c7e3fb508df1dec60bee40ae5dfecb1b22b69
MD5 d6aac2bf912090350ed75bf47cac0321
BLAKE2b-256 570db84aadd96901ab93d5eb6be320f3dc08d10f4390995da2eeed1c8bb85b23


File details

Details for the file lazzaro-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: lazzaro-0.1.2-py3-none-any.whl
  • Size: 23.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for lazzaro-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 524cbbcbe42da7c41546f2447cb07ef332e9ef653e330cb0e7737e57e04f5960
MD5 59a2df3a30323b0864e9c4a046799c28
BLAKE2b-256 f3ed6d6612723fcc0569a5f394e5f8814b3a5118bc9c750bf51619565b17a6c7

