A scalable memory system for AI agents using graph-based sharding and hierarchical clustering.

Lazzaro

Scalable Memory System Library

Lazzaro is a Python library designed to give AI agents long-term, scalable, and structured memory. Unlike simple vector databases, Lazzaro uses a Graph-based approach combined with Memory Sharding and Hierarchical Clustering to mimic how human memory works: storing active context in a buffer, consolidating short-term interactions into long-term structures, and forgetting irrelevant details over time.

Installation

pip install lazzaro

How It Works

Lazzaro operates on a few core principles to manage memory scalability and relevance:

1. Architecture

  • Sharding: Memories are automatically categorized into shards (e.g., work, personal, health) based on content. This allows the system to retrieve only relevant slices of memory, keeping searches fast.
  • Buffer Graph: Active memories live in a dynamic graph structure where nodes are facts/thoughts and edges are relationships (associations).
  • Persistence: State is automatically persisted to local disk (db/lazzaro.pkl) using fast binary serialization.
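The three pieces above can be sketched with the standard library alone. This is a minimal illustration, not Lazzaro's implementation: the keyword lists, the `ShardedGraph` class, the "share a word of five or more letters" linking rule, and the saved-state layout are all assumptions made for the example. The real library categorizes and links memories by its own (undocumented here) means.

```python
import pickle
import re
from pathlib import Path

# Hypothetical keyword lists, used only to make the sketch deterministic.
SHARD_KEYWORDS = {
    "work": {"project", "deadline", "meeting", "engine"},
    "health": {"doctor", "sleep", "workout"},
}
DEFAULT_SHARD = "personal"


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_shard(text):
    """Return the first shard whose keywords overlap the text."""
    words = tokenize(text)
    for shard, keywords in SHARD_KEYWORDS.items():
        if words & keywords:
            return shard
    return DEFAULT_SHARD


class ShardedGraph:
    def __init__(self):
        # shard name -> {node id -> {"text": str, "edges": set of node ids}}
        self.shards = {}
        self.next_id = 0

    def add(self, text):
        """Store a memory in its shard and link it to related nodes there."""
        shard = pick_shard(text)
        nodes = self.shards.setdefault(shard, {})
        node_id = self.next_id
        self.next_id += 1
        new_words = tokenize(text)
        edges = set()
        for other_id, other in nodes.items():
            # Assumed association rule: a shared word of five or more letters.
            shared = new_words & tokenize(other["text"])
            if any(len(word) >= 5 for word in shared):
                edges.add(other_id)
                other["edges"].add(node_id)
        nodes[node_id] = {"text": text, "edges": edges}
        return shard, node_id

    def save(self, path):
        """Persist the graph state as a pickle of plain built-in types."""
        path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump({"shards": self.shards, "next_id": self.next_id}, fh)

    @classmethod
    def load(cls, path):
        """Rebuild a graph from a file written by save()."""
        graph = cls()
        with Path(path).open("rb") as fh:
            state = pickle.load(fh)
        graph.shards = state["shards"]
        graph.next_id = state["next_id"]
        return graph
```

Because each memory lands in exactly one shard, linking and lookup only touch that shard's nodes, which is the property the Sharding bullet describes.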

2. Memory Lifecycle

  1. Short-Term Memory (STM): Every user interaction is initially stored in a temporary list.
  2. Consolidation: When a conversation ends (or periodically), Lazzaro runs a background process to:
    • Extract atomic facts from the conversation using an LLM.
    • Embed these facts and insert them into the appropriate Shard.
    • Link new memories to existing related memories (Graph edges).
  3. Forgetting: The active graph is capped at a buffer limit (max_buffer_size, 10 by default). Once the cap is exceeded, old, unused, or low-salience memories are pruned (archived or deleted) to keep the active graph lightweight.
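The lifecycle above can be sketched as a small class. Everything here is an assumption made for illustration: `LifecycleSketch`, the `split_sentences` stand-in for the LLM extraction step, the salience counter, and the pruning order (lowest salience first, then least recently used) are not taken from Lazzaro's source. Embedding and graph linking are left out to keep the example short.

```python
def split_sentences(messages):
    """Stand-in for LLM fact extraction: one 'fact' per sentence."""
    return [s.strip() for m in messages for s in m.split(".") if s.strip()]


class LifecycleSketch:
    def __init__(self, extract_facts=split_sentences, max_buffer_size=10):
        self.extract_facts = extract_facts
        self.max_buffer_size = max_buffer_size
        self.short_term = []  # raw interactions awaiting consolidation
        self.graph = {}       # fact -> {"salience": float, "last_used": int}
        self.archive = []     # facts pruned from the active graph
        self.clock = 0        # logical time, advanced per consolidation

    def add_to_short_term(self, text):
        """Step 1: store the interaction in a temporary list."""
        self.short_term.append(text)

    def consolidate(self):
        """Step 2: turn short-term items into long-term facts, then prune."""
        self.clock += 1
        for fact in self.extract_facts(self.short_term):
            entry = self.graph.setdefault(
                fact, {"salience": 0.0, "last_used": self.clock}
            )
            entry["salience"] += 1.0  # repeated facts become more salient
            entry["last_used"] = self.clock
        self.short_term.clear()
        self.prune()

    def prune(self):
        """Step 3: archive the weakest facts once the buffer overflows."""
        overflow = len(self.graph) - self.max_buffer_size
        if overflow <= 0:
            return
        ranked = sorted(
            self.graph,
            key=lambda f: (self.graph[f]["salience"], self.graph[f]["last_used"]),
        )
        for fact in ranked[:overflow]:
            del self.graph[fact]
            self.archive.append(fact)
```

With a buffer of two, a fact that is mentioned once and never again is the first to be archived when a third fact arrives, while a fact repeated across conversations survives.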

3. Hierarchy & Super-Nodes

When a shard grows too large, Lazzaro automatically clusters related nodes under a Super-Node. This creates a hierarchical index: retrieval scans high-level topics first and descends into granular details only within the matching cluster, so a query is compared against fewer nodes as the graph grows.
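A two-level lookup of this kind can be sketched as follows. This is an illustrative approximation under stated assumptions: the greedy threshold clustering, the centroid used as the Super-Node's vector, the 0.8 threshold, and the hand-written three-dimensional "embeddings" are all hypothetical, and Lazzaro's actual clustering method may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_super_nodes(nodes, threshold=0.8):
    """Greedy clustering: a node joins the first super-node whose centroid
    is similar enough, otherwise it starts a new super-node."""
    super_nodes = []  # each: {"centroid": [...], "members": [(text, vec), ...]}
    for text, vec in nodes:
        for sn in super_nodes:
            if cosine(vec, sn["centroid"]) >= threshold:
                sn["members"].append((text, vec))
                sn["centroid"] = centroid([v for _, v in sn["members"]])
                break
        else:
            super_nodes.append({"centroid": list(vec), "members": [(text, vec)]})
    return super_nodes


def retrieve(query_vec, super_nodes, top_k=2):
    """Pick the closest super-node first, then rank only its members."""
    best = max(super_nodes, key=lambda sn: cosine(query_vec, sn["centroid"]))
    ranked = sorted(
        best["members"], key=lambda m: cosine(query_vec, m[1]), reverse=True
    )
    return [text for text, _ in ranked[:top_k]]


# Toy vectors standing in for real embeddings.
nodes = [
    ("physics engine uses Verlet", [1.0, 0.1, 0.0]),
    ("collision detection is slow", [0.9, 0.2, 0.0]),
    ("dentist on Monday", [0.0, 0.1, 1.0]),
    ("started running again", [0.1, 0.0, 0.9]),
]
super_nodes = build_super_nodes(nodes)
```

Here a query is scored against two centroids and then against the two members of the winning cluster, rather than against all four nodes; the saving grows with the number of nodes per shard.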

Usage

CLI (Interactive Mode)

The easiest way to use Lazzaro is via the command-line interface.

lazzaro-cli

Common Commands:

  • /start: Begin a new conversation session.
  • /end: End the current session and trigger background consolidation.
  • /stats: View current graph size, cache hit rates, and retrieval latency.
  • /set <param> <value>: Update configuration (e.g., /set max_buffer_size 50).
  • /save <filename>: Export current state to a JSON file.

Python API

Integrate Lazzaro into your own applications:

from lazzaro import MemorySystem
import os

# Initialize the system
# It will automatically load previous state from db/lazzaro.pkl if it exists
ms = MemorySystem(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    enable_async=True,
    auto_consolidate=True
)

# 1. Start a session
ms.start_conversation()

# 2. Chat with memory context
# The system retrieves relevant memories and injects them into the context
# Use chat_stream to get a streaming response iterator
print("Assistant: ", end="", flush=True)
for token in ms.chat_stream("I'm working on the new physics engine today."):
    if token['type'] == 'token':
        print(token['content'], end="", flush=True)
print()

# 3. Add explicit memories (optional)
ms.add_to_short_term("Project deadline is next Friday.", memory_type="fact")

# 4. End session to trigger consolidation
# This extracts facts, updates the graph, and saves to disk
print(ms.end_conversation())

Framework Integration

Using LangChain

Lazzaro allows you to bring your own LLM backend. Here is how to use ChatOpenAI (or any other LangChain chat model) as the reasoning engine for Lazzaro.

from lazzaro import MemorySystem
from lazzaro.core.interfaces import LLMProvider
from langchain_openai import ChatOpenAI
from typing import Dict, List, Optional

class LangChainAdapter(LLMProvider):
    def __init__(self, model_name: str = "gpt-4"):
        self.model = ChatOpenAI(model=model_name, temperature=0.7)

    def completion(self, messages: List[Dict[str, str]], response_format: Optional[Dict] = None) -> str:
        # 1. Lazzaro passes messages as {'role': '...', 'content': '...'} dicts.
        #    For simplicity, only the last message is used as the prompt here,
        #    so earlier messages are dropped. A fuller adapter would convert the
        #    whole list to LangChain format or build a ChatPromptTemplate.
        last_message = messages[-1]['content']

        # 2. Handle JSON enforcement if requested (Lazzaro uses this for extraction)
        if response_format and response_format.get("type") == "json_object":
            # In a real app, use .with_structured_output() or prompt engineering
            last_message += "\nIMPORTANT: Return valid JSON only."

        # 3. Invoke the LangChain model
        response = self.model.invoke(last_message)
        return response.content

    def completion_stream(self, messages: List[Dict[str, str]], response_format: Optional[Dict] = None):
        # Implement streaming if desired
        pass

# Initialize Lazzaro with your custom adapter
ms = MemorySystem(
    openai_api_key="...",  # Required for default EmbeddingProvider (unless replaced)
    llm_provider=LangChainAdapter(model_name="gpt-4-turbo"),
    # embedding_provider=MyEmbeddingAdapter()  # Optional: Replace embedder too
)

ms.start_conversation()
print(ms.chat("Hello! I'm using LangChain under the hood."))
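The adapter above forwards only the last message to the model. If you want the full conversation (including any retrieved memory context) to reach the model, one option is to convert the whole message list before invoking it. The helper below is a hypothetical sketch, not part of Lazzaro: `to_langchain_messages` and `JSON_HINT` are names invented for this example. It relies on LangChain chat models accepting a list of `(role, content)` tuples as input to `invoke`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction text used when JSON output is requested.
JSON_HINT = "Return valid JSON only."


def to_langchain_messages(
    messages: List[Dict[str, str]],
    response_format: Optional[Dict] = None,
) -> List[Tuple[str, str]]:
    """Convert {'role': ..., 'content': ...} dicts to (role, content) tuples.

    When a JSON object is requested, a system instruction is prepended.
    """
    converted = [(m["role"], m["content"]) for m in messages]
    if response_format and response_format.get("type") == "json_object":
        converted = [("system", JSON_HINT)] + converted
    return converted
```

Inside `completion`, the call would then become `self.model.invoke(to_langchain_messages(messages, response_format))`. Prompt-based JSON enforcement remains best-effort, so `.with_structured_output()` is still the more reliable route where the model supports it.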

Configuration

Lazzaro is highly configurable. You can adjust these settings during initialization or via the CLI.

Parameter          Default  Description
auto_consolidate   True     Automatically extract facts and update graph after every N conversations.
consolidate_every  3        Frequency of full consolidation runs (in number of conversations).
max_buffer_size    10       Maximum number of active nodes in the graph before older ones are pruned.
enable_async       True     Run consolidation and embedding tasks in background threads for responsiveness.
enable_sharding    True     Organize memories into semantic topics (work, personal) or date-based shards.
enable_hierarchy   True     Create "Super-Nodes" to summarize large clusters of memories.
load_from_disk     True     Automatically reload the last saved state on initialization.
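As a sketch of adjusting settings at initialization, the snippet below merges overrides into the documented defaults and rejects misspelled names before anything reaches the library. `DEFAULTS` mirrors the table above; `build_config` and its strict type check are hypothetical helpers, and passing every listed parameter as a `MemorySystem` keyword argument is an assumption based on the two shown in the Python API example (`enable_async`, `auto_consolidate`).

```python
# Defaults copied from the configuration table above.
DEFAULTS = {
    "auto_consolidate": True,
    "consolidate_every": 3,
    "max_buffer_size": 10,
    "enable_async": True,
    "enable_sharding": True,
    "enable_hierarchy": True,
    "load_from_disk": True,
}


def build_config(**overrides):
    """Return the defaults with overrides applied, validating names and types."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    for name, value in overrides.items():
        expected = type(DEFAULTS[name])
        if type(value) is not expected:
            raise TypeError(f"{name} expects {expected.__name__}")
    return {**DEFAULTS, **overrides}


config = build_config(max_buffer_size=50, consolidate_every=5)

# Assumed usage, not executed here:
# import os
# from lazzaro import MemorySystem
# ms = MemorySystem(openai_api_key=os.getenv("OPENAI_API_KEY"), **config)
```

The CLI equivalent of a single override is the `/set` command, for example `/set max_buffer_size 50`.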
