An asynchronous, non-blocking hierarchical memory framework for LLM agents that eliminates main-thread latency and mitigates context degradation via multi-tiered background compression and deterministic entity ledgers.

Sawtooth Memory

A high-performance, non-blocking hierarchical memory framework for LLM Agents.

The Problem

Standard LLM memory systems (like LangChain's ConversationSummaryMemory) process conversation history sequentially on the main application thread. Every time a user sends a message, the application blocks while it waits for an LLM to generate a new summary of the history. Furthermore, these summaries are lossy: to save tokens they frequently drop specific UUIDs, names, or rules, and long contexts compound the problem through the "Lost in the Middle" effect, where models recall details from the middle of a context poorly.

The Solution

Sawtooth Memory addresses both problems. It stores the user's message immediately and returns control to the application in milliseconds, offloading the expensive summarization to an asynchronous background worker. To keep exact values from being dropped or altered during summarization, it extracts critical facts into an immutable ledger before summarizing.
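The ledger idea can be illustrated with a minimal sketch. This is not Sawtooth's internal implementation: the regex patterns, the `EntityLedger` class, and the `extract_entities` function are hypothetical names chosen for illustration. It only shows exact identifiers being copied verbatim into a write-once store before any lossy summary is produced.

```python
import re

# Hypothetical patterns for values that a summary must never paraphrase.
ENTITY_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "transaction_id": re.compile(r"\btxn_[A-Za-z0-9_]+\b"),
}


class EntityLedger:
    """Write-once store (illustrative): a recorded value is never changed."""

    def __init__(self):
        self._entries = {}  # value -> kind, in insertion order

    def record(self, kind, value):
        # setdefault leaves an existing entry untouched.
        self._entries.setdefault(value, kind)

    def entries(self):
        return [{"kind": k, "value": v} for v, k in self._entries.items()]


def extract_entities(text, ledger):
    """Copy every pattern match into the ledger verbatim."""
    for kind, pattern in ENTITY_PATTERNS.items():
        for value in pattern.findall(text):
            ledger.record(kind, value)


ledger = EntityLedger()
extract_entities("My transaction ID is txn_998877_alpha", ledger)
print(ledger.entries())
# [{'kind': 'transaction_id', 'value': 'txn_998877_alpha'}]
```

Because the identifier is stored outside the summarized text, a later summary can shorten the conversation freely without touching it.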


Documentation

For architectural deep-dives, comprehensive API specifications, and advanced lifecycle configurations, please refer to the official documentation:

View Detailed Architecture & API Reference (DOCUMENTATION.md)


Architecture & Data Flow

1. The Non-Blocking Execution Model

  Standard Memory (Blocking)            Sawtooth Memory (Async)
  ──────────────────────────            ───────────────────────

  [ Application ]                       [ Application ]
         │                                     │
         ▼                                     ▼
  [ Save Context ]                      [ ContextManager ]
         │                                     │
         ▼                                     ├───────────────────┐ (Instant Return)
  [ LLM Summarizes ]                           ▼                   ▼
  (App freezes for 5-10s)               [ Next User Turn ]  [ Background Worker ]
         │                                                         │
         ▼                                                         ▼
  [ Next User Turn ]                                        [ LLM Summarizes ]
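The right-hand flow in the diagram can be sketched with plain `asyncio`. The `BackgroundMemory` class below is a hypothetical stand-in, not Sawtooth's `ContextManager`, and the `asyncio.sleep` call stands in for a slow LLM summarization. It shows only the general pattern: the caller appends a message and returns at once, while a worker task drains a queue.

```python
import asyncio


class BackgroundMemory:
    """Hypothetical non-blocking memory; not Sawtooth's real class."""

    def __init__(self):
        self.messages = []    # raw messages, available immediately
        self.summaries = []   # filled in later by the worker
        self._queue = asyncio.Queue()
        self._worker = None

    async def start(self):
        self._worker = asyncio.create_task(self._run())

    async def add_message(self, role, content):
        # Store the message and enqueue work; no summarization happens here.
        self.messages.append((role, content))
        self._queue.put_nowait(content)

    async def _run(self):
        while True:
            content = await self._queue.get()
            try:
                self.summaries.append(await self._summarize(content))
            finally:
                self._queue.task_done()

    async def _summarize(self, content):
        await asyncio.sleep(0.05)  # placeholder for a slow LLM call
        return content[:20]

    async def close(self):
        await self._queue.join()   # let pending work finish
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass


async def demo():
    memory = BackgroundMemory()
    await memory.start()
    await memory.add_message("user", "My transaction ID is txn_998877_alpha")
    pending = len(memory.summaries)  # worker has not produced anything yet
    await memory.close()
    return pending, len(memory.summaries)


result = asyncio.run(demo())
print(result)  # (0, 1)
```

The first number is the summary count seen right after `add_message` returns, and the second is the count once the worker has caught up.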

2. The Hierarchical Memory Stack

When your agent is ready to respond, Sawtooth stitches together an optimized context payload from distinct layers, ensuring critical facts are never summarized away.

    Agent Loop
        │
        ▼
┌─────────────────────┐
│   ContextManager    │
│  ┌───────────────┐  │
│  │ L0 System     │  │  immutable persona + tool schemas
│  │ L2 Archive    │  │  compressed narrative memory
│  │ L1.5 Entities │  │  exact IDs, paths, UUIDs
│  │ L1 Working    │  │  recent raw conversation
│  └───────────────┘  │
└──────────┬──────────┘
           │
           ▼
     build_prompt()
           │
           ▼
        LLM API
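As a rough illustration of how such a payload might be stitched together, the sketch below concatenates the four layers in the order shown in the diagram. The `MemoryLayers` dataclass, the `stitch_prompt` function, and the section labels are assumptions made for this example; the actual output format of `build_prompt()` is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLayers:
    """Hypothetical container mirroring the four layers in the diagram."""
    system: str                                    # L0
    archive: str = ""                              # L2
    entities: dict = field(default_factory=dict)   # L1.5
    working: list = field(default_factory=list)    # L1, (role, content) pairs


def stitch_prompt(layers):
    """Join the layers in diagram order: L0, L2, L1.5, L1."""
    parts = [layers.system]
    if layers.archive:
        parts.append("Summary of earlier conversation:\n" + layers.archive)
    if layers.entities:
        facts = "\n".join(f"- {k}: {v}" for k, v in layers.entities.items())
        parts.append("Exact facts:\n" + facts)
    parts.extend(f"{role}: {content}" for role, content in layers.working)
    return "\n\n".join(parts)


layers = MemoryLayers(
    system="You are a helpful assistant.",
    archive="User reported a payment issue.",
    entities={"user_transaction_id": "txn_998877_alpha"},
    working=[("user", "Any update on my refund?")],
)
prompt = stitch_prompt(layers)
print(prompt)
```

Keeping the entity block separate from the archive is what lets the archive be rewritten or shortened without affecting the exact values.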

Performance Benchmarks

By moving compression to the background, Sawtooth cut main-thread latency by 11.3x in the local benchmark below while retaining every tracked fact.

Local GPU Benchmark (NVIDIA RTX 5060 | Model: phi4-mini | 20-Message Conversation)

  Performance Metric     Standard Summary Memory   Sawtooth Hierarchical   Architectural Advantage
  ──────────────────     ───────────────────────   ─────────────────────   ───────────────────────
  Main Thread Latency    64.15 seconds             5.70 seconds            11.3x Faster Execution
  Final Prompt Payload   506 tokens                454 tokens              10% Lower Token Cost
  UUID / Fact Recall     Variable / Hallucinates   100% Retained           Guaranteed via L1.5 Ledger

For full methodology, cloud comparisons, and reproducibility steps, see the Performance Benchmarks.
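The derived figures in the benchmark follow directly from the raw measurements. The short check below reproduces them; the variable names are chosen for this example only.

```python
# Raw measurements from the local GPU benchmark.
standard_latency_s = 64.15
sawtooth_latency_s = 5.70
standard_tokens = 506
sawtooth_tokens = 454

# Speedup is the ratio of main-thread latencies.
speedup = standard_latency_s / sawtooth_latency_s

# Token saving is the relative reduction in final prompt size.
token_saving = (standard_tokens - sawtooth_tokens) / standard_tokens

print(f"{speedup:.1f}x faster")            # 11.3x faster
print(f"{token_saving:.0%} fewer tokens")  # 10% fewer tokens
```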


Installation

pip install sawtooth-memory

Optional dependencies for cloud providers:

pip install langchain-openai langchain-anthropic langchain-google-genai

Quickstart

1. The Standard Agent Loop

Initialize the ContextManager and let the background worker handle the heavy lifting. Sawtooth works with both local, air-gapped models (via Ollama) and cloud APIs.

import asyncio
from sawtooth_memory import ContextManager, ContextManagerConfig
from sawtooth_memory.config import OllamaConfig

async def main():
    config = ContextManagerConfig(
        soft_limit_tokens=1000,
        hard_limit_tokens=2000,
        ollama=OllamaConfig(base_url="http://localhost:11434", model="phi4")
    )

    async with ContextManager(system_prompt="You are a helpful assistant.", config=config) as cm:

        # 1. Instantly ingest messages (Main thread is never blocked)
        await cm.add_message("user", "My transaction ID is txn_998877_alpha")
        await cm.add_message("assistant", "I have noted your transaction ID.")

        # 2. Build the optimized prompt to send to your main LLM
        prompt = cm.build_prompt()
        print(prompt)

if __name__ == "__main__":
    asyncio.run(main())

2. Recall Explainability Traces

Sawtooth eliminates the "black-box" of agent memory by providing deterministic audit trails. You can query the memory system to see exactly why a fact was retained in the prompt.

import json

# cm is the ContextManager from the previous example
trace = cm.explain_prompt()
print(json.dumps(trace, indent=2))

Example output:

{
  "system_prompt": "You are a helpful assistant.",
  "l2_summary_lineage": [
    "User initiated troubleshooting for router.",
    "User provided MAC address."
  ],
  "l1_5_entities": [
    {
      "key": "user_transaction_id",
      "value": "txn_998877_alpha",
      "origin": "Anchored via L1.5 explicit instruction"
    }
  ],
  "l1_active_messages": 4,
  "total_tokens": 342
}
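Since the trace is plain JSON, it can be inspected with ordinary dictionary access. The sketch below assumes the field names shown in the example output above and uses an abridged copy of that output; `find_entity` is a hypothetical helper, not part of the library.

```python
import json

# Abridged copy of the example trace shown above.
trace = json.loads("""
{
  "l1_5_entities": [
    {
      "key": "user_transaction_id",
      "value": "txn_998877_alpha",
      "origin": "Anchored via L1.5 explicit instruction"
    }
  ],
  "l1_active_messages": 4,
  "total_tokens": 342
}
""")


def find_entity(trace, value):
    """Return the ledger entry holding `value`, or None if it is absent."""
    for entity in trace.get("l1_5_entities", []):
        if entity["value"] == value:
            return entity
    return None


entity = find_entity(trace, "txn_998877_alpha")
print(entity["origin"])  # Anchored via L1.5 explicit instruction
```

A check like this can serve as a regression test that a given identifier is still present in the prompt after compression has run.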

3. Integrations: LangGraph

Sawtooth provides a native SawtoothMemorySaver adapter, acting as a drop-in checkpointer replacement for LangGraph architectures.

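from langgraph.graph import StateGraph
from sawtooth_memory.integrations.langgraph import SawtoothMemorySaver

# State is your graph's state schema (for example a TypedDict);
# cm is an open ContextManager, as created in the Quickstart above.
graph_builder = StateGraph(State)
# ... add nodes and edges ...

# Use Sawtooth as the graph's checkpointer
memory_saver = SawtoothMemorySaver(cm)
graph = graph_builder.compile(checkpointer=memory_saver)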

Roadmap

  • Phase 1: Core Architecture
      • L1/L2 Hierarchical Buffer
      • Asynchronous Background Worker
      • Local (Ollama) & Cloud compatibility

  • Phase 2: Observability & Telemetry
      • EventBus Subsystem
      • Explainability Traces
      • Persistent JSONL Auditing Journal
      • Performance Benchmarking Harness

  • Phase 3: Advanced Architectures (Up Next)
      • Multi-Agent Memory Pooling (Shared contextual state)
      • Semantic Vector L3 Archival Memory (RAG integration)
      • Redis/Postgres Adapter for Distributed Deployments


Contributing

We welcome pull requests. See our CONTRIBUTING.md for guidelines on how to run the test suite and ensure code quality.


License

This project is licensed under the MIT License - see the LICENSE.md file for details.

