
Gleanr — Agent Context Management System

Session-scoped memory for AI agents that actually remembers.

Gleanr is a Python SDK that gives your AI agents persistent, structured memory across conversations. Unlike RAG systems that retrieve external knowledge, Gleanr manages the agent's internal state—what it decided, what constraints it discovered, what failed, and what the user prefers.

from gleanr import Gleanr

# Initialize with your session
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Ingest conversation turns
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL for its robust JSON support")

# Recall relevant context (token-budgeted)
context = await gleanr.recall("What database are we using?", token_budget=2000)
# Returns: [ContextItem(content="Decision: We'll use PostgreSQL...", markers=["decision"], ...)]

Why Gleanr?

Current LLM applications treat agent memory as a search problem. But agent memory is not knowledge retrieval:

Aspect   | Knowledge Retrieval (RAG) | Agent Memory (Gleanr)
Scope    | External corpus           | Internal session state
Lifespan | Persistent                | Session-bound with decay
Trigger  | Explicit queries          | Every turn, automatically
Content  | Documents, facts          | Decisions, constraints, outcomes

After 30-40 turns, agents without proper memory:

  • Forget decisions made earlier ("Didn't we decide to use PostgreSQL?")
  • Repeat failed approaches
  • Lose track of user preferences
  • Contradict themselves

Gleanr solves this by automatically tracking what matters and recalling it when relevant.

Key Features

  • Automatic marker detection — Identifies decisions, constraints, failures, and goals in conversation
  • Token-budgeted recall — Always fits in your context window
  • Episode management — Groups related turns, triggers reflection on close
  • LLM reflection with consolidation — Extracts durable facts and keeps them accurate as requirements evolve
  • Fact supersession — When facts change, old versions are superseded (not deleted), maintaining an audit trail
  • Deduplication — Embedding-based duplicate detection prevents redundant facts
  • Contradiction detection — Consolidation prompt identifies and resolves conflicting facts
  • Observability — Built-in reflection tracing for debugging and monitoring
  • Pluggable storage — SQLite for persistence, in-memory for testing
  • Provider agnostic — Works with OpenAI, Anthropic, Ollama, or any embedder
  • Evaluation harness — Automated testing across 6 scenarios with latency profiling

Installation

# Core package (in-memory storage, no provider dependencies)
pip install gleanr

# With specific extras
pip install "gleanr[sqlite]"         # SQLite storage backend
pip install "gleanr[openai]"         # OpenAI provider
pip install "gleanr[anthropic]"      # Anthropic provider
pip install "gleanr[all]"            # All optional dependencies

For development:

git clone https://github.com/Saket-Kr/gleanr.git
cd gleanr
pip install -e ".[dev]"

Quick Start

1. Basic Usage

import asyncio
from gleanr import Gleanr
from gleanr.storage import InMemoryBackend

async def main():
    # Setup
    storage = InMemoryBackend()

    gleanr = Gleanr(
        session_id="demo",
        storage=storage,
        embedder=your_embedder,  # See Providers section
    )
    await gleanr.initialize()

    # Conversation
    await gleanr.ingest("user", "I need help building a REST API")
    await gleanr.ingest("assistant", "I'll help you build a REST API. Decision: We'll use FastAPI for its automatic OpenAPI docs.")

    # Later in conversation...
    context = await gleanr.recall("What framework are we using?")
    print(context[0].content)  # "Decision: We'll use FastAPI..."

    await gleanr.close()

asyncio.run(main())

2. With SQLite Persistence

from gleanr.storage import get_sqlite_backend

SQLiteBackend = get_sqlite_backend()
storage = SQLiteBackend("./agent_memory.db")

gleanr = Gleanr(
    session_id="user_123",
    storage=storage,
    embedder=embedder,
)

Sessions persist across restarts. Resume anytime with the same session_id.
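
For example, after a process restart the same session picks up where it left off. A quick sketch (assuming the same embedder as the original run, so stored vectors stay comparable):

# In a new process: reopen the same database and session
storage = SQLiteBackend("./agent_memory.db")
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Memory from previous runs is immediately recallable
context = await gleanr.recall("What database did we pick?", token_budget=1000)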

3. With Reflection and Consolidation

from gleanr import GleanrConfig
from gleanr.core.config import ReflectionConfig

config = GleanrConfig(
    reflection=ReflectionConfig(
        enabled=True,
        min_episode_turns=2,
        max_facts_per_episode=10,
        dedup_similarity_threshold=0.95,  # Prevent duplicate facts
    )
)

gleanr = Gleanr(
    session_id="demo",
    storage=storage,
    embedder=embedder,
    reflector=your_reflector,  # LLM-based fact extractor
    config=config,
)

When episodes close, Gleanr reflects on the conversation and extracts durable facts. On subsequent episodes, consolidation kicks in — existing facts are sent alongside new turns, and the reflector returns actions (keep/update/add/remove) to keep facts accurate:

Episode 1 → Reflects → "Database is PostgreSQL", "API style is REST"
Episode 2 → User says "switch to MySQL"
         → Consolidates → UPDATE "Database is MySQL" (supersedes PostgreSQL fact)
                        → KEEP "API style is REST"

The old "PostgreSQL" fact is preserved with a superseded_by pointer for audit trail, but only the current "MySQL" fact appears in recall results.

Short episode carry-forward: If an episode has fewer turns than min_episode_turns, those turns are buffered and included in the next episode's reflection. No data is ever silently dropped.
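
Here is that PostgreSQL-to-MySQL flow as a runnable sketch against the API above. Episodes are closed manually so reflection runs at a predictable point; in normal use the episode boundary config closes them for you:

# Episode 1: a decision is made, then reflected into a fact on close
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL")
await gleanr.close_episode()  # triggers reflection

# Episode 2: the requirement changes
await gleanr.ingest("user", "Actually, switch to MySQL")
await gleanr.ingest("assistant", "Decision: We'll use MySQL instead")
await gleanr.close_episode()  # triggers consolidation; UPDATE supersedes the old fact

# Only the current fact comes back; the PostgreSQL fact is superseded, not deleted
context = await gleanr.recall("Which database are we using?")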

4. Observability (Reflection Tracing)

from gleanr import Gleanr, ReflectionTrace

def on_trace(trace: ReflectionTrace):
    print(f"Reflection on episode {trace.episode_id} ({trace.mode})")
    print(f"  Input: {trace.input_turn_count} turns")
    if trace.prior_facts:
        print(f"  Prior facts: {len(trace.prior_facts)}")
    print(f"  Saved: {len(trace.saved_facts)} facts")
    print(f"  Superseded: {len(trace.superseded_facts)} facts")
    print(f"  Elapsed: {trace.elapsed_ms}ms")

gleanr = Gleanr(session_id="demo", storage=storage, embedder=embedder, reflector=reflector)
gleanr.set_trace_callback(on_trace)
await gleanr.initialize()

Traces capture the full reflection pipeline: input turns, prior facts, scoped facts, raw LLM output (actions or facts), saved facts, and superseded facts. Use trace.to_dict() for JSON serialization.

Memory Model

Gleanr uses a three-level memory hierarchy:

L0: Raw Turns

Every message in the conversation. Short-lived, used for immediate context.

L1: Episodes

Groups of related turns around a goal or task. Automatically detected via:

  • Turn count thresholds
  • Time gaps between messages
  • Topic boundaries
  • Tool result patterns

L2: Semantic Facts

Extracted from episodes via LLM reflection. Captures:

  • Decisions — Choices made and their rationale
  • Constraints — Limitations discovered
  • Failures — What didn't work (to avoid repeating)
  • Goals — User objectives
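
To exercise the hierarchy directly, close an episode by hand: that forces an L1 boundary, and with a reflector configured the closed episode's turns are distilled into L2 facts. A minimal sketch using the public API (the exact fields on SessionStats may differ; treat the print as illustrative):

episode_id = await gleanr.close_episode(reason="task_complete")  # force an L1 boundary

stats = await gleanr.get_session_stats()  # counters across L0 turns, L1 episodes, L2 facts
print(stats)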

Markers

Gleanr uses markers to signal importance. They're auto-detected or manually specified:

# Auto-detected from content
await gleanr.ingest("assistant", "Decision: We'll use React for the frontend")
# Marker "decision" auto-detected

# Manually specified
await gleanr.ingest("user", "Important: Never use eval() in this codebase", markers=["constraint"])

Built-in marker types:

  • decision — Choices made
  • constraint — Limitations/requirements
  • failure — Things that didn't work
  • goal — Objectives to achieve
  • custom:* — Your own markers

Marked content gets priority in recall and influences fact extraction.
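
Custom markers use the custom:* form from the list above; the name after the prefix is up to you. A short sketch:

# Tag a turn with your own marker; it gets the same recall priority as built-ins
await gleanr.ingest(
    "user",
    "All endpoints must be rate-limited to 100 req/min",
    markers=["custom:security"],
)

# Marked turns are prioritized when relevant
context = await gleanr.recall("API security requirements")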

Recall

Recall is automatic and token-budgeted:

context = await gleanr.recall(
    query="authentication",
    token_budget=2000,  # Max tokens to return
)

for item in context:
    print(f"[{item.role}] {item.content}")
    print(f"  Score: {item.score}, Markers: {item.markers}")

Recall prioritizes:

  1. High-relevance semantic matches
  2. Marked content (decisions, constraints, etc.)
  3. Current episode turns
  4. L2 facts from past episodes
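
Because each ContextItem carries its source_type (see Models below), you can split the recalled context into durable facts and recent turns. A small sketch:

context = await gleanr.recall("authentication", token_budget=2000)

# L2 facts from past episodes vs. raw turns from the current one
facts = [item for item in context if item.source_type == "fact"]
turns = [item for item in context if item.source_type == "turn"]
print(f"{len(facts)} facts, {len(turns)} turns within budget")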

Providers

Embeddings

OpenAI:

from gleanr.providers.openai import OpenAIEmbedder
embedder = OpenAIEmbedder(api_key="sk-...")

Anthropic:

from gleanr.providers.anthropic import AnthropicEmbedder
embedder = AnthropicEmbedder(api_key="sk-ant-...")

Ollama (local):

# See examples/test_agent/llm.py for implementation
embedder = OllamaEmbedder(client)

Custom:

from gleanr.providers import Embedder

class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        ...

    @property
    def dimension(self) -> int:
        return 384
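
For offline tests you can plug in a deterministic embedder with no model or network dependency. A toy sketch (hash-based, so vectors carry no semantic meaning; real recall quality needs a real embedding model):

import hashlib

class HashEmbedder(Embedder):
    """Deterministic toy embedder for tests; not semantically meaningful."""

    async def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode()).digest()
            vectors.append([b / 256 for b in digest])  # 32 bytes -> floats in [0, 1)
        return vectors

    @property
    def dimension(self) -> int:
        return 32  # length of a sha256 digest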

Reflection

Reflection requires an LLM to extract facts:

from gleanr.providers.openai import OpenAIReflector
reflector = OpenAIReflector(api_key="sk-...")

Or implement your own:

from gleanr.providers import Reflector

class MyReflector(Reflector):
    # Fact is gleanr's fact model; the annotation is quoted so this
    # snippet runs without importing it.
    async def reflect(self, episode, turns) -> "list[Fact]":
        # Call your LLM to extract facts from the episode's turns
        ...

Test Agent

Gleanr includes a fully functional test agent powered by Ollama for interactive experimentation.

Setup

# Install example dependencies
pip install -e ".[examples]"

# Start Ollama (if not running)
ollama serve

# Pull required models
ollama pull mistral:7b-instruct
ollama pull nomic-embed-text

Run

# Start a new session
python -m examples.test_agent.run --session my_test

# Resume an existing session
python -m examples.test_agent.run --session my_test

# Debug mode (shows recall items and Gleanr timings)
python -m examples.test_agent.run --session my_test --debug

Commands:

  • /stats — Show session statistics (turns, episodes, facts)
  • /recall <query> — Test recall directly
  • /episode — Close current episode (triggers reflection)
  • /debug — Toggle debug mode
  • /help — Show all commands
  • /quit — Exit

Evaluation Harness

Gleanr ships with an automated evaluation framework for measuring memory accuracy and latency across multi-turn conversations.

Quick Test

# Sanity check — 1 iteration, 10 turns
python -m examples.evaluation.run --quick

Full Evaluation

# Default: 80 sessions across 8 turn counts (10-80), decision_tracking scenario
python -m examples.evaluation.run

# Test consolidation accuracy with the progressive_requirements scenario
python -m examples.evaluation.run --scenario progressive_requirements --quick

# Custom configuration
python -m examples.evaluation.run \
    --scenario progressive_requirements \
    --turns 10,20,30,40 \
    --iterations 5 \
    --max-concurrent 3 \
    --verbose

# List all scenarios
python -m examples.evaluation.run --list-scenarios

Available Scenarios

Scenario                 | Tests
decision_tracking        | Recall of architectural decisions over time
constraint_awareness     | Recall of constraints when relevant
failure_memory           | Avoiding repeated failures
multi_fact_tracking      | Independent recall of multiple facts
goal_tracking            | Persistence of goals and objectives
progressive_requirements | Fact updates via consolidation — probes check updated facts, not originals

Reports are saved as JSON and Markdown in ./evaluation_output/.

Configuration

from gleanr import GleanrConfig
from gleanr.core.config import EpisodeBoundaryConfig, RecallConfig, ReflectionConfig

config = GleanrConfig(
    auto_detect_markers=True,  # Auto-detect decision/constraint/etc.

    episode_boundary=EpisodeBoundaryConfig(
        max_turns=6,                # Close episode after N turns
        max_time_gap_seconds=1800,  # Close after 30min gap
        close_on_tool_result=True,  # Close after tool completion
    ),

    recall=RecallConfig(
        default_token_budget=4000,
        current_episode_budget_pct=0.2,  # 20% budget for current episode
        min_relevance_threshold=0.3,     # Filter low-relevance facts from recall
    ),

    reflection=ReflectionConfig(
        enabled=True,
        min_episode_turns=2,
        max_facts_per_episode=10,
        max_active_facts=100,                     # Archive excess facts automatically
        consolidation_similarity_threshold=0.15,   # Scoping threshold for prior facts
        consolidation_max_unscoped_facts=50,       # Skip scoping below this count
        dedup_similarity_threshold=0.95,           # Duplicate detection threshold
    ),
)

API Reference

Gleanr Class

class Gleanr:
    async def initialize() -> None
    async def ingest(role: str, content: str, markers: list[str] = None) -> Turn
    async def recall(query: str, token_budget: int = None) -> list[ContextItem]
    async def close_episode(reason: str = "manual") -> str | None
    async def get_session_stats() -> SessionStats
    async def close() -> None

Models

@dataclass
class Turn:
    id: str
    session_id: str
    episode_id: str
    role: Role
    content: str
    markers: list[str]
    token_count: int
    created_at: datetime

@dataclass
class ContextItem:
    content: str
    role: Role
    markers: list[str]
    score: float
    token_count: int
    source_type: str  # "turn", "fact"
    source_id: str
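
ingest returns the stored Turn, so you can inspect exactly what was recorded. A quick sketch:

turn = await gleanr.ingest("assistant", "Decision: ship v1 behind a feature flag")
print(turn.id, turn.episode_id)  # storage and episode assignment
print(turn.markers)              # ["decision"] when auto-detection is enabled
print(turn.token_count)          # counted against recall budgets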

Design Philosophy

Gleanr follows these principles:

  1. Store conclusions, not evidence — Don't store raw RAG results or chain-of-thought. Store what was decided and why.

  2. Memory is always-on — Unlike tools that are invoked, memory recall happens every turn automatically.

  3. Token budgets are hard limits — Never exceed the budget. Gracefully degrade by dropping lower-priority items.

  4. Episodes are mandatory — All turns belong to episodes. This enables reflection and provides natural grouping.

  5. Reflection is optional but valuable — The system works without it (L1 episodes remain functional), but L2 facts dramatically improve long-term recall.
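
Principle 3, for instance, amounts to greedy packing by priority. An illustrative sketch of the idea (not Gleanr's internal code), using the ContextItem fields from the API reference above:

def pack_to_budget(items, token_budget: int):
    """Take highest-scoring items first, skip anything that
    would overflow, and never exceed the cap."""
    packed, used = [], 0
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        if used + item.token_count <= token_budget:
            packed.append(item)
            used += item.token_count
    return packed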

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=gleanr

# Type checking
mypy gleanr

Roadmap

Shipped:

  • Consolidating reflection — Facts update as requirements change
  • Deduplication — Embedding-based duplicate prevention
  • Contradiction detection — Resolve conflicting facts during consolidation
  • Observability — Reflection tracing with full input/output visibility
  • Evaluation harness — Automated accuracy and latency testing

Planned:

  • L3 Themes — Cross-episode patterns and user profiles
  • Async reflection queue — Non-blocking fact extraction
  • Multi-agent support — Shared memory across agents
  • Cloud storage backends — Redis, PostgreSQL

License

MIT License — See LICENSE for details.

Contributing

Contributions welcome! Please read the design docs in PLAN.md to understand the architecture before submitting PRs.


Gleanr — Because agents should remember what matters.
