
Gleanr — Agent Context Management System

Session-scoped memory for AI agents that actually remembers.

Gleanr is a Python SDK that gives your AI agents persistent, structured memory across conversations. Unlike RAG systems that retrieve external knowledge, Gleanr manages the agent's internal state—what it decided, what constraints it discovered, what failed, and what the user prefers.

from gleanr import Gleanr

# Initialize with your session
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Ingest conversation turns
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL for its robust JSON support")

# Recall relevant context (token-budgeted)
context = await gleanr.recall("What database are we using?", token_budget=2000)
# Returns: [ContextItem(content="Decision: We'll use PostgreSQL...", markers=["decision"], ...)]

Why Gleanr?

Current LLM applications treat agent memory as a search problem. But agent memory is not knowledge retrieval:

| Aspect | Knowledge Retrieval (RAG) | Agent Memory (Gleanr) |
|---|---|---|
| Scope | External corpus | Internal session state |
| Lifespan | Persistent | Session-bound with decay |
| Trigger | Explicit queries | Every turn, automatically |
| Content | Documents, facts | Decisions, constraints, outcomes |

After 30-40 turns, agents without proper memory:

  • Forget decisions made earlier ("Didn't we decide to use PostgreSQL?")
  • Repeat failed approaches
  • Lose track of user preferences
  • Contradict themselves

Gleanr solves this by automatically tracking what matters and recalling it when relevant.

Key Features

  • Automatic marker detection — Identifies decisions, constraints, failures, and goals in conversation
  • Token-budgeted recall — Always fits in your context window
  • Episode management — Groups related turns, triggers reflection on close
  • LLM reflection with consolidation — Extracts durable facts from episodes and keeps them accurate as requirements evolve
  • Staleness management — Consolidation detects changes first; facts describe current state, never carry stale references. Old versions are superseded (not deleted), maintaining an audit trail
  • Two-level deduplication — Store-level dedup supersedes paraphrases after reflection; recall-time dedup filters near-duplicates before budget allocation
  • Contradiction detection — Consolidation prompt identifies changes first and resolves conflicting facts
  • Observability — Built-in reflection tracing for debugging and monitoring
  • Pluggable storage — SQLite for persistence, in-memory for testing
  • Provider agnostic — Works with OpenAI, Anthropic, Ollama, or any embedder
  • Evaluation harness — Automated testing across 6 scenarios with latency profiling

Installation

# Core package (in-memory storage, no provider dependencies)
pip install gleanr

# With specific extras
pip install "gleanr[sqlite]"         # SQLite storage backend
pip install "gleanr[openai]"         # OpenAI provider
pip install "gleanr[anthropic]"      # Anthropic provider
pip install "gleanr[all]"            # All optional dependencies

For development:

git clone https://github.com/Saket-Kr/gleanr.git
cd gleanr
pip install -e ".[dev]"

Quick Start

1. Basic Usage

import asyncio
from gleanr import Gleanr
from gleanr.storage import InMemoryBackend

async def main():
    gleanr = Gleanr(
        session_id="demo",
        storage=InMemoryBackend(),
        embedder=your_embedder,    # See Providers section
        reflector=your_reflector,  # LLM-based fact extraction
    )
    await gleanr.initialize()

    # Ingest conversation turns
    await gleanr.ingest("user", "I need help building a REST API")
    await gleanr.ingest("assistant", "Decision: We'll use FastAPI for its automatic OpenAPI docs.")

    # Recall relevant context (token-budgeted)
    context = await gleanr.recall("What framework are we using?")
    print(context[0].content)  # "We'll use FastAPI..."

    await gleanr.close()

asyncio.run(main())

Defaults work out of the box — no config needed. Gleanr automatically detects episode boundaries, extracts facts via reflection, deduplicates, and manages staleness.

2. With SQLite Persistence

from gleanr.storage import get_sqlite_backend

SQLiteBackend = get_sqlite_backend()
storage = SQLiteBackend("./agent_memory.db")

gleanr = Gleanr(
    session_id="user_123",
    storage=storage,
    embedder=embedder,
    reflector=reflector,
)

Sessions persist across restarts. Resume anytime with the same session_id.

3. How Reflection Works

When episodes close, Gleanr reflects on the conversation and extracts durable facts. On subsequent episodes, consolidation kicks in — existing facts are sent alongside new turns, and the reflector returns actions (keep/update/add/remove) to keep facts accurate:

Episode 1 → Reflects → "Database is PostgreSQL", "API style is REST"
Episode 2 → User says "switch to MySQL"
         → Consolidates → UPDATE "Database is MySQL" (supersedes PostgreSQL fact)
                        → KEEP "API style is REST"

The old "PostgreSQL" fact is preserved with a superseded_by pointer for audit trail, but only the current "MySQL" fact appears in recall results.

Short episode carry-forward: If an episode has fewer turns than min_episode_turns, those turns are buffered and included in the next episode's reflection. No data is ever silently dropped.
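The keep/update/add/remove cycle can be pictured with a small standalone model. This is an illustrative sketch, not Gleanr's internal API: the `Fact` class, action dicts, and ID scheme here are assumptions based on the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    id: str
    content: str
    superseded_by: Optional[str] = None  # audit-trail pointer; facts are never deleted

def consolidate(facts: dict[str, Fact], actions: list[dict]) -> dict[str, Fact]:
    """Apply reflector actions (keep / update / add / remove) to the fact store."""
    next_id = len(facts) + 1
    for action in actions:
        kind = action["action"]
        if kind == "keep":
            continue  # fact is still accurate
        elif kind == "update":
            # Old fact is superseded, not deleted; the new fact holds current state
            new = Fact(id=f"f{next_id}", content=action["content"])
            next_id += 1
            facts[action["id"]].superseded_by = new.id
            facts[new.id] = new
        elif kind == "add":
            facts[f"f{next_id}"] = Fact(id=f"f{next_id}", content=action["content"])
            next_id += 1
        elif kind == "remove":
            facts[action["id"]].superseded_by = "removed"
    return facts

# Episode 2: user switched from PostgreSQL to MySQL
facts = {"f1": Fact("f1", "Database is PostgreSQL"), "f2": Fact("f2", "API style is REST")}
facts = consolidate(facts, [
    {"action": "update", "id": "f1", "content": "Database is MySQL"},
    {"action": "keep", "id": "f2"},
])
active = [f.content for f in facts.values() if f.superseded_by is None]
# active == ["API style is REST", "Database is MySQL"]
```

Only active facts would reach recall; the superseded PostgreSQL fact remains in the store for auditing.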

Memory Model

Gleanr uses a three-level memory hierarchy:

L0: Raw Turns

Every message in the conversation. Short-lived, used for immediate context.

L1: Episodes

Groups of related turns around a goal or task. Automatically detected via:

  • Turn count thresholds
  • Time gaps between messages
  • Topic boundaries
  • Tool result patterns
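A minimal check over the first two signals (turn count and time gap) might look like the sketch below. The function is illustrative, not Gleanr's internals; topic and tool-result detection are omitted, and the thresholds mirror the library's documented defaults (6 turns, 30-minute gap).

```python
from datetime import datetime, timedelta

def should_close_episode(turn_times: list[datetime],
                         max_turns: int = 6,
                         max_gap: timedelta = timedelta(minutes=30)) -> bool:
    """Close on a turn-count threshold or a long silence between messages."""
    if len(turn_times) >= max_turns:
        return True
    if len(turn_times) >= 2 and turn_times[-1] - turn_times[-2] > max_gap:
        return True
    return False

t0 = datetime(2024, 1, 1, 12, 0)
assert not should_close_episode([t0, t0 + timedelta(minutes=1)])
assert should_close_episode([t0 + timedelta(minutes=i) for i in range(6)])  # turn count
assert should_close_episode([t0, t0 + timedelta(hours=1)])                  # time gap
```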

L2: Semantic Facts

Extracted from episodes via LLM reflection. Captures:

  • Decisions — Choices made and their rationale
  • Constraints — Limitations discovered
  • Failures — What didn't work (to avoid repeating)
  • Goals — User objectives

4. Observability (Reflection Tracing)

from gleanr import Gleanr, ReflectionTrace

def on_trace(trace: ReflectionTrace):
    print(f"Reflection on episode {trace.episode_id} ({trace.mode})")
    print(f"  Input: {trace.input_turn_count} turns")
    if trace.prior_facts:
        print(f"  Prior facts: {len(trace.prior_facts)}")
    print(f"  Saved: {len(trace.saved_facts)} facts")
    print(f"  Superseded: {len(trace.superseded_facts)} facts")
    print(f"  Elapsed: {trace.elapsed_ms}ms")

gleanr = Gleanr(session_id="demo", storage=storage, embedder=embedder, reflector=reflector)
gleanr.set_trace_callback(on_trace)
await gleanr.initialize()

Traces capture the full reflection pipeline: input turns, prior facts, scoped facts, raw LLM output (actions or facts), saved facts, and superseded facts. Use trace.to_dict() for JSON serialization.
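Since `trace.to_dict()` yields a JSON-serializable dict, one convenient pattern is appending each trace as a line of JSONL for later analysis. The factory below is a sketch; the file path is arbitrary and the helper name is ours, not part of Gleanr.

```python
import json

def make_jsonl_trace_logger(path: str):
    """Return a trace callback that appends each reflection trace as one JSON line."""
    def on_trace(trace):
        with open(path, "a") as f:
            f.write(json.dumps(trace.to_dict()) + "\n")
    return on_trace

# Usage (assuming a configured Gleanr instance):
# gleanr.set_trace_callback(make_jsonl_trace_logger("traces.jsonl"))
```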

Markers

Gleanr uses markers to signal importance. They're auto-detected or manually specified:

# Auto-detected from content
await gleanr.ingest("assistant", "Decision: We'll use React for the frontend")
# Marker "decision" auto-detected

# Manually specified
await gleanr.ingest("user", "Important: Never use eval() in this codebase", markers=["constraint"])

Built-in marker types:

  • decision — Choices made
  • constraint — Limitations/requirements
  • failure — Things that didn't work
  • goal — Objectives to achieve
  • custom:* — Your own markers

Marked content gets priority in recall and influences fact extraction.
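Auto-detection can be as simple as prefix and keyword matching. The patterns below are illustrative guesses at the kind of heuristics involved, not Gleanr's actual rules:

```python
import re

# Illustrative patterns only; Gleanr's real detector may differ
MARKER_PATTERNS = {
    "decision": re.compile(r"^\s*decision\s*:", re.IGNORECASE),
    "constraint": re.compile(r"^\s*(important|constraint|never|must)\b", re.IGNORECASE),
    "failure": re.compile(r"\b(failed|didn't work|error)\b", re.IGNORECASE),
    "goal": re.compile(r"^\s*goal\s*:", re.IGNORECASE),
}

def detect_markers(content: str) -> list[str]:
    return [name for name, pat in MARKER_PATTERNS.items() if pat.search(content)]

assert detect_markers("Decision: We'll use React for the frontend") == ["decision"]
assert detect_markers("Important: Never use eval() in this codebase") == ["constraint"]
```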

Recall

Recall is automatic and token-budgeted:

context = await gleanr.recall(
    query="authentication",
    token_budget=2000,  # Max tokens to return
)

for item in context:
    print(f"[{item.role}] {item.content}")
    print(f"  Score: {item.score}, Markers: {item.markers}")

Recall prioritizes:

  1. High-relevance semantic matches
  2. Marked content (decisions, constraints, etc.)
  3. Current episode turns
  4. L2 facts from past episodes
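The budgeting above can be pictured as a greedy pack: rank candidates by score, then take items while they fit. A standalone sketch (field names mirror `ContextItem`, but the selection logic is our assumption, not Gleanr's exact algorithm):

```python
from dataclasses import dataclass

@dataclass
class Item:
    content: str
    score: float
    token_count: int

def pack(items: list[Item], token_budget: int) -> list[Item]:
    """Greedy pack: highest score first; the budget is a hard limit,
    so lower-priority items that don't fit are dropped."""
    selected, used = [], 0
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        if used + item.token_count <= token_budget:
            selected.append(item)
            used += item.token_count
    return selected

items = [
    Item("decision fact", score=0.9, token_count=50),
    Item("old turn", score=0.4, token_count=80),
    Item("constraint", score=0.8, token_count=60),
]
chosen = pack(items, token_budget=120)
# chosen contents: ["decision fact", "constraint"]; "old turn" is dropped (over budget)
```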

Providers

Embeddings

OpenAI:

from gleanr.providers.openai import OpenAIEmbedder
embedder = OpenAIEmbedder(api_key="sk-...")

Anthropic:

from gleanr.providers.anthropic import AnthropicEmbedder
embedder = AnthropicEmbedder(api_key="sk-ant-...")

Ollama (local):

# See examples/test_agent/llm.py for implementation
embedder = OllamaEmbedder(client)

Custom:

from gleanr.providers import Embedder

class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        ...

    @property
    def dimension(self) -> int:
        return 384
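For tests it can help to pair `InMemoryBackend` with a deterministic, network-free embedder. The toy below is shown standalone; in Gleanr you would subclass `Embedder` as above, and the hashing scheme is purely illustrative.

```python
import hashlib
import math

class HashEmbedder:
    """Deterministic toy embedder: hashes character trigrams into a fixed-size,
    L2-normalized vector. Useful for tests; not semantically meaningful."""

    def __init__(self, dimension: int = 384):
        self._dim = dimension

    @property
    def dimension(self) -> int:
        return self._dim

    async def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self._dim
            for i in range(len(text) - 2):
                h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
                vec[h % self._dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors
```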

Reflection

Reflection requires an LLM to extract facts:

from gleanr.providers.openai import OpenAIReflector
reflector = OpenAIReflector(api_key="sk-...")

Or implement your own:

from gleanr.providers import Reflector

class MyReflector(Reflector):
    async def reflect(self, episode, turns) -> list[Fact]:
        # Call your LLM to extract facts
        ...

Test Agent

Gleanr includes a fully functional test agent powered by Ollama for interactive experimentation.

Setup

# Install example dependencies
pip install -e ".[examples]"

# Start Ollama (if not running)
ollama serve

# Pull required models
ollama pull mistral:7b-instruct
ollama pull nomic-embed-text

Run

# Start a new session
python -m examples.test_agent.run --session my_test

# Resume an existing session (same command; state persists under the session id)
python -m examples.test_agent.run --session my_test

# Debug mode (shows recall items and Gleanr timings)
python -m examples.test_agent.run --session my_test --debug

Commands:

  • /stats — Show session statistics (turns, episodes, facts)
  • /recall <query> — Test recall directly
  • /episode — Close current episode (triggers reflection)
  • /debug — Toggle debug mode
  • /help — Show all commands
  • /quit — Exit

Evaluation Harness

Gleanr ships with an automated evaluation framework for measuring memory accuracy and latency across multi-turn conversations.

Quick Test

# Sanity check — 1 iteration, 10 turns
python -m examples.evaluation.run --quick

Full Evaluation

# Default: 80 sessions across 8 turn counts (10-80), decision_tracking scenario
python -m examples.evaluation.run

# Test consolidation accuracy with the progressive_requirements scenario
python -m examples.evaluation.run --scenario progressive_requirements --quick

# Custom configuration
python -m examples.evaluation.run \
    --scenario progressive_requirements \
    --turns 10,20,30,40 \
    --iterations 5 \
    --max-concurrent 3 \
    --verbose

# List all scenarios
python -m examples.evaluation.run --list-scenarios

Available Scenarios

| Scenario | Tests |
|---|---|
| decision_tracking | Recall of architectural decisions over time |
| constraint_awareness | Recall of constraints when relevant |
| failure_memory | Avoiding repeated failures |
| multi_fact_tracking | Independent recall of multiple facts |
| goal_tracking | Persistence of goals and objectives |
| progressive_requirements | Fact updates via consolidation — probes check updated facts, not originals |

Reports are saved as JSON and Markdown in ./evaluation_output/.

Configuration

Defaults work for most use cases. You only need GleanrConfig if you want to tune behavior.

Common Tuning

from gleanr import GleanrConfig
from gleanr.core.config import RecallConfig, ReflectionConfig

config = GleanrConfig(
    recall=RecallConfig(
        default_token_budget=4000,     # Match to your LLM's context window
    ),
    reflection=ReflectionConfig(
        max_facts_per_episode=10,      # Increase for dense conversations
    ),
)

| Setting | Default | When to change |
|---|---|---|
| recall.default_token_budget | 4000 | Your LLM can handle more/less context |
| reflection.max_facts_per_episode | 10 | Episodes are very dense or very sparse |
| episode_boundary.max_turns | 6 | Episodes are closing too early/late |

Full Reference

All configuration options:

from gleanr import GleanrConfig
from gleanr.core.config import EpisodeBoundaryConfig, RecallConfig, ReflectionConfig

config = GleanrConfig(
    auto_detect_markers=True,

    episode_boundary=EpisodeBoundaryConfig(
        max_turns=6,                # Close episode after N turns
        max_time_gap_seconds=1800,  # Close after 30min gap
        close_on_tool_result=True,  # Close after tool completion
    ),

    recall=RecallConfig(
        default_token_budget=4000,
        current_episode_budget_pct=0.2,  # Budget fraction for current episode
        min_relevance_threshold=0.3,     # Min embedding similarity for facts
        max_fact_candidates=20,          # Top-K facts after relevance filter
        facts_only_recall=True,          # Skip raw turns when facts exist
        current_episode_boost=0.2,       # Additive boost for current episode turns
        recall_dedup_threshold=0.85,     # Filter near-duplicate facts at recall
    ),

    reflection=ReflectionConfig(
        min_episode_turns=2,
        max_facts_per_episode=10,
        min_confidence=0.7,                       # Min confidence to save a fact
        max_active_facts=100,                     # Archive excess by confidence
        dedup_similarity_threshold=0.90,          # Save-time duplicate detection
        store_dedup_threshold=0.80,               # Post-reflection paraphrase dedup
        consolidation_similarity_threshold=0.15,  # Scoping for large fact sets
        consolidation_max_unscoped_facts=100,     # Send all facts below this count
        background=True,                          # Async reflection after episode close
    ),
)

API Reference

Gleanr Class

class Gleanr:
    async def initialize() -> None
    async def ingest(role: str, content: str, markers: list[str] = None) -> Turn
    async def recall(query: str, token_budget: int = None) -> list[ContextItem]
    async def close_episode(reason: str = "manual") -> str | None
    async def get_session_stats() -> SessionStats
    async def close() -> None

Models

@dataclass
class Turn:
    id: str
    session_id: str
    episode_id: str
    role: Role
    content: str
    markers: list[str]
    token_count: int
    created_at: datetime

@dataclass
class ContextItem:
    content: str
    role: Role
    markers: list[str]
    score: float
    token_count: int
    source_type: str  # "turn", "fact"
    source_id: str

Design Philosophy

Gleanr follows these principles:

  1. Store conclusions, not evidence — Don't store raw RAG results or chain-of-thought. Store what was decided and why.

  2. Memory is always-on — Unlike tools that are invoked, memory recall happens every turn automatically.

  3. Token budgets are hard limits — Never exceed the budget. Gracefully degrade by dropping lower-priority items.

  4. Episodes are mandatory — All turns belong to episodes. This enables reflection and provides natural grouping.

  5. Reflection is essential — L2 facts are the maintained, current-truth representation of session state. Without reflection, recall degrades significantly over long conversations.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=gleanr

# Type checking
mypy gleanr

Roadmap

Shipped:

  • Consolidating reflection — Facts update as requirements change
  • Deduplication — Embedding-based duplicate prevention
  • Contradiction detection — Resolve conflicting facts during consolidation
  • Observability — Reflection tracing with full input/output visibility
  • Evaluation harness — Automated accuracy and latency testing

Planned:

  • L3 Themes — Cross-episode patterns and user profiles
  • Async reflection queue — Non-blocking fact extraction (background mode)
  • Multi-agent support — Shared memory across agents
  • Cloud storage backends — Redis, PostgreSQL

License

MIT License — See LICENSE for details.

Contributing

Contributions welcome! Please read the design docs in PLAN.md to understand the architecture before submitting PRs.


Gleanr — Because agents should remember what matters.
