
Gleanr — Agent Context Management System

Session-scoped memory for AI agents that actually remembers.

Gleanr is a Python SDK that gives your AI agents persistent, structured memory across conversations. Unlike RAG systems that retrieve external knowledge, Gleanr manages the agent's internal state—what it decided, what constraints it discovered, what failed, and what the user prefers.

from gleanr import Gleanr

# Initialize with your session
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Ingest conversation turns
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL for its robust JSON support")

# Recall relevant context (token-budgeted)
context = await gleanr.recall("What database are we using?", token_budget=2000)
# Returns: [ContextItem(content="Decision: We'll use PostgreSQL...", markers=["decision"], ...)]

Why Gleanr?

Current LLM applications treat agent memory as a search problem. But agent memory is not knowledge retrieval:

| Aspect | Knowledge Retrieval (RAG) | Agent Memory (Gleanr) |
|---|---|---|
| Scope | External corpus | Internal session state |
| Lifespan | Persistent | Session-bound with decay |
| Trigger | Explicit queries | Every turn, automatically |
| Content | Documents, facts | Decisions, constraints, outcomes |

After 30-40 turns, agents without proper memory:

  • Forget decisions made earlier ("Didn't we decide to use PostgreSQL?")
  • Repeat failed approaches
  • Lose track of user preferences
  • Contradict themselves

Gleanr solves this by automatically tracking what matters and recalling it when relevant.

Key Features

  • Automatic marker detection — Identifies decisions, constraints, failures, and goals in conversation
  • Token-budgeted recall — Always fits in your context window
  • Episode management — Groups related turns, triggers reflection on close
  • LLM reflection with consolidation — Extracts durable facts from episodes and keeps them accurate as requirements evolve
  • Staleness management — Consolidation detects changes first; facts describe current state, never carry stale references. Old versions are superseded (not deleted), maintaining an audit trail
  • Two-level deduplication — Store-level dedup supersedes paraphrases after reflection; recall-time dedup filters near-duplicates before budget allocation
  • Contradiction detection — Consolidation prompt identifies changes first and resolves conflicting facts
  • Observability — Built-in reflection tracing for debugging and monitoring
  • Pluggable storage — SQLite for persistence, in-memory for testing
  • Provider agnostic — Works with OpenAI, Anthropic, Ollama, or any embedder
  • Evaluation harness — Automated testing across 6 scenarios with latency profiling

Installation

# Core package (in-memory storage, no provider dependencies)
pip install gleanr

# With specific extras
pip install "gleanr[sqlite]"         # SQLite storage backend
pip install "gleanr[openai]"         # OpenAI provider
pip install "gleanr[anthropic]"      # Anthropic provider
pip install "gleanr[all]"            # All optional dependencies

For development:

git clone https://github.com/Saket-Kr/gleanr.git
cd gleanr
pip install -e ".[dev]"

Quick Start

1. Basic Usage

import asyncio
from gleanr import Gleanr
from gleanr.storage import InMemoryBackend

async def main():
    gleanr = Gleanr(
        session_id="demo",
        storage=InMemoryBackend(),
        embedder=your_embedder,    # See Providers section
        reflector=your_reflector,  # LLM-based fact extraction
    )
    await gleanr.initialize()

    # Ingest conversation turns
    await gleanr.ingest("user", "I need help building a REST API")
    await gleanr.ingest("assistant", "Decision: We'll use FastAPI for its automatic OpenAPI docs.")

    # Recall relevant context (token-budgeted)
    context = await gleanr.recall("What framework are we using?")
    print(context[0].content)  # "We'll use FastAPI..."

    await gleanr.close()

asyncio.run(main())

Defaults work out of the box — no config needed. Gleanr automatically detects episode boundaries, extracts facts via reflection, deduplicates, and manages staleness.

2. With SQLite Persistence

from gleanr.storage import get_sqlite_backend

SQLiteBackend = get_sqlite_backend()
storage = SQLiteBackend("./agent_memory.db")

gleanr = Gleanr(
    session_id="user_123",
    storage=storage,
    embedder=embedder,
    reflector=reflector,
)

Sessions persist across restarts. Resume anytime with the same session_id.

3. How Reflection Works

When episodes close, Gleanr reflects on the conversation and extracts durable facts. On subsequent episodes, consolidation kicks in — existing facts are sent alongside new turns, and the reflector returns actions (keep/update/add/remove) to keep facts accurate:

Episode 1 → Reflects → "Database is PostgreSQL", "API style is REST"
Episode 2 → User says "switch to MySQL"
         → Consolidates → UPDATE "Database is MySQL" (supersedes PostgreSQL fact)
                        → KEEP "API style is REST"

The old "PostgreSQL" fact is preserved with a superseded_by pointer for audit trail, but only the current "MySQL" fact appears in recall results.

Short episode carry-forward: If an episode has fewer turns than min_episode_turns, those turns are buffered and included in the next episode's reflection. No data is ever silently dropped.
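The keep/update/add/remove cycle can be pictured with a small standalone model. This is an illustrative sketch, not Gleanr's internal API: the `Fact` class, action dicts, and ID scheme here are assumptions based on the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    id: str
    content: str
    superseded_by: Optional[str] = None  # audit-trail pointer; facts are never deleted

def consolidate(facts: dict[str, Fact], actions: list[dict]) -> dict[str, Fact]:
    """Apply reflector actions (keep / update / add / remove) to the fact store."""
    next_id = len(facts) + 1
    for action in actions:
        kind = action["action"]
        if kind == "keep":
            continue  # fact is still accurate
        elif kind == "update":
            # Old fact is superseded, not deleted; the new fact holds current state
            new = Fact(id=f"f{next_id}", content=action["content"])
            next_id += 1
            facts[action["id"]].superseded_by = new.id
            facts[new.id] = new
        elif kind == "add":
            facts[f"f{next_id}"] = Fact(id=f"f{next_id}", content=action["content"])
            next_id += 1
        elif kind == "remove":
            facts[action["id"]].superseded_by = "removed"
    return facts

# Episode 2: user switched from PostgreSQL to MySQL
facts = {"f1": Fact("f1", "Database is PostgreSQL"), "f2": Fact("f2", "API style is REST")}
facts = consolidate(facts, [
    {"action": "update", "id": "f1", "content": "Database is MySQL"},
    {"action": "keep", "id": "f2"},
])
active = [f.content for f in facts.values() if f.superseded_by is None]
# active == ["API style is REST", "Database is MySQL"]
```

Only active facts would reach recall; the superseded PostgreSQL fact remains in the store for auditing.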

Memory Model

Gleanr uses a three-level memory hierarchy:

L0: Raw Turns

Every message in the conversation. Short-lived, used for immediate context.

L1: Episodes

Groups of related turns around a goal or task. Automatically detected via:

  • Turn count thresholds
  • Time gaps between messages
  • Topic boundaries
  • Tool result patterns
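A minimal check over the first two signals (turn count and time gap) might look like the sketch below. The function is illustrative, not Gleanr's internals; topic and tool-result detection are omitted, and the thresholds mirror the library's documented defaults (6 turns, 30-minute gap).

```python
from datetime import datetime, timedelta

def should_close_episode(turn_times: list[datetime],
                         max_turns: int = 6,
                         max_gap: timedelta = timedelta(minutes=30)) -> bool:
    """Close on a turn-count threshold or a long silence between messages."""
    if len(turn_times) >= max_turns:
        return True
    if len(turn_times) >= 2 and turn_times[-1] - turn_times[-2] > max_gap:
        return True
    return False

t0 = datetime(2024, 1, 1, 12, 0)
assert not should_close_episode([t0, t0 + timedelta(minutes=1)])
assert should_close_episode([t0 + timedelta(minutes=i) for i in range(6)])  # turn count
assert should_close_episode([t0, t0 + timedelta(hours=1)])                  # time gap
```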

L2: Semantic Facts

Extracted from episodes via LLM reflection. Captures:

  • Decisions — Choices made and their rationale
  • Constraints — Limitations discovered
  • Failures — What didn't work (to avoid repeating)
  • Goals — User objectives

4. Observability (Reflection Tracing)

from gleanr import Gleanr, ReflectionTrace

def on_trace(trace: ReflectionTrace):
    print(f"Reflection on episode {trace.episode_id} ({trace.mode})")
    print(f"  Input: {trace.input_turn_count} turns")
    if trace.prior_facts:
        print(f"  Prior facts: {len(trace.prior_facts)}")
    print(f"  Saved: {len(trace.saved_facts)} facts")
    print(f"  Superseded: {len(trace.superseded_facts)} facts")
    print(f"  Elapsed: {trace.elapsed_ms}ms")

gleanr = Gleanr(session_id="demo", storage=storage, embedder=embedder, reflector=reflector)
gleanr.set_trace_callback(on_trace)
await gleanr.initialize()

Traces capture the full reflection pipeline: input turns, prior facts, scoped facts, raw LLM output (actions or facts), saved facts, and superseded facts. Use trace.to_dict() for JSON serialization.
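Since `trace.to_dict()` yields a JSON-serializable dict, one convenient pattern is appending each trace as a line of JSONL for later analysis. The factory below is a sketch; the file path is arbitrary and the helper name is ours, not part of Gleanr.

```python
import json

def make_jsonl_trace_logger(path: str):
    """Return a trace callback that appends each reflection trace as one JSON line."""
    def on_trace(trace):
        with open(path, "a") as f:
            f.write(json.dumps(trace.to_dict()) + "\n")
    return on_trace

# Usage (assuming a configured Gleanr instance):
# gleanr.set_trace_callback(make_jsonl_trace_logger("traces.jsonl"))
```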

Markers

Gleanr uses markers to signal importance. They're auto-detected or manually specified:

# Auto-detected from content
await gleanr.ingest("assistant", "Decision: We'll use React for the frontend")
# Marker "decision" auto-detected

# Manually specified
await gleanr.ingest("user", "Important: Never use eval() in this codebase", markers=["constraint"])

Built-in marker types:

  • decision — Choices made
  • constraint — Limitations/requirements
  • failure — Things that didn't work
  • goal — Objectives to achieve
  • custom:* — Your own markers

Marked content gets priority in recall and influences fact extraction.
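Auto-detection can be as simple as prefix and keyword matching. The patterns below are illustrative guesses at the kind of heuristics involved, not Gleanr's actual rules:

```python
import re

# Illustrative patterns only; Gleanr's real detector may differ
MARKER_PATTERNS = {
    "decision": re.compile(r"^\s*decision\s*:", re.IGNORECASE),
    "constraint": re.compile(r"^\s*(important|constraint|never|must)\b", re.IGNORECASE),
    "failure": re.compile(r"\b(failed|didn't work|error)\b", re.IGNORECASE),
    "goal": re.compile(r"^\s*goal\s*:", re.IGNORECASE),
}

def detect_markers(content: str) -> list[str]:
    return [name for name, pat in MARKER_PATTERNS.items() if pat.search(content)]

assert detect_markers("Decision: We'll use React for the frontend") == ["decision"]
assert detect_markers("Important: Never use eval() in this codebase") == ["constraint"]
```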

Recall

Recall is automatic and token-budgeted:

context = await gleanr.recall(
    query="authentication",
    token_budget=2000,  # Max tokens to return
)

for item in context:
    print(f"[{item.role}] {item.content}")
    print(f"  Score: {item.score}, Markers: {item.markers}")

Recall prioritizes:

  1. High-relevance semantic matches
  2. Marked content (decisions, constraints, etc.)
  3. Current episode turns
  4. L2 facts from past episodes
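The budgeting above can be pictured as a greedy pack: rank candidates by score, then take items while they fit. A standalone sketch (field names mirror `ContextItem`, but the selection logic is our assumption, not Gleanr's exact algorithm):

```python
from dataclasses import dataclass

@dataclass
class Item:
    content: str
    score: float
    token_count: int

def pack(items: list[Item], token_budget: int) -> list[Item]:
    """Greedy pack: highest score first; the budget is a hard limit,
    so lower-priority items that don't fit are dropped."""
    selected, used = [], 0
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        if used + item.token_count <= token_budget:
            selected.append(item)
            used += item.token_count
    return selected

items = [
    Item("decision fact", score=0.9, token_count=50),
    Item("old turn", score=0.4, token_count=80),
    Item("constraint", score=0.8, token_count=60),
]
chosen = pack(items, token_budget=120)
# chosen contents: ["decision fact", "constraint"]; "old turn" is dropped (over budget)
```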

Providers

Embeddings

OpenAI:

from gleanr.providers.openai import OpenAIEmbedder
embedder = OpenAIEmbedder(api_key="sk-...")

Anthropic:

from gleanr.providers.anthropic import AnthropicEmbedder
embedder = AnthropicEmbedder(api_key="sk-ant-...")

Ollama (local):

# See examples/test_agent/llm.py for implementation
embedder = OllamaEmbedder(client)

Custom:

from gleanr.providers import Embedder

class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        ...

    @property
    def dimension(self) -> int:
        return 384
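For tests it can help to pair `InMemoryBackend` with a deterministic, network-free embedder. The toy below is shown standalone; in Gleanr you would subclass `Embedder` as above, and the hashing scheme is purely illustrative.

```python
import hashlib
import math

class HashEmbedder:
    """Deterministic toy embedder: hashes character trigrams into a fixed-size,
    L2-normalized vector. Useful for tests; not semantically meaningful."""

    def __init__(self, dimension: int = 384):
        self._dim = dimension

    @property
    def dimension(self) -> int:
        return self._dim

    async def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self._dim
            for i in range(len(text) - 2):
                h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
                vec[h % self._dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors
```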

Reflection

Reflection requires an LLM to extract facts:

from gleanr.providers.openai import OpenAIReflector
reflector = OpenAIReflector(api_key="sk-...")

Or implement your own:

from gleanr.providers import Reflector

class MyReflector(Reflector):
    async def reflect(self, episode, turns) -> list[Fact]:
        # Call your LLM to extract facts
        ...

Test Agent

Gleanr includes a fully functional test agent powered by Ollama for interactive experimentation.

Setup

# Install example dependencies
pip install -e ".[examples]"

# Start Ollama (if not running)
ollama serve

# Pull required models
ollama pull mistral:7b-instruct
ollama pull nomic-embed-text

Run

# Start a new session
python -m examples.test_agent.run --session my_test

# Resume an existing session (same command; state persists under the session id)
python -m examples.test_agent.run --session my_test

# Debug mode (shows recall items and Gleanr timings)
python -m examples.test_agent.run --session my_test --debug

Commands:

  • /stats — Show session statistics (turns, episodes, facts)
  • /recall <query> — Test recall directly
  • /episode — Close current episode (triggers reflection)
  • /debug — Toggle debug mode
  • /help — Show all commands
  • /quit — Exit

Evaluation Harness

Gleanr ships with an automated evaluation framework for measuring memory accuracy and latency across multi-turn conversations.

Quick Test

# Sanity check — 1 iteration, 10 turns
python -m examples.evaluation.run --quick

Full Evaluation

# Default: 80 sessions across 8 turn counts (10-80), decision_tracking scenario
python -m examples.evaluation.run

# Test consolidation accuracy with the progressive_requirements scenario
python -m examples.evaluation.run --scenario progressive_requirements --quick

# Custom configuration
python -m examples.evaluation.run \
    --scenario progressive_requirements \
    --turns 10,20,30,40 \
    --iterations 5 \
    --max-concurrent 3 \
    --verbose

# List all scenarios
python -m examples.evaluation.run --list-scenarios

Available Scenarios

| Scenario | Tests |
|---|---|
| decision_tracking | Recall of architectural decisions over time |
| constraint_awareness | Recall of constraints when relevant |
| failure_memory | Avoiding repeated failures |
| multi_fact_tracking | Independent recall of multiple facts |
| goal_tracking | Persistence of goals and objectives |
| progressive_requirements | Fact updates via consolidation — probes check updated facts, not originals |

Reports are saved as JSON and Markdown in ./evaluation_output/.

Configuration

Defaults work for most use cases. You only need GleanrConfig if you want to tune behavior.

Common Tuning

from gleanr import GleanrConfig
from gleanr.core.config import RecallConfig, ReflectionConfig

config = GleanrConfig(
    recall=RecallConfig(
        default_token_budget=4000,     # Match to your LLM's context window
    ),
    reflection=ReflectionConfig(
        max_facts_per_episode=10,      # Increase for dense conversations
    ),
)

| Setting | Default | When to change |
|---|---|---|
| recall.default_token_budget | 4000 | Your LLM can handle more/less context |
| reflection.max_facts_per_episode | 10 | Episodes are very dense or very sparse |
| episode_boundary.max_turns | 6 | Episodes are closing too early/late |

Full Reference

All configuration options:

from gleanr import GleanrConfig
from gleanr.core.config import EpisodeBoundaryConfig, RecallConfig, ReflectionConfig

config = GleanrConfig(
    auto_detect_markers=True,

    episode_boundary=EpisodeBoundaryConfig(
        max_turns=6,                # Close episode after N turns
        max_time_gap_seconds=1800,  # Close after 30min gap
        close_on_tool_result=True,  # Close after tool completion
    ),

    recall=RecallConfig(
        default_token_budget=4000,
        current_episode_budget_pct=0.2,  # Budget fraction for current episode
        min_relevance_threshold=0.3,     # Min embedding similarity for facts
        max_fact_candidates=20,          # Top-K facts after relevance filter
        facts_only_recall=True,          # Skip raw turns when facts exist
        current_episode_boost=0.2,       # Additive boost for current episode turns
        recall_dedup_threshold=0.85,     # Filter near-duplicate facts at recall
    ),

    reflection=ReflectionConfig(
        min_episode_turns=2,
        max_facts_per_episode=10,
        min_confidence=0.7,                       # Min confidence to save a fact
        max_active_facts=100,                     # Archive excess by confidence
        dedup_similarity_threshold=0.90,          # Save-time duplicate detection
        store_dedup_threshold=0.80,               # Post-reflection paraphrase dedup
        consolidation_similarity_threshold=0.15,  # Scoping for large fact sets
        consolidation_max_unscoped_facts=100,     # Send all facts below this count
        background=True,                          # Async reflection after episode close
    ),
)

API Reference

Gleanr Class

class Gleanr:
    async def initialize() -> None
    async def ingest(role: str, content: str, markers: list[str] = None) -> Turn
    async def recall(query: str, token_budget: int = None) -> list[ContextItem]
    async def close_episode(reason: str = "manual") -> str | None
    async def get_session_stats() -> SessionStats
    async def close() -> None

Models

@dataclass
class Turn:
    id: str
    session_id: str
    episode_id: str
    role: Role
    content: str
    markers: list[str]
    token_count: int
    created_at: datetime

@dataclass
class ContextItem:
    content: str
    role: Role
    markers: list[str]
    score: float
    token_count: int
    source_type: str  # "turn", "fact"
    source_id: str

Design Philosophy

Gleanr follows these principles:

  1. Store conclusions, not evidence — Don't store raw RAG results or chain-of-thought. Store what was decided and why.

  2. Memory is always-on — Unlike tools that are invoked, memory recall happens every turn automatically.

  3. Token budgets are hard limits — Never exceed the budget. Gracefully degrade by dropping lower-priority items.

  4. Episodes are mandatory — All turns belong to episodes. This enables reflection and provides natural grouping.

  5. Reflection is essential — L2 facts are the maintained, current-truth representation of session state. Without reflection, recall degrades significantly over long conversations.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=gleanr

# Type checking
mypy gleanr

Roadmap

Shipped:

  • Consolidating reflection — Facts update as requirements change
  • Deduplication — Embedding-based duplicate prevention
  • Contradiction detection — Resolve conflicting facts during consolidation
  • Observability — Reflection tracing with full input/output visibility
  • Evaluation harness — Automated accuracy and latency testing

Planned:

  • L3 Themes — Cross-episode patterns and user profiles
  • Async reflection queue — Non-blocking fact extraction (background mode)
  • Multi-agent support — Shared memory across agents
  • Cloud storage backends — Redis, PostgreSQL

License

MIT License — See LICENSE for details.

Contributing

Contributions welcome! Please read the design docs in PLAN.md to understand the architecture before submitting PRs.


Gleanr — Because agents should remember what matters.
