Gleanr — Agent Context Management System
Session-scoped memory for AI agents that actually remembers.
Gleanr is a Python SDK that gives your AI agents persistent, structured memory across conversations. Unlike RAG systems that retrieve external knowledge, Gleanr manages the agent's internal state—what it decided, what constraints it discovered, what failed, and what the user prefers.
```python
from gleanr import Gleanr

# Initialize with your session
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Ingest conversation turns
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL for its robust JSON support")

# Recall relevant context (token-budgeted)
context = await gleanr.recall("What database are we using?", token_budget=2000)
# Returns: [ContextItem(content="Decision: We'll use PostgreSQL...", markers=["decision"], ...)]
```
## Why Gleanr?

Current LLM applications treat agent memory as a search problem. But agent memory is not knowledge retrieval:
| Aspect | Knowledge Retrieval (RAG) | Agent Memory (Gleanr) |
|---|---|---|
| Scope | External corpus | Internal session state |
| Lifespan | Persistent | Session-bound with decay |
| Trigger | Explicit queries | Every turn, automatically |
| Content | Documents, facts | Decisions, constraints, outcomes |
After 30-40 turns, agents without proper memory:
- Forget decisions made earlier ("Didn't we decide to use PostgreSQL?")
- Repeat failed approaches
- Lose track of user preferences
- Contradict themselves
Gleanr solves this by automatically tracking what matters and recalling it when relevant.
## Key Features
- Automatic marker detection — Identifies decisions, constraints, failures, and goals in conversation
- Token-budgeted recall — Always fits in your context window
- Episode management — Groups related turns, triggers reflection on close
- LLM reflection with consolidation — Extracts durable facts from episodes and keeps them accurate as requirements evolve
- Staleness management — Consolidation detects changes first; facts describe current state, never carry stale references. Old versions are superseded (not deleted), maintaining an audit trail
- Two-level deduplication — Store-level dedup supersedes paraphrases after reflection; recall-time dedup filters near-duplicates before budget allocation
- Contradiction detection — Consolidation prompt identifies changes first and resolves conflicting facts
- Observability — Built-in reflection tracing for debugging and monitoring
- Pluggable storage — SQLite for persistence, in-memory for testing
- Provider agnostic — Works with OpenAI, Anthropic, Ollama, or any embedder
- Evaluation harness — Automated testing across 6 scenarios with latency profiling
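The two-level deduplication described above boils down to a cosine-similarity filter over embeddings. A minimal sketch of the idea, not Gleanr's actual implementation:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def filter_near_duplicates(
    items: list[tuple[str, list[float]]], threshold: float = 0.85
) -> list[str]:
    """Keep the first occurrence of each near-duplicate cluster;
    drop anything too similar to something already kept."""
    kept: list[tuple[str, list[float]]] = []
    for text, vec in items:
        if all(cosine(vec, kv) < threshold for _, kv in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]
```

Store-level dedup and recall-time dedup differ only in when this filter runs and which threshold applies.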
## Installation

```bash
# Core package (in-memory storage, no provider dependencies)
pip install gleanr

# With specific extras
pip install "gleanr[sqlite]"     # SQLite storage backend
pip install "gleanr[openai]"     # OpenAI provider
pip install "gleanr[anthropic]"  # Anthropic provider
pip install "gleanr[all]"        # All optional dependencies
```

For development:

```bash
git clone https://github.com/Saket-Kr/gleanr.git
cd gleanr
pip install -e ".[dev]"
```
## Quick Start

### 1. Basic Usage

```python
import asyncio

from gleanr import Gleanr
from gleanr.storage import InMemoryBackend

async def main():
    gleanr = Gleanr(
        session_id="demo",
        storage=InMemoryBackend(),
        embedder=your_embedder,    # See Providers section
        reflector=your_reflector,  # LLM-based fact extraction
    )
    await gleanr.initialize()

    # Ingest conversation turns
    await gleanr.ingest("user", "I need help building a REST API")
    await gleanr.ingest("assistant", "Decision: We'll use FastAPI for its automatic OpenAPI docs.")

    # Recall relevant context (token-budgeted)
    context = await gleanr.recall("What framework are we using?")
    print(context[0].content)  # "We'll use FastAPI..."

    await gleanr.close()

asyncio.run(main())
```
Defaults work out of the box — no config needed. Gleanr automatically detects episode boundaries, extracts facts via reflection, deduplicates, and manages staleness.
### 2. With SQLite Persistence

```python
from gleanr.storage import get_sqlite_backend

SQLiteBackend = get_sqlite_backend()
storage = SQLiteBackend("./agent_memory.db")

gleanr = Gleanr(
    session_id="user_123",
    storage=storage,
    embedder=embedder,
    reflector=reflector,
)
```

Sessions persist across restarts. Resume anytime with the same `session_id`.
### 3. How Reflection Works

When episodes close, Gleanr reflects on the conversation and extracts durable facts. On subsequent episodes, consolidation kicks in — existing facts are sent alongside new turns, and the reflector returns actions (keep/update/add/remove) to keep facts accurate:

```text
Episode 1 → Reflects → "Database is PostgreSQL", "API style is REST"
Episode 2 → User says "switch to MySQL"
          → Consolidates → UPDATE "Database is MySQL" (supersedes PostgreSQL fact)
                         → KEEP "API style is REST"
```

The old "PostgreSQL" fact is preserved with a `superseded_by` pointer for the audit trail, but only the current "MySQL" fact appears in recall results.
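The supersession model can be sketched with plain dataclasses (hypothetical names, not Gleanr's internal schema): an old fact is never deleted, it just gains a pointer to its successor, and recall only considers facts with no such pointer.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FactRecord:
    content: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    superseded_by: Optional[str] = None  # id of the fact that replaced this one

def supersede(store: dict[str, FactRecord], old_id: str, new_content: str) -> FactRecord:
    """Replace a fact: the old record stays for the audit trail."""
    new = FactRecord(content=new_content)
    store[new.id] = new
    store[old_id].superseded_by = new.id
    return new

def active_facts(store: dict[str, FactRecord]) -> list[FactRecord]:
    """Only current facts (no superseded_by pointer) are eligible for recall."""
    return [f for f in store.values() if f.superseded_by is None]
```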
Short episode carry-forward: If an episode has fewer turns than min_episode_turns, those turns are buffered and included in the next episode's reflection. No data is ever silently dropped.
## Memory Model

Gleanr uses a three-level memory hierarchy:

### L0: Raw Turns

Every message in the conversation. Short-lived, used for immediate context.

### L1: Episodes

Groups of related turns around a goal or task. Automatically detected via:

- Turn count thresholds
- Time gaps between messages
- Topic boundaries
- Tool result patterns
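Boundary detection along these signals can be sketched as a simple predicate. This is an illustration, not Gleanr's code; the defaults mirror the Configuration section, and topic boundaries are omitted since they require embeddings:

```python
from datetime import datetime, timedelta

def should_close_episode(
    turn_count: int,
    last_turn_at: datetime,
    now: datetime,
    saw_tool_result: bool,
    max_turns: int = 6,
    max_gap: timedelta = timedelta(seconds=1800),
) -> bool:
    """Close on any boundary signal: turn count, time gap, or tool completion."""
    if turn_count >= max_turns:
        return True
    if now - last_turn_at > max_gap:
        return True
    if saw_tool_result:
        return True
    return False
```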
### L2: Semantic Facts

Extracted from episodes via LLM reflection. Captures:

- **Decisions** — Choices made and their rationale
- **Constraints** — Limitations discovered
- **Failures** — What didn't work (to avoid repeating)
- **Goals** — User objectives
### 4. Observability (Reflection Tracing)

```python
from gleanr import Gleanr, ReflectionTrace

def on_trace(trace: ReflectionTrace):
    print(f"Reflection on episode {trace.episode_id} ({trace.mode})")
    print(f"  Input: {trace.input_turn_count} turns")
    if trace.prior_facts:
        print(f"  Prior facts: {len(trace.prior_facts)}")
    print(f"  Saved: {len(trace.saved_facts)} facts")
    print(f"  Superseded: {len(trace.superseded_facts)} facts")
    print(f"  Elapsed: {trace.elapsed_ms}ms")

gleanr = Gleanr(session_id="demo", storage=storage, embedder=embedder, reflector=reflector)
gleanr.set_trace_callback(on_trace)
await gleanr.initialize()
```

Traces capture the full reflection pipeline: input turns, prior facts, scoped facts, raw LLM output (actions or facts), saved facts, and superseded facts. Use `trace.to_dict()` for JSON serialization.
## Markers

Gleanr uses markers to signal importance. They're auto-detected or manually specified:

```python
# Auto-detected from content
await gleanr.ingest("assistant", "Decision: We'll use React for the frontend")
# Marker "decision" auto-detected

# Manually specified
await gleanr.ingest("user", "Important: Never use eval() in this codebase", markers=["constraint"])
```
Built-in marker types:

- `decision` — Choices made
- `constraint` — Limitations/requirements
- `failure` — Things that didn't work
- `goal` — Objectives to achieve
- `custom:*` — Your own markers
Marked content gets priority in recall and influences fact extraction.
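Auto-detection can be approximated with prefix patterns like these. The patterns are illustrative only; Gleanr's real detector may use different rules:

```python
import re

# Hypothetical prefix patterns, one per built-in marker type.
MARKER_PATTERNS = {
    "decision": re.compile(r"^\s*decision\s*:", re.IGNORECASE),
    "constraint": re.compile(r"^\s*(constraint|important|never)\b[:\s]", re.IGNORECASE),
    "failure": re.compile(r"^\s*(failure|failed)\s*:", re.IGNORECASE),
    "goal": re.compile(r"^\s*goal\s*:", re.IGNORECASE),
}

def detect_markers(content: str) -> list[str]:
    """Return every built-in marker whose pattern matches the message start."""
    return [name for name, pat in MARKER_PATTERNS.items() if pat.search(content)]
```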
## Recall

Recall is automatic and token-budgeted:

```python
context = await gleanr.recall(
    query="authentication",
    token_budget=2000,  # Max tokens to return
)

for item in context:
    print(f"[{item.role}] {item.content}")
    print(f"  Score: {item.score}, Markers: {item.markers}")
```
Recall prioritizes:
- High-relevance semantic matches
- Marked content (decisions, constraints, etc.)
- Current episode turns
- L2 facts from past episodes
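Enforcing the budget against a prioritized candidate list amounts to greedy packing by score. A sketch under that assumption (not Gleanr's actual allocator, which also splits the budget between facts and current-episode turns):

```python
def pack_context(
    candidates: list[tuple[float, int, str]],  # (score, token_count, content)
    token_budget: int,
) -> list[str]:
    """Greedy selection: highest score first; skip anything that would
    push total tokens past the budget. The budget is a hard limit."""
    picked: list[str] = []
    used = 0
    for score, tokens, content in sorted(candidates, key=lambda c: -c[0]):
        if used + tokens <= token_budget:
            picked.append(content)
            used += tokens
    return picked
```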
## Providers

### Embeddings

**OpenAI:**

```python
from gleanr.providers.openai import OpenAIEmbedder

embedder = OpenAIEmbedder(api_key="sk-...")
```

**Anthropic:**

```python
from gleanr.providers.anthropic import AnthropicEmbedder

embedder = AnthropicEmbedder(api_key="sk-ant-...")
```

**Ollama (local):**

```python
# See examples/test_agent/llm.py for implementation
embedder = OllamaEmbedder(client)
```
**Custom:**

```python
from gleanr.providers import Embedder

class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        ...

    @property
    def dimension(self) -> int:
        return 384
```
### Reflection

Reflection requires an LLM to extract facts:

```python
from gleanr.providers.openai import OpenAIReflector

reflector = OpenAIReflector(api_key="sk-...")
```

Or implement your own:

```python
from gleanr.providers import Reflector

class MyReflector(Reflector):
    async def reflect(self, episode, turns) -> list[Fact]:
        # Call your LLM to extract facts
        ...
```
## Test Agent
Gleanr includes a fully functional test agent powered by Ollama for interactive experimentation.
### Setup

```bash
# Install example dependencies
pip install -e ".[examples]"

# Start Ollama (if not running)
ollama serve

# Pull required models
ollama pull mistral:7b-instruct
ollama pull nomic-embed-text
```
### Run

```bash
# Start a new session
python -m examples.test_agent.run --session my_test

# Resume an existing session (same command, same session id)
python -m examples.test_agent.run --session my_test

# Debug mode (shows recall items and Gleanr timings)
python -m examples.test_agent.run --session my_test --debug
```
Commands:

- `/stats` — Show session statistics (turns, episodes, facts)
- `/recall <query>` — Test recall directly
- `/episode` — Close current episode (triggers reflection)
- `/debug` — Toggle debug mode
- `/help` — Show all commands
- `/quit` — Exit
## Evaluation Harness
Gleanr ships with an automated evaluation framework for measuring memory accuracy and latency across multi-turn conversations.
### Quick Test

```bash
# Sanity check — 1 iteration, 10 turns
python -m examples.evaluation.run --quick
```
### Full Evaluation

```bash
# Default: 80 sessions across 8 turn counts (10-80), decision_tracking scenario
python -m examples.evaluation.run

# Test consolidation accuracy with the progressive_requirements scenario
python -m examples.evaluation.run --scenario progressive_requirements --quick

# Custom configuration
python -m examples.evaluation.run \
    --scenario progressive_requirements \
    --turns 10,20,30,40 \
    --iterations 5 \
    --max-concurrent 3 \
    --verbose

# List all scenarios
python -m examples.evaluation.run --list-scenarios
```
### Available Scenarios

| Scenario | Tests |
|---|---|
| `decision_tracking` | Recall of architectural decisions over time |
| `constraint_awareness` | Recall of constraints when relevant |
| `failure_memory` | Avoiding repeated failures |
| `multi_fact_tracking` | Independent recall of multiple facts |
| `goal_tracking` | Persistence of goals and objectives |
| `progressive_requirements` | Fact updates via consolidation — probes check updated facts, not originals |
Reports are saved as JSON and Markdown in ./evaluation_output/.
## Configuration

Defaults work for most use cases. You only need `GleanrConfig` if you want to tune behavior.
### Common Tuning

```python
from gleanr import GleanrConfig
from gleanr.core.config import RecallConfig, ReflectionConfig

config = GleanrConfig(
    recall=RecallConfig(
        default_token_budget=4000,  # Match to your LLM's context window
    ),
    reflection=ReflectionConfig(
        max_facts_per_episode=10,  # Increase for dense conversations
    ),
)
```
| Setting | Default | When to change |
|---|---|---|
| `recall.default_token_budget` | 4000 | Your LLM can handle more/less context |
| `reflection.max_facts_per_episode` | 10 | Episodes are very dense or very sparse |
| `episode_boundary.max_turns` | 6 | Episodes are closing too early/late |
### Full Reference

All configuration options:

```python
from gleanr import GleanrConfig
from gleanr.core.config import EpisodeBoundaryConfig, RecallConfig, ReflectionConfig

config = GleanrConfig(
    auto_detect_markers=True,
    episode_boundary=EpisodeBoundaryConfig(
        max_turns=6,                # Close episode after N turns
        max_time_gap_seconds=1800,  # Close after 30min gap
        close_on_tool_result=True,  # Close after tool completion
    ),
    recall=RecallConfig(
        default_token_budget=4000,
        current_episode_budget_pct=0.2,  # Budget fraction for current episode
        min_relevance_threshold=0.3,     # Min embedding similarity for facts
        max_fact_candidates=20,          # Top-K facts after relevance filter
        facts_only_recall=True,          # Skip raw turns when facts exist
        current_episode_boost=0.2,       # Additive boost for current episode turns
        recall_dedup_threshold=0.85,     # Filter near-duplicate facts at recall
    ),
    reflection=ReflectionConfig(
        min_episode_turns=2,
        max_facts_per_episode=10,
        min_confidence=0.7,                      # Min confidence to save a fact
        max_active_facts=100,                    # Archive excess by confidence
        dedup_similarity_threshold=0.90,         # Save-time duplicate detection
        store_dedup_threshold=0.80,              # Post-reflection paraphrase dedup
        consolidation_similarity_threshold=0.15, # Scoping for large fact sets
        consolidation_max_unscoped_facts=100,    # Send all facts below this count
        background=True,                         # Async reflection after episode close
    ),
)
```
## API Reference

### Gleanr Class

```python
class Gleanr:
    async def initialize(self) -> None: ...
    async def ingest(self, role: str, content: str, markers: list[str] | None = None) -> Turn: ...
    async def recall(self, query: str, token_budget: int | None = None) -> list[ContextItem]: ...
    async def close_episode(self, reason: str = "manual") -> str | None: ...
    async def get_session_stats(self) -> SessionStats: ...
    async def close(self) -> None: ...
```
### Models

```python
@dataclass
class Turn:
    id: str
    session_id: str
    episode_id: str
    role: Role
    content: str
    markers: list[str]
    token_count: int
    created_at: datetime

@dataclass
class ContextItem:
    content: str
    role: Role
    markers: list[str]
    score: float
    token_count: int
    source_type: str  # "turn", "fact"
    source_id: str
```
## Design Philosophy

Gleanr follows these principles:

1. **Store conclusions, not evidence** — Don't store raw RAG results or chain-of-thought. Store what was decided and why.
2. **Memory is always-on** — Unlike tools that are invoked, memory recall happens every turn automatically.
3. **Token budgets are hard limits** — Never exceed the budget. Gracefully degrade by dropping lower-priority items.
4. **Episodes are mandatory** — All turns belong to episodes. This enables reflection and provides natural grouping.
5. **Reflection is essential** — L2 facts are the maintained, current-truth representation of session state. Without reflection, recall degrades significantly over long conversations.
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=gleanr

# Type checking
mypy gleanr
```
## Roadmap
- Consolidating reflection — Facts update as requirements change
- Deduplication — Embedding-based duplicate prevention
- Contradiction detection — Resolve conflicting facts during consolidation
- Observability — Reflection tracing with full input/output visibility
- Evaluation harness — Automated accuracy and latency testing
- L3 Themes — Cross-episode patterns and user profiles
- Async reflection queue — Non-blocking fact extraction (background mode)
- Multi-agent support — Shared memory across agents
- Cloud storage backends — Redis, PostgreSQL
## License
MIT License — See LICENSE for details.
## Contributing

Contributions welcome! Please read the design docs in `PLAN.md` to understand the architecture before submitting PRs.
Gleanr — Because agents should remember what matters.