# Gleanr — Agent Context Management System

*Session-scoped memory for AI agents that actually remembers.*
Gleanr is a Python SDK that gives your AI agents persistent, structured memory across conversations. Unlike RAG systems that retrieve external knowledge, Gleanr manages the agent's internal state—what it decided, what constraints it discovered, what failed, and what the user prefers.
```python
from gleanr import Gleanr

# Initialize with your session
gleanr = Gleanr(session_id="user_123", storage=storage, embedder=embedder)
await gleanr.initialize()

# Ingest conversation turns
await gleanr.ingest("user", "Let's use PostgreSQL for the database")
await gleanr.ingest("assistant", "Decision: We'll use PostgreSQL for its robust JSON support")

# Recall relevant context (token-budgeted)
context = await gleanr.recall("What database are we using?", token_budget=2000)
# Returns: [ContextItem(content="Decision: We'll use PostgreSQL...", markers=["decision"], ...)]
```
## Why Gleanr?
Current LLM applications treat agent memory as a search problem. But agent memory is not knowledge retrieval:
| Aspect | Knowledge Retrieval (RAG) | Agent Memory (Gleanr) |
|---|---|---|
| Scope | External corpus | Internal session state |
| Lifespan | Persistent | Session-bound with decay |
| Trigger | Explicit queries | Every turn, automatically |
| Content | Documents, facts | Decisions, constraints, outcomes |
After 30-40 turns, agents without proper memory:
- Forget decisions made earlier ("Didn't we decide to use PostgreSQL?")
- Repeat failed approaches
- Lose track of user preferences
- Contradict themselves
Gleanr solves this by automatically tracking what matters and recalling it when relevant.
## Key Features
- Automatic marker detection — Identifies decisions, constraints, failures, and goals in conversation
- Token-budgeted recall — Always fits in your context window
- Episode management — Groups related turns, triggers reflection on close
- LLM reflection with consolidation — Extracts durable facts and keeps them accurate as requirements evolve
- Fact supersession — When facts change, old versions are superseded (not deleted), maintaining an audit trail
- Deduplication — Embedding-based duplicate detection prevents redundant facts
- Contradiction detection — Consolidation prompt identifies and resolves conflicting facts
- Observability — Built-in reflection tracing for debugging and monitoring
- Pluggable storage — SQLite for persistence, in-memory for testing
- Provider agnostic — Works with OpenAI, Anthropic, Ollama, or any embedder
- Evaluation harness — Automated testing across 6 scenarios with latency profiling
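Deduplication and consolidation scoping both rest on embedding similarity. As a rough sketch of the idea (not Gleanr's actual implementation), a candidate fact can be rejected when its cosine similarity to any stored fact crosses the threshold:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(candidate: list[float], existing: list[list[float]],
                 threshold: float = 0.95) -> bool:
    """Reject a fact whose embedding is too close to one already stored."""
    return any(cosine(candidate, e) >= threshold for e in existing)

stored = [[1.0, 0.0], [0.0, 1.0]]
print(is_duplicate([0.99, 0.05], stored))  # near-copy of the first vector -> True
print(is_duplicate([0.7, 0.7], stored))    # distinct direction -> False
```

The 0.95 default matches `dedup_similarity_threshold` in the Configuration section below.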
## Installation

```bash
# Core package (in-memory storage, no provider dependencies)
pip install gleanr

# With specific extras
pip install "gleanr[sqlite]"     # SQLite storage backend
pip install "gleanr[openai]"     # OpenAI provider
pip install "gleanr[anthropic]"  # Anthropic provider
pip install "gleanr[all]"        # All optional dependencies
```
For development:

```bash
git clone https://github.com/Saket-Kr/gleanr.git
cd gleanr
pip install -e ".[dev]"
```
## Quick Start

### 1. Basic Usage

```python
import asyncio

from gleanr import Gleanr, GleanrConfig
from gleanr.storage import InMemoryBackend

async def main():
    # Setup
    storage = InMemoryBackend()
    gleanr = Gleanr(
        session_id="demo",
        storage=storage,
        embedder=your_embedder,  # See Providers section
    )
    await gleanr.initialize()

    # Conversation
    await gleanr.ingest("user", "I need help building a REST API")
    await gleanr.ingest(
        "assistant",
        "I'll help you build a REST API. Decision: We'll use FastAPI for its automatic OpenAPI docs.",
    )

    # Later in conversation...
    context = await gleanr.recall("What framework are we using?")
    print(context[0].content)  # "Decision: We'll use FastAPI..."

    await gleanr.close()

asyncio.run(main())
```
### 2. With SQLite Persistence

```python
from gleanr.storage import get_sqlite_backend

SQLiteBackend = get_sqlite_backend()
storage = SQLiteBackend("./agent_memory.db")

gleanr = Gleanr(
    session_id="user_123",
    storage=storage,
    embedder=embedder,
)
```
Sessions persist across restarts. Resume anytime with the same `session_id`.
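The persistence idea can be illustrated without the SDK. This standalone sketch (hypothetical schema, not Gleanr's actual backend) stores turns keyed by `session_id` in SQLite, so reopening the same file resumes the session:

```python
import sqlite3

def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a simplified single-table turn store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns (session_id TEXT, role TEXT, content TEXT)"
    )
    return conn

def add_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    conn.execute("INSERT INTO turns VALUES (?, ?, ?)", (session_id, role, content))
    conn.commit()

def load_turns(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Everything previously ingested for this session, in insertion order."""
    return conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ?", (session_id,)
    ).fetchall()

# Use a file path like "./agent_memory.db" to survive restarts;
# ":memory:" keeps the demo self-contained.
conn = open_store(":memory:")
add_turn(conn, "user_123", "user", "Let's use PostgreSQL")
print(load_turns(conn, "user_123"))  # [('user', "Let's use PostgreSQL")]
```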
### 3. With Reflection and Consolidation

```python
from gleanr import GleanrConfig
from gleanr.core.config import ReflectionConfig

config = GleanrConfig(
    reflection=ReflectionConfig(
        enabled=True,
        min_episode_turns=2,
        max_facts_per_episode=10,
        dedup_similarity_threshold=0.95,  # Prevent duplicate facts
    )
)

gleanr = Gleanr(
    session_id="demo",
    storage=storage,
    embedder=embedder,
    reflector=your_reflector,  # LLM-based fact extractor
    config=config,
)
```
When episodes close, Gleanr reflects on the conversation and extracts durable facts. On subsequent episodes, consolidation kicks in — existing facts are sent alongside new turns, and the reflector returns actions (keep/update/add/remove) to keep facts accurate:
```
Episode 1 → Reflects → "Database is PostgreSQL", "API style is REST"
Episode 2 → User says "switch to MySQL"
          → Consolidates → UPDATE "Database is MySQL" (supersedes PostgreSQL fact)
                         → KEEP   "API style is REST"
```
The old "PostgreSQL" fact is preserved with a `superseded_by` pointer for the audit trail, but only the current "MySQL" fact appears in recall results.
**Short episode carry-forward:** If an episode has fewer turns than `min_episode_turns`, those turns are buffered and included in the next episode's reflection. No data is ever silently dropped.
## Memory Model

Gleanr uses a three-level memory hierarchy:

### L0: Raw Turns

Every message in the conversation. Short-lived, used for immediate context.

### L1: Episodes

Groups of related turns around a goal or task. Automatically detected via:

- Turn count thresholds
- Time gaps between messages
- Topic boundaries
- Tool result patterns
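The boundary logic can be sketched as an any-of check (function and parameter names are illustrative; the defaults mirror the Configuration section's `max_turns=6` and `max_time_gap_seconds=1800`):

```python
def should_close_episode(turn_count: int,
                         seconds_since_last: float,
                         last_was_tool_result: bool,
                         max_turns: int = 6,
                         max_gap: float = 1800.0) -> bool:
    """Close the current episode when any boundary signal fires."""
    return (
        turn_count >= max_turns            # turn-count threshold
        or seconds_since_last >= max_gap   # long silence between messages
        or last_was_tool_result            # a tool just finished its task
    )

print(should_close_episode(3, 5.0, False))     # mid-episode -> False
print(should_close_episode(6, 5.0, False))     # hit turn cap -> True
print(should_close_episode(2, 3600.0, False))  # one-hour gap -> True
```

Topic-boundary detection is omitted here since it requires semantic comparison of turns.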
### L2: Semantic Facts

Extracted from episodes via LLM reflection. Captures:

- **Decisions** — Choices made and their rationale
- **Constraints** — Limitations discovered
- **Failures** — What didn't work (to avoid repeating)
- **Goals** — User objectives
### 4. Observability (Reflection Tracing)

```python
from gleanr import Gleanr, ReflectionTrace

def on_trace(trace: ReflectionTrace):
    print(f"Reflection on episode {trace.episode_id} ({trace.mode})")
    print(f"  Input: {trace.input_turn_count} turns")
    if trace.prior_facts:
        print(f"  Prior facts: {len(trace.prior_facts)}")
    print(f"  Saved: {len(trace.saved_facts)} facts")
    print(f"  Superseded: {len(trace.superseded_facts)} facts")
    print(f"  Elapsed: {trace.elapsed_ms}ms")

gleanr = Gleanr(session_id="demo", storage=storage, embedder=embedder, reflector=reflector)
gleanr.set_trace_callback(on_trace)
await gleanr.initialize()
```
Traces capture the full reflection pipeline: input turns, prior facts, scoped facts, raw LLM output (actions or facts), saved facts, and superseded facts. Use `trace.to_dict()` for JSON serialization.
## Markers

Gleanr uses markers to signal importance. They're auto-detected or manually specified:

```python
# Auto-detected from content
await gleanr.ingest("assistant", "Decision: We'll use React for the frontend")
# Marker "decision" auto-detected

# Manually specified
await gleanr.ingest("user", "Important: Never use eval() in this codebase", markers=["constraint"])
```
Built-in marker types:

- `decision` — Choices made
- `constraint` — Limitations/requirements
- `failure` — Things that didn't work
- `goal` — Objectives to achieve
- `custom:*` — Your own markers
Marked content gets priority in recall and influences fact extraction.
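Auto-detection can be approximated with simple cue patterns. The patterns below are illustrative guesses at the kind of signals a detector might use, not Gleanr's actual rules:

```python
import re

# Hypothetical cue patterns; the real detector is not shown in this README.
MARKER_PATTERNS = {
    "decision": re.compile(r"\bdecision:", re.IGNORECASE),
    "constraint": re.compile(r"\b(never|must not|important:)", re.IGNORECASE),
    "failure": re.compile(r"\b(failed|didn't work)", re.IGNORECASE),
    "goal": re.compile(r"\bgoal:", re.IGNORECASE),
}

def detect_markers(content: str) -> list[str]:
    """Return every marker whose cue pattern appears in the content."""
    return [name for name, pat in MARKER_PATTERNS.items() if pat.search(content)]

print(detect_markers("Decision: We'll use React for the frontend"))  # ['decision']
print(detect_markers("Important: Never use eval() here"))            # ['constraint']
```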
## Recall

Recall is automatic and token-budgeted:

```python
context = await gleanr.recall(
    query="authentication",
    token_budget=2000,  # Max tokens to return
)

for item in context:
    print(f"[{item.role}] {item.content}")
    print(f"  Score: {item.score}, Markers: {item.markers}")
```
Recall prioritizes:
- High-relevance semantic matches
- Marked content (decisions, constraints, etc.)
- Current episode turns
- L2 facts from past episodes
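Enforcing the hard token budget can be sketched as a greedy pack over scored items: take the highest-scoring first and skip anything that would exceed the budget (the idea only; Gleanr's actual ranking and packing may differ):

```python
from dataclasses import dataclass

@dataclass
class Item:
    content: str
    score: float
    token_count: int

def pack_within_budget(items: list[Item], token_budget: int) -> list[Item]:
    """Greedy selection: highest-scoring items first, never exceed the budget."""
    chosen: list[Item] = []
    used = 0
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        if used + item.token_count <= token_budget:
            chosen.append(item)
            used += item.token_count
    return chosen

items = [
    Item("Decision: PostgreSQL", 0.9, 8),
    Item("Goal: ship MVP", 0.7, 6),
    Item("Small talk", 0.2, 50),
]
picked = pack_within_budget(items, token_budget=20)
print([i.content for i in picked])  # ['Decision: PostgreSQL', 'Goal: ship MVP']
```

This degrades gracefully, as the Design Philosophy section requires: low-priority items are dropped rather than the budget being exceeded.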
## Providers

### Embeddings

OpenAI:

```python
from gleanr.providers.openai import OpenAIEmbedder

embedder = OpenAIEmbedder(api_key="sk-...")
```

Anthropic:

```python
from gleanr.providers.anthropic import AnthropicEmbedder

embedder = AnthropicEmbedder(api_key="sk-ant-...")
```

Ollama (local):

```python
# See examples/test_agent/llm.py for implementation
embedder = OllamaEmbedder(client)
```

Custom:

```python
from gleanr.providers import Embedder

class MyEmbedder(Embedder):
    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Your implementation
        ...

    @property
    def dimension(self) -> int:
        return 384
```
### Reflection

Reflection requires an LLM to extract facts:

```python
from gleanr.providers.openai import OpenAIReflector

reflector = OpenAIReflector(api_key="sk-...")
```

Or implement your own:

```python
from gleanr.providers import Reflector

class MyReflector(Reflector):
    async def reflect(self, episode, turns) -> list[Fact]:
        # Call your LLM to extract facts
        ...
```
## Test Agent

Gleanr includes a fully functional test agent powered by Ollama for interactive experimentation.

### Setup

```bash
# Install example dependencies
pip install -e ".[examples]"

# Start Ollama (if not running)
ollama serve

# Pull required models
ollama pull mistral:7b-instruct
ollama pull nomic-embed-text
```

### Run

```bash
# Start a new session (or resume an existing one with the same name)
python -m examples.test_agent.run --session my_test

# Debug mode (shows recall items and Gleanr timings)
python -m examples.test_agent.run --session my_test --debug
```
Commands:

- `/stats` — Show session statistics (turns, episodes, facts)
- `/recall <query>` — Test recall directly
- `/episode` — Close current episode (triggers reflection)
- `/debug` — Toggle debug mode
- `/help` — Show all commands
- `/quit` — Exit
## Evaluation Harness

Gleanr ships with an automated evaluation framework for measuring memory accuracy and latency across multi-turn conversations.

### Quick Test

```bash
# Sanity check — 1 iteration, 10 turns
python -m examples.evaluation.run --quick
```

### Full Evaluation

```bash
# Default: 80 sessions across 8 turn counts (10-80), decision_tracking scenario
python -m examples.evaluation.run

# Test consolidation accuracy with the progressive_requirements scenario
python -m examples.evaluation.run --scenario progressive_requirements --quick

# Custom configuration
python -m examples.evaluation.run \
    --scenario progressive_requirements \
    --turns 10,20,30,40 \
    --iterations 5 \
    --max-concurrent 3 \
    --verbose

# List all scenarios
python -m examples.evaluation.run --list-scenarios
```
### Available Scenarios

| Scenario | Tests |
|---|---|
| `decision_tracking` | Recall of architectural decisions over time |
| `constraint_awareness` | Recall of constraints when relevant |
| `failure_memory` | Avoiding repeated failures |
| `multi_fact_tracking` | Independent recall of multiple facts |
| `goal_tracking` | Persistence of goals and objectives |
| `progressive_requirements` | Fact updates via consolidation — probes check updated facts, not originals |
Reports are saved as JSON and Markdown in `./evaluation_output/`.
## Configuration

```python
from gleanr import GleanrConfig
from gleanr.core.config import EpisodeBoundaryConfig, RecallConfig, ReflectionConfig

config = GleanrConfig(
    auto_detect_markers=True,  # Auto-detect decision/constraint/etc.
    episode_boundary=EpisodeBoundaryConfig(
        max_turns=6,                # Close episode after N turns
        max_time_gap_seconds=1800,  # Close after 30min gap
        close_on_tool_result=True,  # Close after tool completion
    ),
    recall=RecallConfig(
        default_token_budget=4000,
        current_episode_budget_pct=0.2,  # 20% of budget for current episode
        min_relevance_threshold=0.3,     # Filter low-relevance facts from recall
    ),
    reflection=ReflectionConfig(
        enabled=True,
        min_episode_turns=2,
        max_facts_per_episode=10,
        max_active_facts=100,                     # Archive excess facts automatically
        consolidation_similarity_threshold=0.15,  # Scoping threshold for prior facts
        consolidation_max_unscoped_facts=50,      # Skip scoping below this count
        dedup_similarity_threshold=0.95,          # Duplicate detection threshold
    ),
)
```
## API Reference

### Gleanr Class

```python
class Gleanr:
    async def initialize() -> None
    async def ingest(role: str, content: str, markers: list[str] = None) -> Turn
    async def recall(query: str, token_budget: int = None) -> list[ContextItem]
    async def close_episode(reason: str = "manual") -> str | None
    async def get_session_stats() -> SessionStats
    async def close() -> None
```

### Models

```python
@dataclass
class Turn:
    id: str
    session_id: str
    episode_id: str
    role: Role
    content: str
    markers: list[str]
    token_count: int
    created_at: datetime

@dataclass
class ContextItem:
    content: str
    role: Role
    markers: list[str]
    score: float
    token_count: int
    source_type: str  # "turn", "fact"
    source_id: str
```
## Design Philosophy

Gleanr follows these principles:

1. **Store conclusions, not evidence** — Don't store raw RAG results or chain-of-thought. Store what was decided and why.
2. **Memory is always-on** — Unlike tools that are invoked, memory recall happens every turn automatically.
3. **Token budgets are hard limits** — Never exceed the budget. Gracefully degrade by dropping lower-priority items.
4. **Episodes are mandatory** — All turns belong to episodes. This enables reflection and provides natural grouping.
5. **Reflection is optional but valuable** — The system works without it (L1 episodes remain functional), but L2 facts dramatically improve long-term recall.
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=gleanr

# Type checking
mypy gleanr
```
## Roadmap
- Consolidating reflection — Facts update as requirements change
- Deduplication — Embedding-based duplicate prevention
- Contradiction detection — Resolve conflicting facts during consolidation
- Observability — Reflection tracing with full input/output visibility
- Evaluation harness — Automated accuracy and latency testing
- L3 Themes — Cross-episode patterns and user profiles
- Async reflection queue — Non-blocking fact extraction
- Multi-agent support — Shared memory across agents
- Cloud storage backends — Redis, PostgreSQL
## License

MIT License — See `LICENSE` for details.
## Contributing

Contributions welcome! Please read the design docs in `PLAN.md` to understand the architecture before submitting PRs.

*Gleanr — Because agents should remember what matters.*
## File details

Details for the file `gleanr-0.3.0.tar.gz`:

- Size: 117.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b280c467080e91ea9f8628c60814a58159e559ae3c02e1894b9cc69593d74496` |
| MD5 | `1d58c799b282c60159538ebb613dff7f` |
| BLAKE2b-256 | `a6371b3a6f3be8a90036d895a12b0ac4b8fcee1b2445650c799101a79b09ac37` |
Details for the file `gleanr-0.3.0-py3-none-any.whl`:

- Size: 66.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `671778b0f264950778529dcab100635b453683f51724945cc09352f5e00bb9d4` |
| MD5 | `da1c37f8393d9ab0b7f349d1d210cc4d` |
| BLAKE2b-256 | `60d49d438247bba4da55fb67a80653a8cbe85572367f3ec5eacf4bbf73175d74` |