Intelligent context management for LangGraph chatbots using the Librarian select-then-hydrate architecture

📚 LangGraph Librarian

Intelligent context management for LangGraph chatbots.

The Librarian uses LLM reasoning to select the most relevant messages from conversation history, cutting token usage while improving answer quality (see Benchmarks). Instead of feeding your agent the entire conversation (Brute Force) or relying on vector similarity (RAG), the Librarian reasons about what's relevant.

How It Works

┌────────────────────────────────────────────────────────────────────┐
│                    LangGraph Librarian Pipeline                    │
│                                                                    │
│   ┌──────────────────┐   ┌─────────┐   ┌──────────────────────┐    │
│   │ select & hydrate │──▶│  Agent  │──▶│ index new messages   │    │
│   │ (reads index,    │   │ (YOUR   │   │ (summarizes new msgs │    │
│   │  picks relevant  │   │  CODE)  │   │  for next turn)      │    │
│   │  messages)       │   │         │   │                      │    │
│   └──────────────────┘   └─────────┘   └──────────────────────┘    │
│        ONLINE ⚡              │              OFFLINE 🔄             │
│   User waits for this    User sees       Runs after response       │
│                          response here   (zero latency impact)     │
└────────────────────────────────────────────────────────────────────┘

The Select-Then-Hydrate Architecture

  1. Index (async, post-response): Each message is summarized into a ≤3-sentence entry (~104 tokens vs. ~328 tokens for the original). This runs after the agent responds, so the user never waits for it.

  2. Select (online): An LLM reasons over the summary index to pick which messages are relevant to the current query. This uses reasoning rather than keyword matching, so it can identify temporal dependencies such as "Message 3 modifies Message 1".

  3. Hydrate (online, no LLM): Fetches the full original content for the selected messages and always includes the most recent turns for conversational coherence.
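
The three stages can be sketched in plain Python with stand-ins for the two LLM calls. This only illustrates the data flow and is not the library's implementation: `Entry`, `summarize`, `keywords`, `select`, and `hydrate` are hypothetical names, the summarizer simply truncates, and the selector uses word overlap purely so the example runs without a model (the real selector reasons with an LLM).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one row of the summary index."""
    id: int
    role: str
    summary: str


def summarize(text: str, max_words: int = 12) -> str:
    # Stand-in for the indexer LLM: keep the first few words.
    return " ".join(text.split()[:max_words])


def keywords(text: str) -> set[str]:
    # Lowercased words longer than three characters, punctuation stripped.
    cleaned = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def build_index(messages: list[tuple[str, str]]) -> list[Entry]:
    # Index: one short entry per message, keyed by its position.
    return [Entry(i, role, summarize(text)) for i, (role, text) in enumerate(messages)]


def select(index: list[Entry], query: str) -> list[int]:
    # Stand-in for the selector LLM: keep entries sharing a keyword with the query.
    wanted = keywords(query)
    return [e.id for e in index if wanted & keywords(e.summary)]


def hydrate(messages, selected_ids, recent=2):
    # Hydrate: full text of the selected ids plus the last `recent` messages,
    # returned in original order (trailing messages are used here for simplicity).
    tail = range(max(0, len(messages) - recent), len(messages))
    keep = set(selected_ids) | set(tail)
    return [messages[i] for i in sorted(keep)]


messages = [
    ("human", "The budget for the launch is 40k."),
    ("ai", "Noted, 40k for the launch."),
    ("human", "Let's talk about the team offsite in June."),
    ("ai", "Sure, June works for the offsite."),
    ("human", "Can you book a venue?"),
    ("ai", "I will look for venues."),
]
index = build_index(messages)
selected = select(index, "What was the budget?")
curated = hydrate(messages, selected)
print(selected)  # [0]
print(len(curated), "of", len(messages), "messages kept")  # 3 of 6 messages kept
```

Only the select step would need a model at query time; building the index can happen after the previous response has already been returned.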

Benchmarks

Tested on synthetic chat datasets with varying noise levels (150–1000 words per message):

| Metric | Librarian | Brute Force | Vector RAG |
| --- | --- | --- | --- |
| Answer Success | 82.2% | 77.8% | 57.8% |
| Context Success | 80.0% | 100% | 68.9% |
| Context Tokens | ~800 | ~2180 | ~1400 |
| Cost (tokens) | ~2800 | ~4520 | ~2950 |

The Librarian outperforms Brute Force on answer success (82.2% vs. 77.8%) by removing distracting noise, and beats Vector RAG on context success by about 11 percentage points (80.0% vs. 68.9%).
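
As a quick check of what the table implies, the savings relative to Brute Force can be worked out directly from the approximate token counts above. The helper name `reduction_pct` is just for this sketch.

```python
# Figures copied from the benchmark table (approximate token counts).
context_tokens = {"Librarian": 800, "Brute Force": 2180, "Vector RAG": 1400}
cost_tokens = {"Librarian": 2800, "Brute Force": 4520, "Vector RAG": 2950}


def reduction_pct(baseline: float, value: float) -> float:
    """Percentage saved relative to the baseline, rounded to one decimal."""
    return round((1 - value / baseline) * 100, 1)


context_saving = reduction_pct(context_tokens["Brute Force"], context_tokens["Librarian"])
cost_saving = reduction_pct(cost_tokens["Brute Force"], cost_tokens["Librarian"])
# Gap in context success between Librarian and Vector RAG, in percentage points.
context_gap_points = round(80.0 - 68.9, 1)

print(context_saving, cost_saving, context_gap_points)  # 63.3 38.1 11.1
```

The total cost saving is smaller than the context saving because the Librarian's cost figure also includes tokens spent outside the curated context.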

Installation

pip install langgraph-librarian

You'll also need a LangChain LLM provider:

# Pick one (or more):
pip install langchain-google-genai   # Google Gemini
pip install langchain-openai         # OpenAI
pip install langchain-anthropic      # Anthropic

Quick Start

from langchain_core.messages import HumanMessage, AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph_librarian import create_librarian_graph, LibrarianConfig

# 1. Configure models
config = LibrarianConfig(
    indexer_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),    # cheap & fast
    selector_model=ChatGoogleGenerativeAI(model="gemini-2.5-flash"),   # reasoning
)

# 2. Define your agent (only change: use state["curated_messages"])
def my_agent(state):
    context = state.get("curated_messages", state["messages"])
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
    response = llm.invoke(list(context))
    return {"messages": [response]}

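# 3. Create and run
conversation = [  # example history; replace with your own messages
    HumanMessage(content="Our budget for the launch is $40k."),
    AIMessage(content="Noted: $40k for the launch."),
]
graph = create_librarian_graph(agent_fn=my_agent, config=config)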
result = graph.invoke(
    {"messages": conversation, "query": "What was the budget?", "index": []},
    config={"configurable": {"librarian": config}},
)

Integration Patterns

Pattern 1: Wrap Your Agent (Easiest)

graph = create_librarian_graph(agent_fn=my_agent, config=config)

Creates a complete graph: select_and_hydrate → your_agent → index_messages

Pattern 2: Add Nodes to Your Graph (Flexible)

from langgraph.graph import StateGraph, START, END
from langgraph_librarian import create_librarian_node, index_messages

# MyState is your own state schema; it must include the LibrarianState fields.
builder = StateGraph(MyState)
builder.add_node("librarian", create_librarian_node(config))
builder.add_node("agent", my_agent)
builder.add_node("indexer", index_messages)
builder.add_edge(START, "librarian")
builder.add_edge("librarian", "agent")
builder.add_edge("agent", "indexer")
builder.add_edge("indexer", END)
graph = builder.compile()

Pattern 3: With Checkpointer (Persistence)

from langgraph.checkpoint.memory import MemorySaver

graph = create_librarian_graph(
    agent_fn=my_agent,
    config=config,
    checkpointer=MemorySaver(),  # Index persists automatically!
)

# Each thread maintains its own index.
# msgs and q stand for your message list and the current query string.
result = graph.invoke(
    {"messages": msgs, "query": q, "index": []},
    config={"configurable": {"thread_id": "user-123", "librarian": config}},
)

API Reference

Graph Builders

create_librarian_graph(agent_fn, config, *, checkpointer=None)

Creates a compiled LangGraph StateGraph that wraps your agent with Librarian nodes.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| agent_fn | Callable[[dict], dict] | Your agent node function. Receives the full state dict and should return a state update. The curated context is at state["curated_messages"]. |
| config | LibrarianConfig \| None | Librarian configuration. Can also be passed at runtime via config["configurable"]["librarian"]. |
| checkpointer | Any | Optional LangGraph checkpointer for state persistence. The index in state is automatically persisted across turns. |

Returns: A compiled StateGraph with the flow: START → select_and_hydrate → agent → index_messages → END

Example:

from langgraph_librarian import create_librarian_graph, LibrarianConfig
from langchain_google_genai import ChatGoogleGenerativeAI

config = LibrarianConfig(
    indexer_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
)

def my_agent(state):
    context = state.get("curated_messages", state["messages"])
    # ... generate response using context ...
    return {"messages": [response]}

graph = create_librarian_graph(agent_fn=my_agent, config=config)

create_librarian_node(config)

Returns a standalone select_and_hydrate callable for use as a node in your own StateGraph.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| config | LibrarianConfig \| None | Librarian configuration. |

Returns: Callable[[dict, RunnableConfig | None], dict], a function suitable for builder.add_node().

Note: When using this, you must also add the index_messages node separately in your graph (after your agent node) for async indexing to work.

Example:

from langgraph.graph import StateGraph, START, END
from langgraph_librarian import create_librarian_node, index_messages

librarian_node = create_librarian_node(config)

builder = StateGraph(MyState)
builder.add_node("librarian", librarian_node)
builder.add_node("agent", my_agent)
builder.add_node("indexer", index_messages)
builder.add_edge(START, "librarian")
builder.add_edge("librarian", "agent")
builder.add_edge("agent", "indexer")
builder.add_edge("indexer", END)

Node Functions

select_and_hydrate(state, config=None)

LangGraph node that performs the online selection and hydration step.

  1. Reads the pre-built summary index from state
  2. Calls the selector LLM to reason over the index and pick relevant message IDs
  3. Hydrates selected messages with full content + always includes recent turns
  4. On first turn (no index), passes through all messages unchanged

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| state | dict | Current graph state. Reads messages, query, index. |
| config | RunnableConfig \| None | LangGraph config. Expects LibrarianConfig at config["configurable"]["librarian"]. |

Returns: State update dict with:

| Key | Type | Description |
| --- | --- | --- |
| curated_messages | list[BaseMessage] | Selected + recent messages for the agent |
| selected_ids | list[int] | IDs selected by the Librarian |
| librarian_metadata | dict | Metrics (see Metadata Fields) |
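
One way to read the hydration and first-turn pass-through steps listed above is sketched below. It is an interpretation, not the library's code: `hydrate` is a hypothetical helper, messages are simplified to `(role, text)` tuples instead of LangChain message objects, and "recent user turns" is assumed to mean everything from the N-th most recent human message onward.

```python
def hydrate(messages, index, selected_ids, recent_user_turns=2):
    """Hypothetical helper: return the curated messages in original order."""
    if not index:
        # First turn: there is no index yet, so pass everything through.
        return list(messages)

    # Find where the last `recent_user_turns` human messages begin.
    human_positions = [i for i, (role, _) in enumerate(messages) if role == "human"]
    if recent_user_turns > 0 and human_positions:
        start = human_positions[-recent_user_turns:][0]
    else:
        start = len(messages)

    # Ignore ids that do not point at a real message, then add the recent tail.
    keep = {i for i in selected_ids if 0 <= i < len(messages)}
    keep.update(range(start, len(messages)))
    return [messages[i] for i in sorted(keep)]


history = [
    ("human", "q1"), ("ai", "a1"),
    ("human", "q2"), ("ai", "a2"),
    ("human", "q3"), ("ai", "a3"),
    ("human", "q4"), ("ai", "a4"),
]
# Any non-empty index triggers selection; its contents are not used in this sketch.
curated = hydrate(history, index=["placeholder"], selected_ids=[1])
print([text for _, text in curated])  # ['a1', 'q3', 'a3', 'q4', 'a4']
```

Sorting the kept positions preserves chronological order, which matters when a later message modifies an earlier one.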

index_messages(state, config=None)

LangGraph node that performs offline incremental indexing.

Summarizes only new messages (messages not yet in the index), making it safe and efficient to run after every turn. Short messages (< short_message_threshold chars) are used as-is without an LLM call.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| state | dict | Current graph state. Reads messages, index. |
| config | RunnableConfig \| None | LangGraph config. Expects LibrarianConfig at config["configurable"]["librarian"]. |

Returns: State update dict with:

| Key | Type | Description |
| --- | --- | --- |
| index | list[IndexEntry] | Updated index with new entries appended |
| librarian_metadata | dict | Indexing metrics (indexing_new_count, indexing_total_count, indexing_latency_ms) |
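
The incremental behavior described for index_messages can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `index_new_messages` is a hypothetical helper, entries are plain dicts rather than IndexEntry objects, the index is assumed to cover a prefix of the message list, and `summarize` is any callable standing in for the indexer LLM.

```python
def index_new_messages(messages, index, summarize, short_message_threshold=200):
    """Hypothetical helper: append entries for messages not yet in the index."""
    updated = list(index)
    llm_calls = 0
    # Assumes one entry per message position, so new work starts at len(index).
    for position in range(len(index), len(messages)):
        role, text = messages[position]
        if len(text) < short_message_threshold:
            summary = text  # short message: used as-is, no LLM call
        else:
            summary = summarize(text)
            llm_calls += 1
        updated.append({"id": position, "role": role, "summary": summary})
    return updated, llm_calls


def fake_summarize(text):
    # Stand-in for the indexer model: truncate instead of summarizing.
    return text[:20]


messages = [("human", "hi"), ("ai", "x" * 300)]
index, calls = index_new_messages(messages, [], fake_summarize)
print(len(index), calls)  # 2 1

messages.append(("human", "ok"))
index, calls = index_new_messages(messages, index, fake_summarize)
print(len(index), calls)  # 3 0
```

Because only positions beyond the current index length are visited, running the step after every turn costs at most one summarization per new long message.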

Configuration

LibrarianConfig

Dataclass controlling the Librarian pipeline behavior.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| indexer_model | BaseChatModel | Required | LLM for summarizing messages. Should be fast and cheap (e.g., Gemini Flash, GPT-4o-mini). |
| selector_model | BaseChatModel \| None | None | LLM for selection reasoning. Should be reasoning-capable. Falls back to indexer_model if not set. |
| always_include_recent_turns | int | 2 | Number of recent user turns to always include regardless of selection. Ensures conversational coherence. |
| short_message_threshold | int | 200 | Messages shorter than this (in characters) skip summarization and are used as-is. |
| summary_prompt | str \| None | None | Custom prompt for the summarization step. Uses the built-in default if not set. |
| selection_prompt | str \| None | None | Custom prompt template for the selection step. Must contain {query} and {index} placeholders. |
| store | BaseIndexStore \| None | None | Optional external index store. If not set, index is persisted via the LangGraph checkpointer (recommended). |

Methods:

| Method | Returns | Description |
| --- | --- | --- |
| get_indexer_model() | BaseChatModel | Returns the indexer model. Raises ValueError if not configured. |
| get_selector_model() | BaseChatModel | Returns the selector model, falling back to the indexer model. |

Example:

from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph_librarian import LibrarianConfig

config = LibrarianConfig(
    indexer_model=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
    selector_model=ChatGoogleGenerativeAI(model="gemini-2.5-flash"),
    always_include_recent_turns=3,
    short_message_threshold=150,
    summary_prompt="Summarize in exactly 2 sentences. Focus on decisions and constraints.",
    selection_prompt="Custom template with {query} and {index} placeholders...",
)
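
Since a custom selection_prompt must contain the {query} and {index} placeholders, it can be worth checking a template before passing it in. The sketch below uses only the standard library; `missing_placeholders` is a hypothetical helper, and it assumes the template is filled with `str.format`-style substitution, which the documentation does not state explicitly.

```python
from string import Formatter


def missing_placeholders(template: str, required=("query", "index")) -> list[str]:
    """Return the required placeholder names that the template does not contain."""
    present = {field for _, field, _, _ in Formatter().parse(template) if field}
    return [name for name in required if name not in present]


template = (
    "Query: {query}\n"
    "Summaries:\n{index}\n"
    "List the IDs of the relevant messages."
)
print(missing_placeholders(template))                      # []
print(missing_placeholders("Pick messages for: {query}"))  # ['index']

# Filling the template the way str.format would:
print(template.format(query="What was the budget?", index="[0] human: budget is 40k"))
```

If the template is indeed formatted this way, any literal braces in the prompt (for example a JSON example) would need to be doubled as `{{` and `}}`.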

State Types

LibrarianState (TypedDict)

The graph state schema. Extend your own state with these fields.

| Field | Type | Reducer | Description |
| --- | --- | --- | --- |
| messages | Sequence[BaseMessage] | add_messages | Full conversation history |
| query | str | - | Current user query |
| index | list[IndexEntry] | replace | Summary index (persisted across turns) |
| selected_ids | list[int] | - | IDs selected by the Librarian |
| curated_messages | Sequence[BaseMessage] | - | ← Use this in your agent |
| librarian_metadata | dict[str, Any] | merge | Metrics and debug info |
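
The Reducer column describes how a node's update is combined with the existing value of a field. The toy model below illustrates those semantics without LangGraph installed. Everything in it is an assumption made for illustration: `operator.add` stands in for `add_messages` (which also handles message IDs), `merge_dicts` assumes "merge" means a shallow dict merge, `user_name` is an example field of your own, and `apply_update` is a hypothetical simplification of what a graph runtime does.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


def merge_dicts(current: dict, update: dict) -> dict:
    # Assumed "merge" semantics: keys from the update win, other keys are kept.
    return {**current, **update}


class MyState(TypedDict, total=False):
    messages: Annotated[list, operator.add]  # stand-in for add_messages (appends)
    query: str
    index: list                              # no reducer here: each update overwrites
    selected_ids: list
    curated_messages: list
    librarian_metadata: Annotated[dict, merge_dicts]
    user_name: str                           # an extra field of your own


def apply_update(state: dict, update: dict) -> dict:
    """Toy version of applying one node's state update."""
    hints = get_type_hints(MyState, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in state:
            new_state[key] = reducers[0](state[key], value)
        else:
            new_state[key] = value
    return new_state


state = {"messages": ["m0"], "index": ["e0"], "librarian_metadata": {"curated_count": 1}}
update = {"messages": ["m1"], "index": ["e0", "e1"], "librarian_metadata": {"indexing_new_count": 1}}
print(apply_update(state, update))
```

In this model the message list grows, the index is swapped for the new list, and metadata from the selection and indexing steps ends up side by side in one dict.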

IndexEntry (dataclass)

A single message summary in the index.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | int | - | Zero-based position in the original message array |
| role | str | - | Message role ("human", "ai", "tool", etc.) |
| summary | str | - | ≤3-sentence summary of the message content |
| original_tokens | int | 0 | Estimated token count of the original message |
| indexing_latency_ms | float | 0.0 | Time taken to generate this summary (ms) |

Storage

BaseIndexStore (Protocol)

Protocol for custom index storage backends. Implement this to create your own store.

| Method | Signature | Description |
| --- | --- | --- |
| load | (session_id: str) → list[IndexEntry] \| None | Load an existing index. Returns None if not found. |
| save | (session_id: str, entries: list[IndexEntry]) → None | Save an index. |
| delete | (session_id: str) → None | Delete an index. |
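
A custom backend only needs these three methods. The following in-memory store is a minimal sketch of the shape, not part of the package: `InMemoryIndexStore` is a hypothetical class, and `IndexEntry` and `IndexStore` are local stand-ins that mirror the documented fields and signatures so the example runs on its own (in real code you would import the library's types instead).

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class IndexEntry:
    """Stand-in mirroring the documented IndexEntry fields."""
    id: int
    role: str
    summary: str
    original_tokens: int = 0
    indexing_latency_ms: float = 0.0


class IndexStore(Protocol):
    """Local copy of the documented method signatures, for type checking only."""
    def load(self, session_id: str) -> Optional[list[IndexEntry]]: ...
    def save(self, session_id: str, entries: list[IndexEntry]) -> None: ...
    def delete(self, session_id: str) -> None: ...


class InMemoryIndexStore:
    """Hypothetical store that keeps each session's index in a dict."""

    def __init__(self) -> None:
        self._data: dict[str, list[IndexEntry]] = {}

    def load(self, session_id: str) -> Optional[list[IndexEntry]]:
        entries = self._data.get(session_id)
        # Return a copy so callers cannot mutate the stored list.
        return list(entries) if entries is not None else None

    def save(self, session_id: str, entries: list[IndexEntry]) -> None:
        self._data[session_id] = list(entries)

    def delete(self, session_id: str) -> None:
        # Deleting an unknown session is a no-op (an assumption of this sketch).
        self._data.pop(session_id, None)


store: IndexStore = InMemoryIndexStore()
store.save("user-123", [IndexEntry(id=0, role="human", summary="Budget is 40k.")])
print(store.load("user-123"))
store.delete("user-123")
print(store.load("user-123"))  # None
```

An in-memory store like this loses its contents when the process exits, so it is mainly useful in tests.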

FileIndexStore

JSON file-based index persistence. Each session's index is stored as a separate .librarian-index.json file.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| directory | str | Directory path to store index files |

Example:

from langgraph_librarian import LibrarianConfig, FileIndexStore

config = LibrarianConfig(
    indexer_model=my_llm,
    store=FileIndexStore("/path/to/indexes"),
)

Metadata Fields

The librarian_metadata dict in state contains these keys after each step:

After select_and_hydrate:

| Key | Type | Description |
| --- | --- | --- |
| selection_skipped | bool | True if selection was skipped (no index or no query) |
| reason | str | Why selection was skipped (only when selection_skipped=True) |
| selected_ids | list[int] | IDs selected by the Librarian |
| selection_reasoning | str | Raw reasoning output from the selector LLM |
| selection_latency_ms | float | Time taken for selection + hydration (ms) |
| original_count | int | Number of messages in full history |
| curated_count | int | Number of curated messages |
| original_tokens | int | Estimated tokens in full history |
| curated_tokens | int | Estimated tokens in curated context |
| token_reduction_pct | float | Percentage of tokens saved |

After index_messages:

| Key | Type | Description |
| --- | --- | --- |
| indexing_new_count | int | Number of newly indexed messages |
| indexing_total_count | int | Total entries in the index |
| indexing_latency_ms | float | Total time for indexing new messages (ms) |

Development

git clone https://github.com/Pinkert7/langgraph-librarian.git
cd langgraph-librarian
pip install -e ".[test]"
pytest tests/ -v

License

MIT
