Runtime monitor for LLM agent interaction protocols based on session type theory

llmcontract

A runtime monitor for LLM agent interaction protocols based on session type theory.

llmcontract lets you define communication protocols using a concise DSL inspired by session types, then monitor agent interactions at runtime to catch protocol violations the moment they happen.

Installation

pip install -e .

Protocol DSL

Protocols are written as strings using this syntax:

Syntax            Meaning
!label            Send action
?label            Receive action
!{a, b}           Internal choice (sender chooses)
?{a, b}           External choice (receiver chooses)
.                 Sequence
rec X. ...X...    Recursion
end               Terminal state

Examples

A flight booking protocol — a strict linear sequence:

!SearchFlights.?FlightResults.!PresentOptions.?UserApproval.!BookFlight.?BookingConfirmation.end

A card payment protocol — with branching and recursion:

!CreateCard.?{CardCreated.rec X.!Transaction.?{TransactionOK.X, SessionEnd}, CardError}.end
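
To make the notation concrete, here is a minimal sketch of how the linear subset of the DSL (no choices, no recursion) could be read into a list of steps. The helper parse_linear is hypothetical and written only for illustration; it is not part of llmcontract, whose own parser also handles choices and recursion.

```python
def parse_linear(protocol: str) -> list[tuple[str, str]]:
    """Read a linear protocol such as "!A.?B.end" into (direction, label) steps.

    Illustrative only: supports just !label, ?label, "." and a final "end".
    """
    *actions, last = [part.strip() for part in protocol.split(".")]
    if last != "end":
        raise ValueError("a linear protocol must finish with 'end'")

    steps = []
    for action in actions:
        if len(action) < 2 or action[0] not in "!?":
            raise ValueError(f"expected !label or ?label, got {action!r}")
        direction = "send" if action[0] == "!" else "receive"
        steps.append((direction, action[1:]))
    return steps


flight = (
    "!SearchFlights.?FlightResults.!PresentOptions."
    "?UserApproval.!BookFlight.?BookingConfirmation.end"
)
steps = parse_linear(flight)
for direction, label in steps:
    print(f"{direction:8} {label}")
```

Each "!" step is something the monitored agent sends, and each "?" step is something it expects to receive, in exactly the order written.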

Usage

from llmcontract import Monitor, Ok, Violation, Blocked

protocol = "!SearchFlights.?FlightResults.!BookFlight.?BookingConfirmation.end"
m = Monitor(protocol)

m.send("SearchFlights")    # Ok()
m.receive("FlightResults") # Ok()
m.send("BookFlight")       # Ok()
m.receive("BookingConfirmation") # Ok()
assert m.is_terminal

Catching violations

m = Monitor("!Ping.?Pong.end")
m.send("Ping")       # Ok()
m.send("Pong")       # Violation(expected=['?Pong'], got='!Pong')
m.send("Anything")   # Blocked('monitor halted after a previous violation')
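
The halting behaviour above can be sketched in a few lines. This is a self-contained illustration under stated assumptions, not llmcontract's implementation: LinearMonitor and the three result classes below are hypothetical stand-ins that only mirror the results shown in the example, and they support a fixed sequence of steps only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ok:
    pass


@dataclass(frozen=True)
class Violation:
    expected: list
    got: str


@dataclass(frozen=True)
class Blocked:
    reason: str


class LinearMonitor:
    """Hypothetical monitor for a fixed sequence of '!label' / '?label' steps."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.position = 0
        self.halted = False

    def _step(self, action):
        # Once a violation has been seen, every later action is refused.
        if self.halted:
            return Blocked("monitor halted after a previous violation")
        expected = self.steps[self.position:self.position + 1]
        if expected != [action]:
            self.halted = True
            return Violation(expected=expected, got=action)
        self.position += 1
        return Ok()

    def send(self, label):
        return self._step("!" + label)

    def receive(self, label):
        return self._step("?" + label)


m = LinearMonitor(["!Ping", "?Pong"])
results = [m.send("Ping"), m.send("Pong"), m.send("Anything")]
for r in results:
    print(r)
```

The point of the sketch is that a violation is sticky: the state does not advance, and subsequent calls report Blocked rather than being checked against the protocol.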

Working with choices

protocol = "!CreateCard.?{CardCreated.!Done.end, CardError.end}"
m = Monitor(protocol)
m.send("CreateCard")       # Ok()
m.receive("CardError")     # Ok() — the receiver chose this branch
assert m.is_terminal

Recursion

protocol = "rec X.!Ping.?Pong.X"
m = Monitor(protocol)
for _ in range(100):
    m.send("Ping")     # Ok()
    m.receive("Pong")  # Ok()

Handling natural-language input: Unrecognized

When the projection layer (typically over user chat) can't classify an event into a known label, it can emit the sentinel UNRECOGNIZED instead. The monitor treats this as a soft signal — distinct from Violation — without halting or advancing state, so the outer loop can drive a clarification turn:

from llmcontract import Monitor, Ok, Unrecognized, UNRECOGNIZED

m = Monitor("?{Yes.end, No.end}")
result = m.receive(UNRECOGNIZED)         # projection couldn't decide
assert isinstance(result, Unrecognized)  # not a Violation
# state preserved; ask the agent to ask the user to clarify, then:
m.receive("Yes")                         # Ok()

A protocol can also handle Unrecognized explicitly as a first-class branch — useful for "ask again" loops:

protocol = "rec Loop.!Ask.?{Yes.end, No.end, Unrecognized.Loop}"
m = Monitor(protocol)
m.send("Ask")
m.receive(UNRECOGNIZED)  # Ok — protocol routes back to Loop
m.send("Ask")
m.receive("Yes")         # Ok — terminal

The distinction matters at the system boundary: Violation means the agent broke the rules; Unrecognized means we don't have enough information to decide yet. Different responses (halt vs. clarify) come naturally from the typed result.
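
One way an outer loop might act on that distinction is sketched below. The result classes are local stand-ins so the snippet runs without llmcontract installed, and next_step with its three return values is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass


# Stand-ins for the typed results, defined here so the sketch is runnable.
@dataclass(frozen=True)
class Ok:
    pass


@dataclass(frozen=True)
class Unrecognized:
    pass


@dataclass(frozen=True)
class Violation:
    expected: tuple
    got: str


def next_step(result) -> str:
    """Map a monitor result to what the outer agent loop should do next."""
    if isinstance(result, Ok):
        return "continue"   # protocol advanced, carry on
    if isinstance(result, Unrecognized):
        return "clarify"    # not enough information yet, ask again
    return "halt"           # anything else is treated as a broken contract


for result in (Ok(), Unrecognized(), Violation(("?Yes", "?No"), "?Maybe")):
    print(type(result).__name__, "->", next_step(result))
```

Treating every result other than Ok and Unrecognized as "halt" is a conservative choice for the sketch; a real loop may want to log the Violation details first.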

Integration Layer

For real agent loops, llmcontract provides a client wrapper and tool middleware that share a single monitor — so the full interaction is tracked automatically.

Client Wrapper

Wraps any LLM client call. Checks !Send before calling the LLM and ?Receive after getting the response. SDK-agnostic — you provide a small adapter function.

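import json

import openai
from llmcontract import Monitor, MonitoredClient, LLMResponse, ToolCall

monitor = Monitor(
    "rec Loop.!Request.?{ToolCall.!ToolResult.Loop, FinalAnswer.end}"
)

# Adapt your SDK's response to LLMResponse.
# With the OpenAI SDK, tool calls live on the first choice's message, and
# their arguments arrive as a JSON string under tc.function.arguments.
def adapt(raw):
    message = raw.choices[0].message
    if message.tool_calls:
        return LLMResponse(tool_calls=[
            ToolCall(
                name=tc.function.name,
                arguments=json.loads(tc.function.arguments),
                id=tc.id,
            )
            for tc in message.tool_calls
        ])
    return LLMResponse(content=message.content)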

client = MonitoredClient(
    llm_call=openai.chat.completions.create,
    response_adapter=adapt,
    monitor=monitor,
    send_label="Request",
    receive_label=lambda r: "ToolCall" if r.has_tool_calls else "FinalAnswer",
)

response = client.call(model="gpt-4", messages=[...])
# Automatically fires !Request then ?ToolCall or ?FinalAnswer
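
The order of checks described above (send check, then the LLM call, then the receive check) can be sketched without any SDK. Everything in this block is hypothetical and for illustration only: ScriptedMonitor is a stub that accepts a fixed list of actions, SketchViolationError is a local exception, and monitored_call is not llmcontract's MonitoredClient.

```python
class SketchViolationError(Exception):
    """Hypothetical error raised when a protocol check fails."""


class ScriptedMonitor:
    """Stub monitor: accepts exactly the scripted actions, in order."""

    def __init__(self, script):
        self.script = list(script)  # e.g. ["!Request", "?FinalAnswer"]
        self.log = []

    def _check(self, action):
        self.log.append(action)
        if self.script and self.script[0] == action:
            self.script.pop(0)
            return True
        return False

    def send(self, label):
        return self._check("!" + label)

    def receive(self, label):
        return self._check("?" + label)


def monitored_call(monitor, llm_call, send_label, receive_label, **kwargs):
    # 1. Check the outgoing request before spending an LLM call on it.
    if not monitor.send(send_label):
        raise SketchViolationError(f"!{send_label} not allowed here")
    # 2. Call the model.
    response = llm_call(**kwargs)
    # 3. Classify the response and check it against the protocol.
    label = receive_label(response)
    if not monitor.receive(label):
        raise SketchViolationError(f"?{label} not allowed here")
    return response


def fake_llm(**kwargs):
    # Stands in for a real SDK call; returns a plain dict.
    return {"tool_calls": [], "content": "done"}


def classify(response):
    return "ToolCall" if response["tool_calls"] else "FinalAnswer"


monitor = ScriptedMonitor(["!Request", "?FinalAnswer"])
response = monitored_call(monitor, fake_llm, "Request", classify, messages=[])
print(monitor.log)
```

Because the send check runs first, a request that the protocol does not allow fails before the model is ever called.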

Tool Middleware

Wraps tool execution. When the LLM requests a tool, the middleware checks ?Receive (tool requested) and !Send (result returned) against the protocol.

from llmcontract import ToolMiddleware

middleware = ToolMiddleware(
    monitor=monitor,  # same monitor as the client
    tools={
        "search": search_fn,
        "book": book_fn,
    },
)

# Process all tool calls from a response
results = middleware.process(response)
# Each tool call checks ?receive and !send against the protocol

Combined Agent Loop

from llmcontract import (
    Monitor, MonitoredClient, ToolMiddleware,
    LLMResponse, ToolCall, ProtocolViolationError,
)

protocol = "rec Loop.!Request.?{ToolCall.!ToolResult.Loop, FinalAnswer.end}"
monitor = Monitor(protocol)

client = MonitoredClient(
    llm_call=my_llm_fn,
    response_adapter=my_adapter,
    monitor=monitor,
    send_label="Request",
    receive_label=lambda r: "ToolCall" if r.has_tool_calls else "FinalAnswer",
)

# messages, tools, and tool_result_msg are supplied by your application code
while True:
    try:
        response = client.call(messages=messages)
    except ProtocolViolationError as e:
        print(f"Protocol violated: {e}")
        break

    if not response.has_tool_calls:
        break  # FinalAnswer — protocol complete

    # Execute tools, send results back
    for tc in response.tool_calls:
        result = tools[tc.name](**tc.arguments)
        monitor.send("ToolResult")  # record the send
        messages.append(tool_result_msg(tc.id, result))

Langfuse Integration

Track protocol compliance in Langfuse — every send/receive is recorded as a guardrail observation with a pass/fail score.

pip install llmsessioncontract[langfuse]

from langfuse import get_client
from llmcontract.integration.langfuse import LangfuseMonitor

langfuse = get_client()

with langfuse.start_as_current_observation(name="agent-run") as trace:
    monitor = LangfuseMonitor(
        protocol="!Request.?{ToolCall.!ToolResult.end, FinalAnswer.end}",
        langfuse=langfuse,
    )

    monitor.send("Request")       # guardrail: ok ✓
    monitor.receive("ToolCall")   # guardrail: ok ✓
    monitor.send("ToolResult")    # guardrail: ok ✓
    monitor.send("ExtraCall")     # guardrail: VIOLATION ✗

langfuse.flush()

Each step appears as a guardrail observation in your Langfuse trace with:

  • Input: the action attempted, direction, label, protocol
  • Output: passed: true/false, violation details if applicable
  • Score: protocol_compliance (boolean) for filtering and analytics

Claude Code Plugin

A Claude Code plugin ships with this repo: protocol-builder walks you through designing a session-type protocol conversationally, validates it as you go, and emits a ready-to-paste Python integration snippet.

# Install in Claude Code
/plugin marketplace add chrisbartoloburlo/llmcontract
/plugin install protocol-builder@llmcontract

# Then in any conversation
/protocol-builder

The skill validates each draft DSL against llmcontract's parser, so anything it produces is guaranteed to load with Monitor(...). Source lives under skills/protocol-builder/.

Case Studies

  • llmcontract-tau2 — Standalone replay of tau2-bench's shipped trajectories through Monitor. Headline: 11/1755 (0.6%) of trajectories that tau2 scored as passing violate the documented "obtain user confirmation before mutating the database" policy. Discussion upstream: tau2-bench#298.

Research

This work is based on the theory developed in:

Christian Bartolo Burlò, Adrian Francalanza, Alceste Scalas. "On the Monitorability of Session Types, in Theory and in Practice". 35th European Conference on Object-Oriented Programming (ECOOP 2021), pp. 20:1–20:30, Schloss Dagstuhl, 2021.

Architecture

DSL string ──▶ Parser ──▶ AST ──▶ FSM Compiler ──▶ Automaton ──▶ Monitor
  • Parser (llmcontract.dsl.parser) — hand-written recursive descent parser that produces an AST
  • AST (llmcontract.dsl.ast) — frozen dataclasses: Send, Receive, InternalChoice, ExternalChoice, Sequence, Recursion, RecVar, End
  • FSM Compiler (llmcontract.monitor.automaton) — compiles the AST into a finite state automaton with transitions keyed by (direction, label)
  • Monitor (llmcontract.monitor.monitor) — steps through the automaton on each send/receive call, returning Ok, Violation, or Blocked (or Unrecognized when given the UNRECOGNIZED sentinel)
  • MonitoredClient (llmcontract.integration.client) — wraps any LLM client call with automatic protocol checks
  • ToolMiddleware (llmcontract.integration.middleware) — intercepts tool execution with protocol checks
  • LangfuseMonitor (llmcontract.integration.langfuse) — records protocol events as Langfuse guardrail observations
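
As a rough illustration of the compilation step in this pipeline, the sketch below turns a small AST into an automaton whose transitions are keyed by state, direction, and label. It is written under simplifying assumptions and is not llmcontract's code: the node classes (Action, Choice, Rec, Var, End) are a hypothetical continuation-style AST rather than the library's dataclasses, and compile_fsm is an invented name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class End:
    pass


@dataclass(frozen=True)
class Action:
    direction: str  # "!" for send, "?" for receive
    label: str
    then: object    # the node that follows this action


@dataclass(frozen=True)
class Choice:
    branches: tuple  # each branch is an Action


@dataclass(frozen=True)
class Rec:
    var: str
    body: object


@dataclass(frozen=True)
class Var:
    name: str


def compile_fsm(root):
    """Return (start_state, transitions, terminal_states) for an AST."""
    transitions = {}  # (state, direction, label) -> next state
    terminal = set()
    next_id = 0

    def fresh():
        nonlocal next_id
        next_id += 1
        return next_id - 1

    def build(node, state, env):
        # Attach the behaviour of `node` to the existing state `state`.
        if isinstance(node, End):
            terminal.add(state)
        elif isinstance(node, Rec):
            # The recursion variable names the state where the body starts.
            build(node.body, state, {**env, node.var: state})
        elif isinstance(node, Choice):
            for branch in node.branches:
                build(branch, state, env)
        elif isinstance(node, Action):
            if isinstance(node.then, Var):
                target = env[node.then.name]  # loop back, no new state
            else:
                target = fresh()
                build(node.then, target, env)
            transitions[(state, node.direction, node.label)] = target
        else:
            raise ValueError(f"unsupported node: {node!r}")

    start = fresh()
    build(root, start, {})
    return start, transitions, terminal


# rec Loop.!Request.?{ToolCall.!ToolResult.Loop, FinalAnswer.end}
ast = Rec("Loop", Action("!", "Request", Choice((
    Action("?", "ToolCall", Action("!", "ToolResult", Var("Loop"))),
    Action("?", "FinalAnswer", End()),
))))
start, transitions, terminal = compile_fsm(ast)
for key, target in sorted(transitions.items()):
    print(key, "->", target)
```

Recursion costs no extra states in this sketch: a recursion variable simply resolves to the state where its rec body began, which is what makes a looping protocol a finite automaton.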

Tests

pip install -e ".[dev]"
pytest llmcontract/tests/ -v

License

MIT
