Monty-backed code-interpreter middleware for deepagents
langchain-monty
LangChain middleware that gives a deepagents agent an eval_python tool backed by pydantic-monty, Pydantic's sandboxed Python interpreter written in Rust.
The interpreter starts in microseconds, runs in-process, and has zero access to the host filesystem, network, or environment. The only way code running inside the sandbox can reach the outside world is through host tools you explicitly allowlist via the ptc= parameter.
This is the Python analog of langchain-quickjs, which does the same thing with a QuickJS JavaScript VM.
Installation
```bash
pip install langchain-monty
```
Requires Python 3.12+.
Quick start
```python
from deepagents import create_deep_agent
from langchain_monty import MontyCodeInterpreterMiddleware

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    middleware=[MontyCodeInterpreterMiddleware()],
)

result = agent.invoke({"messages": [{"role": "user", "content": "What is 2 ** 32?"}]})
```
The middleware adds an eval_python tool to the agent and appends a usage guide to the system prompt. The agent calls eval_python with a snippet of Python (Monty implements a subset of the language; see Sandbox capabilities), and the tool returns the value of the final expression along with any captured stdout.
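To make "value of the final expression plus captured stdout" concrete, here is a rough CPython model of those REPL-style semantics. It is an illustration only: eval_like_repl is a hypothetical name, it uses exec/eval with no sandboxing whatsoever, and it is not how Monty or the middleware is implemented.

```python
import ast
import contextlib
import io


def eval_like_repl(code: str) -> dict:
    """Run `code`; return the last expression's value plus captured stdout.

    Illustration of the semantics only: this uses CPython's exec/eval and
    provides NO sandboxing, unlike Monty.
    """
    tree = ast.parse(code, mode="exec")
    namespace: dict = {}
    buffer = io.StringIO()
    last_expr = None
    # If the final statement is an expression, split it off so it can be eval'd.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = ast.Expression(body=tree.body.pop().value)
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<eval_python>", "exec"), namespace)
        result = (
            eval(compile(last_expr, "<eval_python>", "eval"), namespace)
            if last_expr is not None
            else None
        )
    return {"result": result, "stdout": buffer.getvalue()}


print(eval_like_repl("x = 2 ** 32\nprint('computing')\nx"))
```

Code that ends in a statement rather than an expression (for example `y = 1`) yields a result of None in this model.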
Programmatic tool calling (ptc)
By default the interpreter is pure-compute: it has no access to host tools. Pass ptc= with a list of BaseTool objects and/or str tool names to expose those tools inside the sandbox:
```python
from langchain_core.tools import tool
from deepagents import create_deep_agent
from langchain_monty import MontyCodeInterpreterMiddleware


@tool
async def search(query: str) -> str:
    """Search the document index.

    Returns a JSON array of results. Each result is a dict with:
    - title (str): document title
    - url (str): source URL
    - snippet (str): matching excerpt
    """
    ...  # your search implementation


agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[search],
    middleware=[MontyCodeInterpreterMiddleware(ptc=[search])],
)
```
Deferred tool names
ptc entries can also be plain strings. String entries register the name in the allowlist but are resolved at runtime from runtime.tools — useful for tools injected by other middleware (e.g. FilesystemMiddleware contributes ls, read_file, write_file, edit_file, glob, grep):
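```python
agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    middleware=[
        MontyCodeInterpreterMiddleware(
            # my_api_tool is a placeholder for a BaseTool you define;
            # the string entries are resolved at runtime from runtime.tools.
            ptc=[my_api_tool, "read_file", "ls", "grep"],
        ),
    ],
)
```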
BaseTool entries have their schemas rendered in the system prompt up front. str entries are listed as runtime-resolved, and their schemas are rendered once they are resolved from the runtime.
With search allowlisted as in the first ptc example, the agent can write inside the sandbox:
```python
results = search("LangGraph 0.6 release notes")
[r["title"] for r in results if "breaking" in r["title"].lower()]
```
Each host-tool call pauses Monty and surfaces on the host Python side as a FunctionSnapshot. The middleware drives an event loop: it invokes the LangChain tool through its normal machinery (so HumanInTheLoopMiddleware, retries, traces, and Command-returning tools all keep working), then resumes Monty with the result. Calls to tools outside the allowlist return an error to the interpreter instead of executing.
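The pause/resume loop can be pictured with a Python generator standing in for the interpreter: each yield is a paused host-tool call and each send() resumes it with a result. This is a conceptual sketch only. drive, agent_code, and the error dict are hypothetical; it does not use the pydantic-monty API (FunctionSnapshot and related types), and the exception merely mirrors the iteration_budget behavior listed in the constructor reference.

```python
from collections.abc import Callable, Generator
from typing import Any

# A sandboxed program is modelled as a generator: it yields (tool_name, args)
# whenever it needs a host tool and receives the tool's result via send().
Program = Generator[tuple[str, tuple], Any, Any]


class IterationBudgetExceeded(Exception):
    """Raised when a program makes more host-tool round-trips than allowed."""


def drive(program: Program, allowlist: dict[str, Callable[..., Any]],
          iteration_budget: int = 64) -> Any:
    """Run `program` to completion, servicing each paused host-tool call."""
    round_trips = 0
    try:
        request = next(program)  # run until the first host call (or the end)
        while True:
            round_trips += 1
            if round_trips > iteration_budget:
                raise IterationBudgetExceeded(
                    f"more than {iteration_budget} host-tool calls")
            name, args = request
            tool = allowlist.get(name)
            if tool is None:
                # Not allowlisted: hand an error back instead of running anything.
                request = program.send({"error": f"tool {name!r} is not allowed"})
            else:
                # Call the host tool, then resume the program with its result.
                request = program.send(tool(*args))
    except StopIteration as finished:
        return finished.value  # the program's final value


def agent_code() -> Program:
    results = yield ("search", ("LangGraph release notes",))
    denied = yield ("delete_everything", ())
    return [r["title"] for r in results], denied


tools = {"search": lambda query: [{"title": f"Result for {query}"}]}
print(drive(agent_code(), tools))
```

Keeping the loop on the host side is what lets every tool call go through ordinary host code, which in the real middleware is where LangChain's tool machinery (HITL, retries, tracing) gets involved.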
Building tools for the sandbox
Monty has no type introspection and the LLM writes code before it has seen any data. The only signal it has about what a host function returns is the tool's docstring, which the middleware surfaces verbatim in both the system prompt and the eval_python tool description. Following these conventions makes it much more likely that generated code is correct on the first attempt.
1. Document the return shape precisely
Name every field, give its type, and note optional or nullable fields. Vague descriptions produce hallucinated field names and silent empty results.
```python
from langchain_core.tools import tool


# Bad: the LLM will guess field names and get them wrong
@tool
async def get_compensation_history() -> str:
    """Retrieve salary history records."""
    ...


# Good: the LLM knows exactly what to expect
@tool
async def get_compensation_history() -> str:
    """
    Retrieve salary change history for all employees.

    Returns a JSON array. Each record contains:
    - employee_id (str): matches employee_id in the roster
    - effective_year (int): year the change took effect
    - previous_salary (float): salary before the change
    - new_salary (float): salary after the change
    - raise_pct (float): percentage change (can be negative)
    - rating_at_time (float | null): performance rating that drove the raise
    """
    ...
```
2. Return JSON-serializable data
Return str (a JSON-encoded payload) or a plain Python type (list, dict, int, float, bool, None). Pydantic models, dataclasses, and other objects will be passed through json.dumps / json.loads before Monty receives them, which may lose information or raise if the object is not serializable.
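```python
import json
from langchain_core.tools import tool

# Preferred: explicit JSON encoding, no surprises
@tool
async def get_employee_roster() -> str:
    """Retrieve the full employee roster as a JSON array of employee records."""
    records = fetch_employees()  # your data access layer (returns Pydantic models)
    return json.dumps([r.model_dump() for r in records])
```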
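Dataclasses follow the same pattern: json.dumps rejects the instances themselves, while dataclasses.asdict turns them into plain dicts first. A stdlib-only sketch with a hypothetical Employee type standing in for your own model:

```python
import dataclasses
import json


@dataclasses.dataclass
class Employee:
    # Hypothetical record type standing in for your own data model.
    employee_id: str
    department: str
    current_salary: float


records = [Employee("e1", "Engineering", 150000.0)]

try:
    json.dumps(records)  # dataclass instances are not JSON-serializable
except TypeError as exc:
    print("json.dumps failed:", exc)

# Convert to plain dicts first; the payload is now explicit and predictable.
payload = json.dumps([dataclasses.asdict(r) for r in records])
print(payload)
```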
3. Name join keys explicitly
When multiple tools return related datasets, call out the join key in every docstring. The LLM needs to know which field to use without inspecting actual data.
"""...
Join with get_compensation_history() on employee_id.
"""
4. Document edge cases
Note nulls, mixed currencies, date formats, and any filtering the tool applies (e.g. active-only). Undocumented nulls let generated code silently return empty results, such as population_n: 0, without raising an error.
"""...
- currency (str): ISO 4217 code; records may mix currencies — normalize
before computing ratios across the full population.
- is_active (bool): False records are included; filter with
`[e for e in roster if e['is_active']]` if you only want current employees.
"""
5. Keep field names stable
The LLM hard-codes field names in generated code. Renaming a field causes silent breakage: the code runs without error but produces empty or wrong results, because `.get('old_name')` returns None.
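One way to catch a rename before it reaches the model is a contract test in the tool's own test suite that compares real output against the documented field list. A minimal stdlib sketch; DOCUMENTED_FIELDS, check_contract, and the sample payloads are hypothetical, with the field names taken from the get_compensation_history docstring above.

```python
import json

# Field names promised by the get_compensation_history docstring.
DOCUMENTED_FIELDS = {
    "employee_id", "effective_year", "previous_salary",
    "new_salary", "raise_pct", "rating_at_time",
}


def check_contract(payload: str) -> list[str]:
    """Return a list of problems; empty means every record matches the docstring."""
    problems = []
    for i, record in enumerate(json.loads(payload)):
        keys = set(record)
        missing = DOCUMENTED_FIELDS - keys
        extra = keys - DOCUMENTED_FIELDS
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"record {i}: undocumented {sorted(extra)}")
    return problems


good = json.dumps([{"employee_id": "e1", "effective_year": 2024, "previous_salary": 100.0,
                    "new_salary": 104.0, "raise_pct": 4.0, "rating_at_time": None}])
renamed = good.replace('"raise_pct"', '"raise_percent"')  # simulate a field rename
print(check_contract(good))
print(check_contract(renamed))
```

In a real suite the payload would come from calling the tool against fixture data, so a rename fails CI instead of silently degrading generated code.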
Full example
```python
import json

from langchain_core.tools import tool
from langchain_monty import MontyCodeInterpreterMiddleware


@tool
async def get_employee_roster() -> str:
    """
    Retrieve the full employee roster.

    Returns a JSON array. Each record contains:
    - employee_id (str): unique identifier, join key for all other datasets
    - department (str): e.g. "Engineering", "Sales"
    - title (str): job title
    - seniority_level (int): 0 (IC) – 3 (VP)
    - hire_date (str): ISO 8601 date
    - location (str): office city
    - gender (str | null): self-reported; null if not disclosed
    - age (int): age in years at last review cycle
    - current_salary (float): USD annual base salary
    - manager_id (str | null): employee_id of direct manager
    - is_active (bool): False for departed employees
    """
    return json.dumps(fetch_roster())  # fetch_roster(): your own data access function


middleware = MontyCodeInterpreterMiddleware(ptc=[get_employee_roster])
```
Resource limits
Use MontyLimits to control per-call resource budgets:
```python
from langchain_monty import MontyCodeInterpreterMiddleware, MontyLimits

limits = MontyLimits(
    max_duration_secs=10.0,        # wall-clock time (default 5.0)
    max_memory_bytes=128_000_000,  # heap cap (default 64 MB)
    max_stack_depth=512,           # recursion limit (default 256)
    max_allocations=2_000_000,     # allocation count (default 1,000,000)
)

middleware = MontyCodeInterpreterMiddleware(limits=limits)
```
Constructor reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| ptc | `Sequence[BaseTool \| str] \| None` | None | Tools the interpreter may call. BaseTool entries are available immediately; their schemas appear in the system prompt. str entries are deferred: the name is registered in the allowlist and resolved at runtime from runtime.tools (useful for tools injected by other middleware). None means pure-compute only. |
| limits | `MontyLimits \| None` | None | Per-call resource budgets. Uses defaults when None. |
| skills_backend | `BackendProtocol \| BackendFactory \| None` | None | Deepagents backend that supplies Monty-compatible Python helpers. Callables are exposed as `skill_<module>_<name>` inside the interpreter. |
| system_prompt | `str \| None` | Built-in block | System-prompt block appended to every model call. Pass None to keep the tool but add no prompt text. |
| tool_description | `str \| None` | Built-in template | Description rendered on the eval_python tool. Supports {available_host_tools}, {max_duration_secs}, {max_memory_bytes}, {max_stack_depth} placeholders. |
| iteration_budget | `int` | 64 | Hard cap on host-tool round-trips per eval_python call. Exceeding it returns an IterationBudgetExceeded error. |
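A custom tool_description can use the placeholders listed above. The README does not say how they are substituted; the brace syntax suggests str.format, which this sketch assumes, so under that assumption any literal braces must be doubled. CUSTOM_DESCRIPTION is a hypothetical name, and the values passed to format() here exist only for a local preview.

```python
# A custom eval_python description using the documented placeholders.
CUSTOM_DESCRIPTION = (
    "Run Python in a sandbox. Host tools you may call: {available_host_tools}.\n"
    "Limits: {max_duration_secs}s wall clock, {max_memory_bytes} bytes of heap, "
    "recursion depth {max_stack_depth}.\n"
    "Write literal braces as {{ and }} (needed if str.format is used)."
)

# Local preview with made-up values, assuming str.format-style substitution.
preview = CUSTOM_DESCRIPTION.format(
    available_host_tools="search, read_file",
    max_duration_secs=10.0,
    max_memory_bytes=128_000_000,
    max_stack_depth=512,
)
print(preview)

# Then pass the unformatted template and let the middleware fill in real values:
# MontyCodeInterpreterMiddleware(tool_description=CUSTOM_DESCRIPTION)
```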
Return shape
eval_python always returns a JSON object with result, stdout, and error fields. On success:
```json
{
  "result": <value of final expression, or null>,
  "stdout": "<captured stdout>",
  "error": null
}
```
On failure:
```json
{
  "result": null,
  "stdout": "",
  "error": {
    "type": "ZeroDivisionError",
    "message": "division by zero"
  },
  "attempted_code": "1 / 0"
}
```
The attempted_code field is populated only when error is set, to aid debugging.
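If you log or post-process these payloads yourself, only the documented fields are needed. A minimal sketch, assuming you have the payload as a JSON-encoded string; describe_eval_result is a hypothetical helper, not part of langchain-monty.

```python
import json


def describe_eval_result(raw: str) -> str:
    """Turn an eval_python payload into a one-line summary (hypothetical helper)."""
    data = json.loads(raw)
    error = data.get("error")
    if error is None:
        return f"ok: result={data['result']!r}, stdout={data['stdout']!r}"
    # attempted_code is only present when error is set.
    code = data.get("attempted_code", "")
    return f"{error['type']}: {error['message']} (code: {code!r})"


print(describe_eval_result(
    '{"result": null, "stdout": "", '
    '"error": {"type": "ZeroDivisionError", "message": "division by zero"}, '
    '"attempted_code": "1 / 0"}'
))
```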
The agent can act differently on three classes of error:
- Parse/compile errors: syntax or unsupported-feature errors (e.g. class definitions). The agent should fix the code.
- Resource-exhaustion errors: duration, memory, stack, or allocation limits exceeded. The agent should reduce scope.
- IterationBudgetExceeded: the interpreter made too many host-tool calls in one invocation. The agent should restructure its code.
Sandbox capabilities
Monty implements a Python subset. Currently supported stdlib modules:
sys, os, typing, asyncio, re, datetime, json, dataclasses
Not supported (yet): class definitions, and imports of any module beyond those listed.
The sandbox has no access to the host filesystem, network, subprocesses, or environment variables. All communication with the outside world goes through explicitly allowlisted host tools.
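A sketch of code written to stay inside that subset: only json and re from the listed modules, a plain function, and comprehensions, with no class definitions. It is ordinary Python, so it also runs under CPython; whether every construct is accepted depends on your pydantic-monty version, so treat it as an assumption to confirm.

```python
import json
import re

raw = '[{"name": "report-2024-03.pdf"}, {"name": "notes.txt"}, {"name": "report-2025-01.pdf"}]'
files = json.loads(raw)


# Functions and comprehensions are fine; class definitions are not.
def report_period(name):
    m = re.match(r"report-(\d{4})-(\d{2})\.pdf$", name)
    return (int(m.group(1)), int(m.group(2))) if m else None


periods = [p for p in (report_period(f["name"]) for f in files) if p is not None]
latest = max(periods)
latest  # final expression: becomes eval_python's result
```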
Async support
The tool is always called eval_python. Internally the middleware registers both a sync and an async implementation; LangChain dispatches to the async path automatically when you use agent.ainvoke(...):
```python
# inside an async function
result = await agent.ainvoke({"messages": [{"role": "user", "content": "go"}]})
```
Development
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src tests
```
License
See LICENSE.
Download files
File details
Details for the file langchain_monty-0.1.1.tar.gz.
File metadata
- Download URL: langchain_monty-0.1.1.tar.gz
- Size: 30.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 82876fa3548a50d455b74ae34c4cb32385cbad7af3b6bc1bdb1516821772c196 |
| MD5 | ed33dc816eb8383db3be6265f410c3db |
| BLAKE2b-256 | 4a98400e03b5127dc9dbbc0b1353f96e709950f12a32ebe16c86f1133ad5f8b6 |
File details
Details for the file langchain_monty-0.1.1-py3-none-any.whl.
File metadata
- Download URL: langchain_monty-0.1.1-py3-none-any.whl
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 238ab7619e88ad292499b68420c693488de518684a31b5c98bd28e07b2e1427a |
| MD5 | a92973a06fea3b1f7f8cb99e221f254b |
| BLAKE2b-256 | 6e331e86242a89ec6364649e0996d8a8c1dec0234b07d1c86c08920f2d5c1771 |