Skip to main content

JavaScript REPL middleware for deepagents

Project description

langchain-quickjs

A deepagents middleware that gives an agent a persistent, sandboxed JavaScript REPL tool, backed by quickjs-rs (QuickJS embedded via PyO3 + rquickjs).

Instead of issuing N serial tool calls, the model can write one block of JavaScript that orchestrates work in-loop — variables and functions defined in one call survive into the next, Promise.all runs concurrent work, and (opt-in) agent tools are callable from inside the REPL as await tools.<name>(...).

from deepagents import create_deep_agent
from langchain_quickjs import REPLMiddleware

agent = create_deep_agent(
    model="claude-sonnet-4-6",
    middleware=[REPLMiddleware()],
)

Why

Tool calling is fine for small, discrete requests. It falls apart when the model needs to:

  • loop over a list and call a tool per item
  • run two independent tool calls concurrently
  • compute something between calls (aggregate, filter, dedupe, format)
  • reuse intermediate state across several turns

Each of those currently costs one round-trip to the model per step. With a REPL, all of it happens in one eval call. This enables programmatic tool calling where the model writes JavaScript that invokes the agent's own tools.

Install

uv add langchain-quickjs

langchain-quickjs depends on quickjs-rs, a PyO3 extension module that ships prebuilt wheels for macOS, Linux, and Windows on CPython 3.11+.

Quick start

from deepagents import create_deep_agent
from langchain_quickjs import REPLMiddleware

agent = create_deep_agent(
    model="claude-sonnet-4-6",
    middleware=[REPLMiddleware()],
)

# Use `ainvoke` — PTC bridges register as async QuickJS host functions,
# and sync `invoke` on a REPL with async bridges raises ConcurrentEvalError.
result = await agent.ainvoke({"messages": [{"role": "user", "content": "..."}]})

The middleware:

  1. registers an eval tool (configurable name) that runs JS in a persistent context;
  2. appends a short system-prompt snippet explaining the tool's semantics (sandbox, timeout, memory limit);
  3. gives every LangGraph thread_id its own QuickJS Runtime, so two conversations can't see each other's globals.

What the REPL is

Persistence

The REPL is module-flavoured: top-level let/const/function persist across eval calls in the same thread. By default (snapshot_between_turns=True), state also persists across turns in the same LangGraph thread_id by snapshotting after each run and restoring before the next.

Set snapshot_between_turns=False to reset REPL state after each turn. Snapshot payloads are capped by max_snapshot_bytes (defaults to memory_limit); oversized snapshots are dropped instead of persisted.

// call 1
const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));

// call 2
fib(10)  // 55

Sandbox

The REPL runs in a QuickJS context with no ambient capabilities. There is no filesystem, no network, no fetch, no require, no real clock (Date.now() is whatever QuickJS provides, not wall-clock for security-sensitive uses), no process, no import of anything you didn't explicitly install.

Escape hatches, if you want them, go through explicit middleware:

  • PTC — to call into the agent's own tools (see below).
  • Skills — to pre-install JS/TS modules the agent can import.

Console capture

console.log / console.warn / console.error are captured by default and returned as a <stdout> block alongside the result, separately truncated. Disable with capture_console=False if you'd rather the guest see no console at all.

console.log("hi", 2);
1 + 1
<stdout>
hi 2
</stdout>
<result>2</result>

Timeouts and memory

Each call has a per-call wall-clock timeout (default 5 s). Breaching it produces:

<error type="Timeout">...</error>

The runtime has a shared memory limit across every context under it (default 64 MiB). OOM surfaces as:

<error type="OutOfMemory">...</error>

PTC host-function calls are also budgeted per eval call (default 256 tools.* invocations). Exceeding the budget surfaces as:

<error type="PTCCallBudgetExceeded">...</error>

Set max_ptc_calls=None only in trusted environments. Disabling the budget allows unbounded PTC-call loops and increases DoS risk.

Top-level await works on the async path — the promise settles before the call returns. An un-resolvable top-level promise (no host work in flight, no resolver) surfaces as <error type="Deadlock">.

Result formatting

Every eval renders into one wire format consumed by the model:

Outcome Rendered as
Marshalable value <result>{json-ish}</result>
Function or unmarshalable <result kind="handle">[Function] arity=2</result>
JS-level throw <error type="TypeError">{message}\n{stack}</error>
Timeout / deadlock / OOM <error type="Timeout" | "Deadlock" | "OutOfMemory">...</error>
console.* output separate <stdout>...</stdout> block

Results and stdout are independently truncated to max_result_chars (default 4000) before being sent back to the model.

Numeric rendering follows Node's REPL convention — whole-valued floats (42.0) render as integers (42) so the model isn't confused by JS's single numeric type.

Programmatic tool calling (PTC)

PTC is the reason to use this middleware over a plain code-interpreter tool. When configured, each exposed tool is available inside the REPL as:

async tools.<camelCaseName>(input: {...}): Promise<string>

So an agent with a search_web tool and a summarize tool can do:

const results = await Promise.all([
  tools.searchWeb({ query: "deepagents" }),
  tools.searchWeb({ query: "quickjs" }),
]);

await tools.summarize({ text: results.join("\n\n") })

...in one eval call — three tool invocations, zero round-trips to the model between them.

Enabling it

REPLMiddleware()                              # disabled (default)
REPLMiddleware(ptc=["search_web"])            # explicit allowlist
REPLMiddleware(ptc=[search_tool])             # explicit tool object allowlist

The REPL's own tool is always excluded from PTC; tools.eval("tools.eval(...)") would be pointless recursion, and if the model wants nested code it can just write nested code in one call.

What the model sees

When PTC is on, the system-prompt snippet grows an API Reference — tools namespace section listing every exposed tool as a TypeScript-ish signature derived from the tool's args schema:

/** Search the web for the given query. */
async tools.searchWeb(input: {
  /** The query string. */
  query: string;
  /** Max results. */
  limit?: number;
}): Promise<string>

Enums, anyOf unions, nested objects, and arrays are all supported by the schema renderer. Opaque types fall back to Record<string, unknown> — the description is usually enough.

How it works (so you can debug it)

  • Each PTC-exposed tool gets a QuickJS host-function bridge registered under a generated __tools_* global symbol. The bridge is async, so the guest sees tools.x(...) as returning a Promise.
  • globalThis.tools is rebuilt every turn from the currently-exposed name set. So if an upstream middleware filters tools on a per-turn basis, the tools namespace follows along.
  • When the bridge invokes a tool, it forwards the ToolRuntime captured from the outer eval call — so subagent tools like task see graph state, store, context, and a synthesised child tool_call_id.
  • Tool return values are coerced to strings: strings pass through, ToolMessages get unwrapped, a Command has its last-message content extracted, everything else gets json.dumps'd.

Skills: importable JS/TS modules

If your agent uses SkillsMiddleware (from deepagents), any skill whose frontmatter includes a module: key becomes dynamically importable inside the REPL:

const helpers = await import("@/skills/my-helpers");
helpers.greet("world")

Under the hood:

  • At eval time, the middleware scans the source for literal "@/skills/<name>" specifiers.
  • For each referenced skill, it fetches the skill directory through your BackendProtocol, packages every typescript file into a module scope, and installs it under the bare specifier.
  • Installs are cached per-Runtime — each skill loads at most once, and a broken skill is cached as an error so it doesn't re-hit the backend every eval.
  • If a skill referenced in source isn't available or fails to install, the eval call short-circuits with <error type="SkillNotAvailable">...</error> — the model sees a clean failure instead of a guest-side ReferenceError.
  • Skills are isolated: one skill's scope can't bare-import another. Bundle shared code into each skill or re-export through a single skill.

Enable it by passing the same BackendProtocol your SkillsMiddleware uses:

REPLMiddleware(skills_backend=my_backend)

There's a hard cap of 1 MiB per skill bundle. If you hit it, split the skill or prune generated code.

Configuration reference

REPLMiddleware(
    memory_limit=64 * 1024 * 1024,  # bytes, shared across contexts
    timeout=5.0,                     # per-call seconds
    max_ptc_calls=256,     # per-eval `tools.*` bridge calls, None disables (DoS risk)
    tool_name="eval",                # what the model calls it
    max_result_chars=4000,           # result/stdout truncation, each
    capture_console=True,            # install console.log/warn/error bridge
    snapshot_between_turns=True,     # snapshot in after_agent, restore in before_agent
    max_snapshot_bytes=None,         # defaults to `memory_limit`; larger snapshots are dropped
    ptc=None,                        # None | list[str] | list[BaseTool]
    skills_backend=None,             # BackendProtocol for @/skills/<name> imports
)

Errors the model can see

Type Cause
SyntaxError, TypeError, ReferenceError, ... User-code error. Re-surfaces the JS error name verbatim.
Timeout Call exceeded timeout=.
OutOfMemory Runtime hit memory_limit=.
PTCCallBudgetExceeded Uncaught tools.* call-budget overflow in one eval (max_ptc_calls=).
Deadlock Top-level promise never resolved with no async host work in flight.
ConcurrentEval Shouldn't happen under locks; defensive mapping for QuickJS ConcurrentEvalError.
SkillNotAvailable Source referenced @/skills/<name> we couldn't resolve or install.

asyncio.CancelledError propagates out cleanly when JS declines to catch a HostCancellationError — so LangGraph cancellation semantics work end-to-end.

License

MIT. See LICENSE

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchain_quickjs-0.1.0.tar.gz (202.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

langchain_quickjs-0.1.0-py3-none-any.whl (34.1 kB view details)

Uploaded Python 3

File details

Details for the file langchain_quickjs-0.1.0.tar.gz.

File metadata

  • Download URL: langchain_quickjs-0.1.0.tar.gz
  • Upload date:
  • Size: 202.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for langchain_quickjs-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e3dde78cf1449761422c7d8de05608926789a5b49b30bc3723471249059d6dc0
MD5 ebee97deb40610318108ea24dd7b8199
BLAKE2b-256 f7deb4a24d1edb6cf41a9869c5e469882b40e7f82ca25f42e4ef2d3aee88941e

See more details on using hashes here.

File details

Details for the file langchain_quickjs-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for langchain_quickjs-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4a32d1a163dda95567407467c2d5502df520fbbee300f75ab6f489c49eb6ad61
MD5 f00901f047e28198b5e5d90bc37d29d9
BLAKE2b-256 15d46b1eade2183d41631bb851d67e20f167425fb05ba644503603f8ef8a7691

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page