Untrusted-code containment for a trusted agent: named gVisor boxes with a Python-native policy

temenos

A secure runtime for AI agents. 🏛️

Your agent runs on the host; the code it executes runs in a gVisor box.

cd ~/code/my-repo
temenos claude        # Claude runs on the host; everything it *executes* runs in a box

That one command keeps Claude Code where it works best (on the host, with its auth, updates, and model API intact) while banning every native host-touching tool (Bash, Read, Write, Edit, WebFetch, …) and routing its only execution path through a box: a rootless gVisor sandbox with a small, Python-native policy.

A shell that tries to rm -rf ~, read ~/.ssh/id_rsa, or overwrite /usr/bin is contained, not because the model promised to behave, but because the sandbox boundary won't let it (and --no-net cuts egress too). The agent is trusted; the code it runs is not. 🛡️

And because that boundary is structural (a banned tool, not a model on its best behavior), it holds the same whether you supervise one agent by hand or run a thousand in allow-all mode. Same box, any scale. Scale it up when you need to.

temenos (τέμενος): a bounded precinct, a space set apart with a clear edge.


✨ Highlights

  • 🏛️ Agent on the host, execution in a box. No broken updates, no API keys plumbed into a container, no re-auth. Only the code the agent runs is sandboxed.
  • 🐝 One box or a thousand. The multi-box BoxManager is the same code path whether it's one repo, your overnight swarm, or a multi-tenant platform. Allow-all stays safe because the dangerous capability is removed, not merely discouraged.
  • 🔒 Real isolation, not a syscall allowlist. gVisor is a userspace kernel: the host filesystem is invisible beyond what policy mounts, and most kernel-CVE surface is intercepted before it reaches the host. Network is one flag (--no-net to fully isolate).
  • 🚫 Sole-execution-path, enforced. temenos claude denies native tools and exposes only mcp__temenos__exec/read/write/list over MCP, with --strict-mcp-config so a stray .mcp.json can't re-open a host-capable server.
  • 📦 Boxes are first-class. Named, persistent, checkpointed, inspectable: temenos exec, temenos shell, temenos diff, temenos audit. Everything lives in a .temenos/<box>/ you can rm -rf.
  • 💾 Durable by default. Background checkpoint (gVisor fscheckpoint, ~30 ms) + restore on next use: re-run temenos claude in a repo and you resume where you left off.
  • 🐍 A clean core API. Policy → Box → ExecResult. The CLI and MCP server are thin layers over the same Box you can use directly from Python. Core has zero runtime deps.
  • 🧪 Leak-tested. A containment battery (tests/leak/) is the acceptance gate: no host write, host secrets invisible, egress blocked when isolated, /proc escape blocked, memory cap OOM-kills.

🤔 What it is

A runtime that gives trusted agents an untrusted-code execution surface, whether that's one agent or a swarm of them. You point a harness (Claude Code today; any MCP-capable agent in principle) at a box and remove its host-touching tools. The agent keeps editing your real files and calling its model, but every bash/python/file/network action it takes happens inside gVisor, under a policy you set, observable and reversible. Run that for one repo, or run it fifty times in parallel under one daemon: same boundary either way.
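
As a rough illustration of what "remove its host-touching tools" can look like, the sketch below assembles a Claude Code command line that denies native tools and points at a single MCP server. The flags --disallowedTools and --strict-mcp-config and the mcp__temenos__* tool names come from this README, and --allowedTools and --mcp-config are Claude Code flags. The helper name, the tool list, the port, and the config shape are assumptions for illustration, not temenos's actual wiring; temenos claude --dry-run prints the real invocation.

```python
import json

# Native tools named in this README; the real deny list may be longer.
NATIVE_TOOLS = ["Bash", "Read", "Write", "Edit", "WebFetch"]
BOX_TOOLS = [f"mcp__temenos__{op}" for op in ("exec", "read", "write", "list")]


def build_claude_argv(box_id: str, port: int = 8839) -> list[str]:
    """Hypothetical helper: build an argv where the box is the only execution path."""
    # Assumed config shape: one HTTP MCP server at the per-box /mcp/<box-id> endpoint.
    mcp_config = {
        "mcpServers": {
            "temenos": {"type": "http", "url": f"http://127.0.0.1:{port}/mcp/{box_id}"}
        }
    }
    return [
        "claude",
        "--disallowedTools", ",".join(NATIVE_TOOLS),   # ban host-touching tools
        "--allowedTools", ",".join(BOX_TOOLS),         # allow only the box tools
        "--mcp-config", json.dumps(mcp_config),
        "--strict-mcp-config",                         # ignore any stray .mcp.json
    ]


argv = build_claude_argv("abc123")
print(" ".join(argv[:3]))
```

The point of the sketch is the shape: the deny list and the strict MCP config travel together, so removing either one would reopen a host-capable path.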

🚫 What it is NOT

| Not… | Because |
| --- | --- |
| A Docker / container runtime | It doesn't package or ship services. It wraps gVisor to confine an agent's execution and mounts your real repo live; the unit is a task, not an image. |
| A VM-per-task sandbox | The agent stays on the host (auth, updates, model API intact). Spinning a VM per task throws all that away; temenos boxes only what runs. |
| A seccomp / AppArmor filter | gVisor is a full userspace kernel, not a syscall allowlist bolted onto the host kernel: a categorically larger isolation boundary. |
| A defense against a malicious agent | The threat model trusts the agent binary. temenos contains the untrusted code the agent runs, not the agent itself. |
| A network firewall | v1 network is a toggle: full passthrough by default (no filtering) or off (--no-net, isolated). Filtered per-host egress is post-v1; this is the load-bearing gap for adversarial fleets (see limits). |

⚖️ How it compares

| | temenos | Docker container | VM per task | firejail / bubblewrap | prompt guardrails |
| --- | --- | --- | --- | --- | --- |
| Isolation boundary | userspace kernel (gVisor) | shared host kernel + ns | hardware | shared kernel + seccomp/ns | none |
| Agent stays on host (auth/updates intact) | ✅ | ⚠️ (boxed → loses host context) | ❌ | ⚠️ partial | ✅ |
| Sole-execution-path for an agent | ✅ built-in (deny natives + MCP) | 🔧 DIY | 🔧 DIY | 🔧 DIY | ❌ (trust the model) |
| Fleet control plane (N boxes, one daemon) | ✅ BoxManager | 🔧 DIY (compose/k8s) | 🔧 DIY | ❌ | ❌ |
| Kernel-CVE surface | low | high | low | high | n/a |
| Per-task object (named, checkpointed, inspectable) | ✅ | ✅ (containers) | ⚠️ heavy | ❌ | ❌ |
| Setup per task | low (rootless, a box dir) | medium | high | low | none |

In short: containers and VMs isolate whole programs you ship; firejail filters syscalls on the host kernel; prompt-level guardrails ask nicely. temenos isolates the code trusted agents run, keeps the agents on the host, and makes each box a first-class, inspectable object, whether you run one or a fleet. It builds on gVisor and the Model Context Protocol. 🙂

🧩 How it works

   you ──► claude (host)            (×N agents, in a swarm)
              │  native tools BANNED (--disallowedTools, --strict-mcp-config)
              │  only mcp__temenos__* ALLOWED
              ▼
        temenos daemon  ──HTTP /mcp/<box-id>──►  Box (gVisor / runsc)
        (one per user,                            • host /usr,/etc bound read-only
         supervises every box)                    • repo mounted (live-writable by default)
                                                  • network on by default (--no-net isolates)
                                                  • writes land in an overlay

A box = a Policy + a gVisor runtime + a data dir. One daemon per user auto-spawns on first use and supervises every box, serving a REST control plane (the CLI) and a per-box MCP data plane (the agents). Boxes are keyed by the hash of their data dir, so two repos' default boxes (or fifty swarm agents) never collide. For the full design, decisions, and verification log, see plan.md.
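
To make the "keyed by the hash of their data dir" idea concrete, here is a minimal sketch of one way such an id could be derived. The function name, the choice of SHA-256, and the 12-character truncation are assumptions for illustration; the README does not say which hash or id length temenos uses.

```python
import hashlib
from pathlib import Path


def box_id(data_dir: str, length: int = 12) -> str:
    """Hypothetical: derive a stable id from the absolute data-dir path."""
    # Normalise first so "./x" and its absolute form map to the same id.
    canonical = str(Path(data_dir).expanduser().resolve())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


a = box_id("/home/me/code/repo-a/.temenos/default")
b = box_id("/home/me/code/repo-b/.temenos/default")
print(a, b)   # two boxes both named "default", two different ids
```

Because the id depends on the directory rather than the box name, two repos can each have a box called default without any coordination between them.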

📦 Install

temenos is Linux + gVisor for v1; a macOS (Seatbelt) backend is designed (see macos_plan.md).

1. gVisor (runsc), the sandbox. (official guide)

ARCH=$(uname -m)
wget https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}/runsc
chmod +x runsc && sudo mv runsc /usr/local/bin/

2. temenos

pip install "temenos[all]"        # daemon + MCP + CLI
# or from a checkout:
git clone https://github.com/farizrahman4u/temenos && cd temenos
pip install -e ".[all,dev]"

The core library has zero runtime deps; [all] pulls FastAPI/uvicorn/mcp/httpx for the daemon and CLI. The bare temenos image … commands work without extras.

3. (optional) mmdebstrap, to build clean box base images (so boxes can apt/pip/npm install into a writable system). Without it you boot against the host's read-only /usr.

sudo apt-get install mmdebstrap

4. Check your host:

$ temenos doctor
gVisor (runsc):     yes
  platform:         ptrace          # kvm on bare metal, systrap on most VMs, ptrace on WSL2
mmdebstrap:         yes
systemd-run:        yes             # required to ENFORCE memory/cpu limits (see Limits)

🚀 Quickstart

One box: your repo

cd ~/code/my-repo

temenos create                       # makes .temenos/default in this repo (+ .gitignore)
temenos exec default -- python3 -c "print(6*7)"
temenos exec -it default -- python3  # interactive REPL (PTY); also vim, bash, etc.
temenos shell default                # an interactive shell inside the box
temenos ls                           # boxes the daemon is running
temenos audit default                # what ran in the box
temenos diff default                 # files under the box's write paths
temenos rm default                   # stop + delete the box

A bare box name resolves project-first (.temenos/<name>, walking up from CWD), then global (~/.local/share/temenos/boxes/<name>); a project box shadows a global one of the same name (with a warning).
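
The lookup order described above can be sketched as a short resolver. This is an illustration under assumptions: the function name and signature are hypothetical, and the shadowing warning is left out for brevity.

```python
import tempfile
from pathlib import Path
from typing import Optional

GLOBAL_BOXES = Path.home() / ".local/share/temenos/boxes"


def resolve_box(name: str, cwd: Path, global_root: Path = GLOBAL_BOXES) -> Optional[Path]:
    """Hypothetical resolver: the nearest project box wins, else the global one."""
    # Project-first: check cwd, then each parent directory in turn.
    for directory in (cwd, *cwd.parents):
        candidate = directory / ".temenos" / name
        if candidate.is_dir():
            return candidate
    # Fall back to the per-user global location.
    candidate = global_root / name
    return candidate if candidate.is_dir() else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "repo/.temenos/default").mkdir(parents=True)
    (root / "repo/src/pkg").mkdir(parents=True)
    (root / "global/default").mkdir(parents=True)
    found = resolve_box("default", root / "repo/src/pkg", root / "global")
    print(found.relative_to(root))   # the project box shadows the global one
```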

Attach Claude Code:

temenos claude                       # box 'default' in this repo (network on by default)
temenos claude --box review --no-net # a separate box, fully network-isolated
temenos claude --dry-run             # print the exact claude invocation, don't launch
temenos claude -- --model opus       # args after `--` go to claude

The repo mounts live-writable, so the agent's edits land in your real files: the sandbox contains execution, not the trusted agent's edits. --ephemeral flips the repo to read-only.

Many boxes: a swarm

Fan a task across dozens of agents and approving each tool call by hand is a non-starter, so you run them allow-all. The structural boundary is what makes that safe: an agent can yolo freely because there's nothing dangerous to allow, since every action lands in a policy'd box. The CLI and MCP server are thin layers over the same Box/BoxManager you can drive directly:

from temenos import Box, Policy
from temenos.manager import BoxManager

# one box, directly: filesystem locked by default (no host writes, tight limits);
# network is on by default, so pass network=False to isolate it
with Box("demo", Policy(write=["/home/me/out"], network=False)) as box:
    box.write_file("/home/me/out/run.py", "print(6 * 7)\n")
    print(box.exec(["python3", "/home/me/out/run.py"]).stdout)   # "42\n"
    print(box.exec(["cat", "/etc/shadow"]).ok)                   # False (host invisible)

# a fleet: one contained box per agent, via the registry the daemon owns
mgr = BoxManager()
ids = [mgr.create(f"/srv/boxes/agent-{i}", Policy()) for i in range(50)]
for bid in ids:
    print(mgr.get(bid).exec(["echo", "hi"]).stdout.strip())
mgr.shutdown()    # checkpoints (where enabled) + tears down the whole fleet

Policy is frozen; restrict() derives child policies that can only narrow (widening raises PolicyViolation). gVisor's density is what makes a per-agent box cheap: a VM each is too heavy, and a plain container is a weaker boundary. (mgr.map(...) fan-out sugar is on the roadmap; the loop above works today.) Runnable: examples/python_api.py.
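
The narrow-only rule is easy to picture with a toy model. The sketch below is not temenos's Policy class: SketchPolicy, its two fields, and the exact-path subset check are assumptions chosen to show the idea of a frozen value whose restrict() can only remove capability. Only the names restrict() and PolicyViolation are taken from the README.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a derived policy would grant more than its parent."""


@dataclass(frozen=True)
class SketchPolicy:
    write: frozenset = frozenset()   # writable paths (exact matches only, for simplicity)
    network: bool = True             # mirrors the README's network-on default

    def restrict(self, write=None, network=None) -> "SketchPolicy":
        new_write = self.write if write is None else frozenset(write)
        new_network = self.network if network is None else network
        # A child may drop write paths but never add one.
        if not new_write <= self.write:
            raise PolicyViolation(f"cannot add write paths: {sorted(new_write - self.write)}")
        # A child may turn the network off but never back on.
        if new_network and not self.network:
            raise PolicyViolation("cannot re-enable network")
        return SketchPolicy(write=new_write, network=new_network)


parent = SketchPolicy(write=frozenset({"/home/me/out", "/home/me/tmp"}))
child = parent.restrict(write=["/home/me/out"], network=False)
try:
    child.restrict(network=True)
except PolicyViolation as exc:
    print("rejected:", exc)
```

Freezing the dataclass means a policy handed to a box cannot be mutated afterwards; the only way to change it is to derive a new, narrower one.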

A fleet under one daemon

temenos serve --port 8839     # REST control + per-box MCP (/mcp/<box-id>), supervising every box

BoxManager is also the multi-tenant control plane: a "tenant" and an "agent" are the same abstraction, so "run my swarm" and "run many customers' agents on untrusted code" are the same code, not two products. The isolation invariant (no writable mount is ever shared across boxes) holds today; tenant-scoped tokens and aggregate quotas are the platform-tier roadmap.
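
One way to read the isolation invariant is as a check over the fleet's writable paths: no path in one box may equal, contain, or sit inside a writable path of another box. The sketch below shows that check under assumptions; the function names and the dict-of-paths input are hypothetical and say nothing about how BoxManager enforces the invariant internally.

```python
from pathlib import PurePosixPath


def _overlaps(a: PurePosixPath, b: PurePosixPath) -> bool:
    """True if the paths are equal or one contains the other."""
    return a == b or a in b.parents or b in a.parents


def shared_writable_mounts(boxes: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """Return (box_a, path_a, box_b, path_b) for every cross-box overlap."""
    clashes = []
    items = [(box, PurePosixPath(p)) for box, paths in boxes.items() for p in paths]
    for i, (box_a, path_a) in enumerate(items):
        for box_b, path_b in items[i + 1:]:
            if box_a != box_b and _overlaps(path_a, path_b):
                clashes.append((box_a, str(path_a), box_b, str(path_b)))
    return clashes


fleet = {
    "agent-0": ["/srv/boxes/agent-0/work"],
    "agent-1": ["/srv/boxes/agent-1/work"],
}
print(shared_writable_mounts(fleet))       # disjoint data dirs: no clashes
fleet["agent-2"] = ["/srv/boxes/agent-0"]  # a parent of agent-0's writable path
print(shared_writable_mounts(fleet))
```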

📚 Documentation

Full docs live in docs/.

🧠 The one design decision

The agent runs on the host; only what it executes runs in a box. That single split is what makes temenos both usable and safe: the agent keeps its identity, updates, and model access (so it actually works), while every command it issues crosses a hard sandbox edge (so it can't hurt you). Everything else (the MCP data plane, the banned-natives wiring, the checkpointing box, the multi-box registry) exists to make that split airtight and to make the box the sole execution path for the code the agent runs. And because that boundary is structural, not a promise, the split holds identically whether you supervise one agent by hand or run a hundred in allow-all mode: the box is the enforcement, not the human.

🗂️ CLI reference

| Command | What it does |
| --- | --- |
| temenos doctor | gVisor/platform/mmdebstrap/systemd capability check |
| temenos image build NAME [--from mmdebstrap\|minimal\|host-copy\|download] | build a box base image |
| temenos image ls · rm NAME | list / remove images |
| temenos serve [--port] | run the per-user daemon (auto-spawned otherwise) |
| temenos create [NAME] [flags] | create/ensure a box in this project |
| temenos ls | list running boxes (project boxes marked) |
| temenos exec [-it] NAME -- CMD… | run a command in a box (-it = interactive PTY) |
| temenos shell NAME | interactive shell in a box (PTY) |
| temenos rm NAME [--keep-data] | stop + delete a box |
| temenos audit NAME · diff NAME | audit log / write-set manifest |
| temenos claude [--box N] [flags] [-- claude-args] | attach Claude with natives banned |
| temenos version | print version |

Box-creation flags (on create and claude): --image NAME, --net/--no-net, --scratch disk|memory, --force-memory, --ephemeral-fs (never checkpoint), --no-autosave (checkpoint only on close), --ephemeral (repo read-only), --volume HOST:TARGET[:ro|rw], --memory MB, --cpu SECONDS, --global.
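
For readers scripting around the CLI, the --volume HOST:TARGET[:ro|rw] syntax can be parsed with a few lines. This is a sketch, not temenos's parser: the Volume type and parse_volume function are hypothetical, and the read-only default applied when the mode is omitted is an assumption (the README does not state the default).

```python
from typing import NamedTuple


class Volume(NamedTuple):
    host: str
    target: str
    mode: str


def parse_volume(spec: str, default_mode: str = "ro") -> Volume:
    """Parse HOST:TARGET[:ro|rw]; the default mode here is an assumption."""
    parts = spec.split(":")
    if len(parts) == 2:
        host, target, mode = parts[0], parts[1], default_mode
    elif len(parts) == 3:
        host, target, mode = parts
    else:
        raise ValueError(f"expected HOST:TARGET[:ro|rw], got {spec!r}")
    if not host or not target:
        raise ValueError("HOST and TARGET must be non-empty")
    if mode not in ("ro", "rw"):
        raise ValueError(f"mode must be 'ro' or 'rw', got {mode!r}")
    return Volume(host, target, mode)


print(parse_volume("/data/cache:/mnt/cache:rw"))
```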

🛡️ Threat model & honest limits

The agent is trusted (you installed it; it authenticates as you; it isn't trying to escape). The code it runs is untrusted: model-authored shell/python that may be buggy, prompt-injected, or hostile. temenos's job is the sole-execution-path guarantee: every bit of that code goes through a box, and a box can't touch the host beyond its policy. That guarantee is what lets you take humans out of the loop at fleet scale.

| Property | Status (v1, gVisor) |
| --- | --- |
| Filesystem escape | blocked: host invisible beyond policy mounts; /proc/1/root is the box |
| Host writes outside policy | blocked: /usr,/etc read-only; writes go to an overlay |
| Network exfiltration | blocked with --no-net (isolated netns), but network is on by default (see limits) |
| Cross-box crosstalk | blocked: no writable mount is ever shared between boxes |
| Kernel-CVE surface | mostly blocked: gVisor intercepts syscalls in userspace |
| Memory/CPU/pid exhaustion | enforced via a per-box systemd scope (needs delegation, see below) |

Limits you should know about:

  • Network is on by default, and it's a toggle, not a firewall. The default is full host passthrough with no filtering (localhost, LAN, cloud metadata, arbitrary egress); --no-net (network=False) fully isolates a box. This is the load-bearing gap for adversarial fleets: a swarm of network-on boxes is an exfiltration surface multiplied by N. Run untrusted/multi-tenant boxes with --no-net; filtered per-host egress is post-v1.
  • Resource limits need systemd user-cgroup delegation. Without it, limits degrade to unenforced with a warning (temenos doctor shows the mode); don't run adversarial work there.
  • Per-tenant authz/quotas are in progress. The box-per-owner isolation invariant holds today; tenant-scoped tokens and aggregate quotas are the platform-tier roadmap.
  • WSL2 uses the ptrace platform (no /dev/kvm). Slower, but the security model (the gVisor sentry) is identical to kvm/systrap.
  • Side channels between co-resident boxes are out of scope for v1.
  • Not a defense against a malicious agent binary; see the threat model.
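
To give a feel for the "per-box systemd scope" and the delegation caveat, the sketch below builds a systemd-run command that caps memory and process count, and reads which cgroup v2 controllers are delegated to the user's systemd instance. The systemd-run options and the MemoryMax and TasksMax properties are standard systemd; the helper names, the wrapped command, and the idea that temenos launches boxes this way are assumptions.

```python
import os
from pathlib import Path


def scope_argv(cmd: list[str], memory_mb: int, max_pids: int) -> list[str]:
    """Wrap cmd in a transient user scope with memory and pid caps."""
    return [
        "systemd-run", "--user", "--scope", "--quiet",
        "-p", f"MemoryMax={memory_mb}M",
        "-p", f"TasksMax={max_pids}",
        "--", *cmd,
    ]


def delegated_controllers(uid: int) -> set[str]:
    """Controllers delegated to the user's systemd instance; empty if unknown."""
    path = Path(
        f"/sys/fs/cgroup/user.slice/user-{uid}.slice/user@{uid}.service/cgroup.controllers"
    )
    try:
        return set(path.read_text().split())
    except OSError:
        return set()   # no cgroup v2 user slice visible: treat limits as unenforced


argv = scope_argv(["runsc", "--version"], memory_mb=512, max_pids=256)
print(" ".join(argv))
enforced = {"memory", "pids"} <= delegated_controllers(os.getuid())
print("limits enforceable:", enforced)
```

If the second line prints False on your host, that is the situation the delegation bullet above warns about; temenos doctor remains the authoritative check.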

Run tests/leak/ against your host and re-run it when your harness upgrades (new tools are new holes). A config isn't "supported" until it's green.

🏗️ Architecture

Layer 3  surfaces      server/ (FastAPI REST + per-box MCP) · cli.py
Layer 2½ registry      manager.py (BoxManager: ids, fleet lifecycle, checkpoint loop)
Layer 2  box           box.py (exec/read/write/list, audit, checkpoint)
Layer 1  backend       backends/ (gVisor: OCI bundle, held-run+exec, overlay, systemd scope)
Layer 0  data          policy.py · result.py · storage.py · exceptions.py  (pure, no OS calls)

BoxManager (Layer 2½) is the hinge: it's both the local-swarm registry and the multi-tenant control plane (one piece of code, two reaches). Lower layers never import higher ones; REST, MCP, and the CLI are all the same Policy → Box → ExecResult path. Delete server/ and the core still works.
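
The "lower layers never import higher ones" rule is the kind of thing a small static check can guard. The sketch below uses the standard ast module to flag upward imports; the module-to-layer map is read off the table above, while the function name, the package name temenos, and the absolute-import style are assumptions, and relative or "from temenos import x" imports are not handled.

```python
import ast

# Layer numbers follow the architecture table; 2.5 stands in for "2½".
LAYERS = {"policy": 0, "result": 0, "storage": 0, "exceptions": 0,
          "backends": 1, "box": 2, "manager": 2.5, "server": 3, "cli": 3}


def upward_imports(module: str, source: str, package: str = "temenos") -> list[str]:
    """Return package-internal imports in `source` that point at a higher layer."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            # Only judge "<package>.<module>..." imports whose module has a known layer.
            if parts[0] == package and len(parts) > 1:
                if LAYERS.get(parts[1], -1) > LAYERS[module]:
                    bad.append(name)
    return bad


sample = "from temenos.policy import Policy\nimport temenos.manager\n"
print(upward_imports("box", sample))   # box.py (layer 2) reaching up to manager.py
```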

🧪 Development

pip install -e ".[all,dev]"
PYTHONPATH=. pytest                       # full suite
PYTHONPATH=. pytest tests/leak/ -v        # the containment gate (needs gVisor)
TEMENOS_NET_TESTS=1 pytest tests/test_image_mmdebstrap.py   # opt-in network e2e

Tests that need gVisor / mmdebstrap / network are gated and skip cleanly without them.

📍 Status

Pre-1.0 (0.2.0). v1 is feature-complete and leak-tested on Linux + gVisor; the API may still shift before 1.0. Roadmap, ordered by where the value is:

  1. Fleet fan-out ergonomics: mgr.map(...) over N boxes, batch lifecycle, aggregate audit.
  2. Filtered network egress: per-host SNI/allowlist proxy, so swarm boxes get contained network instead of all-or-nothing (the biggest gap for adversarial fleets).
  3. Per-tenant authz & quotas: tenant-scoped tokens, aggregate caps + backpressure.
  4. macOS (Seatbelt) backend: see macos_plan.md.
  5. True diff-vs-original, a remote (over-the-daemon) attach, persisted audit logs. (Local interactive PTY shells already work: temenos shell / temenos exec -it.)

🙏 Credits

temenos stands on:

temenos's contribution is the composition: trusted agents on the host, untrusted-code boxes underneath, one daemon that scales it from a single repo to a fleet, and the wiring that makes each box the sole execution path.

📄 License

Apache-2.0 © temenos contributors
