Untrusted-code containment for a trusted agent — named gVisor boxes with a Python-native policy

temenos

A secure runtime for AI agents. 🏛️

Your agent runs on the host — the code it executes runs in a gVisor box.

cd ~/code/my-repo
temenos claude        # Claude runs on the host; everything it *executes* runs in a box

That one command keeps Claude Code where it works best — on the host, with its auth, updates, and model API intact — while banning every native host-touching tool (Bash, Read, Write, Edit, WebFetch, …) and routing its only execution path through a box: a rootless gVisor sandbox with a small, Python-native policy.

A shell that tries to rm -rf ~, read ~/.ssh/id_rsa, or curl evil.com is contained — not because the model promised to behave, but because the sandbox boundary won't let it. The agent is trusted; the code it runs is not. 🛡️

temenos (τέμενος): a bounded precinct — a space set apart with a clear edge.

✨ Highlights

🏛️ Agent on the host, execution in a box. No broken updates, no API keys plumbed into a container, no re-auth. Only the code the agent runs is sandboxed.
🔒 Real isolation, not a syscall allowlist. gVisor is a userspace kernel — the host filesystem is invisible beyond what policy mounts, network is off by default, and most kernel-CVE surface is intercepted before it reaches the host.
🚫 Sole-execution-path, enforced. temenos claude denies native tools and exposes only mcp__temenos__exec/read/write/list over MCP, with --strict-mcp-config so a stray .mcp.json can't re-open a host-capable server.
📦 Boxes are first-class. Named, persistent, checkpointed, inspectable — temenos exec, temenos shell, temenos diff, temenos audit. Everything lives in a .temenos/<box>/ you can rm -rf.
💾 Durable by default. Background checkpoint (gVisor fscheckpoint, ~30 ms) + restore on next use — re-run temenos claude in a repo and you resume where you left off.
🐍 A clean core API. Policy → Box → ExecResult. The CLI and MCP server are thin layers over the same Box you can use directly from Python. Core has zero runtime deps.
🧪 Leak-tested. A containment battery (tests/leak/) is the acceptance gate: no host write, host secrets invisible, no network, /proc escape blocked, memory cap OOM-kills.

🤔 What it is

A runtime that gives a trusted agent an untrusted-code execution surface. You point a harness (Claude Code today; any MCP-capable agent in principle) at a box and remove its host-touching tools. The agent keeps editing your real files and calling its model — but every bash/python/file/network action it takes happens inside gVisor, under a policy you set, observable and reversible.

🚫 What it is NOT

Not…	Because
A Docker / container runtime	It doesn't package or ship services. It wraps gVisor to confine an agent's execution and mounts your real repo live — the unit is a task, not an image.
A VM-per-task sandbox	The agent stays on the host (auth, updates, model API intact). Spinning a VM per task throws all that away; temenos boxes only what runs.
A seccomp / AppArmor filter	gVisor is a full userspace kernel, not a syscall allowlist bolted onto the host kernel — a categorically larger isolation boundary.
A defense against a malicious agent	The threat model trusts the agent binary. temenos contains the untrusted code the agent runs, not the agent itself.
A network firewall	v1 network is a toggle: off (isolated) or full passthrough (no filtering). Filtered per-host egress is post-v1.

⚖️ How it compares

	temenos	Docker container	VM per task	firejail / bubblewrap	prompt guardrails
Isolation boundary	userspace kernel (gVisor)	shared host kernel + ns	hardware	shared kernel + seccomp/ns	none
Agent stays on host (auth/updates intact)	✅	⚠️ (boxed → loses host context)	❌	⚠️ partial	✅
Sole-execution-path for an agent	✅ built-in (deny natives + MCP)	🔧 DIY	🔧 DIY	🔧 DIY	❌ (trust the model)
Kernel-CVE surface	low	high	low	high	n/a
Per-task object (named, checkpointed, inspectable)	✅	✅ (containers)	⚠️ heavy	❌	❌
Setup per task	low (rootless, a box dir)	medium	high	low	none

In short: containers and VMs isolate whole programs you ship; firejail filters syscalls on the host kernel; prompt-level guardrails ask nicely. temenos isolates the code a trusted agent runs, keeps the agent on the host, and makes the box a first-class, inspectable object. It builds on gVisor and the Model Context Protocol. 🙂

🧩 How it works

   you ──► claude (host)
              │  native tools BANNED (--disallowedTools, --strict-mcp-config)
              │  only mcp__temenos__* ALLOWED
              ▼
        temenos daemon  ──HTTP /mcp/<box-id>──►  Box (gVisor / runsc)
        (one per user)                            • host /usr,/etc bound read-only
                                                  • repo mounted (live-writable by default)
                                                  • network off · mem/cpu/pid capped
                                                  • writes land in an overlay

A box = a Policy + a gVisor runtime + a data dir. One daemon per user auto-spawns on first use and serves a REST control plane (the CLI) and a per-box MCP data plane (the agent). Boxes are keyed by the hash of their data dir, so two repos' default boxes never collide. For the full design, decisions, and verification log, see plan.md.
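To make the "keyed by the hash of their data dir" idea concrete, here is a minimal sketch. The README does not say which hash or id length temenos uses, so the SHA-256 digest, the 12-character truncation, and the `box_id_for` name are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def box_id_for(data_dir: str, length: int = 12) -> str:
    """Derive a short, stable id from a box's data directory (illustrative scheme)."""
    # Normalise to an absolute path so the same directory always hashes the same way.
    # Symlinks are not resolved here; a real implementation might choose otherwise.
    canonical = Path(data_dir).expanduser().absolute().as_posix()
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    # Two repos each have a box named "default", but their data dirs differ,
    # so the derived ids differ as well.
    a = box_id_for("/home/me/code/repo-a/.temenos/default")
    b = box_id_for("/home/me/code/repo-b/.temenos/default")
    print(a, b, a != b)
```

Because the id depends on the directory and not on the box name, the name `default` can be reused in every repo without a clash.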

📦 Install

temenos is Linux + gVisor for v1 (macOS is on the roadmap).

1. gVisor (runsc) — the sandbox. (official guide)

ARCH=$(uname -m)
wget https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}/runsc
chmod +x runsc && sudo mv runsc /usr/local/bin/

2. temenos

pip install "temenos[all]"        # daemon + MCP + CLI
# or from a checkout:
git clone https://github.com/farizrahman4u/temenos && cd temenos
pip install -e ".[all,dev]"

The core library has zero runtime deps; [all] pulls FastAPI/uvicorn/mcp/httpx for the daemon and CLI. The bare temenos image … commands work without extras.

3. (optional) mmdebstrap — to build clean box base images (so boxes can apt/pip/npm install into a writable system). Without it you boot against the host's read-only /usr.

sudo apt-get install mmdebstrap

4. Check your host:

$ temenos doctor
gVisor (runsc):     yes
  platform:         ptrace          # kvm on bare metal, systrap on most VMs, ptrace on WSL2
mmdebstrap:         yes
systemd-run:        yes             # required to ENFORCE memory/cpu limits (see Limits)
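If you want a similar check from a script before temenos is installed, a rough stand-in can be sketched with the standard library. This is not how `temenos doctor` is implemented; the `preflight` function and its keys are hypothetical, and it only looks for the executables on PATH and for the `/dev/kvm` device.

```python
import os
import shutil


def preflight() -> dict:
    """Report which external tools are visible on PATH (a subset of a doctor-style check)."""
    return {
        "runsc": shutil.which("runsc") is not None,
        "mmdebstrap": shutil.which("mmdebstrap") is not None,
        "systemd-run": shutil.which("systemd-run") is not None,
        # /dev/kvm is absent on WSL2, which is why the ptrace platform is used there.
        "kvm_device": os.path.exists("/dev/kvm"),
    }


if __name__ == "__main__":
    for name, present in preflight().items():
        print(f"{name + ':':<16}{'yes' if present else 'no'}")
```

A missing `systemd-run` is worth noticing early, since memory/cpu limits are only enforced when it is available.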

🚀 Quickstart

The project flow (git-style)

cd ~/code/my-repo

temenos create                       # makes .temenos/default in this repo (+ .gitignore)
temenos exec default -- python3 -c "print(6*7)"
temenos shell default                # a minimal REPL inside the box
temenos ls                           # boxes the daemon is running
temenos audit default                # what ran in the box
temenos diff default                 # files under the box's write paths
temenos rm default                   # stop + delete the box

A bare box name resolves project-first (.temenos/<name>, walking up from CWD), then global (~/.local/share/temenos/boxes/<name>); a project box shadows a global one of the same name (with a warning).
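The lookup order described above can be sketched in a few lines. This is an illustration of the rule, not temenos's code: `resolve_box` is a hypothetical helper, and the shadowing warning is left out.

```python
from pathlib import Path
from typing import Optional


def resolve_box(name: str, cwd: Path, global_root: Path) -> Optional[Path]:
    """Return the data dir for `name`: nearest project box first, then the global one."""
    cwd = cwd.resolve()
    # Walk from cwd up to the filesystem root looking for .temenos/<name>.
    for directory in (cwd, *cwd.parents):
        candidate = directory / ".temenos" / name
        if candidate.is_dir():
            return candidate
    # No project box found: fall back to the per-user global location.
    fallback = global_root / name
    return fallback if fallback.is_dir() else None


if __name__ == "__main__":
    global_root = Path.home() / ".local" / "share" / "temenos" / "boxes"
    print(resolve_box("default", Path.cwd(), global_root))
```

Walking up from the current directory is what lets a command run from a subdirectory of the repo still find the repo's own box.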

Attach Claude Code to a box

temenos claude                       # box 'default' in this repo
temenos claude --box review --net    # a separate box, network on
temenos claude --dry-run             # print the exact claude invocation, don't launch
temenos claude -- --model opus       # args after `--` go to claude

The repo mounts live-writable, so the agent's edits land in your real files — the sandbox contains execution, not the trusted agent's edits. --ephemeral flips the repo to read-only.
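For a sense of what such an invocation might contain, here is a hedged sketch that assembles an argument list. Only `--disallowedTools`, `--strict-mcp-config`, the `mcp__temenos__*` tool names, and the `/mcp/<box-id>` path come from this README; the use of `--mcp-config` and `--allowedTools`, the daemon URL and port, and the `claude_argv` helper are assumptions. Use `temenos claude --dry-run` to see the real command.

```python
import json
from typing import List, Sequence

# Native tools named in this README; the real list is longer (the README elides the rest).
DENIED_TOOLS = ["Bash", "Read", "Write", "Edit", "WebFetch"]
# The four MCP tools the README says are exposed.
ALLOWED_TOOLS = [f"mcp__temenos__{op}" for op in ("exec", "read", "write", "list")]


def claude_argv(box_id: str, daemon_url: str, extra: Sequence[str] = ()) -> List[str]:
    """Build an illustrative argv that bans native tools and points Claude at one box."""
    mcp_config = {
        "mcpServers": {
            # The server name "temenos" is what yields the mcp__temenos__ tool prefix.
            "temenos": {"type": "http", "url": f"{daemon_url}/mcp/{box_id}"}
        }
    }
    return [
        "claude",
        "--strict-mcp-config",  # ignore any other MCP config, e.g. a stray .mcp.json
        "--mcp-config", json.dumps(mcp_config),
        "--disallowedTools", ",".join(DENIED_TOOLS),
        "--allowedTools", ",".join(ALLOWED_TOOLS),
        *extra,  # everything after `--` on the temenos command line
    ]


if __name__ == "__main__":
    # Hypothetical box id and port, for illustration only.
    print(claude_argv("abc123", "http://127.0.0.1:8765", ["--model", "opus"]))
```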

The Python API (the core; CLI/MCP are thin layers over it)

from temenos import Box, Policy

with Box("demo", Policy(network=False, write=["/home/me/out"])) as box:
    box.write_file("/home/me/out/run.py", "print(6 * 7)\n")
    r = box.exec(["python3", "/home/me/out/run.py"])
    print(r.stdout, r.exit_code)        # "42\n", 0

    box.exec(["cat", "/etc/shadow"]).ok          # -> False  (host invisible)

Policy is frozen and secure by default (Policy() = no network, no host writes, tight limits). restrict() derives child policies that can only narrow — widening raises PolicyViolation. A runnable version is in examples/python_api.py.
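The narrow-only rule is easy to model in isolation. The sketch below is a toy stand-in, not temenos's `Policy`: the `ToyPolicy` class, its `memory_mb` field, and its defaults are hypothetical, and write paths are compared as a plain set, whereas a real check would presumably also treat a subdirectory as narrower than its parent.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


class PolicyViolation(Exception):
    """Raised when a derived policy would be wider than its parent."""


@dataclass(frozen=True)
class ToyPolicy:
    network: bool = False
    write: FrozenSet[str] = frozenset()
    memory_mb: int = 256  # illustrative default, not temenos's

    def restrict(self, *, network=None, write=None, memory_mb=None) -> "ToyPolicy":
        """Return a child policy; any attempt to widen raises PolicyViolation."""
        new_net = self.network if network is None else network
        new_write = self.write if write is None else frozenset(write)
        new_mem = self.memory_mb if memory_mb is None else memory_mb
        if new_net and not self.network:
            raise PolicyViolation("cannot enable network in a child policy")
        if not new_write <= self.write:
            raise PolicyViolation("cannot add write paths in a child policy")
        if new_mem > self.memory_mb:
            raise PolicyViolation("cannot raise the memory cap in a child policy")
        return replace(self, network=new_net, write=new_write, memory_mb=new_mem)


if __name__ == "__main__":
    parent = ToyPolicy(network=True, write=frozenset({"/out", "/tmp"}), memory_mb=512)
    child = parent.restrict(network=False, write=["/out"])
    print(child)
    try:
        child.restrict(network=True)  # widening: the parent child has no network
    except PolicyViolation as exc:
        print("refused:", exc)
```

Freezing the dataclass means a policy cannot be mutated after a box has been created with it; the only way to get a different policy is to derive a new one.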

🧠 The one design decision

The agent runs on the host; only what it executes runs in a box. That single split is what makes temenos both usable and safe: the agent keeps its identity, updates, and model access (so it actually works), while every command it issues crosses a hard sandbox edge (so it can't hurt you). Everything else (the MCP data plane, the banned-natives wiring, the checkpointing box) exists to make that split airtight and to make the box the sole execution path for the code the agent runs. Reading your project directly or shelling out on the host, bypassing the box, is exactly the spillover temenos prevents.

🗂️ CLI reference

Command	What it does
`temenos doctor`	gVisor/platform/mmdebstrap/systemd capability check
`temenos image build NAME [--from mmdebstrap|minimal|host-copy|download]`	build a box base image
`temenos image ls` · `rm NAME`	list / remove images
`temenos serve [--port]`	run the per-user daemon (auto-spawned otherwise)
`temenos create [NAME] [flags]`	create/ensure a box in this project
`temenos ls`	list running boxes (project boxes marked)
`temenos exec NAME -- CMD…`	run a command in a box
`temenos shell NAME`	minimal REPL in a box
`temenos rm NAME [--keep-data]`	stop + delete a box
`temenos audit NAME` · `diff NAME`	audit log / write-set manifest
`temenos claude [--box N] [flags] [-- claude-args]`	attach Claude with natives banned
`temenos version`	print version

Box-creation flags (on create and claude): --image NAME, --net, --scratch disk|memory, --force-memory, --ephemeral-fs (never checkpoint), --no-autosave (checkpoint only on close), --ephemeral (repo read-only), --volume HOST:TARGET[:ro|rw], --memory MB, --cpu SECONDS, --global.
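As an illustration of the `--volume HOST:TARGET[:ro|rw]` syntax, the sketch below parses a spec into its parts. The `Volume` type and `parse_volume` function are hypothetical, and because the README does not state which mode applies when the suffix is omitted, the sketch records that case as `None` and does not guess.

```python
from typing import NamedTuple, Optional


class Volume(NamedTuple):
    host: str
    target: str
    mode: Optional[str]  # "ro", "rw", or None when the spec omits it


def parse_volume(spec: str) -> Volume:
    """Split HOST:TARGET[:ro|rw] into its parts, rejecting malformed specs."""
    parts = spec.split(":")
    if len(parts) == 2:
        host, target = parts
        mode = None
    elif len(parts) == 3:
        host, target, mode = parts
        if mode not in ("ro", "rw"):
            raise ValueError(f"mode must be 'ro' or 'rw', got {mode!r}")
    else:
        raise ValueError(f"expected HOST:TARGET[:ro|rw], got {spec!r}")
    if not host or not target:
        raise ValueError(f"host and target must be non-empty in {spec!r}")
    return Volume(host, target, mode)


if __name__ == "__main__":
    print(parse_volume("/data/models:/models:ro"))
    print(parse_volume("/home/me/out:/out"))
```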

🛡️ Threat model & honest limits

The agent is trusted (you installed it; it authenticates as you; it isn't trying to escape). The code it runs is untrusted — model-authored shell/python that may be buggy, prompt-injected, or hostile. temenos's job is the sole-execution-path guarantee: every bit of that code goes through a box, and a box can't touch the host beyond its policy.

Property	Status (v1, gVisor)
Filesystem escape	blocked — host invisible beyond policy mounts; `/proc/1/root` is the box
Host writes outside policy	blocked — `/usr`,`/etc` read-only; writes go to an overlay
Network exfiltration	blocked when the network is off (`network=False`, the default; isolated netns)
Kernel-CVE surface	mostly blocked — gVisor intercepts syscalls in userspace
Memory/CPU/pid exhaustion	enforced via a per-box `systemd` scope (needs delegation — below)

Limits you should know about:

Network is a toggle, not a firewall. --net is full host passthrough — no filtering (localhost, LAN, cloud metadata, arbitrary egress). Operator opt-in, unsafe for adversarial multi-tenant use. Filtered egress is post-v1.
Resource limits need systemd user-cgroup delegation. Without it, limits degrade to unenforced with a warning (temenos doctor shows the mode) — don't run adversarial work there.
WSL2 uses the ptrace platform (no /dev/kvm). Slower, but the security model — the gVisor sentry — is identical to kvm/systrap.
Side channels between co-resident boxes are out of scope for v1.
Not a defense against a malicious agent binary — see the threat model.
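To show what "enforced via a per-box systemd scope" can look like in practice, here is a sketch that builds a `systemd-run` command line with memory and task caps. It is not temenos's actual invocation: the `scope_argv` helper and the choice of the `MemoryMax` and `TasksMax` properties are assumptions, and running the result requires a systemd user session with cgroup delegation.

```python
from typing import List, Sequence


def scope_argv(command: Sequence[str], memory_mb: int, max_pids: int) -> List[str]:
    """Wrap `command` in a transient user scope with memory and task limits."""
    if memory_mb <= 0 or max_pids <= 0:
        raise ValueError("limits must be positive")
    return [
        "systemd-run", "--user", "--scope", "--quiet",
        "-p", f"MemoryMax={memory_mb}M",  # hard memory cap for the whole scope
        "-p", f"TasksMax={max_pids}",     # cap on processes and threads
        "--", *command,
    ]


if __name__ == "__main__":
    # Print only; pass the list to subprocess.run(...) on a host with delegation.
    print(" ".join(scope_argv(["python3", "-c", "print(6*7)"], 256, 64)))
```

Without delegation the scope cannot be created with these properties, which matches the degraded, unenforced mode described in the list above.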

Run tests/leak/ against your host and re-run it when your harness upgrades (new tools are new holes). A config isn't "supported" until it's green.

🏗️ Architecture

Layer 3  surfaces      server/ (FastAPI REST + per-box MCP) · cli.py
Layer 2½ registry      manager.py (BoxManager: ids, lifecycle, checkpoint loop)
Layer 2  box           box.py (exec/read/write/list, audit, checkpoint)
Layer 1  backend       backends/ (gVisor: OCI bundle, held-run+exec, overlay, systemd scope)
Layer 0  data          policy.py · result.py · storage.py · exceptions.py  (pure, no OS calls)

Lower layers never import higher ones; REST, MCP, and the CLI are all the same Policy → Box → ExecResult path. Delete server/ and the core still works.
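A rule like "lower layers never import higher ones" can be checked mechanically. The sketch below is one way to do it with the standard `ast` module; it is not part of temenos. The layer numbers mirror the table above, while the `temenos.<module>` import paths, the helper names, and the decision to look only at absolute imports are assumptions.

```python
import ast

# Layer numbers follow the table above; module names are the ones it lists.
LAYERS = {
    "policy": 0, "result": 0, "storage": 0, "exceptions": 0,
    "backends": 1,
    "box": 2,
    "manager": 2.5,
    "server": 3, "cli": 3,
}


def imported_temenos_modules(source: str) -> set:
    """Top-level temenos submodules imported by `source` (absolute imports only)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            parts = node.module.split(".")
            if parts[0] == "temenos" and len(parts) > 1:
                found.add(parts[1])
            elif parts == ["temenos"]:
                # `from temenos import x`: x may be a submodule or a plain name.
                found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                parts = alias.name.split(".")
                if parts[0] == "temenos" and len(parts) > 1:
                    found.add(parts[1])
    return found


def layering_violations(module: str, source: str) -> list:
    """Names imported by `module` that sit in a strictly higher layer."""
    own = LAYERS[module]
    # Names that are not known modules (e.g. classes) get -1 and never count.
    return sorted(m for m in imported_temenos_modules(source) if LAYERS.get(m, -1) > own)


if __name__ == "__main__":
    bad = "from temenos.manager import BoxManager\n"
    print(layering_violations("box", bad))
```

Run over every file in the package, a check like this would turn the layering statement into something a test suite can enforce.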

🧪 Development

pip install -e ".[all,dev]"
PYTHONPATH=. pytest                       # full suite
PYTHONPATH=. pytest tests/leak/ -v        # the containment gate (needs gVisor)
TEMENOS_NET_TESTS=1 pytest tests/test_image_mmdebstrap.py   # opt-in network e2e

Tests that need gVisor / mmdebstrap / network are gated and skip cleanly without them.

📍 Status

Pre-1.0 (0.1.0). v1 is feature-complete and leak-tested on Linux + gVisor; the API may still shift before 1.0. Roadmap: filtered egress, a macOS backend, true diff-vs-original, an interactive PTY/attach, persisted audit logs.

🙏 Credits

temenos stands on:

gVisor — the userspace kernel that is the actual sandbox.
Model Context Protocol — the agent-facing tool plane.

temenos's contribution is the composition: a trusted agent on the host, an untrusted-code box underneath, and the wiring that makes the box the sole execution path.

📄 License

Apache-2.0 © temenos contributors

Release history

0.3.0

Jun 7, 2026

0.2.0

Jun 7, 2026

This version

0.1.0

Jun 7, 2026

Download files

Source Distribution

temenos-0.1.0.tar.gz (83.6 kB)

Uploaded Jun 7, 2026 Source

Built Distribution

temenos-0.1.0-py3-none-any.whl (55.7 kB)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file temenos-0.1.0.tar.gz.

File metadata

Download URL: temenos-0.1.0.tar.gz
Upload date: Jun 7, 2026
Size: 83.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for temenos-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`aa9441f20689fe8b80ee8c16fcecccedebcc7a46bef554683f4cbe586c050326`
MD5	`dd5c06aa4acbfa99cf737f8c3f696e6a`
BLAKE2b-256	`26dcdfd95c6905272f400731c4eb6ddf0cc96c0b295419e45c5623d96b56fe40`

File details

Details for the file temenos-0.1.0-py3-none-any.whl.

File metadata

Download URL: temenos-0.1.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 55.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for temenos-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`95b632204676d4f29e7d969484cfcb97dd4a7667272033f736822a96ecc36099`
MD5	`e4a8455a83dffe92bc534d01ca61537d`
BLAKE2b-256	`0b5a34c3e276722bae5c92d16f41e1592759e432916708868ce0705bd4b294f3`
