Skip to main content

Lightweight process sandbox using Landlock, seccomp, and seccomp user notification

Project description

Sandlock

Lightweight process sandbox for Linux. Confines untrusted code using Landlock (filesystem + network + IPC), seccomp-bpf (syscall filtering), and seccomp user notification (resource limits, IP enforcement, /proc virtualization). No root, no cgroups, no containers.

sandlock run -w /tmp -r /usr -m 512M -- python3 untrusted.py

Why Sandlock?

Containers and VMs are powerful but heavy. Sandlock targets the gap: strict confinement without image builds, overlay filesystems, or root privileges.

Feature Sandlock Container MicroVM (Firecracker) gVisor
Root required No Yes* Yes (KVM) Yes
Image build No Yes Yes Yes
Startup time ~1 ms (fork) ~200 ms ~100 ms ~100 ms
Kernel Shared Shared Separate guest Shared (sentry)
Filesystem isolation Landlock Overlay Block-level (QCOW2) ptrace/KVM
Network isolation Landlock + seccomp notif Network namespace TAP device Sentry kernel
Syscall filtering seccomp-bpf seccomp N/A (full kernel) Sentry kernel
Resource limits seccomp notif + rlimit cgroup v2 VM config cgroup v2
Memory sharing COW (fork), zero-copy Bind-mount + re-init Shared mem (explicit) N/A
Nesting Native (fork) Complex (DinD/DooD) Not supported Supported
Checkpoint/restore ptrace + BranchFS CRIU VM snapshot N/A

* Rootless containers exist but require user namespace support, /etc/subuid configuration, and fuse-overlayfs.

Requirements

  • Linux 5.13+ (Landlock ABI v1), Python 3.10+
  • No root, no cgroups, no C compiler (all kernel interfaces via ctypes)
Feature Minimum kernel
seccomp user notification 5.6
Landlock filesystem rules 5.13
Landlock TCP port rules 6.7 (ABI v4)
Landlock IPC scoping 6.12 (ABI v6)

Quick Start

CLI

# Basic confinement
sandlock run -r /usr -r /lib -w /tmp -- ls /tmp

# Interactive shell
sandlock run -i -r /usr -r /lib -r /lib64 -r /bin -r /etc -w /tmp -- /bin/sh

# Resource limits + timeout
sandlock run -m 512M -P 20 -t 30 -- ./compute.sh

# Domain-based network isolation
sandlock run --net-allow-host api.openai.com -r /usr -r /lib -r /etc -- python3 agent.py

# TCP port restrictions (Landlock)
sandlock run --net-bind 8080 --net-connect 443 -r /usr -r /lib -r /etc -- python3 server.py

# IPC scoping + clean environment
sandlock run --isolate-ipc --isolate-signals --clean-env --env CC=gcc \
  -r /usr -r /lib -w /tmp -- make

# Use a saved profile (CLI flags override profile values)
sandlock run -p build -- make -j4
sandlock run -p build --max-memory 1G -- make

# Profile management
sandlock profile list
sandlock profile show build

Profiles

Save reusable policies as TOML files in ~/.config/sandlock/profiles/:

# ~/.config/sandlock/profiles/build.toml
fs_writable = ["/tmp/work"]
fs_readable = ["/usr", "/lib", "/lib64", "/bin", "/etc"]
clean_env = true
isolate_ipc = true
max_memory = "512M"
max_processes = 50

[env]
CC = "gcc"
LANG = "C.UTF-8"

Field names match Policy exactly. Unknown fields are rejected. CLI flags override profile values.

Python API

from sandlock import Sandbox, Policy

policy = Policy(
    fs_writable=["/tmp/sandbox"],
    fs_readable=["/usr", "/lib", "/etc"],
    max_memory="256M",
    max_processes=10,
    isolate_ipc=True,
    clean_env=True,
    env={"LANG": "C.UTF-8"},
)

# Run a command
result = Sandbox(policy).run(["python3", "-c", "print('hello')"])

# Run a Python function
result = Sandbox(policy).call(lambda: sum(range(1_000_000)))
print(result.value)  # 499999500000

# Use a saved TOML profile
result = Sandbox("build").run(["make", "-j4"])

Long-lived and nested sandboxes:

with Sandbox(policy) as sb:
    sb.exec(["python3", "server.py"])
    sb.pause()       # Freeze all processes atomically
    sb.resume()
    sb.update(max_memory="128M")  # Adjust limits live
    exit_code = sb.wait(timeout=30)

# Nested: child inherits parent's constraints
with Sandbox(parent_policy) as parent:
    result = parent.sandbox(child_policy).call(untrusted_fn)

Architecture

Sandlock applies confinement in sequence after fork():

Parent                              Child
  │  fork()                           │
  │──────────────────────────────────>│
  │                                   ├─ 1. setpgid(0,0)
  │                                   ├─ 2. unshare(NEWUSER) (if privileged)
  │                                   ├─ 3. RLIMIT_CPU (if max_cpu)
  │                                   ├─ 4. chroot (optional)
  │                                   ├─ 5. Landlock (fs + net + IPC, irreversible)
  │                                   ├─ 6. Combined seccomp filter (irreversible)
  │                                   │     └─ send notify fd ──────> Parent
  │  receive notify fd                ├─ 7. Close fds 3+
  │  start supervisor thread          ├─ 8. Environment (clean_env + env)
  │  pidfd_open() + poll()            └─ 9. exec(cmd) or target()

Each layer is defense-in-depth: bypassing one doesn't defeat the others.

Landlock: Filesystem + Network + IPC

Landlock restricts filesystem, network, and IPC access without root. Rules are applied via landlock_restrict_self() and are irreversible.

Policy(
    fs_writable=["/tmp/work"],
    fs_readable=["/usr", "/lib"],        # ABI v1+: filesystem
    net_bind=[8080],                      # ABI v4+: TCP ports
    net_connect=[443, "5432"],
    isolate_ipc=True,                     # ABI v6+: block abstract UNIX sockets to host
    isolate_signals=True,                 # ABI v6+: block signals to host processes
)

seccomp-bpf: Syscall Filtering

A cBPF filter blocks dangerous syscalls at the kernel entry point: ptrace, mount, unshare, setns, kexec_load, bpf, perf_event_open, ioperm/iopl, and more. Argument-level filtering blocks namespace flags in clone/clone3 and TIOCSTI in ioctl. Supports x86_64 and aarch64.

Resource Limits

Enforced via seccomp user notification and rlimit — no cgroups or root required.

Resource Mechanism Enforcement
Memory seccomp notif on mmap/munmap/brk/mremap ENOMEM when over budget
Processes seccomp notif on clone/fork/vfork EAGAIN when at limit
CPU RLIMIT_CPU per-process SIGXCPUSIGKILL

Network: Domain-Based Access Control

Specify allowed hostnames — everything else is blocked:

Policy(net_allow_hosts=["api.openai.com", "github.com"])

Two cooperating layers: (1) /etc/hosts virtualization via seccomp notif (programs see clean "host not found" for unlisted domains), and (2) connect()/sendto() IP enforcement (blocks hardcoded IPs, raw DNS over UDP, etc.).

/proc and /sys Virtualization

Seccomp user notification intercepts open()/openat() for path-based rules. Three actions: ALLOW (continue), DENY (return -EACCES), VIRTUALIZE (inject memfd with fake content). Auto-enabled when -r /proc is passed.

Denies: /proc/kcore, /proc/kallsyms, /proc/keys, /sys/kernel/, /sys/firmware/. Virtualizes: /proc/self/mounts, /proc/self/mountinfo. PID isolation: blocks access to /proc/<foreign_pid>/.

BranchFS: Copy-on-Write Filesystem

With fs_isolation=BRANCHFS, sandbox writes go to an isolated COW branch on a BranchFS FUSE mount. Writes are committed on success, discarded on failure. Nested sandboxes create child branches.

Policy(
    fs_isolation=FsIsolation.BRANCHFS,
    fs_mount="/mnt/workspace",
    on_exit=BranchAction.COMMIT,
    on_error=BranchAction.ABORT,
)

Checkpoint/Restore

Two-layer checkpoint without CRIU or root:

  1. OS-level: ptrace + /proc dumps registers, memory, and file descriptors (transparent to child)
  2. App-level: optional save_fn callback for application state

Combined with BranchFS O(1) snapshots for full checkpoint/restore.

with Sandbox(policy) as sb:
    sb.exec(["python3", "server.py"], save_fn=lambda: serialize_state())
    cp = sb.checkpoint()
    cp.save("my-env")       # persist to disk

# Later: restore
cp = Checkpoint.load("my-env")
Sandbox.from_checkpoint(cp, restore_fn=lambda state: load_state(state))

Privileged Mode

privileged=True maps UID 0 inside a user namespace. The child appears as root but Landlock + seccomp still apply — "root" cannot escape confinement.

Policy Reference

@dataclass(frozen=True)
class Policy:
    # Filesystem (Landlock)
    fs_writable: Sequence[str] = []     # Read/write access
    fs_readable: Sequence[str] = []     # Read-only access
    fs_denied: Sequence[str] = []       # Explicitly denied

    # Syscall filtering (seccomp)
    deny_syscalls: Sequence[str] | None = None   # None = default blocklist
    allow_syscalls: Sequence[str] | None = None  # Allowlist mode (stricter)

    # Network
    net_allow_hosts: Sequence[str] = []     # Domain allowlist (seccomp notif)
    net_bind: Sequence[int | str] = []      # TCP bind ports (Landlock ABI v4+)
    net_connect: Sequence[int | str] = []   # TCP connect ports (Landlock ABI v4+)

    # IPC scoping (Landlock ABI v6+)
    isolate_ipc: bool = False       # Block abstract UNIX sockets to host
    isolate_signals: bool = False   # Block signals to host processes

    # Resources (seccomp notif + rlimit)
    max_memory: str | int | None = None  # '512M'
    max_processes: int | None = None     # per-sandbox fork count
    max_cpu: str | None = None           # '50%' per-process

    # Environment
    clean_env: bool = False              # Minimal env (PATH, HOME, TERM, LANG, USER, SHELL)
    env: Mapping[str, str] = {}          # Set/override env vars

    # BranchFS COW isolation
    fs_isolation: FsIsolation = NONE     # NONE | BRANCHFS
    fs_mount: str | None = None          # BranchFS mount point
    on_exit: BranchAction = COMMIT       # COMMIT | ABORT | KEEP
    on_error: BranchAction = ABORT

    # Misc
    chroot: str | None = None
    close_fds: bool = True
    strict: bool = True          # Abort on confinement failure
    privileged: bool = False     # UID 0 inside user namespace
    notif_policy: NotifPolicy | None = None

strict=False degrades gracefully when Landlock or seccomp is unavailable.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sandlock-0.1.0.tar.gz (85.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sandlock-0.1.0-py3-none-any.whl (73.6 kB view details)

Uploaded Python 3

File details

Details for the file sandlock-0.1.0.tar.gz.

File metadata

  • Download URL: sandlock-0.1.0.tar.gz
  • Upload date:
  • Size: 85.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for sandlock-0.1.0.tar.gz
Algorithm Hash digest
SHA256 55185c797acd0338f613692cd475a19a5c47dd03ffd329f00c6d8a7588757862
MD5 acbfea74dd647b22ec85334949ea5277
BLAKE2b-256 8a4e9f3aaa49c77e7c07bad151ed42e4863ff187d6c6c9b65c4de8c75690836b

See more details on using hashes here.

File details

Details for the file sandlock-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sandlock-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 73.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for sandlock-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 11eca23ae336a93590a0d3f83ab3b82b7d21b587155f3edafbce77d405299abe
MD5 d6ce27332662166c57a4fbe45a57aefe
BLAKE2b-256 89c13f8c893928365ff3dbd20dfc501711c30b4b93752fdeb17ec87ada623044

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page