HPC cluster daemon for bridging AI agents to remote compute resources

HPC Daemon

A lightweight daemon that runs on HPC clusters or remote servers, bridging AI agents to local PTY shells and SLURM job schedulers over WebSocket.

How It Works

AI agents run in isolated cloud sandboxes and cannot directly reach machines behind firewalls. The daemon solves this with a reverse-proxy pattern:

  1. The daemon runs on your server and opens an outbound WebSocket to the Tiptree platform
  2. The agent client also connects outbound to the same platform
  3. The platform routes messages between them, keyed on your user identity

Because the daemon initiates the connection, no inbound firewall rules are needed.
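The three routing steps can be illustrated with a small in-memory sketch. This is not the platform's code: the Relay class, its method names, and the use of plain callables in place of WebSocket connections are all hypothetical, chosen only to show how messages can be paired by user identity.

```python
from collections import defaultdict
from typing import Callable, Dict

# Hypothetical stand-in for a connection: a function that sends one message.
Send = Callable[[str], None]


class Relay:
    """Illustrative routing table keyed on user identity (assumed design)."""

    def __init__(self) -> None:
        # user_id -> {"daemon": send_fn, "agent": send_fn}
        self._peers: Dict[str, Dict[str, Send]] = defaultdict(dict)

    def connect(self, user_id: str, role: str, send: Send) -> None:
        """Register a connection that a daemon or an agent opened outbound."""
        if role not in ("daemon", "agent"):
            raise ValueError(f"unknown role: {role}")
        self._peers[user_id][role] = send

    def route(self, user_id: str, sender_role: str, message: str) -> bool:
        """Forward to the other peer of the same user; False if it is absent."""
        target = "agent" if sender_role == "daemon" else "daemon"
        send = self._peers.get(user_id, {}).get(target)
        if send is None:
            return False
        send(message)
        return True


# Usage: both sides connect outbound, then the relay pairs them by user.
daemon_inbox, agent_inbox = [], []
relay = Relay()
relay.connect("alice", "daemon", daemon_inbox.append)
relay.connect("alice", "agent", agent_inbox.append)
relay.route("alice", "agent", "run: hostname")
```

Neither peer ever accepts a connection in this model, which is why the daemon side needs no inbound firewall rule.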

Execution Modes

  • PTY mode (synchronous) — interactive shell sessions for quick commands
  • Job mode (asynchronous) — batch job submission with automatic wake-on-complete callbacks that are persisted locally and retried until delivery succeeds

The daemon auto-detects SLURM. If sbatch is available, jobs go through SLURM; otherwise they run as local background processes.
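A minimal sketch of this kind of backend selection, assuming detection amounts to looking for sbatch on PATH. The function name select_backend and the exact handling of the LOCAL_MODE value are assumptions; LOCAL_MODE=1 itself is the override described under Running.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def select_backend(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "slurm" when sbatch is on PATH, otherwise "local"."""
    # Assumption: only the literal value "1" forces local mode.
    if env.get("LOCAL_MODE") == "1":
        return "local"
    # shutil.which returns the executable's path, or None if it is not found.
    return "slurm" if which("sbatch") else "local"
```

Passing env and which as parameters keeps the decision testable without a real SLURM installation.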

Installation

pip install .          # or: python3 -m pip install .

This installs the hpc-daemon command and its only dependency (websockets).

Setup

hpc-daemon setup \
    --email you@example.com \
    --url https://tiptreesystems.com

The setup wizard:

  1. Authenticates via a one-time code sent to your email
  2. Presents a disclaimer about remote code execution risks
  3. Creates an API key for daemon authentication
  4. Prompts for an optional skill (built-in cluster-specific guidance, or a custom server description)
  5. Prompts for directory restrictions (where the agent is allowed to write; defaults to $SCRATCH/tiptree-workspace or ~/tiptree-workspace)
  6. Prompts for a job working directory and optional server instructions
  7. Registers the daemon with the platform

The daemon ID defaults to the machine's hostname. Override with --daemon-id.

Re-running setup for the same daemon ID updates the existing profile (no duplicates).
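The hostname default and the no-duplicates behavior can be sketched with SQLite as follows. The table layout and the register_profile helper are hypothetical and may not match the daemon's actual schema; the point is only that keying rows on the daemon ID makes repeated setup an update rather than an insert.

```python
import socket
import sqlite3
from typing import Optional


def register_profile(
    conn: sqlite3.Connection, email: str, daemon_id: Optional[str] = None
) -> str:
    """Insert or update one profile row keyed on daemon_id (assumed schema)."""
    # Fall back to the machine's hostname when no ID is given.
    daemon_id = daemon_id or socket.gethostname()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "daemon_id TEXT PRIMARY KEY, email TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps a single row per primary key, so re-running
    # setup for the same daemon ID overwrites the existing profile.
    conn.execute(
        "INSERT OR REPLACE INTO profiles (daemon_id, email) VALUES (?, ?)",
        (daemon_id, email),
    )
    conn.commit()
    return daemon_id
```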

Non-Interactive Setup

For automated deployments:

hpc-daemon setup \
    --email you@example.com \
    --url https://tiptreesystems.com \
    --no-interview \
    --allowed-dirs ~/workspace ~/scratch \
    --skill mila-hpc \
    --cluster-name my-cluster

Use --cluster-name to set a human-readable name for the cluster (defaults to the machine's hostname).

Warning: --no-interview skips all interactive prompts (disclaimer, skill selection, directory restrictions, working directory, server instructions). The one-time code (OTP) sent to your email is still required. Without --allowed-dirs, the agent gets unrestricted filesystem write access.

Running

# Start in foreground
hpc-daemon start

# Start in background (persists after logout)
nohup hpc-daemon start 2>&1 &

# Check status
hpc-daemon status

# View logs
tail -f ~/.hpc_daemon/<daemon_id>.log

# Stop
hpc-daemon stop

# List registered daemons
hpc-daemon list

If only one daemon profile exists, --daemon-id is auto-detected. With multiple profiles, specify it explicitly (e.g., hpc-daemon start --daemon-id mila-login-1).
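The auto-detection rule can be written down as a short function. This is a sketch under the assumption that the CLI works from a list of stored profile IDs; resolve_daemon_id and its error messages are hypothetical.

```python
from typing import Optional, Sequence


def resolve_daemon_id(
    profiles: Sequence[str], requested: Optional[str] = None
) -> str:
    """Pick the daemon ID to act on, following the rule described above."""
    if requested is not None:
        # An explicit --daemon-id must name an existing profile.
        if requested not in profiles:
            raise ValueError(f"unknown daemon ID: {requested}")
        return requested
    if not profiles:
        raise ValueError("no daemon profiles found; run setup first")
    if len(profiles) == 1:
        # Exactly one profile: no flag needed.
        return profiles[0]
    raise ValueError("multiple daemon profiles found; pass --daemon-id")
```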

To force local mode on a SLURM cluster (jobs run as background processes instead of sbatch):

LOCAL_MODE=1 hpc-daemon start

State

All configuration, job records, and callback delivery state are stored in ~/.hpc_daemon/state.db (SQLite). PID files and logs live in the same directory.
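Persisting callback delivery state is what lets wake-on-complete callbacks survive a daemon restart and be retried. A minimal sketch of that idea with SQLite, assuming a hypothetical callbacks table and a send function that returns True on success (neither is taken from the daemon's source):

```python
import sqlite3
from typing import Callable


def init_callbacks(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that tracks callback delivery."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS callbacks ("
        "job_id TEXT PRIMARY KEY, payload TEXT NOT NULL, "
        "delivered INTEGER NOT NULL DEFAULT 0, "
        "attempts INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, job_id: str, payload: str) -> None:
    """Record a callback before trying to send it, so a crash cannot lose it."""
    conn.execute(
        "INSERT OR IGNORE INTO callbacks (job_id, payload) VALUES (?, ?)",
        (job_id, payload),
    )
    conn.commit()


def deliver_pending(conn: sqlite3.Connection, send: Callable[[str], bool]) -> int:
    """Try each undelivered callback once; return how many succeeded."""
    rows = conn.execute(
        "SELECT job_id, payload FROM callbacks WHERE delivered = 0"
    ).fetchall()
    succeeded = 0
    for job_id, payload in rows:
        try:
            ok = bool(send(payload))
        except Exception:
            ok = False  # treat any send error as "retry later"
        conn.execute(
            "UPDATE callbacks SET attempts = attempts + 1, delivered = ? "
            "WHERE job_id = ?",
            (1 if ok else 0, job_id),
        )
        succeeded += 1 if ok else 0
    conn.commit()
    return succeeded
```

A monitor loop would call deliver_pending periodically; rows stay pending until a send succeeds.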

Safety

The daemon enforces directory restrictions via bash function wrappers injected into PTY sessions. File-modifying commands (rm, rmdir, mv, cp, mkdir, touch, tee), directory navigation (cd, pushd, popd), and output redirections are validated against the allowed directory list configured during setup.
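The daemon performs this validation in bash wrappers; the Python sketch below only illustrates the underlying check, namely resolving a target path and testing whether it falls under an allowed directory. The is_allowed function is hypothetical and is not the contents of guardrails.py.

```python
import os
from typing import Iterable


def is_allowed(path: str, allowed_dirs: Iterable[str], cwd: str = "/") -> bool:
    """Return True if path resolves to a location inside an allowed directory."""
    # Resolve "~", relative paths, "..", and symlinks before comparing, so
    # that "allowed/../elsewhere" is judged by where it really points.
    target = os.path.realpath(os.path.join(cwd, os.path.expanduser(path)))
    for allowed in allowed_dirs:
        root = os.path.realpath(os.path.expanduser(allowed))
        # commonpath compares whole path components, so "/data/ws-other"
        # does not count as being inside "/data/ws".
        if os.path.commonpath([target, root]) == root:
            return True
    return False
```

Comparing resolved paths rather than raw strings is the design choice that matters here: a plain prefix test would accept sibling directories that merely share a name prefix.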

Job ownership is also enforced: an agent can cancel only the jobs it submitted.

These guardrails are shell-level and not a hard security boundary. They prevent accidental damage, not a determined adversary.

Shared machines: Your API key is stored in ~/.hpc_daemon/state.db. The file and its directory are restricted to your user account (modes 0600 and 0700, respectively), so other users on the same login node cannot read the key. However, everyone who shares a Unix account can read it, so do not run the daemon on a shared account.
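To check or reproduce these permissions yourself, something like the following works on POSIX systems. Both helper names are hypothetical; this is a sketch of the 0700 directory and 0600 file convention, not the daemon's own setup code.

```python
import os
import stat


def secure_state_paths(state_dir: str, db_name: str = "state.db") -> str:
    """Create a 0700 directory containing a 0600 file; return the file path."""
    os.makedirs(state_dir, mode=0o700, exist_ok=True)
    # chmod explicitly: the mode passed to makedirs is filtered by the umask
    # and is ignored when the directory already exists.
    os.chmod(state_dir, 0o700)
    db_path = os.path.join(state_dir, db_name)
    fd = os.open(db_path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(fd)
    os.chmod(db_path, 0o600)
    return db_path


def is_private(path: str) -> bool:
    """True when neither group nor other has any permission bit set."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o077) == 0
```

For example, is_private(os.path.expanduser("~/.hpc_daemon/state.db")) reports whether the key file is readable only by your account.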

Project Structure

hpc_daemon/
├── core/
│   ├── cli.py            # CLI (setup, start, stop, status, list)
│   ├── config.py         # Runtime config, SLURM detection
│   ├── ws.py             # WebSocket connection and message routing
│   └── pty_session.py    # PTY shell session management
├── setup/
│   ├── wizard.py         # Setup wizard and daemon registration
│   └── prompts.py        # Interactive setup prompts
├── jobs/
│   ├── handlers.py       # Job submit/status handlers
│   ├── interfaces.py     # SlurmInterface, LocalJobInterface
│   ├── models.py         # JobRecord, JobState
│   └── monitor.py        # Background job polling and callbacks
├── api_client/
│   ├── auth.py           # OTP signin, API key creation
│   ├── daemon_registry.py # Daemon registration
│   └── http.py           # HTTP client utilities
├── skills/
│   ├── __init__.py       # Skill discovery and loading
│   └── mila-hpc.md       # Built-in Mila cluster skill
├── db.py                 # SQLite database (profiles, jobs)
├── guardrails.py         # Directory restriction enforcement
└── __main__.py           # Entry point
