HPC cluster daemon for bridging AI agents to remote compute resources

HPC Daemon

A lightweight daemon that runs on HPC clusters or remote servers, bridging AI agents to local PTY shells and SLURM job schedulers over WebSocket.

How It Works

AI agents run in isolated cloud sandboxes and cannot directly reach machines behind firewalls. The daemon solves this with a reverse-proxy pattern:

  1. The daemon runs on your server and opens an outbound WebSocket to the Tiptree platform
  2. The agent client also connects outbound to the same platform
  3. The platform routes messages between them, keyed on your user identity

Because the daemon initiates the connection, no inbound firewall rules are needed.
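
The routing in step 3 can be illustrated without any network code. This is a minimal in-memory sketch, assuming the platform keeps one daemon connection and one agent connection per user identity. The `Relay` class, its method names, and the use of `asyncio.Queue` as a stand-in for a WebSocket are hypothetical and are not the Tiptree platform's actual implementation.

```python
import asyncio


class Relay:
    """Toy stand-in for the platform: pairs peers by user identity."""

    def __init__(self):
        # user_id -> {"daemon": Queue, "agent": Queue}; each queue is an inbox.
        self._inboxes = {}

    def connect(self, user_id, role):
        """Register an outbound connection and return its inbox."""
        inbox = asyncio.Queue()
        self._inboxes.setdefault(user_id, {})[role] = inbox
        return inbox

    async def send(self, user_id, from_role, message):
        """Forward a message to the other peer registered for the same user."""
        to_role = "agent" if from_role == "daemon" else "daemon"
        peer = self._inboxes.get(user_id, {}).get(to_role)
        if peer is None:
            raise LookupError(f"no {to_role} connected for {user_id}")
        await peer.put(message)


async def demo():
    relay = Relay()
    # Both sides connect outbound; neither accepts inbound connections.
    daemon_inbox = relay.connect("you@example.com", "daemon")
    agent_inbox = relay.connect("you@example.com", "agent")
    await relay.send("you@example.com", "agent", {"cmd": "hostname"})
    request = await daemon_inbox.get()
    await relay.send("you@example.com", "daemon", {"reply_to": request["cmd"]})
    return request, await agent_inbox.get()
```

Running `asyncio.run(demo())` passes one request from the agent to the daemon and one reply back, with the relay as the only component that both sides talk to.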

Execution Modes

  • PTY mode (synchronous) — interactive shell sessions for quick commands
  • Job mode (asynchronous) — batch job submission with automatic wake-on-complete callbacks that are persisted locally and retried until delivery succeeds

The daemon auto-detects SLURM. If sbatch is available, jobs go through SLURM; otherwise they run as local background processes.

Installation

pip install tasc-hpc-daemon

This installs the hpc-daemon command and its only dependency (websockets).

Setup

hpc-daemon setup \
    --email you@example.com \
    --url https://althea.tiptreesystems.com

For the development environment, use:

hpc-daemon setup \
    --email you@example.com \
    --url https://althea.dev.tiptreesystems.com

The --url value must be the base URL of the Tiptree app/platform, which exposes the /otp, /auth, and /hpc routes. Do not use the marketing domains (https://tiptreesystems.com or https://dev.tiptreesystems.com).

The setup wizard:

  1. Authenticates via a one-time code sent to your email
  2. Presents a disclaimer about remote code execution risks
  3. Creates an API key for daemon authentication
  4. Prompts for an optional skill (built-in cluster-specific guidance, or a custom server description)
  5. Prompts for directory restrictions (where the agent is allowed to write; defaults to $SCRATCH/tiptree-workspace or ~/tiptree-workspace)
  6. Prompts for a job working directory and optional server instructions
  7. Registers the daemon with the platform

The daemon ID defaults to the machine's hostname. Override with --daemon-id.

Re-running setup for the same daemon ID updates the existing profile (no duplicates).

Non-Interactive Setup

For automated deployments:

hpc-daemon setup \
    --email you@example.com \
    --url https://althea.tiptreesystems.com \
    --no-interview \
    --allowed-dirs ~/workspace ~/scratch \
    --skill mila-hpc \
    --cluster-name my-cluster

Use --cluster-name to set a human-readable name for the cluster (defaults to the machine's hostname).

Warning: --no-interview skips all interactive prompts (disclaimer, skill selection, directory restrictions, working directory, server instructions). Authentication with the emailed one-time code is still required. Without --allowed-dirs, the agent gets unrestricted filesystem write access.

Running

# Start in foreground
hpc-daemon start

# Start in background (persists after logout)
nohup hpc-daemon start 2>&1 &

# Check status
hpc-daemon status

# View logs
tail -f ~/.hpc_daemon/<daemon_id>.log

# Stop
hpc-daemon stop

# List registered daemons
hpc-daemon list

If only one daemon profile exists, --daemon-id is auto-detected. With multiple profiles, specify it explicitly (e.g., hpc-daemon start --daemon-id mila-login-1).
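
The selection rule above can be written as a small function. This is a sketch only: `resolve_daemon_id` and its error messages are hypothetical, and the daemon's real lookup may differ.

```python
def resolve_daemon_id(profile_ids, requested=None):
    """Pick the daemon profile to act on, following the rule described above."""
    if requested is not None:
        # An explicit --daemon-id always wins, but it must name a known profile.
        if requested not in profile_ids:
            raise ValueError(f"unknown daemon ID: {requested}")
        return requested
    if len(profile_ids) == 1:
        # Exactly one profile: auto-detect it.
        return profile_ids[0]
    if not profile_ids:
        raise ValueError("no daemon profiles found; run setup first")
    raise ValueError("multiple daemon profiles found; pass --daemon-id")
```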

To force local mode on a SLURM cluster (jobs run as background processes instead of sbatch):

LOCAL_MODE=1 hpc-daemon start
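
How the backend choice might be made is sketched below. It assumes the daemon looks for sbatch on PATH (here via `shutil.which`) and treats any non-empty LOCAL_MODE value other than "0" as forcing local mode; both the function name `select_job_backend` and that parsing rule are assumptions, not the daemon's documented behavior.

```python
import os
import shutil


def select_job_backend(environ=None, which=shutil.which):
    """Return "slurm" if sbatch is on PATH and local mode is not forced."""
    environ = os.environ if environ is None else environ
    # Assumption: any non-empty value other than "0" forces local mode.
    forced_local = environ.get("LOCAL_MODE", "") not in ("", "0")
    if forced_local:
        return "local"
    # shutil.which returns the executable's path, or None if it is not found.
    return "slurm" if which("sbatch") else "local"
```

The `which` parameter exists only so the check can be exercised on a machine without SLURM installed.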

State

All configuration, job records, and callback delivery state are stored in ~/.hpc_daemon/state.db (SQLite). PID files and logs live in the same directory.
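
To show how callback delivery state can survive restarts, here is a minimal sketch using Python's built-in sqlite3 module. The `callbacks` table, its columns, and the function names are hypothetical and do not reflect the actual schema of state.db; the `send` argument stands in for whatever delivers the callback to the platform.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store holding one row per pending callback."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS callbacks ("
        " job_id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " delivered INTEGER NOT NULL DEFAULT 0,"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn, job_id, payload):
    """Persist a callback before any delivery attempt is made."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO callbacks (job_id, payload) VALUES (?, ?)",
            (job_id, payload),
        )


def deliver_pending(conn, send):
    """Try each undelivered callback once; return how many succeeded."""
    rows = conn.execute(
        "SELECT job_id, payload FROM callbacks WHERE delivered = 0"
    ).fetchall()
    succeeded = 0
    for job_id, payload in rows:
        try:
            send(job_id, payload)
            ok = True
        except OSError:  # includes ConnectionError; the row stays pending
            ok = False
        with conn:
            conn.execute(
                "UPDATE callbacks SET attempts = attempts + 1, delivered = ?"
                " WHERE job_id = ?",
                (int(ok), job_id),
            )
        succeeded += ok
    return succeeded
```

Calling `deliver_pending` periodically gives the retry-until-delivered behavior: a failed send leaves the row marked undelivered, so the next pass picks it up again.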

Safety

The daemon enforces directory restrictions via bash function wrappers injected into PTY sessions. File-modifying commands (rm, rmdir, mv, cp, mkdir, touch, tee), directory navigation (cd, pushd, popd), and output redirections are validated against the allowed directory list configured during setup.

Job ownership is also enforced: an agent can cancel only the jobs it submitted.

These guardrails are shell-level and not a hard security boundary. They protect against accidental damage, not against a determined adversary.
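
The core of the directory check is a path-containment test. The daemon performs it in bash wrappers; the sketch below expresses the same idea in Python for illustration only. `is_write_allowed` is a hypothetical name, and the real wrappers may resolve paths differently.

```python
import os


def is_write_allowed(target, allowed_dirs, cwd=None):
    """Return True if target resolves to a path inside an allowed directory."""
    base = cwd or os.getcwd()
    # realpath collapses ".." and follows symlinks, so "allowed/../etc" is caught.
    resolved = os.path.realpath(os.path.join(base, os.path.expanduser(target)))
    for allowed in allowed_dirs:
        root = os.path.realpath(os.path.expanduser(allowed))
        # commonpath compares whole path components, so "/a/work-other"
        # is not mistaken for a child of "/a/work".
        if os.path.commonpath([resolved, root]) == root:
            return True
    return False
```

Comparing path components rather than string prefixes matters here: a plain `startswith` check would wrongly accept a sibling directory whose name merely begins with an allowed directory's name.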

Shared machines: Your API key is stored in ~/.hpc_daemon/state.db. The file and directory are restricted to your user account (0600/0700), so other users on the same login node cannot read it. However, if multiple people share the same Unix account, they all have access. Do not use the daemon on a shared account.
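
To confirm that the state directory is private to your account, a check along these lines can help. It is a sketch assuming a POSIX system; `permission_problems` is a hypothetical helper and not part of the hpc-daemon CLI.

```python
import os
import stat


def permission_problems(state_dir):
    """Return a list of paths under state_dir that group or others can access."""
    problems = []
    db_path = os.path.join(state_dir, "state.db")
    for path in (state_dir, db_path):
        if not os.path.exists(path):
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        # Any group/other bit means the path is not private to the owner.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"{path} has mode {mode:04o}")
    return problems
```

For example, `permission_problems(os.path.expanduser("~/.hpc_daemon"))` returns an empty list when the directory is 0700 and state.db is 0600.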

Project Structure

hpc_daemon/
├── core/
│   ├── cli.py            # CLI (setup, start, stop, status, list)
│   ├── config.py         # Runtime config, SLURM detection
│   ├── ws.py             # WebSocket connection and message routing
│   └── pty_session.py    # PTY shell session management
├── setup/
│   ├── wizard.py         # Setup wizard and daemon registration
│   └── prompts.py        # Interactive setup prompts
├── jobs/
│   ├── handlers.py       # Job submit/status handlers
│   ├── interfaces.py     # SlurmInterface, LocalJobInterface
│   ├── models.py         # JobRecord, JobState
│   └── monitor.py        # Background job polling and callbacks
├── api_client/
│   ├── auth.py           # OTP signin, API key creation
│   ├── daemon_registry.py # Daemon registration
│   └── http.py           # HTTP client utilities
├── skills/
│   ├── __init__.py       # Skill discovery and loading
│   └── mila-hpc.md       # Built-in Mila cluster skill
├── db.py                 # SQLite database (profiles, jobs)
├── guardrails.py         # Directory restriction enforcement
└── __main__.py           # Entry point
