Stability-first operations CLI for long-lived agent workspaces.

Helm

Stop long-running coding agents from losing context, making unsafe edits, and becoming impossible to audit.

Helm is a local operations layer for AI agent workspaces: profiles before commands, checkpoints before risky work, durable task history after the chat is gone.

Current release: v0.10.0

Landing page · Korean README

Quickstart · Why Helm · What Helm Adds · Workflows · Docs · Landing Page

Quickstart

Install from PyPI:

python -m pip install helm-agent-ops
helm --help

Or use the workspace bootstrap installer:

curl -fsSL https://raw.githubusercontent.com/JDeun/Helm/main/install.sh | bash
helm doctor --path ~/.helm/workspace
helm profile --path ~/.helm/workspace run inspect_local --task-name "first Helm inspection" -- git status --short
helm status --path ~/.helm/workspace --brief
helm dashboard --path ~/.helm/workspace

The script installs Helm and creates ~/.helm/workspace. If the helm command is not found afterward, use the PATH line printed by the installer.

Need a different workspace?

curl -fsSL https://raw.githubusercontent.com/JDeun/Helm/main/install.sh | bash -s -- \
  --workspace ~/work/helm

Why Helm

Helm is for developers who already use coding agents for real work and need the session to leave behind something more durable than chat history.

Use Helm when you want to:

  • run agent-adjacent commands under explicit risk profiles
  • block destructive or out-of-profile commands before they execute
  • create visible recovery points before broad edits
  • keep task and command history in local files
  • rehydrate future runs from workspace state instead of memory alone
  • review what happened after a long session ends

Helm is not another agent runtime. It is the operating layer around the one you already run.

Use it when an OpenClaw/Hermes-style workspace, or a similar self-hosted agent service, has moved past demos and needs repeated work to stay:

  • bounded by explicit execution profiles
  • recoverable through checkpoints
  • inspectable through task and command logs
  • resumable from files instead of chat history
  • governed by skill contracts and local policy

If the agent only runs one-off demos, Helm is probably unnecessary.

Research Background

Helm's design follows the findings of "Harness Design Determines Operational Stability in Small Language Models", an experimental study of how planning, verification, and recovery harnesses affect the operational stability of small language models.

See docs/research-background.md for the connection between the paper and Helm's workspace-level operations layer.

Three-Minute Demo

helm profile --path ~/.helm/workspace run inspect_local \
  --task-name "inspect current repository" \
  -- git status --short

helm checkpoint create --path ~/.helm/workspace \
  --label before-risky-work \
  --include ~/.helm/workspace

helm report --path ~/.helm/workspace --format markdown
helm dashboard --path ~/.helm/workspace

This leaves a task ledger, command log, checkpoint record, and dashboard summary on disk.
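
To make the checkpoint record concrete, here is a minimal Python sketch of the general idea: copy the included path into a labelled snapshot directory and append one record to a JSONL log. It is not Helm's implementation. The function name `create_checkpoint`, the `checkpoints/` directory, the `checkpoints.jsonl` file, and the record fields are all hypothetical.

```python
import json
import shutil
import time
from pathlib import Path


def create_checkpoint(state_dir: Path, label: str, include: Path) -> dict:
    """Copy `include` into a labelled snapshot and log a record (hypothetical layout).

    Assumes `state_dir` is not inside `include`, so the copy never recurses
    into its own snapshot directory.
    """
    checkpoint_id = f"{int(time.time())}-{label}"
    snapshot = state_dir / "checkpoints" / checkpoint_id
    # copytree creates the snapshot directory and any missing parents.
    shutil.copytree(include, snapshot)
    record = {
        "id": checkpoint_id,
        "label": label,
        "source": str(include),
        "snapshot": str(snapshot),
    }
    # Append-only log: one JSON object per line.
    with (state_dir / "checkpoints.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

Calling `create_checkpoint(Path("state"), "before-risky-work", Path("project"))` would leave both a restorable copy and a line in the log that a later report can read.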

How Helm Fits

| Category | Better for | Helm adds |
| --- | --- | --- |
| Agent frameworks | prompts, planners, tool loops, agent graphs | profiles, guard decisions, checkpoints, task ledgers |
| Observability tools | hosted traces, service metrics, telemetry correlation | pre-execution policy and local recovery state |
| Eval tools | scoring model output or task success | operational history around repeated human-agent work |
| Shell wrappers | command convenience | workspace state, memory capture, reports, and recovery discipline |

What Helm Adds

Core ideas:

  • Profile: declares the allowed blast radius before a command runs, such as inspect-only, workspace edit, or risky edit.
  • Guardrail: checks command shape against local policy before execution, blocking dangerous or out-of-profile actions.
  • Checkpoint: preserves a visible recovery point before work that may need rollback.
  • Audit trail: records what ran, under which profile, with what guard decision, and what task it belonged to.
  • File-backed memory: keeps reusable context in files so later runs resume from durable state instead of chat history.
  • Context retrieval: ranks notes, memory, ontology, tasks, commands, and checkpoints through one inspectable query surface.
  • Privacy boundary: scans and tokenizes private text before it crosses tool, API, report, or remote handoff boundaries.
  • Operations digest: summarizes capture status, artifact fingerprints, connector freshness, and review pressure without exposing private workspace contents.

| Repeated-agent problem | Helm adds |
| --- | --- |
| The agent forgets prior work | Context hydration from notes, memory, tasks, commands, and checkpoints |
| Risky edits happen too fast | Profiles, command guard, and checkpoint discipline |
| Runs are hard to explain later | Task ledger, command log, status, dashboard, and reports |
| Private context may leak into tools | helm privacy scan/tokenize/restore with local vault and audit events |
| Retrieval feels like a black box | helm context --explain-ranking with field, recency, graph, adapter, and source scores |
| Skill rules live in prompts | SKILL.md guidance plus contract.json execution policy |
| Model fallback is ad hoc | File-backed health checks and fallback selection |
| Operational state is scattered | Workspace layout, adopted sources, and SQLite query index |
| Long-lived integrations silently go stale | Connector freshness probes and daily digest review queues |
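
The profile and guardrail ideas above can be illustrated with a small Python sketch: a profile declares which programs may run and which tokens are off limits, and a guard checks the command shape before anything executes. This is an assumption-laden illustration, not Helm's policy engine. The `Profile` class, the `guard` function, and the example allow and block lists are hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: allowed programs plus tokens that are always blocked."""
    name: str
    allowed_programs: frozenset
    blocked_tokens: frozenset


# Illustrative policy only; a real inspect-only profile may look different.
INSPECT_ONLY = Profile(
    name="inspect_only_example",
    allowed_programs=frozenset({"git", "ls", "cat"}),
    blocked_tokens=frozenset({"push", "reset", "clean", "--force", "-f"}),
)


def guard(profile: Profile, command: str) -> tuple[bool, str]:
    """Return (allowed, reason) by inspecting the command before it runs."""
    tokens = shlex.split(command)
    if not tokens:
        return False, "empty command"
    if tokens[0] not in profile.allowed_programs:
        return False, f"{tokens[0]!r} is outside profile {profile.name!r}"
    for token in tokens[1:]:
        if token in profile.blocked_tokens:
            return False, f"{token!r} is blocked under profile {profile.name!r}"
    return True, "allowed"
```

With this sketch, `guard(INSPECT_ONLY, "git status --short")` is allowed, while `guard(INSPECT_ONLY, "git push origin main")` is rejected with a reason that can be written to an audit log.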

Helm is runtime-agnostic, but it is built first for persistent workspaces with state, memory, profiles, checkpoints, and task history.

Workflows

Inspect the workspace.

helm doctor --path ~/.helm/workspace
helm status --path ~/.helm/workspace --brief
helm dashboard --path ~/.helm/workspace

Run under a declared profile.

helm profile --path ~/.helm/workspace run inspect_local \
  --task-name "inspect repository state" \
  -- git status --short

Adopt existing systems as context sources.

helm survey --path ~/.helm/workspace
helm onboard --path ~/.helm/workspace --use-detected --dry-run
helm onboard --path ~/.helm/workspace --use-detected

Check rollback and recent state.

helm checkpoint-recommend --path ~/.helm/workspace
helm checkpoint list --path ~/.helm/workspace
helm task list --path ~/.helm/workspace --status running
helm task doctor --path ~/.helm/workspace
helm report --path ~/.helm/workspace --format markdown

Query durable context with inspectable ranking.

helm context --path ~/.helm/workspace --mode decisions --explain-ranking --json
helm context --path ~/.helm/workspace --mode timeline --since 2026-05-01
helm context --path ~/.helm/workspace --mode entity --entity project_helm
helm context --path ~/.helm/workspace --mode reflect-candidates
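
What an inspectable ranking could look like is sketched below in Python. The component names (field, recency, graph, adapter, source) come from the table in "What Helm Adds"; everything else is an assumption, including the weighted-sum formula, the weights, and the `Candidate` and `explain_ranking` names. How Helm actually combines these scores is not described here.

```python
from dataclasses import dataclass

# Hypothetical weights; only the component names are taken from the README.
WEIGHTS = {"field": 0.4, "recency": 0.3, "graph": 0.1, "adapter": 0.1, "source": 0.1}


@dataclass
class Candidate:
    item_id: str
    scores: dict  # component name -> score in [0, 1]; missing components count as 0


def explain_ranking(candidates: list) -> list:
    """Rank candidates by weighted sum and keep each component's contribution."""
    ranked = []
    for cand in candidates:
        contributions = {
            name: weight * cand.scores.get(name, 0.0)
            for name, weight in WEIGHTS.items()
        }
        ranked.append(
            {
                "id": cand.item_id,
                "total": sum(contributions.values()),
                "contributions": contributions,
            }
        )
    # Highest total first; the contributions explain why each item ranks where it does.
    return sorted(ranked, key=lambda row: row["total"], reverse=True)
```

Keeping the per-component contributions next to the total is what makes the result explainable: a reader can see whether an item ranked high because of a field match or because it was recent.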

Run a privacy boundary preflight.

helm privacy --path ~/.helm/workspace scan --text "Contact alice@example.com" --json
helm privacy --path ~/.helm/workspace tokenize --scope task-123 --text "Contact alice@example.com"
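
The scan, tokenize, and restore steps can be pictured with the following Python sketch. It handles email addresses only and keeps the vault as an in-memory dict. The regular expression, the `<scope:EMAIL_n>` token format, and the function names are assumptions for illustration, not Helm's detector or vault format.

```python
import re

# Simplified email pattern for illustration; real detectors cover far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan(text: str) -> list:
    """Return the email addresses found in `text`."""
    return EMAIL.findall(text)


def tokenize(text: str, scope: str, vault: dict) -> str:
    """Replace each email with a scoped placeholder and store the mapping in `vault`."""

    def replace(match: re.Match) -> str:
        value = match.group(0)
        # Reuse the existing token when this value was already tokenized in this scope.
        for token, original in vault.items():
            if original == value and token.startswith(f"<{scope}:"):
                return token
        token = f"<{scope}:EMAIL_{len(vault) + 1}>"
        vault[token] = value
        return token

    return EMAIL.sub(replace, text)


def restore(text: str, vault: dict) -> str:
    """Swap placeholders back to their original values using `vault`."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

In this sketch, tokenizing "Contact alice@example.com" under scope "task-123" yields "Contact <task-123:EMAIL_1>", and `restore` with the same vault returns the original text.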

Review stale negative claims in skill instructions.

helm skill-lifecycle negative-claims --path ~/.helm/workspace --persist
helm skill-lifecycle revalidation-due --path ~/.helm/workspace
helm skill-lifecycle revalidate-claim --path ~/.helm/workspace \
  --skill old-skill \
  --claim-id sha256:abc123 \
  --status resolved \
  --note "command now exists"

Probe model health.

helm health --path ~/.helm/workspace state --json
helm health --path ~/.helm/workspace select --json
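
File-backed fallback selection can be sketched in a few lines of Python: read a health-state file and return the first preferred model that is marked healthy. The JSON layout, the "healthy" status value, and the `select_model` name are hypothetical; Helm's own state format is not shown in this README.

```python
import json
from pathlib import Path
from typing import Optional


def select_model(state_file: Path, preference: list) -> Optional[str]:
    """Return the first preferred model recorded as healthy, or None if none is.

    Assumes a JSON object shaped like {"model-name": {"status": "healthy"}}.
    """
    state = json.loads(state_file.read_text(encoding="utf-8"))
    for model in preference:
        if state.get(model, {}).get("status") == "healthy":
            return model
    return None
```

Because the decision reads from a file instead of in-process memory, a later run can see why a fallback was chosen.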

Try the demo workspace.

helm doctor --path examples/demo-workspace
helm dashboard --path examples/demo-workspace

Workspace Model

Keep Helm in a dedicated workspace. Treat existing systems as read-only context sources first.

  • Helm state lives under .helm/
  • profiles, notes, policies, and skill rules stay as explicit files
  • OpenClaw, Hermes, and notes vaults can be adopted instead of overwritten
  • JSONL remains the append-only source of truth; SQLite is a query index
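
The split between an append-only JSONL log and a SQLite query index can be sketched as follows. The idea is that the index is disposable: it can be dropped and rebuilt from the log at any time. The file names, the `commands` table, and the event fields (`task`, `profile`, `command`) are assumptions for illustration, not Helm's schema.

```python
import json
import sqlite3
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event to the JSONL log; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")


def rebuild_index(log_path: Path, db_path: Path) -> int:
    """Drop and rebuild the SQLite index from the log; return the rows indexed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS commands")
            conn.execute(
                "CREATE TABLE commands ("
                "line INTEGER PRIMARY KEY, task TEXT, profile TEXT, command TEXT)"
            )
            count = 0
            with log_path.open(encoding="utf-8") as log:
                for line_no, line in enumerate(log, start=1):
                    if not line.strip():
                        continue
                    event = json.loads(line)
                    conn.execute(
                        "INSERT INTO commands VALUES (?, ?, ?, ?)",
                        (line_no, event.get("task"), event.get("profile"), event.get("command")),
                    )
                    count += 1
        return count
    finally:
        conn.close()
```

Since the index is derived entirely from the log, deleting the SQLite file loses nothing: running `rebuild_index` again restores it.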

Docs

Start here:

Core concepts:

Positioning:

Harness engineering principles:

Release details:

Older release notes live in docs/releases/.

Status

Helm v0.10.0 lands the harness-engineering layer:

  • failure-signature classification
  • profile→tool-group grants
  • repeated-failure policy transitions
  • patch-first edit policy
  • the task-state control container (Forge "Control Flow Is Not Memory")
  • agent-reliability eval scenarios
  • trace recording, replay, and candidate promotion
  • profile pause/resume
  • browser-work verifier with policy decisions
  • model-repair and synthetic-respond library hooks
  • the shadow-mode reporter that drives enforce-readiness decisions

See docs/releases/0.10.0.md.

Helm does not include private memory, personal agent overlays, credentials, or private task history.

License

MIT
