dgx-spark-cli

Agent and CLI for operating an NVIDIA DGX Spark (Grace-Blackwell) workstation — device setup, health/monitoring, and local AI/ML workload management.

What you get

  • An agent-first CLI cited from teken (afi-cli) — the runtime package has no third-party dependencies.
  • A mesh identity: culture.yaml (suffix + backend) and the matching prompt file (CLAUDE.md for backend: claude).
  • The canonical guildmaster skill kit (11 skills) under .claude/skills/, vendored cite-don't-import. See docs/skill-sources.md.
  • A build + deploy baseline — pytest, lint, the agent-first rubric gate, and PyPI Trusted Publishing wired into GitHub Actions.

Quickstart

uv sync
uv run pytest -n auto                 # run the test suite
uv run dgx-spark-cli whoami  # identity from culture.yaml
uv run dgx-spark-cli learn   # self-teaching prompt (add --json)
uv run teken cli doctor . --strict    # the agent-first rubric gate CI runs

CLI

Verb            What it does
whoami          Report this agent's nick, version, backend, and model from culture.yaml.
learn           Print a structured self-teaching prompt.
explain <path>  Markdown docs for any noun/verb path.
overview        Read-only descriptive snapshot of the agent.
doctor          Check the agent-identity invariants (prompt-file-present, backend-consistency).
cli overview    Describe the CLI surface itself.

Machine scope (DGX Spark host telemetry)

The Spark is the system, so these read-only verbs sit at the top level:

Verb        What it does
status      Machine-wide scope, anomalies first (the headline).
memory      Unified RAM + swap (the GB10 shares one pool across CPU and GPU).
gpu         Blackwell GB10: utilization, temp, power, clocks, and GPU processes.
disk        Filesystem usage for real block devices (via /proc/mounts + statvfs).
thermal     SoC thermal zones and hwmon sensors (no lm-sensors needed).
containers  Running Docker containers and their health.
network     Interfaces, default route, and reachable addresses.
processes   Top processes by resident memory (via /proc).
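
As an illustration of the /proc/mounts + statvfs approach the disk verb names, here is a minimal sketch. It is not the project's implementation: the function names, the PSEUDO_FS filter, and the returned keys are hypothetical, and the real tool may decide what counts as a "real block device" differently.

```python
import os

# Hypothetical filter: filesystem types skipped even when backed by a /dev node.
PSEUDO_FS = {"squashfs", "tmpfs", "devtmpfs", "overlay"}


def parse_mounts(text):
    """Return (device, mountpoint, fstype) tuples for /dev-backed entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[:3]
        if device.startswith("/dev/") and fstype not in PSEUDO_FS:
            # /proc/mounts escapes spaces in paths as the octal sequence \040.
            entries.append((device, mountpoint.replace("\\040", " "), fstype))
    return entries


def disk_usage(mountpoint):
    """Compute usage for one mountpoint with os.statvfs (standard library only)."""
    st = os.statvfs(mountpoint)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # space available to unprivileged users
    denom = used + avail
    percent = round(100.0 * used / denom, 1) if denom else 0.0
    return {"total_bytes": total, "used_bytes": used, "used_percent": percent}


def report(mounts_path="/proc/mounts"):
    """Usage for every /dev-backed mount listed in mounts_path."""
    with open(mounts_path, encoding="utf-8") as handle:
        mounts = parse_mounts(handle.read())
    return {mountpoint: disk_usage(mountpoint) for _, mountpoint, _ in mounts}
```

The percentage is computed as used / (used + available), which is the convention df follows, so reserved root blocks do not count as free space.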

They have zero runtime dependencies — kernel telemetry is read from /proc and /sys, while nvidia-smi, docker, and ip are shelled out and degrade gracefully (a missing tool reports available: false and still exits 0). doctor remains the health gate. Because the GB10 has no discrete VRAM, nvidia-smi reports aggregate GPU memory as [N/A]; gpu instead sums per-process compute-app memory so you can see how much of the shared pool the GPU holds.
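
The graceful-degradation and per-process summing behaviour described above could look roughly like the sketch below. The function names and the used_mib key are hypothetical; only the available: false convention comes from this README. The nvidia-smi query flags are standard, but whether the project uses exactly these flags is an assumption.

```python
import shutil
import subprocess


def sum_compute_app_memory(csv_text):
    """Sum the used_memory column (MiB) from 'pid, used_memory' CSV rows."""
    total = 0
    for line in csv_text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) < 2:
            continue
        try:
            total += int(parts[1])
        except ValueError:
            continue  # skip rows such as "[N/A]" that carry no number
    return total


def gpu_process_memory(tool="nvidia-smi"):
    """Report GPU memory held by compute apps, or available: False if the tool is missing."""
    if shutil.which(tool) is None:
        return {"available": False}
    try:
        proc = subprocess.run(
            [tool, "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # A failing or hanging tool is treated like a missing one.
        return {"available": False}
    return {"available": True, "used_mib": sum_compute_app_memory(proc.stdout)}
```

A caller that follows the convention above would print this result and still exit 0 when available is false, leaving pass/fail decisions to doctor.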

Every command supports --json. Results go to stdout, errors/diagnostics to stderr (never mixed). Exit codes: 0 success, 1 user error, 2 environment error, 3+ reserved.
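
For scripting against this contract, a caller might wrap the CLI as in the following sketch. The helper names are hypothetical, and because the JSON schema of each verb is not documented here, the parsed payload is treated as opaque data.

```python
import json
import subprocess

# Exit-code meanings as documented above; 3 and higher are reserved.
EXIT_MEANING = {0: "success", 1: "user error", 2: "environment error"}


def describe_exit(code):
    """Map a process exit code to the documented category."""
    if code in EXIT_MEANING:
        return EXIT_MEANING[code]
    return "reserved" if code >= 3 else "unknown"


def run_json(argv):
    """Run a command, parse stdout as JSON on success, keep stderr separate."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    data = None
    if proc.returncode == 0 and proc.stdout.strip():
        data = json.loads(proc.stdout)
    return {
        "exit": describe_exit(proc.returncode),
        "data": data,
        "diagnostics": proc.stderr,
    }
```

A call such as run_json(["dgx-spark-cli", "memory", "--json"]) would then return the parsed result alongside any diagnostics. Keeping stdout and stderr in separate fields mirrors the never-mixed guarantee.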

Monitoring (monitor) — AI-free webhook watchdog

monitor turns the collectors into a deterministic, always-on watchdog. It evaluates the same numbers against configurable thresholds and POSTs to a generic webhook when a catastrophe condition crosses its threshold, and again when it clears (edge-triggered, so a standing condition doesn't spam). No AI, no new dependencies (urllib does the POST).
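
Edge-triggering can be pictured as a set difference between the alerts firing in the previous cycle and those firing now. The sketch below is a generic illustration of that idea, not the project's code; the transitions function, the event labels, and the alert names are all hypothetical.

```python
def transitions(previous, current):
    """Return edge events between two sets of firing alert names."""
    fired = sorted(current - previous)    # newly crossed conditions
    cleared = sorted(previous - current)  # conditions that recovered
    return [("firing", name) for name in fired] + \
           [("cleared", name) for name in cleared]


# Four evaluation cycles; only changes produce events.
state = set()
for cycle in [{"memory"}, {"memory"}, {"memory", "disk"}, set()]:
    events = transitions(state, cycle)
    state = set(cycle)
    print(events)
```

The second cycle prints an empty list because the memory condition is still standing, which is why a persistent problem produces one notification rather than one per cycle.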

dgx-spark-cli monitor config --init     # scaffold ~/.config/dgx-spark/monitor.json
export DGX_SPARK_WEBHOOK_URL=https://…  # or put webhook_url in the config
dgx-spark-cli monitor check             # dry run: what's firing right now
dgx-spark-cli monitor test              # POST a synthetic alert
dgx-spark-cli monitor install           # write the systemd --user unit
dgx-spark-cli monitor enable            # start it always-on (+ linger)
dgx-spark-cli monitor status            # service + currently firing alerts

Verb                                             What it does
monitor check                                    Evaluate thresholds now (no webhook, no state change).
monitor once                                     One cycle: evaluate, deliver transitions, update state.
monitor run                                      Foreground watch loop (the systemd ExecStart).
monitor test                                     POST a synthetic alert to verify the webhook.
monitor config [--init]                          Show resolved config / write a scaffold.
monitor install|enable|disable|status|uninstall  Manage the systemd --user service.

Watches memory %, swap %, disk %, hottest sensor, GPU temp, load-per-core, container health, and subsystem availability. Thresholds live in the config (null disables a check); webhook_format is generic (default), slack, or discord.
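
To make the threshold and format options concrete, here is a hedged sketch of threshold evaluation and payload shaping. The key names (memory_percent, disk_percent), the >= comparison, and the generic payload shape are assumptions, since the real config schema is not shown here. The slack and discord shapes follow those services' documented minimal webhook bodies ("text" and "content" respectively).

```python
import json
import urllib.request


def evaluate(readings, thresholds):
    """Return names of checks whose reading meets or exceeds its threshold."""
    firing = []
    for name, limit in thresholds.items():
        value = readings.get(name)
        if limit is None or value is None:
            continue  # a null threshold disables the check
        if value >= limit:  # assumption: inclusive comparison
            firing.append(name)
    return sorted(firing)


def build_payload(message, webhook_format="generic"):
    """Shape the JSON body for the chosen webhook_format."""
    if webhook_format == "slack":
        return {"text": message}
    if webhook_format == "discord":
        return {"content": message}
    return {"message": message}  # hypothetical generic shape


def post(url, payload, timeout=10):
    """POST JSON with urllib only, so no third-party dependency is needed."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, evaluate({"memory_percent": 95.0}, {"memory_percent": 90, "disk_percent": None}) reports only the memory check, because the disk check is disabled by its null threshold.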

Make it your own

  1. Rename the package spark/ and the dgx-spark-cli CLI/dist name throughout pyproject.toml, the package, tests/, and sonar-project.properties.
  2. Edit culture.yaml with your suffix and backend.
  3. Rewrite CLAUDE.md for your agent and run /init.
  4. Re-vendor only the skills you need from guildmaster (see docs/skill-sources.md).

See CLAUDE.md for the full conventions (version-bump-every-PR, the cicd PR lane, deploy setup).

License

MIT — see LICENSE.

