
Agent and CLI for operating an NVIDIA DGX Spark (Grace-Blackwell) workstation — device setup, health/monitoring, and local AI/ML workload management.


dgx-spark-cli


What you get

  • An agent-first CLI cited from teken (afi-cli); the runtime package has no third-party dependencies.
  • A mesh identity in culture.yaml (suffix + backend) and the matching prompt file (CLAUDE.md for backend: claude).
  • The canonical guildmaster skill kit (11 skills) under .claude/skills/, vendored on a cite-don't-import basis. See docs/skill-sources.md.
  • A build + deploy baseline — pytest, lint, the agent-first rubric gate, and PyPI Trusted Publishing wired into GitHub Actions.

Quickstart

uv sync
uv run pytest -n auto                  # run the test suite
uv run dgx-spark-cli whoami            # identity from culture.yaml
uv run dgx-spark-cli learn             # self-teaching prompt (add --json)
uv run teken cli doctor . --strict     # the agent-first rubric gate that CI runs

CLI

Verb             What it does
whoami           Report this agent's nick, version, backend, and model from culture.yaml.
learn            Print a structured self-teaching prompt.
explain <path>   Markdown docs for any noun/verb path.
overview         Read-only descriptive snapshot of the agent.
doctor           Check the agent-identity invariants (prompt-file-present, backend-consistency).
cli overview     Describe the CLI surface itself.

Machine scope (DGX Spark host telemetry)

The Spark is the system, so these read-only verbs sit at the top level:

Verb         What it does
status       Machine-wide summary, anomalies first: the headline view.
memory       Unified RAM + swap (the GB10 shares one pool across CPU and GPU).
gpu          Blackwell GB10: utilization, temperature, power, clocks, and GPU processes.
disk         Filesystem usage for real block devices (via /proc/mounts + statvfs).
thermal      SoC thermal zones and hwmon sensors (no lm-sensors needed).
containers   Running Docker containers and their health.
network      Interfaces, default route, and reachable addresses.
processes    Top processes by resident memory (via /proc).
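The /proc, /sys, and statvfs approach named above can be sketched with the standard library alone. This is a minimal illustration, not dgx-spark-cli's code: the function names are hypothetical, "memory in use" is assumed to mean MemTotal minus MemAvailable, and "real block devices" is approximated as /dev/* mount sources excluding loop devices.

```python
import glob
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_used_percent(info):
    """Assumed definition: RAM in use = MemTotal - MemAvailable."""
    return 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]


def real_mounts(text):
    """Return (device, mountpoint) for /dev/* sources in /proc/mounts, one per device."""
    seen, mounts = set(), []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # /proc/mounts escapes spaces in paths as \040.
        device, mountpoint = fields[0], fields[1].replace("\\040", " ")
        # Skip pseudo filesystems (proc, tmpfs, overlay), loop images, and repeat mounts.
        if device.startswith("/dev/") and not device.startswith("/dev/loop") and device not in seen:
            seen.add(device)
            mounts.append((device, mountpoint))
    return mounts


def disk_used_percent(mountpoint):
    """df-style usage: used / (used + space available to unprivileged users)."""
    st = os.statvfs(mountpoint)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    return 100.0 * used / (used + avail) if used + avail else 0.0


def hottest_thermal_zone(root="/sys/class/thermal"):
    """Return (zone type, degrees C) for the hottest readable zone, or None."""
    hottest = None
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                celsius = int(f.read().strip()) / 1000.0  # kernel reports millidegrees C
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
        except (OSError, ValueError):
            continue  # an unreadable zone is skipped, not fatal
        if hottest is None or celsius > hottest[1]:
            hottest = (kind, celsius)
    return hottest
```

On a Linux host you would pass the contents of /proc/meminfo and /proc/mounts to the parsers and call disk_used_percent on each returned mountpoint.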

The machine-scope verbs have no third-party runtime dependencies: kernel telemetry is read from /proc and /sys, while nvidia-smi, docker, and ip are invoked as subprocesses and degrade gracefully (a missing tool reports available: false and the command still exits 0). doctor remains the health gate. Because the GB10 has no discrete VRAM, nvidia-smi reports aggregate GPU memory as [N/A]; gpu instead sums per-process compute-app memory to show how much of the shared pool the GPU holds.
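A sketch of the GPU memory workaround, assuming the per-process numbers come from nvidia-smi's compute-app query in CSV form (MiB, no units). The function names and the shape of the returned dict are hypothetical. The idea is to sum per-process usage instead of trusting the [N/A] aggregate, and to report available: False rather than raise when nvidia-smi is missing or fails.

```python
import shutil
import subprocess

# nvidia-smi query: one "pid, used_memory" row per compute process, MiB, no header/units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' rows into (pid, MiB), skipping [N/A]-style values."""
    apps = []
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            apps.append((int(fields[0]), int(fields[1])))
    return apps


def gpu_process_memory(timeout=10):
    """Sum per-process GPU memory; degrade to available=False instead of raising."""
    if shutil.which("nvidia-smi") is None:
        return {"available": False}
    try:
        proc = subprocess.run(QUERY, capture_output=True, text=True, timeout=timeout, check=True)
    except (OSError, subprocess.SubprocessError):
        return {"available": False}
    apps = parse_compute_apps(proc.stdout)
    return {
        "available": True,
        "processes": len(apps),
        "used_mib": sum(mib for _, mib in apps),
    }
```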

Every command supports --json. Results go to stdout, errors/diagnostics to stderr (never mixed). Exit codes: 0 success, 1 user error, 2 environment error, 3+ reserved.
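Because stdout carries only results and stderr carries only diagnostics, a script can drive any verb with --json and branch on the exit code. A minimal wrapper, assuming only that the --json output is a single JSON document (its schema is not described here); run_json and describe_exit are hypothetical helpers.

```python
import json
import subprocess
import sys

# Exit-code contract as documented: 0 success, 1 user error, 2 environment error, 3+ reserved.
EXIT_MEANINGS = {0: "success", 1: "user error", 2: "environment error"}


def describe_exit(code):
    """Map a return code to its documented meaning (negative = killed by a signal)."""
    if code in EXIT_MEANINGS:
        return EXIT_MEANINGS[code]
    return "reserved" if code >= 3 else "unknown"


def run_json(argv, timeout=60):
    """Run argv, parse stdout as JSON, and keep stderr diagnostics separate."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.stderr:
        sys.stderr.write(proc.stderr)  # pass diagnostics through, never into the result
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]}: {describe_exit(proc.returncode)} (exit {proc.returncode})")
    return json.loads(proc.stdout)
```

For example, run_json(["dgx-spark-cli", "status", "--json"]) would return the parsed status, or raise with the documented meaning of a non-zero exit code.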

Monitoring (monitor) — AI-free webhook watchdog

monitor turns the collectors into a deterministic, always-on watchdog. It evaluates the same numbers against configurable thresholds and POSTs to a generic webhook when a metric crosses a catastrophe threshold, and again when it clears. Alerts are edge-triggered, so a standing condition does not spam. No AI and no new dependencies are involved (urllib does the POST).
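Edge-triggering means an alert is delivered only when its state changes, not on every cycle it stays true. A minimal sketch, assuming each cycle yields the set of check names currently firing; the function and check names are hypothetical.

```python
def transitions(previous, current):
    """Return (event, name) pairs only for checks whose state changed.

    previous and current are sets of check names that are firing.
    """
    events = [("fire", name) for name in sorted(current - previous)]
    events += [("clear", name) for name in sorted(previous - current)]
    return events


def replay(cycles):
    """Feed successive firing sets through the edge detector, carrying state forward."""
    state = set()
    delivered = []
    for firing in cycles:
        delivered.extend(transitions(state, firing))
        state = set(firing)
    return delivered
```

Replaying [{"disk_pct"}, {"disk_pct"}, {"disk_pct", "swap_pct"}, set()] delivers nothing on the second cycle even though disk is still over its limit. Persisting that state between runs is what lets a single evaluation cycle make the same decision.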

dgx-spark-cli monitor config --init     # scaffold ~/.config/dgx-spark/monitor.json
export DGX_SPARK_WEBHOOK_URL=https://…  # or put webhook_url in the config
dgx-spark-cli monitor check             # dry run: what's firing right now
dgx-spark-cli monitor test              # POST a synthetic alert
dgx-spark-cli monitor install           # write the systemd --user unit
dgx-spark-cli monitor enable            # start it always-on (+ linger)
dgx-spark-cli monitor status            # service + currently firing alerts

Verb                      What it does
monitor check             Evaluate thresholds now (no webhook, no state change).
monitor once              One cycle: evaluate, deliver transitions, update state.
monitor run               Foreground watch loop (the systemd ExecStart).
monitor test              POST a synthetic alert to verify the webhook.
monitor config [--init]   Show resolved config / write a scaffold.
monitor install|enable|disable|status|uninstall
                          Manage the systemd --user service.

It watches memory %, swap %, disk %, the hottest sensor, GPU temperature, load per core, container health, and subsystem availability. Thresholds live in the config (null disables a check); webhook_format is generic (default), slack, or discord.
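The threshold check and webhook delivery can be sketched with the standard library alone. Everything here is illustrative and hypothetical: the threshold key names, the generic payload shape, and the assumption that DGX_SPARK_WEBHOOK_URL takes precedence over webhook_url in the config. The slack and discord payloads use the fields those services' webhooks accept (text and content, respectively).

```python
import json
import os
import urllib.request

# Hypothetical keys; the real monitor.json schema may name these differently.
EXAMPLE_THRESHOLDS = {"memory_pct": 95, "swap_pct": 80, "disk_pct": 90, "load_per_core": None}


def firing_checks(metrics, thresholds):
    """Return names whose metric meets or exceeds its limit; a None limit disables the check."""
    return {
        name
        for name, limit in thresholds.items()
        if limit is not None and name in metrics and metrics[name] >= limit
    }


def load_per_core():
    """1-minute load average divided by the number of CPUs."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def build_payload(text, webhook_format="generic"):
    """Shape the JSON body for the configured webhook_format."""
    if webhook_format == "slack":
        return {"text": text}  # Slack incoming webhooks read "text"
    if webhook_format == "discord":
        return {"content": text}  # Discord webhooks read "content"
    if webhook_format == "generic":
        return {"source": "dgx-spark", "message": text}  # assumed shape
    raise ValueError(f"unknown webhook_format: {webhook_format!r}")


def resolve_webhook_url(config):
    """Assumed precedence: environment variable first, then the config file."""
    return os.environ.get("DGX_SPARK_WEBHOOK_URL") or config.get("webhook_url")


def post_json(url, payload, timeout=10):
    """POST a JSON body; returns the HTTP status (urllib raises HTTPError on 4xx/5xx)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```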

Make it your own

  1. Rename the package spark/ and the dgx-spark-cli CLI/dist name throughout pyproject.toml, the package, tests/, and sonar-project.properties.
  2. Edit culture.yaml with your suffix and backend.
  3. Rewrite CLAUDE.md for your agent and run /init.
  4. Re-vendor only the skills you need from guildmaster (see docs/skill-sources.md).

See CLAUDE.md for the full conventions (version-bump-every-PR, the cicd PR lane, deploy setup).

License

MIT — see LICENSE.



Download files


Source Distribution

dgx_spark_cli-0.3.0.tar.gz (137.0 kB)

Uploaded Source

Built Distribution


dgx_spark_cli-0.3.0-py3-none-any.whl (61.4 kB)

Uploaded Python 3

File details

Details for the file dgx_spark_cli-0.3.0.tar.gz.

File metadata

  • Download URL: dgx_spark_cli-0.3.0.tar.gz
  • Upload date:
  • Size: 137.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.17 {"installer":{"name":"uv","version":"0.11.17","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for dgx_spark_cli-0.3.0.tar.gz
Algorithm Hash digest
SHA256 8bae762e4f6e1d2324960b9f11b861f13a30f28dfa2eda6bc9df0126d1619202
MD5 78df5645fb0c30adefc0e306815e72d3
BLAKE2b-256 2d3e8196e675bc1034585291b10d1f03a7404a204d77ad8fbb09a8c3d07447ca


File details

Details for the file dgx_spark_cli-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: dgx_spark_cli-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 61.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.17 {"installer":{"name":"uv","version":"0.11.17","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for dgx_spark_cli-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 02465d1f1d0683cbfbbb984b918cef1421a7ed4f79bc70ce10cec036221e0058
MD5 012bbabe8244fe998a9d17a4cd93d664
BLAKE2b-256 a25070d6e019929afe89dd1c1d1d26d66571385bc82ada7e3e6797962bb3b3ef

