
sparkrun

One command to rule them all

Launch, manage, and stop inference workloads on one or more NVIDIA DGX Spark systems — no Slurm, no Kubernetes, no fuss.

sparkrun is a unified CLI for running LLM inference on DGX Spark. Point it at your hosts, pick a recipe, and go. It handles container orchestration, InfiniBand/RDMA detection, model distribution, and multi-node tensor parallelism across your Spark cluster automatically.

sparkrun does not need to run on a member of the cluster. You can coordinate one or more DGX Sparks from any Linux machine with SSH access.

Installation

# uv is the preferred mechanism for managing Python environments.
# To install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Automatic installation via uvx (manages the virtual environment,
# creates an alias in your shell, and sets up autocomplete too!)
uvx sparkrun setup install

Alternative: manual pip install

pip install sparkrun
# or
uv pip install sparkrun

With a manual install you will need to run sparkrun setup completion separately for tab completion.

Quick Start

Tab completion

Note: If you installed via sparkrun setup install, tab completion is already set up — you can skip this step.

sparkrun setup completion          # auto-detects your shell
sparkrun setup completion --shell zsh

After restarting your shell, recipe names, cluster names, and subcommands all tab-complete.

Save a cluster config

# Save your hosts once
sparkrun cluster create mylab --hosts 192.168.11.13,192.168.11.14 -d "My DGX Spark lab"
sparkrun cluster set-default mylab

# Now just run — hosts are automatic
sparkrun run nemotron3-nano-30b-nvfp4-vllm

Run an inference job

# Single-node vLLM (minimum nodes and default parallelism are set by the recipe)
sparkrun run qwen3-1.7b-vllm

# Multi-node (2-node tensor parallel) -- uses your default two-node cluster
sparkrun run qwen3-1.7b-vllm --tp 2

# Override settings on the fly
sparkrun run qwen3-1.7b-vllm --hosts 192.168.11.14 --port 9000 --gpu-mem 0.8
sparkrun run qwen3-1.7b-vllm --tp 2 -H 192.168.11.13,192.168.11.14 -o max_model_len=8192

# GGUF quantized models via llama.cpp
sparkrun run qwen3-1.7b-llama-cpp

sparkrun always launches jobs in the background (detached containers) and then follows logs. Ctrl+C detaches from logs — it never kills your inference job. Your model keeps serving.
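
Once the server is up, the endpoint speaks the OpenAI-compatible API that vLLM and SGLang expose. A quick probe (host, port, and model id below are illustrative, based on the defaults in the examples above):

# list served models
curl http://192.168.11.13:8000/v1/models

# illustrative chat request; use a model id returned by /v1/models
curl http://192.168.11.13:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen3-1.7B", "messages": [{"role": "user", "content": "Hello"}]}'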

Inspect a recipe

sparkrun show nemotron3-nano-30b-nvfp4-vllm
Name:         nemotron3-nano-30b-nvfp4
Description:  NVIDIA Nemotron 3 Nano 30B (upstream NVFP4) -- cluster or solo
Runtime:      vllm
Model:        nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
Container:    scitrera/dgx-spark-vllm:0.16.0-t5
Nodes:        1 - unlimited
Repository:   Local

Defaults:
  gpu_memory_utilization: 0.8
  max_model_len: 200000
  port: 8000
  served_model_name: nemotron3-30b-a3b
  tensor_parallel: 1

VRAM Estimation:
  Model dtype:      nvfp4
  Model params:     30,000,000,000
  KV cache dtype:   bfloat16
  Architecture:     52 layers, 2 KV heads, 128 head_dim
  Model weights:    19.56 GB
  KV cache:         9.92 GB (max_model_len=200,000)
  Tensor parallel:  1
  Per-GPU total:    29.48 GB
  DGX Spark fit:    YES

  GPU Memory Budget:
    gpu_memory_utilization: 80%
    Usable GPU memory:     96.8 GB (121 GB x 80%)
    Available for KV:      77.2 GB
    Max context tokens:    1,557,583
    Context multiplier:    7.8x (vs max_model_len=200,000)

The VRAM estimator auto-detects model architecture from HuggingFace and tells you whether your configuration fits within DGX Spark's 128 GB unified memory before you launch.
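
As a sanity check, the KV cache line follows the standard per-token formula (bfloat16 = 2 bytes per element):

bytes per token = 2 (K and V) x layers x kv_heads x head_dim x dtype_bytes
                = 2 x 52 x 2 x 128 x 2 = 53,248 bytes
KV cache        = 53,248 bytes x 200,000 tokens ~= 9.92 GiB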

Custom recipe registries

# See what's configured
sparkrun recipe registries

# Add a community or private registry
sparkrun recipe add-registry myteam \
  --url https://github.com/myorg/spark-recipes.git \
  --subpath recipes

# Update all registries
sparkrun recipe update

# Search across all registries
sparkrun search qwen3

Manage running workloads

# Re-attach to logs (Ctrl+C is always safe) -- NOTE: the workload is found by
# the combination of hosts, model, and runtime
sparkrun logs nemotron3-nano-30b-nvfp4-vllm --cluster mylab

# Stop a workload -- the same hosts/model/runtime matching applies
sparkrun stop nemotron3-nano-30b-nvfp4-vllm --cluster mylab

# If you launched with --tp (modifying the recipe default), e.g.:
sparkrun run nemotron3-nano-30b-nvfp4-vllm --tp 2
# then pass --tp so stop/logs resolve the same cluster ID as run:
sparkrun stop nemotron3-nano-30b-nvfp4-vllm --tp 2
sparkrun logs nemotron3-nano-30b-nvfp4-vllm --tp 2
# TIP: you can just press up and modify "run" to "stop"

Supported Runtimes

vLLM

First-class support for vLLM: solo and multi-node clustering via Ray. Works with ready-built images (e.g. scitrera/dgx-spark-vllm) as well as other images, including those built from eugr's repo and NVIDIA's own images.

SGLang

First-class support for SGLang: solo and multi-node clustering via SGLang's native distributed backend (--dist-init-addr, --nnodes, --node-rank). Works with ready-built images (e.g. scitrera/dgx-spark-sglang). Should also work with other SGLang images, though far fewer of them exist than vLLM images.

llama.cpp

Support for llama.cpp via llama-server. Solo mode with GGUF quantized models. Loads models directly from HuggingFace (e.g. Qwen/Qwen3-1.7B-GGUF:Q4_K_M). Lightweight alternative to vLLM/SGLang for smaller models or constrained environments.

GGUF models use colon syntax to select a quantization variant: model: Qwen/Qwen3-1.7B-GGUF:Q8_0. sparkrun pre-downloads only the matching quant files and resolves the local cache path so the container doesn't need to re-download at serve time.

Experimental: Multi-node inference via llama.cpp's RPC backend. Worker nodes run rpc-server and the head node connects via --rpc. This is still evolving both upstream and in sparkrun, so treat it as experimental. Note that the fastest DGX Spark interconnect path is NCCL over RoCE; the llama.cpp RPC mechanism carries considerably more overhead.
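
Under the hood this corresponds to llama.cpp's RPC tooling, roughly as follows (a sketch of what sparkrun automates; addresses, ports, and paths are illustrative):

# on each worker node:
rpc-server --host 0.0.0.0 --port 50052

# on the head node, pointing --rpc at the workers:
llama-server -m model.gguf --rpc 192.168.11.14:50052 --n-gpu-layers 99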

eugr-vllm (compatibility runtime)

Full compatibility with eugr/spark-vllm-docker. This runtime delegates entirely to eugr's scripts — mods, local builds, and all eugr-specific features work natively because sparkrun calls their code directly rather than reimplementing it.

Use this when you need a nightly vLLM build, custom modifications, or anything that requires building containers locally from eugr's repo.

sparkrun's recipe format is designed to be mostly compatible with eugr's (think of it as a v2 format); sparkrun automatically translates any variations back to the eugr repo format. The changes mostly improve compatibility across multiple runtimes and reduce redundancy somewhat. The full command listing is preserved for compatibility, but long-term, runtime implementations should be able to generate commands themselves.

# eugr-vllm recipe example
runtime: eugr-vllm
model: my-org/custom-model
container: vllm-node-tf5
runtime_config:
  mods: [ my-custom-mod ]
  build_args: [ --some-flag ]

How It Works

Recipes are YAML files that describe an inference workload: the model, container image, runtime, and default parameters. sparkrun ships a limited set of bundled recipes, includes the eugr repo as a default registry (with execution delegated to eugr's scripts), and supports custom registries (any git repo with YAML files). The long-term goal is to merge recipes from multiple registries into a single unified catalog and run them even if they were designed for different runtimes (e.g. vLLM vs SGLang), without worrying about the underlying command differences. See the RECIPES specification file for more details.

Runtimes are plugins that know how to launch a specific inference engine. sparkrun discovers them via Python entry points, so custom runtimes can be added by installing a package.
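
For illustration, a third-party package might register a runtime via an entry point like this (the group name "sparkrun.runtimes" is an assumption for the sketch; check sparkrun's source for the actual group):

# pyproject.toml of a hypothetical runtime plugin
[project.entry-points."sparkrun.runtimes"]
my-runtime = "my_pkg.runtime:MyRuntime"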

Orchestration is handled over SSH. sparkrun detects InfiniBand/RDMA interfaces on your hosts, distributes container images and models from local to remote (using the RDMA NICs' Ethernet interfaces for fast transfers when available), configures NCCL environment variables, and launches containers with the right networking.
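
The NCCL configuration amounts to environment variables along these lines (illustrative values; the exact set and interface names depend on what sparkrun detects):

export NCCL_IB_HCA=mlx5_0            # RDMA device to use
export NCCL_SOCKET_IFNAME=enp1s0f0   # interface for NCCL bootstrap traffic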

Each DGX Spark has one GPU, so tensor parallelism maps directly to node count: --tp 2 means 2 hosts.

SSH Prerequisites

All multi-node orchestration relies on SSH. At minimum, you need passwordless SSH from your control machine to every cluster node. sparkrun pulls container images and models locally and pushes them to each node directly, so node-to-node SSH is not strictly required for the default workflow.

That said, setting up a full SSH mesh (every host can reach every other host) is recommended — it enables alternative distribution strategies and is generally useful for cluster administration.

The easiest way to set this up is sparkrun setup ssh, which creates a full mesh across your cluster hosts and the control machine (the control machine is included via --include-self, which is on by default):

# Set up passwordless SSH mesh across your cluster + this machine
sparkrun setup ssh --hosts 192.168.11.13,192.168.11.14 --user ubuntu

# Or use a saved cluster
sparkrun setup ssh --cluster mylab

# Or if you've set your default cluster -- it'll just use that
sparkrun setup ssh

# Add extra hosts beyond the cluster (e.g. a jump host)
sparkrun setup ssh --cluster mylab --extra-hosts 10.0.0.99

# Exclude the control machine from the mesh
sparkrun setup ssh --cluster mylab --no-include-self

You will be prompted for passwords on first connection to each host. After that, every host in the mesh can SSH to every other host without passwords.

Manual SSH setup (without sparkrun setup ssh)

If you prefer to set up SSH yourself, you need key-based auth from your control machine to each node:

# Generate a key if you don't have one
ssh-keygen -t ed25519

# Copy to each node
ssh-copy-id 192.168.11.13
ssh-copy-id 192.168.11.14

SSH user: By default sparkrun uses your current OS user for SSH. You can set a per-cluster user with sparkrun cluster create --user dgxuser or sparkrun cluster update --user dgxuser, or override per-command with --user.

For more advanced SSH configuration (non-default ports, identity files), use ~/.ssh/config:
Host spark1
    HostName 192.168.11.13
    User dgxuser

Host spark2
    HostName 192.168.11.14
    User dgxuser
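
Aliases defined this way can then be used wherever sparkrun expects a host, assuming host names are handed straight through to ssh:

sparkrun cluster create mylab --hosts spark1,spark2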

Solo mode (--solo) runs on a single host and still uses SSH unless the target is localhost.

Docker Group

sparkrun launches containers via docker on each host. The SSH user must be a member of the docker group on every cluster node:

sudo usermod -aG docker "$USER"
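
The group change takes effect at next login (or after newgrp docker). A quick check from the control machine, using the host addresses from the earlier examples:

# confirm each node can talk to Docker without sudo
for h in 192.168.11.13 192.168.11.14; do
  ssh "$h" docker info --format '{{.ServerVersion}}'
done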

Recipes

A recipe is a YAML file:

model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
runtime: vllm
min_nodes: 2
container: scitrera/dgx-spark-vllm:0.16.0-t5

metadata:
  description: NVIDIA Nemotron 3 Nano 30B (upstream NVFP4)
  maintainer: scitrera.ai <open-source-team@scitrera.com>

defaults:
  port: 8000
  tensor_parallel: 1
  gpu_memory_utilization: 0.8
  max_model_len: 200000
  served_model_name: nemotron3-30b-a3b

command: |
  vllm serve {model} \
      --served-model-name {served_model_name} \
      --max-model-len {max_model_len} \
      --gpu-memory-utilization {gpu_memory_utilization} \
      -tp {tensor_parallel} \
      --host {host} --port {port}

Any default can be overridden at launch time with -o key=value or dedicated flags like --port, --tp, --gpu-mem.
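
For example, to launch the recipe above on a different port with a shorter context (values are illustrative):

sparkrun run nemotron3-nano-30b-nvfp4-vllm --port 9000 \
  -o max_model_len=32768 -o served_model_name=nemotron3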

Recipes can also include an env block for environment variables injected into the container. Shell variable references like ${HF_TOKEN} are expanded from the control machine's environment, so you can forward secrets without hardcoding them. See RECIPES.md for the full recipe format specification.
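
A sketch of such a block (only HF_TOKEN comes from the text above; the second variable is an illustrative extra):

env:
  HF_TOKEN: ${HF_TOKEN}        # expanded from the control machine's environment
  VLLM_LOGGING_LEVEL: INFO     # illustrative additional variable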

GGUF recipes (llama.cpp)

GGUF recipes use the llama-cpp runtime and specify a quantization variant with colon syntax:

model: Qwen/Qwen3-1.7B-GGUF:Q8_0
runtime: llama-cpp
min_nodes: 1
max_nodes: 1
container: scitrera/dgx-spark-llama-cpp:latest

defaults:
  port: 8000
  host: 0.0.0.0
  n_gpu_layers: 99
  ctx_size: 8192

command: |
  llama-server \
      -hf {model} \
      --host {host} --port {port} \
      --n-gpu-layers {n_gpu_layers} \
      --ctx-size {ctx_size} \
      --flash-attn on --jinja --no-webui

When model pre-sync is enabled (the default), sparkrun downloads only the matching quant files locally, distributes them to target hosts, and rewrites -hf to -m with the resolved container cache path so the container serves from the local copy without re-downloading.
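
Concretely, the rewrite looks something like this (the resolved path is illustrative; the actual mount point and snapshot hash vary):

# as written in the recipe:
llama-server -hf Qwen/Qwen3-1.7B-GGUF:Q8_0 ...

# after pre-sync (illustrative resolved cache path):
llama-server -m /root/.cache/huggingface/hub/models--Qwen--Qwen3-1.7B-GGUF/snapshots/<hash>/Qwen3-1.7B-Q8_0.gguf ...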

CLI Reference

Global options

Option           Description
-v / --verbose   Enable verbose/debug output
--version        Show version and exit
--help           Show help for any command

Workload commands

Command                  Description
sparkrun run <recipe>    Launch an inference workload
sparkrun stop <recipe>   Stop a running workload
sparkrun logs <recipe>   Re-attach to workload logs

sparkrun run options:

Option                     Description
--hosts / -H               Comma-separated host list (first = head)
--hosts-file               File with hosts (one per line, # comments; see example below)
--cluster                  Use a saved cluster by name
--solo                     Force single-node mode
--port                     Override serve port
--tp / --tensor-parallel   Override tensor parallelism
--gpu-mem                  Override GPU memory utilization (0.0-1.0)
--image                    Override container image (not recommended)
--cache-dir                HuggingFace cache directory
--option / -o              Override any recipe default: -o key=value (repeatable)
--dry-run / -n             Show what would be done without executing
--foreground               Run in foreground (don't detach)
--no-follow                Don't follow container logs after launch
--skip-ib                  Skip InfiniBand detection (not recommended)
--ray-port                 Ray GCS port (default: 46379) (vLLM)
--init-port                SGLang distributed init port (default: 25000)
--dashboard                Enable Ray dashboard on head node (vLLM)
--dashboard-port           Ray dashboard port (default: 8265)
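
The --hosts-file format referenced above is plain text, one host per line (the file name is illustrative):

# hosts.txt -- first host becomes the head node
192.168.11.13   # head
192.168.11.14

sparkrun run qwen3-1.7b-vllm --hosts-file hosts.txt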

sparkrun stop options:

Option                     Description
--hosts / -H               Comma-separated host list
--hosts-file               File with hosts
--cluster                  Use a saved cluster by name
--tp / --tensor-parallel   Match host trimming from run
--dry-run / -n             Show what would be done

sparkrun logs options:

Option                     Description
--hosts / -H               Comma-separated host list
--hosts-file               File with hosts
--cluster                  Use a saved cluster by name
--tp / --tensor-parallel   Match host trimming from run
--tail                     Number of existing log lines to show (default: 100)

Recipe commands

Command                             Description
sparkrun list [query]               List available recipes (alias)
sparkrun show <recipe>              Show recipe details + VRAM estimate (alias)
sparkrun search <query>             Search recipes by name/model/description (alias)
sparkrun recipe list [query]        List available recipes from all registries
sparkrun recipe show <recipe>       Show detailed recipe information
sparkrun recipe search <query>      Search for recipes by name, model, or description
sparkrun recipe validate <recipe>   Validate a recipe file
sparkrun recipe vram <recipe>       Estimate VRAM usage for a recipe

sparkrun recipe vram options:

Option                     Description
--tp / --tensor-parallel   Override tensor parallelism
--max-model-len            Override max sequence length
--gpu-mem                  Override gpu_memory_utilization (0.0-1.0)
--no-auto-detect           Skip HuggingFace model auto-detection

Registry commands

Command                                  Description
sparkrun recipe registries               List configured recipe registries
sparkrun recipe add-registry <name>      Add a custom recipe registry
sparkrun recipe remove-registry <name>   Remove a recipe registry
sparkrun recipe update                   Update registries from git

Cluster commands

Command                               Description
sparkrun cluster create <name>        Create a new named cluster (--user sets SSH user)
sparkrun cluster update <name>        Update hosts, description, or user of a cluster
sparkrun cluster list                 List all saved clusters
sparkrun cluster show <name>          Show details of a saved cluster
sparkrun cluster delete <name>        Delete a saved cluster
sparkrun cluster set-default <name>   Set the default cluster
sparkrun cluster unset-default        Remove the default cluster setting
sparkrun cluster default              Show the current default cluster
sparkrun cluster status               Show running containers, pending operations, and IP mappings
sparkrun status                       Alias for sparkrun cluster status

The first host in a cluster definition is used as the head node for multi-node jobs. Order the remaining hosts however you like — they become workers.

Setup commands

Command                     Description
sparkrun setup install      Install sparkrun as a uv tool + tab-completion
sparkrun setup completion   Install shell tab-completion (bash/zsh/fish)
sparkrun setup update       Update sparkrun to the latest version
sparkrun setup ssh          Set up passwordless SSH mesh across hosts

Roadmap

  • sparkrun setup subcommands for basic system configuration, ConnectX-7 NIC setup, and SSH mesh provisioning
  • Additional bundled recipes for popular models
  • Health checks and status monitoring for running workloads

About

sparkrun provides a unified tool for running inference on DGX Spark systems without Slurm or Kubernetes coordination. It is intended to be donated to a future community organization.

License

Apache License 2.0 — see LICENSE for details.
