Config-driven CLI to launch, monitor, and ship VLA fine-tunes across ephemeral GPU boxes.

vlakit

vlakit is a config-driven command-line tool for launching VLA fine-tunes across ephemeral GPU boxes, keeping them alive through crashes, and shipping verified weights to durable storage. You describe a run in YAML and drive everything with a single vla command from your laptop; the actual work runs on a GPU box over ssh, and the durable artifacts land on Weights & Biases and a storage box — never on the GPU box, which you throw away.

It packages a set of shell and Python ops scripts behind a single CLI. The scripts encode operational lessons from real runs, such as provisioning swap to absorb the checkpoint-save spike and auto-resuming after a crash. They ship with the install as read-only package data, while your environment (the boxes, datasets, baselines, and runs) lives in a configs/ directory you own and edit.

Install

vlakit is best installed as an isolated CLI with pipx:

pipx install vlakit

Or with pip. The core install is dependency-light because the laptop-side commands need almost nothing; the heavier pieces are opt-in extras:

pip install vlakit                 # laptop-side: config / remote / launch
pip install "vlakit[stats]"        # adds numpy + pyarrow for `vla stats`
pip install "vlakit[wandb]"        # adds wandb for publish / pull / eval logging
pip install "vlakit[all]"          # everything

Quickstart

vla init                           # scaffold an editable ./configs from templates
# edit configs/boxes.yaml, datasets.yaml, baselines.yaml, and a runs/<name>.yaml
# then copy configs/secrets.example.env -> configs/secrets.local.env and fill it

vla config <run>                   # resolve + print the run config (local, no box)
vla remote <box> deploy            # rsync the toolkit + your configs onto the box
vla remote <box> push-secrets      # install ~/.secrets.env on the box (mode 600)
vla remote <box> ensure-swap       # provision swap (absorbs the checkpoint-save spike)
vla launch <run>                   # launch detached + auto-resume (box read from the run cfg)
vla remote <box> monitor           # step / rate / ETA + liveness
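The step / rate / ETA figures that monitor reports come down to simple arithmetic. The sketch below is illustrative only: the function names and the output format are assumptions, not vlakit's actual implementation.

```python
def eta_seconds(step: int, total_steps: int, steps_per_sec: float) -> float:
    """Estimated seconds remaining at the current training rate."""
    if steps_per_sec <= 0:
        return float("inf")  # a stalled run has no finite ETA
    return max(total_steps - step, 0) / steps_per_sec


def format_eta(seconds: float) -> str:
    """Render an ETA as hours, minutes, and seconds."""
    if seconds == float("inf"):
        return "unknown"
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:d}h{minutes:02d}m{secs:02d}s"


if __name__ == "__main__":
    # 8000 steps left at 4 steps/s
    print(format_eta(eta_seconds(2000, 10000, 4.0)))
```

A rate of zero is reported as "unknown" rather than dividing by zero, which is also a useful liveness hint.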

Where each command runs

vlakit keeps a clean split between your laptop and the GPU box. Commands that resolve or inspect configuration (init, config, split, doctor) run entirely on your laptop and need no box. stats and eval also run on your laptop, and the command table lists them as laptop/box because they can run on the box as well. Commands that operate on a machine (everything under vla remote ..., and vla launch) open an ssh connection from your laptop and run the work on the box defined in your boxes.yaml.

Run vla doctor to see exactly which scripts directory and config directory were resolved, and which optional dependencies are installed.
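An optional-dependency check of this kind can be done without importing the heavy modules. The following is a minimal sketch, assuming the extras map to the module names listed in the install section; the function name and return shape are hypothetical and not taken from vlakit's source.

```python
from importlib.util import find_spec

# Extras named in the install section, mapped to the modules they add.
EXTRAS = {"stats": ["numpy", "pyarrow"], "wandb": ["wandb"]}


def optional_dependency_status(extras=EXTRAS):
    """Return {extra: {module: installed?}} without importing the modules."""
    return {
        extra: {mod: find_spec(mod) is not None for mod in mods}
        for extra, mods in extras.items()
    }


if __name__ == "__main__":
    for extra, mods in optional_dependency_status().items():
        for mod, present in mods.items():
            print(f"[{extra}] {mod}: {'ok' if present else 'missing'}")
```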

Commands

| Command | Runs | What it does |
| --- | --- | --- |
| vla init [dir] | laptop | Scaffolds an editable configs/ directory from the bundled templates. |
| vla config <run> | laptop | Resolves a run (defaults merged under the run) and prints the config plus the exact command, running nothing. |
| vla remote <box> <subcmd> [args] | box | Runs an operational subcommand on the box: deploy, push-secrets, ensure-swap, launch, autoresume, monitor, kill, rescale, pull, gpus, exec, shell. |
| vla launch <run> | box | Launches the run detached and auto-resuming; the box is read from the run's box: field. |
| vla stats [args] | laptop/box | Computes the full dataset statistics (quantiles + image stats) that lerobot and molmo need. Requires the [stats] extra. |
| vla split [args] | laptop | Produces a deterministic held-out episode split for validation/eval. |
| vla eval [args] | laptop/box | Ranks a checkpoint by held-out error or rollout, not loss. Try vla eval --self-test to verify the harness with no box. |
| vla doctor | laptop | Prints the resolved scripts/config directories and optional-dependency status. |
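One common way to get a deterministic held-out split, as vla split is described as producing, is to hash each episode id into a bucket so that membership never depends on ordering or on a random seed. The sketch below shows that technique under stated assumptions: the salt, the default fraction, and the function names are hypothetical, and vlakit's own split may work differently.

```python
import hashlib


def is_held_out(episode_id: str, holdout_fraction: float = 0.1,
                salt: str = "vla-split") -> bool:
    """Decide membership from a stable hash of the episode id."""
    digest = hashlib.sha256(f"{salt}:{episode_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(holdout_fraction * 2**64)


def split_episodes(episode_ids, holdout_fraction: float = 0.1):
    """Partition episode ids into (train, held_out) lists."""
    train, held_out = [], []
    for ep in episode_ids:
        (held_out if is_held_out(ep, holdout_fraction) else train).append(ep)
    return train, held_out


if __name__ == "__main__":
    ids = [f"episode_{i:04d}" for i in range(1000)]
    train, held_out = split_episodes(ids)
    print(len(train), len(held_out))
```

Because each episode is decided independently, adding new episodes later never moves an existing episode between the train and held-out sets.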

Configuration

Your configs/ directory holds everything dynamic. The YAML files never contain secrets: keys go in secrets.local.env (copied from secrets.example.env) and resolve on the box via ~/.secrets.env, which vla remote <box> push-secrets installs. The directory is resolved from --config-dir, then the VLA_CONFIG_DIR environment variable, then ./configs. A run file under runs/ is a thin recipe that names a box, a dataset, and a baseline (each a pointer into the corresponding registry) plus a few hyperparameters; everything else is inherited from _defaults.yaml.
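The resolution order and the defaults-under-run inheritance can be sketched in a few lines. This is an illustration, not vlakit's code: the function names are hypothetical, the recursive merge of nested mappings is an assumption (the source only says defaults are merged under the run), and the example keys other than box are made up.

```python
import os
from pathlib import Path


def resolve_config_dir(cli_value=None, environ=None) -> Path:
    """--config-dir wins, then VLA_CONFIG_DIR, then ./configs."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return Path(cli_value)
    if environ.get("VLA_CONFIG_DIR"):
        return Path(environ["VLA_CONFIG_DIR"])
    return Path("configs")


def merge_under(defaults: dict, run: dict) -> dict:
    """Lay the run on top of the defaults; run values win on conflict."""
    merged = dict(defaults)
    for key, value in run.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_under(merged[key], value)  # assumed deep merge
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    defaults = {"optim": {"lr": 1e-4, "warmup": 500}, "steps": 10000}  # hypothetical keys
    run = {"box": "my-box", "optim": {"lr": 5e-5}}                     # hypothetical values
    print(resolve_config_dir(None, {}))
    print(merge_under(defaults, run))
```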

Status

The local commands (init, config, stats, split, eval, doctor) are implemented and tested. The remote commands shell out to the bundled, proven ops scripts; vla remote <box> deploy now ships both the toolkit and your configs/ to the box. The eval offline comparator is implemented and self-tested (vla eval --self-test); its sim and robot rollout modes are still stubs.

Publishing (maintainers)

Releases publish to PyPI automatically through Trusted Publishing (OIDC), so no API token is stored anywhere. The workflow is .github/workflows/release.yml.

One-time setup:

  1. On PyPI, add a pending Trusted Publisher (Account → Publishing) with these exact values:
    • PyPI Project Name: vlakit
    • Owner: kkipngenokoech
    • Repository name: vlakit
    • Workflow name: release.yml
    • Environment name: pypi
  2. In the GitHub repo, create an Environment named pypi (Settings → Environments).

To cut a release, tag a version that matches pyproject.toml and push it:

git tag v0.1.0
git push origin v0.1.0

The workflow builds the sdist + wheel, verifies the bundled scripts/templates are inside the wheel, checks the tag matches the package version, and publishes. After the first successful run the pending publisher becomes a normal one, and pipx install vlakit works for everyone.

License

MIT — see LICENSE.
