
model-gear — run, assess, and switch the local vLLM model.


model-gear

model-gear is the tooling that runs, assesses, and switches the local, OpenAI-compatible vLLM model that the Culture mesh consumes. The binary is model, with subcommands such as model switch, model assess, and model serve.

The served model is what the lepenseur agent connects to over the acp vllm-local provider. model-gear runs the engine; lepenseur is one consumer of it.

Sibling to culture (the agent mesh), daria (awareness), and steward (alignment).

Install

uv tool install model-gear

Usage

model init --apply          # scaffold a deployment dir (default ~/.model-gear)
model serve --apply         # start the vLLM server (alias: start)
model switch nvidia/Qwen3-32B-NVFP4 --apply   # switch the served model
model status                # current model, container state, /health
model assess                # correctness probes (markdown for a per-model doc)
model benchmark             # decode throughput + prefill latency
model stop --apply          # stop the server

model overview              # tool snapshot + served model + candidate list
model whoami                # tool, machine, served model, container health
model explain switch        # markdown docs for a topic
model doctor                # diagnose docker / compose / .env / health

Every command supports --json. Write verbs (switch, serve, stop, init) are dry-run by default and require --apply to commit — agents call CLIs in loops, so safe-by-default is mandatory.
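
Because write verbs only commit with --apply, a script or agent can preview a change and commit it only if the preview succeeds. The following is a minimal sketch of that pattern, not part of model-gear: preview_then_apply is a hypothetical helper, and it assumes that model is on PATH, that a dry run exits 0 when the plan is valid, and that --json can be combined with --apply in this order.

```shell
# Hypothetical helper: run a write verb as a dry run first, and only
# re-run it with --apply if the preview exits successfully.
# Assumes `model` is on PATH and a valid dry run exits 0.
preview_then_apply() {
  # "$@" is the verb plus its arguments, e.g.: switch nvidia/Qwen3-32B-NVFP4
  model "$@" --json || return 1   # dry run: nothing is committed without --apply
  model "$@" --apply --json       # commit the same change
}
```

Usage: preview_then_apply switch nvidia/Qwen3-32B-NVFP4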

Running the model locally (vLLM)

model init scaffolds a deployment directory (default ~/.model-gear) from the packaged templates: a docker-compose.yml that stands up the vLLM model as an OpenAI-compatible server on :8000, plus a .env. The templates are tuned for DGX Spark (GB10 Grace Blackwell, 128 GB unified memory), following build.nvidia.com/spark/vllm.

Prerequisites: the NVIDIA Container Toolkit, and docker login nvcr.io with an NGC API key to pull the nvcr.io/nvidia/vllm image.
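
The registry login can be done non-interactively. This is a sketch under assumptions: ngc_login is a hypothetical helper and NGC_API_KEY is an assumed environment variable name. NGC expects the literal username $oauthtoken with the API key as the password.

```shell
# Log in to nvcr.io without an interactive prompt.
# NGC_API_KEY is an assumed variable name holding your NGC API key.
ngc_login() {
  [ -n "${NGC_API_KEY:-}" ] || { echo "NGC_API_KEY is not set" >&2; return 1; }
  # The username is the literal string $oauthtoken; single quotes stop the
  # shell from expanding it. The key is passed on stdin, not on the command line.
  printf '%s' "$NGC_API_KEY" |
    docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Passing the key on stdin keeps it out of the process list and shell history.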

model init --apply          # writes ~/.model-gear/{docker-compose.yml,.env}
# edit ~/.model-gear/.env to set HF_TOKEN if the model repo is gated
model serve --apply         # first run downloads ~18 GB of weights
model status                # waits/reports until /health is up

Verify it is up:

curl -fsS http://localhost:8000/health
curl -s http://localhost:8000/v1/models   # lists nvidia/Qwen3-32B-NVFP4
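
For scripts that cannot call model status, the same health check can be polled directly. A minimal sketch: wait_for_health, its default timeout, and the polling interval are illustrative choices, not model-gear behavior.

```shell
# Poll the vLLM /health endpoint until it answers or a deadline passes.
# The function name, 600 s timeout, and 5 s interval are illustrative.
wait_for_health() {
  url=${1:-http://localhost:8000/health}
  timeout=${2:-600}      # seconds to keep trying
  interval=${3:-5}       # seconds between attempts
  elapsed=0
  # curl -f makes HTTP errors count as failures, so the loop keeps waiting.
  while ! curl -fsS -o /dev/null "$url" 2>/dev/null; do
    [ "$elapsed" -ge "$timeout" ] && return 1   # give up after the deadline
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

Usage: wait_for_health && curl -s http://localhost:8000/v1/models. A generous timeout is reasonable on the first run, since the weights are downloaded before the server comes up.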

Tunables live in the deployment .env (VLLM_MODEL, VLLM_GPU_MEM_UTIL, VLLM_MAX_MODEL_LEN, HF_CACHE, …). VLLM_SERVED_NAME must match the part after vllm-local/ in culture.yaml; model doctor checks this. model switch rewrites these keys for you.
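
The naming rule can be illustrated with a small comparison. This is only a sketch of the rule as stated here, not how model doctor is implemented; served_name_matches is a hypothetical helper.

```shell
# Illustrative check only, not model doctor's implementation:
# compare VLLM_SERVED_NAME with the model string from culture.yaml
# after stripping the vllm-local/ provider prefix.
served_name_matches() {
  culture_model=$1     # e.g. vllm-local/nvidia/Qwen3-32B-NVFP4
  served_name=$2       # value of VLLM_SERVED_NAME in the deployment .env
  case $culture_model in
    vllm-local/*) [ "${culture_model#vllm-local/}" = "$served_name" ] ;;
    *) return 1 ;;     # not a vllm-local model string
  esac
}
```

Usage: served_name_matches vllm-local/nvidia/Qwen3-32B-NVFP4 nvidia/Qwen3-32B-NVFP4 succeeds; passing a served name of Qwen3-32B-NVFP4 alone would fail.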

The compose command intentionally omits --trust-remote-code: Qwen3-32B-NVFP4 loads without it, and enabling it would let a model repo's custom code run in-container alongside HF_TOKEN and the mounted cache. Add it back only for a model whose repo ships custom modeling code. If vLLM rejects the nvidia/ ModelOpt checkpoint, set VLLM_MODEL to the vLLM-native RedHatAI/Qwen3-32B-NVFP4 and drop --quantization from the compose command.
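
If you apply that fallback by hand rather than through model switch, the .env edit can be scripted. A minimal sketch, assuming the .env holds plain KEY=value lines; set_env_key is a hypothetical helper, and removing --quantization from docker-compose.yml remains a separate manual step.

```shell
# Hypothetical helper: set KEY=value in a .env file, replacing any
# existing assignment of that key and leaving other lines untouched.
set_env_key() {
  env_file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except an existing assignment of this key.
  grep -v "^${key}=" "$env_file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its original permissions.
  cat "$tmp" > "$env_file" && rm -f "$tmp"
}
```

Usage: set_env_key ~/.model-gear/.env VLLM_MODEL RedHatAI/Qwen3-32B-NVFP4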

Per-model notes

Each runtime model has a doc under docs/ recording how to run it, live test results, and caveats:

  • docs/qwen3-32b-nvfp4.md — the current runtime model (nvidia/Qwen3-32B-NVFP4), benchmarked on DGX Spark.
  • docs/qwen3.6-27b-nvfp4.md — a candidate (mmangkad/Qwen3.6-27B-NVFP4), load-tested on DGX Spark; loads under the current vLLM image but is slower on decode, so the 32B stays.

The numbers in each doc come from model switch <model> --apply then model assess (correctness) and model benchmark (throughput). model overview --list lists these docs and flags which model is currently served.
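
That sequence can be wrapped in a small script. This is a sketch under assumptions: record_model_numbers is a hypothetical helper, and capturing the output in a scratch file is one possible way to collect the numbers, not necessarily how the existing docs were produced. It also assumes model status returns once /health is up.

```shell
# Hypothetical wrapper around the documented sequence:
# switch, wait for health, assess, benchmark.
record_model_numbers() {
  model_id=$1          # e.g. nvidia/Qwen3-32B-NVFP4
  doc_file=$2          # scratch file to paste into the per-model doc
  model switch "$model_id" --apply || return 1
  model status || return 1        # assumed to return once /health is up
  {
    model assess       # correctness probes (markdown)
    model benchmark    # decode throughput + prefill latency
  } > "$doc_file"
}
```

Usage: record_model_numbers nvidia/Qwen3-32B-NVFP4 /tmp/qwen3-32b-results.md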

lepenseur still gets deployed

The mesh agent served by this model is lepenseur ("le penseur" — the thinker), a local thinking agent. Its runtime identity lives in AGENTS.md (the acp system prompt) and culture.yaml (backend: acp, model: vllm-local/nvidia/Qwen3-32B-NVFP4). model-gear is the repo identity and the tool; lepenseur is the agent that consumes the model model-gear serves.
