Skip to main content

Local-first CLI coding agent — tested with Gemma 4 26B via vLLM

Project description

DryDock

Python Version License

     ____             ____             _    
    |  _ \ _ __ _   _|  _ \  ___   ___| | __
    | | | | '__| | | | | | |/ _ \ / __| |/ /
    | |_| | |  | |_| | |_| | (_) | (__|   < 
    |____/|_|   \__, |____/ \___/ \___|_|\_\
                |___/                       

Local-first CLI coding agent. Chart your course. Execute with precision.

DryDock is a TUI coding assistant designed to work with local LLMs. It provides a conversational interface to your codebase — explore, modify, build, and test projects through natural language and a powerful set of tools.

[!IMPORTANT] DryDock is tested and optimized for Gemma 4 26B-A4B (26B MoE, 4B active parameters). Recommended serving stack: llama.cpp with --jinja (the chat-template fix that prevents the tool-call loops Gemma 4 hits under other backends). vLLM is also documented below as a higher-throughput alternative for batch/eval workloads. Other models and providers are supported (Mistral, OpenAI, Anthropic, Ollama) but are not as thoroughly tested. If you use a different model, expect to tune prompts and tool settings.

Tested Hardware + Model

Component Spec
GPUs 2× NVIDIA RTX 4060 Ti 16GB
Model (llama.cpp, recommended) unsloth/gemma-4-26B-A4B-it-GGUF — UD-Q3_K_M (12.7GB) or UD-Q4_K_M (16.9GB)
Model (vLLM, alternative) casperhansen/gemma-4-26b-a4b-it-AWQ-4bit
Performance ~15–17 tok/s decode (llama.cpp Q3), ~70 tok/s decode (vLLM AWQ)
Active params 4B per token (MoE architecture — fast inference)

Recommended path: llama.cpp + Unsloth GGUF

Why this is the recommended setup: Gemma 4's tool-calling format requires precise chat-template handling. Without --jinja, tool results get injected without the right turn markers and the model loops or returns empty assistant messages — the exact 400 Bad Request loop fixed in v2.7.39 (GH #14). With --jinja, the GGUF's bundled chat template handles tool turns natively and the loops disappear.

# 1. Download Unsloth's GGUF (Q3_K_M is the article-recommended quant;
#    UD-Q4_K_M is a higher-quality alternative if you have ~17GB VRAM)
huggingface-cli download unsloth/gemma-4-26B-A4B-it-GGUF \
    --include "gemma-4-26B-A4B-it-UD-Q3_K_M.gguf" \
    --local-dir /path/to/models

# 2. Build llama.cpp with CUDA (or use the Docker image
#    ghcr.io/ggml-org/llama.cpp:server-cuda)
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DLLAMA_CURL=OFF
cmake --build build --config Release -j8 --target llama-server

# 3. Start the server with the article recipe
./build/bin/llama-server \
    -m /path/to/models/gemma-4-26B-A4B-it-UD-Q3_K_M.gguf \
    --host 0.0.0.0 --port 8000 \
    -ngl 99 -c 32768 -np 1 \
    --jinja \
    -ctk q8_0 -ctv q8_0 \
    --alias gemma4

Critical flags:

  • --jinjathe loop-fix. Required for tool-using workflows. Without it, Gemma 4 enters infinite retry loops on multi-turn tool sessions.
  • -ngl 99 — offload all layers to GPU
  • -c 32768 — 32K context (fits in 16GB VRAM with q8 KV cache)
  • -ctk q8_0 -ctv q8_0 — quantize KV cache for longer contexts
  • -np 1 — single slot (concurrent requests serialize)
  • --alias gemma4 — what the API reports as the model field

Drydock config (~/.drydock/config.toml):

active_model = "gemma4"

[[providers]]
name = "local"
api_base = "http://localhost:8000/v1"
api_key_env_var = ""
backend = "generic"

[[models]]
name = "gemma4"
provider = "local"
alias = "gemma4"
temperature = 1.0           # MUST be 1.0 with --jinja — lower temps reinforce loops
context_window = 32768       # Match `-c 32768` from llama-server. Drydock
                             # auto-clamps auto_compact_threshold to
                             # context_window − 4096 so we never blow past
                             # the server's max input.
auto_compact_threshold = 28000

# Article-recommended sampling (passed through extra_sampling to llama-server).
# Drydock auto-bakes these on first launch when llama.cpp is detected at
# 127.0.0.1:8080 / :8000, but you can override here.
[models.extra_params]
top_k = 40
top_p = 0.95
frequency_penalty = 1.1
max_tokens = 2048

Alternative: vLLM (higher throughput, no --jinja equivalent)

vLLM has its own --tool-call-parser gemma4 path that works for most workflows, but has been observed to enter tool-call loops on long multi-turn sessions (GH #14, fixed at the drydock side in v2.7.39 by filtering empty assistant messages before re-call). Use vLLM when you need higher decode throughput (~70 tok/s vs llama.cpp's ~15–17) for batch eval or non-interactive workloads where loop-fix matters less.

huggingface-cli download casperhansen/gemma-4-26b-a4b-it-AWQ-4bit \
    --local-dir /path/to/models/Gemma-4-26B-A4B-it-AWQ-4bit

docker run -d \
    --gpus all \
    --name gemma4 \
    -p 8000:8000 \
    -v /path/to/models:/models \
    --ipc=host \
    vllm/vllm-openai:gemma4 \
    --model /models/Gemma-4-26B-A4B-it-AWQ-4bit \
    --quantization compressed-tensors \
    --tensor-parallel-size 2 \
    --max-model-len 131072 \
    --max-num-seqs 2 \
    --gpu-memory-utilization 0.95 \
    --kv-cache-dtype fp8 \
    --served-model-name gemma4 \
    --trust-remote-code \
    --tool-call-parser gemma4 \
    --enable-auto-tool-choice \
    --attention-backend TRITON_ATTN

Key flags:

  • --tensor-parallel-size 2 — split across 2 GPUs
  • --kv-cache-dtype fp8 — reduce KV cache memory for longer contexts
  • --tool-call-parser gemma4 + --enable-auto-tool-choice — required for Gemma 4 tool calling under vLLM
  • --max-num-seqs 2 — limit concurrent requests (prevents OOM on 16GB GPUs)

Verify either backend is running:

curl http://localhost:8000/v1/models

For vLLM, drydock config is the same as the llama.cpp block above, except temperature = 0.2 is fine — the --jinja requirement only applies to llama.cpp.

Install

pip install drydock-cli

Or with uv:

uv tool install drydock-cli

[!TIP] New install hitting weird behavior? See DEPLOYMENT.md for the exact known-working vLLM launch flags, ~/.drydock/config.toml, env vars, and a diagnostic checklist. Most "DryDock doesn't work" issues trace back to missing vLLM flags (--tool-call-parser gemma4, --enable-auto-tool-choice) or temperature/thinking config drift.

Quick Start

cd your-project/
drydock

First run creates a config at ~/.drydock/config.toml and prompts for your provider setup.

> Review the PRD and build the package

Features

  • TUI Interface: Full terminal UI with streaming output, tool approval, and session management.
  • Adaptive Thinking: Automatically adjusts reasoning depth per turn — full thinking for planning, fast mode for file writes.
  • Powerful Toolset: Read, write, and patch files. Execute shell commands. Search code with grep. Delegate to subagents.
  • Project-Aware: Scans project structure, loads AGENTS.md / DRYDOCK.md for context.
  • Subagent Delegation: Large tasks can be delegated to builder/planner/explorer subagents with isolated context.
  • Loop Detection: Advisory-only detection that nudges the model away from repetitive actions without blocking.
  • Conda/Pip Support: Auto-approves pip install, conda install, pytest, and other dev commands.
  • Bundled Skills: Ships with skills like create-presentation for PowerPoint generation.
  • MCP Support: Connect Model Context Protocol servers for extended capabilities.
  • Safety First: Tool execution approval with --dangerously-skip-permissions for full auto-approve.

Built-in Agents

  • default: Standard agent that requires approval for tool executions.
  • plan: Read-only agent for exploration and planning.
  • accept-edits: Auto-approves file edits only.
  • auto-approve: Auto-approves all tool executions.
drydock --agent plan

Gemma 4 Optimizations

DryDock includes several optimizations specifically tuned for Gemma 4:

  • Simplified prompt (gemma4.md): 20-line system prompt instead of 125 lines. Complex prompts cause Gemma 4 to plan instead of act.
  • Non-streaming mode: Streaming breaks Gemma 4 tool call JSON parsing. DryDock automatically disables streaming for Gemma 4.
  • Thinking token filtering: Gemma 4 leaks <|channel>thought<channel|> tokens into text output. DryDock strips these before storing in context.
  • Adaptive thinking: Full thinking for planning (turn 1) and error recovery. Thinking OFF for routine file writes — eliminates 30-120s hangs between files.
  • search_replace resilience: Auto-detects already-applied edits, infers missing file paths, fuzzy-matches whitespace differences.
  • Reduced tool set: Disables tools that confuse Gemma 4 (ask_user_question, task_create, etc.).

Usage

Interactive Mode

drydock                        # Start interactive session
drydock "Fix the login bug"    # Start with a prompt
drydock --continue             # Resume last session
drydock --resume abc123        # Resume specific session

Keyboard shortcuts:

  • Ctrl+C — Cancel current operation (double-tap to quit)
  • Shift+Tab — Toggle auto-approve mode
  • Ctrl+O — Toggle tool output
  • Ctrl+G — Open external editor
  • @ — File path autocompletion
  • !command — Run shell command directly

Programmatic Mode

drydock --prompt "Analyze the codebase" --max-turns 5 --output json
drydock --dangerously-skip-permissions -p "Fix all lint errors"

Configuration

DryDock is configured via config.toml. It looks first in ./.drydock/config.toml, then ~/.drydock/config.toml.

API Key

drydock --setup                              # Interactive setup
export MISTRAL_API_KEY="your_key"            # Or set env var

Keys are saved to ~/.drydock/.env.

Consultant Model

Set a smarter model for the /consult command:

consultant_model = "gemini-2.5-pro"

The consultant provides read-only advice — it never calls tools. Use /consult <question> to ask it.

Custom Agents

Create agent configs in ~/.drydock/agents/:

# ~/.drydock/agents/redteam.toml
active_model = "devstral-2"
system_prompt_id = "redteam"
disabled_tools = ["search_replace", "write_file"]

Skills

DryDock discovers skills from:

  1. Custom paths in config.toml via skill_paths
  2. Project .drydock/skills/ or .agents/skills/
  3. Global ~/.drydock/skills/
  4. Bundled skills (shipped with the package)

MCP Servers

[[mcp_servers]]
name = "fetch_server"
transport = "stdio"
command = "uvx"
args = ["mcp-server-fetch"]

Testing

DryDock uses a shakedown harness (scripts/shakedown.py) that drives the real TUI via pexpect and judges on user-perceptible criteria — not tool-call counts.

# Single project test
python3 scripts/shakedown.py \
    --cwd /path/to/project \
    --prompt "review the PRD and build the package" \
    --pkg package_name

# Interactive back-and-forth test
python3 scripts/shakedown_interactive.py \
    --cwd /path/to/project \
    --pkg package_name

# Full regression suite (370 PRDs)
bash scripts/shakedown_suite.sh

Pass criteria: no write loops, no ignored interrupts, no search_replace cascades, package executes, session finishes within time budget.

Slash Commands

Type /help in the input for available commands. Create custom slash commands via the skills system.

Session Management

drydock --continue              # Continue last session
drydock --resume abc123         # Resume specific session
drydock --workdir /path/to/dir  # Set working directory

License

Copyright 2025 Mistral AI (original work) Copyright 2026 DryDock contributors (modifications)

Licensed under the Apache License, Version 2.0. See LICENSE for details.

DryDock is a fork of mistralai/mistral-vibe (Apache 2.0). See NOTICE for attribution.

Project details


Release history Release notifications | RSS feed

This version

2.8.2

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

drydock_cli-2.8.2.tar.gz (1.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

drydock_cli-2.8.2-py3-none-any.whl (575.4 kB view details)

Uploaded Python 3

File details

Details for the file drydock_cli-2.8.2.tar.gz.

File metadata

  • Download URL: drydock_cli-2.8.2.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for drydock_cli-2.8.2.tar.gz
Algorithm Hash digest
SHA256 22c1b733e8cecc6e0a7a0a9e52ce8c634d32222c0239bc2b9e90e3afbffb0eea
MD5 5e106186f6d0b7cd5a7c47aa1279e94e
BLAKE2b-256 5cb339ec4d51e02d018c606bb8e927de2a97790bc37253dd05fcfa54d1e120ad

See more details on using hashes here.

File details

Details for the file drydock_cli-2.8.2-py3-none-any.whl.

File metadata

  • Download URL: drydock_cli-2.8.2-py3-none-any.whl
  • Upload date:
  • Size: 575.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for drydock_cli-2.8.2-py3-none-any.whl
Algorithm Hash digest
SHA256 73a9b2d217afaa81d9c0c35cd72c2e1f8aa0370135118a0d22b2d7cdc9dbc456
MD5 ffdb2277b8f32d0fb49ced319d8726e5
BLAKE2b-256 429c43095965964173d2b6419eafe491b5279eeae073d33fcdd04e7bc6f7bb56

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page