Soup
Fine-tune LLMs in one command. No SSH, no config hell.
Quick Start · Features · Data Tools · Tracking · Eval · Commands
Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.
pip install soup-cli
soup init --template chat
soup train
Why Soup?
Training LLMs is still painful. Even experienced teams spend 30-50% of their time fighting infrastructure instead of improving models. Soup fixes that.
- Zero SSH. Never SSH into a broken GPU box again.
- One config. A simple YAML file is all you need.
- Auto everything. Batch size, GPU detection, quantization — handled.
- Works locally. Train on your own GPU with QLoRA. No cloud required.
Quick Start
1. Install
# From PyPI (recommended):
pip install soup-cli
# Or from GitHub (latest dev):
pip install git+https://github.com/MakazhanAlpamys/Soup.git
2. Create config
# Interactive wizard
soup init
# Or use a template
soup init --template chat # conversational fine-tune
soup init --template code # code generation
soup init --template medical # domain expert
soup init --template reasoning # GRPO reasoning training
soup init --template vision # vision/multimodal fine-tune
soup init --template rlhf # full RLHF pipeline (SFT→RM→PPO)
3. Train
soup train --config soup.yaml
That's it. Soup handles LoRA setup, quantization, batch size, monitoring, and checkpoints.
4. Test your model
soup chat --model ./output
5. Push to HuggingFace
soup push --model ./output --repo your-username/my-model
6. Merge & Export
# Merge LoRA adapter with base model
soup merge --adapter ./output
# Export to GGUF for Ollama / llama.cpp
soup export --model ./output --format gguf --quant q4_k_m
Config Example
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
# backend: unsloth  # 2-5x faster, pip install 'soup-cli[fast]'
data:
  train: ./data/train.jsonl
  format: alpaca
  val_split: 0.1
training:
  epochs: 3
  lr: 2e-5
  batch_size: auto
lora:
  r: 64
  alpha: 16
quantization: 4bit
output: ./output
Unsloth Backend (2-5x Faster Training)
Use the Unsloth backend for significantly faster training and up to 80% less VRAM:
# Install unsloth support
pip install 'soup-cli[fast]'
Then add one line to your config:
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
backend: unsloth  # 2-5x faster, -80% VRAM
data:
  train: ./data/train.jsonl
  format: alpaca
training:
  epochs: 3
  lr: 2e-5
quantization: 4bit
lora:
  r: 64
  alpha: 16
Works with all training tasks: SFT, DPO, GRPO, and PPO.
Tip: Soup auto-detects unsloth. When it is installed, you'll see a hint during soup train if you haven't enabled it yet.
Vision / Multimodal Fine-tuning
Fine-tune vision-language models (LLaMA-3.2-Vision, Qwen2-VL, Pixtral) on image+text data:
# Install vision support
pip install 'soup-cli[vision]'
# Create a vision config
soup init --template vision
# Train
soup train --config soup.yaml
base: meta-llama/Llama-3.2-11B-Vision-Instruct
task: sft
modality: vision
data:
  train: ./data/vision_train.jsonl
  format: llava
  image_dir: ./data/images
  val_split: 0.1
training:
  epochs: 3
  lr: 1e-5
quantization: 4bit
lora:
  r: 64
  alpha: 16
Supported vision data formats:
LLaVA:
{"image": "photo.jpg", "conversations": [{"from": "human", "value": "<image>\nDescribe this image."}, {"from": "gpt", "value": "A cat on a mat."}]}
ShareGPT4V:
{"image": "chart.png", "conversations": [{"from": "human", "value": "<image>\nWhat does this show?"}, {"from": "gpt", "value": "Quarterly revenue."}]}
soup data inspect automatically shows image statistics (count, formats, missing files) for vision datasets.
Quantization-Aware Training (QAT)
Train with simulated quantization for significantly better post-quantization quality compared to standard QLoRA:
# Install QAT support
pip install 'soup-cli[qat]'
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
data:
  train: ./data/train.jsonl
  format: alpaca
training:
  epochs: 3
  lr: 2e-5
quantization: 4bit
quantization_aware: true  # Enable QAT
lora:
  r: 64
  alpha: 16
output: ./output
When to use QAT vs post-training quantization:
- QAT (quantization_aware: true): better quality when you plan to deploy with aggressive quantization (int8/int4). Training is ~5-10% slower, but the model learns to compensate for quantization noise.
- Post-training quantization (default): faster training, good enough for most use cases. Quantize after training with soup export --quant q4_k_m.
QAT works with all training tasks (SFT, DPO, GRPO, PPO) and vision modality. Not compatible with the unsloth backend. After QAT training, export to GGUF normally with soup export.
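The core idea behind QAT is fake quantization: the forward pass rounds weights to the target precision so the loss sees quantization error, while a straight-through estimator lets gradients flow as if no rounding had happened. A conceptual sketch of that trick (illustration only; Soup's actual QAT path uses torchao):

import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Quantize-dequantize so the forward pass sees rounding error."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    # Straight-through estimator: the forward value is w_q, but the
    # backward pass treats the rounding as identity so w still trains.
    return w + (w_q - w).detach()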
DPO Training
Train with preference data using Direct Preference Optimization:
base: meta-llama/Llama-3.1-8B-Instruct
task: dpo
data:
  train: ./data/preferences.jsonl
  format: dpo
training:
  epochs: 3
  dpo_beta: 0.1
lora:
  r: 64
  alpha: 16
quantization: 4bit
GRPO Training (Reasoning)
Train reasoning models with Group Relative Policy Optimization (DeepSeek-R1 style):
base: meta-llama/Llama-3.1-8B-Instruct
task: grpo
data:
  train: ./data/reasoning_train.jsonl
  format: sharegpt
  max_length: 4096
training:
  epochs: 3
  lr: 1e-5
  grpo_beta: 0.1
  num_generations: 4
  reward_fn: accuracy  # or 'format', or path to custom .py
lora:
  r: 64
  alpha: 16
quantization: 4bit
# Create a reasoning config
soup init --template reasoning
# Train
soup train --config soup.yaml
Built-in reward functions:
- accuracy — checks whether the final answer matches the expected one (supports #### and \boxed{} formats)
- format — checks for structured <think>...</think> reasoning blocks
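For the accuracy reward, a training record can embed the reference answer with the #### marker. A hypothetical ShareGPT-style example (exactly where Soup reads the expected answer from is an assumption, so validate against your own data):

{"conversations": [{"from": "human", "value": "Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?"}, {"from": "gpt", "value": "<think>48 + 48/2 = 48 + 24 = 72</think>\nNatalia sold 72 clips.\n#### 72"}]}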
Custom reward functions — point to a Python file:
# my_reward.py
def reward_fn(completions, **kwargs):
    """Score each completion. Return a list of floats."""
    # Each completion is a chat-style list of messages;
    # c[-1]["content"] is the model's final reply.
    return [1.0 if "correct" in c[-1]["content"] else 0.0 for c in completions]
training:
  reward_fn: ./my_reward.py
PPO / Full RLHF Pipeline
Train models with the full RLHF pipeline: SFT warmup → Reward Model → PPO alignment.
# Create an RLHF config
soup init --template rlhf
Step 1: SFT warmup — fine-tune a base model on your data:
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
data:
  train: ./data/train.jsonl
  format: alpaca
output: ./output_sft
Step 2: Train reward model — learn preferences from human feedback:
base: meta-llama/Llama-3.1-8B-Instruct
task: reward_model
data:
  train: ./data/preferences.jsonl
  format: dpo
output: ./output_rm
Step 3: PPO alignment — optimize the policy using the reward model:
base: meta-llama/Llama-3.1-8B-Instruct
task: ppo
data:
  train: ./data/prompts.jsonl
  format: chatml
training:
  reward_model: ./output_rm
  ppo_epochs: 4
  ppo_clip_ratio: 0.2
  ppo_kl_penalty: 0.05
lora:
  r: 64
  alpha: 16
quantization: 4bit
output: ./output_ppo
PPO supports two reward sources:
- Reward model (reward_model): a pre-trained reward model (from step 2)
- Reward function (reward_fn): a callable function, same as GRPO: accuracy, format, or a custom .py (see the sketch below)
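To drive PPO with a reward function instead of a reward model, swap the key in the training block. A minimal sketch reusing the built-in accuracy reward:

training:
  reward_fn: accuracy  # or 'format', or ./my_reward.py
  ppo_epochs: 4
  ppo_clip_ratio: 0.2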
Chat with your model
# Chat with a LoRA adapter (auto-detects base model)
soup chat --model ./output
# Specify base model explicitly
soup chat --model ./output --base meta-llama/Llama-3.1-8B-Instruct
# Adjust generation
soup chat --model ./output --temperature 0.3 --max-tokens 256
Push to HuggingFace
# Upload model to HF Hub
soup push --model ./output --repo your-username/my-model
# Make it private
soup push --model ./output --repo your-username/my-model --private
Merge LoRA Adapter
Merge a LoRA adapter with its base model into a standalone model:
# Auto-detect base model from adapter_config.json
soup merge --adapter ./output --output ./merged
# Specify base model and dtype
soup merge --adapter ./output --base meta-llama/Llama-3.1-8B --dtype bfloat16
Export to GGUF
Export models to GGUF format for use with Ollama and llama.cpp:
# Export LoRA adapter (auto-merges with base, then converts)
soup export --model ./output --format gguf --quant q4_k_m
# Export with different quantizations
soup export --model ./output --format gguf --quant q8_0
soup export --model ./output --format gguf --quant f16
# Export a full (already merged) model
soup export --model ./merged --format gguf
# Specify llama.cpp path manually
soup export --model ./output --format gguf --llama-cpp /path/to/llama.cpp
Supported quantizations: q4_0, q4_k_m, q5_k_m, q8_0, f16, f32. Lower-bit quants trade quality for size: q4_k_m is a common default for local inference, q8_0 is near-lossless, and f16/f32 skip quantization entirely.
After export, use with Ollama:
echo 'FROM ./my-model.q4_k_m.gguf' > Modelfile
ollama create my-model -f Modelfile
ollama run my-model
Resume Training
Resume a training run from a checkpoint:
# Auto-detect latest checkpoint in output directory
soup train --config soup.yaml --resume auto
# Resume from a specific checkpoint
soup train --config soup.yaml --resume ./output/checkpoint-500
Weights & Biases Integration
Send training metrics to W&B for cloud-based experiment tracking:
# Enable W&B logging (requires: pip install wandb)
soup train --config soup.yaml --wandb
Make sure WANDB_API_KEY is set or run wandb login first.
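One-time setup, either via the environment or the wandb CLI (fill in your own key):

# Option 1: environment variable (key from https://wandb.ai/authorize)
export WANDB_API_KEY="..."
# Option 2: interactive login
wandb login
soup train --config soup.yaml --wandb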
Inference Server
Start a local OpenAI-compatible inference server:
# Install server dependencies
pip install 'soup-cli[serve]'
# Start server
soup serve --model ./output --port 8000
# With custom settings
soup serve --model ./output --port 8080 --host 127.0.0.1 --max-tokens 1024
Endpoints:
- POST /v1/chat/completions — chat completions (streaming supported)
- GET /v1/models — list available models
- GET /health — health check
Compatible with OpenAI SDK:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="output",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
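Streaming works through the same SDK; a minimal sketch against the server started above:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
stream = client.chat.completions.create(
    model="output",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,  # /v1/chat/completions supports streaming
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)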
vLLM Backend (2-4x Faster Inference)
Use vLLM for significantly better throughput in production:
# Install vLLM support
pip install 'soup-cli[serve-fast]'
# Start with vLLM backend
soup serve --model ./output --backend vllm
# Multi-GPU with tensor parallelism
soup serve --model ./output --backend vllm --tensor-parallel 2
# Control GPU memory usage
soup serve --model ./output --backend vllm --gpu-memory 0.8
Tip: Soup auto-detects vLLM. When it is installed, you'll see a hint during soup serve if you haven't enabled it yet.
Synthetic Data Generation
Generate training data using LLMs:
# Generate using OpenAI API
soup data generate --prompt "Create math word problems" --count 100 --format alpaca
# Use a different model
soup data generate --prompt "Medical Q&A pairs" --model gpt-4o --count 500
# Deduplicate against existing data
soup data generate --prompt "..." --count 200 --dedup-with existing.jsonl
# Use seed examples to guide style
soup data generate --prompt "..." --seed examples.jsonl --count 100
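Since generation goes through the OpenAI API, an API key must be available; presumably Soup reads the standard environment variable (an assumption; check soup data generate --help to confirm):

export OPENAI_API_KEY="sk-..."  # your own key
soup data generate --prompt "Create math word problems" --count 100 --format alpaca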
Hyperparameter Sweep
Search for the best hyperparameters. Grid search (the default) runs every parameter combination (the first example below is 3 × 3 = 9 runs); random search samples up to --max-runs combinations:
# Grid search over learning rate and LoRA rank
soup sweep --config soup.yaml --param lr=1e-5,2e-5,5e-5 --param lora_r=8,16,32
# Random search with max runs
soup sweep --config soup.yaml --param lr=1e-5,2e-5,5e-5 --strategy random --max-runs 5
# Preview without running
soup sweep --config soup.yaml --param lr=1e-5,2e-5 --param epochs=2,3 --dry-run
# Early stopping: skip remaining runs if loss exceeds 1.5x best
soup sweep --config soup.yaml --param lr=1e-5,2e-5,5e-5 --early-stop 1.5
Model Comparison
Compare outputs of two models side-by-side:
# Compare with inline prompts
soup diff --model-a ./model_v1 --model-b ./model_v2 --prompt "Explain gravity"
# Compare with a prompts file
soup diff --model-a ./base --model-b ./finetuned --prompts test_prompts.jsonl
# Save results
soup diff --model-a ./a --model-b ./b --prompts prompts.txt --output results.jsonl
Multi-GPU / DeepSpeed
Train on multiple GPUs with DeepSpeed:
# ZeRO Stage 2 (recommended for most cases)
soup train --config soup.yaml --deepspeed zero2
# ZeRO Stage 3 (for very large models)
soup train --config soup.yaml --deepspeed zero3
# ZeRO Stage 2 with CPU offload (memory-constrained)
soup train --config soup.yaml --deepspeed zero2_offload
# Custom DeepSpeed config
soup train --config soup.yaml --deepspeed ./my_ds_config.json
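For the custom-config path, a minimal ZeRO Stage 2 file could look like the sketch below. These are standard DeepSpeed keys, and "auto" defers the values to the trainer; tune them for your hardware:

{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}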
Quickstart Demo
Run a complete demo in one command — creates sample data, config, and trains a tiny model:
# Full demo (creates data + config + trains TinyLlama)
soup quickstart
# Just create files without training
soup quickstart --dry-run
# Skip confirmation
soup quickstart --yes
Health Check
Check your environment for compatibility issues:
soup doctor
Shows: Python version, GPU availability, all dependency versions, and fix suggestions.
Version Info
# Basic version
soup version
# Full system info (useful for bug reports)
soup version --full
# -> soup v0.10.5 | Python 3.11.5 | CUDA 12.1 | extras: serve, data
Web UI
Launch a local web interface to manage experiments, start training, explore data, and chat with models — all from your browser.
pip install 'soup-cli[ui]'
soup ui
# -> opens http://127.0.0.1:7860 in your browser
Pages:
- Dashboard — view all experiment runs, loss charts, system info
- New Training — create configs from templates, validate, and start training
- Data Explorer — browse and inspect datasets (JSONL, JSON, CSV, Parquet)
- Model Chat — chat with a running soup serve inference server
# Custom port, don't auto-open browser
soup ui --port 8080 --no-browser
Error Handling
Soup shows friendly error messages by default (2-3 lines with a fix suggestion). For full tracebacks:
# Global flag goes BEFORE the command
soup --verbose train --config soup.yaml
# Works with any command
soup --verbose eval --model ./output --benchmarks mmlu
Note: --verbose is a global flag — it must go before the command name, not after.
Data Formats
Soup supports these formats (auto-detected). Files can be JSONL, JSON, CSV, or Parquet.
Alpaca:
{"instruction": "Explain gravity", "input": "", "output": "Gravity is..."}
ShareGPT:
{"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]}
ChatML:
{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}
DPO (preference pairs):
{"prompt": "Explain gravity", "chosen": "Gravity is a force...", "rejected": "I don't know"}
LLaVA (vision):
{"image": "photo.jpg", "conversations": [{"from": "human", "value": "<image>\nDescribe this."}, {"from": "gpt", "value": "A cat."}]}
ShareGPT4V (vision):
{"image": "chart.png", "conversations": [{"from": "human", "value": "<image>\nExplain this chart."}, {"from": "gpt", "value": "Revenue growth."}]}
Data Tools
# Inspect a dataset
soup data inspect ./data/train.jsonl
# Validate format
soup data validate ./data/train.jsonl --format alpaca
# Convert between formats
soup data convert ./data/train.jsonl --to sharegpt --output converted.jsonl
# Merge multiple datasets
soup data merge data1.jsonl data2.jsonl --output merged.jsonl --shuffle
# Remove near-duplicates (requires: pip install 'soup-cli[data]')
soup data dedup ./data/train.jsonl --threshold 0.8
# Extended statistics (length distribution, token counts, languages)
soup data stats ./data/train.jsonl
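Conceptually, converting alpaca to chatml just folds instruction and input into a single user turn. A sketch of the mapping (the exact template soup data convert uses is an assumption, so inspect its output):

# alpaca_to_chatml.py (illustrative)
def alpaca_to_chatml(rec: dict) -> dict:
    user = rec["instruction"]
    if rec.get("input"):
        user += "\n\n" + rec["input"]
    return {"messages": [
        {"role": "user", "content": user},
        {"role": "assistant", "content": rec["output"]},
    ]}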
Experiment Tracking
Every soup train run is automatically tracked in a local SQLite database (~/.soup/experiments.db).
# List all training runs
soup runs
# Show detailed info + loss curve for a run
soup runs show run_20260223_143052_a1b2
# Compare two runs side by side
soup runs compare run_1 run_2
# Delete a run
soup runs delete run_1
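Because runs live in a plain SQLite file, you can also inspect it with the stock sqlite3 CLI; the schema is a soup internal, so list it rather than assuming table names:

sqlite3 ~/.soup/experiments.db '.tables'   # list tables
sqlite3 ~/.soup/experiments.db '.schema'   # dump the schema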
Model Evaluation
Evaluate models on standard benchmarks using lm-evaluation-harness:
# Install eval dependencies
pip install 'soup-cli[eval]'
# Evaluate on benchmarks
soup eval --model ./output --benchmarks mmlu,gsm8k,hellaswag
# Link results to a training run
soup eval --model ./output --benchmarks mmlu --run-id run_20260223_143052_a1b2
All Commands
soup init [--template chat|code|medical|reasoning|vision|rlhf] Create config
soup train --config soup.yaml Start training
soup chat --model ./output Interactive chat
soup push --model ./output --repo user/name Upload to HuggingFace
soup merge --adapter ./output Merge LoRA with base model
soup export --model ./output --format gguf Export to GGUF (Ollama)
soup eval --model ./output --benchmarks mmlu Evaluate on benchmarks
soup serve --model ./output --port 8000 OpenAI-compatible API server
soup serve --model ./output --backend vllm vLLM backend (2-4x throughput)
soup sweep --config soup.yaml --param lr=... Hyperparameter search
soup diff --model-a ./a --model-b ./b Compare two models
soup data inspect <path> View dataset stats
soup data validate <path> --format alpaca Check format
soup data convert <path> --to chatml Convert between formats
soup data merge data1.jsonl data2.jsonl Combine datasets
soup data dedup <path> --threshold 0.8 Remove duplicates (MinHash)
soup data stats <path> Extended statistics
soup data generate --prompt "..." --count 100 Generate synthetic data
soup runs List training runs
soup runs show <run_id> Run details + loss graph
soup runs compare <run_1> <run_2> Compare two runs
soup ui [--port 7860] Web UI (experiments, training, data)
soup doctor Check environment
soup quickstart [--dry-run] Full demo
soup version [--full] Show version (--full: system info)
soup --verbose <command> Full traceback on errors
Requirements
- Python 3.9+
- GPU with CUDA (recommended) or Apple Silicon (MPS) or CPU (slow)
- 8 GB+ VRAM for 7B models with QLoRA (back-of-envelope: 7B parameters at 4 bits is ~3.5 GB of weights; the rest goes to activations, LoRA optimizer state, and framework overhead)
Optional Extras
| Extra | Install | What it adds |
|---|---|---|
| vision | pip install 'soup-cli[vision]' | Vision/multimodal fine-tuning (Pillow) |
| qat | pip install 'soup-cli[qat]' | Quantization-Aware Training (torchao) |
| fast | pip install 'soup-cli[fast]' | Unsloth backend (2-5x faster, -80% VRAM) |
| ui | pip install 'soup-cli[ui]' | Web UI + inference server (FastAPI + uvicorn) |
| serve | pip install 'soup-cli[serve]' | Inference server (FastAPI + uvicorn) |
| serve-fast | pip install 'soup-cli[serve-fast]' | vLLM inference backend (2-4x throughput) |
| data | pip install 'soup-cli[data]' | Deduplication (MinHash via datasketch) |
| eval | pip install 'soup-cli[eval]' | Benchmark evaluation (lm-evaluation-harness) |
| deepspeed | pip install 'soup-cli[deepspeed]' | Multi-GPU training (DeepSpeed ZeRO) |
| dev | pip install 'soup-cli[dev]' | Tests + linting (pytest, ruff) |
Development
git clone https://github.com/MakazhanAlpamys/Soup.git
cd Soup
pip install -e ".[dev]"
# Lint
ruff check soup_cli/ tests/
# Run unit tests (fast, no GPU needed)
pytest tests/ -v
# Run smoke tests (downloads tiny model, runs real training)
pytest tests/ -m smoke -v
Changelog
See GitHub Releases for version history.
License
MIT