Fine-tune LLMs in one command. No SSH, no config hell.
Soup
Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.
pip install 'soup-cli[train]' # add [train] to fine-tune; bare `soup-cli` is the light CLI
soup init --template chat
soup train
Why Soup?
Training LLMs is still painful. Even experienced teams spend 30-50% of their time fighting infrastructure instead of improving models. Soup fixes that.
- Zero SSH. Never SSH into a broken GPU box again.
- One config. A simple YAML file is all you need.
- Auto everything. Batch size, GPU detection, quantization — handled.
- Works locally. Train on your own GPU with QLoRA. No cloud required.
What's New
v0.71.1 — Quick wins + wiring. Seven small-but-sharp closures:
- `soup env fix` renders a reproducible install plan (copy/paste `uv pip` commands or a `requirements.txt`) straight from `soup-env.lock`. It is print-only: no surprise package-manager calls.
- `soup lock write --env-lock` auto-derives the env hash from `soup-env.lock`, so you never hand-copy a 64-hex string after `soup env lock`.
- `soup serve --record-thumbs <db>` captures 👍/👎 feedback into a local-RL SQLite database, plus a new `POST /v1/thumbs` endpoint: the start of an on-box feedback flywheel.
- Judge calibration persistence: write/load a `JudgeCalibrationReport` as JSON, backed by a new `judge_calibration` registry artifact kind.
- Bundled MUSE + WMDP unlearning eval fixtures, so `soup eval unlearning --benchmark muse|wmdp` runs out of the box (WMDP forget-set probes ship redacted, never verbatim hazardous content).
- `soup completions` now introspects a cached base model's real LoRA target modules.
Full history: CHANGELOG.md · GitHub Releases.
Quick Start
1. Install
pip install soup-cli # light: CLI + config + data tools (no PyTorch)
pip install 'soup-cli[train]' # add the training stack (torch, transformers, peft, trl, …)
pip install git+https://github.com/MakazhanAlpamys/Soup.git # latest dev
soup init, soup data …, and the other data/inspection commands work on the light install.
Fine-tuning (soup train) needs the [train] extra.
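To tell which install you are on before running soup train, a small check like the one below can help. It is a minimal sketch, not part of Soup: the function name is hypothetical, and the package list is only the four names given above (the README's list ends with "…", so the [train] extra installs more than this).

```python
import importlib.util

# Packages the README names for the [train] extra. The list there is
# truncated with "...", so this is deliberately not the complete set.
TRAIN_PACKAGES = ("torch", "transformers", "peft", "trl")


def missing_train_packages(packages=TRAIN_PACKAGES):
    """Return the names from `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_train_packages()
    if missing:
        print("Light install detected; missing:", ", ".join(missing))
        print("Install the training stack with: pip install 'soup-cli[train]'")
    else:
        print("Training stack looks importable.")
```

The check only looks the packages up without importing them, so it stays fast even when PyTorch is installed. soup doctor remains the supported way to inspect the environment.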
2. Create a config
soup init # interactive wizard
soup init --template chat # or start from a template
Templates: chat, code, tool-calling, medical, reasoning, vision, kto, orpo,
simpo, ipo, bco, rlhf, pretrain, moe, longcontext, embedding, audio.
3. Train, test, ship
soup train --config soup.yaml # LoRA, quantization, batching — all handled
soup chat --model ./output # talk to your model
soup push --model ./output --repo you/my-model
soup merge --adapter ./output # merge LoRA into the base
soup export --model ./output --format gguf --quant q4_k_m # GGUF for Ollama / llama.cpp
More export targets (ONNX, TensorRT, AWQ, GPTQ, BitNet) and deployment options live in
docs/serving-and-export.md.
Configuration
A complete soup.yaml:
base: meta-llama/Llama-3.1-8B-Instruct
task: sft
# backend: unsloth  # 2-5x faster, pip install 'soup-cli[fast]'

data:
  train: ./data/train.jsonl
  format: alpaca
  val_split: 0.1

training:
  epochs: 3
  lr: 2e-5
  batch_size: auto
  lora:
    r: 64
    alpha: 16
  quantization: 4bit

output: ./output
config/schema.py is the single source of truth for every field. Advanced data, training,
and PEFT options are documented under Documentation.
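To get a feel for what the lora values in the example mean, the arithmetic for standard LoRA can be sketched as below. This is an illustration under stated assumptions, not Soup's code: it assumes plain LoRA (update scaled by alpha / r, factors of shape r x d_in and d_out x r), and the 4096 x 4096 projection size is a hypothetical value chosen for the example. Variants such as rsLoRA scale differently.

```python
def lora_scaling(r: int, alpha: float) -> float:
    """Standard LoRA multiplies the low-rank update by alpha / r."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two LoRA factors: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    r, alpha = 64, 16        # values from the soup.yaml example
    d_in = d_out = 4096      # hypothetical projection size, for illustration only
    added = lora_trainable_params(d_in, d_out, r)
    frozen = d_in * d_out
    print(f"scaling factor: {lora_scaling(r, alpha)}")
    print(f"adapter params: {added:,} vs {frozen:,} frozen weights in that matrix")
```

With r: 64 and alpha: 16 the scaling factor is 0.25, so raising r without raising alpha shrinks the effective update.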
Documentation
The full feature reference lives in docs/. Start here:
| Guide | Covers |
|---|---|
| Training tasks & methods | SFT, DPO/GRPO/PPO/KTO/ORPO/SimPO/IPO/BCO, tool-calling, PRM, pre-training, distillation, classification, vision/audio/TTS, unlearning, RAFT/RA-DIT, loop-hardening detectors |
| PEFT, long context & efficiency | DoRA, LoRA+, rsLoRA, VeRA, OLoRA, NEFTune, PiSSA, ReLoRA, optimizer & PEFT zoo, LLaMA Pro, GaLore, YaRN/LongLoRA, packing, curriculum, auto-tuning |
| Performance & quantization | QAT, FP8, Quant Menu (I + II), KV-cache, NVFP4, save formats, Cut Cross-Entropy, gradient checkpointing, kernels, activation offloading, multi-GPU / DeepSpeed / FSDP |
| Data engineering | Formats, the Axolotl/LF-parity pipeline, data tools, synthetic generation & forge, quality scorecards, trace tooling, remote datasets, mixing, recipe DAGs |
| Evaluation & probes | Eval design/gate, eval-gated training, benchmarks, NLG metrics, calibration, Elo arena, diagnose, post-train X-ray probes, A/B, drift, tunability, soup advise |
| Serving & export | OpenAI-compatible server, batch inference, benchmarking, merge/export, Anthropic Messages endpoint, speculative decoding, deploy autopilot, Web UI, Agent Forge |
| Adapters, registry & governance | Adapter lifecycle/management, model registry, Soup Cans, the data flywheel (soup loop), knowledge editing, steering, supply-chain controls (scan/sign/BOM/attest/audit/airgap) |
| Backends, platform & ops | MLX/Unsloth backends, alternative hubs, HF Hub integration, autopilot, experiment tracking, plan/apply, env lockfiles, hardware-fit, completions, plugins, utility commands |
| Command reference | The full soup command list |
| Supported models & extras | Recommended model families, the VRAM size guide, the pip extras matrix |
Data Formats
All formats are auto-detected from JSONL, JSON, CSV, Parquet, or TXT:
- alpaca: {"instruction": ..., "input": ..., "output": ...}
- sharegpt: {"conversations": [{"from": "human", "value": ...}, ...]}
- chatml: {"messages": [{"role": "user", "content": ...}, ...]}
- dpo / orpo / simpo / ipo: {"prompt": ..., "chosen": ..., "rejected": ...}
- kto: {"prompt": ..., "completion": ..., "label": true}
- llava / sharegpt4v (vision), audio, plaintext (pre-training), embedding, prm, pre_tokenized, video, multimodal
Full schemas and the Axolotl/LlamaFactory-parity data pipeline (remote URIs, streaming,
sharding, interleaving, vocab expansion, document ingestion) are in
docs/data.md.
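The record shapes listed above are distinct enough that a format can often be guessed from the keys alone. The sketch below shows one way to do that for the five JSON shapes spelled out in this section. It is not Soup's detector: the function names and the key sets are assumptions made for illustration, and because dpo, orpo, simpo, and ipo share one shape, it reports that whole family as "dpo".

```python
import json
from typing import Optional

# Required keys per format, taken from the example records in this README.
# Checked in order; the first entry whose keys are all present wins.
FORMAT_KEYS = [
    ("alpaca", {"instruction", "output"}),
    ("sharegpt", {"conversations"}),
    ("chatml", {"messages"}),
    ("dpo", {"prompt", "chosen", "rejected"}),  # also orpo / simpo / ipo
    ("kto", {"prompt", "completion", "label"}),
]


def guess_format(record: dict) -> Optional[str]:
    """Guess the format name of one record, or None if nothing matches."""
    keys = set(record)
    for name, required in FORMAT_KEYS:
        if required <= keys:
            return name
    return None


def guess_jsonl_format(path: str) -> Optional[str]:
    """Guess the format of a JSONL file from its first non-empty line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                return guess_format(json.loads(line))
    return None
```

For example, guess_jsonl_format("./data/train.jsonl") would return "alpaca" for the dataset used in the config example, assuming its records carry instruction and output keys.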
Common Commands
soup train --config soup.yaml # train (SFT/DPO/GRPO/PPO/KTO/ORPO/SimPO/IPO/...)
soup infer --model ./output --input prompts.jsonl # batch inference
soup chat --model ./output # interactive chat
soup serve --model ./output # OpenAI-compatible API server
soup merge --adapter ./output # merge LoRA into the base model
soup export --model ./output --format gguf # export for deployment
soup eval benchmark --model ./output # evaluate
soup data inspect ./data/train.jsonl # dataset stats
soup recipes list # 100+ ready-made model recipes
soup autopilot --model <id> --data d.jsonl --goal chat # zero-config
soup doctor # check GPU / deps / environment
The complete command list is in docs/commands.md.
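Because soup serve exposes an OpenAI-compatible API, a plain HTTP client should be enough to talk to it. The sketch below uses only the Python standard library. Several things in it are assumptions, not taken from Soup's docs: the address http://localhost:8000, the /v1/chat/completions path (the usual OpenAI chat route), the model name "output", and the helper names. Check docs/serving-and-export.md for the real values.

```python
import json
import urllib.request

# Assumption: where `soup serve` is listening. Adjust to your setup.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt, model="output", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the first reply, assuming an OpenAI-shaped response."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before a server is running.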
Supported Models
Soup works with any text-generation model on the Hugging Face Hub: if it loads with
AutoModelForCausalLM, it works with zero config changes. Llama 3.x/4, Qwen 2.5/3, Gemma 3, Mistral,
Mixtral, DeepSeek R1/V3, Phi-4, and 100+ others ship as ready-made recipes (soup recipes list).
| VRAM | Max model (QLoRA 4-bit) | Example |
|---|---|---|
| 8 GB | ~7B | Llama-3.1-8B, Mistral-7B |
| 16 GB | ~14B | Phi-4-14B, Qwen2.5-14B |
| 24 GB | ~34B | CodeLlama-34B, Yi-1.5-34B |
| 48 GB | ~70B | Llama-3.3-70B |
| 80 GB+ | 70B+ (full) or MoE | Mixtral-8x22B, DeepSeek-V3 |
Full model + vision tables and the optional-extras matrix are in docs/models.md.
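The QLoRA rows of the table line up with simple arithmetic on weight storage: at 4 bits per parameter, each billion parameters needs about 0.5 GB. The sketch below covers the weights only and is a rough estimate, not Soup's hardware-fit logic. Activations, LoRA parameters, optimizer state, and framework overhead all need room on top, which is why each row leaves headroom. The function name is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    # (VRAM in GB, model size in billions of parameters) from the QLoRA rows above
    for vram_gb, size_b in [(8, 7), (16, 14), (24, 34), (48, 70)]:
        weights = weight_memory_gb(size_b)
        print(f"{size_b}B at 4-bit: ~{weights:.1f} GB of weights, "
              f"~{vram_gb - weights:.1f} GB of {vram_gb} GB left for everything else")
```

Passing bits_per_param=16 gives the 16-bit footprint for comparison: an 8B model needs about 16 GB for its weights alone.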
Docker
Run Soup without installing CUDA or PyTorch locally (image published to GHCR on every release):
docker pull ghcr.io/makazhanalpamys/soup:latest
docker run --gpus all -v "$(pwd)":/workspace ghcr.io/makazhanalpamys/soup train --config soup.yaml
docker compose up # or build locally
Requirements
- Python 3.10+
- GPU with CUDA (recommended), Apple Silicon (MPS), or CPU (experimental — very slow)
- 8 GB+ VRAM for 7B models with QLoRA
All training tasks run on CPU for testing (quantization auto-disabled). Optional extras
(train, all, fast, vision, qat, serve, serve-fast, ui, eval, deepspeed,
liger, mlx, onnx, tensorrt, …) are listed in
docs/models.md.
Troubleshooting
soup doctor # GPU, system resources, dependencies, and version in one place
- `ImportError: DLL load failed while importing _C` (Windows): reinstall PyTorch for your CUDA version with `pip install torch --index-url https://download.pytorch.org/whl/cu121`.
- `soup version` ≠ `pip show soup-cli`: multiple Python installs; use a virtualenv.
Development
git clone https://github.com/MakazhanAlpamys/Soup.git
cd Soup
pip install -e ".[dev]"
ruff check src/soup_cli/ tests/ # lint
pytest tests/ -v # unit tests (fast, no GPU)
pytest tests/ -m smoke -v # smoke tests (downloads a tiny model, trains)
pre-commit install # optional: ruff lint+format on commit
See CONTRIBUTING.md for the full workflow and SECURITY.md to report a vulnerability.
License
Apache-2.0. Copyright © the Soup contributors.