Fine-tune any LLM with a beautiful TUI. Zero config, maximum power.
🔥 FlameForge
Supports NVIDIA CUDA (PyTorch/PEFT) and Apple Silicon (MLX).
FlameForge is a terminal user interface that takes you from raw data to a working fine-tuned model in under ten minutes of active effort: no YAML wrangling, no CUDA incantations, no guessing whether a model will fit in memory. Pick a model, pick a method, load your data, and watch the loss curve fall in real time.
[Figure: live loss chart, with the loss falling from 2.80 to 0.74]
✨ Features
- Zero configuration. Device, data format, chat template, and batch size are all auto-detected. You only answer questions the tool genuinely can't.
- Runs everywhere. NVIDIA GPUs via PyTorch + PEFT + TRL, Apple Silicon via MLX. The right backend is chosen for you automatically.
- Memory-safe by design. Every model is size-checked against a conservative budget before loading, so your Mac never freezes mid-train.
- LoRA, QLoRA, DoRA, and full fine-tuning with sensible, auto-tuned defaults.
- A live training dashboard with a real-time loss chart, throughput, memory usage, ETA, and pause / checkpoint / stop controls.
- Frictionless auth. Gated models walk you through getting a token instead of throwing a cryptic 401.
- Helpful errors, never tracebacks. Every failure explains what happened and exactly what to do next.
- Flexible export. Save the adapter, a merged standalone model, or a GGUF for llama.cpp / Ollama.
Quick start
# Apple Silicon (M1/M2/M3/M4)
pip install "flameforge[mlx]"
# NVIDIA CUDA
pip install "flameforge[cuda]"
# Then just run:
flameforge
That's it. The welcome screen confirms your device and memory budget, and from there it's six guided steps: model → method → data → config → train → export.
No GPU handy? FlameForge still launches and runs the whole flow in a simulation mode with a synthetic loss curve, so you can explore the UI before installing a backend.
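As an illustration of what a synthetic loss curve could look like, here is a minimal sketch. It is not FlameForge's simulation code: the function name, the exponential-decay shape, and the noise level are all assumptions, and the 2.80 and 0.74 endpoints are simply borrowed from the chart above.

```python
import math
import random


def synthetic_loss(steps: int, start: float = 2.80, floor: float = 0.74,
                   decay: float = 5.0, noise: float = 0.02,
                   seed: int = 0) -> list[float]:
    """Return a noisy exponential decay from `start` toward `floor`.

    Hypothetical helper: the shape and constants are illustrative only.
    """
    rng = random.Random(seed)  # seeded so repeated runs draw the same curve
    curve = []
    for step in range(steps):
        progress = step / max(steps - 1, 1)  # 0.0 at the first step, 1.0 at the last
        clean = floor + (start - floor) * math.exp(-decay * progress)
        curve.append(max(0.0, clean + rng.gauss(0.0, noise)))
    return curve


if __name__ == "__main__":
    for step, loss in enumerate(synthetic_loss(10)):
        print(f"step {step:2d}  loss {loss:.3f}")
```

Setting noise=0.0 gives a smooth, strictly decreasing curve, which is handy for checking a chart widget before adding jitter.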
Requirements
- Python 3.10+
- An NVIDIA GPU with CUDA, or an Apple Silicon Mac
- CPU-only works for the smallest models, but is very slow
Supported models
FlameForge ships a curated registry (browse it in the TUI), but you can fine-tune any HuggingFace causal-LM by typing its id or pointing at a local path. A few highlights:
| Model | Size | License | Auth | Cheapest method |
|---|---|---|---|---|
| Llama 3.2 1B Instruct | 1.0B | Llama 3.2 | Gated | QLoRA ~2 GB |
| Qwen 2.5 0.5B / 1.5B / 3B / 7B | 0.5–7B | Apache 2.0 | None | QLoRA |
| Mistral 7B Instruct v0.3 | 7.0B | Apache 2.0 | None | QLoRA ~5 GB |
| Llama 3.1 8B Instruct | 8.0B | Llama 3.1 | Gated | QLoRA ~6 GB |
| Gemma 2 2B / 9B Instruct | 2–9B | Gemma | Gated | QLoRA |
| Phi-3 / Phi-3.5 Mini | 3.8B | MIT | None | QLoRA ~3 GB |
| Llama 3.1 70B Instruct | 70B | Llama 3.1 | Gated | QLoRA ~40 GB |
Gated = requires a free HuggingFace account + license acceptance (FlameForge guides you through it).
Data format guide
FlameForge auto-detects your format from the file. All of these work out of the box:
Alpaca / instruction (.jsonl or .json)
{"instruction": "Summarize this.", "input": "Long text…", "output": "Short summary."}
Alternate keys are auto-mapped: prompt/completion, question/answer.
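The key mapping described above can be sketched as follows. This is a minimal illustration, not FlameForge's loader: `ALIASES`, `normalize_record`, and `load_jsonl` are hypothetical names, and the alias table contains only the alternate keys named in the text.

```python
import json

# Hypothetical alias table; only the alternate keys named above are listed.
ALIASES = {
    "instruction": ("instruction", "prompt", "question"),
    "output": ("output", "completion", "answer"),
}


def normalize_record(record: dict) -> dict:
    """Map a record with alternate keys onto instruction/input/output."""
    normalized = {"input": record.get("input", "")}  # "input" is optional
    for canonical, candidates in ALIASES.items():
        for key in candidates:
            if key in record:
                normalized[canonical] = record[key]
                break
        else:  # no candidate key was present in the record
            raise KeyError(f"no {canonical!r} field; tried {candidates}")
    return normalized


def load_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text, skipping blank lines."""
    return [normalize_record(json.loads(line))
            for line in text.splitlines() if line.strip()]
```

A record such as {"question": "2+2?", "answer": "4"} then comes out with instruction, input, and output keys, with input left empty.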
Conversational / ShareGPT / OpenAI (.jsonl or .json)
{"messages": [{"role": "system", "content": "…"}, {"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}
A conversations list with from/value turns is also recognised, and multi-turn conversations are supported.
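To make the two conversational layouts concrete, here is a small sketch that converts either one into a role/content message list. The function name and the role mapping are assumptions (ShareGPT-style exports commonly use "human" and "gpt"), not something the project documents.

```python
# Assumed role mapping; ShareGPT-style exports commonly use these names.
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
}


def to_messages(record: dict) -> list[dict]:
    """Return a role/content message list for either conversational layout."""
    if "messages" in record:
        # OpenAI-style: already role/content, copy only the fields we need.
        return [{"role": m["role"], "content": m["content"]}
                for m in record["messages"]]
    if "conversations" in record:
        # ShareGPT-style: rename from/value and translate the speaker name.
        return [{"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
                for turn in record["conversations"]]
    raise ValueError("expected a 'messages' or 'conversations' key")
```

An unknown speaker name raises a KeyError here rather than being guessed at, which keeps mislabeled turns from slipping into training data silently.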
CSV / TSV: columns are auto-mapped (e.g. question → instruction,
answer → output); ambiguous files get a column picker.
Raw text (.txt): documents separated by blank lines, for continued
pre-training.
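For the raw-text case, splitting on blank lines can be sketched like this. The helper name is hypothetical, and treating whitespace-only lines as blank is an assumption about how separators are interpreted.

```python
import re


def split_documents(text: str) -> list[str]:
    """Split raw text into documents on runs of one or more blank lines."""
    normalized = text.replace("\r\n", "\n")  # tolerate Windows line endings
    # A newline, optional whitespace, then another newline marks a boundary.
    parts = re.split(r"\n\s*\n", normalized)
    return [part.strip() for part in parts if part.strip()]
```

Line breaks inside a document are preserved; only blank-line gaps start a new document.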
The correct chat template for your model family (Llama 3, Mistral, ChatML, Gemma, Phi-3, Qwen) is applied automatically, preferring the tokenizer's own template when available. A preview shows you exactly what the model will see before you commit.
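To show what applying a chat template means, here is a hand-rolled sketch of the ChatML layout. It is illustrative only, with a hypothetical function name, and is not FlameForge's own code. When a tokenizer ships its own template, rendering through it (for example with the `apply_chat_template` method in Hugging Face transformers) is the safer route, as the text above suggests.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render role/content messages in the ChatML layout."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


if __name__ == "__main__":
    demo = [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ]
    print(render_chatml(demo))
```

Other families (Llama 3, Mistral, Gemma, Phi-3) use different special tokens, so a renderer like this would need one variant per family.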
There are ready-to-run samples in examples/.
⚙️ Configuration
Sensible defaults are auto-tuned from your model, hardware, and dataset size (fewer epochs for tiny datasets, a fallback from bf16 to fp16 on older GPUs, a memory-aware batch size, and so on). You can accept them or adjust anything on the config screen.
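The kind of heuristics listed above could look roughly like the sketch below. Every name and threshold is hypothetical and chosen only for illustration; the real rules live in FlameForge and are not documented here.

```python
from dataclasses import dataclass


@dataclass
class TrainDefaults:
    epochs: int
    precision: str
    batch_size: int


def suggest_defaults(n_examples: int, supports_bf16: bool,
                     free_memory_gb: float, gb_per_sample: float) -> TrainDefaults:
    """Pick starting values. All thresholds are illustrative assumptions."""
    # Fewer epochs for tiny datasets (the 200-example cutoff is made up).
    epochs = 1 if n_examples < 200 else 3
    # Fall back to fp16 when the GPU cannot do bf16.
    precision = "bf16" if supports_bf16 else "fp16"
    # Memory-aware batch size: largest power of two that fits, capped at 32.
    budget = int(free_memory_gb // gb_per_sample) if gb_per_sample > 0 else 1
    batch_size = 1
    while batch_size * 2 <= min(budget, 32):
        batch_size *= 2
    return TrainDefaults(epochs=epochs, precision=precision, batch_size=batch_size)
```

Keeping the decision in one pure function makes each default easy to inspect and override, which matches the idea of accepting or adjusting values on a config screen.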
To start from a custom defaults file:
flameforge --config my_config.yaml # see configs/default.yaml for the schema
Other flags:
flameforge --model meta-llama/Llama-3.2-3B-Instruct # pre-select a model
flameforge --max-memory-gb 14 # set the memory budget to use
flameforge --version
Memory budget. On the welcome screen you can set the memory budget FlameForge
is allowed to use: lower it to be safe, or raise it above the conservative
default if you know what you're doing. This matters most on Apple Silicon, where
GPU memory is shared with macOS: the default leaves headroom so your machine
stays responsive, and raising it past that default is flagged as risky. The same
value can be passed up front with --max-memory-gb.
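A back-of-the-envelope version of a size check against the budget might look like this. It is a rough sketch, not FlameForge's estimator: it counts base-model weights only (2 bytes per parameter at 16-bit, 0.5 bytes at 4-bit), and the fixed 1.5 GB overhead for activations, adapters, and optimizer state is an assumption.

```python
# Bytes per parameter for the frozen base weights.
BYTES_PER_PARAM = {
    "lora": 2.0,   # 16-bit base weights
    "qlora": 0.5,  # 4-bit quantized base weights
}


def estimate_weights_gb(params_billions: float, method: str) -> float:
    """Approximate memory for the base weights only, in GB."""
    # 1e9 parameters times bytes per parameter, divided by 1e9 bytes per GB.
    return params_billions * BYTES_PER_PARAM[method]


def fits(params_billions: float, method: str, budget_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """Return True if weights plus an assumed fixed overhead fit the budget."""
    return estimate_weights_gb(params_billions, method) + overhead_gb <= budget_gb


if __name__ == "__main__":
    # Mirrors a budget like `--max-memory-gb 14`.
    for size in (1.0, 7.0, 70.0):
        print(size, "B params, QLoRA fits in 14 GB:", fits(size, "qlora", 14.0))
```

Real usage also grows with sequence length and batch size, so a check like this should be read as a lower bound rather than a guarantee.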
Authentication is read from HF_TOKEN / HUGGING_FACE_HUB_TOKEN or
~/.cache/huggingface/token (e.g. after huggingface-cli login).
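The lookup described above can be sketched as a small resolver. The function name is hypothetical, and the precedence (environment variables first, then the cached token file) is an assumption based on the order in which the sources are listed.

```python
import os
from pathlib import Path


def resolve_hf_token(env=None, token_file=None):
    """Return a Hugging Face token from the environment or the cache file.

    Assumed precedence: HF_TOKEN, then HUGGING_FACE_HUB_TOKEN, then the file.
    Returns None when no token is found.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    default_path = Path.home() / ".cache" / "huggingface" / "token"
    path = Path(token_file) if token_file else default_path
    if path.is_file():
        # An empty file counts as "no token".
        return path.read_text(encoding="utf-8").strip() or None
    return None
```

Passing env and token_file explicitly keeps the function easy to test without touching the real environment or home directory.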
📦 Export
When training finishes you can export:
- Adapter: just the LoRA weights (smallest; load on top of the base model).
- Merged model: a standalone model with the adapter baked in.
- GGUF: for llama.cpp / Ollama, at your choice of quantization (Q4_K_M,
Q5_K_M, Q8_0, F16). Requires a local llama.cpp checkout pointed to by
FLAMEFORGE_LLAMACPP.
Troubleshooting
Out of memory. FlameForge size-checks before loading, but if a run still OOMs,
switch from LoRA to QLoRA, lower the max sequence length, or pass --max-memory-gb with
a smaller cap. Your last checkpoint is always saved.
ImportError: bitsandbytes (CUDA QLoRA). Run pip install bitsandbytes.
bitsandbytes only supports Linux/Windows + NVIDIA; on Mac, FlameForge uses MLX instead.
HuggingFace 401 / gated model. Accept the license on the model's HF page,
then huggingface-cli login (or set HF_TOKEN). The auth screen walks you
through it.
Can't reach the Hub. Check your connection and status.huggingface.co. You can always train from a local model directory.
Full operational logs are written to flameforge.log (never your data or tokens).
Contributing
Contributions welcome! See CONTRIBUTING.md. The quality bar is
ruff check . && ruff format --check . && mypy src/ && pytest, all green.
License