🔥 FlameForge

Fine-tune any LLM with a beautiful TUI. Zero config, maximum power.

Supports NVIDIA CUDA (PyTorch/PEFT) and Apple Silicon (MLX).



FlameForge is a terminal user interface that takes you from raw data to a working fine-tuned model in under ten minutes of active effort: no YAML wrangling, no CUDA incantations, no guessing whether a model will fit in memory. Pick a model, pick a method, load your data, and watch the loss curve fall in real time.

  ┌─ Loss ────────────────────────────────────────────────┐
  │ 2.80 │●                                               │
  │      │ ●●                                             │
  │      │   ●●●                                          │
  │      │      ●●●●●●                                    │
  │      │            ●●●●●●●●●●●●●                       │
  │ 0.74 │                         ●●●●●●●●●●●●●●●●●●●●●● │
  │      └─────────────────────────────────────────────── │
  └───────────────────────────────────────────────────────┘

✨ Features

  • Zero configuration. Device, data format, chat template, and batch size are all auto-detected. You only answer questions the tool genuinely can't.
  • Runs everywhere. NVIDIA GPUs via PyTorch + PEFT + TRL, Apple Silicon via MLX. The right backend is chosen for you automatically.
  • Memory-safe by design. Every model is size-checked against a conservative budget before loading, so your Mac never freezes mid-train.
  • LoRA, QLoRA, DoRA, and full fine-tuning with sensible, auto-tuned defaults.
  • A live training dashboard with a real-time loss chart, throughput, memory usage, ETA, and pause / checkpoint / stop controls.
  • Frictionless auth. Gated models walk you through getting a token instead of throwing a cryptic 401.
  • Helpful errors, never tracebacks. Every failure explains what happened and exactly what to do next.
  • Flexible export. Save the adapter, a merged standalone model, or a GGUF for llama.cpp / Ollama.

🚀 Quick start

# Apple Silicon (M1/M2/M3/M4)
pip install "flameforge[mlx]"

# NVIDIA CUDA
pip install "flameforge[cuda]"

# Then just run:
flameforge

That's it. The welcome screen confirms your device and memory budget, and from there it's six guided steps: model → method → data → config → train → export.

No GPU handy? FlameForge still launches and runs the whole flow in a simulation mode with a synthetic loss curve, so you can explore the UI before installing a backend.
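For a sense of what a simulated run might look like, here is a minimal sketch of a synthetic loss curve: exponential decay from a starting loss toward a floor, with a little seeded noise. The function name, decay constant, and noise level are assumptions for illustration, not FlameForge's actual simulation code; the 2.80 and 0.74 endpoints are borrowed from the sample chart above.

```python
import math
import random


def synthetic_loss(steps: int, start: float = 2.80, floor: float = 0.74,
                   decay: float = 5.0, noise: float = 0.02, seed: int = 0) -> list[float]:
    """Return `steps` loss values decaying from `start` toward `floor`."""
    rng = random.Random(seed)  # seeded so repeated runs draw the same curve
    losses = []
    for step in range(steps):
        progress = step / max(steps - 1, 1)  # 0.0 at the first step, 1.0 at the last
        smooth = floor + (start - floor) * math.exp(-decay * progress)
        jitter = rng.uniform(-noise, noise)
        losses.append(max(smooth + jitter, 0.0))  # a loss value is never negative
    return losses


curve = synthetic_loss(200)
print(f"first={curve[0]:.2f} last={curve[-1]:.2f}")
```

Seeding the generator keeps the curve reproducible, which is handy if you want to screenshot or test a UI against it.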

📋 Requirements

  • Python 3.10+
  • An NVIDIA GPU with CUDA, or an Apple Silicon Mac
  • CPU-only works for the smallest models, but is very slow

🤖 Supported models

FlameForge ships a curated registry (browse it in the TUI), but you can fine-tune any HuggingFace causal-LM by typing its id or pointing at a local path. A few highlights:

| Model | Size | License | Auth | Cheapest method |
|---|---|---|---|---|
| Llama 3.2 1B Instruct | 1.0B | Llama 3.2 | 🔒 | QLoRA ~2 GB |
| Qwen 2.5 0.5B / 1.5B / 3B / 7B | 0.5–7B | Apache 2.0 | none | QLoRA |
| Mistral 7B Instruct v0.3 | 7.0B | Apache 2.0 | none | QLoRA ~5 GB |
| Llama 3.1 8B Instruct | 8.0B | Llama 3.1 | 🔒 | QLoRA ~6 GB |
| Gemma 2 2B / 9B Instruct | 2–9B | Gemma | 🔒 | QLoRA |
| Phi-3 / Phi-3.5 Mini | 3.8B | MIT | none | QLoRA ~3 GB |
| Llama 3.1 70B Instruct | 70B | Llama 3.1 | 🔒 | QLoRA ~40 GB |

🔒 = requires a free HuggingFace account + license acceptance (FlameForge guides you through it).

📂 Data format guide

FlameForge auto-detects your format from the file. All of these work out of the box:

Alpaca / instruction (.jsonl or .json)

{"instruction": "Summarize this.", "input": "Long text…", "output": "Short summary."}

Alternate keys are auto-mapped: prompt/completion, question/answer.
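If you'd like to normalize a file yourself before loading it, the key mapping can be sketched in a few lines. The `ALIASES` table and the `to_alpaca` helper are hypothetical names for this example, not FlameForge's API, and FlameForge's own detection may handle more cases than this.

```python
import json

# Hypothetical alias table: canonical Alpaca key -> alternate keys named above.
ALIASES = {
    "instruction": ("prompt", "question"),
    "output": ("completion", "answer"),
}


def to_alpaca(record: dict) -> dict:
    """Map prompt/completion or question/answer records onto Alpaca keys."""
    normalized = {"instruction": "", "input": record.get("input", ""), "output": ""}
    for canonical, alternates in ALIASES.items():
        for key in (canonical, *alternates):
            if key in record:
                normalized[canonical] = record[key]
                break
        else:
            # No canonical key and no alias was present in this record.
            raise KeyError(f"record has no {canonical!r} field or alias")
    return normalized


line = '{"question": "What is LoRA?", "answer": "A low-rank adapter method."}'
print(to_alpaca(json.loads(line)))
```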

Conversational / ShareGPT / OpenAI (.jsonl or .json)

{"messages": [{"role": "system", "content": "…"}, {"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]}

A conversations list with from/value turns (the ShareGPT layout) is also recognised. Multi-turn conversations are supported.
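The two conversational layouts carry the same information, so converting one into the other is mechanical. Here is a minimal sketch, assuming the conventional ShareGPT speaker names (human, gpt, system); the `ROLE_MAP` table and function name are illustrative, not part of FlameForge.

```python
# Conventional ShareGPT speaker names mapped to OpenAI-style roles (assumed set).
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
}


def sharegpt_to_messages(record: dict) -> dict:
    """Convert {"conversations": [{"from", "value"}, ...]} to {"messages": [...]}."""
    messages = []
    for turn in record["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unknown speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return {"messages": messages}


example = {"conversations": [{"from": "human", "value": "Hi"},
                             {"from": "gpt", "value": "Hello!"}]}
print(sharegpt_to_messages(example))
```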

CSV / TSV: columns are auto-mapped (e.g. question → instruction, answer → output); ambiguous files get a column picker.

Raw text (.txt): documents separated by blank lines, for continued pre-training.
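To check how a raw text file will be carved up, you can approximate the blank-line rule yourself. This sketch assumes a "blank line" means a line containing only whitespace; FlameForge's own splitter may differ in edge cases.

```python
import re


def split_documents(text: str) -> list[str]:
    """Split raw text into documents wherever one or more blank lines occur."""
    # "\n\s*\n" matches a newline, any whitespace-only lines, then another newline.
    chunks = re.split(r"\n\s*\n", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


sample = "First document,\nsecond line.\n\n\nSecond document.\n"
print(split_documents(sample))
```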

The correct chat template for your model family (Llama 3, Mistral, ChatML, Gemma, Phi-3, Qwen) is applied automatically, preferring the tokenizer's own template when available. A preview shows you exactly what the model will see before you commit.
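As an illustration of what a template does, here is a hand-rolled renderer for the ChatML layout, one of the families listed above. It is a sketch for intuition only: `render_chatml` is a hypothetical helper, and in practice a tokenizer's own template (exposed in Hugging Face Transformers through `tokenizer.apply_chat_template`) is the more reliable source.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render OpenAI-style messages as a ChatML string."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete at inference time.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(render_chatml(chat))
```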

There are ready-to-run samples in examples/.

โš™๏ธ Configuration

Sensible defaults are auto-tuned from your model, hardware, and dataset size (fewer epochs for tiny datasets, bf16 → fp16 on older GPUs, a memory-aware batch size, and so on). You can accept them or adjust anything on the config screen.

To start from a custom defaults file:

flameforge --config my_config.yaml      # see configs/default.yaml for the schema

Other flags:

flameforge --model meta-llama/Llama-3.2-3B-Instruct   # pre-select a model
flameforge --max-memory-gb 14                          # set the memory budget to use
flameforge --version

Memory budget. On the welcome screen you can set the memory budget FlameForge is allowed to use โ€” lower it to be safe, or raise it above the conservative default if you know what you're doing. This matters most on Apple Silicon, where GPU memory is shared with macOS: the default leaves headroom so your machine stays responsive, and raising it past that default is flagged as risky. The same value can be passed up front with --max-memory-gb.
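To get a feel for how a budget relates to model size, here is a back-of-the-envelope estimate. Everything in it is an assumption for illustration (the bytes-per-parameter figures, the flat overhead, and the function names); it is not FlameForge's actual size check, and real usage also depends on sequence length and batch size.

```python
# Rough bytes per parameter for the frozen base weights (assumed values).
BYTES_PER_PARAM = {"qlora": 0.5, "lora": 2.0}  # 4-bit vs. 16-bit weights
OVERHEAD_GB = 1.5  # assumed flat allowance for adapters, activations, and runtime


def estimate_gb(params_billions: float, method: str) -> float:
    """Estimate training memory in GB for a model of the given size."""
    # 1e9 parameters at N bytes each is N GB, so billions * bytes gives GB directly.
    return params_billions * BYTES_PER_PARAM[method] + OVERHEAD_GB


def fits(params_billions: float, method: str, budget_gb: float) -> bool:
    """Return True when the estimate is within the memory budget."""
    return estimate_gb(params_billions, method) <= budget_gb


for size in (1.0, 7.0, 70.0):
    print(f"{size:>5.1f}B qlora ~{estimate_gb(size, 'qlora'):.1f} GB, "
          f"fits in 14 GB: {fits(size, 'qlora', 14)}")
```

With these assumed constants the estimates land roughly in the range of the QLoRA figures in the model table, which is all a heuristic like this can promise.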

Authentication is read from HF_TOKEN / HUGGING_FACE_HUB_TOKEN or ~/.cache/huggingface/token (e.g. after huggingface-cli login).
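If you want to confirm which token would be picked up before launching, the lookup can be sketched as below. The precedence (environment variables first, in the order listed, then the token file) is an assumption based on the order in the sentence above, and `resolve_hf_token` is a hypothetical helper rather than a FlameForge function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

ENV_NAMES = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")
DEFAULT_TOKEN_FILE = Path.home() / ".cache" / "huggingface" / "token"


def resolve_hf_token(env: Optional[Mapping[str, str]] = None,
                     token_file: Union[str, Path, None] = None) -> Optional[str]:
    """Return the first non-empty token found, or None if there is none."""
    env = os.environ if env is None else env
    for name in ENV_NAMES:
        value = env.get(name, "").strip()
        if value:
            return value
    path = Path(token_file) if token_file is not None else DEFAULT_TOKEN_FILE
    if path.is_file():
        return path.read_text(encoding="utf-8").strip() or None
    return None


# Report only whether a token exists; never print the token itself.
print("token found" if resolve_hf_token() else "no token configured")
```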

📦 Export

When training finishes you can export:

  • Adapter: just the LoRA weights (smallest; load on top of the base model).
  • Merged model: a standalone model with the adapter baked in.
  • GGUF: for llama.cpp / Ollama, at your choice of quantization (Q4_K_M, Q5_K_M, Q8_0, F16). Requires a local llama.cpp checkout pointed to by FLAMEFORGE_LLAMACPP.

🛟 Troubleshooting

Out of memory. FlameForge size-checks before loading, but if a run still OOMs, switch LoRA → QLoRA, lower the max sequence length, or pass --max-memory-gb with a smaller cap. Your last checkpoint is always saved.

ImportError: bitsandbytes (CUDA QLoRA). Run pip install bitsandbytes. Note that bitsandbytes only supports Linux/Windows with NVIDIA GPUs; on Mac, FlameForge uses MLX instead.

HuggingFace 401 / gated model. Accept the license on the model's HF page, then run huggingface-cli login (or set HF_TOKEN). The auth screen walks you through it.

Can't reach the Hub. Check your connection and status.huggingface.co. You can always train from a local model directory.

Full operational logs are written to flameforge.log (never your data or tokens).

๐Ÿค Contributing

Contributions welcome! See CONTRIBUTING.md. The quality bar is ruff check . && ruff format --check . && mypy src/ && pytest, all green.

📄 License

MIT
