Config-driven, YAML-first open-source LLM training platform
llm-forge
Build your own AI model. No ML expertise required.
llm-forge lets you fine-tune large language models by editing a single YAML file and dropping in your dataset. One config file controls everything: data cleaning, training, evaluation, and serving.
Clone and Run (5 Minutes)
No config editing needed. Copy-paste these commands to train your first model:
# 1. Clone and install
git clone https://github.com/Nagavenkatasai7/llm-forge.git
cd llm-forge
pip install -e ".[all]"
# 2. Login to HuggingFace (one-time, for downloading models)
huggingface-cli login
# 3. Train Llama-3.2-1B on Alpaca dataset (~30 min on GPU)
llm-forge train --config configs/demo_lora_llama.yaml --verbose
# 4. Chat with your fine-tuned model
llm-forge serve --config configs/demo_lora_llama.yaml \
--model-path outputs/demo-lora-llama3.2-1b/merged
Your fine-tuned model appears in outputs/. Chat UI opens at http://localhost:7860.
No GPU? Test on CPU in 5 minutes:
# Uses a tiny 135M model and 10 sample records (included in repo)
llm-forge train --config configs/quickstart_tiny.yaml --verbose
Prerequisites
Before installing, make sure you have:
- Python 3.10-3.12 (python --version)
- NVIDIA GPU with 8+ GB VRAM (nvidia-smi), or CPU for testing
- Git (git --version)
- 20 GB free disk space
- HuggingFace account (free signup) - needed for gated models like Llama
No GPU? You can still test with mode: "qlora" on CPU (slow but works), or use a free GPU on Google Colab.
Installation
# From source (recommended)
git clone https://github.com/Nagavenkatasai7/llm-forge.git
cd llm-forge
pip install -e ".[all]"
# Or from PyPI
pip install "llm-forge-new[all]"
How It Works
You edit this:              llm-forge handles the rest:
+-----------------+
|   config.yaml   | ------> Data Cleaning -> Training -> Evaluation -> Serving
+-----------------+
1. Generate a starter config:
llm-forge init --template lora
2. Edit config.yaml - set your model, data path, and hyperparameters. Every field has inline documentation.
3. Validate before training (catches errors early):
llm-forge validate config.yaml
4. Train:
llm-forge train --config config.yaml
llm-forge auto-detects your hardware and optimizes accordingly.
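Steps 3 and 4 can be chained so that training only starts once validation has passed. The following is a minimal sketch, not part of llm-forge itself: it assembles the two commands shown above and, by default, only prints them. The helper names `pipeline_commands` and `run_pipeline` are hypothetical, and the sketch assumes that `llm-forge validate` exits with a non-zero status when the config is invalid.

```python
import shlex
import shutil
import subprocess


def pipeline_commands(config_path="config.yaml"):
    """Return the validate -> train commands from the steps above."""
    return [
        ["llm-forge", "validate", config_path],
        ["llm-forge", "train", "--config", config_path],
    ]


def run_pipeline(config_path="config.yaml", execute=False):
    """Print each command; run it only when execute=True."""
    for cmd in pipeline_commands(config_path):
        print("$", shlex.join(cmd))
        if not execute:
            continue
        if shutil.which("llm-forge") is None:
            raise RuntimeError("llm-forge is not on PATH; install it first")
        # check=True raises on a non-zero exit status, so training is
        # skipped if validation fails (assumed behaviour of the CLI).
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline()  # dry run: prints the commands without executing them
```

Call `run_pipeline("config.yaml", execute=True)` to actually run both steps.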
Example: Train on Your Own Data
Create a JSONL file with instruction-output pairs:
{"instruction": "Summarize this article", "input": "The economy grew...", "output": "Economic growth..."}
{"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"}
Then create a config (config.yaml):
model:
name: "meta-llama/Llama-3.2-1B"
max_seq_length: 2048
data:
train_path: "./data/train.jsonl" # Your data file
format: "alpaca"
training:
mode: "lora" # Memory-efficient fine-tuning
output_dir: "./outputs/my-model"
num_epochs: 3
learning_rate: 2.0e-4
Run llm-forge train --config config.yaml and you're done.
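Before training, it can help to check that every line of the JSONL file is a well-formed record. The sketch below is an illustrative standalone check, not llm-forge's own validation. The function name `check_alpaca_jsonl` is hypothetical, and it assumes that `instruction` and `output` are required while `input` is optional.

```python
import json
from pathlib import Path

# Assumption: "input" may be empty or absent; these two fields must be present.
REQUIRED_KEYS = ("instruction", "output")


def check_alpaca_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means OK."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "record is not a JSON object"))
                continue
            for key in REQUIRED_KEYS:
                value = record.get(key)
                if not isinstance(value, str) or not value.strip():
                    problems.append((lineno, f"missing or empty field: {key}"))
    return problems


if __name__ == "__main__":
    path = Path("data/train.jsonl")
    if path.exists():
        for lineno, problem in check_alpaca_jsonl(path):
            print(f"{path}:{lineno}: {problem}")
```

Each JSONL line must be a complete JSON object on its own, so the check parses line by line rather than loading the file as one document.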
Features
Getting Started
| Feature | Description |
|---|---|
| YAML-First Config | Single config file controls the entire pipeline |
| Hardware Auto-Detection | Detects available hardware, from a single RTX 3060 to H100 clusters, and optimizes accordingly |
| Smart Validation | Catches config errors with actionable suggestions |
Training
| Feature | Description |
|---|---|
| LoRA / QLoRA | Memory-efficient fine-tuning (train 7B models on 24GB GPUs) |
| Full Fine-Tuning | Unrestricted parameter updates when you have the VRAM |
| Pre-Training | Train language models from scratch |
| DPO Alignment | Direct Preference Optimization for preference alignment (an alternative to RL-based RLHF) |
| NEFTune | Noise-based regularization for better generalization |
Data & Evaluation
| Feature | Description |
|---|---|
| Data Cleaning | Unicode fixing, deduplication, PII redaction, toxicity filtering |
| Evaluation | lm-evaluation-harness benchmarks with HTML reports |
| RAG Pipeline | Chunking, embeddings, hybrid retrieval, reranking |
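To give a sense of what Unicode fixing and deduplication involve, here is a minimal sketch using only the Python standard library. It is not llm-forge's implementation: the function names are hypothetical, NFKC normalization and exact-match deduplication are assumptions chosen for illustration, and PII redaction and toxicity filtering are out of scope.

```python
import hashlib
import unicodedata


def normalize_text(text):
    """Apply NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def clean_records(records, fields=("instruction", "input", "output")):
    """Normalize text fields and drop exact duplicates, keeping the first copy."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = {f: normalize_text(record.get(f, "")) for f in fields}
        # Hash the joined fields so the "seen" set stays small for long texts.
        joined = "\x1f".join(normalized[f] for f in fields)
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        cleaned.append(normalized)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"},
        {"instruction": "Translate to French", "input": "Hello\u00a0 world", "output": "Bonjour le monde"},
    ]
    print(len(clean_records(sample)), "unique record(s)")
```

Normalizing before hashing means records that differ only in invisible characters, such as a non-breaking space, are treated as duplicates.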
Production
| Feature | Description |
|---|---|
| Distributed Training | FSDP, DeepSpeed ZeRO, Megatron-Core |
| Serving | Gradio UI, FastAPI REST API, vLLM high-throughput |
| Model Export | safetensors, GGUF, ONNX formats |
| HPC Support | SLURM scripts, Singularity containers |
Recommended Starting Models
| Model | Size | Best For | GPU Needed |
|---|---|---|---|
| HuggingFaceTB/SmolLM2-135M | 135M | Quick testing, learning | Any (even CPU) |
| meta-llama/Llama-3.2-1B | 1B | General fine-tuning (recommended) | 8+ GB VRAM |
| Qwen/Qwen2.5-1.5B | 1.5B | Multilingual, code | 12+ GB VRAM |
| meta-llama/Llama-3.2-3B | 3B | Higher quality results | 16+ GB VRAM |
| microsoft/phi-3-mini-4k-instruct | 3.8B | Instruction tuning | 24+ GB VRAM |
Hardware Compatibility
| GPU | VRAM | Recommended Mode | Max Model Size |
|---|---|---|---|
| RTX 3090 | 24 GB | QLoRA (4-bit) | 7B |
| RTX 4090 | 24 GB | LoRA / QLoRA | 7B |
| A100 40GB | 40 GB | LoRA | 13B |
| A100 80GB | 80 GB | Full fine-tune | 7B |
| H100 80GB | 80 GB | Full fine-tune + FP8 | 13B |
| Multi-GPU | Varies | FSDP / DeepSpeed | 70B+ |
| CPU only | N/A | QLoRA (slow) | 1B |
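A rough back-of-the-envelope calculation helps explain why 4-bit QLoRA fits larger models into the same VRAM. The sketch below estimates the memory taken by the model weights alone. It is not how llm-forge derives the table above, and it ignores activations, gradients, optimizer state, adapter weights, and framework overhead, all of which add to real usage. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    precisions = (
        ("4-bit (QLoRA base weights)", 4),
        ("16-bit (BF16/FP16)", 16),
        ("32-bit (FP32)", 32),
    )
    for label, bits in precisions:
        print(f"7B model, {label}: {weight_memory_gb(7e9, bits):.1f} GB")
```

For a 7B model, the weights shrink by a factor of four when stored in 4-bit instead of 16-bit, which leaves far more of a 24 GB card free for activations and adapter training.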
CLI Reference
llm-forge init [--template lora|qlora|pretrain|rag|full] # Generate starter config
llm-forge validate config.yaml # Validate + hardware check
llm-forge train --config config.yaml [--verbose] [--dry-run] # Train model
llm-forge eval --config config.yaml # Run benchmarks
llm-forge serve --config config.yaml # Launch chat UI
llm-forge export --config config.yaml --format gguf # Export model
llm-forge clean --config config.yaml # Data cleaning only
llm-forge rag build --config config.yaml # Build RAG index
llm-forge rag query "question" --config config.yaml # Query RAG
llm-forge info # System + GPU info
llm-forge hardware # Hardware summary
Example Configs
Ready-to-use configs in configs/:
| Config | Use Case |
|---|---|
| quickstart_tiny.yaml | 5-min test: SmolLM2-135M on sample data (works on CPU) |
| demo_lora_llama.yaml | Demo: Llama-3.2-1B on Alpaca (copy-paste ready) |
| benchmark_smollm_135m.yaml | Benchmark: SmolLM2-135M (<5 min on A100) |
| benchmark_smollm_360m.yaml | Benchmark: SmolLM2-360M (~10 min on A100) |
| benchmark_tinyllama_1b.yaml | Benchmark: TinyLlama-1.1B (~15 min on A100) |
| benchmark_qlora_phi2.yaml | Benchmark: Phi-2 QLoRA 4-bit (~20 min on A100) |
| benchmark_llama_1b_full.yaml | Benchmark: Llama-3.2-1B full LoRA (~30 min on A100) |
| example_lora.yaml | LoRA fine-tuning template |
| example_qlora.yaml | QLoRA for memory-constrained GPUs |
| example_pretrain.yaml | Pre-training from scratch |
| example_rag.yaml | RAG pipeline |
| example_medical_domain.yaml | Medical domain fine-tuning |
| example_legal_domain.yaml | Legal domain fine-tuning |
| example_code_domain.yaml | Code generation fine-tuning |
Troubleshooting
| Problem | Solution |
|---|---|
| OutOfMemoryError | Use mode: "qlora" in your config |
| 401 Unauthorized from HuggingFace | Run huggingface-cli login |
| flash_attention_2 not found | OK - llm-forge auto-falls back to SDPA |
| Training interrupted | Rerun same command - auto-resumes from last checkpoint |
| BF16 not supported | llm-forge auto-falls back to FP16 on older GPUs |
| Model download fails | Rerun - downloads auto-resume |
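The two automatic fallbacks in the table (FlashAttention 2 to SDPA, BF16 to FP16) follow a common pattern: probe for the capability, then pick the best available option. The sketch below illustrates that pattern under stated assumptions and is not llm-forge's actual code. The function names are hypothetical, FlashAttention is assumed to be detectable as the `flash_attn` package, and the strings `flash_attention_2` and `sdpa` match the attention names used by Hugging Face Transformers.

```python
import importlib.util


def choose_attention(flash_attn_installed=None):
    """Prefer FlashAttention 2 when its package is importable, else SDPA."""
    if flash_attn_installed is None:
        flash_attn_installed = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if flash_attn_installed else "sdpa"


def detect_bf16():
    """Return True only if PyTorch reports a CUDA device with BF16 support."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()


def choose_precision(bf16_supported):
    """Use BF16 when supported; otherwise fall back to FP16."""
    return "bf16" if bf16_supported else "fp16"


if __name__ == "__main__":
    print("attention:", choose_attention())
    print("precision:", choose_precision(detect_bf16()))
```

Running the script on your machine shows which options such a probe would select there.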
Docker
# GPU training
docker build -t llm-forge:gpu -f docker/Dockerfile.gpu .
docker run --gpus all -v $(pwd)/outputs:/app/outputs \
llm-forge:gpu train --config configs/demo_lora_llama.yaml
# Docker Compose (training + serving + vector DB)
docker compose -f docker/docker-compose.yml up
HPC / SLURM
# Single GPU on Hopper cluster
sbatch scripts/slurm/train_demo.sbatch
# Multi-node distributed
sbatch scripts/slurm/train_multi_node.sbatch
See Hopper Cluster Guide for detailed setup.
Documentation
| Doc | For |
|---|---|
| Quickstart Guide | First-time setup walkthrough |
| Configuration Reference | All YAML fields explained |
| Data Preparation | Preparing your dataset |
| Training Guide | LoRA vs QLoRA vs full fine-tuning |
| Evaluation Guide | Benchmarking your model |
| Deployment Guide | Serving in production |
| Distributed Training | Multi-GPU scaling |
| API Reference | Python API docs |
Contributing
See CONTRIBUTING.md for development setup and guidelines.
Citation
@software{llm_forge,
author = {Chennu, Naga Venkata Sai},
title = {llm-forge: Config-Driven LLM Training Platform},
year = {2026},
url = {https://github.com/Nagavenkatasai7/llm-forge}
}
License
Author
Naga Venkata Sai Chennu - George Mason University
- GitHub: @Nagavenkatasai7
- Email: nchennu@gmu.edu