# 🚀 LMFast

**Democratized Small Language Model Training**: train, fine-tune, distill, and deploy sub-500M parameter models on a free Colab T4 in 30-40 minutes, with enterprise-grade features.
## ✨ Features
| Feature | Description |
|---|---|
| 🎯 T4 Optimized | Train on free Colab T4 (12GB) with QLoRA + gradient checkpointing |
| ⚡ Fast Training | Unsloth integration for 2-5x faster fine-tuning |
| 🧠 Distillation | Transfer knowledge from larger models to tiny ones |
| 🤖 Agents | Tool-using agents and orchestration framework |
| 📚 RAG | Lightweight document retrieval and indexing |
| 🌐 Browser | Deploy to browser via ONNX/WebLLM (no server costs) |
| 🛡️ Guardrails | PII detection, toxicity filtering, prompt injection protection |
| 📊 Observability | Langfuse integration, metrics, attention visualization |
| 🚀 Fast Inference | vLLM backend with OpenAI-compatible API |
| 📦 Easy Export | GGUF, INT4, AWQ, GPTQ quantization |
| 🧩 MCP | Native Model Context Protocol server support |
## 🚀 Quick Start

### Installation
```bash
# Basic installation
pip install lmfast

# With all features (extras are quoted so shells like zsh don't expand the brackets)
pip install "lmfast[all]"

# Specific extras
pip install "lmfast[fast]"          # Unsloth for faster training
pip install "lmfast[guardrails]"    # Safety features
pip install "lmfast[observability]" # Monitoring
pip install "lmfast[inference]"     # vLLM serving
```
### Train in 5 Lines
```python
from lmfast import SLMTrainer, SLMConfig, TrainingConfig
from datasets import load_dataset

# Load data
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1000]")

# Train
trainer = SLMTrainer(
    SLMConfig(model_name="HuggingFaceTB/SmolLM-135M"),
    TrainingConfig(max_steps=500),
)
trainer.train(dataset)
trainer.save("./my_slm")
```
### CLI Usage
```bash
# Train a model
lmfast train --model HuggingFaceTB/SmolLM-135M --data yahma/alpaca-cleaned --output ./my_model

# Knowledge distillation
lmfast distill --teacher Qwen/Qwen2-1.5B --student HuggingFaceTB/SmolLM-135M --data my_data.json

# Start inference server
lmfast serve --model ./my_model --port 8000

# Export to GGUF
lmfast export --model ./my_model --output ./model.gguf --format gguf

# Interactive chat
lmfast generate --model ./my_model --interactive
```
## 📚 Documentation

### Training
```python
from lmfast import SLMTrainer, SLMConfig, TrainingConfig

# Configure for T4 GPU
model_config = SLMConfig(
    model_name="HuggingFaceTB/SmolLM-135M",
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA
)

training_config = TrainingConfig(
    max_steps=500,
    batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lora_r=16,
    lora_alpha=32,
)

trainer = SLMTrainer(model_config, training_config)
trainer.train(dataset)
```
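Gradient accumulation is what makes these settings fit a 12GB T4: the optimizer steps once per `gradient_accumulation_steps` micro-batches, so the effective batch size is larger than what resides in memory at once. A quick sanity check of the arithmetic for the config above:

```python
# Arithmetic for the T4 config above: per-device micro-batch of 4,
# accumulated over 4 forward passes before each optimizer step.
batch_size = 4
gradient_accumulation_steps = 4
max_steps = 500

effective_batch = batch_size * gradient_accumulation_steps
examples_seen = effective_batch * max_steps

print(effective_batch)  # 16
print(examples_seen)    # 8000
```

So 500 optimizer steps at this configuration cover roughly 8,000 training examples, which lines up with the `train[:1000]` quick-start subset being seen several times.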
### Knowledge Distillation
```python
from lmfast.distillation import DistillationTrainer
from lmfast.core.config import DistillationConfig

config = DistillationConfig(
    teacher_model="Qwen/Qwen2-1.5B",
    temperature=2.0,
    alpha=0.5,
)

trainer = DistillationTrainer(
    student_model="HuggingFaceTB/SmolLM-135M",
    distillation_config=config,
)
trainer.distill(dataset)
```
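Distillation losses of this family typically blend a hard-label loss with a temperature-softened KL divergence against the teacher's logits; `temperature` flattens both distributions and `alpha` weights the two terms. A minimal stdlib sketch of that soft-target computation (the function names here are illustrative, not LMFast's internal API):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits,
                 temperature=2.0, alpha=0.5, hard_loss=0.0):
    """Blend hard-label loss with temperature-softened KL(teacher || student).

    The T^2 factor keeps soft-target gradient magnitudes comparable
    across temperatures, as is conventional in knowledge distillation.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return alpha * (temperature ** 2) * kl + (1 - alpha) * hard_loss

# A student that matches the teacher exactly incurs zero soft loss.
print(distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

A higher `temperature` exposes more of the teacher's "dark knowledge" (relative probabilities of wrong classes); `alpha=0.5` as configured above weights the soft and hard terms equally.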
### Guardrails
```python
from lmfast.guardrails import GuardrailsConfig, InputValidator, OutputFilter

config = GuardrailsConfig(
    enable_pii_detection=True,
    enable_toxicity_filter=True,
    enable_prompt_injection=True,
)

validator = InputValidator(config)
result = validator.validate(user_input)
if result.is_valid:
    # Process sanitized input
    output = model.generate(result.sanitized_input)
```
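PII detection of this kind is often pattern-based at its core. A toy stdlib sketch of the idea (illustrative only, not LMFast's implementation) that masks email addresses and US-style phone numbers before the text reaches the model:

```python
import re

# Illustrative PII patterns; a production detector covers many more
# types (SSNs, credit cards, addresses) and uses NER, not just regex.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def mask_pii(text):
    """Replace each detected PII span with a [TYPE] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(mask_pii("Mail me at jane@example.com or call 555-123-4567."))
# Mail me at [EMAIL] or call [PHONE].
```

The `sanitized_input` field in the example above plays the same role: the model sees the masked text rather than the raw PII.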
### Observability
```python
from lmfast.observability import SLMTracer, MetricsCollector

# Tracing (Langfuse integration)
tracer = SLMTracer(project_name="my_project")

with tracer.trace("inference") as span:
    span.set_attribute("model", "smollm-135m")
    response = model.generate(prompt)
    span.set_attribute("tokens", len(response))

# Metrics
collector = MetricsCollector()
collector.log("loss", 0.5, step=100)
collector.plot("loss")
```
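A metrics collector like this is conceptually just a mapping from metric name to a series of `(step, value)` pairs that the plotting and reporting layers read from. A minimal stdlib sketch (hypothetical class, not LMFast's implementation):

```python
from collections import defaultdict

class SimpleMetrics:
    """Toy collector: stores (step, value) pairs per metric name."""

    def __init__(self):
        self._series = defaultdict(list)

    def log(self, name, value, step):
        self._series[name].append((step, value))

    def latest(self, name):
        """Most recently logged value for a metric."""
        return self._series[name][-1][1]

    def mean(self, name):
        """Mean of all logged values for a metric."""
        values = [v for _, v in self._series[name]]
        return sum(values) / len(values)

m = SimpleMetrics()
m.log("loss", 0.8, step=50)
m.log("loss", 0.5, step=100)
print(m.latest("loss"))  # 0.5
```

Keeping the step alongside the value is what lets `plot("loss")` draw a proper training curve rather than a bare sequence.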
### Fast Inference
```python
from lmfast.inference import SLMServer

# Create server
server = SLMServer("./my_model", use_vllm=True)

# Generate
response = server.generate("Hello, how are you?")

# Batch generation
responses = server.generate_batch(["Prompt 1", "Prompt 2"])

# Start OpenAI-compatible API
server.serve(port=8000)
```
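Because the served API is OpenAI-compatible, any OpenAI client can point its base URL at the local server. A sketch of the chat-completions request body such a client would POST (the endpoint path follows the OpenAI API convention; field values here are illustrative):

```python
import json

# Request body an OpenAI-compatible client would POST to
# http://localhost:8000/v1/chat/completions (per the OpenAI API shape).
payload = {
    "model": "./my_model",
    "messages": [
        {"role": "user", "content": "Hello, how are you?"},
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

body = json.dumps(payload)
print(json.loads(body)["messages"][0]["role"])  # user
```

This compatibility is what lets existing tooling (OpenAI SDKs, LangChain, etc.) talk to a locally served SLM with only a base-URL change.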
## 🎯 Supported Models
| Model | Parameters | T4 Compatible | Notes |
|---|---|---|---|
| SmolLM-135M | 135M | ✅ | Fastest training |
| SmolLM-360M | 360M | ✅ | Good balance |
| TinyLlama-1.1B | 1.1B | ✅ (with QLoRA) | More capable |
| Qwen2-0.5B | 500M | ✅ | Multilingual |
| Phi-3-mini | 3.8B | ⚠️ (tight) | Most capable |
## 📦 Package Structure

```
lmfast/
├── core/           # Config and model loading
├── training/       # Training and data processing
├── distillation/   # Knowledge distillation
├── guardrails/     # Safety and filtering
├── observability/  # Tracing and metrics
├── inference/      # Serving and quantization
└── cli/            # Command-line interface
```
## 🧪 Development

```bash
# Clone
git clone https://github.com/lmfast/lmfast
cd lmfast

# Create environment
conda env create -f environment.yml
conda activate lmfast

# Install in dev mode
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Format code
black lmfast/ tests/
ruff check lmfast/ tests/
```
## 📄 License
Apache 2.0 - See LICENSE for details.
## 🙏 Acknowledgments
- Unsloth for fast training
- HuggingFace for transformers ecosystem
- vLLM for fast inference
- Langfuse for observability
## Download files
### Source distribution: lmfast-0.3.2.tar.gz

- Size: 123.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bebfa86b99e6c576c6903ab590042bf6983c8c6db47e2c5041c584522fb47aa |
| MD5 | 35af2fa6f4b924c19b70cbe9e9bf7dba |
| BLAKE2b-256 | 56a750d612569b22cdcd2d4566cad2ce4e48af6d25ac08e1fb2e734bf137c8c0 |
#### Provenance

Attestation bundle for lmfast-0.3.2.tar.gz:

- Publisher: publish.yml on 2796gaurav/lmfast
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmfast-0.3.2.tar.gz
- Subject digest: 0bebfa86b99e6c576c6903ab590042bf6983c8c6db47e2c5041c584522fb47aa
- Sigstore transparency entry: 814840490
- Permalink: 2796gaurav/lmfast@f834e57b2fb8adef090f54d723b43337aff25875
- Branch / Tag: refs/heads/main
- Owner: https://github.com/2796gaurav
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f834e57b2fb8adef090f54d723b43337aff25875
- Trigger Event: push
### Built distribution: lmfast-0.3.2-py3-none-any.whl

- Size: 89.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | 02a0e330f2e4796a7876f29a65fb895070f1ef698fa666e2778f1982e2844bdf |
| MD5 | 8d85db86a5ae7999ed61967fc9c6d699 |
| BLAKE2b-256 | f2742902c2d0ce31e678df5bd3064955eeabb50c3a2b21fdca9e8f529f5da788 |
#### Provenance

Attestation bundle for lmfast-0.3.2-py3-none-any.whl:

- Publisher: publish.yml on 2796gaurav/lmfast
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmfast-0.3.2-py3-none-any.whl
- Subject digest: 02a0e330f2e4796a7876f29a65fb895070f1ef698fa666e2778f1982e2844bdf
- Sigstore transparency entry: 814840493
- Permalink: 2796gaurav/lmfast@f834e57b2fb8adef090f54d723b43337aff25875
- Branch / Tag: refs/heads/main
- Owner: https://github.com/2796gaurav
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f834e57b2fb8adef090f54d723b43337aff25875
- Trigger Event: push