Drop-in Unsloth alternative — zero errors, maximum speed LoRA/QLoRA fine-tuning

FastLoRA ⚡🛡️🧠

The drop-in Unsloth alternative that never crashes.
Maximum speed LoRA/QLoRA fine-tuning — unlimited model size, automatic hardware detection, unstoppable training.

pip install fastlora
pip install "fastlora[full]"   # recommended

Why FastLoRA?

Feature	Unsloth	FastLoRA
Installation errors	Frequent	None
Crashes during training	Common	Never
VRAM Guard (auto OOM prevention)	✗	✓
Auto hardware detection	✗	✓
Unlimited model size (1B → 1T+)	✗	✓
Unstoppable training	✗	✓
Every feature True/False toggle	Partial	✓
0.0–1.0 power control per feature	✗	✓
Compiled kernel cache	✗	✓
Adapter hot-swap (ms)	✗	✓
Multi-GPU: DDP / FSDP / DeepSpeed	Partial	✓

Installation

# Minimal
pip install fastlora

# Recommended (full features)
pip install "fastlora[full]"

# With Flash Attention 2 (requires CUDA + compilation)
pip install "fastlora[full,flash]"

# With DeepSpeed multi-GPU
pip install "fastlora[full,distributed]"

# Everything
pip install "fastlora[all]"

Quick Start

from fastlora import FastLoRA

fl = FastLoRA(
    "meta-llama/Llama-3.2-3B",
    lora=True,
    quantization="4bit",
    flash_attention=True,
    vram_guard=True,
)
model, tokenizer = fl.load()
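
The Quick Start enables LoRA on top of a 4-bit base model. As a rough illustration of why that keeps the trainable footprint small, the sketch below counts adapter parameters for a single linear layer. The layer size, rank, and alpha are illustrative assumptions, and the helper names are hypothetical; none of this is read from FastLoRA or from a model config.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


# Illustrative values only (assumed, not taken from any real config).
d_in, d_out, r, alpha = 3072, 3072, 16, 32

full = d_in * d_out
adapter = lora_trainable_params(d_in, d_out, r)

print(f"full layer params:   {full:,}")
print(f"LoRA adapter params: {adapter:,}")
print(f"fraction trained:    {adapter / full:.2%}")
print(f"scaling (alpha / r): {lora_scaling(alpha, r)}")
```

With these assumed sizes the adapter holds about 1% of the layer's parameters, which is why rank is the main lever on adapter size.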

Settings Panel

from fastlora import FastLoRA

fl = FastLoRA(
    "meta-llama/Llama-3.2-3B",

    # LORA              On/Off     Power (0.0–1.0)
    lora              = True,    # lora_power          = 1.0,
    lora_r            = 16,      # rank: 8 / 16 / 32 / 64
    lora_alpha        = 32,      # scaling (usually r×2)

    # QUANTIZATION      On/Off     Power
    quantization      = "4bit",  # "4bit" / "8bit" / "none"
    quantization_power= 1.0,     # <0.7 → falls back to 8bit

    # SPEED             On/Off     Power
    flash_attention   = True,    # attention_power     = 1.0,
    torch_compile     = True,    # compile_power       = 0.8,
    fused_ops         = True,
    batch_packing     = True,    # packing_power       = 1.0,
    cuda_optimizations= True,
    compile_cache     = True,
    pin_memory        = True,
    auto_batch_size   = True,

    # VRAM GUARD        Threshold (0.0–1.0)
    vram_guard        = True,    # vram_guard_power    = 0.85,

    # TRAINING
    precision         = "auto",
    gradient_checkpointing = True,
    learning_rate     = 2e-4,
)

model, tokenizer = fl.load()
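
The settings comment says a `quantization_power` below 0.7 falls back to 8-bit. A minimal sketch of that rule follows, assuming it applies only when 4-bit is requested. `resolve_quantization` is a hypothetical helper written for illustration, not part of the FastLoRA API.

```python
def resolve_quantization(mode: str, power: float) -> str:
    """Hypothetical resolver: return the quantization mode actually used.

    Assumption drawn from the settings comment: a 4-bit request with
    power below 0.7 falls back to 8-bit. Other modes pass through.
    """
    if mode not in {"4bit", "8bit", "none"}:
        raise ValueError(f"unknown quantization mode: {mode!r}")
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must be between 0.0 and 1.0")
    if mode == "4bit" and power < 0.7:
        return "8bit"
    return mode


for power in (1.0, 0.7, 0.5):
    print(power, "->", resolve_quantization("4bit", power))
```

Whether the real threshold is inclusive at exactly 0.7 is not stated; the sketch treats 0.7 as still 4-bit.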

Full Training Pipeline

from fastlora import FastLoRA, format_alpaca
from fastlora import CheckpointManager, LRFinder, EarlyStopping, ExperimentLogger
from datasets import load_dataset

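# Load a 4-bit quantized base model with LoRA enabled
fl = FastLoRA("meta-llama/Llama-3.2-3B", lora=True, quantization="4bit")
model, tokenizer = fl.load()

# Use the first 2,000 Alpaca examples, formatted with the bundled helper
dataset = load_dataset("tatsu-lab/alpaca", split="train[:2000]")
trainer = fl.get_trainer(dataset, formatting_func=format_alpaca)

# Attach logging, early stopping, and checkpointing to the trainer
ExperimentLogger(fl, tensorboard=True, csv=True).patch_trainer(trainer)
EarlyStopping(patience=3).patch_trainer(trainer)
resume = CheckpointManager(fl, "./checkpoints").patch_trainer(trainer)

# Train, passing the resume path returned by CheckpointManager, then save
fl.train(trainer, resume_path=resume)
fl.save("./my_model")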
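
The pipeline above uses `EarlyStopping(patience=3)`. The general idea behind patience-based stopping can be sketched as follows. `PatienceStopper` is a hypothetical stand-in written for illustration; how FastLoRA's own class tracks improvement is not documented here.

```python
class PatienceStopper:
    """Hypothetical patience-based stopper, not FastLoRA's implementation."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def update(self, eval_loss: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


stopper = PatienceStopper(patience=3)
losses = [1.20, 1.05, 1.06, 1.07, 1.08, 0.90]  # made-up evaluation losses
stopped_at = None
for i, loss in enumerate(losses):
    if stopper.update(loss):
        stopped_at = i
        break

print("stopped at evaluation index:", stopped_at, "best loss:", stopper.best)
```

The last value in the list is never seen: three evaluations without improvement end the run first, which is the trade-off a small patience makes.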

v4.2 New Features

Auto Hardware Scanner

Scans the GPU on startup and applies the best-matching settings automatically. Manual settings always take priority.

CPU          → compile=False, flash=False, batch=1
Low-end GPU  → 4bit, grad_ckpt, cpu_offload
Mid-range    → 4bit, flash attention, batch=2
High-end     → 4bit, flash attention 2, batch=4, bf16
Datacenter   → no quant, fullgraph compile, batch=8
Flagship     → no quant, fullgraph compile, batch=16
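
The rule that manual settings take priority over the scanned profile can be sketched as a dictionary merge. The profile values come from the table above, but the key names (such as `batch_size`) and the `resolve_settings` helper are assumptions made for illustration.

```python
# A few profiles from the table above; key names are assumed.
HARDWARE_PROFILES = {
    "cpu": {"torch_compile": False, "flash_attention": False, "batch_size": 1},
    "mid_range": {"quantization": "4bit", "flash_attention": True, "batch_size": 2},
    "datacenter": {"quantization": "none", "torch_compile": True, "batch_size": 8},
}


def resolve_settings(profile_name: str, manual: dict) -> dict:
    """Start from the detected profile, then let manual values override it."""
    settings = dict(HARDWARE_PROFILES[profile_name])
    # None means "not set by the user", so the profile value is kept.
    settings.update({k: v for k, v in manual.items() if v is not None})
    return settings


resolved = resolve_settings("mid_range", {"batch_size": 4, "flash_attention": None})
print(resolved)
```

Here the user-supplied batch size wins, while `flash_attention` stays at the profile value because it was left unset.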

Unlimited Parameter Support

Auto strategy for any model size:

0–3B    → normal mode
3–10B   → 4bit + gradient checkpointing
10–30B  → 4bit + CPU offload + layer offload
30–100B → aggressive offload + batch=1
100B+   → streaming mode (1 layer on GPU at a time)
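
The size tiers above map directly onto a lookup. In this sketch, `pick_strategy` is a hypothetical helper, and the boundary handling (each upper bound is exclusive, so exactly 3B falls into the 3–10B tier) is an assumption, since the table does not say which tier owns the boundary values.

```python
# (exclusive upper bound in billions of parameters, strategy) from the table above
STRATEGIES = [
    (3, "normal mode"),
    (10, "4bit + gradient checkpointing"),
    (30, "4bit + CPU offload + layer offload"),
    (100, "aggressive offload + batch=1"),
]
STREAMING = "streaming mode (1 layer on GPU at a time)"


def pick_strategy(params_billion: float) -> str:
    """Return the tier for a model size given in billions of parameters."""
    if params_billion < 0:
        raise ValueError("parameter count cannot be negative")
    for upper, strategy in STRATEGIES:
        if params_billion < upper:
            return strategy
    return STREAMING


for size in (1.1, 7, 13, 70, 405):
    print(f"{size}B -> {pick_strategy(size)}")
```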

Unstoppable Training

Only KeyboardInterrupt can stop training:

OOM          → clean memory, reduce batch, continue
CUDA error   → reset device, continue
NaN/Inf loss → skip step, reduce LR if persistent
Data error   → skip sample, continue
Unknown      → activate safe mode, continue
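
A minimal sketch of this recovery pattern in plain Python follows. It is not FastLoRA's implementation: `run_steps` and `fake_step` are hypothetical, `MemoryError` stands in for a CUDA out-of-memory error, and a failed batch is skipped rather than retried. Because `KeyboardInterrupt` derives from `BaseException` rather than `Exception`, the broad handler below still lets Ctrl+C through.

```python
import math


def run_steps(step_fn, batches, batch_size=8, lr=2e-4, nan_patience=2):
    """Run step_fn over batches, recovering from failures instead of aborting."""
    completed = skipped = nan_streak = 0
    for batch in batches:
        try:
            loss = step_fn(batch, batch_size)
        except MemoryError:
            # Simulated OOM: shrink the batch size and move on.
            batch_size = max(1, batch_size // 2)
            skipped += 1
            continue
        except Exception:
            # Data or unknown errors: skip this batch.
            # KeyboardInterrupt is not an Exception, so it still propagates.
            skipped += 1
            continue
        if math.isnan(loss) or math.isinf(loss):
            # Skip the step; halve the learning rate if NaN/Inf persists.
            skipped += 1
            nan_streak += 1
            if nan_streak >= nan_patience:
                lr *= 0.5
                nan_streak = 0
            continue
        nan_streak = 0
        completed += 1
    return {"completed": completed, "skipped": skipped,
            "batch_size": batch_size, "lr": lr}


def fake_step(batch, batch_size):
    """Stand-in training step that fails in scripted ways."""
    if batch == "oom":
        raise MemoryError("simulated out of memory")
    if batch == "bad":
        raise ValueError("simulated corrupt sample")
    if batch == "nan":
        return float("nan")
    return 1.0


result = run_steps(fake_step, ["ok", "oom", "nan", "nan", "bad", "ok"])
print(result)
```

A design note: silently skipping failures keeps a run alive, but the skipped count is worth logging so that a run that skipped most of its data is not mistaken for a healthy one.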

Feature Reference

Speed

Parameter	Default	Description
`torch_compile`	`True`	~2x faster after warmup
`compile_cache`	`True`	5s startup instead of 3min
`fused_ops`	`True`	Fused RMSNorm (Triton)
`cuda_optimizations`	`True`	TF32 + cuDNN benchmark
`batch_packing`	`True`	Zero padding, ~1.4x throughput
`pin_memory`	`True`	Async CPU→GPU
`auto_batch_size`	`True`	Max batch for available VRAM
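
To illustrate what `batch_packing` is aiming at, the sketch below packs variable-length sequences into fixed-length rows with a greedy first-fit-decreasing heuristic and compares the padding before and after. This is a generic bin-packing illustration with made-up lengths; FastLoRA's actual packing strategy is not documented here, and real packing also needs attention masking between packed sequences.

```python
def pack_first_fit(lengths, max_len):
    """Greedy first-fit-decreasing packing of sequence lengths into rows."""
    rows = []  # each row is a list of lengths whose sum is <= max_len
    for n in sorted(lengths, reverse=True):
        if n > max_len:
            raise ValueError(f"sequence of length {n} exceeds max_len {max_len}")
        for row in rows:
            if sum(row) + n <= max_len:
                row.append(n)
                break
        else:
            rows.append([n])
    return rows


def padding_fraction(rows, max_len):
    """Share of token slots that would be padding."""
    used = sum(sum(row) for row in rows)
    return 1 - used / (len(rows) * max_len)


lengths = [500, 120, 300, 80, 200, 60, 400, 388]  # made-up token counts
max_len = 512

unpacked = [[n] for n in lengths]  # one sequence per row, padded to max_len
packed = pack_first_fit(lengths, max_len)

print("rows unpacked:", len(unpacked), "padding:", padding_fraction(unpacked, max_len))
print("rows packed:  ", len(packed), "padding:", padding_fraction(packed, max_len))
```

Fewer rows for the same tokens is where the throughput gain comes from; how close to zero the padding gets depends on the length distribution.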

Safety

Parameter	Default	Description
`vram_guard`	`True`	Auto OOM prevention
`vram_guard_power`	`0.85`	Intervenes at 85% VRAM
`unstoppable`	`True`	Only KeyboardInterrupt stops training
`allow_remote_code`	`False`	Allow remote model code to run (keep False)
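
The `vram_guard_power` threshold can be read as a simple usage ratio check. In the sketch below, the intervention itself (halving the batch size) is an assumption, since the table only says the guard intervenes at 85% VRAM; `vram_guard_action` is a hypothetical helper.

```python
GIB = 1024 ** 3


def vram_guard_action(used_bytes, total_bytes, threshold=0.85, batch_size=8):
    """Hypothetical guard: return the batch size to use for the next step."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    usage = used_bytes / total_bytes
    if usage >= threshold:
        # Assumed intervention: halve the batch, never going below 1.
        return max(1, batch_size // 2)
    return batch_size


# Example figures for a 16 GiB card (illustrative, not measured).
print(vram_guard_action(12 * GIB, 16 * GIB))  # 75% used, below threshold
print(vram_guard_action(14 * GIB, 16 * GIB))  # 87.5% used, above threshold
```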

v4.2 Systems

Parameter	Default	Description
`auto_hardware_scan`	`True`	Auto GPU profile
`unlimited_params`	`True`	Auto strategy for any model size
`loss_spike_detection`	`False`	Detect loss spikes
`dynamic_batch_scaling`	`False`	Real-time batch adjustment
`gradient_noise_monitor`	`False`	Gradient health monitoring
`smart_checkpoint`	`False`	Save only on improvement
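
As one way to picture `loss_spike_detection`, the sketch below flags a loss that exceeds a multiple of the recent average. The window size, the factor, and the `LossSpikeDetector` class are all assumptions for illustration; the detection rule FastLoRA uses is not described in this document.

```python
from collections import deque


class LossSpikeDetector:
    """Hypothetical detector: flag losses far above the recent average."""

    def __init__(self, window: int = 20, factor: float = 2.0):
        self.history = deque(maxlen=window)
        self.factor = factor

    def update(self, loss: float) -> bool:
        """Return True if loss exceeds factor x the mean of recent losses."""
        is_spike = False
        if self.history:
            mean = sum(self.history) / len(self.history)
            is_spike = loss > self.factor * mean
        if not is_spike:
            # Spikes are kept out of the history so they do not raise the baseline.
            self.history.append(loss)
        return is_spike


detector = LossSpikeDetector(window=5, factor=2.0)
flags = [detector.update(x) for x in [1.0, 0.9, 0.8, 2.5, 0.7]]  # made-up losses
print(flags)
```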

Benchmark Results

Tested on NVIDIA Tesla T4 (Google Colab):

Version	Model	Steps	Time	Throughput
FastLoRA v3	TinyLlama-1.1B	50	192s	2.07 samples/s
FastLoRA v4 Beta	Qwen2.5-1.5B	50	21s	3.40 samples/s
FastLoRA v4.1	TinyLlama-1.1B	50	28.45s	3.516 samples/s
FastLoRA v4.2	Qwen2.5-1.5B	200	470s	3.404 samples/s

An Unsloth benchmark was also attempted, but Unsloth did not run, so no comparison figures are available.
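
The table does not state batch sizes, but multiplying time by throughput and dividing by steps gives the number of samples per step each row implies. The short calculation below uses only the figures from the table; `implied_samples_per_step` is a helper named here for illustration.

```python
# (version, steps, time in seconds, throughput in samples/s) from the table above
ROWS = [
    ("FastLoRA v3", 50, 192.0, 2.07),
    ("FastLoRA v4 Beta", 50, 21.0, 3.40),
    ("FastLoRA v4.1", 50, 28.45, 3.516),
    ("FastLoRA v4.2", 200, 470.0, 3.404),
]


def implied_samples_per_step(steps, seconds, samples_per_s):
    """Samples processed per step, derived from total time and throughput."""
    return seconds * samples_per_s / steps


implied = {
    name: round(implied_samples_per_step(steps, secs, rate), 2)
    for name, steps, secs, rate in ROWS
}
for name, value in implied.items():
    print(f"{name}: about {value} samples per step")
```

The implied per-step sample counts differ between rows, as do the models and step counts, so the rows are best read as separate runs rather than a like-for-like comparison.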

Requirements

Required: torch ≥ 2.1.0, transformers ≥ 4.40.0, accelerate ≥ 0.27.0

Optional ([full]): peft, bitsandbytes, trl, datasets

Optional extras: flash-attn, triton, deepspeed, optuna, wandb, tensorboard
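
Before installing, the required minimums above can be checked against the current environment. The sketch below uses the standard library's `importlib.metadata`; the version parser is deliberately simplified (leading numeric components only) and is an assumption for illustration, so the `packaging` library is the better choice for full version semantics.

```python
import re
from importlib import metadata

# Minimum versions from the Requirements list above
MINIMUMS = {"torch": "2.1.0", "transformers": "4.40.0", "accelerate": "0.27.0"}


def parse_version(text):
    """Leading numeric components only, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Report 'ok', 'missing', or a 'too old' note for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


print(check_requirements())
```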

License

Release history

4.2.1 (this version): Mar 20, 2026
1.0.0: Mar 19, 2026

Download files

Source Distribution

fastlora-4.2.1.tar.gz (38.8 kB view details)

Uploaded Mar 20, 2026 Source

Built Distribution

fastlora-4.2.1-py3-none-any.whl (38.0 kB view details)

Uploaded Mar 20, 2026 Python 3

File details

Details for the file fastlora-4.2.1.tar.gz.

File metadata

Download URL: fastlora-4.2.1.tar.gz
Upload date: Mar 20, 2026
Size: 38.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for fastlora-4.2.1.tar.gz
Algorithm	Hash digest
SHA256	`3077ac7b6848f00d49b94e692aeab02a989b86bf08b1cb49034f9c9d58463bd8`
MD5	`4df5f0b59f12ff92ff4c2af0ebbead2e`
BLAKE2b-256	`3131b255e585ee3862f7e1c3f4e7e03b2ad6524ba52a8f1c7054fbc409351b65`

File details

Details for the file fastlora-4.2.1-py3-none-any.whl.

File metadata

Download URL: fastlora-4.2.1-py3-none-any.whl
Upload date: Mar 20, 2026
Size: 38.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for fastlora-4.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`697281739e8977fa1c0aca010a7e4604a7153b0b3ddf0350653736993b2ea636`
MD5	`b27eb1f009e836db312c010fc171d962`
BLAKE2b-256	`4657de60b010737844862b2cd36d81e3fed12ea689967d0a3c3705f62859f9b2`
