
🚀 Unsloth-Candle

High-performance LLM fine-tuning library built with Rust 🦀 and Candle.



Unsloth-Candle brings the blazing-fast performance of Unsloth to the Candle ecosystem. By leveraging optimized Rust kernels and efficient memory management, it enables up to 2x faster training and 70% less memory usage compared to standard implementations.
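To put the memory claim in perspective, here is a back-of-the-envelope sketch (illustrative only; real savings also depend on optimizer state and activations) comparing fp16 and 4-bit NF4 weight storage for a 1B-parameter model:

```python
# Rough weight-memory estimate for a 1-billion-parameter model.
# fp16 stores 2 bytes per parameter; NF4 packs two parameters per byte
# (0.5 bytes each) plus a small per-block overhead for absmax scales.
params = 1_000_000_000
fp16_gb = params * 2 / 1024**3

block = 64                                          # typical NF4 block size
nf4_bytes = params * 0.5 + (params // block) * 2    # 2-byte scale per block
nf4_gb = nf4_bytes / 1024**3

print(f"fp16: {fp16_gb:.2f} GiB, nf4: {nf4_gb:.2f} GiB "
      f"({100 * (1 - nf4_gb / fp16_gb):.0f}% smaller)")
```

Weights alone shrink by roughly 73%; the headline 70% figure additionally accounts for training-time buffers.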

✨ Core Advantages

  • Zero Learning Curve: 1:1 API compatibility with Unsloth's Python interface.
  • Hardware Optimized: Native support for CUDA, Metal (Apple Silicon), and AVX/Neon (CPU).
  • Memory Efficient: Native 4-bit NF4 quantization and gradient checkpointing.
  • Unified Support: One engine for Llama 3.2, Mistral, Qwen 2.5, DeepSeek-V3, and more.
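The 4-bit idea behind NF4 can be illustrated with a minimal pure-Python sketch of blockwise absmax quantization (a toy model, not the library's kernels; the uniform 16-level grid below stands in for NF4's codebook, which instead uses quantiles of a standard normal to better match real weight distributions):

```python
def quantize_block(xs, levels):
    """Blockwise absmax quantization: scale the block into [-1, 1],
    then store the index of the nearest codebook level (4 bits each)."""
    absmax = max(abs(x) for x in xs) or 1.0
    codes = [min(range(len(levels)), key=lambda i: abs(levels[i] - x / absmax))
             for x in xs]
    return absmax, codes

def dequantize_block(absmax, codes, levels):
    """Recover approximate values: level * per-block scale."""
    return [levels[c] * absmax for c in codes]

# A uniform 16-level (4-bit) grid; NF4 replaces this with normal quantiles.
int4_grid = [i / 7.5 - 1.0 for i in range(16)]

weights = [0.12, -0.48, 0.03, 0.91, -0.27, 0.55, -0.88, 0.0]
absmax, codes = quantize_block(weights, int4_grid)
restored = dequantize_block(absmax, codes, int4_grid)
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {err:.3f}")
```

Each block stores only 4 bits per weight plus one scale, which is where the memory savings come from.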

📦 Installation

Via Pip (Recommended)

pip install unsloth-candle

Build from Source (requires a Rust toolchain)

git clone https://github.com/unslothai/unsloth-candle.git
cd unsloth-candle
pip install -e .

To enable GPU acceleration, the corresponding Cargo feature must be passed to the Rust build; note that pip itself has no --features flag. Assuming a maturin-based build backend, run from the repository root:

  • CUDA: maturin develop --features cuda
  • Metal: maturin develop --features metal

🛠 Usage

1. Load Model & Tokenizer

from unsloth_candle import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,
)

2. Apply LoRA/DoRA

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,
    use_gradient_checkpointing = True,
    use_dora = False, # Set to True for DoRA
)
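For intuition, the LoRA configuration above adds a trainable low-rank delta to each frozen projection matrix: W' = W + (alpha/r)·B·A, where A is r×d and B is d×r. A self-contained pure-Python sketch of the arithmetic (not the library's internals):

```python
import random

def matmul(X, Y):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r, alpha = 8, 2, 16          # hidden size, LoRA rank, scaling factor
random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(d)]  # frozen base
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]  # trained, r x d
B = [[0.0] * r for _ in range(d)]                                  # zero-init, d x r

delta = matmul(B, A)             # d x d low-rank update
scale = alpha / r
W_adapted = [[w + scale * dw for w, dw in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]

# Trainable parameters per matrix drop from d*d to 2*d*r:
print(d * d, "->", 2 * d * r)    # prints "64 -> 32"
```

Because B starts at zero, the adapted model is initially identical to the base model; training only updates the small A and B factors.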

3. Fine-tuning with SFTTrainer

from unsloth_candle import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = SFTConfig(
        max_steps = 60,
        learning_rate = 2e-4,
        logging_steps = 1,
    ),
)
trainer.train()

4. Save & Export

# Save as merged HF weights
model.save_pretrained_merged("output_hf", tokenizer)

# Save as GGUF (for Ollama/llama.cpp)
model.save_pretrained_gguf("output_gguf", tokenizer, quantization_type="q4_k_m")
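What "merged" means here: the LoRA factors are folded into the base weights (W_merged = W + (alpha/r)·B·A), so the exported checkpoint needs no adapter at inference time. A toy scalar sketch of that equivalence (illustrative arithmetic, not the library's code):

```python
# Applying the adapter on the fly vs. using pre-merged weights
# gives the same output for any input x.
W = 0.5                      # base "weight" (scalar stand-in for a matrix)
B, A = 0.3, 0.2              # LoRA factors
alpha, r = 16, 16
scale = alpha / r

def forward_with_adapter(x):
    return W * x + scale * (B * A) * x

W_merged = W + scale * B * A    # fold the adapter into the weight

x = 3.0
assert abs(forward_with_adapter(x) - W_merged * x) < 1e-9
print("merged weight:", W_merged)
```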

🗺️ Model Catalog

| Model | Architecture | 4-bit | LoRA | DoRA |
| --- | --- | --- | --- | --- |
| Llama 3.2 | LlamaForCausalLM | ✅ | ✅ | ✅ |
| Mistral Nemo | MistralForCausalLM | ✅ | ✅ | ✅ |
| Qwen 2.5 | Qwen2ForCausalLM | ✅ | ✅ | ✅ |
| DeepSeek V3 | DeepSeekV3 (MLA) | ✅ | ✅ | ✅ |
| Gemma 3 | Gemma3 (GeGLU) | ✅ | ✅ | ✅ |
| Phi 4 | Phi4 | ✅ | ✅ | ✅ |

📜 License

Licensed under the Apache License, Version 2.0.


Built with 💖 by the Unsloth Community and Antigravity.

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


unsloth_candle-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl (2.5 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.39+, x86-64

File details

Details for the file unsloth_candle-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl.


File hashes

Hashes for unsloth_candle-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4f72a6cfaea3485e928e2795c033fbe258d2f4dd2f0eda6ab378f32a9cdfe2f5 |
| MD5 | 482a64cd2654856318f8fc6be8afa68d |
| BLAKE2b-256 | c3121b6f092c47df26f9b6c7b61a4f4ab97343c25de2d3f4553c2798dc17c6fe |
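To verify a downloaded wheel against the published SHA-256 digest, Python's standard hashlib is sufficient (substitute the path to your downloaded file):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large wheels never load fully into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Published digest for this release's wheel (from the table above):
expected = "4f72a6cfaea3485e928e2795c033fbe258d2f4dd2f0eda6ab378f32a9cdfe2f5"

# digest = sha256_of("unsloth_candle-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl")
# assert digest == expected, "hash mismatch - do not install this file"
```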

