
Project description

🚀 Unsloth-Candle

High-performance LLM fine-tuning library built with Rust 🦀 and Candle.

Unsloth-Candle brings the blazing-fast performance of Unsloth to the Candle ecosystem. By leveraging optimized Rust kernels and efficient memory management, it delivers 2x faster training with 70% less memory usage than standard implementations.

✨ Core Advantages

  • Zero Learning Curve: 1:1 API compatibility with Unsloth's Python interface.
  • Hardware Optimized: Native support for CUDA, Metal (Apple Silicon), and AVX/NEON (CPU).
  • Memory Efficient: Native 4-bit NF4 quantization and gradient checkpointing.
  • Unified Support: One engine for Llama 3.2, Mistral, Qwen 2.5, DeepSeek-V3, and more.

📦 Installation

Via pip (Recommended)

pip install unsloth-candle

Build from Source

git clone https://github.com/unslothai/unsloth-candle.git
cd unsloth-candle
pip install -e .

Building from source requires a Rust toolchain (cargo) in addition to Python. To enable GPU acceleration, compile with the corresponding Cargo feature; since pip itself has no --features flag, a maturin-based build (an assumption about the build backend) forwards it via --config-settings:

  • CUDA: pip install -e . --config-settings=build-args="--features cuda"
  • Metal: pip install -e . --config-settings=build-args="--features metal"
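
To verify the installation, a minimal smoke test (this assumes the package exposes __version__, a common but unconfirmed convention):

import unsloth_candle
print(unsloth_candle.__version__)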

🛠 Usage

1. Load Model & Tokenizer

from unsloth_candle import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,  # context length used during fine-tuning
    load_in_4bit = True,    # NF4 quantization for ~70% memory savings
)
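
Given the claimed 1:1 compatibility with Unsloth's Python interface, the returned tokenizer should follow the standard Hugging Face API. A quick round-trip sanity check (illustrative, under that assumption):

# Encode and decode a prompt to confirm the tokenizer works end to end.
ids = tokenizer("Hello from Candle!")["input_ids"]
print(tokenizer.decode(ids))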

2. Apply LoRA/DoRA

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank: size of the low-rank update matrices
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,  # scaling factor; the update is scaled by alpha / r
    use_gradient_checkpointing = True,  # trade compute for memory
    use_dora = False,  # set to True for DoRA (weight-decomposed LoRA)
)
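
For a sense of scale: LoRA freezes each targeted d_out × d_in weight and trains a rank-r update, so an adapter adds r·(d_in + d_out) parameters per projection. A back-of-the-envelope sketch (the hidden size 2048 is specific to Llama-3.2-1B):

# Trainable parameters added by one rank-16 LoRA adapter on a
# 2048x2048 attention projection (Llama-3.2-1B hidden size).
d_in = d_out = 2048
r = 16
full = d_in * d_out        # 4,194,304 frozen base weights
lora = r * (d_in + d_out)  # 65,536 trainable adapter weights
print(f"trainable fraction: {lora / full:.2%}")  # ~1.56%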

3. Fine-tuning with SFTTrainer
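
The trainer expects a dataset exposing a plain-text column. A minimal way to obtain one with the Hugging Face datasets library (the dataset name here is purely illustrative):

from datasets import load_dataset

# Any dataset with a "text" column works; the column name must match
# dataset_text_field below. "imdb" is only an example.
dataset = load_dataset("imdb", split="train")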

from unsloth_candle import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = SFTConfig(
        max_steps = 60,
        learning_rate = 2e-4,
        logging_steps = 1,
    ),
)
trainer.train()

4. Save & Export

# Save as merged HF weights
model.save_pretrained_merged("output_hf", tokenizer)

# Save as GGUF (for Ollama/llama.cpp)
model.save_pretrained_gguf("output_gguf", tokenizer, quantization_type="q4_k_m")
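
Because save_pretrained_merged writes a standard Hugging Face checkpoint layout, the merged model should be loadable with stock transformers APIs (a sketch under that assumption):

# Reload the merged weights with the usual transformers entry points.
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_model = AutoModelForCausalLM.from_pretrained("output_hf")
merged_tokenizer = AutoTokenizer.from_pretrained("output_hf")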

🗺️ Model Catalog

Model         Architecture         4-bit  LoRA  DoRA
Llama 3.2     LlamaForCausalLM     ✅     ✅    ✅
Mistral Nemo  MistralForCausalLM   ✅     ✅    ✅
Qwen 2.5      Qwen2ForCausalLM     ✅     ✅    ✅
DeepSeek V3   DeepSeekV3 (MLA)     ✅     ✅    ✅
Gemma 3       Gemma3 (GeGLU)       ✅     ✅    ✅
Phi 4         Phi4                 ✅     ✅    ✅

📜 License

Licensed under the Apache License, Version 2.0.


Built with 💖 by the Unsloth Community and Antigravity.

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distribution

unsloth_candle_cuda-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl (4.3 MB)

Uploaded: CPython 3.13, manylinux (glibc 2.39+), x86-64

File details

Details for the file unsloth_candle_cuda-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl.

File hashes

Hashes for unsloth_candle_cuda-2026.4.1-cp313-cp313-manylinux_2_39_x86_64.whl
Algorithm    Hash digest
SHA256       bd2bfff744970822df488f65d899615462984e9ac22befd170ef4c81468e80e5
MD5          310bc49c21cb30d24b359bebb15202d5
BLAKE2b-256  1ce9b54851bd1345d45c2aab43e04b628598aace699251999302a9825391d2b4
