Package for fine-tuning, running and exporting Large Language Models with Unsloth.

Sinapsis Unsloth

Templates for optimized LLM fine-tuning and deployment.

🐍 Installation • 🚀 Features • 📚 Usage example • 📙 Documentation • 🔍 License

The sinapsis-unsloth module provides ready-to-use templates for continued pretraining, instruct fine-tuning, conversational fine-tuning, inference, and model export to GGUF, merged, and quantized formats using Unsloth.

🐍 Installation

Install using your package manager of choice. We recommend uv for faster installations.

Standard Installation (Pre-built Wheels)

This method installs a pre-built flash-attn wheel first, which skips the long compilation step.

Supported: Linux (x86_64), Python 3.10 - 3.12, CUDA 12.x

Using uv:

# Install Flash Attention (Example for Python 3.10 + CUDA 12.4)
uv pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.5.4/flash_attn-2.8.3+cu124torch2.9-cp310-cp310-linux_x86_64.whl

# Install Sinapsis Unsloth
uv pip install "sinapsis-unsloth[all]" --extra-index-url https://pypi.sinapsis.tech

Using raw pip:

# Install Flash Attention (Example for Python 3.10 + CUDA 12.4)
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.5.4/flash_attn-2.8.3+cu124torch2.9-cp310-cp310-linux_x86_64.whl

# Install Sinapsis Unsloth
pip install "sinapsis-unsloth[all]" --extra-index-url https://pypi.sinapsis.tech
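
The wheel URLs above are specific to Python 3.10 and CUDA 12.4. As a rough sketch, the helper below rebuilds the URL from its parts so the tags can be swapped for another supported combination. It assumes the file-name pattern seen in the example URL holds for other builds, which is not guaranteed. The function names and parameters are hypothetical, so confirm that the resulting file exists on the releases page before installing.

```python
import sys

# Base URL and version numbers copied from the example command above.
RELEASE_BASE = (
    "https://github.com/mjun0812/flash-attention-prebuild-wheels"
    "/releases/download/v0.5.4"
)


def flash_attn_wheel_url(
    python_tag: str = "cp310",
    cuda_tag: str = "cu124",
    torch_tag: str = "torch2.9",
    flash_attn_version: str = "2.8.3",
) -> str:
    """Build a wheel URL, ASSUMING the naming pattern of the example URL."""
    filename = (
        f"flash_attn-{flash_attn_version}+{cuda_tag}{torch_tag}"
        f"-{python_tag}-{python_tag}-linux_x86_64.whl"
    )
    return f"{RELEASE_BASE}/{filename}"


def current_python_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    # Print the candidate URL for the interpreter running this script.
    print(flash_attn_wheel_url(python_tag=current_python_tag()))
```

With the default arguments the function reproduces the URL used in the commands above; pass a different python_tag (for example the value from current_python_tag()) to get a candidate URL for another Python version.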

Manual Build (From Source)

Use this if you are on an unsupported platform (e.g., Windows, non-standard CUDA versions) or need to compile flash-attn yourself.

Using uv:

export MAX_JOBS=4 # Limits parallel compile jobs; lower it if the build runs out of RAM
uv pip install torch packaging ninja setuptools
uv pip install "sinapsis-unsloth[all]" --extra-index-url https://pypi.sinapsis.tech

Using raw pip:

export MAX_JOBS=4 # Limits parallel compile jobs; lower it if the build runs out of RAM
pip install torch packaging ninja setuptools
pip install "sinapsis-unsloth[all]" --extra-index-url https://pypi.sinapsis.tech

🚀 Features

The templates expose Unsloth's capabilities for efficient LLM fine-tuning, inference, and model export, including:

  • Optimized Training: 4-bit (QLoRA), 8-bit, 16-bit, and full-precision fine-tuning
  • Hardware Efficiency: Reduced GPU memory usage with Unsloth's optimized kernels
  • Flexible Export: GGUF quantization and Merged model export options for deployment
  • High-Performance Inference: Native 4-bit inference with dynamic chat templating and streaming

Templates Supported

Training

  • UnslothPretrainer: Designed for continued pretraining (domain adaptation) on raw text. Features efficient sequence packing and a separate learning rate for embeddings (embedding_learning_rate).

  • UnslothInstructTrainer: Optimized for instruction fine-tuning. Processes standard instruction-input-response triplets with configurable preambles and dynamic formatting (handling optional inputs gracefully).

  • UnslothConversationTrainer: Specialized for conversational AI fine-tuning with chat datasets. Supports both ShareGPT and Alpaca formats (with auto-conversion), handles dynamic chat templating, and supports response-only loss masking.

Inference

  • UnslothInferenceCompletion: Raw text completion template for base models or custom formatting needs.

  • UnslothInferenceInstruct: Streamlined inference for instruction-tuned models using standard task preambles.

  • UnslothInferenceConversational: Manages multi-turn chat history, system prompts, and dynamic chat template application for conversational models.

  • UnslothInferenceReasoning: Extends conversational inference to support Chain-of-Thought (CoT) models (e.g., DeepSeek-R1), handling the extraction of internal reasoning traces.

Export

  • UnslothExportGGUF: Exports models to GGUF format for efficient CPU/edge inference (e.g., with llama.cpp). Supports configurable quantization methods (q4_k_m, q8_0, etc.).

  • UnslothExportMerged: Merges LoRA adapters back into the base model (16-bit or 4-bit) for deployment on vLLM, or pushes directly to the Hugging Face Hub.

🌍 General Attributes

The model_args attribute controls how Unsloth loads and configures the model.

  • model_name (str, required): Model ID or local path.
  • cache_dir (str): Cache directory. Default: SINAPSIS_CACHE_DIR.
  • max_seq_length (int): Maximum sequence length. Default: 2048.
  • dtype ("auto" | "bfloat16" | "float16"): Weight precision. Default: "auto".
  • load_in_4bit (bool): Enable 4-bit quantization. Default: True.
  • load_in_8bit (bool): Enable 8-bit quantization. Default: False.
  • load_in_16bit (bool): Load weights in 16-bit precision. Default: False.
  • full_finetuning (bool): Enable full fine-tuning. Default: False.
  • device_map (str): Device placement strategy. Default: "sequential".
  • use_gradient_checkpointing (str): Checkpointing mode. Default: "unsloth".
  • fast_inference (bool): Enable optimized inference. Default: False.
  • gpu_memory_utilization (float): Max GPU memory fraction. Default: 0.5.
  • random_state (int): Random seed. Default: 3407.
  • max_lora_rank (int): Maximum LoRA rank. Default: 64.
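
To make the defaults above easier to scan, here is a minimal Python sketch that mirrors them in a dataclass. The ModelArgsSketch class and its loading_mode helper are hypothetical and not part of sinapsis-unsloth. The rule that at most one loading flag should be enabled is an assumption of this sketch, not documented behavior. cache_dir is left out because its default comes from SINAPSIS_CACHE_DIR.

```python
from dataclasses import dataclass


@dataclass
class ModelArgsSketch:
    """Illustrative mirror of the documented model_args defaults."""

    model_name: str  # required: model ID or local path
    max_seq_length: int = 2048
    dtype: str = "auto"  # "auto" | "bfloat16" | "float16"
    load_in_4bit: bool = True
    load_in_8bit: bool = False
    load_in_16bit: bool = False
    full_finetuning: bool = False
    device_map: str = "sequential"
    use_gradient_checkpointing: str = "unsloth"
    fast_inference: bool = False
    gpu_memory_utilization: float = 0.5
    random_state: int = 3407
    max_lora_rank: int = 64

    def loading_mode(self) -> str:
        """Name the enabled loading flag, or 'unspecified' if none is set.

        ASSUMPTION: the flags are treated as mutually exclusive here.
        """
        flags = {
            "4bit": self.load_in_4bit,
            "8bit": self.load_in_8bit,
            "16bit": self.load_in_16bit,
            "full": self.full_finetuning,
        }
        enabled = [name for name, on in flags.items() if on]
        if len(enabled) > 1:
            raise ValueError(f"conflicting loading flags: {enabled}")
        return enabled[0] if enabled else "unspecified"


# With only model_name given, the documented defaults select 4-bit loading.
args = ModelArgsSketch(model_name="unsloth/DeepSeek-R1-Distill-Qwen-1.5B")
print(args.loading_mode())
```

Setting load_in_4bit to False without enabling another flag yields "unspecified" in this sketch; how the real template treats that case is not covered here.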

⚙️ Fine-tuning Attributes

These attributes apply to all fine-tuning templates:

  • lora_args (UnslothLoraArgs): LoRA configuration (rank, alpha, dropout, target modules, gradient checkpointing).
  • trainer_args (UnslothTrainerArgs): Trainer options (text field, packing, sequence length, loss type).
  • training_args (UnslothTrainingArgs): Hugging Face training parameters (batch size, learning rate, logging, saving).
  • train_dataset (DatasetConfig): Dataset configuration, including:
    • loader_args (source and loading parameters)
    • map_args (preprocessing)
    • shuffle (shuffling behavior)
    • pre_tokenize (tokenization options)
  • resume_from_checkpoint (bool): Resume training from the last checkpoint.
  • save_path (str): Directory where fine-tuned adapters or weights will be saved.

🧠 Inference Attributes

These attributes configure Unsloth-based inference templates.

  • rag_context_key (str | None): Metadata key used to retrieve optional RAG context.
  • generate_args (UnslothGenerateArgs): Token generation settings (sampling, length, stopping, temperature, penalties).
  • stream (bool): Enables token-by-token console streaming during generation.

📦 Export Attributes

These attributes configure Unsloth-based model export templates.

  • export_args (UnslothExportBaseArgs): Export parameters such as save path, shard size, and memory limits.
  • push_to_hub (bool): Enables pushing the exported model to the Hugging Face Hub.

Tip: Use the CLI command sinapsis info --all-template-names to list all Template names installed with Sinapsis Unsloth.

Tip: Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.

For example, for UnslothPretrainer use sinapsis info --example-template-config UnslothPretrainer to produce the following example config:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: UnslothPretrainer
  class_name: UnslothPretrainer
  template_input: InputTemplate
  attributes:
    model_args:
      model_name: '`replace_me:<class ''str''>`'
      cache_dir: /path/to/sinapsis/.cache
      max_seq_length: 2048
      dtype: auto
      load_in_4bit: true
      load_in_8bit: false
      load_in_16bit: false
      full_finetuning: false
      device_map: sequential
      use_gradient_checkpointing: unsloth
      fast_inference: false
      gpu_memory_utilization: 0.5
      random_state: 3407
      max_lora_rank: 64
    lora_args:
      r: 16
      target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
      - gate_proj
      - up_proj
      - down_proj
      lora_alpha: 16
      lora_dropout: 0.0
      bias: none
      use_gradient_checkpointing: unsloth
      random_state: 3407
      use_rslora: false
      modules_to_save: null
      loftq_config: '`replace_me:<class ''dict''>`'
    trainer_args:
      dataset_text_field: text
      dataset_num_proc: null
      max_length: 1024
      packing: false
      packing_strategy: bfd
      eval_packing: false
      completion_only_loss: null
      assistant_only_loss: false
      loss_type: nll
      activation_offloading: false
    training_args:
      output_dir: trainer_output
      overwrite_output_dir: false
      eval_strategy: 'no'
      eval_steps: null
      per_device_train_batch_size: 8
      per_device_eval_batch_size: 8
      gradient_accumulation_steps: 1
      eval_accumulation_steps: null
      torch_empty_cache_steps: null
      learning_rate: 5.0e-05
      weight_decay: 0.0
      max_grad_norm: 1.0
      num_train_epochs: 3.0
      max_steps: null
      lr_scheduler_type: linear
      warmup_ratio: 0.0
      warmup_steps: null
      logging_strategy: steps
      logging_first_step: false
      logging_steps: 500
      save_strategy: steps
      save_steps: 500
      save_only_model: false
      use_cpu: false
      seed: 3407
      data_seed: null
      bf16: false
      fp16: false
      dataloader_drop_last: false
      dataloader_num_workers: 0
      remove_unused_columns: true
      load_best_model_at_end: false
      metric_for_best_model: loss
      optim: adamw_torch
      report_to: none
      push_to_hub: false
      hub_model_id: null
      embedding_learning_rate: 5.0e-05
    train_dataset:
      loader_args:
        path: '`replace_me:<class ''str''>`'
        name: null
        data_dir: null
        data_files: null
        split: null
        cache_dir: /path/to/sinapsis/.cache
        features: null
        num_proc: null
      map_args:
        desc: null
        batched: false
        batch_size: 1000
        num_proc: 0
        keep_in_memory: false
        load_from_cache_file: true
      shuffle:
        enabled: false
        args:
          seed: null
          keep_in_memory: false
          load_from_cache_file: true
      pre_tokenize:
        enabled: false
        args:
          add_special_tokens: true
          padding: do_not_pad
          truncation: do_not_truncate
          max_length: null
          stride: 0
          is_split_into_words: false
          padding_side: null
          verbose: true
        map_args:
          desc: null
          batched: false
          batch_size: 1000
          num_proc: 0
          keep_in_memory: false
          load_from_cache_file: true
    resume_from_checkpoint: false
    save_path: '`replace_me:<class ''str''>`'
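
The generated config marks values you are expected to supply with replace_me placeholders (model_name, loftq_config, the dataset path, and save_path above). As a minimal sketch, the helper below walks an already-parsed config (a nested dict, however you loaded the YAML) and lists the keys that still hold a placeholder. find_placeholders is a hypothetical helper, not part of Sinapsis, and the attributes dict is a trimmed-down stand-in for the full config.

```python
from typing import Any, Iterator

# Prefix used by the generated example config for values left to the user.
PLACEHOLDER_PREFIX = "`replace_me:"


def find_placeholders(node: Any, path: str = "") -> Iterator[str]:
    """Yield dotted paths of values that still hold a replace_me placeholder."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_placeholders(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(PLACEHOLDER_PREFIX):
        yield path


# Trimmed-down stand-in for the parsed attributes of UnslothPretrainer.
attributes = {
    "model_args": {
        "model_name": "`replace_me:<class 'str'>`",
        "max_seq_length": 2048,
    },
    "lora_args": {"r": 16, "loftq_config": "`replace_me:<class 'dict'>`"},
    "train_dataset": {"loader_args": {"path": "`replace_me:<class 'str'>`"}},
    "save_path": "`replace_me:<class 'str'>`",
}

missing = list(find_placeholders(attributes))
print(missing)
```

Running a check like this before launching an agent catches placeholders that were left in by mistake. Whether a given placeholder must be filled or can be dropped from the config (loftq_config, for instance) is not stated here.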

📚 Usage example

The following agent exports the unsloth/DeepSeek-R1-Distill-Qwen-1.5B model to GGUF format without quantization and saves it to the artifacts/DeepSeek-R1-Distill-Qwen-1.5B-gguf path.

Config

agent:
  name: model_export_agent
  description: Agent to handle model export and conversion workflows

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: UnslothExportGGUF
  class_name: UnslothExportGGUF
  template_input: InputTemplate
  attributes:
    model_args:
      model_name: unsloth/DeepSeek-R1-Distill-Qwen-1.5B
      dtype: "bfloat16"
      load_in_4bit: false
      gpu_memory_utilization: 1
    export_args:
      save_path: artifacts/DeepSeek-R1-Distill-Qwen-1.5B-gguf
      maximum_memory_usage: 1
      quantization_method: not_quantized
    push_to_hub: false

You can find additional fine-tuning agent configurations in the configs directory.

📙 Documentation

Documentation for this and other sinapsis packages is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
