# Mimo LLM Project - Fine-tuning and GGUF Export

This project provides the scripts and instructions to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using QLoRA on a macOS system with limited RAM (~8GB), convert it to the GGUF format, and make it ready for use with Ollama and LM Studio.
## Objective

- Base Model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` from Hugging Face.
- Fine-tuning Method: QLoRA for efficient adaptation on low-resource hardware.
- Dataset: `yahma/alpaca-cleaned` (public text dataset).
- RAM Optimization: Tuned for ~8GB RAM.
- Output Format: GGUF (quantized 4-bit or 8-bit).
- Final Model Name: Mimo
- Attribution: "Créé par ABDESSEMED Mohamed Redha" ("Created by ABDESSEMED Mohamed Redha")
- Compatibility: Ollama and LM Studio.
## Setup Instructions

- Create a Python virtual environment (recommended):

  ```
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

  Note: Ensure you have the correct PyTorch build installed for your macOS system (CPU, or Metal/MPS for Apple Silicon). Refer to the official PyTorch website for installation instructions.

  For CPU-only:

  ```
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
  ```

  For Apple Silicon (MPS), install the default macOS wheels, which include Metal support (the CPU index URL above would install a build without MPS):

  ```
  pip install torch torchvision torchaudio
  ```

  You can verify which backend your install supports with the snippet below.
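A quick sanity check using plain PyTorch APIs (no project-specific code) confirms the environment is set up as expected:

```
# Quick check that the installed PyTorch build exposes the expected backend.
import torch

print("PyTorch:", torch.__version__)
print("MPS available:", torch.backends.mps.is_available())  # Metal on Apple Silicon

# Training and inference fall back to CPU when MPS is unavailable.
device = "mps" if torch.backends.mps.is_available() else "cpu"
print("Selected device:", device)
```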
## Fine-tuning (QLoRA)

The `train_qlora.py` script handles the fine-tuning process. It loads the base model, applies 4-bit quantization and QLoRA, and trains on the specified dataset (see the sketch after this list).

- To start training:

  ```
  python train_qlora.py
  ```

- Output: The fine-tuned model adapters will be saved in the `outputs/mimo-qlora` directory.
- RAM Optimization: The script is configured with `per_device_train_batch_size=1` and `gradient_accumulation_steps=8` to manage memory usage. `max_steps` is set to 100 for a quick example; adjust as needed for longer training. `gradient_checkpointing=True` is also enabled for further memory savings.
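For orientation, here is a minimal sketch of this kind of setup, assuming standard `transformers`, `peft`, and `datasets` APIs. The shipped `train_qlora.py` is the authoritative implementation; the LoRA hyperparameters and prompt template below are illustrative assumptions, and 4-bit loading via bitsandbytes requires a supported backend.

```
# Hedged sketch of a QLoRA setup matching the settings described above;
# hyperparameters and prompt formatting are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)

# Small trainable low-rank adapters on top of the quantized model.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         task_type="CAUSAL_LM"))

# Flatten Alpaca-style records (instruction/input/output) into one training string.
def tokenize(ex):
    text = (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Input:\n{ex['input']}\n\n### Response:\n{ex['output']}")
    return tokenizer(text, truncation=True, max_length=512)

ds = load_dataset("yahma/alpaca-cleaned", split="train")
ds = ds.map(tokenize, remove_columns=ds.column_names)

# The memory-saving settings called out in the list above.
args = TrainingArguments(output_dir="outputs/mimo-qlora",
                         per_device_train_batch_size=1,
                         gradient_accumulation_steps=8,
                         gradient_checkpointing=True,
                         max_steps=100)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
model.save_pretrained("outputs/mimo-qlora")  # saves the adapters only
```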
## Conversion to GGUF

The `export_to_gguf.py` script first merges the QLoRA adapters into the base model and saves it in Hugging Face format (a sketch of this merge step appears at the end of this section). It then prints instructions for converting the merged model into the GGUF format using the llama.cpp conversion tools.

- Run the export script:

  ```
  python export_to_gguf.py
  ```

  This will save the merged Hugging Face model in `gguf_model/merged_hf_model/` and print instructions for the GGUF conversion.

- Convert to GGUF using llama.cpp. Follow these steps after running `export_to_gguf.py`:

  - Optionally install the Python bindings (`pip install llama-cpp-python`); the conversion itself uses scripts from the llama.cpp repository.
  - Navigate to your `llama.cpp` directory (clone it from GitHub if you don't have it).
  - Run the conversion script from `llama.cpp`, pointing it at your saved Hugging Face model directory. In recent llama.cpp versions the script is `convert_hf_to_gguf.py`, and it supports `f32`, `f16`, and `q8_0` output types; 4-bit quantization such as `q4_0` is a separate step using the `llama-quantize` tool (named `quantize` in older builds).

  Example commands:

  ```
  # Assuming you are in the llama.cpp directory and your merged model is at
  # /Users/mohamed/Downloads/mac_ai_project/gguf_model/merged_hf_model
  python convert_hf_to_gguf.py /Users/mohamed/Downloads/mac_ai_project/gguf_model/merged_hf_model \
      --outfile /Users/mohamed/Downloads/mac_ai_project/gguf_model/Mimo-f16.gguf --outtype f16

  # Then quantize to 4-bit (q4_0)
  ./llama-quantize /Users/mohamed/Downloads/mac_ai_project/gguf_model/Mimo-f16.gguf \
      /Users/mohamed/Downloads/mac_ai_project/gguf_model/Mimo.gguf q4_0
  ```

  You can choose different quantization types: `q8_0` for 8-bit, `f16` for float16, etc. `q4_0` is a good balance for 4-bit.

- Output: The final GGUF model will be saved as `gguf_model/Mimo.gguf`.
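The merge step itself typically looks like the following minimal sketch, assuming the standard `peft` merge API and the adapter path from the training step above; `export_to_gguf.py` remains the authoritative implementation:

```
# Hedged sketch of the adapter-merge step using standard peft/transformers APIs.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Load the full-precision base model, then attach and fold in the LoRA weights.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "outputs/mimo-qlora").merge_and_unload()

# Save in Hugging Face format, ready for the llama.cpp converter.
merged.save_pretrained("gguf_model/merged_hf_model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("gguf_model/merged_hf_model")
```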
## Usage

### Ollama

- Create a `Modelfile`: Create a file named `Modelfile` (no extension) in the same directory as your `Mimo.gguf` file with the following content:

  ```
  FROM ./Mimo.gguf
  TEMPLATE """{{ .System }}
  {{- if .Prompt }}
  USER: {{ .Prompt }}
  ASSISTANT: {{ .Response }}
  {{- end }}"""
  PARAMETER stop "USER:"
  PARAMETER stop "ASSISTANT:"
  PARAMETER temperature 0.7
  PARAMETER top_k 40
  PARAMETER top_p 0.9
  PARAMETER num_ctx 2048
  PARAMETER repeat_penalty 1.1
  ```

  Adjust `num_ctx` and other parameters as needed.

- Import into Ollama: Navigate to the directory containing `Mimo.gguf` and your `Modelfile` in your terminal, then run:

  ```
  ollama create mimo -f ./Modelfile
  ```

  You can then interact with the model using `ollama run mimo`, or query it programmatically as sketched below.
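As a minimal sketch, the imported model can also be queried through Ollama's local REST API (default port 11434), assuming the `ollama create mimo` step above succeeded and the Ollama server is running:

```
# Query the imported model through Ollama's local REST API.
import json
import urllib.request

payload = {"model": "mimo", "prompt": "Who created you?", "stream": False}
req = urllib.request.Request("http://localhost:11434/api/generate",
                             data=json.dumps(payload).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```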
### LM Studio

- Open LM Studio.
- Go to the "Local Server" tab or the "AI Models" tab.
- Click the folder icon to browse for models.
- Navigate to the `gguf_model/` directory and select `Mimo.gguf`.
- The model should load, and you can start chatting (or query the local server programmatically, as sketched below).
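If you use the Local Server tab, LM Studio exposes an OpenAI-compatible endpoint. A minimal sketch, assuming the default port 1234 and that the loaded model is listed as "Mimo" (both are assumptions; check your LM Studio settings):

```
# Query LM Studio's OpenAI-compatible local server (assumed default port 1234).
import json
import urllib.request

payload = {
    "model": "Mimo",  # assumed identifier; use the name LM Studio shows for the loaded model
    "messages": [{"role": "user", "content": "Who created you?"}],
    "temperature": 0.7,
}
req = urllib.request.Request("http://localhost:1234/v1/chat/completions",
                             data=json.dumps(payload).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```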
## Attribution
This model, Mimo, was created by ABDESSEMED Mohamed Redha.
Modèle Mimo — Créé par ABDESSEMED Mohamed Redha