SingLoRA - Pytorch
SingLoRA: A Minimal Implementation
This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper "SingLoRA: Low Rank Adaptation Using a Single Matrix" by Bensaïd et al.
Overview
SingLoRA is a parameter-efficient fine-tuning method that simplifies the LoRA architecture by using a single trainable matrix instead of two. This implementation demonstrates how to apply SingLoRA to transformer models using PyTorch and the Hugging Face Transformers library.
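To make the "single matrix" idea concrete, here is a minimal sketch of one adapted layer. It assumes the update takes the form W0 + (alpha / r) * u(t) * A A^T with u(t) = min(t / T, 1), and it only handles square weight matrices. The class name `SingleMatrixLinearSketch`, the `advance_step` method, and the Kaiming initialization of A are illustrative assumptions, not this package's actual code.

```python
import math

import torch
import torch.nn as nn


class SingleMatrixLinearSketch(nn.Module):
    """Hypothetical wrapper around a frozen square nn.Linear.

    Effective weight: W0 + (alpha / r) * u(t) * A @ A.T, with u(t) = min(t / T, 1).
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0,
                 ramp_up_steps: int = 1000):
        super().__init__()
        if base.in_features != base.out_features:
            raise ValueError("this sketch only handles square weight matrices")
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight and bias
        self.rank = rank
        self.alpha = alpha
        self.ramp_up_steps = ramp_up_steps
        # The single trainable matrix A, shape (d, r). Initialization is an assumption.
        self.A = nn.Parameter(torch.empty(base.in_features, rank))
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        # Training step counter t, kept as a buffer so it is saved with the module.
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def ramp(self) -> float:
        # u(t) = min(t / T, 1)
        return min(self.step.item() / self.ramp_up_steps, 1.0)

    def advance_step(self) -> None:
        # Call once per optimizer step to move along the ramp.
        self.step += 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (self.alpha / self.rank) * self.ramp() * (self.A @ self.A.T)
        return nn.functional.linear(x, self.base.weight + delta, self.base.bias)


layer = SingleMatrixLinearSketch(nn.Linear(16, 16), rank=4, alpha=4.0, ramp_up_steps=10)
x = torch.randn(2, 16)
y0 = layer(x)  # at t = 0 the ramp is 0, so this matches the frozen layer's output
```

Because the ramp starts at zero, the wrapped layer reproduces the pretrained layer at the first step even though A is initialized to non-zero values.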
Features
- Simple, self-contained implementation in a single Python file
- Compatible with Hugging Face Transformers models
- Includes a working example with DistilBERT
- Demonstrates parameter reduction compared to full fine-tuning
Installation
pip install -r requirements.txt
Usage
Basic Example
Here's a simple example of how to apply SingLoRA to a transformer model:
import torch
from singlora import apply_singlora_to_model
from transformers import AutoModelForSequenceClassification

# Load your model
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Apply SingLoRA
apply_singlora_to_model(
    model=model,
    rank=8,  # Low-rank dimension (r in the paper)
    alpha=8.0,  # Scaling factor
    ramp_up_steps=1000,  # Steps for ramp-up function u(t)
    target_modules=["q_lin", "k_lin", "v_lin"],  # Target attention layers
)

# Now only the SingLoRA parameters are trainable
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3,
)
Configuration Parameters
- rank: The dimension of the low-rank adaptation (r). Lower values mean fewer parameters.
- alpha: Scaling factor for the adaptation. Higher values allow larger updates.
- ramp_up_steps: Number of steps (T) for the ramp-up function u(t) = min(t/T, 1).
- target_modules: List of layer names to apply SingLoRA to. Common targets:
  - ["query", "key", "value"] for standard transformers
  - ["q_lin", "k_lin", "v_lin"] for DistilBERT
  - ["q_proj", "k_proj", "v_proj"] for LLaMA models
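If you are unsure which names to pass as target_modules for a given model, one option is to list the linear layers it contains. The helper below is a hypothetical sketch (`linear_layer_names` is not part of this package), and it assumes target names correspond to the last component of each submodule's dotted name. A toy module stands in for a real transformer so the example runs without downloading weights.

```python
from collections import Counter

import torch.nn as nn


def linear_layer_names(model: nn.Module) -> Counter:
    """Count nn.Linear submodules by the last component of their dotted name."""
    names = Counter()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            names[full_name.split(".")[-1]] += 1
    return names


class ToyAttention(nn.Module):
    """Toy stand-in that borrows DistilBERT-style attention layer names."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.q_lin = nn.Linear(dim, dim)
        self.k_lin = nn.Linear(dim, dim)
        self.v_lin = nn.Linear(dim, dim)
        self.out_lin = nn.Linear(dim, dim)


toy_model = nn.ModuleList([ToyAttention() for _ in range(2)])
for name, count in linear_layer_names(toy_model).items():
    print(f"{name}: {count} layer(s)")
```

Passing a loaded Hugging Face model instead of `toy_model` lists its candidate layer names in the same way.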
Parameter Efficiency
SingLoRA significantly reduces the number of trainable parameters compared to full fine-tuning:
# Example parameter counts
# original_model is assumed to be an unmodified copy of the base model,
# loaded before SingLoRA is applied; model is the adapted model.
original_params = sum(p.numel() for p in original_model.parameters() if p.requires_grad)
singlora_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
reduction = 100 * (1 - singlora_params / original_params)
print(f"Parameter reduction: {reduction:.2f}%")
For a complete working example, see example.py in the repository.
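As a rough back-of-the-envelope check, the trainable weights of a single square d x d layer can be counted by hand: full fine-tuning trains d * d weights, standard LoRA trains two d x r matrices (2 * d * r), and a single-matrix adapter trains one (d * r). The sketch below works this out for d = 768 (DistilBERT's hidden size) and r = 8. Biases are ignored, and the function name is illustrative.

```python
def per_layer_trainable_params(d: int, rank: int) -> dict:
    """Trainable weight counts for one square d x d linear layer (biases ignored)."""
    return {
        "full": d * d,             # every weight is trained
        "lora": 2 * d * rank,      # two low-rank matrices of shape (d, r) and (r, d)
        "single_matrix": d * rank, # one low-rank matrix of shape (d, r)
    }


counts = per_layer_trainable_params(d=768, rank=8)
for name, value in counts.items():
    reduction = 100 * (1 - value / counts["full"])
    print(f"{name:>13}: {value:>7,d} params ({reduction:.2f}% fewer than full)")
```

For this layer the single-matrix adapter trains 6,144 weights against 589,824 for full fine-tuning, a reduction of about 98.96%, and half as many as the 12,288 trained by rank-8 LoRA.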
LLaMA Example
Here's how to apply SingLoRA to LLaMA models:
from singlora import apply_singlora_to_model
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

# Load LLaMA model and tokenizer
model_name = "meta-llama/Llama-2-7b-hf"  # or your local path
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use float16 for efficiency
    device_map="auto",  # Automatically handle model placement
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Apply SingLoRA to attention layers
apply_singlora_to_model(
    model=model,
    rank=16,  # Can use larger rank for bigger models
    alpha=16.0,  # Increased alpha for stronger adaptation
    ramp_up_steps=2000,  # More steps for larger datasets
    target_modules=[  # LLaMA-specific attention layer names
        "q_proj",
        "k_proj",
        "v_proj",
    ],
)

# Example training setup
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-4,  # Lower learning rate for LLaMA
)

# Example inference
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Key differences for LLaMA models:
- Use LlamaForCausalLM instead of standard transformer models
- Target the LLaMA-specific projection layers (q_proj, k_proj, v_proj)
- Consider using float16 for memory efficiency
- Adjust hyperparameters (rank, alpha, learning rate) for larger models
- Use device_map="auto" for automatic model sharding on multiple GPUs
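To put the float16 suggestion in numbers, the sketch below estimates the memory needed just to hold the model weights at different precisions. The 7 billion parameter count is a nominal round figure assumed for illustration, not an exact count for any checkpoint, and the estimate excludes activations, gradients, and optimizer state.

```python
# Bytes used to store one element of each dtype.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


NOMINAL_PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gb(NOMINAL_PARAMS, dtype):.1f} GB")
```

Halving the bytes per weight halves the footprint of the frozen base model, which is where most of the memory goes when only the adapter matrices are trained.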
Citation
If you use this implementation in your research, please cite the original paper:
@misc{bensaïd2025singloralowrankadaptation,
  title={SingLoRA: Low Rank Adaptation Using a Single Matrix},
  author={David Bensaïd and Noam Rotstein and Roy Velich and Daniel Bensaïd and Ron Kimmel},
  year={2025},
  eprint={2507.05566},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.05566},
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file singlora-0.0.1.tar.gz.
File metadata
- Download URL: singlora-0.0.1.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bfa95e4a32b68f8e072db8075c49811e36a84898d9ef1c0cbcd710b481fc5057 |
| MD5 | c335cc6ebbc5019d591a1eb0d7abf7ed |
| BLAKE2b-256 | 6a0a147907f6f9e896455065e99b9975649b2cbce3dc847dddcba8a377ed6080 |
File details
Details for the file singlora-0.0.1-py3-none-any.whl.
File metadata
- Download URL: singlora-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 937b355f4e89ecab4ac9f1aaf120b0aa61290bf4b131078fd5a2ef91278b297e |
| MD5 | 88ed21b076af0054429715b215e41195 |
| BLAKE2b-256 | 3712297007d7c4ba8f8c8101648fa6163bb2ef62fb7585883c4871040a21361e |