
SingLoRA - Pytorch


SingLoRA: A Minimal Implementation

This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper "SingLoRA: Low Rank Adaptation Using a Single Matrix" by Bensaïd et al.

Overview

SingLoRA is a parameter-efficient fine-tuning method that simplifies the LoRA architecture by using a single trainable matrix instead of two. This implementation demonstrates how to apply SingLoRA to transformer models using PyTorch and the Hugging Face Transformers library.
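As a rough illustration of the single-matrix idea, the sketch below wraps a square `nn.Linear` and adds an update of the form `(alpha / rank) * u(t) * A @ A.T`. The class name `SingLoRALinearSketch`, the initialization of `A`, the `alpha / rank` scaling, and the step counter kept inside the layer are all assumptions made for this example. They are not taken from this package's source, and the sketch does not handle non-square weights.

```python
import torch
import torch.nn as nn


class SingLoRALinearSketch(nn.Module):
    """Hypothetical wrapper: frozen square nn.Linear plus a single trainable matrix A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0,
                 ramp_up_steps: int = 1000):
        super().__init__()
        if base.in_features != base.out_features:
            raise ValueError("this sketch only covers square weight matrices")
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.rank = rank
        self.alpha = alpha
        self.ramp_up_steps = ramp_up_steps
        # The only trainable tensor: A has shape (d, r). Init scale is arbitrary here.
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        # Training-step counter t used by the ramp-up function u(t).
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def ramp(self) -> float:
        # u(t) = min(t / T, 1)
        return min(self.step.item() / self.ramp_up_steps, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            self.step += 1  # assumption: one forward pass per optimizer step
        scale = (self.alpha / self.rank) * self.ramp()
        # (x @ A) @ A.T avoids materializing the d x d matrix A @ A.T.
        delta = (x @ self.A) @ self.A.t()
        return self.base(x) + scale * delta


torch.manual_seed(0)
base = nn.Linear(16, 16)
layer = SingLoRALinearSketch(base, rank=4, alpha=4.0, ramp_up_steps=10)
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)
```

Because `u(0) = 0`, the wrapped layer initially reproduces the pretrained layer's output, and the update is phased in over the first `ramp_up_steps` training steps.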

Features

  • Simple, self-contained implementation in a single Python file
  • Compatible with Hugging Face Transformers models
  • Includes a working example with DistilBERT
  • Demonstrates parameter reduction compared to full fine-tuning

Installation

pip install singlora

# Or, from a clone of the repository:
pip install -r requirements.txt

Usage

Basic Example

Here's a simple example of how to apply SingLoRA to a transformer model:

import torch
from singlora import apply_singlora_to_model
from transformers import AutoModelForSequenceClassification

# Load your model
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Apply SingLoRA
apply_singlora_to_model(
    model=model,
    rank=8,              # Low-rank dimension (r in the paper)
    alpha=8.0,           # Scaling factor
    ramp_up_steps=1000,  # Steps for ramp-up function u(t)
    target_modules=["q_lin", "k_lin", "v_lin"]  # Target attention layers
)

# Now only the SingLoRA parameters are trainable
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3
)

Configuration Parameters

  • rank: The dimension of the low-rank adaptation (r). Lower values mean fewer parameters.
  • alpha: Scaling factor for the adaptation. Higher values allow larger updates.
  • ramp_up_steps: Number of steps (T) for the ramp-up function u(t) = min(t/T, 1).
  • target_modules: List of layer names to apply SingLoRA to. Common targets:
    • ["query", "key", "value"] for BERT-style models (e.g. BERT, RoBERTa)
    • ["q_lin", "k_lin", "v_lin"] for DistilBERT
    • ["q_proj", "k_proj", "v_proj"] for LLaMA models
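The ramp-up schedule listed above is simple enough to write out directly. The sketch below implements `u(t) = min(t/T, 1)` as stated, plus a hypothetical `effective_scale` helper that combines it with `alpha / rank`. That combined scaling follows the usual LoRA convention and is an assumption here, not something this README specifies.

```python
def ramp_up(step: int, ramp_up_steps: int) -> float:
    """u(t) = min(t / T, 1): grows linearly from 0 to 1 over T steps, then stays at 1."""
    if ramp_up_steps <= 0:
        return 1.0  # no ramp-up requested (a choice made for this sketch)
    return min(step / ramp_up_steps, 1.0)


def effective_scale(step: int, rank: int, alpha: float, ramp_up_steps: int) -> float:
    """Assumed overall multiplier on the low-rank update: (alpha / rank) * u(t)."""
    return (alpha / rank) * ramp_up(step, ramp_up_steps)


for t in (0, 250, 1000, 5000):
    print(t, ramp_up(t, 1000), effective_scale(t, rank=8, alpha=8.0, ramp_up_steps=1000))
```

With `rank=8` and `alpha=8.0` the ratio `alpha / rank` is 1, so the effective scale equals `u(t)` itself.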

Parameter Efficiency

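Because only the SingLoRA matrices are trained, the number of trainable parameters is far lower than in full fine-tuning. To measure the reduction, compare the trainable-parameter counts of the model before and after applying SingLoRA: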

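# Example parameter counts
# original_model: the model as loaded, before apply_singlora_to_model (all weights trainable)
# model: the same architecture after apply_singlora_to_model (only SingLoRA weights trainable)
original_params = sum(p.numel() for p in original_model.parameters() if p.requires_grad)
singlora_params = sum(p.numel() for p in model.parameters() if p.requires_grad)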

reduction = 100 * (1 - singlora_params / original_params)
print(f"Parameter reduction: {reduction:.2f}%")

For a complete working example, see example.py in the repository.
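For a sense of scale, the per-layer arithmetic can be worked out by hand. The sketch below compares a full square weight matrix, a standard two-matrix LoRA adapter, and a single `d x r` matrix. The helper names are hypothetical, and the assumption that SingLoRA stores exactly one `d x r` matrix per targeted square layer is made for illustration. The width 768 is the hidden size of distilbert-base-uncased.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Weights in a dense d_out x d_in matrix (bias ignored)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Standard LoRA: B is d_out x r and A is r x d_in."""
    return rank * (d_in + d_out)


def singlora_param_count(d: int, rank: int) -> int:
    """Assumed SingLoRA storage for a square layer: one d x r matrix."""
    return d * rank


d, r = 768, 8
print("full:    ", full_param_count(d, d))
print("LoRA:    ", lora_param_count(d, d, r))
print("SingLoRA:", singlora_param_count(d, r))
```

Under these assumptions, the single matrix needs half the adapter parameters of two-matrix LoRA at the same rank.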

LLaMA Example

Here's how to apply SingLoRA to LLaMA models:

from singlora import apply_singlora_to_model
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

# Load LLaMA model and tokenizer
model_name = "meta-llama/Llama-2-7b-hf"  # or your local path
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use float16 for efficiency
    device_map="auto"           # Automatically handle model placement
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Apply SingLoRA to attention layers
apply_singlora_to_model(
    model=model,
    rank=16,              # Can use larger rank for bigger models
    alpha=16.0,           # Increased alpha for stronger adaptation
    ramp_up_steps=2000,   # More steps for larger datasets
    target_modules=[      # LLaMA-specific attention layer names
        "q_proj",
        "k_proj",
        "v_proj"
    ]
)

# Example training setup
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-4  # Lower learning rate for LLaMA
)

# Example inference
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Key differences for LLaMA models:

  • Load the model with LlamaForCausalLM rather than a sequence-classification class
  • Target the LLaMA-specific projection layers (q_proj, k_proj, v_proj)
  • Consider using float16 for memory efficiency
  • Adjust hyperparameters (rank, alpha, learning rate) for larger models
  • Use device_map="auto" for automatic model sharding on multiple GPUs
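Layer names differ between architectures, so it can help to list a model's linear layers before choosing `target_modules`. The sketch below counts the leaf names of all `nn.Linear` modules. `linear_leaf_names` and `ToyAttention` are hypothetical names for this example, and the toy model only stands in for a real checkpoint. How `apply_singlora_to_model` matches names against `target_modules` is not described here, so check the candidates against the package source.

```python
from collections import Counter

import torch.nn as nn


def linear_leaf_names(model: nn.Module) -> Counter:
    """Count nn.Linear modules by the last component of their dotted name."""
    counts = Counter()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            counts[full_name.rsplit(".", 1)[-1]] += 1
    return counts


class ToyAttention(nn.Module):
    """Stand-in block that reuses LLaMA-style projection names."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)


toy = nn.ModuleList([ToyAttention() for _ in range(2)])
names = linear_leaf_names(toy)
for name, count in sorted(names.items()):
    print(name, count)
```

The same helper can be called on a model loaded with `from_pretrained` to see which names are available as targets.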

Citation

If you use this implementation in your research, please cite the original paper:

@misc{bensaïd2025singloralowrankadaptation,
      title={SingLoRA: Low Rank Adaptation Using a Single Matrix}, 
      author={David Bensaïd and Noam Rotstein and Roy Velich and Daniel Bensaïd and Ron Kimmel},
      year={2025},
      eprint={2507.05566},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.05566}, 
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
