
SpaRTA adaptation wrapper. Invocation code to load and run SpaRTA adapters for inference


PEFT-SpaRTA

SpaRTA (Sparse Random parameTer Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) alternative to traditional LoRA that reduces the number of trainable parameters by randomly selecting a very small proportion of the model parameters to train on.
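The core idea can be sketched in a few lines. This is a conceptual illustration only (using NumPy on a flat array, not the package's actual implementation): sample a small random subset of parameter positions once, keep that choice fixed, and train only additive deltas at those positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are the flattened parameters of a pre-trained model.
weights = rng.standard_normal(1_000_000)

sparsity = 0.99  # 99% of parameters stay frozen
n_trainable = int((1 - sparsity) * weights.size)

# Randomly pick which positions are trainable; the indices themselves
# are sampled once and are not trained.
idx = rng.choice(weights.size, size=n_trainable, replace=False)

# The deltas at those positions are the only trainable parameters.
deltas = np.zeros(n_trainable)

# The adapted model uses the original weights plus the sparse deltas.
adapted = weights.copy()
adapted[idx] += deltas

print(n_trainable)  # 10000 trainable parameters instead of 1000000
```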

This Python package provides the invocation code necessary to load and run SpaRTA-adapted models for inference. In particular, it includes the classes

  • SpaRTAforSequenceClassification
  • SpaRTAforCausalLM

to load a SpaRTA adapter along with its pre-trained base (transformer) model, architected respectively for sequence classification tasks and autoregressive text generation tasks.

We also include the class

  • SpaRTA

to facilitate sparse random parameter adaptation of a model and train your own SpaRTA adapters. This implementation is compatible with some of the most popular trainers, as shown here.

For more details on how SpaRTA works, see our paper. The original implementation of SpaRTA can be found at https://github.com/IBM/sparta.

Installation

pip install peft-sparta

How to use it for inference

Download a SpaRTA adapter from a Hugging Face repository

Let's download a SpaRTA adapter that specializes the google/gemma-2b model to do sentiment classification of English sentences.

ADAPTER_DIR='/my_sparta_adapters/sparta-gemma_2b/'

mkdir -p $ADAPTER_DIR

hf download jesusriosal/sparta-gemma_2b-sst2 --local-dir $ADAPTER_DIR

Load the SpaRTA adapter and create the adapted model

from peft_sparta import SpaRTAforSequenceClassification

adapter_dir = '/my_sparta_adapters/sparta-gemma_2b/'

model = SpaRTAforSequenceClassification(
   adapter = adapter_dir,
   device = 'cuda')

print(model)
(SpaRTA)ModelForSeqClassification(
	adapter = '/my_sparta_adapters/sparta-gemma_2b/'
	model = 'google/gemma-2b'
	id2label = {0: 'negative', 1: 'positive'}
)

Inputs

Let's use our adapted model to classify a few sentences. For this adapter, the model consumes the sentences directly; no formatting is needed.

sentences = ["I enjoyed very much the movie.", 
             "It was painful to watch.", 
             "I couldn't enjoy more the movie.",
             "It was a bad movie."]

Inference

Probabilistic classification

The adapted model can give us its estimated probabilities that each sentence (row) has negative (first column) or positive (second column) sentiment.

class_probs = model.classify(sentences) 

print(class_probs)
tensor([[0.1152, 0.8848],
        [0.9497, 0.0503],
        [0.1689, 0.8311],
        [0.9720, 0.0280]], device='cuda:0')

To identify which column corresponds to each class, use:

print(model.id2label)
{'0': 'negative', '1': 'positive'}

Here are the model's estimated probabilities of positive sentiment for each sentence

for sentence, pos_prob in zip(sentences, class_probs[:,1]):
    print(f"{pos_prob.item()*100:>4.0f}%\t{sentence}")
 Prob   Sentence
 ----   -----------------------------
  88%	I enjoyed very much the movie.
   5%	It was painful to watch.
  83%	I couldn't enjoy more the movie.
   3%	It was a bad movie.

Deciding the sentiment class of each sentence (deterministic classification)

We have seen how the model makes probabilistic assessments of the sentiment of each sentence. If we want the model to make a definitive decision on whether a sentence has positive or negative sentiment, we can use:

classes = model.decide_class(sentences) 

to obtain the model's predicted class for each sentence. The model simply takes the most likely class as its sentiment prediction.

for sentence, sentence_class in zip(sentences, classes):
    print(f"'{sentence_class}':  {sentence}")
 Sentiment   Sentence
-----------  -------------------------------
'positive':  I enjoyed very much the movie.
'negative':  It was painful to watch.
'positive':  I couldn't enjoy more the movie.
'negative':  It was a bad movie.
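Conceptually, deciding a class amounts to taking the argmax of the class probabilities and mapping it through id2label. A minimal sketch of that logic (the package's actual decide_class method may differ), using the probabilities from the example above:

```python
# Class probabilities as returned by model.classify in the example above.
class_probs = [[0.1152, 0.8848],
               [0.9497, 0.0503],
               [0.1689, 0.8311],
               [0.9720, 0.0280]]

id2label = {0: 'negative', 1: 'positive'}

# Pick the most likely class (argmax) for each row and map it to its label.
classes = [id2label[max(range(len(row)), key=row.__getitem__)]
           for row in class_probs]

print(classes)  # ['positive', 'negative', 'positive', 'negative']
```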

Input templates

Sometimes the input may need to be formatted before our adapted model can process it. This is typically the case with instruction-following models, for which wrapping the input in an instruction, formatted with the model's chat template, can be advantageous. In these cases, the input_template argument specifies the formatting that was applied to the raw inputs during training, and that must also be applied during inference.

To see this, let's use another SpaRTA adapter for sentiment classification based on the google/gemma-2b-it model.

hf download jesusriosal/sparta-gemma_2b-sst2 --local-dir '/my_sparta_adapters/sparta-gemma_2b_it/'

from peft_sparta import SpaRTAforSequenceClassification

adapter_dir = '/my_sparta_adapters/sparta-gemma_2b_it/'

model = SpaRTAforSequenceClassification(
    adapter=adapter_dir,
    device='cuda',
    input_template = ("<start_of_turn>user\n"
                      "Determine the sentiment of the following sentence about a movie. "
                      "The sentiment can only be classified as positive or negative.\n"
                      "Sentence: {sentence}" 
                      "<end_of_turn>\n<start_of_turn>model\n"
                      "The sentiment of the sentence is")
    )

print(model)
(SpaRTA)ModelForSeqClassification(
	adapter = '/my_sparta_adapters/sparta-gemma_2b_it/'
	model = 'google/gemma-2b-it'
	id2label = {0: 'negative', 1: 'positive'}
)

This SpaRTA adapter was trained by formatting the input sentences to be classified with the input_template (see the model.template printout below), which includes a task instruction. Passing the same input_template at load time ensures that the same formatting is applied to the inputs during inference.

print(model.template)
<start_of_turn>user
Determine the sentiment of the following sentence about a movie. The sentiment can only be classified as positive or negative.
Sentence: {sentence}<end_of_turn>
<start_of_turn>model
The sentiment of the sentence is

For example, the sentence

I enjoyed very much the movie.

is converted to

<start_of_turn>user
Determine the sentiment of the following sentence about a movie. The sentiment can only be classified as positive or negative.
Sentence: I enjoyed very much the movie.<end_of_turn>
<start_of_turn>model
The sentiment of the sentence is

before passing it to the model for classification.
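Note that the template is an ordinary Python format string with a {sentence} placeholder, so the substitution itself is just str.format. Shown here purely for illustration:

```python
# The same template string passed as input_template above.
template = ("<start_of_turn>user\n"
            "Determine the sentiment of the following sentence about a movie. "
            "The sentiment can only be classified as positive or negative.\n"
            "Sentence: {sentence}"
            "<end_of_turn>\n<start_of_turn>model\n"
            "The sentiment of the sentence is")

# Substitute the raw sentence into the template.
formatted = template.format(sentence="I enjoyed very much the movie.")
print(formatted)
```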

Thus, to classify the (raw, unformatted) sentences above, we proceed as follows:

sentences = [{'sentence': sent} for sent in sentences]

class_probs = model.classify(sentences)

# prob of positive sentiment for each sentence 
for sentence, pos_prob in zip(sentences, class_probs[:,1]):
    print(f"{pos_prob.item()*100:>4.0f}%\t{sentence['sentence']}")
 100%	I enjoyed very much the movie.
   0%	It was painful to watch.
 100%	I couldn't enjoy more the movie.
   0%	It was a bad movie.
classes = model.decide_class(sentences)

for sentence, sentence_class in zip(sentences, classes):
    print(f"'{sentence_class}':  {sentence['sentence']}")
 Sentiment   Sentence
-----------  -------------------------------
'positive':  I enjoyed very much the movie.
'negative':  It was painful to watch.
'positive':  I couldn't enjoy more the movie.
'negative':  It was a bad movie.

Out-of-Distribution performance evaluations

If you have a labeled dataset with English sentences and their sentiment labels, like the one below, you can evaluate the performance of these models on that dataset as follows.

Given the following dataset of new, unseen sentences and their sentiment labels:

test_sentences = ["it's a charming journey. ",
                  "bleak and desperate",
                  "nolan is poised to embark a major career as a commercial yet inventive filmmaker.",
                  "the acting, costumes, music, cinematography and sound are all astounding. ",
                  "it's slow -- very, very slow. ",
                  "the film is a refreshingly serious look at young women.",
                  "a sometimes tedious film.",
                  "like doing last year's taxes with your ex-wife.",
                  "you don't have to know about music to appreciate the film. ",
                  "in exactly 89 minutes, most of which passed as slowly as if i'd been sitting naked on an igloo, the movie sank from quirky to jerky to utter turkey."]

test_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

where a label of 0 represents negative sentiment and a label of 1 positive.

We evaluate the model on this labeled dataset as follows. Since this model was loaded with an input_template, we first need to wrap each sentence in a dictionary with a 'sentence' key so the template can be applied to it.

test_sentences = [{'sentence': sent} for sent in test_sentences] # for the model with input_template

model.evaluate(test_sentences, test_labels, batch_size=64)
loss: 0.002
accuracy: 100%
confusion matrix: [5, 0
                   0, 5]
balanced accuracy: 100% 
MCC: 1.0
F1-score: 1.0
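For reference, all of the reported metrics can be derived from the confusion matrix. A self-contained sketch of how they relate (not the package's evaluate code), using the perfect 5/0/0/5 matrix from the run above:

```python
import math

# Confusion matrix from the evaluation above:
# rows = true class, columns = predicted class.
tn, fp, fn, tp = 5, 0, 0, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall_pos = tp / (tp + fn)            # true positive rate
recall_neg = tn / (tn + fp)            # true negative rate
balanced_accuracy = (recall_pos + recall_neg) / 2
precision = tp / (tp + fp)
f1 = 2 * precision * recall_pos / (precision + recall_pos)
# Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(accuracy, balanced_accuracy, f1, mcc)  # 1.0 1.0 1.0 1.0
```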

How to train a SpaRTA adapter

Given a pre-trained model, we prepare it for fine-tuning with SpaRTA by

from peft_sparta import SpaRTA

model = SpaRTA(model, sparsity=0.99)

This adds the adapter to the pre-trained model. The adapter consists of non-trainable, randomly sampled indices and trainable deltas, representing the changes to the original model parameters at those indices. Note that in this case we have chosen a sparsity level of 99%, meaning that only about 1% of the model parameters remain trainable.
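To make the sparsity setting concrete, here is the back-of-the-envelope arithmetic for a roughly 2.5B-parameter base model (the parameter count is an illustrative round figure, not an exact count for google/gemma-2b; the actual trainable count also depends on frozen_modules and the classification head):

```python
# Rough parameter count for a ~2.5B-parameter base model (illustrative figure).
n_params = 2_500_000_000

sparsity = 0.99  # 99% of parameters stay frozen
n_trainable = round((1 - sparsity) * n_params)

print(f"{n_trainable:,} trainable deltas")  # 25,000,000 trainable deltas
```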

Our SpaRTA wrapper class supports the following arguments:

  • model (nn.Module) Pre-trained model to be adapted.

  • sparsity (float) Target fraction of the total number of model parameters to make non-trainable. Must be 0 < sparsity < 1.

  • frozen_modules (list[str], optional) List of layer-name substrings whose layers are frozen entirely (non-trainable). Classification heads ("score") are always fully trainable by default. Defaults to ["embed_tokens", "self_attn.q", "self_attn.k", "mlp", "norm"].

  • trainable_tokens (list[int], optional) List of (unique) token ids whose embeddings should be fully trainable. Useful for (special) tokens newly added to the vocabulary. Defaults to None.

  • dropout (float, optional) Dropout probability applied to the trainable parameters during training. Must be 0 <= dropout < 1. Defaults to 0.

The following notebooks illustrate examples of how to train a SpaRTA adapter with several popular trainers.

  1. Linear regression
  2. Sequence classification
  3. Text generation

Citation

@article{rios2025sparsity,
  title={Sparsity may be all you need: Sparse random parameter adaptation},
  author={Rios, Jesus and Dognin, Pierre and Luss, Ronny and Ramamurthy, Karthikeyan N},
  journal={arXiv preprint arXiv:2502.15975},
  year={2025}
}
@software{rios2025sparta,
  title   = {{PEFT-SpaRTA}},
  author  = {Rios, Jesus},
  url     = {https://github.com/jmriosal/peft-sparta}
}
