SpaRTA adaptation wrapper. Invocation code to load and run SpaRTA adapters for inference
Project description
PEFT-SpaRTA
SpaRTA (Sparse Random parameTer Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) alternative to traditional LoRA that reduces the number of trainable parameters by randomly selecting a very small proportion of the model parameters to train on.
This Python package provides the invocation code necessary to load and run SpaRTA-adapted models for inference. In particular, it includes the classes
SpaRTAforSequenceClassificationSpaRTAforCausalLM
to load a SpaRTA adapter along its pre-trained base (transformer) model architectured, respectively, for sequence classification tasks and autoregressive text generation tasks.
We also include the class
SpaRTA
to facilitate sparse random parameter adaptation of a model and train your own SpaRTA adapters. This implementation is compatible with some of the most popular trainers, as shown in here.
For more details on how SpaRTA works see our paper. The original implementation of SpaRTA can be found in https://github.com/IBM/sparta.
Installation
pip install peft-sparta
How to use it for inference
Download a SpaRTA adapter from a Hugging Face repository
Let's download a SpaRTA adapter that spacializes the google/gemma-2b model to do sentiment classification of English sentences.
ADAPTER_DIR='/my_sparta_adapters/sparta-gemma_2b/'
mkdir -p $ADAPTER_DIR
hf download jesusriosal/sparta-gemma_2b-sst2 --local-dir $ADAPTER_DIR
Load the SpaRTA adapter and create the adapted model
from peft_sparta import SpaRTAforSequenceClassification
adapter_dir = '/my_sparta_adapters/sparta-gemma_2b/'
model = SpaRTAforSequenceClassification(
adapter = adapter_dir,
device = 'cuda')
print(model)
(SpaRTA)ModelForSeqClassification(
adapter = '/my_sparta_adapters/sparta-gemma_2b/'
model = 'google/gemma-2b'
id2label = {0: 'negative', 1: 'positive'}
)
Inputs
Let's use our adapted model to classify a few sentences. For this adapter, the model consumes the sentences directly. No formating is needed
sentences = ["I enjoyed very much the movie.",
"It was painful to watch.",
"I couldn't enjoy more the movie.",
"It was a bad movie."]
Inference
Probabilistic classification
The adapted model can give us its estimated probabilities that each sentence (row) has negative (first column) or positive (second column) sentiment.
class_probs = model.classify(sentences)
print(class_probs)
tensor([[0.1152, 0.8848],
[0.9497, 0.0503],
[0.1689, 0.8311],
[0.9720, 0.0280]], device='cuda:0')
To identify which column correspond to each class, use:
print(model.id2label)
{'0': 'negative', '1': 'positive'}
Here are the model's estimated probabilities of positive sentiment for each sentence
for sentence, pos_prob in zip(sentences, class_probs[:,1]):
print(f"{pos_prob.item()*100:>4.0f}%\t{sentence}")
Prob Sentence
---- -----------------------------
88% I enjoyed very much the movie.
5% It was painful to watch.
83% I couldn't enjoy more the movie.
3% It was a bad movie.
Deciding the sentiment class of each sentence (deterministic classification)
We have seen how the model makes probabilistic assessments of the sentiment of each sentence. If we want the model to make a definitive decison on whether the sentence has positive or negative sentiment, we can use:
classes = model.decide_class(sentences)
to obtain the model's predicted class of each sentence. Basically, the model takes the most likely class as its sentiment prediction of a sentence
for sentence, sentence_class in zip(sentences, classes):
print(f"'{sentence_class}': {sentence}")
Sentiment Sentence
----------- -------------------------------
'positive': I enjoyed very much the movie.
'negative': It was painful to watch.
'positive': I couldn't enjoy more the movie.
'negative': It was a bad movie.
Input templates
Sometimes the input to the model may need to be formatted before our adapted model can processs it.
This is typicaly the case when using instruction-following models, for which wrapping the input within an instruction, formatted with the model's chat template, can be advantageous. In these cases, we can use the following input_template argument to specify the formatting over raw inputs used during training, and needed during inference.
To see this, let's use another SpaRTA adapter for sentiment classification based on the google/gemma-2b-it model.
hf download jesusriosal/sparta-gemma_2b-sst2 --local-dir '/my_sparta_adapters/sparta-gemma_2b_it/'
from peft_sparta import SpaRTAforSequenceClassification
adapter_dir = '/my_sparta_adapters/sparta-gemma_2b_it/'
model = SpaRTAforSequenceClassification(
adapter=adapter_dir,
device='cuda',
input_template = ("<start_of_turn>user\n"
"Determine the sentiment of the following sentence about a movie. "
"The sentiment can only be classified as positive or negative.\n"
"Sentence: {sentence}"
"<end_of_turn>\n<start_of_turn>model\n"
"The sentiment of the sentence is")
)
print(model)
(SpaRTA)ModelForSeqClassification(
adapter = '/my_sparta_adapters/sparta-gemma_2b_it/'
model = 'google/gemma-2b-it'
id2label = {0: 'negative', 1: 'positive'}
)
This SpaRTA adapter was trained formating the input sentences to be classified with the input_template (see model.template printout below), which included a task instruction. This ensures that during inference the same formatting is used on the inputs to be classified.
print(model.template)
<start_of_turn>user
Determine the sentiment of the following sentence about a movie. The sentiment can only be classified as positive or negative.
Sentence: {sentence}<end_of_turn>
<start_of_turn>model
The sentiment of the sentence is
For example, the sentence
I enjoyed very much the movie.
is converted to
<start_of_turn>user
Determine the sentiment of the following sentence about a movie. The sentiment can only be classified as positive or negative.
Sentence: I enjoyed very much the movie.<end_of_turn>
<start_of_turn>model
The sentiment of the sentence is
before passing it to the model for classification
Thus, to classify the (raw, non-formatted) sentences above we proceed as follows
sentences = [{'sentence': sent} for sent in sentences]
class_probs = model.classify(sentences)
# prob of positive sentiment for each sentence
for sentence, pos_prob in zip(sentences, class_probs[:,1]):
print(f"{pos_prob.item()*100:>4.0f}%\t{sentence['sentence']}")
100% I enjoyed very much the movie.
0% It was painful to watch.
100% I couldn't enjoy more the movie.
0% It was a bad movie.
classes = model.decide_class(sentences)
for sentence, sentence_class in zip(sentences, classes):
print(f"'{sentence_class}': {sentence['sentence']}")
Sentiment Sentence
----------- -------------------------------
'positive': I enjoyed very much the movie.
'negative': It was painful to watch.
'positive': I couldn't enjoy more the movie.
'negative': It was a bad movie.
Out-of-Distribution performance evaluations
If you have a labeled dataset with English sentences and their sentiment labels, like the one below, you can evaluate the performace of these models on that dataset as follows.
Given the following dataset of new, unseen sentences and their sentiment labels:
test_sentences = ["it's a charming journey. ",
"bleak and desperate",
"nolan is poised to embark a major career as a commercial yet inventive filmmaker.",
"the acting, costumes, music, cinematography and sound are all astounding. ",
"it's slow -- very, very slow. ",
"the film is a refreshingly serious look at young women.",
"a sometimes tedious film.",
"like doing last year's taxes with your ex-wife.",
"you don't have to know about music to appreciate the film. ",
"in exactly 89 minutes, most of which passed as slowly as if i'd been sitting naked on an igloo, the movie sank from quirky to jerky to utter turkey."]
test_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
where a label of 0 represents negative sentiment and a label of 1 positive.
We evaluate the performance of the model on this labeled dataset as follows. We will need to first put each sentence within a dictionary with a key named 'sentence' for the model with the input_template, so the sentences can be consumed by it accordingly.
test_sentences = [{'sentence': sent} for sent in test_sentences] # for the model with input_template
model.evaluate(test_sentences, test_labels, batch_size=64)
loss: 0.002
accuracy: 100%
confusion matrix: [5, 0
0, 5]
balanced accuracy: 100%
MCC: 1.0
F1-score: 1.0
How to train a SpaRTA adapter
Given a pre-trained model, we prepare it for fine-tuning with SpaRTA by
from peft_sparta import SpaRTA
model = SpaRTA(model, sparsity=0.99)
This adds the adapter to the pre-trained model. The adapter consists of non-trainable randomly sampled indices and trainable deltas, representing the changes to the original model parameters for those indices. Note that in this case we have chosen a sparsity level of 99%, meaning that we target to keep only 1% of the model parameters to be trainable.
Our SpaRTA wrapper class supports the following arguments:
-
model (nn.Module)Pre-trained model to be adapted. -
sparsity (float)Target fraction of the total number of model parameters to make non-trainable. Must be 0 < sparsity < 1. -
frozen_modules (list[str], optional)List of layers name substrings to make entirely frozen (non-trainable). Classification heads ("score") will always be fully-trainable by default. Defaults to ["embed_tokens", "self_attn.q", "self_attn.k", "mlp", "norm"]. -
trainable_tokens (list[int], optional)List of (unique) token ids whose embeddings should be fully-trainable. Useful for newly added (special) tokens to the vocabulary. Defaults to None. -
dropout (float, optional)Dropout probability applied to the trainable parameters during training. Must be 0 <= dropout < 1. Defaults to 0.
The following notebooks illustrate examples of how to train a SpaRTA adapter with several popular trainers.
Citation
@article{rios2025sparsity,
title={Sparsity may be all you need: Sparse random parameter adaptation},
author={Rios, Jesus and Dognin, Pierre and Luss, Ronny and Ramamurthy, Karthikeyan N},
journal={arXiv preprint arXiv:2502.15975},
year={2025}
}
@software{rios2025sparta,
title = {{PEFT-SpaRTA}},
author = {Rios, Jesus},
url = {https://github.com/jmriosal/peft-sparta}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file peft_sparta-0.0.1.tar.gz.
File metadata
- Download URL: peft_sparta-0.0.1.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4a6fb3befca2a4758e47bbd5897cdccd9331cbcc70a36d1a0dcae577a715b37c
|
|
| MD5 |
1b5e59b6d85ff215316ebb3419065422
|
|
| BLAKE2b-256 |
759a84f6e9f8da35f8e26e1b4591173b775580179b727beaa35e7c2cad65c4cb
|
File details
Details for the file peft_sparta-0.0.1-py3-none-any.whl.
File metadata
- Download URL: peft_sparta-0.0.1-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2d1c58e87984991a558be5820c34371a2f65209ab700468401a1814db1103a9a
|
|
| MD5 |
294dc7568c628cacc9bdcfae69207ac3
|
|
| BLAKE2b-256 |
4b6856859ed62684e43e1eab0c6091afe2871428107a93049d1f4673a533ae3c
|