Rank transformer models for NLP tasks using transferability measures
TransformerRanker
A lightweight library to efficiently rank transformer language models for classification tasks.
Many pre-trained language models are available, and fine-tuning each one to find the best performer on your classification dataset costs considerable time and compute. TransformerRanker supports this model selection step: choose any dataset from the HuggingFace datasets collection, pick candidate models from the model hub, and let the tool rank them using transferability estimation metrics.
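To give a feel for what a transferability estimate does, here is a minimal sketch in plain Python. It scores frozen embeddings by leave-one-out nearest-neighbour accuracy, which is only one simple proxy chosen for illustration; it is not necessarily the metric TransformerRanker uses. The function name `loo_nn_accuracy`, the two model names, and the toy embeddings are all hypothetical.

```python
import math


def loo_nn_accuracy(embeddings, labels):
    """Leave-one-out 1-nearest-neighbour accuracy on frozen embeddings."""
    correct = 0
    for i, query in enumerate(embeddings):
        best_j, best_dist = None, math.inf
        for j, other in enumerate(embeddings):
            if i == j:
                continue  # never compare an example with itself
            dist = math.dist(query, other)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += labels[best_j] == labels[i]
    return correct / len(embeddings)


# Toy 2-d embeddings of the same four examples from two hypothetical models
labels = [0, 0, 1, 1]
model_a = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]  # classes form separate clusters
model_b = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (1.1, 1.0)]  # classes are interleaved

scores = {
    "model_a": loo_nn_accuracy(model_a, labels),
    "model_b": loo_nn_accuracy(model_b, labels),
}

# Higher score first: the model whose embeddings separate the classes ranks on top
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The point of such estimates is that they only need a forward pass to extract embeddings, so no candidate model has to be fine-tuned before it can be compared with the others.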
Installation
You can install the tool using pip:
pip install transformer-ranker
Three-step interface
Step 1. Load your dataset
Choose any dataset from the datasets library:
from datasets import load_dataset
# Load your dataset using hf loader
dataset = load_dataset('conll2003')
Take a look at how to load your custom dataset using HuggingFace datasets.
Step 2. Prepare a list of language models
Choose any model names from the model hub:
# Prepare a list of model handles
language_models = [
    "sentence-transformers/all-mpnet-base-v2",
    "xlm-roberta-large",
    "google/electra-large-discriminator",
    "microsoft/deberta-v3-large",
    "nghuyong/ernie-2.0-large-en",
    # ...
]
...or use our recommended list of models to try out:
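from transformer_ranker import prepare_popular_models
# Load the predefined list of base-sized models
language_models = prepare_popular_models('base')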
Step 3. Rank Models
Initialize the ranker with your dataset and run it on your models:
from transformer_ranker import TransformerRanker
# Initialize the ranker with your dataset
ranker = TransformerRanker(dataset, dataset_downsample=0.2)
# Run it with selected transformer models
results = ranker.run(language_models, batch_size=64)
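The `dataset_downsample=0.2` argument above suggests that only a fraction of the dataset is used for ranking. The following sketch shows what such a step could look like, assuming the value means "keep a random 20% of the examples". The helper `downsample` and its `seed` parameter are hypothetical and are not part of the library, whose actual sampling strategy is not described here.

```python
import random


def downsample(examples, fraction, seed=0):
    """Keep a random `fraction` of `examples`, preserving their original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, round(len(examples) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    indices = sorted(rng.sample(range(len(examples)), keep))
    return [examples[i] for i in indices]


examples = [f"sentence {i}" for i in range(100)]
subset = downsample(examples, 0.2)
print(len(subset))
```

A smaller subset reduces the time spent embedding the data, at the cost of a noisier score for each model.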
Review ranked models:
print(results)
The output lists the models sorted by their transferability scores:
Rank 1. microsoft/deberta-v3-large: 2.7962
Rank 2. nghuyong/ernie-2.0-large-en: 2.7788
Rank 3. google/electra-large-discriminator: 2.7486
Rank 4. xlm-roberta-large: 2.6695
Rank 5. sentence-transformers/all-mpnet-base-v2: 2.5709
...
Using these results, you can exclude the lower-ranked models and focus further exploration on the top-ranked ones.
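As a sketch of that shortlisting step, the scores printed above can be copied into a plain dictionary and cut down to the top candidates. Nothing here assumes how the `results` object is structured; the dictionary is filled in by hand from the printed output, and the helper `top_k` is hypothetical.

```python
# Scores copied by hand from the printed ranking
scores = {
    "microsoft/deberta-v3-large": 2.7962,
    "nghuyong/ernie-2.0-large-en": 2.7788,
    "google/electra-large-discriminator": 2.7486,
    "xlm-roberta-large": 2.6695,
    "sentence-transformers/all-mpnet-base-v2": 2.5709,
}


def top_k(scores, k):
    """Return the k (model, score) pairs with the highest scores."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Keep only the two best-ranked models for fine-tuning
shortlist = [name for name, _ in top_k(scores, 2)]
print(shortlist)
```

Only the models in `shortlist` then need to be fine-tuned and evaluated in full.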
License
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file transformer-ranker-0.1.0.tar.gz.
File metadata
- Download URL: transformer-ranker-0.1.0.tar.gz
- Upload date:
- Size: 18.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15688cb9fac28ea3478ef7368b8555de70738bd1b16b30ebdc3c537c5d0a4426 |
| MD5 | 4df42f4a205bb7792755fe20cecf138f |
| BLAKE2b-256 | 9729791c52c1661af3a9e5125ee9be66a2f9bb1baf10dbc52d4df8dca084ceea |
File details
Details for the file transformer_ranker-0.1.0-py3-none-any.whl.
File metadata
- Download URL: transformer_ranker-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e26a5253ca48eb0464dc08086d68d7cbc850c8c777cbe4a61fa8ac2bff42bd36 |
| MD5 | 2133df4b4e68b8d2b3427892d4fb088a |
| BLAKE2b-256 | 96afb838dab898d5be8ec145f9c5ff1d5669caa7f8913242cda46e965c4700d3 |