Rank transformer models for NLP tasks using transferability measures

TransformerRanker

A lightweight library to efficiently rank transformer language models for classification tasks.

Many pre-trained language models are available, and fine-tuning each one to find the best performer on your classification dataset costs considerable time and compute. TransformerRanker supports this model selection step: choose any dataset from the HuggingFace datasets collection, pick candidate models from the model hub, and let the tool rank them using transferability estimation metrics.

Installation

You can install the tool using pip:

pip install transformer-ranker

Three-step interface

Step 1. Load your dataset

Choose any dataset from the datasets library:

from datasets import load_dataset

# Load your dataset using hf loader
dataset = load_dataset('conll2003')

To load a custom dataset, see the HuggingFace datasets documentation.
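
For data that is not on the hub, one common route is to write it to CSV files and load them with the generic "csv" loader of the datasets library. The sketch below is a minimal illustration under stated assumptions: the toy rows, the column names "text" and "label", and the helper names write_split and load_custom_dataset are all hypothetical, and the source does not say which column names the ranker expects.

```python
import csv
import os
import tempfile

# Toy examples for illustration; the column names "text" and "label" are an assumption.
TRAIN_ROWS = [
    {"text": "The battery lasts all day.", "label": "positive"},
    {"text": "The screen cracked within a week.", "label": "negative"},
]
TEST_ROWS = [
    {"text": "Setup took five minutes.", "label": "positive"},
]


def write_split(path, rows):
    """Write one dataset split as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)


def load_custom_dataset(train_path, test_path):
    """Load the CSV splits with the generic "csv" loader of the datasets library."""
    # Imported lazily so the CSV-writing part runs without the library installed.
    from datasets import load_dataset

    return load_dataset("csv", data_files={"train": train_path, "test": test_path})


# Write both splits into a temporary directory.
data_dir = tempfile.mkdtemp()
train_path = os.path.join(data_dir, "train.csv")
test_path = os.path.join(data_dir, "test.csv")
write_split(train_path, TRAIN_ROWS)
write_split(test_path, TEST_ROWS)

# Uncomment once the datasets library is installed:
# dataset = load_custom_dataset(train_path, test_path)
```

The resulting object has the same split structure as a hub dataset, so it can be passed on to Step 3 in place of a hub dataset, provided the ranker recognizes the chosen columns.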

Step 2. Prepare a list of language models

Choose any model names from the model hub:

# Prepare a list of model handles
language_models = [
    "sentence-transformers/all-mpnet-base-v2",
    "xlm-roberta-large",
    "google/electra-large-discriminator",
    "microsoft/deberta-v3-large",
    "nghuyong/ernie-2.0-large-en",
    # ...
]

Alternatively, use our recommended list of models:

from transformer_ranker import prepare_popular_models

language_models = prepare_popular_models('base')

Step 3. Rank models

Initialize the ranker with your dataset and run it on your models:

from transformer_ranker import TransformerRanker

# Initialize the ranker with your dataset
ranker = TransformerRanker(dataset, dataset_downsample=0.2)

# Run it with selected transformer models
results = ranker.run(language_models, batch_size=64)

Review ranked models:

print(results)

The output lists the models sorted by their transferability scores:

Rank 1. microsoft/deberta-v3-large: 2.7962
Rank 2. nghuyong/ernie-2.0-large-en: 2.7788
Rank 3. google/electra-large-discriminator: 2.7486
Rank 4. xlm-roberta-large: 2.6695
Rank 5. sentence-transformers/all-mpnet-base-v2: 2.5709
...

Using these results, you can drop the lower-ranked models and focus on the top-ranked ones for further exploration.
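
The shortlisting step can be sketched in a few lines. This minimal illustration works on the printed ranking text shown above rather than on any attribute of the results object, since the source does not document one. The helper names parse_ranking and top_k are hypothetical, and the sketch assumes that a higher score means a better-suited model, as the example ranking suggests.

```python
import re

# Printed ranking copied from the example output above.
RANKING_TEXT = """\
Rank 1. microsoft/deberta-v3-large: 2.7962
Rank 2. nghuyong/ernie-2.0-large-en: 2.7788
Rank 3. google/electra-large-discriminator: 2.7486
Rank 4. xlm-roberta-large: 2.6695
Rank 5. sentence-transformers/all-mpnet-base-v2: 2.5709
"""

# One line looks like: "Rank 1. <model handle>: <score>"
LINE_PATTERN = re.compile(r"^Rank\s+(\d+)\.\s+(\S+):\s+(-?\d+(?:\.\d+)?)$")


def parse_ranking(text):
    """Return (model_name, score) pairs in the order they were printed."""
    pairs = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # skips blank lines and a trailing "..."
            pairs.append((match.group(2), float(match.group(3))))
    return pairs


def top_k(pairs, k):
    """Keep the k highest-scoring models (hypothetical helper)."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Shortlist the two best candidates for further exploration.
shortlist = [name for name, _ in top_k(parse_ranking(RANKING_TEXT), k=2)]
print(shortlist)
```

With the example scores above, the shortlist contains microsoft/deberta-v3-large and nghuyong/ernie-2.0-large-en.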

License

MIT

Download files

Source Distribution

transformer-ranker-0.1.0.tar.gz (18.8 kB)

Built Distribution

transformer_ranker-0.1.0-py3-none-any.whl (20.7 kB)

