fastner

fastner is a Python package to finetune transformer-based models for the Named Entity Recognition task in a simple and fast way.
It is based on the torch and transformers 🤗 libraries.

Main features

The latest version of fastner provides:

Models

The transformer-based models that you can use for the finetuning are:

  • Bert base uncased (bert-base-uncased)
  • DistilBert base uncased (distilbert-base-uncased)

Tagging scheme

The labels of the dataset given as input must comply with the tagging scheme:

  • IOB (Inside, Outside, Beginning), also known as BIO, where B- marks the first token of an entity, I- the tokens inside it, and O the tokens outside any entity

Dataset scheme

The datasets given as input (train, validation, test) must have two columns named:

  • tokens: contains the lists of tokens of the examples
  • tags: contains the labels of the corresponding tokens

Example:

tokens: ['Apple', 'CEO', 'Tim', 'Cook', 'introduces', 'the', 'new', 'iPhone']
tags:   ['B-ORG', 'O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O']
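
For illustration, a minimal sketch of building such a dataset with pandas (the data and file name are hypothetical, and how fastner parses list-valued columns from a .csv is not documented here, so passing the DataFrame directly may be safer):

import pandas as pd

# Hypothetical one-row dataset: each row holds one example, with a list of
# tokens and a list of IOB tags of the same length.
train_df = pd.DataFrame({
    "tokens": [['Apple', 'CEO', 'Tim', 'Cook', 'introduces', 'the', 'new', 'iPhone']],
    "tags":   [['B-ORG', 'O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O']],
})
train_df.to_csv("train.csv", index=False)  # note: lists are serialized as strings in .csv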

Installation

With pip

fastner can be installed using pip as follows:

pip install fastner

How to use it

Using fastner is very easy! All you need is a dataset that respects the format given above. The core function is the train_test() function:

Parameters:

  • training_set (string or pandas DataFrame) - path of the .csv training set or the pandas.DataFrame object of the training set
  • validation_set (string or pandas DataFrame) - path of the .csv validation set or the pandas.DataFrame object of the validation set
  • test_set (optional, string or pandas DataFrame) - path of the .csv test set or the pandas.DataFrame object of the test set
  • model_name (string, default: 'bert-base-uncased') - name of the model to finetune (available: 'bert-base-uncased' or 'distilbert-base-uncased')
  • train_args (transformers.TrainingArguments) - arguments for the training (see the Hugging Face documentation)
  • max_len (integer, default: 512) - input sequence length (tokenizer)
  • loss (string, default: 'CE') - loss function; the only one available at the moment is 'CE' (Cross Entropy)
  • callbacks (optional, list of transformers callbacks) - list of transformers callbacks (see the Hugging Face documentation)
  • device (integer, default: 0) - id of the device on which to perform the training

Outputs:

  • train_results (dict) - dict with training info (runtime, samples per second, steps per second, loss, epochs)
  • eval_results (dict) - dict with evaluation metrics on the validation set (precision, recall, and f1, both overall and per entity type, plus loss)
  • test_results (dict) - dict with evaluation metrics on the test set (precision, recall, and f1, both overall and per entity type, plus loss)
  • trainer (transformers.Trainer) - the transformers.Trainer object used

Example

An example of fastner in action:

from transformers import TrainingArguments, EarlyStoppingCallback
from fastner import train_test
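
# conll2003_train, conll2003_val, and conll2003_test are assumed to be
# pandas DataFrames (or .csv paths) that follow the dataset scheme above;
# loading them is not shown in the original example.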

args = TrainingArguments(
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    output_dir="./models",
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss')

train_results, eval_results, test_results, trainer = train_test(
    training_set=conll2003_train,
    validation_set=conll2003_val,
    test_set=conll2003_test,
    train_args=args,
    model_name='distilbert-base-uncased',
    max_len=128,
    loss='CE',
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    device=0)
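
A sketch of how the outputs might be inspected afterwards; the exact metric keys inside the dicts are an assumption (they depend on how fastner names them internally), and saving the model uses the standard transformers.Trainer API:

# Hypothetical inspection of the returned objects.
print(train_results)  # runtime, samples/steps per second, loss, epochs
print(eval_results)   # validation precision/recall/f1, overall and per entity
print(test_results)   # test precision/recall/f1, overall and per entity

# trainer is a transformers.Trainer, so the finetuned model can be saved with:
trainer.save_model("./models/best")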

Work in Progress

A few spoilers about future releases:

  • New models
  • New tagging schemes
  • A new function that takes a dataset without any tagging scheme and returns it with the chosen scheme applied
