Seamless integration of tasks with huggingface models
tasknet: simple multi-task transformer fine-tuning with Trainer and HuggingFace datasets.
tasknet is an interface between HuggingFace datasets and the HuggingFace Trainer.
Task templates
tasknet relies on task templates to avoid boilerplate code. The task templates correspond to Transformers AutoClasses:
SequenceClassification
TokenClassification
MultipleChoice
Seq2SeqLM (experimental support)
The task templates follow the same interface. Each implements preprocess_function, a data collator, and compute_metrics.
Look at tasks.py and use existing templates as a starting point to implement a custom task template.
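To make the three-part interface concrete, here is a minimal sketch in plain Python. ToyClassificationTemplate, toy_tokenize, and the padding id 0 are hypothetical names and choices made up for illustration; this is not tasknet's actual class, and the real templates in tasks.py work with HuggingFace tokenizers and datasets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyClassificationTemplate:
    """Hypothetical template for illustration; not tasknet's actual class."""
    s1: str                               # dataset column holding the text
    y: str                                # dataset column holding the label
    tokenize: Callable[[str], List[int]]  # stand-in for a real tokenizer

    def preprocess_function(self, example: Dict) -> Dict:
        # Map the dataset columns named by s1 and y to model inputs.
        return {"input_ids": self.tokenize(example[self.s1]),
                "label": example[self.y]}

    def data_collator(self, features: List[Dict]) -> Dict:
        # Pad every sequence to the longest one in the batch (pad id 0 is an assumption).
        width = max(len(f["input_ids"]) for f in features)
        return {
            "input_ids": [f["input_ids"] + [0] * (width - len(f["input_ids"]))
                          for f in features],
            "labels": [f["label"] for f in features],
        }

    def compute_metrics(self, predictions: List[int], labels: List[int]) -> Dict:
        # Plain accuracy over the evaluation examples.
        correct = sum(p == l for p, l in zip(predictions, labels))
        return {"accuracy": correct / len(labels)}


def toy_tokenize(text: str) -> List[int]:
    # Toy tokenizer: one id per whitespace-separated word (its length).
    return [len(word) for word in text.split()]


template = ToyClassificationTemplate(s1="sentence1", y="label", tokenize=toy_tokenize)
rows = [
    {"sentence1": "a short text", "label": 1},
    {"sentence1": "longer text than before here", "label": 0},
]
features = [template.preprocess_function(row) for row in rows]
batch = template.data_collator(features)
metrics = template.compute_metrics([1, 1], batch["labels"])
```

A custom template would keep the same three entry points and change only what each one does for the new task type.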
Installation and example
pip install tasknet
Each task template has fields that should be matched with specific dataset columns. Classification has two text fields, s1 and s2, and a label y. Pass a dataset to a template, and fill in the mapping between the template fields and the dataset columns to instantiate a task.
import tasknet as tn
from datasets import load_dataset

rte = tn.Classification(
    dataset=load_dataset("glue", "rte"),
    s1="sentence1", s2="sentence2", y="label")  # s2 is optional; see AutoTask for shorter code

class hparams:
    model_name = 'microsoft/deberta-v3-base'  # deberta models have the best results (and tasknet support)
    learning_rate = 3e-5  # see hf.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments

tasks = [rte]
model = tn.Model(tasks, hparams)
trainer = tn.Trainer(model, tasks, hparams)
trainer.train()
trainer.evaluate()
p = trainer.pipeline()  # HuggingFace pipeline for inference
premise = "A man is playing a guitar on stage."  # illustrative input
hypothesis = "A man is performing music."  # illustrative input
p([{'text': premise, 'text_pair': hypothesis}])
tasknet is multitask by design. model.task_models_list contains one model per task, with a shared encoder.
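The idea of one model per task with a shared encoder can be illustrated with a toy sketch in plain Python. ToyEncoder and ToyTaskModel are hypothetical stand-ins invented for this example. They are not tasknet or Transformers classes, and the arithmetic only mimics an encoder followed by a task-specific head.

```python
class ToyEncoder:
    """Stand-in for a transformer encoder; holds the shared parameters."""

    def __init__(self):
        self.scale = 2.0  # the single "shared parameter" of this toy

    def __call__(self, values):
        return [self.scale * v for v in values]


class ToyTaskModel:
    """One model per task: the shared encoder plus a task-specific head."""

    def __init__(self, encoder, num_labels):
        self.encoder = encoder  # same object for every task
        self.head_weights = [float(i + 1) for i in range(num_labels)]

    def __call__(self, values):
        pooled = sum(self.encoder(values))
        return [w * pooled for w in self.head_weights]


shared_encoder = ToyEncoder()
# Toy counterpart of a per-task model list: two tasks with 2 and 3 labels.
task_models_list = [ToyTaskModel(shared_encoder, 2), ToyTaskModel(shared_encoder, 3)]

before = task_models_list[0]([1.0, 2.0])
shared_encoder.scale = 3.0  # an "update" to the shared encoder
after = task_models_list[0]([1.0, 2.0])
```

Because every task model holds a reference to the same encoder object, a change to the encoder is visible to all tasks, while each head keeps its own output size.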
Balancing dataset sizes
tn.Classification(dataset, nrows=5000, nrows_eval=500, oversampling=2)
You can balance multiple datasets with nrows and oversampling. nrows is the maximum number of examples. If a dataset has fewer than nrows examples, it will be oversampled at most oversampling times.
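The arithmetic behind nrows and oversampling can be sketched as follows. This is one reading of the description above, not tasknet's implementation: balanced_size and balance are hypothetical helpers, capping the oversampled size at nrows is an assumption, and repeating rows in order (rather than sampling them randomly) is chosen only to keep the example deterministic.

```python
def balanced_size(n_examples: int, nrows: int, oversampling: int = 1) -> int:
    """Number of examples kept for a dataset (assumed behaviour)."""
    if n_examples >= nrows:
        # Large dataset: truncate to nrows.
        return nrows
    # Small dataset: repeat up to `oversampling` times, never beyond nrows.
    return min(nrows, n_examples * oversampling)


def balance(rows: list, nrows: int, oversampling: int = 1) -> list:
    """Truncate or cyclically repeat rows to reach the balanced size."""
    target = balanced_size(len(rows), nrows, oversampling)
    return [rows[i % len(rows)] for i in range(target)]


sizes = {
    "large": balanced_size(20000, nrows=5000, oversampling=2),   # truncated
    "medium": balanced_size(3000, nrows=5000, oversampling=2),   # 6000 capped at nrows
    "small": balanced_size(1000, nrows=5000, oversampling=2),    # doubled
}
small_sample = balance(["a", "b", "c"], nrows=5, oversampling=2)
```

Under these assumptions, a 20000-example dataset and a 1000-example dataset end up at 5000 and 2000 examples respectively, so the large one no longer dominates training as much.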
AutoTask
You can also leverage tasksource with tn.AutoTask and get one-line access to 600+ datasets (see the list of implemented tasks).
rte = tn.AutoTask("glue/rte", nrows=5000)
AutoTask guesses a template based on the dataset structure. It also accepts a dataset as input, if it fits the template (e.g. after tasksource custom preprocessing).
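As an illustration of what guessing a template from the dataset structure could look like, here is a hypothetical heuristic that inspects one example row. The rules, the function name guess_template, and the column names are assumptions made for this sketch. AutoTask's actual logic may differ.

```python
def guess_template(example: dict) -> str:
    """Guess a template name from one example row (hypothetical rules)."""
    label = example.get("label", example.get("labels"))
    if isinstance(label, list):
        # One label per token suggests token classification.
        return "TokenClassification"
    has_string_list = any(
        isinstance(value, list) and value and all(isinstance(c, str) for c in value)
        for value in example.values()
    )
    if has_string_list and isinstance(label, int):
        # A list of candidate answers plus an index suggests multiple choice.
        return "MultipleChoice"
    # Default: one or two text fields with a single label.
    return "SequenceClassification"


guesses = [
    guess_template({"sentence1": "A", "sentence2": "B", "label": 0}),
    guess_template({"tokens": ["John", "runs"], "labels": [1, 0]}),
    guess_template({"question": "Q?", "choices": ["x", "y", "z"], "label": 2}),
]
```

A heuristic like this can misfire on unusual datasets, which is one reason to pass an already preprocessed dataset that fits the intended template.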
Colab examples
Minimal-ish example:
https://colab.research.google.com/drive/15Xf4Bgs3itUmok7XlAK6EEquNbvjD9BD?usp=sharing
More complex example, where tasknet was scaled to 600 tasks:
https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
tasknet vs jiant
jiant is another library comparable to tasknet. tasknet is a minimal extension of Trainer centered on task templates, while jiant builds a Trainer equivalent from scratch called runner.
tasknet is leaner and closer to HuggingFace native tools. jiant is config-based and command-line focused, while tasknet is designed for interactive use and Python scripting.
Credit
This code reuses parts of the examples from the transformers library and some code from multitask-learning-transformers.
Contact
You can request features on GitHub or reach me at damien.sileo@inria.fr
@misc{sileod22-tasknet,
author = {Sileo, Damien},
doi = {10.5281/zenodo.561225781},
month = {11},
title = {{tasknet, multitask interface between Trainer and datasets}},
url = {https://github.com/sileod/tasknet},
version = {1.5.0},
year = {2022}}