tasknet

Seamless integration of tasks with Hugging Face models

These details have not been verified by PyPI

Project links

Homepage

Project description

tasknet: simple multi-task transformer fine-tuning with Trainer and Hugging Face datasets.

tasknet is an interface between Hugging Face datasets and the Hugging Face Trainer.

Task templates

tasknet relies on task templates to avoid boilerplate code. The task templates correspond to Transformers AutoClasses:

SequenceClassification
TokenClassification
MultipleChoice
Seq2SeqLM (experimental support)

The task templates follow the same interface. They implement preprocess_function, a data collator and compute_metrics. Look at tasks.py and use existing templates as a starting point to implement a custom task template.
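As a rough illustration of that shared interface, the sketch below defines a toy template with the three pieces named above. It does not import tasknet or subclass anything from tasks.py: the class name ToyClassificationTemplate, the whitespace "tokenizer", the padding scheme, and the simplified compute_metrics signature are all assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class ToyClassificationTemplate:
    """Hypothetical stand-in for a task template; not tasknet's actual class."""
    s1: str = "sentence1"  # dataset column holding the text
    y: str = "label"       # dataset column holding the label
    pad_id: int = 0

    def preprocess_function(self, example: dict) -> dict:
        # Toy "tokenizer": each whitespace token becomes its length.
        # A real template would call a Hugging Face tokenizer here.
        input_ids = [len(tok) for tok in example[self.s1].split()]
        return {"input_ids": input_ids, "labels": example[self.y]}

    def data_collator(self, features: list) -> dict:
        # Pad every sequence to the longest one in the batch.
        width = max(len(f["input_ids"]) for f in features)
        padded = [
            f["input_ids"] + [self.pad_id] * (width - len(f["input_ids"]))
            for f in features
        ]
        return {"input_ids": padded, "labels": [f["labels"] for f in features]}

    def compute_metrics(self, predictions: list, labels: list) -> dict:
        # Simplified signature (assumption): plain lists instead of the
        # prediction object that Trainer passes to compute_metrics.
        correct = sum(int(p == l) for p, l in zip(predictions, labels))
        return {"accuracy": correct / len(labels)}


template = ToyClassificationTemplate()
rows = [{"sentence1": "a bb ccc", "label": 1}, {"sentence1": "dddd", "label": 0}]
batch = template.data_collator([template.preprocess_function(r) for r in rows])
print(batch)
print(template.compute_metrics([1, 1], batch["labels"]))
```

A real custom template would start from an existing class in tasks.py rather than from this toy, but the division of labor between the three methods is the same.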

Installation and example

pip install tasknet

Each task template has fields that must be matched with specific dataset columns. Classification has two text fields, s1 and s2, and a label field y. Pass a dataset to a template and map the template fields to the dataset columns to instantiate a task.

import tasknet as tn
from datasets import load_dataset

# Map the template fields (s1, s2, y) to the dataset columns; s2 is optional.
# See AutoTask for shorter code.
rte = tn.Classification(
    dataset=load_dataset("glue", "rte"),
    s1="sentence1", s2="sentence2", y="label")

class hparams:
    model_name = 'microsoft/deberta-v3-base'  # deberta models have the best results (and tasknet support)
    learning_rate = 3e-5  # see hf.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments

tasks = [rte]
model = tn.Model(tasks, hparams)
trainer = tn.Trainer(model, tasks, hparams)
trainer.train()
trainer.evaluate()
p = trainer.pipeline()  # HuggingFace pipeline for inference
p([{'text': 'premise here', 'text_pair': 'hypothesis here'}])

Tasknet is multitask by design. model.task_models_list contains one model per task, with a shared encoder.
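To make "one model per task, with a shared encoder" concrete, here is a minimal pure-Python sketch of the idea. It is not tasknet's implementation: the Encoder and TaskModel classes are hypothetical, and a single float stands in for the encoder's parameters. The point is only that every task model holds a reference to the same encoder object, while each keeps its own head.

```python
class Encoder:
    """Toy encoder: a single weight that scales its input."""

    def __init__(self, weight: float = 1.0):
        self.weight = weight

    def __call__(self, x: float) -> float:
        return self.weight * x


class TaskModel:
    """Toy per-task model: shared encoder plus a task-specific head (a bias)."""

    def __init__(self, encoder: Encoder, head_bias: float):
        self.encoder = encoder      # same object for every task
        self.head_bias = head_bias  # private to this task

    def __call__(self, x: float) -> float:
        return self.encoder(x) + self.head_bias


shared = Encoder(weight=2.0)
task_models_list = [TaskModel(shared, 0.0), TaskModel(shared, 10.0)]

# Updating the encoder through one task changes what the other task computes.
task_models_list[0].encoder.weight = 3.0
print([m(1.0) for m in task_models_list])
```

Because the encoder is shared by reference, training on any one task updates the representation used by all of them, which is what makes the setup multi-task rather than a collection of independent models.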

AutoTask

You can also leverage tasksource with tn.AutoTask to get one-line access to 600+ datasets; see the implemented tasks.

rte = tn.AutoTask("glue/rte", nrows=5000)

AutoTask guesses a template based on the dataset structure. It also accepts a dataset as input if the dataset fits the template (e.g. after tasksource custom preprocessing).

Balancing dataset sizes

tn.Classification(dataset, nrows=5000, nrows_eval=500, oversampling=2)

You can balance multiple datasets with nrows and oversampling. nrows is the maximum number of examples. If a dataset has fewer than nrows examples, it is oversampled at most oversampling times.
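The size arithmetic behind nrows and oversampling can be sketched as follows. This is an assumption-laden illustration, not tasknet's code: the balance function is hypothetical, and it repeats and truncates examples deterministically, whereas the library may sample differently.

```python
from itertools import cycle, islice


def balance(examples: list, nrows: int, oversampling: int = 1) -> list:
    """Cap a dataset at nrows; repeat a small dataset at most `oversampling` times."""
    if not examples:
        return []
    # Never exceed nrows, and never repeat the data more than `oversampling` times.
    target = min(nrows, len(examples) * oversampling)
    return list(islice(cycle(examples), target))


small = ["a", "b", "c"]
large = list(range(10_000))

print(len(balance(small, nrows=5000, oversampling=2)))  # 3 examples, doubled
print(len(balance(large, nrows=5000, oversampling=2)))  # capped at nrows
```

Under these assumptions, a 3-example dataset grows to 6 examples, while a 10,000-example dataset is cut down to 5,000, so very small and very large tasks end up closer in size.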

Colab examples

Minimal-ish example:

https://colab.research.google.com/drive/15Xf4Bgs3itUmok7XlAK6EEquNbvjD9BD?usp=sharing

More complex example, where tasknet was scaled to 600 tasks:

https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

tasknet vs jiant

jiant is another library comparable to tasknet. tasknet is a minimal extension of Trainer centered on task templates, while jiant builds a Trainer equivalent, called runner, from scratch. tasknet is leaner and closer to Hugging Face native tools. jiant is config-based and command-line focused, while tasknet is designed for interactive use and Python scripting.

Credit

This code uses parts of the examples from the transformers library and some code from multitask-learning-transformers.

Contact

You can request features on GitHub or reach me at damien.sileo@inria.fr

@misc{sileod22-tasknet,
  author = {Sileo, Damien},
  doi = {10.5281/zenodo.561225781},
  month = {11},
  title = {{tasknet, multitask interface between Trainer and datasets}},
  url = {https://github.com/sileod/tasknet},
  version = {1.5.0},
  year = {2022}}

Project details

Release history

1.54.0

May 21, 2024

This version

1.53.0

Mar 1, 2024

1.52.0

Nov 2, 2023

1.51.0

Nov 2, 2023

1.50.0

Nov 2, 2023

1.49.0

Sep 19, 2023

1.48.0

Aug 25, 2023

1.47.0

Jun 30, 2023

1.46.0

Jun 30, 2023

1.45.0

Jun 30, 2023

1.44.0

May 4, 2023

1.43.0

Apr 24, 2023

1.42.0

Apr 6, 2023

1.41.0

Mar 27, 2023

1.40.0

Mar 19, 2023

1.39.0

Mar 19, 2023

1.38.0

Mar 14, 2023

1.37.0

Mar 10, 2023

1.36.0

Mar 10, 2023

1.35.0

Feb 27, 2023

1.34.0

Feb 26, 2023

1.33.0

Feb 24, 2023

1.32.0

Feb 23, 2023

1.31.0

Feb 22, 2023

1.30.0

Feb 20, 2023

1.29.0

Feb 20, 2023

1.28.0

Feb 20, 2023

1.27.0

Feb 20, 2023

1.26.0

Feb 17, 2023

1.25.0

Feb 13, 2023

1.24.0

Feb 2, 2023

1.23.0

Feb 2, 2023

1.22.0

Jan 20, 2023

1.21.0

Jan 19, 2023

1.20.0

Jan 13, 2023

1.19.0

Jan 10, 2023

1.18.0

Jan 9, 2023

1.17.0

Dec 24, 2022

1.16.0

Dec 2, 2022

1.15.0

Nov 29, 2022

1.14.0

Nov 23, 2022

1.13.0

Nov 22, 2022

1.12.0

Nov 22, 2022

1.11.0

Nov 16, 2022

1.10.0

Nov 16, 2022

1.9.0

Nov 16, 2022

1.8.0

Nov 16, 2022

1.7.0

Nov 16, 2022

1.6.0

Nov 15, 2022

1.5

Nov 9, 2022

1.4

Nov 9, 2022

1.3

Nov 9, 2022

1.2

Nov 7, 2022

1.1

Nov 7, 2022

1.0.0

Nov 7, 2022

Download files

Source Distribution

tasknet-1.53.0.tar.gz (31.1 kB)

Uploaded Mar 1, 2024 Source

Built Distribution

tasknet-1.53.0-py3-none-any.whl (28.5 kB)

Uploaded Mar 1, 2024 Python 3

File details

Details for the file tasknet-1.53.0.tar.gz.

File metadata

Download URL: tasknet-1.53.0.tar.gz
Upload date: Mar 1, 2024
Size: 31.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for tasknet-1.53.0.tar.gz
Algorithm	Hash digest
SHA256	`b69486db82b948864d406caff253b4a9f98d9bd7e90b26bd079ba5c20c90423f`
MD5	`2df9400b5897f4bfd76051d642701748`
BLAKE2b-256	`aade4a68ff039c7573e2b42ad7baf7d753fe4b2a1f250ebacf872e89e28f0c14`

File details

Details for the file tasknet-1.53.0-py3-none-any.whl.

File metadata

Download URL: tasknet-1.53.0-py3-none-any.whl
Upload date: Mar 1, 2024
Size: 28.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for tasknet-1.53.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2d3cc5432cf83cccccfeb474a682a59b59525c9e826e07bc16bcddad4fe9eab0`
MD5	`7820de0ec6ac5cf0ed62c02d69e40c3d`
BLAKE2b-256	`1f15052b921365805993d03fa3ed3e0c345514cc3c5c43fcdca485e3344847ab`