Seamless integration of tasks with huggingface models

Project description

tasknet: simple multi-task transformers fine-tuning with Trainer and Hugging Face datasets.

tasknet is an interface between Hugging Face datasets and the Hugging Face Trainer.

Task templates

tasknet relies on task templates to avoid boilerplate code. The task templates correspond to Transformers AutoClasses:

  • SequenceClassification
  • TokenClassification
  • MultipleChoice
  • Seq2SeqLM (experimental support)

The task templates all follow the same interface: each implements preprocess_function, a data collator, and compute_metrics. To implement a custom task template, look at tasks.py and use an existing template as a starting point.
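To make the shape of that shared interface concrete, here is a minimal, dependency-free sketch of the three pieces. The class name ToyClassification, the stand-in tokenizer, and the padding scheme are all hypothetical and are not tasknet's actual code; the real templates in tasks.py build on Hugging Face tokenizers and collators, and their method signatures may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyClassification:
    """Hypothetical template for illustration only; NOT tasknet's implementation."""

    s1: str = "sentence1"  # dataset column holding the text
    y: str = "label"       # dataset column holding the label
    pad_id: int = 0        # id used to pad short sequences

    def preprocess_function(self, example):
        # Stand-in tokenizer: each whitespace token becomes its length.
        ids = [len(tok) for tok in example[self.s1].split()]
        return {"input_ids": ids, "labels": example[self.y]}

    def data_collator(self, features):
        # Pad every sequence in the batch to the length of the longest one.
        width = max(len(f["input_ids"]) for f in features)
        return {
            "input_ids": [
                f["input_ids"] + [self.pad_id] * (width - len(f["input_ids"]))
                for f in features
            ],
            "labels": [f["labels"] for f in features],
        }

    def compute_metrics(self, predictions, labels):
        # Plain accuracy; simplified to take two lists of labels.
        correct = sum(int(p == l) for p, l in zip(predictions, labels))
        return {"accuracy": correct / len(labels)}


template = ToyClassification()
rows = [
    {"sentence1": "a short text", "label": 1},
    {"sentence1": "hi", "label": 0},
]
batch = template.data_collator([template.preprocess_function(r) for r in rows])
```

Keeping the three steps on a single object is what lets a trainer treat every task the same way, whatever its input format.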

Task instances

Each task template has fields that must be matched with specific dataset columns. Classification has two text fields, s1 and s2, and a label y. To instantiate a task, pass a dataset to a template and fill in the mapping between the template fields and the dataset columns.

import tasknet as tn
from datasets import load_dataset

rte = tn.Classification(
    dataset=load_dataset("glue", "rte"),
    s1="sentence1", s2="sentence2", y="label"
)

class args:
    model_name = "roberta-base"
    learning_rate = 3e-5
    # see https://huggingface.co/docs/transformers/v4.24.0/en/main_classes/trainer#transformers.TrainingArguments

tasks = [rte]
model = tn.Model(tasks, args)
trainer = tn.Trainer(model, tasks, args)
trainer.train()
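
The field mapping passed to the template (s1="sentence1", s2="sentence2", y="label") can be thought of as renaming dataset columns to template field names. Below is a rough, library-free sketch of that idea. The helper to_template_fields is hypothetical and not part of tasknet, and the example row is made up, though its column names follow GLUE RTE.

```python
def to_template_fields(example, **field_to_column):
    """Return a dict keyed by template fields (hypothetical helper, not tasknet API)."""
    missing = [col for col in field_to_column.values() if col not in example]
    if missing:
        # Fail early when the mapping names a column the dataset lacks.
        raise KeyError(f"dataset columns not found: {missing}")
    # Keep only the mapped columns, renamed to the template's field names.
    return {field: example[col] for field, col in field_to_column.items()}


# A row shaped like GLUE RTE, with made-up contents.
row = {
    "sentence1": "It is raining.",
    "sentence2": "The ground is wet.",
    "label": 0,
    "idx": 7,
}
mapped = to_template_fields(row, s1="sentence1", s2="sentence2", y="label")
```

Because the template only ever sees s1, s2 and y, the same template can in principle be reused on any dataset whose columns can be mapped this way.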

tasknet is multi-task by design. It works with a list of tasks, and the model creates a task_models_list attribute.

Installation

pip install tasknet

Additional examples:

Colab:

https://colab.research.google.com/drive/15Xf4Bgs3itUmok7XlAK6EEquNbvjD9BD?usp=sharing

tasknet vs jiant

jiant is another library comparable to tasknet. tasknet is a minimal extension of Trainer centered on task templates, while jiant builds a custom analog of Trainer from scratch, called runner. tasknet is leaner and easier to extend. jiant is config-based, while tasknet is designed for interactive use and scripting.

Credit

This code uses parts of the examples from the transformers library and some code from multitask-learning-transformers.

Contact

You can request features on GitHub or reach me at damien.sileo@inria.fr

@misc{sileod22-tasknet,
  author = {Sileo, Damien},
  doi = {10.5281/zenodo.561225781},
  month = {11},
  title = {{tasknet, multitask interface between Trainer and datasets}},
  url = {https://github.com/sileod/tasknet},
  version = {1.5.0},
  year = {2022}}
