Seamless integration of tasks with huggingface models

tasknet: simple ModernBERT fine-tuning, with multi-task support

tasknet is an interface between Hugging Face datasets and the Hugging Face transformers Trainer.

Tasknet should work with all recent versions of Transformers.

Installation and example

pip install tasknet

Each task template has fields that must be matched with specific dataset columns. Classification has two text fields, s1 and s2, and a label y. To instantiate a task, pass a dataset to a template and map the template fields to the dataset columns.

import tasknet as tn
from datasets import load_dataset

# s2 is optional for classification; it is used to represent text pairs.
# See AutoTask for shorter code.
rte = tn.Classification(
    dataset=load_dataset("glue", "rte"),
    s1="sentence1", s2="sentence2", y="label")

class hparams:
    model_name = 'tasksource/ModernBERT-base-nli'  # better performance for most tasks
    learning_rate = 3e-5  # see hf.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments

# Both arguments are passed positionally: the list of tasks, then the hyperparameters.
model, trainer = tn.Model_Trainer([rte], hparams)
trainer.train()
trainer.evaluate()
p = trainer.pipeline()
p([{'text': 'premise here', 'text_pair': 'hypothesis here'}])  # Hugging Face pipeline for inference

Tasknet is multitask by design. model.task_models_list contains one model per task, with a shared encoder.
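
The idea of "one model per task, with a shared encoder" can be pictured with a small, library-free sketch. The names `Encoder`, `TaskModel`, and `build_task_models` are hypothetical and are not part of tasknet, and the second task name is invented. The sketch only shows that every task model holds a reference to the same encoder object while keeping its own head size.

```python
from dataclasses import dataclass, field


@dataclass
class Encoder:
    """Stand-in for a shared text encoder (hypothetical, not tasknet code)."""
    weights: list = field(default_factory=lambda: [0.0, 0.0])

    def encode(self, text: str) -> list:
        # Toy features: character count and word count, shifted by the weights.
        return [len(text) + self.weights[0], len(text.split()) + self.weights[1]]


@dataclass
class TaskModel:
    """One model per task: the shared encoder plus a task-specific output size."""
    name: str
    encoder: Encoder
    num_labels: int

    def predict(self, text: str) -> int:
        # Toy head: fold the features into one of num_labels classes.
        return int(sum(self.encoder.encode(text))) % self.num_labels


def build_task_models(task_specs: dict) -> list:
    # A single encoder instance is created once and handed to every task model.
    shared = Encoder()
    return [TaskModel(name, shared, n) for name, n in task_specs.items()]


task_models = build_task_models({"rte": 2, "toy_three_way": 3})
# Changing the encoder through one task model is visible from all of them.
task_models[0].encoder.weights[0] = 1.0
```

Because the encoder is shared by reference, training on any task would update the representation used by every other task, while each head stays task-specific.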

Task templates

tasknet relies on task templates to avoid boilerplate code. The task templates correspond to Transformers AutoClasses:

  • SequenceClassification(s1, s2, y)
  • TokenClassification(tokens, labels) (tokens and labels are lists of words and assigned labels)
  • MultipleChoice(s1, choices, y) (s1 is a prompt/question, choices is a list of texts, y is the index of the correct choice)
  • Seq2SeqLM (experimental support)
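
As a rough illustration of the field layout described in the list above, here are toy records shaped like each template's fields. The values are made up, the helper `check_record` is hypothetical, and tasknet itself is not imported. In a real dataset the column names are whatever gets mapped to these fields.

```python
# Toy records, one per template. Field names follow the list above;
# the values are invented for illustration.
sequence_classification = {"s1": "A man is sleeping.", "s2": "A person rests.", "y": 0}
token_classification = {"tokens": ["Paris", "is", "nice"], "labels": ["LOC", "O", "O"]}
multiple_choice = {"s1": "2 + 2 = ?", "choices": ["3", "4", "5"], "y": 1}


def check_record(template: str, record: dict) -> bool:
    """Hypothetical sanity check for the record shapes described above."""
    if template == "TokenClassification":
        # One label per token.
        return len(record["tokens"]) == len(record["labels"])
    if template == "MultipleChoice":
        # y indexes into the list of choices.
        return 0 <= record["y"] < len(record["choices"])
    if template == "SequenceClassification":
        # s2 is optional; s1 and y are required.
        return "s1" in record and "y" in record
    raise ValueError(f"unknown template: {template}")
```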

The task templates follow the same interface. They implement preprocess_function, a data collator and compute_metrics. Look at tasks.py and use existing templates as a starting point to implement a custom task template.
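
To make these three responsibilities concrete, here is a library-free sketch of a template-like class. `ToyTemplate` and its method signatures are assumptions made for illustration. The real signatures are the ones in tasks.py and may differ; for example, real templates tokenize text with a model tokenizer and return tensors.

```python
from dataclasses import dataclass


@dataclass
class ToyTemplate:
    """Hypothetical template: maps columns to fields, collates, and scores."""
    s1: str = "s1"  # dataset column holding the text
    y: str = "y"    # dataset column holding the label

    def preprocess_function(self, example: dict) -> dict:
        # Stand-in for tokenization: whitespace split, word lengths used as ids.
        ids = [len(word) for word in example[self.s1].split()]
        return {"input_ids": ids, "label": example[self.y]}

    def data_collator(self, features: list) -> dict:
        # Pad every sequence in the batch with zeros up to the longest one.
        width = max(len(f["input_ids"]) for f in features)
        padded = [f["input_ids"] + [0] * (width - len(f["input_ids"])) for f in features]
        return {"input_ids": padded, "labels": [f["label"] for f in features]}

    def compute_metrics(self, predictions: list, labels: list) -> dict:
        # Plain accuracy over the evaluation set.
        correct = sum(pred == gold for pred, gold in zip(predictions, labels))
        return {"accuracy": correct / len(labels)}


template = ToyTemplate(s1="sentence", y="label")
rows = [{"sentence": "a short one", "label": 1}, {"sentence": "tiny", "label": 0}]
batch = template.data_collator([template.preprocess_function(r) for r in rows])
```

The column mapping lives in the constructor (`s1="sentence"`, `y="label"`), mirroring how a task is instantiated from a template and a dataset.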

AutoTask

You can also use tasksource through tn.AutoTask for one-line access to 600+ datasets; see the list of implemented tasks.

rte = tn.AutoTask("glue/rte", nrows=5000)

AutoTask guesses a template based on the dataset structure. It also accepts a dataset as input, if it fits the template (e.g. after tasksource custom preprocessing).
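
Guessing a template from the dataset structure can be sketched as below. This is not tasknet's actual logic: `guess_template` and its rules are hypothetical and only show the kind of inspection such a heuristic might perform on a single example row.

```python
def guess_template(row: dict) -> str:
    """Hypothetical heuristic: pick a template name from one example row."""
    values = list(row.values())
    list_columns = [v for v in values if isinstance(v, list)]
    has_int_label = any(isinstance(v, int) and not isinstance(v, bool) for v in values)
    has_text = any(isinstance(v, str) for v in values)

    # Two parallel lists of the same length look like tokens plus labels.
    if len(list_columns) == 2 and len(list_columns[0]) == len(list_columns[1]):
        return "TokenClassification"
    # A prompt, a list of candidate texts, and an integer index.
    if len(list_columns) == 1 and has_text and has_int_label:
        return "MultipleChoice"
    # One or two text columns plus an integer label.
    if has_text and has_int_label:
        return "SequenceClassification"
    raise ValueError("no template matches this row")
```

A row that matches none of the rules raises an error instead of silently picking a template, which is usually the safer default for a guesser.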

Balancing dataset sizes

tn.Classification(dataset, nrows=5000, nrows_eval=500, oversampling=2)

You can balance multiple datasets with nrows and oversampling. nrows is the maximum number of examples kept from a dataset. If a dataset has fewer than nrows examples, it is oversampled at most oversampling times.
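
One reading of this rule, written as plain Python: `balanced_size` is a hypothetical helper, not a tasknet function, and the exact behavior (for instance whether repeats are whole copies or sampled rows) is an assumption. It only computes how many examples a dataset would contribute.

```python
def balanced_size(n_examples: int, nrows: int, oversampling: int = 1) -> int:
    """Number of examples kept under the assumed nrows/oversampling rule."""
    if n_examples >= nrows:
        # Large datasets are capped at nrows.
        return nrows
    # Small datasets are repeated up to `oversampling` times, never past nrows.
    return min(nrows, n_examples * oversampling)


# Invented dataset sizes, using the nrows and oversampling values from the call above.
sizes = {
    name: balanced_size(n, nrows=5000, oversampling=2)
    for name, n in {"large": 100_000, "medium": 4_000, "small": 1_000}.items()
}
```

Under these assumptions, the large and medium datasets both contribute 5,000 examples, while the small one contributes 2,000, so it stays under-represented but less so than without oversampling.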

Colab examples

Minimal-ish example:

https://colab.research.google.com/drive/15Xf4Bgs3itUmok7XlAK6EEquNbvjD9BD?usp=sharing

More complex example, where tasknet was scaled to 600 tasks:

https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

Credit

This code reuses parts of the examples from the transformers library and some code from multitask-learning-transformers.

Contact

You can request features on GitHub or reach me at damien.sileo@inria.fr

@misc{sileod22-tasknet,
  author = {Sileo, Damien},
  doi = {10.5281/zenodo.561225781},
  month = {11},
  title = {{tasknet, multitask interface between Trainer and datasets}},
  url = {https://github.com/sileod/tasknet},
  version = {1.5.0},
  year = {2022}}
