Preprocessing functions to prepare datasets for a task
tasksource
Hugging Face Datasets is a great library, but it lacks standardization, and datasets require preprocessing before they can be used interchangeably.
Meet tasksource: a collection of task preprocessing functions that facilitate multi-task learning and reproducibility.
import tasksource
from datasets import load_dataset
# load a raw BIG-bench task, then apply the matching tasksource preprocessing to standardize it
tasksource.bigbench(load_dataset('bigbench', 'movie_recommendation'))
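Since every preprocessing emits standardized fields, the outputs of several tasks can be pooled for multi-task training. The sketch below shows one common pooling strategy, round-robin interleaving, over plain lists of already-standardized rows. `round_robin`, the `"task"` key and the toy rows are hypothetical and are not part of tasksource, which provides only the preprocessings.

```python
from typing import Dict, Iterator, List

Row = Dict[str, object]


def round_robin(tasks: Dict[str, List[Row]]) -> Iterator[Row]:
    """Interleave standardized rows from several tasks (hypothetical helper).

    Each yielded row is tagged with a "task" key so that a multi-task model
    can route it to the right head. Tasks that run out of rows are dropped.
    """
    iterators = {name: iter(rows) for name, rows in tasks.items()}
    while iterators:
        # Iterate over a snapshot of the names so exhausted tasks can be removed.
        for name in list(iterators):
            try:
                row = next(iterators[name])
            except StopIteration:
                del iterators[name]
                continue
            yield {"task": name, **row}


# Toy, made-up rows standing in for two already-standardized datasets.
mixed = list(round_robin({
    "movie_recommendation": [{"inputs": "q1", "labels": 0}, {"inputs": "q2", "labels": 2}],
    "cos_e": [{"inputs": "q3", "labels": 1}],
}))
print([r["task"] for r in mixed])
# ['movie_recommendation', 'cos_e', 'movie_recommendation']
```

Round-robin keeps small tasks from being drowned out early in training; proportional or temperature-based sampling are common alternatives.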
Each dataset is mapped to a MultipleChoice, Classification, or TokenClassification task with standardized fields.
We do not support generation tasks, as they are already addressed by promptsource.
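To make the idea of "standardized fields" concrete, the sketch below models the three task types as Python dataclasses. The field names are assumptions chosen for illustration and may differ from what tasksource actually emits; the point is that any dataset mapped to a given type can be consumed by the same downstream code, as `scoring_pairs` (a hypothetical helper) shows.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Illustrative target schemas; field names are assumptions for this sketch.
@dataclass
class ClassificationExample:
    sentence1: str
    sentence2: str  # empty string for single-sentence tasks
    labels: int


@dataclass
class MultipleChoiceExample:
    inputs: str
    choices: List[str]
    labels: int  # index of the correct entry in `choices`


@dataclass
class TokenClassificationExample:
    tokens: List[str]
    labels: List[int]  # one tag id per token

    def __post_init__(self) -> None:
        # A token-level task needs exactly one label per token.
        if len(self.tokens) != len(self.labels):
            raise ValueError("tokens and labels must have the same length")


def scoring_pairs(ex: MultipleChoiceExample) -> List[Tuple[str, str]]:
    """One (context, option) pair per choice, e.g. for a model that scores each option."""
    return [(ex.inputs, choice) for choice in ex.choices]


# Made-up example record.
ex = MultipleChoiceExample("Which movie is most similar?", ["A", "B", "C"], labels=1)
print(scoring_pairs(ex))
```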
All implemented preprocessings can be found in tasks.py. Each preprocessing is a function that takes a dataset as input and returns a standardized dataset.
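A minimal sketch of what "a function that takes a dataset and returns a standardized dataset" can look like, assuming a list of dicts as a stand-in for a Hugging Face `Dataset`. The `classification` factory and the output field names are hypothetical, not the tasksource API: the factory records which raw columns play which role and returns the preprocessing function.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Preprocessing = Callable[[List[Row]], List[Row]]


def classification(sentence1: str, sentence2: Optional[str] = None,
                   labels: str = "label") -> Preprocessing:
    """Build a preprocessing that renames raw columns to standard ones (hypothetical)."""
    def preprocess(rows: List[Row]) -> List[Row]:
        return [
            {
                "sentence1": row[sentence1],
                # Single-sentence tasks get an empty second sentence.
                "sentence2": row[sentence2] if sentence2 is not None else "",
                "labels": row[labels],
            }
            for row in rows
        ]
    return preprocess


# Two made-up datasets with different column names...
nli_rows = [{"premise": "A man sleeps.", "hypothesis": "A person rests.", "label": 0}]
sentiment_rows = [{"text": "Great film!", "sentiment": 1}]

# ...come out with identical fields after preprocessing.
nli = classification("premise", "hypothesis")(nli_rows)
sentiment = classification("text", labels="sentiment")(sentiment_rows)
print(nli[0].keys() == sentiment[0].keys())  # True
```

With an actual `datasets.Dataset`, the same per-row logic could be passed to `Dataset.map` instead of a list comprehension.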
The annotation format is designed to be human-readable. Adding a new preprocessing only takes a few lines, e.g.:
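cos_e = tasksource.MultipleChoice('question',  # field holding the question text
    choices_list='choices',  # field holding the list of answer options
    labels=lambda x: x['choices_list'].index(x['answer']),  # position of the gold answer among the choices
    config_name='v1.0')  # Hugging Face config of the cos_e dataset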
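The annotation above mixes two kinds of arguments: strings that name a source column and callables that compute a value from a row. The sketch below illustrates that resolution rule with a simplified, hypothetical `SimpleMultipleChoice` class; it is not the tasksource implementation. Because the `labels` lambda reads `x['choices_list']`, the sketch assumes the choices are exposed under that name before the labels are computed.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Row = Dict[str, Any]
Spec = Union[str, Callable[[Row], Any]]


def resolve(spec: Spec, row: Row) -> Any:
    # A string names a source column; a callable computes the value from the row.
    return spec(row) if callable(spec) else row[spec]


class SimpleMultipleChoice:
    """Hypothetical, simplified stand-in for tasksource.MultipleChoice."""

    def __init__(self, inputs: Spec, choices_list: Spec, labels: Spec,
                 config_name: Optional[str] = None) -> None:
        self.inputs = inputs
        self.choices_list = choices_list
        self.labels = labels
        self.config_name = config_name  # recorded only; this sketch loads nothing

    def __call__(self, rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            # Expose the choices under the standard name first, so that a
            # labels callable can refer to x['choices_list'] (assumed ordering).
            row = {**row, "choices_list": resolve(self.choices_list, row)}
            out.append({
                "inputs": resolve(self.inputs, row),
                "choices_list": row["choices_list"],
                "labels": resolve(self.labels, row),
            })
        return out


cos_e_like = SimpleMultipleChoice(
    "question",
    choices_list="choices",
    labels=lambda x: x["choices_list"].index(x["answer"]),
    config_name="v1.0",
)
# A made-up row shaped like a cos_e record.
rows = [{"question": "Where do you keep milk?", "choices": ["fridge", "oven", "shelf"], "answer": "fridge"}]
print(cos_e_like(rows))
# [{'inputs': 'Where do you keep milk?', 'choices_list': ['fridge', 'oven', 'shelf'], 'labels': 0}]
```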
See the list of supported tasks in tasks.md.
contact
damien.sileo@inria.fr
Hashes for tasksource-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9f64e904cccd6634f34c4935fbd675a91df99ec3d2a2a4174988862c0048b51c
MD5 | f524d3d5a31407775934cbcf10dd139b
BLAKE2b-256 | 80ba32b905c13269123768bf601e56d98f4cc6e7cf935b6d13646e38396a8923