Preprocessings to prepare datasets for a task

tasksource

Hugging Face Datasets is a great library, but it lacks standardization, and datasets require preprocessing before they can be used interchangeably. Meet tasksource: a collection of task preprocessings that facilitates multi-task learning and reproducibility.

import tasksource
from datasets import load_dataset

tasksource.bigbench(load_dataset('bigbench', 'movie_recommendation'))

Each dataset is mapped to a MultipleChoice, Classification, or TokenClassification task with standardized fields. We do not support generation tasks as they are addressed by promptsource.

All implemented preprocessings can be found in tasks.py. Each preprocessing is a function that takes a dataset as input and returns a standardized dataset.
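
To make the idea of "a function that takes a dataset and returns a standardized dataset" concrete, here is a minimal sketch in plain Python. It does not use tasksource or datasets: rows are ordinary dicts, and the names standardize_classification, sentence1, and labels are hypothetical choices for illustration, not necessarily the library's actual schema.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def standardize_classification(
    rows: Iterable[Row],
    text_field: str,
    label_field: str,
) -> List[Row]:
    """Return rows with the source columns renamed to fixed field names.

    Hypothetical illustration only: 'sentence1' and 'labels' are assumed
    target names, not necessarily those used by tasksource.
    """
    return [
        {"sentence1": row[text_field], "labels": row[label_field]}
        for row in rows
    ]


# Made-up input rows with dataset-specific column names.
raw = [
    {"review": "Great movie", "sentiment": 1},
    {"review": "Dull plot", "sentiment": 0},
]
standardized = standardize_classification(raw, "review", "sentiment")
```

Once every dataset exposes the same field names, downstream training code no longer needs dataset-specific branches, which is the point of the standardization described above.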

The annotation format is designed to be human-readable. Adding a new preprocessing only takes a few lines, e.g.:

cos_e = tasksource.MultipleChoice('question',
    choices_list='choices',
    labels=lambda x: x['choices_list'].index(x['answer']),
    config_name='v1.0')
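
The labels callable in this example turns an answer string into its position within the list of choices. The sketch below shows that computation on a single made-up row in plain Python. The row contents and the helper name answer_index are invented for illustration, and which field name the callable sees inside tasksource depends on how the library applies the mapping, which is not shown here.

```python
# A made-up multiple-choice row; field names mirror the example above.
row = {
    "question": "Where would you keep a spare blanket?",
    "choices": ["closet", "oven", "river"],
    "answer": "closet",
}


def answer_index(example, choices_field="choices", answer_field="answer"):
    """Return the position of the answer string within the choices list.

    Hypothetical helper: list.index raises ValueError if the answer
    is not one of the choices.
    """
    return example[choices_field].index(example[answer_field])


label = answer_index(row)
```

Storing the label as an integer index rather than a string keeps multiple-choice datasets comparable even when their answer texts differ.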

See the supported tasks in tasks.md.

contact

damien.sileo@inria.fr
