Reproducible Subjective Evaluation (ReSEval)
ReSEval is a framework for quickly building subjective evaluation and annotation tasks that are deployed on crowdworker platforms like Amazon Mechanical Turk. ReSEval currently supports A/B, ABX, MOS, MUSHRA, and Word Selection tests on audio, image, text, and video data.
While our code is free to use, performing crowdsourced subjective evaluation is not free. We are not responsible for costs incurred while using our code.
Citation
If you use ReSEval in an academic publication, please cite our paper.
IEEE
M. Morrison, B. Tang, G. Tan, and B. Pardo, "Reproducible Subjective Evaluation," ICLR Workshop on ML Evaluation Standards, April 2022.
BibTex
@inproceedings{morrison2022reproducible,
title={Reproducible Subjective Evaluation},
author={Morrison, Max and Tang, Brian and Tan, Gefei and Pardo, Bryan},
booktitle={ICLR Workshop on ML Evaluation Standards},
month={April},
year={2022}
}
Installation
First, install the Python module. ReSEval requires Python 3.9 or higher.
pip install reseval
Next, download Node.js. You can check that your installation is correct by running node --version. ReSEval uses Node.js version 18.16.1 and is not guaranteed to work on other versions. If needed, Linux and OS X users can use the n version manager to change their version of Node.js, and Windows users can use NVM for Windows.
# Linux or OS X
sudo npm install -g n
sudo n 18.16.1
# Windows
# Must be run with administrator privileges
nvm install 18.16.1
nvm use 18.16.1
Note - You must restart your terminal after changing versions of Node.js for the change to take effect.
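After restarting your terminal, a short script can confirm that the node on your PATH is the documented 18.16.1. This is a minimal sketch that is not part of ReSEval. The helper names parse_node_version and check_node_version are hypothetical, and it assumes node --version prints a string such as v18.16.1.

```python
import re
import shutil
import subprocess

# Node.js version documented for ReSEval
EXPECTED_NODE_VERSION = (18, 16, 1)


def parse_node_version(text: str) -> tuple:
    """Parse output such as 'v18.16.1' into the tuple (18, 16, 1)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized Node.js version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def check_node_version() -> bool:
    """Return True if the node executable on PATH reports the expected version."""
    node = shutil.which("node")
    if node is None:
        print("node was not found on PATH")
        return False
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    found = parse_node_version(result.stdout)
    if found != EXPECTED_NODE_VERSION:
        print(f"Found Node.js {'.'.join(map(str, found))}, expected 18.16.1")
        return False
    return True
```

Call check_node_version() from a Python session. It prints the mismatch when the wrong version is active.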
Deploying locally
To preview your subjective evaluation locally, you must set up a local MySQL database server and create a user with database creation privileges.
# Linux installation
sudo apt install mysql-server
# MySQL setup
sudo mysql_secure_installation
# Login to MySQL as root
sudo mysql -u root
-- Create a user (change new_user and new_password)
mysql> CREATE USER 'new_user'@'localhost' IDENTIFIED BY 'new_password';
-- Give user database creation privileges
mysql> GRANT ALL PRIVILEGES ON * . * TO 'new_user'@'localhost';
Run the following to store the username and password in reseval.CACHE / '.env'.
python -m reseval.credentials \
--mysql_local_user <mysql_user> \
--mysql_local_password <mysql_local_password>
The .env file is used to set local environment variables and is not pushed to GitHub or uploaded to any remote storage.
Configuration
All configuration is performed in a YAML configuration file. See examples/*.yaml for examples and documentation of parameters.
Adding files
The files to be evaluated must be organized in a directory structure according to the type of test being run. The directory structures for each test are as follows. Examples of valid directories of evaluation files can be found in examples/.
AB
ab
├── <condition-1>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-2>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
ABX
abx
├── reference
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-1>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-2>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
MOS
mos
├── <condition-0>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-1>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-2>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-3>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
└── ...
MUSHRA
mushra
├── <condition-0>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-1>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-2>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
├── <condition-3>
│ ├── <file-0>
│ ├── <file-1>
│ ├── <file-2>
│ ├── ...
└── ...
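The AB, ABX, MOS, and MUSHRA layouts above all place the same set of files under each subdirectory (each condition, plus reference for ABX). A quick pre-flight check can catch a missing file before deployment. This is a minimal sketch assuming that every subdirectory must contain identically named files, which is how the trees read. The helper check_condition_directory is hypothetical and is not ReSEval's own validation. Hidden files such as .DS_Store are ignored.

```python
from pathlib import Path


def check_condition_directory(directory) -> list:
    """Return human-readable problems found in a condition-based evaluation directory."""
    directory = Path(directory)
    # Every visible subdirectory is treated as a condition (or "reference" for ABX)
    conditions = sorted(
        p for p in directory.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    if not conditions:
        return ["no condition subdirectories found"]
    # Map each condition name to the set of visible filenames it contains
    contents = {
        c.name: {f.name for f in c.iterdir() if f.is_file() and not f.name.startswith(".")}
        for c in conditions
    }
    all_files = set().union(*contents.values())
    problems = []
    for name, files in contents.items():
        missing = sorted(all_files - files)
        if missing:
            problems.append(f"{name} is missing: {', '.join(missing)}")
    # Files directly under the top-level directory do not appear in the layouts above
    stray = sorted(
        p.name for p in directory.iterdir() if p.is_file() and not p.name.startswith(".")
    )
    if stray:
        problems.append(f"unexpected top-level files: {', '.join(stray)}")
    return problems
```

An empty list means the layout is consistent. Otherwise, each entry names the subdirectory and the files it lacks.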
WordSelect
wordselect
├── <file-0>
├── <words-0>
├── <file-1>
├── <words-1>
├── <file-2>
├── <words-2>
├── ...
<words-x> is <file-x> with the -words.txt extension.
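Before deploying a WordSelect test, you can check that every file has a matching words file. This is a minimal sketch assuming that the words file is named by replacing the file's extension with -words.txt (for example, 0.wav pairs with 0-words.txt). That naming reading is an assumption, so compare it against the files in examples/. The helper pair_wordselect_files is hypothetical.

```python
from pathlib import Path

# Assumed suffix that replaces the original extension of each evaluation file
WORDS_SUFFIX = "-words.txt"


def pair_wordselect_files(directory) -> dict:
    """Map each evaluation filename to its words filename, raising if any is missing."""
    directory = Path(directory)
    files = sorted(
        p for p in directory.iterdir()
        if p.is_file() and not p.name.endswith(WORDS_SUFFIX) and not p.name.startswith(".")
    )
    pairs = {}
    missing = []
    for file in files:
        # Assumed naming: drop the extension, then append "-words.txt"
        words = file.with_name(file.stem + WORDS_SUFFIX)
        if words.is_file():
            pairs[file.name] = words.name
        else:
            missing.append(words.name)
    if missing:
        raise FileNotFoundError(f"missing words files: {', '.join(missing)}")
    return pairs
```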
Credentials
API keys are required to use the third-party services that ReSEval depends on. These are not required for local development. Do not share these API keys.
Amazon Web Services
Sign up for an AWS account. Go to Security Credentials. Under Access keys, click Create New Access Key.
Note that this gives you a root access key. You can alternatively use the Identity & Access Management (IAM) system to set up more restrictive permissions for a user.
If you have never used AWS Elastic Beanstalk, one more step is required. Elastic Beanstalk instances created from Python (as opposed to the AWS console) do not instantiate the default instance profile IAM role. The solution is to either create (and then delete) an Elastic Beanstalk instance from the console, or to create the required IAM role manually. The IAM role is called aws-elasticbeanstalk-ec2-role and has three managed policies attached: (1) AWSElasticBeanstalkWebTier, (2) AWSElasticBeanstalkMulticontainerDocker, and (3) AWSElasticBeanstalkWorkerTier.
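You can also create the role manually from Python instead of the console. The sketch below uses boto3's IAM client. Note that boto3 is not mentioned in this README and is an assumption here. The sketch also assumes your AWS credentials are already configured for boto3 and are allowed to manage IAM. It creates an EC2 trust policy, attaches the three managed policies listed above, and wraps the role in an instance profile of the same name. The function name create_elasticbeanstalk_ec2_role is hypothetical, and the boto3 calls fail if the role or profile already exists.

```python
import json

ROLE_NAME = "aws-elasticbeanstalk-ec2-role"
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AWSElasticBeanstalkWebTier",
    "arn:aws:iam::aws:policy/AWSElasticBeanstalkMulticontainerDocker",
    "arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier",
]
# Trust policy that allows EC2 instances to assume the role
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_elasticbeanstalk_ec2_role() -> None:
    """Create the role, attach the managed policies, and add it to an instance profile."""
    import boto3  # imported here so the constants above can be inspected without boto3

    iam = boto3.client("iam")
    iam.create_role(RoleName=ROLE_NAME, AssumeRolePolicyDocument=json.dumps(TRUST_POLICY))
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=arn)
    # EC2 instances receive IAM roles through an instance profile
    iam.create_instance_profile(InstanceProfileName=ROLE_NAME)
    iam.add_role_to_instance_profile(InstanceProfileName=ROLE_NAME, RoleName=ROLE_NAME)
```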
Amazon Mechanical Turk
Follow the instructions here for setting up MTurk and connecting it to your AWS account.
Heroku (Optional if you don't want to use AWS)
Sign up for a Heroku account. Go to Account Settings. At the bottom of the page, the API Key section has a Reveal button that displays your API key.
You will also need to enable billing. You can do so here.
Usage
Once you have your configuration file and a properly formatted directory of evaluation files, you are ready to deploy a subjective evaluation. Example configuration files and corresponding evaluation files can be found in examples/.
If you are not deploying locally, add your API keys.
# AWS credentials
python -m reseval.credentials \
--aws_api_key <aws_api_key> \
--aws_api_secret_key <aws_api_secret_key>
# (Optional) Heroku credentials
python -m reseval.credentials \
--heroku_api_key <heroku_api_key>
API keys are saved in reseval.CACHE / '.keys'. The .keys file is used to set local environment variables and is not pushed to GitHub or uploaded to any remote storage.
Command-line interface
Unless otherwise specified, the command-line interfaces below take the following arguments.
- <config> - The configuration file
- <directory> - The directory of evaluation files
- <name> - The name of the evaluation given in the configuration file
Create
Create a subjective evaluation either locally, in remote development mode (e.g., MTurk Sandbox), or in production mode.
Note - reseval.create is not currently thread-safe. Wait until the first call has finished before calling it again. See this GitHub issue.
# Local development
python -m reseval.create <config> <directory> --local
# Remote development
python -m reseval.create <config> <directory>
# Production
python -m reseval.create <config> <directory> --production
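Because reseval.create is not thread-safe, a script that creates several evaluations should run them one after another rather than in parallel. The sketch below is a minimal approach that wraps the documented command and flags with subprocess. subprocess.run blocks until each command exits, so calls never overlap. The helpers build_create_command and create_sequentially are hypothetical. The sketch uses sys.executable so the same interpreter (with reseval installed) runs each command.

```python
import subprocess
import sys


def build_create_command(config: str, directory: str, mode: str = "remote") -> list:
    """Build a python -m reseval.create command for local, remote, or production mode."""
    flags = {"local": ["--local"], "remote": [], "production": ["--production"]}
    if mode not in flags:
        raise ValueError(f"mode must be one of {sorted(flags)}, got {mode!r}")
    return [sys.executable, "-m", "reseval.create", config, directory, *flags[mode]]


def create_sequentially(evaluations, mode: str = "remote") -> None:
    """Create each (config, directory) evaluation, waiting for each call to finish."""
    for config, directory in evaluations:
        # check=True stops the loop if one creation fails
        subprocess.run(build_create_command(config, directory, mode), check=True)
```

For example, create_sequentially([("first.yaml", "first"), ("second.yaml", "second")], mode="local"), where both paths are placeholders for your own configuration files and directories.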
Monitor
# Monitor all subjective evaluations
python -m reseval.monitor
# Monitor one subjective evaluation
# The name of the evaluation can be found in its configuration file
python -m reseval.monitor --name <name>
Note - By default, the monitor updates once every minute. You can update the monitor more or less often by providing an update interval in seconds.
# Update the monitor once every ten seconds
python -m reseval.monitor --interval 10
Results
# Get the results of a subjective evaluation.
# Results are stored in <directory>/<name>.
# <directory> defaults to the current directory.
python -m reseval.results <name> --directory <directory>
Pay
# Pay participants
python -m reseval.pay <name>
Destroy
# Destroy the compute resources of a subjective evaluation (e.g., any cloud
# storage, databases, or servers)
python -m reseval.destroy <name>
# Destroy a subjective evaluation even if it is still active.
# Participants who have taken the test so far will be paid.
python -m reseval.destroy <name> --force
Extend
# Add <participants> additional participants to a finished evaluation
python -m reseval.extend <name> <participants>
Application programming interface
Documentation for our API can be found here.
Advanced usage
Once you feel comfortable using ReSEval step-by-step from the command line and have added your credentials with reseval.credentials, you can use the CLI or API to run your evaluation with a single command.
CLI
# Local development
python -m reseval <config> <directory> --local
# Remote development
python -m reseval <config> <directory>
# Production
python -m reseval <config> <directory> --production
API
import reseval
# Local development
reseval.run(config, directory, local=True)
# Remote development
reseval.run(config, directory)
# Production
reseval.run(config, directory, production=True)
Additional monitoring
AWS S3
To monitor, edit, or delete AWS S3 storage buckets, or see storage costs, use the AWS S3 console.
AWS Elastic Beanstalk
To monitor, edit, or delete the server compute, use the AWS Elastic Beanstalk console.
AWS Relational Database Service
To monitor, edit, or delete the database, use the AWS RDS console.
MTurk
HITs not created on the MTurk dashboard are not visible on the MTurk dashboard. You can use the MTurk CLI to monitor, edit, or delete HITs. MTurk costs appear on the AWS billing dashboard at the end of the billing period.
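As an alternative to the MTurk CLI, boto3's mturk client can list the HITs that ReSEval created. boto3 is an assumption here; it is not mentioned in this README. The sketch assumes AWS credentials are configured for boto3. It uses the sandbox endpoint for remote development and the production endpoint for production. The MTurk requester API is served from us-east-1. The helper names mturk_endpoint and list_hits are hypothetical.

```python
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
PRODUCTION_ENDPOINT = "https://mturk-requester.us-east-1.amazonaws.com"


def mturk_endpoint(production: bool = False) -> str:
    """Return the MTurk requester endpoint; remote development uses the sandbox."""
    return PRODUCTION_ENDPOINT if production else SANDBOX_ENDPOINT


def list_hits(production: bool = False) -> list:
    """Return the ID, title, and status of every HIT visible to your requester account."""
    import boto3  # imported lazily so the endpoint helper works without boto3

    client = boto3.client(
        "mturk", region_name="us-east-1", endpoint_url=mturk_endpoint(production)
    )
    hits, token = [], None
    while True:
        kwargs = {"MaxResults": 100}  # 100 is the largest page size ListHITs accepts
        if token:
            kwargs["NextToken"] = token
        response = client.list_hits(**kwargs)
        for hit in response["HITs"]:
            hits.append({"HITId": hit["HITId"], "Title": hit["Title"], "HITStatus": hit["HITStatus"]})
        token = response.get("NextToken")
        # Stop when there is no further page or the last page was empty
        if not token or not response["HITs"]:
            return hits
```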
Heroku
To monitor, edit, or delete Heroku databases and servers, use the Heroku application dashboard. You can see any costs on the billing dashboard.