A library for evaluating language models.
Catwalk
Catwalk shows off models.
Catwalk contains many models and many tasks. The goal is to be able to run every model on every task. In practice, some combinations are not possible, but many are.
Here is the current list of implemented tasks. The list omits the `metaicl` and `p3` categories of tasks, because those are largely variants of the other tasks.
wikitext
piqa
squad
squadshifts-reddit
squadshifts-amazon
squadshifts-nyt
squadshifts-new-wiki
mrqa::race
mrqa::newsqa
mrqa::triviaqa
mrqa::searchqa
mrqa::hotpotqa
mrqa::naturalquestions
mrqa::bioasq
mrqa::drop
mrqa::relationextraction
mrqa::textbookqa
mrqa::duorc.paraphraserc
squad2
rte
superglue::rte
cola
mnli
mnli_mismatched
mrpc
qnli
qqp
sst
wnli
boolq
cb
copa
multirc
wic
wsc
drop
lambada
lambada_cloze
lambada_mt_en
lambada_mt_fr
lambada_mt_de
lambada_mt_it
lambada_mt_es
prost
mc_taco
pubmedqa
sciq
qa4mre_2011
qa4mre_2012
qa4mre_2013
triviaqa
arc_easy
arc_challenge
logiqa
hellaswag
openbookqa
race
headqa_es
headqa_en
mathqa
webqs
wsc273
winogrande
anli_r1
anli_r2
anli_r3
ethics_cm
ethics_deontology
ethics_justice
ethics_utilitarianism_original
ethics_utilitarianism
ethics_virtue
truthfulqa_gen
mutual
mutual_plus
math_algebra
math_counting_and_prob
math_geometry
math_intermediate_algebra
math_num_theory
math_prealgebra
math_precalc
math_asdiv
arithmetic_2da
arithmetic_2ds
arithmetic_3da
arithmetic_3ds
arithmetic_4da
arithmetic_4ds
arithmetic_5da
arithmetic_5ds
arithmetic_2dm
arithmetic_1dc
anagrams1
anagrams2
cycle_letters
random_insertion
reversed_words
raft::ade_corpus_v2
raft::banking_77
raft::neurips_impact_statement_risks
raft::one_stop_english
raft::overruling
raft::semiconductor_org_types
raft::systematic_review_inclusion
raft::tai_safety_research
raft::terms_of_service
raft::tweet_eval_hate
raft::twitter_complaints
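Several task names above carry a prefix such as `mrqa::` or `raft::`. As a minimal sketch, assuming the part before `::` names a task family (an inference from the names alone, not something Catwalk documents here), the list can be grouped by prefix in plain Python. The helper `group_tasks` is hypothetical and not part of Catwalk.

```python
from collections import defaultdict


def group_tasks(task_names):
    """Group task names by the text before '::'.

    Names without a '::' prefix are collected under the empty string.
    This helper is hypothetical; it only manipulates strings.
    """
    groups = defaultdict(list)
    for name in task_names:
        prefix, sep, rest = name.partition("::")
        if sep:
            groups[prefix].append(rest)
        else:
            groups[""].append(name)
    return dict(groups)


# A small sample taken from the task list above.
tasks = ["piqa", "mrqa::race", "mrqa::drop", "raft::overruling", "superglue::rte", "rte"]
grouped = group_tasks(tasks)
print(grouped)
```

Note that `rte` and `superglue::rte` end up in different groups, which matches how the list treats them as separate tasks.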
Installation
Catwalk requires Python 3.9 or later.
Unfortunately, Catwalk cannot be installed from PyPI, because it depends on other packages that are not uploaded to PyPI.
Install from source:
git clone https://github.com/allenai/catwalk.git
cd catwalk
pip install -e .
Getting started
Let's run GPT2 on PIQA:
python -m catwalk --model rc::gpt2 --task piqa
This will load up GPT2 and use it to perform the PIQA task with the "ranked classification" approach.
You can specify multiple tasks at once:
python -m catwalk --model rc::gpt2 --task piqa arc_easy
It'll print you a nice table with all tasks and the metrics for each task:
arc_challenge acc 0.22440272569656372
arc_easy acc 0.3998316526412964
piqa acc 0.6256800889968872
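If you want to use these numbers in a script, the table can be parsed into a dictionary. This is a rough sketch that assumes each result line has exactly three whitespace-separated fields (task, metric, value), as in the output shown above; the real output may contain other lines, which this code skips. The function `parse_metrics` is hypothetical and not part of Catwalk.

```python
def parse_metrics(output: str) -> dict:
    """Parse 'task metric value' lines into {task: {metric: value}}.

    Lines that do not have three fields, or whose last field is not a
    number, are ignored.
    """
    results = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        task, metric, raw_value = parts
        try:
            value = float(raw_value)
        except ValueError:
            continue
        results.setdefault(task, {})[metric] = value
    return results


# The sample output copied from the table above.
sample = """arc_challenge acc 0.22440272569656372
arc_easy acc 0.3998316526412964
piqa acc 0.6256800889968872"""

metrics = parse_metrics(sample)
print(metrics["piqa"]["acc"])
```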
Training / Finetuning
Catwalk can train models. It can train models on a single task, or on multiple tasks at once. To train, use this command line:
python -m catwalk.train --model rc::gpt2 --task piqa
You can train on multiple tasks at the same time, if you want to create a multi-task model:
python -m catwalk.train --model rc::gpt2 --task piqa arc_easy
Note that not all models support training. If you want to train one and can't, create an issue and tag @dirkgr in it.
Tango integration
Catwalk uses Tango for caching and executing evaluations. The command line interface internally constructs a Tango step graph and executes it. You can point the command line to a Tango workspace to cache results:
python -m catwalk --model rc::gpt2 --task piqa arc_easy -w ./my-workspace/
The second time you run one of those tasks, it will be fast:
time python -m catwalk --model rc::gpt2 --task piqa -w ./my-workspace/
arc_easy acc 0.39941078424453735
piqa acc 0.626224160194397
________________________________________________________
Executed in    9.82 secs    fish           external
   usr time    6.51 secs  208.00 micros    6.51 secs
   sys time    1.25 secs  807.00 micros    1.25 secs
Tango workspaces also save partial results, so if you interrupt an evaluation half-way through, your progress is saved.
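To launch evaluations from a Python script, one option is to assemble the same command line shown above and hand it to `subprocess`. The sketch below only builds the argument list, using the `--model`, `--task`, and `-w` options that appear in this README. The helper `build_eval_command` is hypothetical and not part of Catwalk.

```python
import shlex
import sys


def build_eval_command(model, tasks, workspace=None):
    """Build the argument list for `python -m catwalk`.

    Only the options shown in this README are used. The helper itself
    is hypothetical; it does not import or call Catwalk.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    cmd = [sys.executable, "-m", "catwalk", "--model", model, "--task", *tasks]
    if workspace is not None:
        # Point the run at a Tango workspace so results are cached.
        cmd += ["-w", workspace]
    return cmd


cmd = build_eval_command("rc::gpt2", ["piqa", "arc_easy"], workspace="./my-workspace/")
# Show the arguments after the interpreter path in shell-quoted form.
print(shlex.join(cmd[1:]))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then start the evaluation, provided Catwalk is installed in the same environment as the running interpreter.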
Team
ai2-catwalk is developed and maintained by the AllenNLP team, backed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. To learn more about who specifically contributed to this codebase, see our contributors page.
License
ai2-catwalk is licensed under Apache 2.0. A full copy of the license can be found on GitHub.