A library for evaluating language models.

Catwalk

Catwalk shows off models.

Catwalk contains many models and many tasks. The goal is to be able to run every model on every task. In practice, some combinations are not possible, but many are.

Here is the current list of implemented tasks. This list does not show the `metaicl` and `p3` categories of tasks, because those are largely variants of the other tasks.
wikitext
piqa
squad
squadshifts-reddit
squadshifts-amazon
squadshifts-nyt
squadshifts-new-wiki
mrqa::race
mrqa::newsqa
mrqa::triviaqa
mrqa::searchqa
mrqa::hotpotqa
mrqa::naturalquestions
mrqa::bioasq
mrqa::drop
mrqa::relationextraction
mrqa::textbookqa
mrqa::duorc.paraphraserc
squad2
rte
superglue::rte
cola
mnli
mnli_mismatched
mrpc
qnli
qqp
sst
wnli
boolq
cb
copa
multirc
wic
wsc
drop
lambada
lambada_cloze
lambada_mt_en
lambada_mt_fr
lambada_mt_de
lambada_mt_it
lambada_mt_es
prost
mc_taco
pubmedqa
sciq
qa4mre_2011
qa4mre_2012
qa4mre_2013
triviaqa
arc_easy
arc_challenge
logiqa
hellaswag
openbookqa
race
headqa_es
headqa_en
mathqa
webqs
wsc273
winogrande
anli_r1
anli_r2
anli_r3
ethics_cm
ethics_deontology
ethics_justice
ethics_utilitarianism_original
ethics_utilitarianism
ethics_virtue
truthfulqa_gen
mutual
mutual_plus
math_algebra
math_counting_and_prob
math_geometry
math_intermediate_algebra
math_num_theory
math_prealgebra
math_precalc
math_asdiv
arithmetic_2da
arithmetic_2ds
arithmetic_3da
arithmetic_3ds
arithmetic_4da
arithmetic_4ds
arithmetic_5da
arithmetic_5ds
arithmetic_2dm
arithmetic_1dc
anagrams1
anagrams2
cycle_letters
random_insertion
reversed_words
raft::ade_corpus_v2
raft::banking_77
raft::neurips_impact_statement_risks
raft::one_stop_english
raft::overruling
raft::semiconductor_org_types
raft::systematic_review_inclusion
raft::tai_safety_research
raft::terms_of_service
raft::tweet_eval_hate
raft::twitter_complaints

Installation

Catwalk requires Python 3.9 or later.

Unfortunately, Catwalk cannot be installed from PyPI, because it depends on other packages that are not published on PyPI.

Install from source:

git clone https://github.com/allenai/catwalk.git
cd catwalk
pip install -e .
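Since Catwalk requires Python 3.9 or later, it can help to confirm that the interpreter you installed into meets that requirement. A minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical:

```python
import sys

MINIMUM = (3, 9)  # Catwalk's stated minimum Python version


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not meets_minimum():
        sys.exit(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ required, found {sys.version.split()[0]}")
    print("Python version OK")
```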

Getting started

Let's run GPT2 on PIQA:

python -m catwalk --model rc::gpt2 --task piqa

This loads GPT2 and uses it to perform the PIQA task with the "ranked classification" approach, selected by the `rc::` prefix in the model name.
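Ranked classification turns a multiple-choice task into a ranking problem: the language model scores each candidate answer, and the highest-scoring candidate becomes the prediction. The sketch below illustrates only that idea, with a toy stand-in for the model. `log_likelihood` is a hypothetical scoring function, and this is not Catwalk's actual implementation, which may, for example, batch or normalize scores differently.

```python
import math
from typing import Callable, Sequence


def rank_classify(
    context: str,
    choices: Sequence[str],
    log_likelihood: Callable[[str, str], float],
) -> int:
    """Return the index of the choice with the highest score given the context."""
    scores = [log_likelihood(context, choice) for choice in choices]
    # max() returns the first index on ties
    return max(range(len(choices)), key=lambda i: scores[i])


# Toy scorer standing in for a language model: every character costs
# log(0.5), so shorter continuations score higher. A real scorer would sum
# the token log-probabilities a model such as GPT2 assigns to the choice.
def toy_log_likelihood(context: str, continuation: str) -> float:
    return len(continuation) * math.log(0.5)


goal = "To open a jar, you should"
solutions = [" twist the lid.", " hit the jar with a hammer until it breaks."]
print(rank_classify(goal, solutions, toy_log_likelihood))  # prints 0
```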

You can specify multiple tasks at once:

python -m catwalk --model rc::gpt2 --task piqa arc_easy arc_challenge

It prints a table listing each task and its metrics:

arc_challenge   acc     0.22440272569656372
arc_easy        acc     0.3998316526412964
piqa    acc     0.6256800889968872
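The printed table consists of whitespace-separated task, metric, and value columns, so it is easy to consume from a script. A minimal sketch, assuming every result row has exactly those three fields as in the sample output above; `parse_results` is a hypothetical helper, not part of Catwalk:

```python
def parse_results(text: str) -> dict:
    """Parse 'task metric value' lines into {task: {metric: value}}."""
    results = {}
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 3:
            continue  # skip blank lines and anything that isn't a result row
        task, metric, value = fields
        try:
            score = float(value)
        except ValueError:
            continue  # non-numeric value: not a result row
        results.setdefault(task, {})[metric] = score
    return results


output = """arc_challenge   acc     0.22440272569656372
arc_easy        acc     0.3998316526412964
piqa    acc     0.6256800889968872"""
scores = parse_results(output)
print(scores["piqa"]["acc"])
```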

Training / Finetuning

Catwalk can also train (fine-tune) models, either on a single task or on several tasks at once. To train on a single task, run:

python -m catwalk.train --model rc::gpt2 --task piqa

To create a multi-task model, pass several tasks:

python -m catwalk.train --model rc::gpt2 --task piqa arc_easy

Note that not all models support training. If you want to train a model that doesn't, create an issue and tag @dirkgr in it.
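When sweeping over several models or task sets, it can be convenient to build these command lines programmatically. A minimal sketch that uses only the module names and flags shown in this README (`catwalk`, `catwalk.train`, `--model`, `--task`); the helper `catwalk_command` is hypothetical:

```python
import sys
from typing import List, Sequence


def catwalk_command(model: str, tasks: Sequence[str], train: bool = False) -> List[str]:
    """Build an argv list for `python -m catwalk` or `python -m catwalk.train`."""
    module = "catwalk.train" if train else "catwalk"
    # sys.executable makes the command use the current interpreter
    return [sys.executable, "-m", module, "--model", model, "--task", *tasks]


cmd = catwalk_command("rc::gpt2", ["piqa", "arc_easy"], train=True)
print(" ".join(cmd[1:]))  # prints: -m catwalk.train --model rc::gpt2 --task piqa arc_easy
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell-quoting problems.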

Tango integration

Catwalk uses Tango for caching and executing evaluations. The command line interface internally constructs a Tango step graph and executes it. To cache results, point the command line at a Tango workspace with `-w`:

python -m catwalk --model rc::gpt2 --task piqa arc_easy -w ./my-workspace/

The second time you run those tasks, the cached results are reused, so it is fast:

time python -m catwalk --model rc::gpt2 --task piqa arc_easy -w ./my-workspace/
arc_easy	acc	0.39941078424453735
piqa	acc	0.626224160194397

________________________________________________________
Executed in    9.82 secs    fish           external
   usr time    6.51 secs  208.00 micros    6.51 secs
   sys time    1.25 secs  807.00 micros    1.25 secs

Tango workspaces also store partial results, so if you interrupt an evaluation halfway through, you don't lose your progress.
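The idea behind resuming an interrupted evaluation can be illustrated with a toy per-instance cache: each result is written to disk as soon as it is computed, and a rerun skips anything already recorded. This is only a sketch of the concept; it does not use Tango's API and does not reflect how Tango actually stores step results. The JSON file layout and the `evaluate_resumable` helper are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Sequence


def evaluate_resumable(
    instances: Sequence[str],
    predict: Callable[[str], float],
    cache_file: Path,
) -> Dict[str, float]:
    """Score each instance, persisting results so an interrupted run can resume."""
    done: Dict[str, float] = {}
    if cache_file.exists():
        done = json.loads(cache_file.read_text())
    for instance in instances:
        if instance in done:
            continue  # computed in an earlier run: skip it
        done[instance] = predict(instance)
        # Write after every instance so progress survives an interruption.
        cache_file.write_text(json.dumps(done))
    return done


calls = []


def fake_predict(x: str) -> float:
    calls.append(x)  # record which instances were actually computed
    return float(len(x))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    evaluate_resumable(["a", "bb"], fake_predict, path)          # computes a, bb
    evaluate_resumable(["a", "bb", "ccc"], fake_predict, path)   # computes only ccc
print(calls)  # prints ['a', 'bb', 'ccc']
```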

Team

ai2-catwalk is developed and maintained by the AllenNLP team, backed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. To learn more about who specifically contributed to this codebase, see our contributors page.

License

ai2-catwalk is licensed under Apache 2.0. A full copy of the license can be found on GitHub.

Download files

Source Distribution

ai2-catwalk-0.2.2.tar.gz (458.3 kB)

Built Distribution

ai2_catwalk-0.2.2-py3-none-any.whl (790.2 kB)

File details

Details for the file ai2-catwalk-0.2.2.tar.gz.

File metadata

  • Download URL: ai2-catwalk-0.2.2.tar.gz
  • Size: 458.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ai2-catwalk-0.2.2.tar.gz
Algorithm    Hash digest
SHA256       20f58606f2d68bf8c1007e51857dc88755590f64b2df41c2c3c3e9ce39baba92
MD5          515799e9af5260d28b79fcfe7045ef59
BLAKE2b-256  c3055ad15b776b88693058f1a890e58f03ab42d68c9c6590c13e72ac4330a2f1

File details

Details for the file ai2_catwalk-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: ai2_catwalk-0.2.2-py3-none-any.whl
  • Size: 790.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ai2_catwalk-0.2.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       92fff9b89cb0bcc6eda841ad1185a2aacf7cdf1a4cda83b4002dfeedbf04373f
MD5          6d508cceaa9f0e2b59210ca0098b1bb6
BLAKE2b-256  0f022a26917258dc67de7f59d1657f4484accb77d06d401bae2b7a243b91c717
