Explainable Leaderboards for Natural Language Processing

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction

ExplainaBoard is an interpretable, interactive and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What mistakes do the top-5 systems have in common?
  • F6: Fine-grained errors: Where will errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?
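To make F1 and F2 concrete, here is a minimal sketch of bucketed analysis: examples are grouped by an attribute (sentence length here), accuracy is computed per bucket for one system, and two systems are then compared bucket by bucket. This is an illustration only, not ExplainaBoard's implementation or API; the function names, the example layout (`text`, `label`) and the short/long threshold are all assumptions.

```python
from collections import defaultdict


def bucket_accuracy(examples, attribute, predictions):
    """Return {bucket: accuracy} for one system.

    `examples` is a list of dicts with a gold "label" (hypothetical layout),
    `attribute` maps an example to a bucket name, and `predictions` holds
    one predicted label per example.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        bucket = attribute(example)
        total[bucket] += 1
        correct[bucket] += int(predicted == example["label"])
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


def length_bucket(example):
    """Hypothetical attribute: bucket by whitespace token count."""
    n_tokens = len(example["text"].split())
    return "short" if n_tokens <= 3 else "long"


def pairwise_gap(acc_a, acc_b):
    """Per-bucket accuracy difference (system A minus system B)."""
    return {bucket: acc_a[bucket] - acc_b[bucket] for bucket in acc_a}


# Toy data, made up for illustration only.
examples = [
    {"text": "great movie", "label": "pos"},
    {"text": "terrible", "label": "neg"},
    {"text": "not bad at all , really", "label": "pos"},
    {"text": "i would not watch this again", "label": "neg"},
]
system_a = ["pos", "neg", "neg", "neg"]
system_b = ["pos", "pos", "pos", "neg"]

acc_a = bucket_accuracy(examples, length_bucket, system_a)
acc_b = bucket_accuracy(examples, length_bucket, system_b)
gap = pairwise_gap(acc_a, acc_b)
```

On this toy data, system A is better on short inputs and system B on long ones, which is the kind of difference a single overall score hides.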

Usage

We provide a Web-based interactive toolkit and also release an API that lets users flexibly evaluate their systems offline. This means you can use ExplainaBoard at the following levels:

  • U1: Just play with it: You can browse around, track NLP progress, and understand the relative merits of different top-performing systems.
  • U2: We help you analyze your model: You submit your model outputs and deploy them to the online ExplainaBoard.
  • U3: Do it yourself: You can process your model outputs yourself using our API.

API-based Toolkit: Quick Installation

Method 1: Simple installation from PyPI (Python 3 only)

pip install 

Method 2: Install from the source and develop locally (Python 3 only)

# Clone current repo
git clone https://github.com/neulab/ExplainaBoard.git
cd ExplainaBoard

# Requirements
pip install -r requirements.txt

# Install the package
python setup.py install

Then you can run the following example in bash:

   explainaboard --task chunk --systems ./explainaboard/example/test-conll00.tsv --output out.json

where test-conll00.tsv is your system output file, whose format depends on the task. For each task we provide one example output file showing the expected format. The above command generates a detailed report (saved in out.json) for your input system (test-conll00.tsv). Specifically, the report includes the following statistics:

  • Fine-grained performance
  • Confidence intervals
  • Error cases
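As background on the confidence-interval statistic, the sketch below shows one common way to attach an interval to an accuracy score: percentile bootstrap resampling. This README does not say how ExplainaBoard computes its intervals, so treat this as a generic illustration; the function name `bootstrap_ci` and its parameters are assumptions.

```python
import random


def bootstrap_ci(correct_flags, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.

    `correct_flags` holds 1 for each correctly predicted example and 0
    otherwise. Returns (lower, upper) bounds at confidence 1 - alpha.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(correct_flags)
    scores = []
    for _ in range(n_resamples):
        # Resample n examples with replacement and rescore.
        sample = [correct_flags[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    scores.sort()
    lower_index = max(0, int((alpha / 2) * n_resamples))
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return scores[lower_index], scores[upper_index]


# Toy input: 80 correct predictions out of 100 (made up for illustration).
flags = [1] * 80 + [0] * 20
lower, upper = bootstrap_ci(flags)
```

The interval brackets the observed accuracy of 0.8 and narrows as the number of evaluated examples grows.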

Web-based Toolkit: Quick Learning

We have deployed ExplainaBoard as a Web toolkit that includes 9 NLP tasks, 40 datasets and 300 systems. Detailed information is as follows.

So far, ExplainaBoard covers the following tasks:

Task                      Sub-task          Datasets  Models  Attributes
Text Classification       Sentiment         8         40      2
Text Classification       Topics            4         18      2
Text Classification       Intention         1         3       2
Text-Span Classification  Aspect Sentiment  4         20      4
Text Pair Classification  NLI               2         6       7
Sequence Labeling         NER               3         74      9
Sequence Labeling         POS               3         14      4
Sequence Labeling         Chunking          3         14      9
Sequence Labeling         CWS               7         64      7
Structure Prediction      Semantic Parsing  4         12      4
Text Generation           Summarization     2         36      7

Submit Your Results

You can submit your system's output through this form, following the format description.

Download System Outputs

We have not released datasets or the corresponding system outputs that require licenses. If you hold the licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, you can cite our work.

Currently Covered Systems

So far, ExplainaBoard supports more than 10 NLP tasks, including sequence classification, labeling, extraction and generation. Click here to see more.

Acknowledgement

We thank all the authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, Li Dong. We also thank Vijay Viswanathan, Yiran Chen, Hiroaki Hayashi for useful discussion and feedback about ExplainaBoard.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

explainaboard-0.1.7.tar.gz (283.4 kB view details)

Uploaded Source

Built Distributions

explainaboard-0.1.7-py3.7.egg (102.3 kB view details)

Uploaded Source

explainaboard-0.1.7-py2.py3-none-any.whl (52.6 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file explainaboard-0.1.7.tar.gz.

File metadata

  • Download URL: explainaboard-0.1.7.tar.gz
  • Upload date:
  • Size: 283.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.4.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for explainaboard-0.1.7.tar.gz
Algorithm Hash digest
SHA256 04db02b81c07430b258093b5175b54320c7046324ba7bc2c35152d3e88cbe4d9
MD5 8b646838a966304a9035f1fb9c8380d8
BLAKE2b-256 aee4ac6177abe00e8d4a63383472a024ef6e03766f561b093088ff96345bf476

See more details on using hashes here.
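A downloaded file can be checked against the SHA256 digest listed above using Python's standard hashlib module. The sketch below is a minimal example; the helper names and the local file path in the usage note are assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, `matches_expected("explainaboard-0.1.7.tar.gz", "04db02b81c07430b258093b5175b54320c7046324ba7bc2c35152d3e88cbe4d9")` should return True for an intact download.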

File details

Details for the file explainaboard-0.1.7-py3.7.egg.

File metadata

  • Download URL: explainaboard-0.1.7-py3.7.egg
  • Upload date:
  • Size: 102.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.4.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for explainaboard-0.1.7-py3.7.egg
Algorithm Hash digest
SHA256 07f3701c96387886f03f4b77401922b7b74f65a08b7e573850e0463551369250
MD5 015eaacd35a1b3764f77db9d88d58b69
BLAKE2b-256 7bcc84eb79bb0c15e3b72882ebe0454edd3a4a19fc0c02706711086dcdd4fdea

File details

Details for the file explainaboard-0.1.7-py2.py3-none-any.whl.

File metadata

  • Download URL: explainaboard-0.1.7-py2.py3-none-any.whl
  • Upload date:
  • Size: 52.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.4.0 pkginfo/1.5.0.1 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for explainaboard-0.1.7-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 2a51f3b9374e59a29bd35b6524e39da390906cc4db897fa18fcf346058692f2e
MD5 7766890f9789bd746ff6836d8577ac64
BLAKE2b-256 1b16a5a7bd5d4106fcd12ed95941444cc49dac2439469819ee2df4ee934dec24
