
Interpretable Evaluation for Natural Language Processing

Project description

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Web Tool | API Tool | Download | Paper | Video | Bib




Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with six new features (F) so far compared with generic leaderboards:

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (or worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of the evaluated datasets?
  • F4: Common Errors: What common mistakes do the top-5 systems make?
  • F5: Fine-grained Errors: Where do errors occur?
  • F6: System Combination: Is there potential complementarity between different systems?

Usage

We provide not only a web-based interactive toolkit but also an API that lets users evaluate their systems offline. This means you can use ExplainaBoard at the following levels:

  • U1: Just play with it: browse around, track NLP progress, and understand the relative merits of different top-performing systems.
  • U2: We help you analyze your model: submit your model outputs and we deploy them on the online ExplainaBoard.
  • U3: Do it yourself: process your model outputs on your own using our API.

API-based Toolkit: Quick Installation

Method 1: Simple installation from PyPI (Python 3 only)

pip install interpret-eval

Method 2: Install from the source and develop locally (Python 3 only)

# Clone current repo
git clone https://github.com/neulab/ExplainaBoard.git
cd ExplainaBoard

# Requirements
pip install -r requirements.txt

# Install the package
python setup.py install

Then you can run the following example from the command line:

  interpret-eval --task chunk --systems ./interpret_eval/example/test-conll00.tsv --output out.json

where test-conll00.tsv is your system output file; its format depends on the task. For each task we provide an example output file showing how outputs are formatted. The command above generates a detailed report (saved in out.json) for your input system (test-conll00.tsv). Specifically, the report includes the following statistics:

  • Fine-grained performance
  • Confidence intervals
  • Error cases
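To give a feel for where such statistics come from, here is a minimal, self-contained sketch (not the package's actual implementation) that parses a CoNLL-style chunking output and computes token accuracy with a percentile-bootstrap confidence interval. The assumed layout — one token per line as word<TAB>gold-tag<TAB>predicted-tag, with blank lines separating sentences — is the usual CoNLL convention; verify it against the provided test-conll00.tsv, and note that the bootstrap here is one common way to get a confidence interval, not necessarily the package's exact procedure.

```python
import random

def read_conll(lines):
    """Parse CoNLL-style lines: 'word<TAB>gold<TAB>pred' per token,
    blank lines separating sentences. (Assumed layout; check the
    example file shipped with the package for the exact format.)"""
    sents, cur = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if cur:
                sents.append(cur)
            cur = []
        else:
            word, gold, pred = line.split("\t")
            cur.append((word, gold, pred))
    if cur:
        sents.append(cur)
    return sents

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval over per-token
    correctness scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# A tiny in-memory stand-in for a system output file.
sample = [
    "He\tB-NP\tB-NP",
    "runs\tB-VP\tB-VP",
    "fast\tB-ADVP\tB-ADJP",
    "",
    "Prices\tB-NP\tB-NP",
    "rose\tB-VP\tB-VP",
    "",
]
sents = read_conll(sample)
scores = [int(g == p) for s in sents for _, g, p in s]
acc = sum(scores) / len(scores)
low, high = bootstrap_ci(scores)
print(f"accuracy={acc:.2f}")  # 4 of 5 tags match, so 0.80
```

The same per-token correctness scores can be sliced by attributes such as sentence length or entity type to produce the fine-grained breakdowns in the report.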

Web-based Toolkit: Quick Learning

We deploy ExplainaBoard as a web toolkit that currently includes 9 NLP tasks, 40 datasets, and 300 systems. Detailed information follows.

So far, ExplainaBoard covers the following tasks:

Task                       Sub-task           Dataset  Model  Attribute
Text Classification        Sentiment          8        40     2
                           Topics             4        18     2
                           Intention          1        3      2
Text-Span Classification   Aspect Sentiment   4        20     4
Text-Pair Classification   NLI                2        6      7
Sequence Labeling          NER                3        74     9
                           POS                3        14     4
                           Chunking           3        14     9
                           CWS                7        64     7
Structure Prediction       Semantic Parsing   4        12     4
Text Generation            Summarization      2        36     7

Submit Your Results

You can submit your system's output via this form, following the format description.

Download System Outputs

We haven't released datasets or corresponding system outputs that require licenses. If you have the relevant licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, please cite our work.

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussions and feedback about ExplainaBoard.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

interpret_eval-0.1.7.tar.gz (317.7 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

interpret_eval-0.1.7-py3.7.egg (184.4 kB)

Uploaded Egg

interpret_eval-0.1.7-py2.py3-none-any.whl (88.6 kB)

Uploaded Python 2, Python 3
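Wheel file names encode compatibility information in the fixed pattern defined by PEP 427: {distribution}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl. The wheel above can therefore be read off as pure-Python (py2.py3), ABI-independent (none), and platform-independent (any). A small illustrative parse:

```python
# Wheel names follow PEP 427:
#   {distribution}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
# This simple split assumes no optional build tag, which holds for the
# wheel listed above.
name = "interpret_eval-0.1.7-py2.py3-none-any.whl"
dist, version, py_tag, abi_tag, plat_tag = name[: -len(".whl")].split("-")
print(dist, version, py_tag, abi_tag, plat_tag)
# -> interpret_eval 0.1.7 py2.py3 none any
```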

File details

Details for the file interpret_eval-0.1.7.tar.gz.

File metadata

  • Download URL: interpret_eval-0.1.7.tar.gz
  • Upload date:
  • Size: 317.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.6

File hashes

Hashes for interpret_eval-0.1.7.tar.gz
Algorithm Hash digest
SHA256 8c49cea6612f4bd689a188cab066f54d2ca58c73cf21136c528b6047c85d9280
MD5 e4a599f593ee04a5e0bf72af6ec51e58
BLAKE2b-256 7778f9df5499c0c1f355818ac73454ddc99da85ecf83bc1d7f530cc73434dce8

See more details on using hashes here.
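The digests above can be checked locally with nothing but the standard library. The sketch below streams a file through SHA-256 in chunks (demonstrated on a small temporary file standing in for the actual downloaded archive):

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Stream a file through SHA-256 in chunks, as you would to check
    a downloaded sdist or wheel against the digest listed on PyPI."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a small temporary file standing in for the downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name
digest = sha256_of(path)
os.remove(path)
print(digest)  # compare this against the published SHA256 value
```

If the computed digest does not match the published one, the download is corrupt or has been tampered with and should be discarded.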

File details

Details for the file interpret_eval-0.1.7-py3.7.egg.

File metadata

  • Download URL: interpret_eval-0.1.7-py3.7.egg
  • Upload date:
  • Size: 184.4 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.6

File hashes

Hashes for interpret_eval-0.1.7-py3.7.egg
Algorithm Hash digest
SHA256 12fa65f75433b09f3fef7cde5b223957c0426f18702f489982b5c2d3de4dc340
MD5 4acfa49e3c416e3097a4f6e8bf4415e7
BLAKE2b-256 91f19845ecbd6fe28c69e7d5d1f8089afbe8098fa71f8891afca61b61a93d846

See more details on using hashes here.

File details

Details for the file interpret_eval-0.1.7-py2.py3-none-any.whl.

File metadata

  • Download URL: interpret_eval-0.1.7-py2.py3-none-any.whl
  • Upload date:
  • Size: 88.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.6

File hashes

Hashes for interpret_eval-0.1.7-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 d6d5145aed8267098b6e76127cbf65a1bb050fb7f6f5ed079eda59e09a26c211
MD5 dd9e167392dfd91e7c0e6f7116cb48e1
BLAKE2b-256 e3ad58af0f30af66aa42237272408e95a8c02d1ab79d624c9fbb82a1dc13118d

See more details on using hashes here.
