
Interpretable Evaluation for Natural Language Processing

Project description

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Web Tool | API Tool | Download | Paper | Video | Bib




Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with six new features (F) so far compared with generic leaderboards:

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (or worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of the evaluated datasets?
  • F4: Common Errors: What common mistakes do the top-5 systems make?
  • F5: Fine-grained Errors: Where do errors occur?
  • F6: System Combination: Is there potential complementarity between different systems?

Usage

We provide not only a web-based interactive toolkit but also an API that lets users evaluate their systems offline. This means you can use ExplainaBoard at the following levels:

  • U1: Just play with it: browse around, track NLP progress, and understand the relative merits of different top-performing systems.
  • U2: We help you analyze your model: submit your model outputs and we deploy them on the online ExplainaBoard.
  • U3: Do it yourself: process your model outputs on your own using our API.

API-based Toolkit: Quick Installation

Method 1: Simple installation from PyPI (Python 3 only)

pip install interpret-eval

Method 2: Install from the source and develop locally (Python 3 only)

# Clone current repo
git clone https://github.com/neulab/ExplainaBoard.git
cd ExplainaBoard

# Requirements
pip install -r requirements.txt

# Install the package
python setup.py install

Then you can run the following example from the command line:

  interpret-eval --task chunk --systems ./interpret_eval/example/test-conll00.tsv --output out.json

where test-conll00.tsv is your system output file; its format depends on the task. For each task we provide an example output file showing how outputs are formatted. The command above generates a detailed report (saved in out.json) for your input system (test-conll00.tsv). Specifically, the report includes the following statistics:

  • Fine-grained performance
  • Confidence intervals
  • Error cases
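To give a feel for where such statistics come from, here is a minimal, self-contained sketch (not the package's actual implementation) that parses a CoNLL-style chunking output and computes token accuracy with a percentile-bootstrap confidence interval. The assumed layout — one token per line as word<TAB>gold-tag<TAB>predicted-tag, with blank lines separating sentences — is the usual CoNLL convention; verify it against the provided test-conll00.tsv, and note that the bootstrap here is one common way to get a confidence interval, not necessarily the package's exact procedure.

```python
import random

def read_conll(lines):
    """Parse CoNLL-style lines: 'word<TAB>gold<TAB>pred' per token,
    blank lines separating sentences. (Assumed layout; check the
    example file shipped with the package for the exact format.)"""
    sents, cur = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if cur:
                sents.append(cur)
            cur = []
        else:
            word, gold, pred = line.split("\t")
            cur.append((word, gold, pred))
    if cur:
        sents.append(cur)
    return sents

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval over per-token
    correctness scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# A tiny in-memory stand-in for a system output file.
sample = [
    "He\tB-NP\tB-NP",
    "runs\tB-VP\tB-VP",
    "fast\tB-ADVP\tB-ADJP",
    "",
    "Prices\tB-NP\tB-NP",
    "rose\tB-VP\tB-VP",
    "",
]
sents = read_conll(sample)
scores = [int(g == p) for s in sents for _, g, p in s]
acc = sum(scores) / len(scores)
low, high = bootstrap_ci(scores)
print(f"accuracy={acc:.2f}")  # 4 of 5 tags match, so 0.80
```

The same per-token correctness scores can be sliced by attributes such as sentence length or entity type to produce the fine-grained breakdowns in the report.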

Web-based Toolkit: Quick Learning

We deploy ExplainaBoard as a web toolkit that currently includes 9 NLP tasks, 40 datasets, and 300 systems. Detailed information follows.

So far, ExplainaBoard covers the following tasks:

Task                       Sub-task           Dataset  Model  Attribute
Text Classification        Sentiment          8        40     2
                           Topics             4        18     2
                           Intention          1        3      2
Text-Span Classification   Aspect Sentiment   4        20     4
Text-Pair Classification   NLI                2        6      7
Sequence Labeling          NER                3        74     9
                           POS                3        14     4
                           Chunking           3        14     9
                           CWS                7        64     7
Structure Prediction       Semantic Parsing   4        12     4
Text Generation            Summarization      2        36     7

Submit Your Results

You can submit your system's output via this form, following the format description.

Download System Outputs

We haven't released datasets or corresponding system outputs that require licenses. If you have the relevant licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, please cite our work.

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussions and feedback about ExplainaBoard.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

interpret_eval-0.1.7.tar.gz (317.7 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

interpret_eval-0.1.7-py3.7.egg (184.4 kB)

Uploaded Egg

interpret_eval-0.1.7-py2.py3-none-any.whl (88.6 kB)

Uploaded Python 2, Python 3
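Wheel file names encode compatibility information in the fixed pattern defined by PEP 427: {distribution}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl. The wheel above can therefore be read off as pure-Python (py2.py3), ABI-independent (none), and platform-independent (any). A small illustrative parse:

```python
# Wheel names follow PEP 427:
#   {distribution}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
# This simple split assumes no optional build tag, which holds for the
# wheel listed above.
name = "interpret_eval-0.1.7-py2.py3-none-any.whl"
dist, version, py_tag, abi_tag, plat_tag = name[: -len(".whl")].split("-")
print(dist, version, py_tag, abi_tag, plat_tag)
# -> interpret_eval 0.1.7 py2.py3 none any
```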

File details

Details for the file interpret_eval-0.1.7.tar.gz.

File metadata

  • Download URL: interpret_eval-0.1.7.tar.gz
  • Upload date:
  • Size: 317.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.6

File hashes

Hashes for interpret_eval-0.1.7.tar.gz
Algorithm Hash digest
SHA256 8c49cea6612f4bd689a188cab066f54d2ca58c73cf21136c528b6047c85d9280
MD5 e4a599f593ee04a5e0bf72af6ec51e58
BLAKE2b-256 7778f9df5499c0c1f355818ac73454ddc99da85ecf83bc1d7f530cc73434dce8

See more details on using hashes here.
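The digests above can be checked locally with nothing but the standard library. The sketch below streams a file through SHA-256 in chunks (demonstrated on a small temporary file standing in for the actual downloaded archive):

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Stream a file through SHA-256 in chunks, as you would to check
    a downloaded sdist or wheel against the digest listed on PyPI."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a small temporary file standing in for the downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name
digest = sha256_of(path)
os.remove(path)
print(digest)  # compare this against the published SHA256 value
```

If the computed digest does not match the published one, the download is corrupt or has been tampered with and should be discarded.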

File details

Details for the file interpret_eval-0.1.7-py3.7.egg.

File metadata

  • Download URL: interpret_eval-0.1.7-py3.7.egg
  • Upload date:
  • Size: 184.4 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.6

File hashes

Hashes for interpret_eval-0.1.7-py3.7.egg
Algorithm Hash digest
SHA256 12fa65f75433b09f3fef7cde5b223957c0426f18702f489982b5c2d3de4dc340
MD5 4acfa49e3c416e3097a4f6e8bf4415e7
BLAKE2b-256 91f19845ecbd6fe28c69e7d5d1f8089afbe8098fa71f8891afca61b61a93d846

See more details on using hashes here.

File details

Details for the file interpret_eval-0.1.7-py2.py3-none-any.whl.

File metadata

  • Download URL: interpret_eval-0.1.7-py2.py3-none-any.whl
  • Upload date:
  • Size: 88.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.0 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.6

File hashes

Hashes for interpret_eval-0.1.7-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 d6d5145aed8267098b6e76127cbf65a1bb050fb7f6f5ed079eda59e09a26c211
MD5 dd9e167392dfd91e7c0e6f7116cb48e1
BLAKE2b-256 e3ad58af0f30af66aa42237272408e95a8c02d1ab79d624c9fbb82a1dc13118d

See more details on using hashes here.
