
Interpretable Evaluation for Natural Language Processing


ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Website | Download | Backend | Paper | Video | Bib

Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What are common mistakes that the top-5 systems make?
  • F6: Fine-grained errors: Where do errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?

Website

We deploy ExplainaBoard as a web toolkit, which includes 9 NLP tasks, 40 datasets, and 300 systems. Detailed information is as follows.

Task

Task                      Sub-task            Datasets   Models   Attributes
Text Classification       Sentiment           8          40       2
Text Classification       Topics              4          18       2
Text Classification       Intention           1          3        2
Text-Span Classification  Aspect Sentiment    4          20       4
Text-Pair Classification  NLI                 2          6        7
Sequence Labeling         NER                 3          74       9
Sequence Labeling         POS                 3          14       4
Sequence Labeling         Chunking            3          14       9
Sequence Labeling         CWS                 7          64       7
Structure Prediction      Semantic Parsing    4          12       4
Text Generation           Summarization       2          36       7

Download System Outputs

We have not released the datasets or the corresponding system outputs that require licenses. If you have the licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, please cite our work.

Test Your Results

pip install -r requirements.txt

Description of Each Directory

  • task-[task_name]: fine-grained analysis for each task, which generates fine-grained analysis results in JSON format. For example, task-mlqa calculates fine-grained F1 scores for different systems and writes the corresponding JSON files to task-mlqa/output/ (a minimal sketch of this kind of bucketed scoring appears after this list).

  • meta-eval is a sort of controller: it starts the fine-grained analysis for all tasks and analyzes the output JSON files.

    • Calculate fine-grained results for all tasks with ./meta-eval/run-allTasks.sh:
        cd ./meta-eval/
        ./run-allTasks.sh

    • Merge the JSON files of all tasks into a single CSV file (useful for a further SQL import) with ./meta-eval/genCSV/json2csv.py; see the merge sketch after this list:
        cd ./meta-eval/genCSV/
        python json2csv.py > explainaboard.csv
    
  • src stores some auxiliary code.
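
To make the per-task analysis concrete, here is a minimal sketch of the kind of bucketed scoring the task-* scripts perform: predictions are grouped into buckets by an attribute (sentence length here) and a per-bucket score is written to a JSON file. The example schema, the attribute, the file name, and the use of accuracy instead of a task-specific F1 are illustrative assumptions, not the repository's actual code.

    # Illustrative sketch only: bucket examples by an attribute and report a
    # per-bucket score, then dump the result as JSON (hypothetical schema).
    import json
    from collections import defaultdict

    def bucketed_accuracy(examples, n_buckets=4):
        """examples: list of dicts with 'length', 'gold', 'pred' keys (assumed layout)."""
        lengths = [e["length"] for e in examples]
        lo, hi = min(lengths), max(lengths)
        width = max(1, (hi - lo) // n_buckets + 1)  # equal-width buckets over the length range
        buckets = defaultdict(list)
        for e in examples:
            buckets[(e["length"] - lo) // width].append(e["gold"] == e["pred"])
        return {f"bucket_{b}": sum(v) / len(v) for b, v in sorted(buckets.items())}

    if __name__ == "__main__":
        examples = [
            {"length": 5,  "gold": "POS", "pred": "POS"},
            {"length": 12, "gold": "NEG", "pred": "POS"},
            {"length": 30, "gold": "NEG", "pred": "NEG"},
            {"length": 45, "gold": "POS", "pred": "POS"},
        ]
        result = {"task": "sentiment", "system": "demo", "scores": bucketed_accuracy(examples)}
        with open("demo_fine_grained.json", "w") as f:
            json.dump(result, f, indent=2)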
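
The JSON-to-CSV merge performed by json2csv.py can be sketched along the same lines. This version assumes one flat JSON dictionary per output file under task-*/output/; the real script's schema, column set, and output path may differ.

    # Illustrative sketch only: collect per-task JSON results and merge them
    # into one CSV suitable for a SQL import (assumed flat-dict schema).
    import csv
    import glob
    import json

    rows = []
    for path in glob.glob("task-*/output/*.json"):
        with open(path) as f:
            record = json.load(f)     # assumed: one flat dict per file
        record["source_file"] = path  # keep provenance for the SQL import
        rows.append(record)

    if rows:
        fieldnames = sorted({key for row in rows for key in row})
        with open("explainaboard.csv", "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)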

Submit Your Results

You can submit your system's output via this form, following the format description.

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussions and feedback about ExplainaBoard.

