
Interpretable Evaluation for Natural Language Processing


ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Website | Download | Backend | Paper | Video | Bib

Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What are common mistakes that the top-5 systems make?
  • F6: Fine-grained errors: Where do errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?

Website

We deploy ExplainaBoard as a web toolkit, which includes 9 NLP tasks, 40 datasets, and 300 systems. Detailed information is as follows.

Task

Task                      Sub-task            Datasets   Models   Attributes
Text Classification       Sentiment           8          40       2
Text Classification       Topics              4          18       2
Text Classification       Intention           1          3        2
Text-Span Classification  Aspect Sentiment    4          20       4
Text-Pair Classification  NLI                 2          6        7
Sequence Labeling         NER                 3          74       9
Sequence Labeling         POS                 3          14       4
Sequence Labeling         Chunking            3          14       9
Sequence Labeling         CWS                 7          64       7
Structure Prediction      Semantic Parsing    4          12       4
Text Generation           Summarization       2          36       7

Download System Outputs

We have not released the datasets or the corresponding system outputs that require licenses. If you have the licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, please cite our work.

Test Your Results

pip install -r requirements.txt

Description of Each Directory

  • task-[task_name]: fine-grained analysis for each task, which generates fine-grained analysis results in JSON format. For example, task-mlqa calculates fine-grained F1 scores for different systems and writes the corresponding JSON files to task-mlqa/output/ (a minimal sketch of this kind of bucketed scoring appears after this list).

  • meta-eval is a sort of controller: it starts the fine-grained analysis for all tasks and analyzes the output JSON files.

    • Calculate fine-grained results for all tasks with ./meta-eval/run-allTasks.sh:
        cd ./meta-eval/
        ./run-allTasks.sh

    • Merge the JSON files of all tasks into a single CSV file (useful for a further SQL import) with ./meta-eval/genCSV/json2csv.py; see the merge sketch after this list:
        cd ./meta-eval/genCSV/
        python json2csv.py > explainaboard.csv
    
  • src stores some auxiliary code.
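
To make the per-task analysis concrete, here is a minimal sketch of the kind of bucketed scoring the task-* scripts perform: predictions are grouped into buckets by an attribute (sentence length here) and a per-bucket score is written to a JSON file. The example schema, the attribute, the file name, and the use of accuracy instead of a task-specific F1 are illustrative assumptions, not the repository's actual code.

    # Illustrative sketch only: bucket examples by an attribute and report a
    # per-bucket score, then dump the result as JSON (hypothetical schema).
    import json
    from collections import defaultdict

    def bucketed_accuracy(examples, n_buckets=4):
        """examples: list of dicts with 'length', 'gold', 'pred' keys (assumed layout)."""
        lengths = [e["length"] for e in examples]
        lo, hi = min(lengths), max(lengths)
        width = max(1, (hi - lo) // n_buckets + 1)  # equal-width buckets over the length range
        buckets = defaultdict(list)
        for e in examples:
            buckets[(e["length"] - lo) // width].append(e["gold"] == e["pred"])
        return {f"bucket_{b}": sum(v) / len(v) for b, v in sorted(buckets.items())}

    if __name__ == "__main__":
        examples = [
            {"length": 5,  "gold": "POS", "pred": "POS"},
            {"length": 12, "gold": "NEG", "pred": "POS"},
            {"length": 30, "gold": "NEG", "pred": "NEG"},
            {"length": 45, "gold": "POS", "pred": "POS"},
        ]
        result = {"task": "sentiment", "system": "demo", "scores": bucketed_accuracy(examples)}
        with open("demo_fine_grained.json", "w") as f:
            json.dump(result, f, indent=2)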
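
The JSON-to-CSV merge performed by json2csv.py can be sketched along the same lines. This version assumes one flat JSON dictionary per output file under task-*/output/; the real script's schema, column set, and output path may differ.

    # Illustrative sketch only: collect per-task JSON results and merge them
    # into one CSV suitable for a SQL import (assumed flat-dict schema).
    import csv
    import glob
    import json

    rows = []
    for path in glob.glob("task-*/output/*.json"):
        with open(path) as f:
            record = json.load(f)     # assumed: one flat dict per file
        record["source_file"] = path  # keep provenance for the SQL import
        rows.append(record)

    if rows:
        fieldnames = sorted({key for row in rows for key in row})
        with open("explainaboard.csv", "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)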

Submit Your Results

You can submit your system's output via this form, following the format description.

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussions and feedback about ExplainaBoard.

