Automatic SOTA (state-of-the-art) extraction.

Intended Audience
- Developers
Natural Language
- English
Operating System
- POSIX
Programming Language
- Python :: 3

Project description

Automatic SOTA (state-of-the-art) extraction

Aggregate public SOTA tables that are shared under free licences.

Download the scraped data, or run the scrapers yourself to get the latest data.

In the future, we are planning to automate the process of extracting tasks, datasets and results from papers.

Getting the data

The data is kept in the data directory. All data is shared under the CC-BY-SA-4 licence.

The data has been parsed into a consistent JSON format, described below.

JSON format description

The format consists of five primary data types: Task, Dataset, Sota, SotaRow and Link.

A valid JSON file is a list of Task objects. You can see examples in the data/tasks folder.
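
As a rough illustration of the top-level shape, the sketch below parses a JSON document with the standard `json` module and checks that it is a list of objects that each carry a `task` name. The sample text, the task name and the `load_tasks` helper are invented for this example and are not part of the package.

```python
import json

# Hypothetical sample document; real files in data/tasks share this top-level shape.
SAMPLE = """
[
  {
    "task": "Question answering",
    "description": "Answer questions about a passage.",
    "subtasks": [],
    "datasets": []
  }
]
"""


def load_tasks(text):
    """Parse JSON text and check that it is a list of Task-like objects."""
    tasks = json.loads(text)
    if not isinstance(tasks, list):
        raise ValueError("top level must be a list of Task objects")
    for task in tasks:
        if not isinstance(task, dict) or "task" not in task:
            raise ValueError("every Task needs a 'task' name")
    return tasks


tasks = load_tasks(SAMPLE)
print([task["task"] for task in tasks])
```

To read one of the published files instead, pass the file's contents (for example `open(path).read()`) to the same helper.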

`Task`

A Task consists of the following fields:

task - name of the task (string)
description - short description of the task, in markdown (string)
subtasks - a list of zero or more Task objects that are children of this task (list)
datasets - a list of zero or more Dataset objects on which the tasks are evaluated (list)
source_link - an optional Link object to the original source of the task
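
Because subtasks are themselves Task objects, a task tree can be walked recursively. The following is a minimal sketch under that assumption; `iter_tasks` and the example data are hypothetical, and treating a missing `subtasks` key as an empty list is a convenience of this sketch rather than something the format promises.

```python
def iter_tasks(tasks, depth=0):
    """Yield (depth, task) pairs for every Task, parents before children."""
    for task in tasks:
        yield depth, task
        # A missing "subtasks" key is treated as "no children" (assumption).
        yield from iter_tasks(task.get("subtasks", []), depth + 1)


# Hypothetical example data, not taken from the published files.
example_tasks = [
    {
        "task": "Machine translation",
        "description": "",
        "datasets": [],
        "subtasks": [
            {
                "task": "Low-resource translation",
                "description": "",
                "datasets": [],
                "subtasks": [],
            },
        ],
    },
]

for depth, task in iter_tasks(example_tasks):
    print("  " * depth + task["task"])
```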

`Dataset`

A Dataset consists of the following fields:

dataset - name of the dataset (string)
description - a short description in markdown (string)
subdatasets - a list of zero or more child Dataset objects (e.g. dataset subsets or dataset partitions) (list)
dataset_links - zero or more Link objects, representing the links to the dataset download page or any other relevant external pages (list)
dataset_citations - zero or more Link objects, representing the papers that are the primary citations for the dataset (list)
sota - the Sota object representing the state-of-the-art table on this dataset.
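
Datasets nest in the same way as tasks, so finding the partitions that actually have results means descending through `subdatasets`. The sketch below is one possible way to do that; `iter_datasets`, `has_rows` and the example dataset are hypothetical, and the code assumes that `sota` may be missing or empty.

```python
def iter_datasets(dataset, parents=()):
    """Yield (path, dataset) for a Dataset and all of its subdatasets."""
    path = parents + (dataset["dataset"],)
    yield path, dataset
    for child in dataset.get("subdatasets", []):
        yield from iter_datasets(child, path)


def has_rows(dataset):
    """True if the dataset's SOTA table has at least one row."""
    sota = dataset.get("sota") or {}  # tolerate a missing or null table (assumption)
    return bool(sota.get("rows"))


# Hypothetical example data.
example_dataset = {
    "dataset": "Example corpus",
    "description": "",
    "dataset_links": [],
    "dataset_citations": [],
    "sota": {"metrics": [], "rows": []},
    "subdatasets": [
        {
            "dataset": "dev",
            "description": "",
            "subdatasets": [],
            "sota": {"metrics": ["Accuracy"], "rows": []},
        },
        {
            "dataset": "test",
            "description": "",
            "subdatasets": [],
            "sota": {
                "metrics": ["Accuracy"],
                "rows": [{"model_name": "Model A", "metrics": {"Accuracy": "90.0"}}],
            },
        },
    ],
}

with_results = [path for path, ds in iter_datasets(example_dataset) if has_rows(ds)]
print(with_results)
```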

`Link`

A Link object describes a URL, and has these two fields:

title - title of the link, i.e. anchor text (string)
url - target URL (string)
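
A Link can be sanity-checked before use. This is only a sketch: the `is_valid_link` helper is hypothetical, and requiring an `http` or `https` URL with a host name is an assumption of this example, not a rule stated by the format.

```python
from urllib.parse import urlparse


def is_valid_link(link):
    """Check that a Link has string title and url, and that the url looks absolute."""
    if not isinstance(link, dict):
        return False
    title, url = link.get("title"), link.get("url")
    if not isinstance(title, str) or not isinstance(url, str):
        return False
    parsed = urlparse(url)
    # Assumption: only http(s) URLs with a host are considered usable.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_link({"title": "Example paper", "url": "https://example.org/paper"}))
print(is_valid_link({"title": "Broken", "url": "not a url"}))
```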

`Sota`

A Sota object represents one state-of-the-art table, with these fields:

metrics - a list of metric names used to evaluate the methods (list of strings)
rows - a list of SotaRow objects, one per row of the SOTA table (list)
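
Since the table declares its metric names once and each row reports values per metric, a simple consistency check is to look for row metric keys that the table does not declare. The sketch below assumes each row carries a `metrics` dictionary as described under SotaRow; `undeclared_metrics` and the example table are hypothetical.

```python
def undeclared_metrics(sota):
    """Map row index to the sorted metric keys missing from sota['metrics']."""
    declared = set(sota["metrics"])
    problems = {}
    for index, row in enumerate(sota["rows"]):
        extra = sorted(set(row.get("metrics", {})) - declared)
        if extra:
            problems[index] = extra
    return problems


# Hypothetical example table: the second row uses a metric the table does not declare.
example_sota = {
    "metrics": ["EM", "F1"],
    "rows": [
        {"model_name": "Model A", "metrics": {"EM": "80.1", "F1": "87.3"}},
        {"model_name": "Model B", "metrics": {"EM": "79.0", "Accuracy": "90.0"}},
    ],
}

print(undeclared_metrics(example_sota))
```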

`SotaRow`

A SotaRow object represents one line of the SOTA table and has these fields:

model_name - Name of the model evaluated (string)
paper_title - Primary paper's title (string)
paper_url - Primary paper's URL (string)
paper_date - Paper date of publishing, if available (string)
code_links - a list of zero or more Link objects, with links to relevant code implementations (list)
model_links - a list of zero or more Link objects, with links to relevant pretrained model files (list)
metrics - a dictionary of values, where the keys are strings from the parent Sota.metrics list and the values are the measured performance (dictionary)
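
One common use of the rows is ranking models by a metric. The description does not say whether metric values are stored as numbers or as strings, so the sketch below converts on a best-effort basis and skips values it cannot parse. `to_float`, `rank_rows`, the `higher_is_better` flag and the example table are all assumptions made for illustration.

```python
def to_float(value):
    """Best-effort conversion of a metric value to float; None if not numeric."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            # Accept plain numbers and percentages such as "85.3%".
            return float(value.strip().rstrip("%"))
        except ValueError:
            return None
    return None


def rank_rows(sota, metric, higher_is_better=True):
    """Return the rows that report `metric`, best first."""
    if metric not in sota["metrics"]:
        raise KeyError(metric)
    scored = []
    for row in sota["rows"]:
        score = to_float(row["metrics"].get(metric))
        if score is not None:
            scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=higher_is_better)
    return [row for _, row in scored]


# Hypothetical example table with mixed value types.
example_table = {
    "metrics": ["F1"],
    "rows": [
        {"model_name": "Model B", "metrics": {"F1": 88.0}},
        {"model_name": "Model A", "metrics": {"F1": "91.2"}},
        {"model_name": "Model C", "metrics": {"F1": "n/a"}},
    ],
}

print([row["model_name"] for row in rank_rows(example_table, "F1")])
```

For error-style metrics, where a lower value is better, pass `higher_is_better=False`.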

Running the scrapers

Installation

Requires Python 3.6+.

pip install -r requirements.txt

NLP-progress

NLP-progress is a hand-annotated collection of SOTA results from NLP tasks.

The scraper is part of the NLP-progress project.

Licence: MIT

EFF

EFF has annotated a set of SOTA results on a small number of tasks, and produced this great report.

To convert the current content run:

python -m scrapers.eff

Licence: CC-BY-SA-4

SQuAD

The Stanford Question Answering Dataset is an active project for evaluating the question answering task using a hidden test set.

To scrape the current content run:

python -m scrapers.squad

Licence: CC-BY-SA-4

RedditSota

The RedditSota repository lists the best method for a variety of tasks across all of ML.

To scrape the current content run:

python -m scrapers.redditsota

Licence: Apache-2

SNLI

The Stanford Natural Language Inference (SNLI) Corpus is an active project for Natural Language Inference.

To scrape the current content run:

python -m scrapers.snli

Licence: CC-BY-SA

Cityscapes

Cityscapes is a benchmark for semantic segmentation.

To scrape the current content run:

python -m scrapers.cityscapes

Evaluating the SOTA extraction performance

In the future, this repository will also contain the automatic SOTA extraction pipeline. The aim is to automatically extract tasks, datasets and results from papers.

To evaluate the current prediction performance for all tasks:

python -m extractor.eval_all

The most current report can be seen here: eval_all_report.csv.


Release history

0.0.29 - Jan 31, 2022 (this version)
0.0.28 - Jul 16, 2021
0.0.27 - Jul 14, 2021
0.0.26 - Jul 1, 2021
0.0.25 - Jun 28, 2021
0.0.24 - May 20, 2021
0.0.23 - May 20, 2021
0.0.22 - May 20, 2021
0.0.21 - May 11, 2021
0.0.20 - May 11, 2021
0.0.19 - May 11, 2021
0.0.18 - Mar 4, 2021
0.0.17 - Feb 12, 2021
0.0.16 - Feb 11, 2021
0.0.15 - Feb 11, 2021
0.0.11 - May 28, 2020
0.0.10 - May 12, 2020
0.0.9 - Sep 13, 2019
0.0.8 - Aug 26, 2019
0.0.7 - Aug 26, 2019

Download files

Source Distribution

sota-extractor-0.0.29.tar.gz (30.6 kB)

Uploaded Jan 31, 2022 Source

Built Distribution

sota_extractor-0.0.29-py3-none-any.whl (46.0 kB)

Uploaded Jan 31, 2022 Python 3

File details

Details for the file sota-extractor-0.0.29.tar.gz.

File metadata

Download URL: sota-extractor-0.0.29.tar.gz
Upload date: Jan 31, 2022
Size: 30.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.11

File hashes

Hashes for sota-extractor-0.0.29.tar.gz
Algorithm	Hash digest
SHA256	`2a4f0b51ed17a6cfad596123ebcc0ef2acf1642824a9a88d1f94f9f0e27af792`
MD5	`e389778de482c529ace3c7f0f7f8a0aa`
BLAKE2b-256	`0c19855f5d0fb8445289e5100c77bd7932d38cad4953c8a1f207be5ca5e30b89`

File details

Details for the file sota_extractor-0.0.29-py3-none-any.whl.

File metadata

Download URL: sota_extractor-0.0.29-py3-none-any.whl
Upload date: Jan 31, 2022
Size: 46.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.11

File hashes

Hashes for sota_extractor-0.0.29-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e13779cda14fb61fdf82bff80b364859f8e588c92c2678fa041311e35c80899b`
MD5	`7c39e9d6b1d530fa036f9aaad809d059`
BLAKE2b-256	`6ab4a4b4f8348a6672fef6327a7b314a3febcc89b620fbc3e359ec629445a0f3`
