Ask2Transformers is a library for zero-shot classification based on Transformers.

Ask2Transformers - Zero Shot Topic Classification with Pretrained Transformers

Work in progress.

This library contains the code for the Ask2Transformers project.
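
The package is distributed on PyPI as `a2t`, so it should be installable with `pip install a2t`.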

Topic classification using only non-task-specific pretrained models:

```python
>>> from a2t.topic_classification import NLITopicClassifier

>>> topics = ['politics', 'culture', 'economy', 'biology', 'legal', 'medicine', 'business']
>>> context = "hospital: a health facility where patients receive treatment."

>>> clf = NLITopicClassifier('roberta-large-mnli', topics)

>>> predictions = clf(context)[0]
>>> print(sorted(zip(predictions, topics), reverse=True))

[(0.77885467, 'medicine'),
 (0.08395168, 'biology'),
 (0.040319894, 'business'),
 (0.027866213, 'economy'),
 (0.02357693, 'politics'),
 (0.023382403, 'legal'),
 (0.02204825, 'culture')]
```
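
Under the hood, the classifier casts each topic as an NLI hypothesis and scores it by the entailment probability. The following is a minimal sketch of that formulation using the plain `transformers` API; the template (taken from the query phrase exploration below) and the normalization step are assumptions about the internals, not the library's documented behavior.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

topics = ["politics", "culture", "economy", "biology", "legal", "medicine", "business"]
context = "hospital: a health facility where patients receive treatment."

# One (premise, hypothesis) pair per candidate topic.
premises = [context] * len(topics)
hypotheses = [f"Topic or domain about {topic}." for topic in topics]

inputs = tokenizer(premises, hypotheses, padding=True, return_tensors="pt")
with torch.no_grad():
    # roberta-large-mnli logits: contradiction, neutral, entailment
    logits = model(**inputs).logits

entailment = logits.softmax(dim=-1)[:, -1]  # P(entailment) for each topic
scores = entailment / entailment.sum()      # normalize over the topic set
print(sorted(zip(scores.tolist(), topics), reverse=True))
```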

WordNet Dataset (BabelNet Domains)

  • 1540 annotated glosses
  • 34 domains (classes)

Results (Micro-average):

| Method | Precision | Recall | F1-Score |
|--------|-----------|--------|----------|
| Distributional (Camacho-Collados et al. 2016) | 84.0 | 59.8 | 69.9 |
| BabelDomains (Camacho-Collados et al. 2017) | 81.7 | 68.7 | 74.6 |
| Ask2Transformers | 92.14 | 92.14 | 92.14 |

Approach evaluation

The next table shows the weighted-average Precision, Recall, and F1-Score, along with the Top-1, Top-3, and Top-5 accuracy, of each of the implemented approaches.

| Method | Precision | Recall | F1-Score | Top-1 | Top-3 | Top-5 |
|--------|-----------|--------|----------|-------|-------|-------|
| MNLI (roberta-large-mnli) | 91.6 | 78.44 | 82.4 | 78.44 | 87.46 | 89.74 |
| MNLI (bart-large-mnli) | 85.63 | 61.81 | 66.38 | 61.81 | 79.85 | 87.59 |
| NSP (bert-large-uncased) | 49.78 | 2.07 | 2.83 | 2.07 | 8.57 | 16.49 |
| NSP (bert-base-uncased) | 18.59 | 2.85 | 1.84 | 2.85 | 10.32 | 16.88 |
| MLM (roberta-large) | 71.21 | 12.92 | 16.24 | 12.91 | 30.9 | 45.84 |
| MLM (roberta-base) | 67.74 | 23.7 | 32.35 | 23.7 | 46.23 | 62.53 |
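
For reference, the MLM baseline can be reproduced roughly as follows: mask the topic slot and let a masked language model score each candidate label as a fill-in. This is a sketch of one reading of that approach, not the library's exact implementation; note that labels that are not single sub-tokens are truncated by the pipeline, which is one reason this baseline trails the NLI models.

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")

context = "hospital: a health facility where patients receive treatment."
topics = ["politics", "culture", "economy", "biology", "legal", "medicine", "business"]

# RoBERTa expects its own <mask> token; `targets` restricts scoring to
# the candidate labels (multi-token labels fall back to their first sub-token).
results = unmasker(f"{context} Topic or domain about <mask>.", targets=topics)
for r in results:
    print(f"{r['score']:.4f}  {r['token_str'].strip()}")
```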

Top-K Accuracy curve (figure omitted)
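
The Top-K numbers above and the curve can be computed with a small helper like the following (an assumption for illustration, not part of a2t):

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, gold: np.ndarray, k: int) -> float:
    """scores: (n_examples, n_labels) score matrix; gold: (n_examples,) gold label ids."""
    top_k = np.argsort(-scores, axis=1)[:, :k]  # indices of the k highest-scoring labels
    return float((top_k == gold[:, None]).any(axis=1).mean())
```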

MNLI Query phrase exploration

The next table shows the weighted-average Precision, Recall, and F1-Score, along with the Top-1, Top-3, and Top-5 accuracy, of the MNLI (roberta-large-mnli) system with different query phrases.

| Query Phrase | Precision | Recall | F1-Score | Top-1 | Top-3 | Top-5 |
|--------------|-----------|--------|----------|-------|-------|-------|
| "Topic: " | 89.36 | 59.61 | 66.88 | 59.61 | 69.48 | 74.02 |
| "Domain: " | 89.62 | 58.50 | 65.98 | 58.50 | 67.40 | 72.27 |
| "Theme: " | 90.28 | 59.67 | 67.08 | 59.67 | 73.96 | 81.36 |
| "Subject: " | 89.83 | 60.58 | 67.65 | 60.58 | 69.74 | 74.35 |
| "Is about " | 91.54 | 73.37 | 79.15 | 73.37 | 87.72 | 91.94 |
| "Topic or domain about " | 91.6 | 78.44 | 82.4 | 78.44 | 87.46 | 89.74 |
| "The topic of the sentence is about " | 92.02 | 80.71 | 84.79 | 80.71 | 92.92 | 95.77 |
| "The domain of the sentence is about " | 92.20 | 81.62 | 85.44 | 81.62 | 93.96 | 96.42 |
| "The topic or domain of the sentence is about " | 91.91 | 76.62 | 82.02 | 76.62 | 88.63 | 91.23 |

Label mapping

Sometimes the defined labels are very general or very precise. For instance, "Art, architecture, and archaeology" is a composed label formed from the "Art", "Architecture", and "Archaeology" topics. That composition can degrade the system's performance when the "Art" topic appears but "Architecture" or "Archaeology" do not. For that reason, we decided to define a better label set for the system and map the new labels back to the original ones. The new label set is generated as follows: given a composed label, generate one new label for each of the topics that form it (see the sketch after the table below). Following that strategy and running the system again, we obtain the following performance:

| Labels | Precision | Recall | F1-Score | Top-1 | Top-3 | Top-5 |
|--------|-----------|--------|----------|-------|-------|-------|
| Without mapping | 92.20 | 81.62 | 85.44 | 81.62 | 93.96 | 96.42 |
| Split labels | 96.51 | 92.14 | 93.88 | 92.14 | 98.18 | 99.02 |
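
A minimal sketch of the splitting strategy (the helper names and the max-aggregation choice are assumptions for illustration):

```python
from typing import Dict, List

def split_label(label: str) -> List[str]:
    """'Art, architecture, and archaeology' -> ['art', 'architecture', 'archaeology']"""
    parts = label.replace(" and ", ", ").split(",")
    return [part.strip().lower() for part in parts if part.strip()]

def composed_score(sub_scores: Dict[str, float], composed_label: str) -> float:
    """Score a composed label as the best score among the topics that form it."""
    return max(sub_scores[sub] for sub in split_label(composed_label))

sub_scores = {"art": 0.71, "architecture": 0.05, "archaeology": 0.03}
print(composed_score(sub_scores, "Art, architecture, and archaeology"))  # 0.71
```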
