Aspect extraction based on word embeddings
cat-aspect-extraction🐈
An easy-to-use library that implements Contrastive Attention Topic Modeling, as described in the paper "Embarrassingly Simple Unsupervised Aspect Extraction".
See the paper and the original repository for details about the algorithm:
- Paper: https://aclanthology.org/2020.acl-main.290/
- Repository: https://github.com/clips/cat/
Installation
pip install cat-aspect-extraction
or
git clone
python -m pip install .
Example
from cat_aspect_extraction import CAt, RBFAttention # for using the model (module names cannot contain hyphens)
from reach import Reach # for loading word embeddings
# Load in-domain word embeddings and create a CAt instance
r = Reach.load("path/to/embeddings", unk_word="UNK")
cat = CAt(r)
# Initialize candidate aspects
candidates = [
    "food",
    "service",
    "ambiance",
    "price",
    "location",
    "experience",
]
for aspect in candidates:
    cat.add_candidate(aspect)
# Add topics
cat.add_topic("food", ["taste", "flavor", "quality", "portion", "menu", "dish", "cuisine", "ingredient"])
cat.add_topic("service", ["staff", "waiter", "waitress", "service", "server", "host", "manager", "bartender"])
cat.add_topic("ambiance", ["atmosphere", "decor", "interior", "design", "lighting", "music", "noise", "vibe"])
# Compute topic scores
sentence = "The food was great !".split() # tokenize your sentence
cat.get_scores(sentence, attention=RBFAttention())
# Example output: [('food', 1), ('service', 0.5), ('ambiance', 0.0)]
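
To give a rough idea of what happens behind a scoring call like the one above, here is a minimal NumPy sketch of RBF-based attention as the paper describes it: each word is weighted by its RBF similarity to the candidate aspects, the weighted word vectors are summed into a sentence vector, and that vector is compared to each topic vector by cosine similarity. This is an illustration under stated assumptions, not this library's actual implementation: the function names `rbf_attention` and `topic_scores`, the toy 3-dimensional vectors, and the `gamma` value are all invented for the example.

```python
import numpy as np


def rbf_attention(words, aspects, gamma):
    """Return one attention weight per word (weights sum to 1)."""
    # Squared Euclidean distance between every word and every candidate aspect,
    # shape (n_words, n_aspects).
    sq_dist = ((words[:, None, :] - aspects[None, :, :]) ** 2).sum(axis=-1)
    # RBF kernel: close to 1 when a word is near an aspect, close to 0 when far.
    kernel = np.exp(-gamma * sq_dist)
    # Sum the kernel over aspects, then normalize over the words of the sentence.
    per_word = kernel.sum(axis=1)
    return per_word / per_word.sum()


def topic_scores(words, aspects, topics, gamma):
    """Cosine similarity between the attention-weighted sentence vector and each topic vector."""
    att = rbf_attention(words, aspects, gamma)
    summary = att @ words  # attention-weighted sum of the word vectors
    return {
        name: float(summary @ vec / (np.linalg.norm(summary) * np.linalg.norm(vec)))
        for name, vec in topics.items()
    }


# Toy 3-dimensional "embeddings", invented for illustration only.
words = np.array([
    [0.0, 0.0, 1.0],  # "the"
    [1.0, 0.0, 0.0],  # "food"
    [0.5, 0.5, 0.0],  # "great"
])
aspects = np.array([
    [1.0, 0.0, 0.0],  # candidate aspect "food"
    [0.0, 1.0, 0.0],  # candidate aspect "service"
])
topics = {
    "food": np.array([1.0, 0.0, 0.0]),
    "service": np.array([0.0, 1.0, 0.0]),
}

att = rbf_attention(words, aspects, gamma=1.0)
scores = topic_scores(words, aspects, topics, gamma=1.0)
print(att)     # attention weight per word
print(scores)  # cosine score per topic
```

In this toy setup the function word "the" lies far from both candidate aspects and therefore receives the smallest attention weight, which is the intended effect of the attention step: words unrelated to any aspect contribute little to the sentence vector.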
Citations
I am not the author of the original paper. If you use this library, please cite the original paper:
@inproceedings{tulkens2020embarrassingly,
  title = "Embarrassingly Simple Unsupervised Aspect Extraction",
  author = "Tulkens, St{\'e}phan and van Cranenburgh, Andreas",
  booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
  month = jul,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.acl-main.290",
  doi = "10.18653/v1/2020.acl-main.290",
  pages = "3182--3187",
}
License
GNU General Public License v3.0