Aspect extraction based on word embeddings
cat-aspect-extraction🐈
An easy-to-use library implementing Contrastive Attention Topic Modeling, as described in Embarrassingly Simple Unsupervised Aspect Extraction.
Read the paper and the original repository for details about the algorithm.
- PAPER : https://aclanthology.org/2020.acl-main.290/
- REPOSITORY : https://github.com/clips/cat/
Installation
pip install cat-aspect-extraction
or
git clone
python -m pip install .
Example
from cat_aspect_extraction import CAt, RBFAttention # for using the model
from reach import Reach # for loading word embeddings
# Load in-domain word embeddings and create a CAt instance
r = Reach.load("path/to/embeddings", unk_word="UNK")
cat = CAt(r)
# Initialize candidate aspects
candidates = [
    "food",
    "service",
    "ambiance",
    "price",
    "location",
    "experience",
]
for aspect in candidates:
    cat.add_candidate(aspect)
# Add topics
cat.add_topic("food", ["taste", "flavor", "quality", "portion", "menu", "dish", "cuisine", "ingredient"])
cat.add_topic("service", ["staff", "waiter", "waitress", "service", "server", "host", "manager", "bartender"])
cat.add_topic("ambiance", ["atmosphere", "decor", "interior", "design", "lighting", "music", "noise", "vibe"])
# Compute topic score
sentence = "The food was great !".split() # tokenize your sentence
cat.get_scores(sentence, attention=RBFAttention())
# >>> [('food', 1), ('service', 0.5), ('ambiance', 0.0)]
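
A small follow-up sketch for consuming the scores, assuming `get_scores` returns a list of `(topic, score)` tuples shaped like the output shown above. The `top_aspect` helper and its `threshold` parameter are hypothetical and not part of the library; the scores are hard-coded here so the snippet runs without embeddings.

```python
# Scores copied from the example output; in practice they would come from cat.get_scores(...).
scores = [("food", 1), ("service", 0.5), ("ambiance", 0.0)]

def top_aspect(scores, threshold=0.0):
    """Return the highest-scoring topic, or None if no score exceeds the threshold."""
    if not scores:
        return None
    label, score = max(scores, key=lambda pair: pair[1])
    return label if score > threshold else None

best = top_aspect(scores)
print(best)
```

Returning `None` when nothing clears the threshold gives the caller a simple way to treat a sentence as having no clear aspect.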
Citations
I am not the author of the original paper. If you use this library, please cite the original paper:
@inproceedings{tulkens2020embarrassingly,
title = "Embarrassingly Simple Unsupervised Aspect Extraction",
author = "Tulkens, St{\'e}phan and van Cranenburgh, Andreas",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.290",
doi = "10.18653/v1/2020.acl-main.290",
pages = "3182--3187",
}
License
GNU General Public License v3.0
Source Distribution
Hashes for cat_aspect_extraction-1.0.1.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 1eafd5729ec2950327945cfee0f0540eaf460062084fd80532c7fcdd45b5f0a5 |
MD5 | 58619ece8d1388acfd6836492f3c3612 |
BLAKE2b-256 | 4ad37b85eeedf09d1c150a473c18c658dd9f94acac4845e35586bbdb6855632e |