Aspect extraction based on word embeddings
cat-aspect-extraction🐈
An easy-to-use library implementing Contrastive Attention Topic Modeling (CAt), as described in "Embarrassingly Simple Unsupervised Aspect Extraction".
Read the paper and the original repository for details about the algorithm:
- PAPER : https://aclanthology.org/2020.acl-main.290/
- REPOSITORY : https://github.com/clips/cat/
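To give an intuition for what the library computes, here is a minimal pure-Python sketch of the attention idea as I understand it from the paper: each word is weighted by its RBF-kernel similarity to a set of candidate aspect vectors, the weighted word vectors are summed into a sentence summary, and the summary is compared to each label vector with cosine similarity. This is not the library's own code. The 2-d vectors, the `gamma` value, and the function names are hypothetical and chosen only for illustration.

```python
import math

def rbf(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def rbf_attention(word_vecs, aspect_vecs, gamma):
    """One weight per word, normalized so the weights sum to 1."""
    raw = [sum(rbf(w, a, gamma) for a in aspect_vecs) for w in word_vecs]
    total = sum(raw)
    return [r / total for r in raw]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy 2-d embeddings (hypothetical values, not real word vectors)
embeddings = {
    "the": [0.1, 0.1],
    "food": [1.0, 0.0],
    "was": [0.1, 0.2],
    "great": [0.5, 0.5],
    "service": [0.0, 1.0],
}
labels = {"food": embeddings["food"], "service": embeddings["service"]}
aspect_vecs = list(labels.values())

sentence = ["the", "food", "was", "great"]
word_vecs = [embeddings[w] for w in sentence]

# Attention-weighted sum of the word vectors gives the sentence summary
att = rbf_attention(word_vecs, aspect_vecs, gamma=1.0)
dim = len(word_vecs[0])
summary = [sum(a * v[i] for a, v in zip(att, word_vecs)) for i in range(dim)]

# Score each label by cosine similarity to the summary
scores = {label: cosine(summary, vec) for label, vec in labels.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup the word "food" receives more attention than the function words, so the summary vector leans toward the "food" label.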
Installation
pip install cat-aspect-extraction
or
git clone
python -m pip install .
Example
from cat_aspect_extraction import CAt, RBFAttention # for using the model
from reach import Reach # for loading word embeddings
# Load in-domain word embeddings and create a CAt instance
r = Reach.load("path/to/embeddings", unk_word="UNK")
cat = CAt(r)
# Initialize candidate aspects
candidates = [
    "food",
    "service",
    "ambiance",
    "price",
    "location",
    "experience"
]
for aspect in candidates:
    cat.add_candidate(aspect)
# Add topics
cat.add_topic("food", ["taste", "flavor", "quality", "portion", "menu", "dish", "cuisine", "ingredient"])
cat.add_topic("service", ["staff", "waiter", "waitress", "service", "server", "host", "manager", "bartender"])
cat.add_topic("ambiance", ["atmosphere", "decor", "interior", "design", "lighting", "music", "noise", "vibe"])
# Compute topic score
sentence = "The food was great !".split() # tokenize your sentence
cat.get_scores(sentence, attention=RBFAttention())
>>> [('food', 1), ('service', 0.5), ('ambiance', 0.0)]
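Once the scores are available, a common next step is to pick the best topic or keep every topic above a cutoff. The sketch below assumes `get_scores` returns a list of `(topic, score)` pairs shaped like the example output above; the scores are hard-coded here so the snippet runs without embeddings, and the `threshold` value is a hypothetical choice.

```python
# Scores copied from the example output above (stand-in for cat.get_scores(...))
scores = [("food", 1), ("service", 0.5), ("ambiance", 0.0)]

# Highest-scoring topic
best_topic, best_score = max(scores, key=lambda pair: pair[1])

# All topics at or above a cutoff (hypothetical threshold, tune for your data)
threshold = 0.4
relevant = [topic for topic, score in scores if score >= threshold]

print(best_topic, relevant)
```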
Citations
I am not the author of the original paper. If you use this library, please cite the original paper:
@inproceedings{tulkens2020embarrassingly,
title = "Embarrassingly Simple Unsupervised Aspect Extraction",
author = "Tulkens, St{\'e}phan and van Cranenburgh, Andreas",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.290",
doi = "10.18653/v1/2020.acl-main.290",
pages = "3182--3187",
}
License
GNU General Public License v3.0