
Traditional Chinese sentiment analysis tool based on BERT.

Project description

senti_c (Traditional Chinese sentiment analysis tool based on BERT)

Introduction

senti_c is a sentiment analysis tool built on Bidirectional Encoder Representations from Transformers (BERT). It uses the bert-base-chinese pre-trained model provided by the Hugging Face Transformers library.

senti_c provides two functions:

  1. Sentence-level sentiment classification.
  2. Aspect extraction and aspect-based sentiment analysis.
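
Both functions are exposed as importable classes (used in the full examples below); a minimal sketch of the two entry points:

from senti_c import SentenceSentimentClassification, AspectSentimentAnalysis

# 1. sentence-level polarity classification
sentence_classifier = SentenceSentimentClassification(logging_level = "warning")

# 2. aspect term extraction plus aspect-level polarity
aspect_classifier = AspectSentimentAnalysis(logging_level = "warning")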

If you use senti_c, please cite our work:
Yu-Ting Tu (2020). Developing Sentiment Analysis Toolkit for Traditional Chinese Using Sequential Transfer Learning (Master's thesis). Retrieved from https://hdl.handle.net/11296/er7s7w.

There is a vignette written in Chinese. You may also want to look at the Google Colab demo online.

Requirements

senti_c has been tested with Python 3.7 and 3.8. It requires transformers version 2.11.0, which in turn needs PyTorch 1.x and TensorFlow 2.2.0. You do not need a GPU to use senti_c, but running it on a GPU can significantly improve its speed.
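
If you want to check which versions are installed in your environment, a quick sketch (transformers, torch, and tensorflow all expose a standard __version__ attribute):

import sys
import transformers
import torch
import tensorflow as tf

print(sys.version)               # should report 3.7.x or 3.8.x
print(transformers.__version__)  # expected: 2.11.0
print(torch.__version__)         # expected: 1.x
print(tf.__version__)            # expected: 2.2.0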

Because of the specific package versions required by senti_c, there is a good chance that these requirements will conflict with your existing setup. To avoid this, we strongly recommend using a Python virtual environment to provide a dedicated environment for senti_c.

Install senti_c

To install senti_c, we need to (1) set up a Python virtual environment, and (2) install senti_c in this virtual environment.

(1) Set up the Python Virtual Environment

You need to execute these tasks in a terminal. First, switch to a working directory, say /service/redstar/senti_c:

cd /service/redstar/senti_c

To set up a virtual environment named vm4sentic, run the following command:

python3 -m venv vm4sentic

Next, activate the virtual environment:

source vm4sentic/bin/activate
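
To double-check that the virtual environment is active, you can inspect Python's prefix from inside it (a minimal sketch; the exact path depends on where you created vm4sentic):

import sys
print(sys.prefix)                     # should point inside the vm4sentic directory
print(sys.prefix != sys.base_prefix)  # True when a virtual environment is active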

(2) Install senti_c

Run this command to install senti_c:

pip3 install senti_c --no-binary=wrapt,termcolor,sacremoses

The parameter --no-binary=wrapt,termcolor,sacremoses asks pip3 to install wrapt, termcolor, and sacremoses without building wheels (bdist_wheel). Alternatively, you can simply run pip3 install senti_c if you do not mind seeing some error messages.
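
Once the installation finishes, a quick import check (a minimal sketch) confirms that senti_c and its dependencies were installed correctly:

# save as check_install.py and run with: python3 check_install.py
from senti_c import SentenceSentimentClassification, AspectSentimentAnalysis
print("senti_c imported successfully")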

Sentence-level Sentiment Classification

Below is an example of sentence-level sentiment classification. Create a file named sent_pred.py with the following content:

from senti_c import SentenceSentimentClassification

sentence_classifier = SentenceSentimentClassification(logging_level = "warning")
test_data = ["我很喜歡這家店!超級無敵棒!",
             "這個服務生很不親切...",
             "這間Fridays的空間不大,座位安排略顯擁擠,尤其是有隔板的兩人桌,真的超級小。",
             "唯一印象深刻的事... 蛤蜊好大顆,大蝦毛毛蟲好吃!"]
result = sentence_classifier.predict(test_data, run_split = True, aggregate_strategy = False)
print(result.iloc[:, 1:])

Run this script (via: python3 sent_pred.py). The predicted labels are 正面 (positive) and 負面 (negative). The results are:

                                   Sentences Preds
0                                   我很喜歡這家店!    正面
1                                     超級無敵棒!    正面
2                               這個服務生很不親切...    負面
3  這間Fridays的空間不大,座位安排略顯擁擠,尤其是有隔板的兩人桌,真的超級小。    負面
4                 唯一印象深刻的事... 蛤蜊好大顆,大蝦毛毛蟲好吃!    正面
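
Since predict returns a pandas DataFrame, the results can be filtered or exported like any other DataFrame. A sketch building on the result object above (the column names Sentences and Preds are taken from the printed output):

# keep only the sub-sentences predicted as positive (正面)
positive = result[result["Preds"] == "正面"]
print(positive["Sentences"].tolist())

# export the full prediction table to CSV
result.to_csv("sent_pred_results.csv", index=False, encoding="utf-8-sig")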

Aspect-based Sentiment Analysis

Below is a sample script for aspect-based sentiment analysis. Create a script named aspect_pred.py with the following content:

from senti_c import AspectSentimentAnalysis
aspect_classifier = AspectSentimentAnalysis(logging_level = "warning")
test_data = ["""這間Fridays的空間不大,座位安排略顯擁擠,尤其是有隔板的兩人桌,真的超級小。服務人員態度很好,只是因為客人太多,感覺人手不足,要求東西常常要等好一陣子才來。如果希望有好一點的服務品質,建議避開週末用餐時段。""", 
             """每次經過都會被台灣炒飯給吸引,決定給它一個機會踏進去嚐鮮!有點失望,因為炒飯一般般,飯糰好難吃,冷氣超冷,串燒不推薦! 唯一印象深刻的事... 蛤蜊好大顆,大蝦毛毛蟲好吃! 整體環境不差,服務也可以,但餐點很一般"""]
result = aspect_classifier.predict(test_data, output_result = "all")

print("Extracted aspect terms and their polarity:")
for i, aterms in enumerate(result['AspectTermAndSentimentExtraction']):
    print(f"Sentence {i}: {aterms}")

print("\n ---\nLabels for individual tokens:")
nseg = len(result['InputWords'])  # number of input reviews
# result also contains other fields, e.g. result['AspectTermTags']
for seg in range(nseg):
    print(f"\n* Sentence {seg}:")
    words = result['InputWords'][seg]                 # tokens of this review
    tags = result['AspectTermAndSentimentTags'][seg]  # BIO tag plus polarity per token
    for word, tag in zip(words, tags):
        print(f"{word}({tag}) ", end = "")

print("")

Run this script (via: python3 aspect_pred.py). The results are:

Extracted aspect terms and their polarity:
Sentence 0: [('空間', 'NEG'), ('座位安排', 'NEG'), ('服務人員態度', 'POS'), ('人', 'NEG'), ('服務品質', 'NEG')]
Sentence 1: [('炒飯', 'POS'), ('炒飯', 'NEG'), ('飯糰', 'NEG'), ('串燒', 'NEG'), ('蛤蜊', 'POS'), ('環境', 'POS'), ('服務', 'POS'), ('餐點', 'NEG')]

 ---
Labels for individual tokens:

* Sentence 0:
這(O-O) 間(O-O) F(O-O) r(O-O) i(O-O) d(O-O) a(O-O) y(O-O) s(O-O) 的(O-O) 空(B-NEG) 間(I-NEG) 不(O-O) 大(O-O) ,(O-O) 座(B-NEG) 位(I-NEG) 安(I-NEG) 排(I-NEG) 略(O-O) 顯(O-O) 擁(O-O) 擠(O-O) ,(O-O) 尤(O-O) 其(O-O) 是(O-O) 有(O-O) 隔(O-O) 板(O-O) 的(O-O) 兩(O-O) 人(O-O) 桌(O-O) ,(O-O) 真(O-O) 的(O-O) 超(O-O) 級(O-O) 小(O-O) 。(O-O) 服(B-POS) 務(I-POS) 人(I-POS) 員(I-POS) 態(I-POS) 度(I-POS) 很(O-O) 好(O-O) ,(O-O) 只(O-O) 是(O-O) 因(O-O) 為(O-O) 客(O-O) 人(O-O) 太(O-O) 多(O-O) ,(O-O) 感(O-O) 覺(O-O) 人(B-NEG) 手(O-NEG) 不(O-O) 足(O-O) ,(O-O) 要(O-O) 求(O-O) 東(O-O) 西(O-O) 常(O-O) 常(O-O) 要(O-O) 等(O-O) 好(O-O) 一(O-O) 陣(O-O) 子(O-O) 才(O-O) 來(O-O) 。(O-O) 如(O-O) 果(O-O) 希(O-O) 望(O-O) 有(O-O) 好(O-O) 一(O-O) 點(O-O) 的(O-O) 服(B-NEG) 務(I-NEG) 品(I-NEG) 質(I-NEG) ,(O-O) 建(O-O) 議(O-O) 避(O-O) 開(O-O) 週(O-O) 末(O-O) 用(O-O) 餐(O-O) 時(O-O) 段(O-O) 。(O-O)
* Sentence 1:
每(O-O) 次(O-O) 經(O-O) 過(O-O) 都(O-O) 會(O-O) 被(O-O) 台(O-O) 灣(O-O) 炒(B-POS) 飯(I-POS) 給(O-O) 吸(O-O) 引(O-O) ,(O-O) 決(O-O) 定(O-O) 給(O-O) 它(O-O) 一(O-O) 個(O-O) 機(O-O) 會(O-O) 踏(O-O) 進(O-O) 去(O-O) 嚐(O-O) 鮮(O-O) !(O-O) 有(O-O) 點(O-O) 失(O-O) 望(O-O) ,(O-O) 因(O-O) 為(O-O) 炒(B-NEG) 飯(I-NEG) 一(O-O) 般(O-O) 般(O-O) ,(O-O) 飯(B-NEG) 糰(I-NEG) 好(O-O) 難(O-O) 吃(O-O) ,(O-O) 冷(O-O) 氣(O-O) 超(O-O) 冷(O-O) ,(O-O) 串(B-NEG) 燒(I-NEG) 不(O-O) 推(O-O) 薦(O-O)
!(O-O) 唯(O-O) 一(O-O) 印(O-O) 象(O-O) 深(O-O) 刻(O-O) 的(O-O) 事(O-O) .(O-O) .(O-O) .(O-O) 蛤(B-POS) 蜊(I-POS) 好(O-O) 大(O-O) 顆(O-O) ,(O-O) 大(O-POS) 蝦(I-POS) 毛(O-O) 毛(O-O) 蟲(O-O) 好(O-O) 吃(O-O) !(O-O) 整(O-O) 體(O-O) 環(B-POS)
境(I-POS) 不(O-O) 差(O-O) ,(O-O) 服(B-POS) 務(I-POS) 也(O-O) 可(O-O) 以(O-O) ,(O-O) 但(O-O) 餐(B-NEG) 點(I-NEG) 很(O-O) 一(O-O) 般(O-O)
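
The returned dictionary can be post-processed like ordinary Python data. For example, a sketch (reusing the result object from aspect_pred.py) that tallies how often each aspect term is mentioned with each polarity:

from collections import Counter

# count (aspect term, polarity) pairs across all input reviews
counts = Counter()
for aterms in result['AspectTermAndSentimentExtraction']:
    for term, polarity in aterms:
        counts[(term, polarity)] += 1

for (term, polarity), n in counts.most_common():
    print(f"{term}\t{polarity}\t{n}")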

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

senti_c-0.2.0.tar.gz (44.8 kB)

Built Distribution

senti_c-0.2.0-py3-none-any.whl (50.9 kB)

File details

Details for the file senti_c-0.2.0.tar.gz.

File metadata

  • Download URL: senti_c-0.2.0.tar.gz
  • Upload date:
  • Size: 44.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.10

File hashes

Hashes for senti_c-0.2.0.tar.gz:

  • SHA256: 6a0973c3a87776e2307d58f841fc8c6ed08a98980133e383c811c7da84281f40
  • MD5: 39707be6a8a38aa1d83b0d65011bff41
  • BLAKE2b-256: eb6c9eb76d6c0b938b212ceec3a54ccde2f3e1664aae1d5b7184e90bdca0af20

See the pip documentation for more details on using hashes.
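
If you download the sdist manually, you can verify it against the SHA256 digest above before installing. A minimal sketch using Python's standard hashlib (assuming the file was saved under the name shown):

import hashlib

expected = "6a0973c3a87776e2307d58f841fc8c6ed08a98980133e383c811c7da84281f40"
with open("senti_c-0.2.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")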

File details

Details for the file senti_c-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: senti_c-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 50.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.10

File hashes

Hashes for senti_c-0.2.0-py3-none-any.whl:

  • SHA256: c4bff42581a7b2e2cb6f440080d445fd903795dd3b25c74fbded468b45386981
  • MD5: e4b0756dd9e0e0eed1b35acf19a9405f
  • BLAKE2b-256: 3c323d5740c59404ac25c3a7a983d51c8e52ede3ffc01b263237a5b548920d5d

See the pip documentation for more details on using hashes.
