This tool provides state-of-the-art models for aspect term extraction (ATE), aspect polarity classification (APC), and text classification.

PyABSA - Open Framework for Aspect-based Sentiment Analysis

Aspect Term Extraction (ATE) & Aspect Polarity Classification (APC)

Fast, low memory requirement, and an enhanced implementation of Local Context Focus.

Built on LC-ABSA / LCF-ABSA / LCF-BERT and LCF-ATEPC.

PyTorch Implementations (CPU & CUDA supported).

If you would like to support the PyABSA project, please star this repository as your contribution.

Annotate Your Own Dataset

The ABSADatasets repository provides an open-source dataset annotation tool, so you can easily annotate your dataset before using PyABSA.

1. Package Overview

pyabsa                        package root (including all interfaces)
pyabsa.functional             recommended interface entry
pyabsa.functional.checkpoint  checkpoint manager entry, inference model entry
pyabsa.functional.dataset     datasets entry
pyabsa.functional.config      predefined config manager
pyabsa.functional.trainer     training module; every trainer returns an inference model

2. Important: Read the Tips

2.1 Use your custom dataset

PyABSA uses FindFile to locate target files, which means you can specify a dataset/checkpoint by keyword instead of an absolute path, e.g.:

  • First, refer to ABSADatasets to prepare your dataset in an acceptable format.
  • You can open a PR to contribute your dataset and use it like ABSADatasets.your_dataset, or refer to it by an absolute/relative path or a dataset directory name:
dataset = './laptop' # relative path
dataset = 'ABSOLUTE_PATH/laptop/' # absolute path
dataset = 'laptop' # dataset directory name; keyword case doesn't matter
dataset = 'lapto' # searches any directory whose path contains 'lapto' or 'aptop'

checkpoint = 'lcfs' # checkpoint assignment works the same way
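
If you want to see what a keyword will resolve to before training, you can query it directly with the findfile package that PyABSA relies on. This is an optional sanity check, not part of the PyABSA API itself; find_dirs(search_path, key) is assumed from findfile's public helpers, so check its documentation for the exact signature.

from findfile import find_dirs

# list directories under the current path whose path contains the keyword 'laptop'
print(find_dirs('./', 'laptop'))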

2.2 Auto-select a free CUDA device for training & inference

PyABSA uses AutoCUDA to support automatic CUDA device assignment, but you can still set a preferred device.

auto_device = True  # to auto assign a cuda device for training / inference
auto_device = False  # to use cpu
auto_device = 'cuda:1'  # to specify a preferred device
auto_device = 'cpu'  # to specify a preferred device
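
For intuition, automatic device selection roughly means "pick the visible GPU with the most free memory, otherwise fall back to CPU". The sketch below illustrates that idea with plain PyTorch; it is not PyABSA's internal autocuda implementation, just a standalone illustration (torch.cuda.mem_get_info requires a reasonably recent PyTorch).

import torch

def pick_device():
    # fall back to CPU when no GPU is visible
    if not torch.cuda.is_available():
        return 'cpu'
    # otherwise choose the GPU index with the most free memory
    free_memory = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
    return 'cuda:{}'.format(max(range(len(free_memory)), key=free_memory.__getitem__))

auto_device = pick_device()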

2.3 More flexible labels than other frameworks

PyABSA encourages you to use string labels instead of numbers, e.g., sentiment labels = {negative, positive, unknown}.

  • Whatever labels you use in the dataset are exactly the labels output at inference time (see the sketch after this list).
  • The PyABSA version information is also shown in the output when loading a checkpoint's training args.
  • You can train a model on multiple datasets that share the same sentiment labels, and you can even contribute and define a combination of datasets here!
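
For illustration, an APC-style sample with string labels might look like the following (three lines per example: the sentence with the aspect replaced by $T$, the aspect term, and its string label). This is only a sketch; see ABSADatasets for the authoritative file format. Whatever label strings appear here ('Positive', 'Negative', ...) are exactly the strings returned at inference time.

The $T$ is excellent but the battery drains quickly .
screen
Positive
The screen is excellent but the $T$ drains quickly .
battery
Negative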

2.4 Get/Set config options

The default spaCy English model is en_core_web_sm; if it is not installed, PyABSA will download/install it automatically.

If you would like to change the English model (or other predefined options), you can get/set them as follows:

from pyabsa.functional.config.apc_config_manager import APCConfigManager
from pyabsa.functional.config.atepc_config_manager import ATEPCConfigManager
from pyabsa.functional.config.classification_config_manager import ClassificationConfigManager

# Set
APCConfigManager.set_apc_config_english({'spacy_model': 'en_core_web_lg'})
ATEPCConfigManager.set_atepc_config_english({'spacy_model': 'en_core_web_lg'})
ClassificationConfigManager.set_classification_config_english({'spacy_model': 'en_core_web_lg'})

# Get
APCConfigManager.get_apc_config_english()
ATEPCConfigManager.get_atepc_config_english()
ClassificationConfigManager.get_classification_config_english()

# Manually Set spaCy nlp Language object
from pyabsa.core.apc.dataset_utils.apc_utils import configure_spacy_model

nlp = configure_spacy_model(APCConfigManager.get_apc_config_english())
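
Other predefined options are typically overridden by assigning attributes on the returned config object before training. A minimal sketch follows; option names such as num_epoch and learning_rate are taken from the PyABSA demo configs and may differ across versions, so verify them for your installation.

from pyabsa.functional.config.apc_config_manager import APCConfigManager

apc_config = APCConfigManager.get_apc_config_english()
apc_config.num_epoch = 10        # assumed option name: number of training epochs
apc_config.learning_rate = 2e-5  # assumed option name: optimizer learning rate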

3. Quick Tutorial

  • Create a new Python environment and install pyabsa
  • Find a suitable demo script (ATEPC, APC, Text Classification) to prepare your work (you are welcome to share your demo script)
  • Format/annotate your dataset following ABSADatasets, or use a public dataset from ABSADatasets
  • Initialize your config to specify the model, dataset, and hyper-parameters
  • Train your model and get checkpoints
  • Share your checkpoint and dataset (a minimal end-to-end sketch follows this list)
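
A minimal end-to-end sketch of these steps, assuming the PyABSA 1.x functional API listed in the package overview (APCConfigManager, Trainer) plus ABSADatasetList for built-in datasets; the exact names and keyword arguments are taken from the demo scripts and may differ across versions:

from pyabsa.functional import APCConfigManager, ABSADatasetList, Trainer

# init your config to specify model, dataset, and hyper-parameters
config = APCConfigManager.get_apc_config_english()
config.num_epoch = 5  # assumed option name, see 2.4

# train; per the package overview, every trainer returns an inference model
sent_classifier = Trainer(config=config,
                          dataset=ABSADatasetList.Laptop14,  # or a dataset keyword / path, see 2.1
                          auto_device=True)                  # see 2.2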

4. Installation

Please do not install a version that has no corresponding release note, to avoid installing a test version.

4.1 Install via pip

To use PyABSA, install the latest version from pip or source code:

pip install -U pyabsa

4.2 Install from source

git clone https://github.com/yangheng95/PyABSA --depth=1
cd PyABSA 
python setup.py install

5. Learning to Use Checkpoint

5.1 How to get available checkpoints from Google Drive

PyABSA checks for the latest available checkpoints and loads the latest checkpoint from Google Drive. To view the available checkpoints, you can use the following code and then load a checkpoint by name:

from pyabsa import available_checkpoints

checkpoint_map = available_checkpoints()  # show the available checkpoints for the current version of PyABSA

If you cannot access Google Drive, you can download our pretrained checkpoints from here (extraction code: ABSA) and load the unzipped checkpoint manually. Note that the checkpoints on Baidu Netdisk are updated more slowly and lag behind in version, so please use the corresponding version of PyABSA.

5.2 How to use our pretrained checkpoints on your dataset
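
A minimal sketch of loading a pretrained checkpoint for inference on your own sentences. The names below (ATEPCCheckpointManager, get_aspect_extractor, extract_aspect, inference_source, pred_sentiment) are assumed from the PyABSA 1.x demo scripts, and the checkpoint name must be one listed by available_checkpoints() (see 5.1); check the demo scripts for your version before relying on this exact API.

from pyabsa import ATEPCCheckpointManager

# load a pretrained ATEPC checkpoint by name (use a name reported by available_checkpoints())
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='english',
                                                               auto_device=True)

# joint aspect term extraction + polarity classification on raw sentences
aspect_extractor.extract_aspect(inference_source=['The screen is excellent but the battery drains quickly .'],
                                pred_sentiment=True)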

5.3 How to share checkpoints (e.g., checkpoints trained on your custom dataset) with community

6. Datasets

More datasets are available at ABSADatasets.

  1. Twitter
  2. Laptop14
  3. Restaurant14
  4. Restaurant15
  5. Restaurant16
  6. Phone
  7. Car
  8. Camera
  9. Notebook
  10. MAMS
  11. TShirt
  12. Television
  13. MOOC
  14. Shampoo
  15. Multilingual (the combination of all of the above datasets)

You don't have to download the datasets manually; they will be downloaded automatically.

7. Model Support

In addition to the following models, we provide a template model involving the LCF vector; you can develop your own model based on the LCF-APC model template or the LCF-ATEPC model template.
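
For example, a built-in model (or your own model class built from the LCF templates) is typically selected by assigning it to the config before training. APCModelList and the model config attribute are assumed from the PyABSA 1.x demo scripts; verify the names for your version.

from pyabsa.functional import APCConfigManager, APCModelList

config = APCConfigManager.get_apc_config_english()
config.model = APCModelList.FAST_LCF_BERT  # assumed attribute/model name; any model listed below can be chosen the same way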

7.1 ATEPC

  1. LCF-ATEPC
  2. LCF-ATEPC-LARGE (Dual BERT)
  3. FAST-LCF-ATEPC
  4. LCFS-ATEPC
  5. LCFS-ATEPC-LARGE (Dual BERT)
  6. FAST-LCFS-ATEPC
  7. BERT-BASE

7.2 APC

BERT-based APC models

  1. SLIDE-LCF-BERT (Faster & Performs Better than LCF/LCFS-BERT)
  2. SLIDE-LCFS-BERT (Faster & Performs Better than LCF/LCFS-BERT)
  3. LCF-BERT (Reimplemented & Enhanced)
  4. LCFS-BERT (Reimplemented & Enhanced)
  5. FAST-LCF-BERT (Faster, with a slight performance loss)
  6. FAST-LCFS-BERT (Faster, with a slight performance loss)
  7. LCF-DUAL-BERT (Dual BERT)
  8. LCFS-DUAL-BERT (Dual BERT)
  9. BERT-BASE
  10. BERT-SPC
  11. LCA-Net
  12. DLCF-DCA-BERT *

BERT-based APC baseline models

  1. AOA_BERT
  2. ASGCN_BERT
  3. ATAE_LSTM_BERT
  4. Cabasc_BERT
  5. IAN_BERT
  6. LSTM_BERT
  7. MemNet_BERT
  8. MGAN_BERT
  9. RAM_BERT
  10. TD_LSTM_BERT
  11. TC_LSTM_BERT
  12. TNet_LF_BERT

GloVe-based APC baseline models

  1. AOA
  2. ASGCN
  3. ATAE-LSTM
  4. Cabasc
  5. IAN
  6. LSTM
  7. MemNet
  8. MGAN
  9. RAM
  10. TD-LSTM
  11. TC-LSTM
  12. TNet_LF

Contribution

We hope you will help us improve this project; your contributions are welcome. You can contribute in many ways, including:

  • Share your custom dataset in PyABSA and ABSADatasets
  • Integrate your models into PyABSA (you can share your models whether or not they are based on PyABSA; if you are interested, we will help you)
  • Raise a bug report while using PyABSA or reviewing the code (PyABSA is an individual project driven by enthusiasm, so your help is needed)
  • Give us advice about feature design/refactoring (you can suggest improvements to any feature)
  • Correct/rewrite error messages or code comments (the comments are not written by a native English speaker, so you can help us improve the documentation)
  • Create an example script for a particular situation (such as specifying a spaCy model, a pretrained BERT type, or certain hyper-parameters)
  • Star this repository to keep it active

Notice

LCF is a simple and adaptive mechanism proposed for ABSA. Many models based on LCF have been proposed and have achieved SOTA performance. Developing your models based on LCF can significantly improve your ABSA models. If you are looking for the original proposal of local context focus, please refer to the introduction of LCF. If you are looking for the original code of the LCF-related papers, please refer to LC-ABSA / LCF-ABSA or LCF-ATEPC.

Acknowledgement

This work builds on LC-ABSA/LCF-ABSA and LCF-ATEPC, as well as other impressive works such as PyTorch-ABSA and LCFS-BERT.

License

MIT

Contributors ✨

Thanks go to these wonderful people (emoji key):


  • XuMayi 💻
  • YangHeng 📆
  • brtgpy 🔣
  • Ryan 💻
  • lpfy 💻
  • Jackie Liu 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!
