Target Dependent Sentiment Analysis (TDSA) framework.

Bella

Requirements and Installation

  1. Python 3.6
  2. pip install bella-tdsa
  3. Install Docker
  4. Start Stanford CoreNLP server: docker run -p 9000:9000 -d --rm mooreap/corenlp
  5. Start the TweeboParser API server: docker run -p 8000:8000 -d --rm mooreap/tweeboparserdocker

The Stanford and Tweebo Docker servers are only required if you are going to use the TDParse methods/models or any of the Stanford tools. Otherwise you do not need them.
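Before running the TDParse methods it can help to confirm that both servers are reachable. The following is a minimal sketch that only checks whether the ports from the commands above accept TCP connections on localhost; the helper name is_port_open is hypothetical and not part of Bella, and an open port does not by itself prove the server behind it is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the docker run commands above.
    for name, port in (("Stanford CoreNLP", 9000), ("TweeboParser", 8000)):
        status = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```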

To stop the running Docker servers:

  1. Find the name assigned to the Docker container using: docker ps
  2. Then stop the relevant Docker container: docker stop name_of_container

NOTE By default both of these servers run with as many threads as your machine has CPUs. To limit this, do the following:

  1. For Stanford: docker run -p 9000:9000 -d --rm mooreap/corenlp -threads 6 (runs it with 6 threads)
  2. For TweeboParser: docker run -p 8000:8000 -d --rm mooreap/tweeboparserdocker --threads 6 (runs it with 6 threads)
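If you start the servers from a script, the two commands above can be assembled programmatically. This is only a sketch: docker_run_command is a hypothetical helper, not part of Bella, and it simply reproduces the flags shown above (note that the Stanford image takes -threads while the TweeboParser image takes --threads).

```python
from typing import List, Optional


def docker_run_command(image: str, port: int, threads: Optional[int] = None,
                       thread_flag: str = "--threads") -> List[str]:
    """Build the docker run argument list for one server (hypothetical helper)."""
    cmd = ["docker", "run", "-p", f"{port}:{port}", "-d", "--rm", image]
    if threads is not None:
        # The thread flag goes after the image name, as in the commands above.
        cmd += [thread_flag, str(threads)]
    return cmd


corenlp_cmd = docker_run_command("mooreap/corenlp", 9000, threads=6,
                                 thread_flag="-threads")
tweebo_cmd = docker_run_command("mooreap/tweeboparserdocker", 8000, threads=6)

print(" ".join(corenlp_cmd))
print(" ".join(tweebo_cmd))
```

Each list can then be passed to subprocess.run(cmd, check=True) on a machine where Docker is installed.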

Dataset

None of the datasets are stored in this repository, so they all need to be downloaded. We recommend using the config file to state where the datasets are stored, as we did, but this is not a requirement because you can state the locations explicitly in the code. For more details on the datasets and how to download them, see the dataset notebook. The datasets used:

  1. SemEval 2014 Restaurant dataset. We used version 2 of the train dataset and the test dataset; the gold standard test set can be found here.
  2. SemEval 2014 Laptop dataset. We used version 2 of the train dataset and the test dataset; the gold standard test set can be found here.
  3. Election dataset
  4. Dong et al. Twitter dataset
  5. YouTuBean dataset by Marrese-Taylor et al.
  6. Mitchell dataset which was released with this paper.

NOTE Before using the Mitchell and YouTuBean datasets, please go through their pre-processing notebooks (Mitchell, YouTuBean), which show how to split the data and, in the Mitchell case, which train/test split to use.

Lexicons

These lexicons only need to be downloaded if you use methods that require them. Please see the config file for how to store the location of the lexicons:

  1. MPQA can be found here
  2. NRC here
  3. Hu and Liu here

Word Vectors

All the word vectors are downloaded automatically and stored in the .Bella/Vectors directory, which is created in your user directory, e.g. on Linux that would be ~/.Bella/Vectors/. The word vectors included in this repository are the following:

  1. SSWE
  2. Word Vectors trained on sentences that contain emojis
  3. GloVe Common Crawl
  4. GloVe Twitter
  5. GloVe Wiki Giga
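To see which files have already been downloaded, you can inspect the cache directory directly. The sketch below only assumes the ~/.Bella/Vectors location described above; the helper names are hypothetical and not part of Bella, and the file names inside the directory are whatever Bella chose to write there.

```python
from pathlib import Path
from typing import List


def bella_cache_dir(kind: str = "Vectors") -> Path:
    """Return the expected cache directory, e.g. ~/.Bella/Vectors (hypothetical helper)."""
    return Path.home() / ".Bella" / kind


def list_cached_files(kind: str = "Vectors") -> List[str]:
    """List the names of entries in the cache directory, or [] if it does not exist yet."""
    directory = bella_cache_dir(kind)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir())


if __name__ == "__main__":
    print(bella_cache_dir())
    print(list_cached_files())
```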

Model Zoo

The model zoo can be found on the GitLab repository here.

Like the word vectors, these models can be downloaded automatically through the code. They are stored in the .Bella/Models directory, which is created automatically in your home directory, e.g. on Linux that would be ~/.Bella/Models. An example of how to download and use a model is shown below:

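from bella import helper
from bella.models.target import TargetDep

# Download the pre-trained Target Dependent model (stored in ~/.Bella/Models)
target_dep = helper.download_model(TargetDep, 'SemEval 14 Restaurant')
# Each example gives the text, the target and the target's character span(s)
test_example_multi = [{'text': 'This bread is tasty but the sauce is too rich', 'target': 'sauce',
                       'spans': [(28, 33)]}]

# Predict the sentiment of the target 'sauce'
target_dep.predict(test_example_multi)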

This example downloads the Target Dependent model from the Vo and Zhang paper, trained on the SemEval 2014 Restaurant data, and predicts the sentiment of sauce in that example. The example is not simple: the sentence contains two targets with different sentiments, 1. bread, which has a positive sentiment, and 2. sauce, which has a negative sentiment and is the target being predicted for here.
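The spans value in the example appears to be a pair of character offsets with an exclusive end, since text[28:33] is 'sauce'. Under that assumption, the offsets can be computed rather than counted by hand. The helpers below are a hypothetical sketch and not part of Bella; they return every occurrence of the target, and whether Bella expects all occurrences or only one is not confirmed here.

```python
from typing import Dict, List, Tuple


def find_target_spans(text: str, target: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each occurrence of target, end exclusive."""
    if not target:
        raise ValueError("target must be a non-empty string")
    spans = []
    start = text.find(target)
    while start != -1:
        end = start + len(target)
        spans.append((start, end))
        # Continue searching after the current match.
        start = text.find(target, end)
    return spans


def make_example(text: str, target: str) -> Dict[str, object]:
    """Build a dict in the same shape as the example above (hypothetical helper)."""
    spans = find_target_spans(text, target)
    if not spans:
        raise ValueError(f"target {target!r} not found in text")
    return {'text': text, 'target': target, 'spans': spans}


sentence = 'This bread is tasty but the sauce is too rich'
print(make_example(sentence, 'sauce')['spans'])  # [(28, 33)]
print(make_example(sentence, 'bread')['spans'])  # [(5, 10)]
```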

For a more in-depth guide to the pre-trained models and their output, see this notebook.

The notebooks

The notebooks can be found here.

The best order to look at the notebooks is to start with the data in this notebook, then read the notebook that describes how to load and use the saved models from the model zoo. After that, explore the rest if you would like:

  1. The Mass evaluation notebooks are the following
  2. For the analysis of the reproduction of the Target Dependent model of Vo and Zhang see this notebook
  3. For the analysis of the reproduction of the TDParse model of Wang et al. see this notebook
  4. For the analysis of the reproduction of the LSTM models of Tang et al. see this notebook
  5. For the statistics of the datasets and where to find them see this notebook
  6. For the code on creating training and test splits for the YouTuBean dataset see this notebook
  7. For the code on creating training and test splits for Mitchell et al. dataset see this notebook
  8. Pre-Trained Model examples notebook
