Python package for loading Stanford Sentiment Treebank corpus
Project description
SST Utils
---------
Utilities for downloading, importing, and visualizing the [Stanford Sentiment Treebank](http://nlp.stanford.edu/sentiment/treebank.html), a dataset capturing fine-grained sentiment over movie reviews.
See examples below for usage. Tested in Python `3.4.3` and `2.7.12`.
![Jonathan Raiman, author](https://img.shields.io/badge/Author-Jonathan%20Raiman%20-blue.svg)
Javascript code by Jason Chuang and Stanford NLP modified and taken from [Stanford NLP Sentiment Analysis demo](http://nlp.stanford.edu:8080/sentiment/rntnDemo.html).
[![PyPI version](https://badge.fury.io/py/pytreebank.svg)](https://badge.fury.io/py/pytreebank)
[![Build Status](https://travis-ci.org/JonathanRaiman/pytreebank.svg?branch=master)](https://travis-ci.org/JonathanRaiman/pytreebank)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)
### Visualization
Allows for visualization using Jason Chuang's Javascript and CSS within an IPython notebook:
```python
import pytreebank
# load the sentiment treebank corpus in the parenthesis format,
# e.g. "(4 (2 very ) (3 good))"
dataset = pytreebank.load_sst()
# add Javascript and CSS to the Ipython notebook
pytreebank.LabeledTree.inject_visualization_javascript()
# select and example to visualize
example = dataset["train"][0]
# display it in the page
example.display()
```
![Example visualization using pytreebank](visualization_example.png)
### Lines and Labels
To use the corpus to output spans from the different trees you can call the `to_labeled_lines` and `to_lines` method of a `LabeledTree`. The first returned sentence in those lists is always the root sentence:
```python
import pytreebank
dataset = pytreebank.load_sst()
example = dataset["train"][0]
# extract spans from the tree.
for label, sentence in example.to_labeled_lines():
print("%s has sentiment label %s" % (
sentence,
["very negative", "negative", "neutral", "positive", "very positive"][label]
))
```
### Download/Loading control:
Change the save/load directory by passing a path (this will look for
`train.txt`, `dev.txt` and `test.txt` files under the directory).
```
dataset = pytreebank.load_sst("/path/to/sentiment/")
```
To just load a single dataset file:
```
train_data = pytreebank.import_tree_corpus("/path/to/sentiment/train.txt")
```
---------
Utilities for downloading, importing, and visualizing the [Stanford Sentiment Treebank](http://nlp.stanford.edu/sentiment/treebank.html), a dataset capturing fine-grained sentiment over movie reviews.
See examples below for usage. Tested in Python `3.4.3` and `2.7.12`.
![Jonathan Raiman, author](https://img.shields.io/badge/Author-Jonathan%20Raiman%20-blue.svg)
Javascript code by Jason Chuang and Stanford NLP modified and taken from [Stanford NLP Sentiment Analysis demo](http://nlp.stanford.edu:8080/sentiment/rntnDemo.html).
[![PyPI version](https://badge.fury.io/py/pytreebank.svg)](https://badge.fury.io/py/pytreebank)
[![Build Status](https://travis-ci.org/JonathanRaiman/pytreebank.svg?branch=master)](https://travis-ci.org/JonathanRaiman/pytreebank)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)
### Visualization
Allows for visualization using Jason Chuang's Javascript and CSS within an IPython notebook:
```python
import pytreebank
# load the sentiment treebank corpus in the parenthesis format,
# e.g. "(4 (2 very ) (3 good))"
dataset = pytreebank.load_sst()
# add Javascript and CSS to the Ipython notebook
pytreebank.LabeledTree.inject_visualization_javascript()
# select and example to visualize
example = dataset["train"][0]
# display it in the page
example.display()
```
![Example visualization using pytreebank](visualization_example.png)
### Lines and Labels
To use the corpus to output spans from the different trees you can call the `to_labeled_lines` and `to_lines` method of a `LabeledTree`. The first returned sentence in those lists is always the root sentence:
```python
import pytreebank
dataset = pytreebank.load_sst()
example = dataset["train"][0]
# extract spans from the tree.
for label, sentence in example.to_labeled_lines():
print("%s has sentiment label %s" % (
sentence,
["very negative", "negative", "neutral", "positive", "very positive"][label]
))
```
### Download/Loading control:
Change the save/load directory by passing a path (this will look for
`train.txt`, `dev.txt` and `test.txt` files under the directory).
```
dataset = pytreebank.load_sst("/path/to/sentiment/")
```
To just load a single dataset file:
```
train_data = pytreebank.import_tree_corpus("/path/to/sentiment/train.txt")
```
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pytreebank-0.2.6.tar.gz
(33.4 kB
view details)
File details
Details for the file pytreebank-0.2.6.tar.gz
.
File metadata
- Download URL: pytreebank-0.2.6.tar.gz
- Upload date:
- Size: 33.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.9.1 pkginfo/1.4.1 requests/2.14.2 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.19.4 CPython/3.6.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a178aea27768492c78790b80cfbd114c3c55399f63435b3e16d9a1f931d5777d |
|
MD5 | 975d043746d93124e1c4e671642b6a59 |
|
BLAKE2b-256 | ebeb58ba1445b0c9e93bfb48d81209ebe5cad7a3916246345ff165f9493143ef |