# Dostoevsky [![Build Status](https://travis-ci.org/bureaucratic-labs/dostoevsky.svg?branch=master)](https://travis-ci.org/bureaucratic-labs/dostoevsky)
<img align="right" src="https://i.imgur.com/uLMWPuL.png">
Library for sentiment analysis of Russian-language text.
It currently contains a single model, which classifies social network comments and messenger messages.
## Install
Please note that `Dostoevsky` supports only Python 3.6 (support for 3.7+ will follow once TensorFlow supports those versions, sorry).
```bash
$ pip install dostoevsky
```
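Because only Python 3.6 is listed as supported, it may help to check the interpreter version before installing or importing the library. The following is a minimal sketch, not part of `Dostoevsky` itself; the helper name `is_supported` and the constant `SUPPORTED` are hypothetical and introduced only for illustration.

```python
import sys

# The only interpreter version this README lists as supported.
SUPPORTED = (3, 6)


def is_supported(version_info=sys.version_info):
    """Return True when the major.minor version equals SUPPORTED."""
    return tuple(version_info[:2]) == SUPPORTED


if not is_supported():
    # Warn instead of exiting, so the check can live in any setup script.
    print(
        "Warning: Python %d.%d detected, but only Python 3.6 is listed as supported."
        % tuple(sys.version_info[:2]),
        file=sys.stderr,
    )
```

Passing a tuple such as `(3, 7, 0)` to `is_supported` lets you test the check without switching interpreters.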
## Social network comment model
This model was trained on the [RuSentiment dataset](https://github.com/text-machine-lab/rusentiment) and achieves an F1 score of up to ~0.70.
![](https://i.imgur.com/bGAEWvg.png)
### Usage
First, download the pretrained word embeddings and the model:
```bash
$ python -m dostoevsky.data download vk-embeddings cnn-social-network-model
```
Then, we can build our pipeline: `text -> tokenizer -> word embeddings -> CNN`
```python
from dostoevsky.tokenization import UDBaselineTokenizer
from dostoevsky.word_vectors import SocialNetworkWordVectores
from dostoevsky.models import SocialNetworkModel

# Split the text into (token, part-of-speech tag) pairs.
# 'всё очень плохо' means "everything is very bad".
tokenizer = UDBaselineTokenizer()
tokens = tokenizer.split('всё очень плохо')  # [('всё', 'ADJ'), ('очень', 'ADV'), ('плохо', 'ADV')]

# Look up a pretrained embedding for each token.
word_vectors_container = SocialNetworkWordVectores()
vectors = word_vectors_container.get_word_vectors(tokens)
vectors.shape  # (3, 300): three word vectors with dim=300

# Combine the tokenizer and the embeddings with the CNN classifier.
model = SocialNetworkModel(
    tokenizer=tokenizer,
    word_vectors_container=word_vectors_container,
    lemmatize=False,
)

# Inputs: "stepped on my foot", "everything is super".
model.predict(['наступили на ногу', 'всё суперски'])  # array(['negative', 'positive'], dtype='<U8')
```
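Since `model.predict` returns one label per input text, a common next step is to summarize the labels over a batch of comments. Here is a minimal sketch of that step, assuming the labels are strings such as `'negative'` and `'positive'` as in the output shown above. The helper `label_shares` is hypothetical and not part of the library, and the `predicted` list is a hand-written stand-in for real model output so that the snippet runs without the pretrained files.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of items carrying each label."""
    # str() turns NumPy string scalars into plain Python strings.
    counts = Counter(str(label) for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Stand-in for the array returned by model.predict(...) (assumed data).
predicted = ['negative', 'positive', 'positive', 'positive']
shares = label_shares(predicted)
print(shares)
```

With a real model, you would pass the result of `model.predict(texts)` in place of `predicted`.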