Dostoevsky
Sentiment analysis library for the Russian language
Install
Please note that Dostoevsky supports only Python 3.6+.
```shell
$ pip install dostoevsky
```
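Because only Python 3.6+ is supported, an application that depends on Dostoevsky may want to fail fast on an older interpreter. The following is a minimal sketch; `check_python_version` and `MIN_PYTHON` are hypothetical names, not part of the library.

```python
import sys

# Minimum interpreter version stated in this README (hypothetical constant).
MIN_PYTHON = (3, 6)


def check_python_version(version_info=None, minimum=MIN_PYTHON):
    """Raise RuntimeError if the interpreter is older than `minimum`."""
    # Compare only (major, minor); fall back to the running interpreter.
    current = tuple((version_info or sys.version_info)[:2])
    if current < minimum:
        raise RuntimeError(
            "dostoevsky requires Python %d.%d+, found %d.%d" % (minimum + current)
        )
    return True


check_python_version()
```

Tuple comparison handles versions such as 3.10 correctly, which string comparison would not.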
Social network comment model
This model was trained on the RuSentiment dataset and achieves an F1 score of up to ~0.70.
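The README does not say how the ~0.70 F1 score is averaged across sentiment classes. As an illustration of one common convention, the sketch below computes macro-averaged F1: per-class F1 = 2PR / (P + R) from precision P and recall R, then the unweighted mean over classes. Whether the authors used macro, micro, or weighted averaging is an assumption, and `macro_f1` is a hypothetical helper, not part of dostoevsky.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted this class, but it was wrong
            fn[true] += 1  # missed the true class
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.5556
```

Here the per-class scores are 0 (negative), 2/3 (neutral) and 1 (positive), so the macro average is 5/9.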
Usage
First, download the pretrained word embeddings and the model:
```shell
$ dostoevsky download vk-embeddings cnn-social-network-model
```
Then build the pipeline: text -> tokenizer -> word embeddings -> CNN.
```python
from dostoevsky.tokenization import UDBaselineTokenizer, RegexTokenizer
from dostoevsky.embeddings import SocialNetworkEmbeddings
from dostoevsky.models import SocialNetworkModel

# Pick one tokenizer: UDBaselineTokenizer or RegexTokenizer
tokenizer = UDBaselineTokenizer()  # or: tokenizer = RegexTokenizer()

# 'всё очень плохо' means "everything is very bad"
tokens = tokenizer.split('всё очень плохо')  # [('всё', 'ADJ'), ('очень', 'ADV'), ('плохо', 'ADV')]

embeddings_container = SocialNetworkEmbeddings()
vectors = embeddings_container.get_word_vectors(tokens)
print(vectors.shape)  # (3, 300): three words/vectors with dim=300

model = SocialNetworkModel(
    tokenizer=tokenizer,
    embeddings_container=embeddings_container,
    lemmatize=False,
)

messages = [
    'наступили на ногу',  # "someone stepped on my foot"
    'всё суперски',       # "everything is great"
]

results = model.predict(messages)

for message, sentiment in zip(messages, results):
    print(message, '->', sentiment)  # наступили на ногу -> negative
```
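To make the text -> tokenizer -> word embeddings -> CNN pipeline concrete, here is a toy, pure-Python sketch of what each stage does. It is not dostoevsky's implementation: the README does not describe the CNN's architecture, so this assumes a generic text CNN (one convolution over windows of consecutive word vectors, ReLU, max-over-time pooling, a linear output layer and softmax). `regex_tokenize`, `embed`, `conv1d_max`, `predict`, the three-class `LABELS` list and all weights are hypothetical; the weights are random and untrained, so the predicted label is meaningless.

```python
import math
import random
import re

# Hypothetical label set, chosen only for this illustration.
LABELS = ["negative", "neutral", "positive"]


def regex_tokenize(text):
    """Lowercase the text and return its word tokens (a stand-in, not RegexTokenizer)."""
    return re.findall(r"\w+", text.lower())


def embed(tokens, table, dim):
    """Look up one vector per token; out-of-vocabulary tokens get a zero vector."""
    return [table.get(token, [0.0] * dim) for token in tokens]


def conv1d_max(vectors, filters, width, dim):
    """Apply each (weights, bias) filter to every window of `width` consecutive
    word vectors and keep max(0, best activation), i.e. ReLU + max-over-time pooling."""
    # Zero-pad short inputs so that at least one full window exists.
    padded = vectors + [[0.0] * dim for _ in range(max(0, width - len(vectors)))]
    features = []
    for weights, bias in filters:
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            activation = bias + sum(
                w * x
                for w_row, x_row in zip(weights, window)
                for w, x in zip(w_row, x_row)
            )
            best = max(best, activation)
        features.append(best)
    return features


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(text, table, dim, filters, width, out_weights, out_bias):
    """text -> tokens -> word vectors -> conv features -> class probabilities."""
    vectors = embed(regex_tokenize(text), table, dim)
    features = conv1d_max(vectors, filters, width, dim)
    logits = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(out_weights, out_bias)
    ]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


# Demo with small random, untrained weights.
rng = random.Random(0)
DIM, WIDTH, N_FILTERS = 4, 2, 3
table = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in ["всё", "очень", "плохо"]}
filters = [
    ([[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)], 0.0)
    for _ in range(N_FILTERS)
]
out_weights = [[rng.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in LABELS]
out_bias = [0.0] * len(LABELS)

label, probs = predict("всё очень плохо", table, DIM, filters, WIDTH, out_weights, out_bias)
print(label, [round(p, 3) for p in probs])
```

In the real pipeline the embeddings are the 300-dimensional pretrained vectors and the weights come from the downloaded model; max-over-time pooling is what lets a fixed-size output layer handle messages of any length.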
License
Download files
Source Distribution
dostoevsky-0.2.0.tar.gz (8.9 kB)
Built Distributions
dostoevsky-0.2.0-py3-none-any.whl (12.0 kB)
Hashes for dostoevsky-0.2.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f2a29d69d2d337aeffa2fa91901074dcef97d629155356bcecbb2c592af87d2 |
| MD5 | 3a67cc902acf2e0cdac0b4538adfbead |
| BLAKE2b-256 | 4ef704894ebfbfc0a04244cbfcc6e3b102144c87a54667e66a5d127a7ec42ead |
Hashes for dostoevsky-0.2.0-py2.py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | dac7e8cdfdb818441b81a20eaa79b2ef0d5f695ccbdc3d4aef9ff99616598c12 |
| MD5 | 1ca44b637210e022fb553ff855192199 |
| BLAKE2b-256 | 57e3a1ab98e9f22be97c3f770567d18b03cea17ce3fd68aa7bf2e23e956b53e6 |
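A downloaded file can be checked against the SHA256 digests listed here. The following is a minimal sketch using Python's standard `hashlib`; `sha256_of_stream`, `verify_file` and `WHEEL_SHA256` are hypothetical names, and the sketch assumes you have already downloaded the wheel to a local path.

```python
import hashlib

# SHA256 digest listed on this page for dostoevsky-0.2.0-py3-none-any.whl.
WHEEL_SHA256 = "4f2a29d69d2d337aeffa2fa91901074dcef97d629155356bcecbb2c592af87d2"


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected=WHEEL_SHA256):
    """Return True if the file at `path` has the expected SHA256 hex digest."""
    with open(path, "rb") as fh:
        return sha256_of_stream(fh) == expected.strip().lower()
```

For example, `verify_file("dostoevsky-0.2.0-py3-none-any.whl")` should return `True` for an intact download. pip's hash-checking mode (`--require-hashes`) can enforce the same kind of check at install time.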