sentimentPL
PyTorch models for Polish language sentiment regression based on allegro/herbert and CLARIN-PL dataset
Installation
sentimentPL is available on PyPI, so you can install it with:
$ pip3 install sentimentpl
Basic Usage
For a given sentence, the model produces an output value in the range (-1, 1), from most negative to most positive.
from sentimentpl.models import SentimentPLModel
model = SentimentPLModel(from_pretrained='latest')
print(model('Jestem wesoły Romek').item())
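If a discrete label is more convenient than a raw score, the regression output can be bucketed. The sketch below is illustrative only: the helper `score_to_label` and the 0.25 neutral band are assumptions, not part of sentimentPL, and the threshold would need tuning on your own data.

```python
def score_to_label(score: float, neutral_band: float = 0.25) -> str:
    """Map a sentiment score in (-1, 1) to a coarse label.

    Hypothetical helper, not part of sentimentPL; neutral_band is an
    assumed threshold, not a value recommended by the package.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [-1, 1], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Plain floats stand in for model(...).item(), so this runs without the model.
for value in (0.8, -0.6, 0.1):
    print(value, score_to_label(value))
```

With the model loaded as above, score_to_label(model('Jestem wesoły Romek').item()) would give the label for that sentence.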
Note: The model uses the transformers API to load the pretrained embedding model from its repository. The model is downloaded on first use and cached on your machine.
Note: The pretrained state dicts for the final regression layers are loaded from a file shipped with the package (its size does not exceed 1 MB). This will change in the future, so that they are loaded from an external repository.
Training
For training, you will probably want to download the source code by cloning the repository:
$ git clone https://github.com/philvec/sentimentPL.git
Download the training data from
https://clarin-pl.eu/dspace/bitstream/handle/11321/710/dataset_conll.zip
and unzip it to sentimentpl/data.
Then, in the main repository directory, run:
$ python3 ./sentimentpl/train.py
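The download-and-unzip step above can also be scripted. This is a minimal sketch using only the standard library; the function names and the local archive name are assumptions chosen for illustration, and the layout of files inside the archive is not checked.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL and target directory are taken from the instructions above.
DATASET_URL = "https://clarin-pl.eu/dspace/bitstream/handle/11321/710/dataset_conll.zip"


def download_dataset(url: str = DATASET_URL,
                     archive: Path = Path("dataset_conll.zip")) -> Path:
    """Download the archive unless a local copy already exists."""
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    return archive


def extract_dataset(archive: Path,
                    target_dir: Path = Path("sentimentpl/data")) -> List[str]:
    """Unzip the archive into target_dir and return the member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

Called from the main repository directory, extract_dataset(download_dataset()) would fetch the archive once and unpack it into sentimentpl/data before training is started.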
Version history
v.0.0.6 latest
Model trained further, to an MSE of ~0.307; added a HerBERT fine-tuning option.
v.0.0.5
Basic 3-layer MLP with ReLU and input Dropout.
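To put the MSE figure quoted for v.0.0.6 in context, the sketch below computes mean squared error on a few made-up scores and converts the reported value to a root-mean-square error. It assumes the error is measured on the same (-1, 1) scale as the model output, which the notes above do not state; the sample numbers are invented for illustration.

```python
import math


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# Made-up scores, only to show the computation.
predicted = [0.5, -0.25, 0.0, 1.0]
expected = [1.0, -0.75, 0.5, 0.5]
mse = mean_squared_error(predicted, expected)
print(mse, math.sqrt(mse))  # each error is 0.5, so MSE = 0.25 and RMSE = 0.5

# RMSE implied by the reported MSE of ~0.307 (assumed to be on the output scale).
print(round(math.sqrt(0.307), 3))  # 0.554
```

Under that assumption, an MSE of about 0.307 corresponds to a typical deviation of roughly 0.55 on a scale that spans 2 units.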
References:
- Kocoń, Jan; Zaśko-Zielińska, Monika and Miłkowski, Piotr, 2019, PolEmo 2.0 Sentiment Analysis Dataset for CoNLL, CLARIN-PL digital repository, http://hdl.handle.net/11321/710.
- T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, (Online), pp. 38–45, Association for Computational Linguistics, Oct. 2020.
- P. Rybak, R. Mroczkowski, J. Tracz, and I. Gawlik, “KLEJ: Comprehensive benchmark for Polish language understanding,” 2020.