
PyTorch models for Polish language sentiment regression based on allegro/herbert and CLARIN-PL dataset


sentimentPL



Installation

sentimentPL is available on PyPI, so you can install it with:

$ pip3 install sentimentpl

Basic Usage

For a given sentence, the model outputs a value in the range (-1, 1), from most negative to most positive.

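from sentimentpl.models import SentimentPLModel

# Load the model with the latest pretrained weights.
model = SentimentPLModel(from_pretrained='latest')
# Score one sentence; .item() converts the output to a plain Python number.
print(model('Jestem wesoły Romek').item())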
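
Because the output is a continuous score, downstream code often needs a discrete label as well. The sketch below shows one possible way to bucket scores. The `to_label` and `score_sentences` helpers and the 0.25 threshold are illustrative assumptions, not part of sentimentPL; the only thing taken from the example above is that calling the model on a string returns an object with `.item()`.

```python
from typing import Callable, Iterable, List, Tuple


def to_label(score: float, threshold: float = 0.25) -> str:
    """Map a score from (-1, 1) to a coarse label (hypothetical helper)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def score_sentences(
    model: Callable, sentences: Iterable[str]
) -> List[Tuple[str, float, str]]:
    """Score each sentence with `model` and attach a coarse label."""
    results = []
    for sentence in sentences:
        # The model is called once per sentence, as in the basic usage example.
        score = float(model(sentence).item())
        results.append((sentence, score, to_label(score)))
    return results


# Usage with the real model (requires sentimentpl to be installed):
#   from sentimentpl.models import SentimentPLModel
#   model = SentimentPLModel(from_pretrained='latest')
#   for sentence, score, label in score_sentences(model, ['Jestem wesoły Romek']):
#       print(f'{score:+.3f}  {label}  {sentence}')
```

The threshold is arbitrary here; a value suited to a given application would need to be chosen on labeled data.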

Note: The model uses the transformers API to load pretrained embedding models from their repository. They will be downloaded and cached on your machine.

Note: The model loads pretrained state dicts for the final regression layers from a file included in the package (its size does not exceed 1 MB). This will change in the future, so that the model is loaded from an external repository.

Training

For training, you will probably want to download the source code by cloning the repository:

$ git clone https://github.com/philvec/sentimentPL.git

Download training data from
https://clarin-pl.eu/dspace/bitstream/handle/11321/710/dataset_conll.zip
and unzip it to sentimentpl/data.
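
The download and unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; the `download` and `extract` helper names and the local archive name `dataset_conll.zip` are assumptions made for illustration. Whether the archive unpacks into its own subfolder is not stated here, so check the resulting layout against what train.py expects.

```python
import urllib.request
import zipfile
from pathlib import Path

DATASET_URL = (
    "https://clarin-pl.eu/dspace/bitstream/handle/11321/710/dataset_conll.zip"
)


def download(url: str, destination: Path) -> Path:
    """Download `url` to `destination`, creating parent directories first."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return destination


def extract(zip_path: Path, target_dir: Path) -> list:
    """Unzip `zip_path` into `target_dir` and return the archive member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Usage, run from the main repository directory:
#   archive = download(DATASET_URL, Path("dataset_conll.zip"))
#   extract(archive, Path("sentimentpl/data"))
```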

In the main repository directory, run:

$ python3 ./sentimentpl/train.py

Version history

v.0.0.6 latest

Model trained further, reaching MSE ~0.307; added HerBERT fine-tuning option.

v.0.0.5

Basic 3-layer MLP with ReLU and input Dropout.
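
For orientation, a regression head of this kind might look like the sketch below. It is not the package's actual code: the class name `RegressionHead`, the 768-dimensional input, the hidden width, the dropout rate, and the final `Tanh` (used here to keep outputs within (-1, 1)) are all assumptions.

```python
import torch
from torch import nn


class RegressionHead(nn.Module):
    """Hypothetical 3-layer MLP with ReLU activations and dropout on the input."""

    def __init__(self, embedding_dim: int = 768, hidden_dim: int = 256,
                 dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),             # dropout applied to the input embedding
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Tanh(),                       # assumed: squashes the score into [-1, 1]
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # (batch, embedding_dim) -> (batch,)
        return self.net(embeddings).squeeze(-1)


head = RegressionHead().eval()               # eval() disables dropout
with torch.no_grad():
    scores = head(torch.randn(4, 768))       # random stand-ins for sentence embeddings
print(scores.shape)
```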

References:

  • Kocoń, Jan; Zaśko-Zielińska, Monika and Miłkowski, Piotr, 2019, PolEmo 2.0 Sentiment Analysis Dataset for CoNLL, CLARIN-PL digital repository, http://hdl.handle.net/11321/710.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, (Online), pp. 38–45, Association for Computational Linguistics, Oct. 2020.
  • P. Rybak, R. Mroczkowski, J. Tracz, and I. Gawlik, “KLEJ: Comprehensive benchmark for Polish language understanding,” 2020.
