
A Transformer-based library for SocialNLP tasks


pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks



Currently supports:

Task                                   Languages
-------------------------------------  --------------
Sentiment Analysis                     es, en, it, pt
Hate Speech Detection                  es, en, it, pt
Irony Detection                        es, en, it, pt
Emotion Analysis                       es, en, it, pt
NER & POS tagging                      es, en
Contextualized Hate Speech Detection   es
Targeted Sentiment Analysis            es

Just run pip install pysentimiento and start using it:

Getting Started

from pysentimiento import create_analyzer
analyzer = create_analyzer(task="sentiment", lang="es")

analyzer.predict("Qué gran jugador es Messi")  # "What a great player Messi is"
# returns AnalyzerOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
analyzer.predict("Esto es pésimo")  # "This is terrible"
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
analyzer.predict("Qué es esto?")  # "What is this?"
# returns AnalyzerOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})

analyzer.predict("jejeje no te creo mucho")  # "hehehe, I don't really believe you"
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})

# Emotion Analysis in English

emotion_analyzer = create_analyzer(task="emotion", lang="en")

emotion_analyzer.predict("yayyy")
# returns AnalyzerOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})
emotion_analyzer.predict("fuck off")
# returns AnalyzerOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})

# Hate Speech (misogyny & racism)
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="es")

hate_speech_analyzer.predict("Esto es una mierda pero no es odio")  # "This is crap, but it's not hate"
# returns AnalyzerOutput(output=[], probas={hateful: 0.022, targeted: 0.009, aggressive: 0.018})
hate_speech_analyzer.predict("Esto es odio porque los inmigrantes deben ser aniquilados")  # "This is hate because immigrants must be annihilated"
# returns AnalyzerOutput(output=['hateful'], probas={hateful: 0.835, targeted: 0.008, aggressive: 0.476})

hate_speech_analyzer.predict("Vaya guarra barata y de poca monta es XXXX!")  # a misogynistic insult aimed at XXXX
# returns AnalyzerOutput(output=['hateful', 'targeted', 'aggressive'], probas={hateful: 0.987, targeted: 0.978, aggressive: 0.969})
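The outputs above follow a simple pattern: the single-label tasks (sentiment, emotion) report the label with the highest probability, while the multi-label hate speech task reports every label whose probability clears a threshold. A threshold of 0.5 is consistent with the examples shown (aggressive at 0.476 is left out, hateful at 0.835 is kept), but that value is an assumption, not documented behavior. The sketch below works on plain dicts rather than the library's AnalyzerOutput, and the helpers single_label_output and multi_label_output are hypothetical.

```python
from typing import Dict, List

# Hypothetical helpers, NOT part of pysentimiento: they only illustrate how a
# predicted label (or label set) can be derived from a probability dict.


def single_label_output(probas: Dict[str, float]) -> str:
    """Return the label with the highest probability (argmax)."""
    return max(probas, key=probas.get)


def multi_label_output(probas: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose probability is at or above `threshold` (0.5 is an assumption)."""
    return [label for label, p in probas.items() if p >= threshold]


# Probabilities copied from the examples above
sentiment = {"POS": 0.998, "NEG": 0.002, "NEU": 0.000}
hate = {"hateful": 0.835, "targeted": 0.008, "aggressive": 0.476}

print(single_label_output(sentiment))  # POS
print(multi_label_output(hate))        # ['hateful']
```

The same threshold also reproduces the other two hate speech outputs: an empty list when all three scores are low, and all three labels when every score is above 0.96.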

See TASKS for more details on the supported tasks and languages, as well as the reported performance of each benchmarked model.

Also, check these notebooks with examples of how to use pysentimiento for each language:

Preprocessing

pysentimiento features a tweet preprocessor specially suited for tweet classification with transformer-based models.

from pysentimiento.preprocessing import preprocess_tweet

# Replaces user handles and URLs with special tokens
preprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa") # "@usuario debería cambiar esto url"

# Shortens repeated characters
preprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2) # "no entiendo naadaa"

# Normalizes laughter
preprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj") # "jaja no lo puedo creer jaja"

# Handles hashtags
preprocess_tweet("esto es #UnaGenialidad")
# "esto es una genialidad"

# Handles emojis
preprocess_tweet("🎉🎉", lang="en")
# 'emoji party popper emoji emoji party popper emoji'
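As a rough illustration of what each preprocessing step does, here is a standard-library-only approximation that reproduces the example outputs above. It is a hypothetical sketch, not pysentimiento's implementation: all function names are made up, laughter normalization only covers "ja"-style laughs, emoji names come from Python's unicodedata (English names, single code points only), whitespace in the whole string is normalized, and the default shorten=3 is arbitrary.

```python
import re
import unicodedata

# Hypothetical, simplified approximation of the steps shown above.
# This is NOT pysentimiento's code; it only mimics the example outputs.

URL_RE = re.compile(r"https?://\S+")
HANDLE_RE = re.compile(r"@\w+")  # naive: would also hit e-mail addresses
# A whole word made only of "j"/"a", at least 4 letters, containing both
LAUGH_RE = re.compile(r"\b(?=\w*j)(?=\w*a)[ja]{4,}\b", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#(\w+)")


def replace_handles_and_urls(text: str) -> str:
    # URLs first, since a URL may itself contain "@"
    text = URL_RE.sub("url", text)
    return HANDLE_RE.sub("@usuario", text)


def shorten_repeats(text: str, shorten: int) -> str:
    # Collapse any run of more than `shorten` identical characters to `shorten`
    pattern = re.compile(r"(.)\1{%d,}" % shorten)
    return pattern.sub(lambda m: m.group(1) * shorten, text)


def normalize_laughter(text: str) -> str:
    return LAUGH_RE.sub("jaja", text)


def _split_camel_case(word: str) -> str:
    # Insert a space wherever a lowercase letter is followed by an uppercase one
    chars = []
    for prev, cur in zip(" " + word, word):
        if prev.islower() and cur.isupper():
            chars.append(" ")
        chars.append(cur)
    return "".join(chars).lower()


def split_hashtags(text: str) -> str:
    return HASHTAG_RE.sub(lambda m: _split_camel_case(m.group(1)), text)


def demojize(text: str) -> str:
    out = []
    for ch in text:
        # Treat "Symbol, other" code points (which include most emojis) as emojis
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "unknown").lower()
            out.append(f" emoji {name} emoji ")
        else:
            out.append(ch)
    # Collapse the extra spaces introduced above (normalizes all whitespace)
    return " ".join("".join(out).split())


def toy_preprocess(text: str, shorten: int = 3) -> str:
    text = replace_handles_and_urls(text)
    text = split_hashtags(text)
    text = demojize(text)
    text = normalize_laughter(text)
    return shorten_repeats(text, shorten)


print(toy_preprocess("esto es #UnaGenialidad"))  # esto es una genialidad
print(toy_preprocess("🎉🎉"))  # emoji party popper emoji emoji party popper emoji
```

For real use, call preprocess_tweet itself; this sketch only shows the kind of rewriting each step performs, for example why shorten=2 turns a run of eight "a" characters into two.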

Instructions for developers

  1. Clone and install
git clone https://github.com/pysentimiento/pysentimiento
cd pysentimiento
pip install poetry
poetry shell
poetry install
  2. Run the scripts to train the models

Check TRAIN.md for further information on how to train your models.

Note: you need access to the datasets, which are not public for the time being. Send us an email to get access to them.

  3. Upload models to the Hugging Face Model Hub

Check the "Model sharing and upload" instructions in the Hugging Face docs.

License

pysentimiento is an open-source library. However, please be aware that its models are trained on third-party datasets and are subject to their respective licenses, many of which permit non-commercial use only:

  1. TASS Dataset license (Sentiment Analysis in Spanish; Emotion Analysis in Spanish & English)

  2. SemEval 2017 Dataset license (Sentiment Analysis in English)

  3. LinCE Datasets license (NER & POS tagging)

Suggestions and bugfixes

Please use the repository's issue tracker to report bugs and make suggestions (new models, other datasets, additional languages, etc.).

Citation

If you use pysentimiento in your work, please cite this paper:

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks}, 
      author={Juan Manuel Pérez and Mariela Rajngewerc and Juan Carlos Giudici and Damián A. Furman and Franco Luque and Laura Alonso Alemany and María Vanina Martínez},
      year={2023},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Also, please cite the pre-trained models and datasets behind the specific models you use. Check REFERENCES for details.


Download files


Source Distribution

pysentimiento-0.7.3.tar.gz (30.6 kB)

Built Distribution

pysentimiento-0.7.3-py3-none-any.whl (39.9 kB)

File details

Details for the file pysentimiento-0.7.3.tar.gz.

File metadata

  • Download URL: pysentimiento-0.7.3.tar.gz
  • Upload date:
  • Size: 30.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.10.0 readme-renderer/43.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.2.1 tqdm/4.66.2 importlib-metadata/7.0.1 keyring/24.3.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.8.16

File hashes

Hashes for pysentimiento-0.7.3.tar.gz
Algorithm Hash digest
SHA256 0742cb8c78e500aa9cb3bb414a1d41c98bbac363c78b2a9d215c0bb401259e50
MD5 eafb3411e01ecd8eab9e4462a6125df7
BLAKE2b-256 ff14f5209bc64b34500145cc7d06eeb8e4f65e1baceff3b051ab14688f6b69bb
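To check a downloaded file against the SHA256 digest listed above, you can hash it locally and compare. A minimal sketch using Python's standard hashlib; the helper names sha256_of and verify are illustrative, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pysentimiento-0.7.3.tar.gz
EXPECTED_SHA256 = "0742cb8c78e500aa9cb3bb414a1d41c98bbac363c78b2a9d215c0bb401259e50"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase; normalize the expected digest to match
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pysentimiento-0.7.3.tar.gz")  # assumed location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).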


File details

Details for the file pysentimiento-0.7.3-py3-none-any.whl.

File metadata

  • Download URL: pysentimiento-0.7.3-py3-none-any.whl
  • Upload date:
  • Size: 39.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.10.0 readme-renderer/43.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.2.1 tqdm/4.66.2 importlib-metadata/7.0.1 keyring/24.3.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.8.16

File hashes

Hashes for pysentimiento-0.7.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9b646885733dc755f8bff5d232fb3a6438a07918d451ffe891bebba6de585675
MD5 418aefd495151200a34e7543901216d3
BLAKE2b-256 b4705664402073473663484cac4a49323f3e57a566ba185c2e593fa2760c14da

