
multilingual emoji prediction


Bertmoticon

The Bertmoticon package provides a BERT model fine-tuned for the emoji prediction task. It can predict emojis for text in 102 languages. The package exposes two functions for using the model: bertmoticon.infer and bertmoticon.infer_mappings. The model predicts 80 emojis, which are listed in bertmoticon.emojis.

Paper

link to paper

TwitterEmoticon Dataset

You can download the TwitterEmoticon Dataset here: link. Abiding by the Twitter API's guidelines, we are only allowed to share the IDs of the tweets. The link contains the train, validation, and test splits that we used for our paper. A duplicated ID indicates that the tweet contained more than one emoji.
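Since repeated IDs mark tweets with several emojis, it can be useful to count them before fetching the tweets. The following is a minimal sketch that assumes a split is stored as plain text with one tweet ID per line; that layout, the helper name count_ids, and the sample IDs are assumptions made for illustration, not details of the dataset.

```python
from collections import Counter

def count_ids(lines):
    """Map each tweet ID to the number of times it appears.

    Assumes one tweet ID per line (hypothetical file layout);
    blank lines are skipped.
    """
    ids = (line.strip() for line in lines)
    return Counter(tweet_id for tweet_id in ids if tweet_id)

# Toy IDs for illustration only; they are not taken from the dataset.
sample = ["1001", "1002", "1001", "", "1003"]
counts = count_ids(sample)

# IDs that occur more than once, i.e. tweets with more than one emoji.
multi_emoji = sorted(tweet_id for tweet_id, n in counts.items() if n > 1)
print(multi_emoji)
```

With a real split, the same function could be called on an open file object instead of the sample list.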

TwitterCovid Dataset

You can download the TwitterCovid dataset mentioned in the paper here (5 parts): a, b, c, d, e. Abiding by the Twitter API's guidelines, we are only allowed to provide the IDs of the tweets.

Installation

Install the Bertmoticon package from PyPI using:

pip3 install bertmoticon

Importing in Python

Importing the package can be done as:

import bertmoticon

If the model has not been downloaded yet, the first run downloads and extracts it automatically:

Downloading bermoticon model
[=                                                          ]
...
[==================                                         ]
...
[===========================================================]
Extracting the model

The model is not included with the PyPI installation and requires 1.34 GB. It is loaded onto CUDA if CUDA is available, and onto the CPU otherwise.

Usage

bertmoticon.emojis

The model can predict up to 80 emojis. They are available through the global variable bertmoticon.emojis.

>>> print(bertmoticon.emojis)
['😂', '😭', '๐Ÿ˜', '😊', '๐Ÿ™', '😅', '๐Ÿ˜', '🙄', '😘', '😔', '😩', '😉', '😎', '😢', '😆', '😋', '😌', '😳', '๐Ÿ˜', '🙂', '😃', '🙃', '😒', '😜', '😀', '😱', '🙈', '😄', '😡', '😬', '🙌', '😴', '😫', '😪', '😤', '😇', '😈', '😞', '😷', '😣', '😥', '๐Ÿ˜', '😑', '😓', '😕', '😹', '๐Ÿ˜', '😻', '😖', '😛', '😠', '🙊', '😰', '😚', '😲', '😶', '😮', '๐Ÿ™', '😵', '😗', '😟', '😨', '🙇', '🙋', '😙', '😯', '🙆', '🙉', '😧', '😿', '😸', '🙀', '😦', '😽', '😺', '😼', '🙅', '😾', '๐Ÿ™', '🙎']

bertmoticon.infer

Takes a list of strings and an int number of guesses. It returns a list of dictionaries, one per input string. Each dictionary maps the top predicted emojis to their scores, given as fractions between 0 and 1 formatted as strings.

>>> ls_of_strings = ["Vote #TRUMP2020ToSaveAmerica from corrupt Joe Biden and the radical left.", "Je veux aller dormir. #fatigué"]
>>> print(bertmoticon.infer(ls_of_strings, 3))
[{'😂': '0.1938', '😡': '0.1866', '🙄': '0.0847'}, {'😴': '0.1547', '😭': '0.1507', '😩': '0.0892'}]
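Because the scores come back as strings, they usually need converting before any comparison or arithmetic. The sketch below picks the highest-scoring emoji for each input. It hard-codes the output of the example above so that it runs without the model; top_emoji is a hypothetical helper, not part of the package.

```python
# Output copied from the example above; in practice it would come from
# bertmoticon.infer(ls_of_strings, 3).
predictions = [
    {"😂": "0.1938", "😡": "0.1866", "🙄": "0.0847"},
    {"😴": "0.1547", "😭": "0.1507", "😩": "0.0892"},
]

def top_emoji(prediction):
    """Return the (emoji, score) pair with the highest score, score as float."""
    scores = {emoji: float(score) for emoji, score in prediction.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

best_per_text = [top_emoji(p) for p in predictions]
print(best_per_text)
```

Taking the maximum explicitly avoids relying on the order of the keys in each dictionary.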

bertmoticon.infer_mappings

Takes a list of strings, a dict of emoji mappings, and an int number of guesses. For each key of the dictionary, it returns how many of the predicted emojis, counted over all input strings, belong to that key's list. We define the dictionary and the list as follows:

>>> mappings = {"Anger": ['😡'], "Other": ['😂', '😭']}
>>> ls_of_strings = ["Vote #TRUMP2020ToSaveAmerica from corrupt Joe Biden and the radical left.", "Je veux aller dormir. #fatigué"]

The keys are the category names and the values are lists of the emojis contained in each category. Passing both to bertmoticon.infer_mappings returns:

>>> print(bertmoticon.infer_mappings(ls_of_strings, mappings, 3))
{'Anger': 1, 'Other': 2}
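The counts above can be reproduced from the bertmoticon.infer output shown earlier. The following is a rough sketch of that counting logic, not the package's actual implementation: count_categories is a hypothetical helper, and the predictions are hard-coded from the infer example so that the snippet runs without the model.

```python
def count_categories(predictions, mappings):
    """Count how many predicted emojis fall into each category."""
    # Invert the mapping so each emoji points at its category name.
    emoji_to_category = {
        emoji: category
        for category, emojis in mappings.items()
        for emoji in emojis
    }
    counts = {category: 0 for category in mappings}
    for prediction in predictions:
        for emoji in prediction:
            category = emoji_to_category.get(emoji)
            if category is not None:  # emojis outside every category are ignored
                counts[category] += 1
    return counts

# Top-3 guesses for the two example strings, copied from the infer example.
predictions = [
    {"😂": "0.1938", "😡": "0.1866", "🙄": "0.0847"},
    {"😴": "0.1547", "😭": "0.1507", "😩": "0.0892"},
]
mappings = {"Anger": ["😡"], "Other": ["😂", "😭"]}

result = count_categories(predictions, mappings)
print(result)  # {'Anger': 1, 'Other': 2}
```

Here 😡 contributes to Anger, 😂 and 😭 contribute to Other, and the remaining guesses belong to no category. The sketch assumes each emoji appears in at most one category.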

