emoticon_fix
A lightweight and efficient library for transforming emoticons into their semantic meanings. This is particularly useful for NLP preprocessing where emoticons need to be preserved as meaningful text.
Table of Contents
- What are emoticons?
- What are kaomoji?
- Why transform emoticons to text?
- Installation
- Usage
- Examples
- Contributing
- Testing
- License
What are emoticons?
An emoticon (short for "emotion icon") is a pictorial representation of a facial expression using characters (usually punctuation marks, numbers, and letters) to express a person's feelings or mood. The first ASCII emoticons, :-) and :-(, were written by Scott Fahlman in 1982, but emoticons actually originated on the PLATO IV computer system in 1972.
What are kaomoji?
Kaomoji (顔文字) are Japanese emoticons that are read horizontally and are more elaborate than traditional Western emoticons. They often use Unicode characters to create more complex expressions and can represent a wider range of emotions and actions. For example, (。♥‿♥。) represents being in love, and (ノ°益°)ノ shows rage. Unlike Western emoticons that you read by tilting your head sideways, kaomoji are meant to be viewed straight on.
emoticon_fix supports a wide variety of kaomoji, making it particularly useful for processing text from Asian social media or any platform where kaomoji are commonly used.
Why transform emoticons to text?
When preprocessing text for NLP models, simply removing punctuation can leave emoticons and kaomoji as meaningless characters. For example, :D (laugh) would become just D, and (。♥‿♥。) (in love) would be completely lost. This can negatively impact model performance. By transforming emoticons and kaomoji to their textual meanings, we preserve the emotional context in a format that's more meaningful for NLP tasks.
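The effect described above can be reproduced with the standard library alone. The snippet below is a minimal sketch that assumes a common preprocessing step: deleting every character that is neither a word character nor whitespace. The helper name strip_punctuation is made up for this illustration and is not part of emoticon_fix.

```python
import re


def strip_punctuation(text):
    """Naive cleanup: drop everything that is not a word character or whitespace."""
    return re.sub(r"[^\w\s]", "", text)


# The colon is removed, so only a stray letter survives.
print(strip_punctuation("so funny :D"))  # so funny D

# Every character of the kaomoji counts as punctuation or a symbol,
# so nothing of it remains.
print(strip_punctuation("I am (。♥‿♥。) today"))
```

In the second case the kaomoji disappears without a trace and leaves only a doubled space between the surrounding words.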
Installation
pip install emoticon-fix
Usage
from emoticon_fix import emoticon_fix, remove_emoticons, replace_emoticons
# Basic usage - transform emoticons to their meanings
text = 'Hello :) World :D'
result = emoticon_fix(text)
print(result) # Output: 'Hello Smile World Laugh'
# Remove emoticons completely
stripped_text = remove_emoticons(text)
print(stripped_text) # Output: 'Hello World'
# Replace with NER-friendly tags (customizable format)
ner_text = replace_emoticons(text, tag_format="__EMO_{tag}__")
print(ner_text) # Output: 'Hello __EMO_Smile__ World __EMO_Laugh__'
# Works with multiple emoticons
text = 'I am :-) but sometimes :-( and occasionally :-D'
result = emoticon_fix(text)
print(result) # Output: 'I am Smile but sometimes Sad and occasionally Laugh'
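To make the three operations above concrete, here is a minimal stand-in built on a regular expression and a tiny lookup table. It is a sketch of the general technique only, not the library's actual implementation: the EMOTICONS table and the function names to_meaning, to_tag, and strip_emoticons are hypothetical, and the real package covers far more emoticons.

```python
import re

# Hypothetical mini-mapping for illustration only; the real library ships
# its own, much larger table and may label emoticons differently.
EMOTICONS = {
    ":)": "Smile",
    ":-)": "Smile",
    ":D": "Laugh",
    ":-D": "Laugh",
    ":(": "Sad",
    ":-(": "Sad",
}

# Put longer emoticons first in the alternation so a longer one is never
# shadowed by a shorter one that happens to be its prefix.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def to_meaning(text):
    """Replace each known emoticon with its label."""
    return _PATTERN.sub(lambda m: EMOTICONS[m.group(0)], text)


def to_tag(text, tag_format="__EMO_{tag}__"):
    """Replace each known emoticon with a tag built from tag_format."""
    return _PATTERN.sub(
        lambda m: tag_format.format(tag=EMOTICONS[m.group(0)]), text
    )


def strip_emoticons(text):
    """Drop known emoticons, then collapse the whitespace they leave behind."""
    return " ".join(_PATTERN.sub(" ", text).split())


print(to_meaning("Hello :) World :D"))       # Hello Smile World Laugh
print(to_tag("Hello :) World :D"))           # Hello __EMO_Smile__ World __EMO_Laugh__
print(strip_emoticons("Hello :) World :D"))  # Hello World
```

The {tag} placeholder is filled in with str.format, which is why any template containing {tag} can serve as a custom tag format in this sketch.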
Examples
Basic Example
from emoticon_fix import emoticon_fix
text = 'test :) test :D test'
result = emoticon_fix(text)
print(result) # Output: 'test Smile test Laugh test'
Complex Example with Kaomoji
from emoticon_fix import emoticon_fix
text = 'Feeling (。♥‿♥。) today! When things go wrong ┗(^0^)┓ keep dancing!'
result = emoticon_fix(text)
print(result) # Output: 'Feeling In Love today! When things go wrong Dancing Joy keep dancing!'
Mixed Emoticons Example
from emoticon_fix import emoticon_fix
text = 'Western :) meets Eastern (◕‿◕✿) style!'
result = emoticon_fix(text)
print(result) # Output: 'Western Smile meets Eastern Sweet Smile style!'
Removing Emoticons Example
from emoticon_fix import remove_emoticons
text = 'This message :D contains some (。♥‿♥。) emoticons that need to be removed!'
result = remove_emoticons(text)
print(result) # Output: 'This message contains some emoticons that need to be removed!'
NER-Friendly Tagging Example
from emoticon_fix import replace_emoticons
# Default format: __EMO_{tag}__
text = 'Happy customers :) are returning customers!'
result = replace_emoticons(text)
print(result) # Output: 'Happy customers __EMO_Smile__ are returning customers!'
# Custom format
text = 'User feedback: Product was great :D but shipping was slow :('
result = replace_emoticons(text, tag_format="<EMOTION type='{tag}'>")
print(result) # Output: 'User feedback: Product was great <EMOTION type='Laugh'> but shipping was slow <EMOTION type='Sad'>'
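Once text carries tags in the default __EMO_{tag}__ format, a downstream step can read them back out. The following is a minimal sketch, assuming the input was already tagged as in the outputs shown above; the helper count_emotion_tags is hypothetical and not part of emoticon_fix, and how the library formats multi-word labels inside a tag is not covered here.

```python
import re
from collections import Counter

# Matches the default tag format and captures the label between the markers.
TAG_PATTERN = re.compile(r"__EMO_(.+?)__")


def count_emotion_tags(tagged_text):
    """Count how often each emotion label occurs in already-tagged text."""
    return Counter(TAG_PATTERN.findall(tagged_text))


tagged = "Hello __EMO_Smile__ World __EMO_Laugh__ again __EMO_Smile__"
counts = count_emotion_tags(tagged)
print(counts["Smile"], counts["Laugh"])  # 2 1
```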
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Testing
The package includes a test suite. To run the tests:
pip install -e ".[dev]"
pytest
License
This project is licensed under the MIT License - see the LICENSE file for details.