Easy to read and edit text annotations for NLP tasks.

emoji-annotations

Using colorful emojis is an easy and effective way to annotate text. Emoji annotations are easy to read and edit without requiring specialized software. Simply use your preferred text editor to curate your data.

Why emojis?

They are easy to spot, distinguish, and edit — and are fun to use! Data formats used to store annotations for sequence annotation tasks are often difficult for humans to read and are usually viewed and edited with specialized software. Using emojis makes annotations easily recognizable and editable in any text editor. Ideally, use emojis that resemble the entity type (e.g., 📆,⏰️,📍,🏛️,🎨, etc.) or that are of different colors (e.g., 🍎,🥝,🍊,🍌,🍉,🍇, etc.).

Emoji annotations are:

  • Easy to set up: No need for special software, just use your favorite text editor.
  • Easy to read: Colors pop out and are easy to distinguish.
  • Easy to edit: Edits are quick because each annotation boundary is a single emoji to move. In addition, you can use search and replace and other features of your favorite text editor to edit many annotations efficiently.

Limitations

  • Works only for text genres in which the emojis selected as annotation boundaries are unlikely to appear in the text.
  • Because the same emoji is used as start and end markers, nested or overlapping annotations of the same entity type are not supported.
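Before settling on a mapping, it can help to check that none of the chosen emojis already occur in the raw, unannotated text. The following is a minimal plain-Python sketch of such a check; `find_emoji_conflicts` is a hypothetical helper written for illustration and is not part of the package.

```python
def find_emoji_conflicts(texts, emoji_mapping):
    """Return {entity_type: [indices of texts that already contain its emoji]}.

    Hypothetical helper for illustration, not part of emoji-annotations.
    """
    conflicts = {}
    for entity_type, emoji in emoji_mapping.items():
        # A plain substring test is enough to flag a potential clash.
        hits = [i for i, text in enumerate(texts) if emoji in text]
        if hits:
            conflicts[entity_type] = hits
    return conflicts


emoji_mapping = {"location": "📍", "year": "📆"}
raw_texts = [
    "The Louvre is in Paris.",
    "Meet me here 📍 at noon.",  # already contains the location marker
]
conflicts = find_emoji_conflicts(raw_texts, emoji_mapping)
print(conflicts)  # {'location': [1]}
```

An empty result suggests the mapping is safe for that corpus. Note that a substring test also flags an emoji that is part of a longer emoji sequence, which may or may not matter for a given mapping.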

Installation

Create and activate a virtual environment. Then, install the package via pip:

pip install emoji-annotations

Supported tasks

Emoji annotations are best suited for tasks that are typically approached with sequence labeling, such as named entity recognition (NER). You can also use them for relation extraction, template filling, or event extraction, provided that you use relation-specific tagging and only annotate one n-ary relation, template, or event per record.

Usage

Create a new emoji annotation object using whichever emoji mapping you prefer. A mapping is a dictionary that associates entity types with emojis.

from emoji_annotations import EmojiAnnotator

# Map each entity type to the emoji that marks its start and end.
emoji_mapping = {
    "artwork": "🎨",
    "painter": "👨‍🎨",
    "museum": "🏛️",
    "location": "📍",
    "year": "📆",
}
emoji_nlp = EmojiAnnotator(emoji_mapping)

Convert annotated text to plain text and annotations as character offsets.

text = "The 🎨Mona Lisa🎨 is believed to have been painted by 👨‍🎨Leonardo da Vinci👨‍🎨 between 📆1503📆 and 📆1506📆 and is now displayed in the 📍🏛️Louvre🏛️, Paris📍."
plain_text, annotations = emoji_nlp.from_inline_annotations(text)
print(plain_text)
print(annotations)
The Mona Lisa is believed to have been painted by Leonardo da Vinci between 1503 and 1506 and is now displayed in the Louvre, Paris.
{'artwork': [(4, 13)], 'painter': [(50, 67)], 'year': [(76, 80), (85, 89)], 'location': [(118, 131)], 'museum': [(118, 124)]}
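Judging from the printed values, the offsets behave like Python slice bounds: the start is inclusive and the end is exclusive. A minimal sketch that recovers the annotated strings, using only the plain text and the dictionary shown in the output above (no package call involved):

```python
plain_text = (
    "The Mona Lisa is believed to have been painted by Leonardo da Vinci "
    "between 1503 and 1506 and is now displayed in the Louvre, Paris."
)
annotations = {
    "artwork": [(4, 13)],
    "painter": [(50, 67)],
    "year": [(76, 80), (85, 89)],
    "location": [(118, 131)],
    "museum": [(118, 124)],
}

# Slice the plain text with each (start, end) pair to get the surface strings.
spans = {
    entity_type: [plain_text[start:end] for start, end in offsets]
    for entity_type, offsets in annotations.items()
}
print(spans)
# {'artwork': ['Mona Lisa'], 'painter': ['Leonardo da Vinci'],
#  'year': ['1503', '1506'], 'location': ['Louvre, Paris'], 'museum': ['Louvre']}
```

The nested museum annotation shares its start offset with the enclosing location annotation and ends earlier.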

Convert plain text and annotations back to annotated text.

annotated_text = emoji_nlp.to_inline_annotations(plain_text, annotations)
print(annotated_text)
The 🎨Mona Lisa🎨 is believed to have been painted by 👨‍🎨Leonardo da Vinci👨‍🎨 between 📆1503📆 and 📆1506📆 and is now displayed in the 📍🏛️Louvre🏛️, Paris📍.

If two emojis share the same character offset, the emoji that closes the active annotation is placed first. The original order is therefore not preserved if it was different in the annotated text (e.g., "🏛️📍Louvre🏛️, Paris📍" would become "📍🏛️Louvre🏛️, Paris📍").

To curate annotations on the command line, call emoji_nlp.get_user_feedback() from a Python script. The function prompts the user to confirm or edit the annotations in the text.

Computer says 🗯️

🌶️Andalusia🌶️ has a 🍊surface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.

Correct? y/n
(To edit the n-th annotation, enter its number n, e.g. '3', press enter, use the arrow keys to move it, press enter to see the changes, and press enter again to confirm the changes. To delete all annotations press 'd'.)
User input: 3 
🌶️Andalusia🌶️ has a 🔻surface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.
User input: → →
🌶️Andalusia🌶️ has a su🔻rface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.

Correct? y/n

Comparison of NER annotation formats

The same sentence is shown below in four NER annotation formats. The colorful emoji annotations are much easier to read and edit than the other formats. This small example already makes the difference obvious, and it becomes even more pronounced with larger datasets.

Colorful emoji annotations

🏢U.N.🏢 official 🙋Ekeus🙋 heads for 📍Baghdad📍.

CoNLL 2003 NER format

(https://www.cnts.ua.ac.be/conll2003/ner/)

U.N.         NNP  I-NP  I-ORG 
official     NN   I-NP  O 
Ekeus        NNP  I-NP  I-PER 
heads        VBZ  I-VP  O 
for          IN   I-PP  O 
Baghdad      NNP  I-NP  I-LOC 
.            .    O     O 

brat standoff format

(https://brat.nlplab.org/standoff.html)

T1  Organization 0 4 U.N.
T2  Person 14 19 Ekeus
T3  Location 30 37 Baghdad
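Because character offsets carry the same information as brat's text-bound annotations, a dictionary of offsets can be exported to that format with a few lines of Python. The sketch below assumes the dictionary shape printed in the Usage section and end-exclusive offsets; `to_brat_standoff` is a hypothetical helper, not part of the package, and it does not handle brat's discontinuous spans.

```python
def to_brat_standoff(plain_text, annotations):
    """Render {entity_type: [(start, end), ...]} as brat text-bound lines.

    Hypothetical helper for illustration, not part of emoji-annotations.
    """
    # Sort by position so that IDs follow the reading order of the text.
    spans = sorted(
        (start, end, entity_type)
        for entity_type, offsets in annotations.items()
        for start, end in offsets
    )
    # brat separates ID, "type start end", and covered text with tabs.
    return [
        f"T{i}\t{entity_type} {start} {end}\t{plain_text[start:end]}"
        for i, (start, end, entity_type) in enumerate(spans, start=1)
    ]


plain_text = "U.N. official Ekeus heads for Baghdad."
annotations = {
    "Organization": [(0, 4)],
    "Person": [(14, 19)],
    "Location": [(30, 37)],
}
lines = to_brat_standoff(plain_text, annotations)
print("\n".join(lines))
```

Slicing the text with the same offsets that are written to the file keeps the covered text and the offsets consistent by construction.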

XML-based formats

<p><EM ID="1" CATEG="ORGANIZATION">U.N.</EM> official <EM ID="2" CATEG="PERSON">Ekeus</EM> heads for <EM ID="3" CATEG="LOCATION">Baghdad</EM>.</p>

Development

To update the list of supported emojis, run

python src/emoji_annotations/scripts/update_emoji_list.py

About Us

We are the Institute of Climate and Energy Systems (ICE) - Jülich Systems Analysis, part of Forschungszentrum Jülich. Our interdisciplinary department's research focuses on energy-related process and systems analyses. Data searches and system simulations are used to determine energy and mass balances, as well as to evaluate the performance, emissions, and costs of energy systems. The results are used for comparative assessment studies between the various systems. Our current priorities include the development of energy strategies, in accordance with the German Federal Government’s greenhouse gas reduction targets, by designing new infrastructures for sustainable and secure energy supply chains and by conducting cost analysis studies for integrating new technologies into future energy market frameworks.

Acknowledgements

The authors would like to thank the German Federal Government, the German state governments, and the Joint Science Conference (GWK) for their funding and support as part of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) – project number: 442146713. Furthermore, this work was supported by the Helmholtz Association under the program "Energy System Design".
