emoji-annotations
Easy to read and edit text annotations for NLP tasks.
Using colorful emojis is an easy and effective way to annotate text. Emoji annotations are easy to read and edit without requiring specialized software. Simply use your preferred text editor to curate your data.
Why emojis?
They are easy to spot, distinguish, and edit — and are fun to use! Data formats used to store annotations for sequence annotation tasks are often difficult for humans to read and are usually viewed and edited with specialized software. Using emojis makes annotations easily recognizable and editable in any text editor. Ideally, use emojis that resemble the entity type (e.g., 📆,⏰️,📍,🏛️,🎨, etc.) or that are of different colors (e.g., 🍎,🥝,🍊,🍌,🍉,🍇, etc.).
Emoji annotations are:
- Easy to set up: No need for special software, just use your favorite text editor.
- Easy to read: Colors pop out and are easy to distinguish.
- Easy to edit: Edits are quick and easy because emojis are just one character to move per annotation boundary. In addition, you can use the search and replace function and other features of your favorite text editor to efficiently edit many annotations.
Limitations
- Works only for text genres in which the emojis selected as annotation boundaries are unlikely to appear in the text.
- Because the same emoji is used as start and end markers, nested or overlapping annotations of the same entity type are not supported.
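The first limitation can be checked before annotating. The following is a minimal sketch, not part of the package: the helper name `find_marker_collisions` is hypothetical, and it assumes the raw texts are available as a list of strings.

```python
def find_marker_collisions(texts, emoji_mapping):
    """Return {entity_type: [indices of texts that already contain its emoji]}.

    Hypothetical helper for illustration; not provided by emoji-annotations.
    """
    collisions = {}
    for label, emoji in emoji_mapping.items():
        # A marker is unsafe if it already occurs in any unannotated text.
        hits = [i for i, text in enumerate(texts) if emoji in text]
        if hits:
            collisions[label] = hits
    return collisions


texts = [
    "Meeting at 10 am.",
    "See you at the pin 📍 on the map.",
    "No markers here.",
]
emoji_mapping = {"location": "📍", "year": "📆"}
print(find_marker_collisions(texts, emoji_mapping))  # {'location': [1]}
```

If the result is non-empty, choosing a different emoji for the affected entity type avoids ambiguous boundaries.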
Installation
Create and activate a virtual environment. Then, install the package via pip:
pip install emoji-annotations
Supported tasks
Emoji annotations are best suited for tasks that are typically approached with sequence labeling, such as named entity recognition (NER). You can also use them for relation extraction, template filling, or event extraction, provided that you use relation-specific tagging and only annotate one n-ary relation, template, or event per record.
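For sequence labeling, character offsets usually have to be mapped to token-level tags. The following is a rough sketch under stated assumptions: the function `to_bio` is hypothetical (not part of the package), tokens are split on whitespace, and each entity type gets its own tag layer so that spans of different types may overlap.

```python
import re


def to_bio(plain_text, spans, label):
    """Return (token, tag) pairs for one entity type.

    Hypothetical helper: whitespace tokenization, BIO tagging scheme.
    `spans` is a list of (start, end) character offsets, end exclusive.
    """
    tagged = []
    for match in re.finditer(r"\S+", plain_text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end in spans:
            if tok_start < end and tok_end > start:  # token overlaps the span
                # The first token of a span gets B-, later tokens get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


plain_text = "The Mona Lisa hangs in the Louvre, Paris."
print(to_bio(plain_text, [(4, 13)], "artwork")[:4])
# [('The', 'O'), ('Mona', 'B-artwork'), ('Lisa', 'I-artwork'), ('hangs', 'O')]
```

A real pipeline would likely use the tokenizer of the target model instead of whitespace splitting.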
Usage
Create a new emoji annotation object using whichever emoji mapping you prefer. A mapping is a dictionary that associates entity types with emojis.
from emoji_annotations import EmojiAnnotator
emoji_mapping = {
"artwork": "🎨",
"painter": "👨🎨",
"museum": "🏛️",
"location": "📍",
"year": "📆",
}
emoji_nlp = EmojiAnnotator(emoji_mapping)
Convert annotated text to plain text and annotations as character offsets.
text = "The 🎨Mona Lisa🎨 is believed to have been painted by 👨🎨Leonardo da Vinci👨🎨 between 📆1503📆 and 📆1506📆 and is now displayed in the 📍🏛️Louvre🏛️, Paris📍."
plain_text, annotations = emoji_nlp.from_inline_annotations(text)
print(plain_text)
print(annotations)
The Mona Lisa is believed to have been painted by Leonardo da Vinci between 1503 and 1506 and is now displayed in the Louvre, Paris.
{'artwork': [(4, 13)], 'painter': [(50, 67)], 'year': [(76, 80), (85, 89)], 'location': [(118, 131)], 'museum': [(118, 124)]}
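To illustrate the idea behind this conversion, here is a simplified sketch of how markers could be stripped while recording offsets. It is not the package's implementation: the function `strip_markers` is hypothetical and assumes that every marker occurs in matching start/end pairs.

```python
def strip_markers(text, emoji_mapping):
    """Return (plain_text, {entity_type: [(start, end), ...]}).

    Hypothetical sketch; offsets refer to the plain text, end exclusive.
    """
    # Try longer markers first so multi-code-point emojis are matched whole.
    markers = sorted(
        ((emoji, label) for label, emoji in emoji_mapping.items()),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    plain_chars = []   # characters kept in the plain text
    open_starts = {}   # entity type -> start offset of the open span
    spans = {}         # entity type -> list of closed spans
    i = 0
    while i < len(text):
        for emoji, label in markers:
            if text.startswith(emoji, i):
                offset = len(plain_chars)
                if label in open_starts:   # second occurrence closes the span
                    spans.setdefault(label, []).append((open_starts.pop(label), offset))
                else:                      # first occurrence opens a span
                    open_starts[label] = offset
                i += len(emoji)
                break
        else:
            plain_chars.append(text[i])
            i += 1
    if open_starts:
        raise ValueError(f"Unclosed annotations: {sorted(open_starts)}")
    return "".join(plain_chars), spans


plain, spans = strip_markers("Built in 📆1889📆 in 📍Paris📍.", {"year": "📆", "location": "📍"})
print(plain)  # Built in 1889 in Paris.
print(spans)  # {'year': [(9, 13)], 'location': [(17, 22)]}
```

Because the same emoji opens and closes a span, the sketch cannot represent nested spans of the same entity type, which matches the limitation described above.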
Convert plain text and annotations back to annotated text.
annotated_text = emoji_nlp.to_inline_annotations(plain_text, annotations)
print(annotated_text)
The 🎨Mona Lisa🎨 is believed to have been painted by 👨🎨Leonardo da Vinci👨🎨 between 📆1503📆 and 📆1506📆 and is now displayed in the 📍🏛️Louvre🏛️, Paris📍.
If two emojis share the same character offset, the emoji that closes the active annotation is placed first. The original order of such emojis is not necessarily preserved (e.g., "🏛️📍Louvre🏛️, Paris📍" would become "📍🏛️Louvre🏛️, Paris📍").
To curate annotations on the command line, call emoji_nlp.get_user_feedback() from a Python script. This function prompts the user to confirm or edit the annotations in the text.
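The placement of markers that share an offset can be sketched as a sorting problem. The code below is an illustration, not the package's implementation: the function `render_inline` is hypothetical, and the tie-breaking rules (closing markers first, then the longer span opens first) are assumptions chosen to reproduce the example above.

```python
def render_inline(plain_text, annotations, emoji_mapping):
    """Insert emoji markers into plain_text at the given character offsets.

    Hypothetical sketch; `annotations` maps entity types to (start, end) lists.
    """
    events = []
    for label, spans in annotations.items():
        emoji = emoji_mapping[label]
        for start, end in spans:
            # Sort key: offset, closers (0) before openers (1), then nesting order.
            events.append((start, 1, -end, emoji))   # longer span opens first
            events.append((end, 0, -start, emoji))   # later-started span closes first
    events.sort()

    pieces, pos = [], 0
    for offset, _kind, _tie, emoji in events:
        pieces.append(plain_text[pos:offset])
        pieces.append(emoji)
        pos = offset
    pieces.append(plain_text[pos:])
    return "".join(pieces)


mapping = {"location": "📍", "museum": "🏛️"}
plain_text = "Now in the Louvre, Paris."
annotations = {"museum": [(11, 17)], "location": [(11, 24)]}
print(render_inline(plain_text, annotations, mapping))
```

Here both spans start at offset 11, so the location marker (the longer span) is written before the museum marker, regardless of the order in the dictionary.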
Computer says 🗯️
🌶️Andalusia🌶️ has a 🍊surface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.
Correct? y/n
(To edit the n-th annotation, enter its number n, e.g. '3', press enter, use the arrow keys to move it, press enter to see the changes, and press enter again to confirm the changes. To delete all annotations press 'd'.)
User input: 3
🌶️Andalusia🌶️ has a 🔻surface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.
User input: → →
🌶️Andalusia🌶️ has a su🔻rface area🍊 of 🍏87,597🍏 🍓square kilometres🍓.
Correct? y/n
Comparison of NER annotation formats
Compared with other NER annotation formats, colorful emoji annotations are much easier to read and edit. The small example below already shows the difference, which becomes even more pronounced with larger datasets.
Colorful emoji annotations
🏢U.N.🏢 official 🙋Ekeus🙋 heads for 📍Baghdad📍.
CoNLL 2003 NER format
(https://www.cnts.ua.ac.be/conll2003/ner/)
U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
brat standoff format
(https://brat.nlplab.org/standoff.html)
T1 Organization 0 4 U.N.
T2 Person 14 19 Ekeus
T3 Location 30 37 Baghdad
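Character offsets map directly onto standoff lines of this kind. The following sketch assumes the brat convention of tab-separated fields with an exclusive end offset; the function `to_brat_lines` is hypothetical and not part of the package.

```python
def to_brat_lines(plain_text, annotations):
    """Serialize {entity_type: [(start, end), ...]} as brat text-bound lines.

    Hypothetical helper; IDs are assigned in order of appearance in the text.
    """
    rows = sorted(
        (start, end, label)
        for label, spans in annotations.items()
        for start, end in spans
    )
    return [
        f"T{i}\t{label} {start} {end}\t{plain_text[start:end]}"
        for i, (start, end, label) in enumerate(rows, start=1)
    ]


plain_text = "U.N. official Ekeus heads for Baghdad."
annotations = {
    "Organization": [(0, 4)],
    "Person": [(14, 19)],
    "Location": [(30, 37)],
}
for line in to_brat_lines(plain_text, annotations):
    print(line)
```

The offsets in this example are computed against the sentence "U.N. official Ekeus heads for Baghdad." with the end offset pointing one character past each span.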
XML-based formats
<p><EM ID="1" CATEG="ORGANIZATION">U.N.</EM> official <EM ID="2" CATEG="PERSON">Ekeus</EM> heads for <EM ID="3" CATEG="LOCATION">Baghdad</EM>.</p>
Development
To update the list of supported emojis, run
python src/emoji_annotations/scripts/update_emoji_list.py
About Us
We are the Institute of Climate and Energy Systems (ICE) - Jülich Systems Analysis, part of Forschungszentrum Jülich. Our interdisciplinary department's research focuses on energy-related process and systems analyses. Data searches and system simulations are used to determine energy and mass balances, as well as to evaluate performance, emissions and costs of energy systems. The results are used for comparative assessments of the various systems. Our current priorities include the development of energy strategies, in accordance with the German Federal Government’s greenhouse gas reduction targets, by designing new infrastructures for sustainable and secure energy supply chains and by conducting cost analysis studies for integrating new technologies into future energy market frameworks.
Acknowledgements
The authors would like to thank the German Federal Government, the German state governments, and the Joint Science Conference (GWK) for their funding and support as part of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) – project number: 442146713. Furthermore, this work was supported by the Helmholtz Association under the program "Energy System Design".