Streamlit custom component for interactive NER-style text annotation
Project description
Struggle Annotator 
A Streamlit custom component for interactive text annotation, useful for NER-style labeling tasks. The Python wrapper is published as struggle_annotator; the frontend is built with TypeScript and React per the standard Streamlit Components pattern.
Installation
pip install struggle-annotator
Quick Start
import streamlit as st
from struggle_annotator import txt_annotator
text = (
"Yesterday, at 3 PM, Emily Johnson and Michael Smith met at the Central Park "
"in New York to discuss the merger between TechCorp and Global Solutions.\n\n"
"The deal, worth approximately 500 million dollars, is expected to "
"significantly impact the tech industry. Later, at 6 PM, they joined a "
"conference call with the CEO of TechCorp, David Brown, who was in London "
"for a technology summit. During the call, they discussed the market trends "
"in Asia and Europe and planned for the next quarterly meeting, which is "
"scheduled for January 15th, 2024, in Paris."
)
label_dict = {
"Personal names": {"color": "red"},
"Organizations": {"color": "blue"},
"Locations": {"color": "green"},
"Time": {"color": "orange"},
"Money": {"color": "purple"},
}
label_dict = txt_annotator(text, label_dict)
st.json(label_dict)
UI Layout
The component renders two stacked regions:
- Top — Entity legend. One button per entity, styled with that entity's color. Clicking a button makes it the active entity; the active button is visually emphasized (border + slight scale).
- Bottom — Annotatable document. The full
textis shown with any existing annotations highlighted in their entity's color. Below the text, a live status line shows the currently selected span and the active label.
API
Signature
txt_annotator(text: str, label_dict: dict, key: str | None = None) -> dict
Parameters
text(str): The raw text to annotate. Treated as a Python string; offsets are measured in Pythonstrindices (UTF-16-independent, code-point–based).label_dict(dict[str, dict]): Defines entities. Each key is the label name; each value must contain:color(str, required): Any valid CSS color ("red","#ff8800","rgb(0, 128, 255)").annotation(list[dict], optional): Pre-existing spans rendered on load. Each entry has the shape{"start": int, "end": int, "value": str}. If omitted, it is initialized to[].
key(str, optional): Standard Streamlit component key. Required if you render multiple annotators on the same page.
Returns
The same label_dict shape, with every entity's annotation list reflecting the current state of the UI. The function returns on every interaction (Streamlit's standard component re-run model), so the latest annotations are always available after the call.
Offsets
- Half-open interval
[start, end), matching Python slicing:text[start:end] == value. - Indices are over the raw
textstring, including newlines and whitespace.
Annotation Workflow
- The user clicks an entity button in the legend. That entity becomes active.
- The user selects a span of text with the mouse.
- On
mouseup, leading/trailing whitespace is trimmed from the selection. If the trimmed span is empty, the selection is ignored. - The trimmed span is highlighted in the active entity's color and appended to that entity's
annotationlist. - The status line below the text updates to show the selected text and label.
- To remove an annotation, the user clicks an existing highlighted span (a single click with no drag). The highlight is removed and the corresponding entry is deleted from
label_dict.
If text is selected while no entity is active, the selection is ignored and a hint is shown in the status line ("Select an entity first").
Overlap Policy
New spans that overlap an existing annotation are rejected by default, and a brief warning is shown in the status line. This avoids ambiguous nested annotations in v1. (Allowing nesting or replacement is out of scope for the initial release; see Non-goals below.)
Click vs. Drag
A click on a highlighted span is interpreted as "remove" only when the mousedown and mouseup positions are within the same span and no selection range was produced. Any drag that produces a non-empty selection is treated as a new annotation attempt, never as a remove.
State Model
The component uses Streamlit's standard component value mechanism. Internally, the frontend keeps its own annotation state and sends the updated label_dict back to Python on every change. Streamlit re-runs the script with the new return value; no st.session_state plumbing is required from the caller.
If you want to persist annotations across page reloads or sessions, store the returned label_dict in st.session_state or write it to disk yourself.
Data Examples
Input
label_dict = {
"Personal names": {"color": "red"},
"Organizations": {"color": "blue"},
"Locations": {"color": "green"},
"Time": {"color": "orange"},
"Money": {"color": "purple"},
}
Output (after annotation)
{
"Personal names": {
"color": "red",
"annotation": [
{"start": 20, "end": 33, "value": "Emily Johnson"},
{"start": 38, "end": 51, "value": "Michael Smith"},
{"start": 327, "end": 338, "value": "David Brown"},
],
},
"Organizations": {
"color": "blue",
"annotation": [
{"start": 118, "end": 126, "value": "TechCorp"},
{"start": 131, "end": 147, "value": "Global Solutions"},
],
},
"Locations": {
"color": "green",
"annotation": [
{"start": 63, "end": 75, "value": "Central Park"},
{"start": 79, "end": 87, "value": "New York"},
{"start": 351, "end": 357, "value": "London"},
{"start": 436, "end": 440, "value": "Asia"},
{"start": 445, "end": 451, "value": "Europe"},
{"start": 542, "end": 547, "value": "Paris"},
],
},
"Time": {
"color": "orange",
"annotation": [
{"start": 0, "end": 9, "value": "Yesterday"},
{"start": 14, "end": 18, "value": "3 PM"},
{"start": 265, "end": 269, "value": "6 PM"},
{"start": 519, "end": 531, "value": "January 15th"},
{"start": 533, "end": 537, "value": "2024"},
],
},
"Money": {
"color": "purple",
"annotation": [
{"start": 179, "end": 198, "value": "500 million dollars"},
],
},
}
Annotations within each entity are sorted by start ascending. Key order within each annotation is start, end, value.
Non-goals (v1)
- Nested or overlapping annotations.
- Relation annotation between spans.
- Multi-document workflows or document navigation.
- Keyboard shortcuts (planned for a future release).
- Annotation history / undo-redo beyond the most recent action.
Development
The frontend lives in frontend/ (TypeScript + React, built with Vite). The Python wrapper in struggle_annotator/__init__.py declares the component via streamlit.components.v1.declare_component and re-exports txt_annotator.
# Frontend dev
cd frontend
npm install
npm run dev
# Python (editable install)
pip install -e .
Set _RELEASE = False in struggle_annotator/__init__.py during local development to point at the Vite dev server.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file struggle_annotator-0.2.4.tar.gz.
File metadata
- Download URL: struggle_annotator-0.2.4.tar.gz
- Upload date:
- Size: 102.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a5e3a320830708f2e622652ce392bab2533cfc11b67cabce8330915a746c6b99
|
|
| MD5 |
fbceff52e8d3a18b3cb1e114be5cdb96
|
|
| BLAKE2b-256 |
cd3a819d58565708cf1a634b480f0bc54cb415a8ca9e172b99eb5c749c679aff
|
File details
Details for the file struggle_annotator-0.2.4-py3-none-any.whl.
File metadata
- Download URL: struggle_annotator-0.2.4-py3-none-any.whl
- Upload date:
- Size: 100.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9b47802a53ccda0f936a1101ac0a5c672cc79e6ddb8ad5c6114c754a841a7f7f
|
|
| MD5 |
9b39ce5bdf8db312b90f3d6df97e4b8e
|
|
| BLAKE2b-256 |
99cc7bfeb24aa127b82c99272ae68c800d3e607a1a88b28f1190f5c1c6f9cc69
|