Streamlit custom component for interactive NER-style text annotation

Struggle Annotator

A Streamlit custom component for interactive text annotation, useful for NER-style labeling tasks. The Python wrapper is published as struggle_annotator; the frontend is built with TypeScript and React per the standard Streamlit Components pattern.

Installation

pip install struggle-annotator

Quick Start

import streamlit as st
from struggle_annotator import txt_annotator

text = (
    "Yesterday, at 3 PM, Emily Johnson and Michael Smith met at the Central Park "
    "in New York to discuss the merger between TechCorp and Global Solutions.\n\n"
    "The deal, worth approximately 500 million dollars, is expected to "
    "significantly impact the tech industry. Later, at 6 PM, they joined a "
    "conference call with the CEO of TechCorp, David Brown, who was in London "
    "for a technology summit. During the call, they discussed the market trends "
    "in Asia and Europe and planned for the next quarterly meeting, which is "
    "scheduled for January 15th, 2024, in Paris."
)

label_dict = {
    "Personal names": {"color": "red"},
    "Organizations":  {"color": "blue"},
    "Locations":      {"color": "green"},
    "Time":           {"color": "orange"},
    "Money":          {"color": "purple"},
}

label_dict = txt_annotator(text, label_dict)
st.json(label_dict)

UI Layout

The component renders two stacked regions:

  1. Top — Entity legend. One button per entity, styled with that entity's color. Clicking a button makes it the active entity; the active button is visually emphasized (border + slight scale).
  2. Bottom — Annotatable document. The full text is shown with any existing annotations highlighted in their entity's color. Below the text, a live status line shows the currently selected span and the active label.

API

Signature

txt_annotator(text: str, label_dict: dict, key: str | None = None) -> dict

Parameters

  • text (str): The raw text to annotate. Offsets are Python str indices, which count Unicode code points rather than UTF-16 code units.
  • label_dict (dict[str, dict]): Defines entities. Each key is the label name; each value must contain:
    • color (str, required): Any valid CSS color ("red", "#ff8800", "rgb(0, 128, 255)").
    • annotation (list[dict], optional): Pre-existing spans rendered on load. Each entry has the shape {"start": int, "end": int, "value": str}. If omitted, it is initialized to [].
  • key (str, optional): Standard Streamlit component key. Required if you render multiple annotators on the same page.
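
A label_dict with pre-existing spans can be assembled from known offsets. The sketch below assumes a hypothetical helper, make_span, which is not part of struggle_annotator; it derives value from the text so that each entry matches the documented {"start", "end", "value"} shape.

```python
def make_span(text: str, start: int, end: int) -> dict:
    """Build one annotation entry (hypothetical helper, not part of the package)."""
    if not 0 <= start < end <= len(text):
        raise ValueError(f"invalid span [{start}, {end}) for text of length {len(text)}")
    return {"start": start, "end": end, "value": text[start:end]}


text = "Emily Johnson met David Brown in London."

label_dict = {
    "Personal names": {
        "color": "red",
        # Pre-existing spans, rendered as highlights on load.
        "annotation": [make_span(text, 0, 13), make_span(text, 18, 29)],
    },
    # No "annotation" key here: per the parameter description it starts as [].
    "Locations": {"color": "#2e7d32"},
}
```

Passing this dictionary as txt_annotator(text, label_dict, key="doc-1") would then be expected to show both names highlighted when the component first renders.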

Returns

The same label_dict shape, with every entity's annotation list reflecting the current state of the UI. Each interaction triggers a Streamlit re-run (the standard component model), and the call returns the updated value, so the latest annotations are always available after the call.

Offsets

  • Half-open interval [start, end), matching Python slicing: text[start:end] == value.
  • Indices are over the raw text string, including newlines and whitespace.
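
The offset convention can be checked with plain slicing. The sketch below also includes a hypothetical helper, utf16_to_codepoint_offset, for the case where offsets from a tool that counts UTF-16 code units (as JavaScript strings do) need to be mapped onto Python str indices; it is an illustration only, not part of the package.

```python
def utf16_to_codepoint_offset(text: str, utf16_offset: int) -> int:
    """Convert an offset in UTF-16 code units to a Python str index."""
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters outside the Basic Multilingual Plane take two UTF-16 code units.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


text = "😀 Paris\nLondon"

# Half-open interval over the raw string: the newline at index 7 counts as one character.
start, end = 2, 7
value = text[start:end]

# In UTF-16 code units the emoji occupies two units, so "Paris" spans [3, 8) there.
converted = (utf16_to_codepoint_offset(text, 3), utf16_to_codepoint_offset(text, 8))
```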

Annotation Workflow

  1. The user clicks an entity button in the legend. That entity becomes active.
  2. The user selects a span of text with the mouse.
  3. On mouseup, leading/trailing whitespace is trimmed from the selection. If the trimmed span is empty, the selection is ignored.
  4. The trimmed span is highlighted in the active entity's color and appended to that entity's annotation list.
  5. The status line below the text updates to show the selected text and label.
  6. To remove an annotation, the user clicks an existing highlighted span (a single click with no drag). The highlight is removed and the corresponding entry is deleted from label_dict.

If text is selected while no entity is active, the selection is ignored and a hint is shown in the status line ("Select an entity first").
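
The whitespace trimming in step 3 can be sketched in Python as follows. The actual logic lives in the TypeScript frontend; trim_span is a hypothetical name used here only to illustrate the rule.

```python
from typing import Optional, Tuple


def trim_span(text: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Shrink [start, end) past leading/trailing whitespace; None if nothing is left."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end) if start < end else None


text = "at the Central Park in"

trimmed = trim_span(text, 6, 20)   # selection " Central Park " with surrounding spaces
ignored = trim_span(text, 6, 7)    # selection is a single space
```

A None result corresponds to the "selection is ignored" case; otherwise the returned offsets are the ones stored in the annotation entry.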

Overlap Policy

New spans that overlap an existing annotation are rejected by default, and a brief warning is shown in the status line. This avoids ambiguous nested annotations in v1. (Allowing nesting or replacement is out of scope for the initial release; see Non-goals below.)
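
With half-open intervals, two spans overlap exactly when each starts before the other ends. The sketch below shows one way the rejection rule could look; it assumes the check runs across all entities, and try_add_span is a hypothetical name, not a function exported by the package.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def try_add_span(label_dict: dict, label: str, text: str, start: int, end: int) -> bool:
    """Append the span to `label` unless it overlaps any existing annotation."""
    for entity in label_dict.values():
        for span in entity.get("annotation", []):
            if overlaps(start, end, span["start"], span["end"]):
                return False  # rejected; the UI would show a warning here
    spans = label_dict[label].setdefault("annotation", [])
    spans.append({"start": start, "end": end, "value": text[start:end]})
    spans.sort(key=lambda s: s["start"])  # keep spans ordered by start
    return True


text = "TechCorp and Global Solutions"
labels = {"Organizations": {"color": "blue"}, "Locations": {"color": "green"}}

added_first = try_add_span(labels, "Organizations", text, 0, 8)     # "TechCorp"
added_overlap = try_add_span(labels, "Locations", text, 4, 12)      # overlaps "TechCorp"
added_second = try_add_span(labels, "Organizations", text, 13, 29)  # "Global Solutions"
```

Note that adjacent spans such as [0, 8) and [8, 12) do not count as overlapping under this definition.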

Click vs. Drag

A click on a highlighted span is interpreted as "remove" only when the mousedown and mouseup positions are within the same span and no selection range was produced. Any drag that produces a non-empty selection is treated as a new annotation attempt, never as a remove.

State Model

The component uses Streamlit's standard component value mechanism. Internally, the frontend keeps its own annotation state and sends the updated label_dict back to Python on every change. Streamlit re-runs the script with the new return value; no st.session_state plumbing is required from the caller.

st.session_state only survives re-runs within a single browser session, so it does not persist annotations across page reloads or new sessions. To keep them, write the returned label_dict to disk or another store yourself.
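
One simple way to keep annotations between sessions is a JSON file, since label_dict contains only strings, integers, lists, and dicts. The helper names and the annotations.json path below are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def save_annotations(label_dict: dict, path: Path) -> None:
    """Write the returned label_dict to disk as UTF-8 JSON."""
    path.write_text(json.dumps(label_dict, ensure_ascii=False, indent=2), encoding="utf-8")


def load_annotations(path: Path, default: dict) -> dict:
    """Load previously saved annotations, or fall back to the fresh label definitions."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return default


# Intended use inside a Streamlit script (not executed here):
#   store = Path("annotations.json")
#   label_dict = load_annotations(store, default_labels)
#   label_dict = txt_annotator(text, label_dict)
#   save_annotations(label_dict, store)
```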

Data Examples

Input

label_dict = {
    "Personal names": {"color": "red"},
    "Organizations":  {"color": "blue"},
    "Locations":      {"color": "green"},
    "Time":           {"color": "orange"},
    "Money":          {"color": "purple"},
}

Output (after annotation)

{
    "Personal names": {
        "color": "red",
        "annotation": [
            {"start": 20,  "end": 33,  "value": "Emily Johnson"},
            {"start": 38,  "end": 51,  "value": "Michael Smith"},
            {"start": 328, "end": 339, "value": "David Brown"},
        ],
    },
    "Organizations": {
        "color": "blue",
        "annotation": [
            {"start": 118, "end": 126, "value": "TechCorp"},
            {"start": 131, "end": 147, "value": "Global Solutions"},
        ],
    },
    "Locations": {
        "color": "green",
        "annotation": [
            {"start": 63,  "end": 75,  "value": "Central Park"},
            {"start": 79,  "end": 87,  "value": "New York"},
            {"start": 352, "end": 358, "value": "London"},
            {"start": 437, "end": 441, "value": "Asia"},
            {"start": 446, "end": 452, "value": "Europe"},
            {"start": 543, "end": 548, "value": "Paris"},
        ],
    },
    "Time": {
        "color": "orange",
        "annotation": [
            {"start": 0,   "end": 9,   "value": "Yesterday"},
            {"start": 14,  "end": 18,  "value": "3 PM"},
            {"start": 266, "end": 270, "value": "6 PM"},
            {"start": 520, "end": 532, "value": "January 15th"},
            {"start": 534, "end": 538, "value": "2024"},
        ],
    },
    "Money": {
        "color": "purple",
        "annotation": [
            {"start": 180, "end": 199, "value": "500 million dollars"},
        ],
    },
}

Annotations within each entity are sorted by start ascending. Key order within each annotation is start, end, value.
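
For downstream use, the nested output is often easier to handle as a flat list of (start, end, label) tuples. The sketch below assumes a hypothetical helper, flatten_annotations, which also verifies the text[start:end] == value invariant along the way.

```python
def flatten_annotations(text: str, label_dict: dict) -> list:
    """Return all spans as (start, end, label) tuples, sorted by position."""
    rows = []
    for label, entity in label_dict.items():
        for span in entity.get("annotation", []):
            start, end = span["start"], span["end"]
            # Each entry should satisfy the half-open slicing invariant.
            if text[start:end] != span["value"]:
                raise ValueError(f"{label}: text[{start}:{end}] != {span['value']!r}")
            rows.append((start, end, label))
    return sorted(rows)


text = "Emily Johnson met David Brown in London."
result = {
    "Locations": {
        "color": "green",
        "annotation": [{"start": 33, "end": 39, "value": "London"}],
    },
    "Personal names": {
        "color": "red",
        "annotation": [
            {"start": 0, "end": 13, "value": "Emily Johnson"},
            {"start": 18, "end": 29, "value": "David Brown"},
        ],
    },
}

flat = flatten_annotations(text, result)
```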

Non-goals (v1)

  • Nested or overlapping annotations.
  • Relation annotation between spans.
  • Multi-document workflows or document navigation.
  • Keyboard shortcuts (planned for a future release).
  • Annotation history / undo-redo beyond the most recent action.

Development

The frontend lives in frontend/ (TypeScript + React, built with Vite). The Python wrapper in struggle_annotator/__init__.py declares the component via streamlit.components.v1.declare_component and re-exports txt_annotator.

# Frontend dev
cd frontend
npm install
npm run dev

# Python (editable install)
pip install -e .

Set _RELEASE = False in struggle_annotator/__init__.py during local development to point at the Vite dev server.
