Skip to main content

Streamlit custom component for interactive NER-style text annotation

Project description

Struggle Annotator Try it on Streamlit

A Streamlit custom component for interactive text annotation, useful for NER-style labeling tasks. The Python wrapper is published as struggle_annotator; the frontend is built with TypeScript and React per the standard Streamlit Components pattern.

Example screenshot

Installation

pip install struggle-annotator

Quick Start

import streamlit as st
from struggle_annotator import txt_annotator

text = (
    "Yesterday, at 3 PM, Emily Johnson and Michael Smith met at the Central Park "
    "in New York to discuss the merger between TechCorp and Global Solutions.\n\n"
    "The deal, worth approximately 500 million dollars, is expected to "
    "significantly impact the tech industry. Later, at 6 PM, they joined a "
    "conference call with the CEO of TechCorp, David Brown, who was in London "
    "for a technology summit. During the call, they discussed the market trends "
    "in Asia and Europe and planned for the next quarterly meeting, which is "
    "scheduled for January 15th, 2024, in Paris."
)

label_dict = {
    "Personal names": {"color": "red"},
    "Organizations":  {"color": "blue"},
    "Locations":      {"color": "green"},
    "Time":           {"color": "orange"},
    "Money":          {"color": "purple"},
}

label_dict = txt_annotator(text, label_dict)
st.json(label_dict)

UI Layout

The component renders two stacked regions:

  1. Top — Entity legend. One button per entity, styled with that entity's color. Clicking a button makes it the active entity; the active button is visually emphasized (border + slight scale).
  2. Bottom — Annotatable document. The full text is shown with any existing annotations highlighted in their entity's color. Below the text, a live status line shows the currently selected span and the active label.

API

Signature

txt_annotator(text: str, label_dict: dict, key: str | None = None) -> dict

Parameters

  • text (str): The raw text to annotate. Treated as a Python string; offsets are measured in Python str indices (UTF-16-independent, code-point–based).
  • label_dict (dict[str, dict]): Defines entities. Each key is the label name; each value must contain:
    • color (str, required): Any valid CSS color ("red", "#ff8800", "rgb(0, 128, 255)").
    • annotation (list[dict], optional): Pre-existing spans rendered on load. Each entry has the shape {"start": int, "end": int, "value": str}. If omitted, it is initialized to [].
  • key (str, optional): Standard Streamlit component key. Required if you render multiple annotators on the same page.

Returns

The same label_dict shape, with every entity's annotation list reflecting the current state of the UI. The function returns on every interaction (Streamlit's standard component re-run model), so the latest annotations are always available after the call.

Offsets

  • Half-open interval [start, end), matching Python slicing: text[start:end] == value.
  • Indices are over the raw text string, including newlines and whitespace.

Annotation Workflow

  1. The user clicks an entity button in the legend. That entity becomes active.
  2. The user selects a span of text with the mouse.
  3. On mouseup, leading/trailing whitespace is trimmed from the selection. If the trimmed span is empty, the selection is ignored.
  4. The trimmed span is highlighted in the active entity's color and appended to that entity's annotation list.
  5. The status line below the text updates to show the selected text and label.
  6. To remove an annotation, the user clicks an existing highlighted span (a single click with no drag). The highlight is removed and the corresponding entry is deleted from label_dict.

If text is selected while no entity is active, the selection is ignored and a hint is shown in the status line ("Select an entity first").

Overlap Policy

New spans that overlap an existing annotation are rejected by default, and a brief warning is shown in the status line. This avoids ambiguous nested annotations in v1. (Allowing nesting or replacement is out of scope for the initial release; see Non-goals below.)

Click vs. Drag

A click on a highlighted span is interpreted as "remove" only when the mousedown and mouseup positions are within the same span and no selection range was produced. Any drag that produces a non-empty selection is treated as a new annotation attempt, never as a remove.

State Model

The component uses Streamlit's standard component value mechanism. Internally, the frontend keeps its own annotation state and sends the updated label_dict back to Python on every change. Streamlit re-runs the script with the new return value; no st.session_state plumbing is required from the caller.

If you want to persist annotations across page reloads or sessions, store the returned label_dict in st.session_state or write it to disk yourself.

Data Examples

Input

label_dict = {
    "Personal names": {"color": "red"},
    "Organizations":  {"color": "blue"},
    "Locations":      {"color": "green"},
    "Time":           {"color": "orange"},
    "Money":          {"color": "purple"},
}

Output (after annotation)

{
    "Personal names": {
        "color": "red",
        "annotation": [
            {"start": 20,  "end": 33,  "value": "Emily Johnson"},
            {"start": 38,  "end": 51,  "value": "Michael Smith"},
            {"start": 327, "end": 338, "value": "David Brown"},
        ],
    },
    "Organizations": {
        "color": "blue",
        "annotation": [
            {"start": 118, "end": 126, "value": "TechCorp"},
            {"start": 131, "end": 147, "value": "Global Solutions"},
        ],
    },
    "Locations": {
        "color": "green",
        "annotation": [
            {"start": 63,  "end": 75,  "value": "Central Park"},
            {"start": 79,  "end": 87,  "value": "New York"},
            {"start": 351, "end": 357, "value": "London"},
            {"start": 436, "end": 440, "value": "Asia"},
            {"start": 445, "end": 451, "value": "Europe"},
            {"start": 542, "end": 547, "value": "Paris"},
        ],
    },
    "Time": {
        "color": "orange",
        "annotation": [
            {"start": 0,   "end": 9,   "value": "Yesterday"},
            {"start": 14,  "end": 18,  "value": "3 PM"},
            {"start": 265, "end": 269, "value": "6 PM"},
            {"start": 519, "end": 531, "value": "January 15th"},
            {"start": 533, "end": 537, "value": "2024"},
        ],
    },
    "Money": {
        "color": "purple",
        "annotation": [
            {"start": 179, "end": 198, "value": "500 million dollars"},
        ],
    },
}

Annotations within each entity are sorted by start ascending. Key order within each annotation is start, end, value.

Non-goals (v1)

  • Nested or overlapping annotations.
  • Relation annotation between spans.
  • Multi-document workflows or document navigation.
  • Keyboard shortcuts (planned for a future release).
  • Annotation history / undo-redo beyond the most recent action.

Development

The frontend lives in frontend/ (TypeScript + React, built with Vite). The Python wrapper in struggle_annotator/__init__.py declares the component via streamlit.components.v1.declare_component and re-exports txt_annotator.

# Frontend dev
cd frontend
npm install
npm run dev

# Python (editable install)
pip install -e .

Set _RELEASE = False in struggle_annotator/__init__.py during local development to point at the Vite dev server.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

struggle_annotator-0.2.5.tar.gz (102.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

struggle_annotator-0.2.5-py3-none-any.whl (100.1 kB view details)

Uploaded Python 3

File details

Details for the file struggle_annotator-0.2.5.tar.gz.

File metadata

  • Download URL: struggle_annotator-0.2.5.tar.gz
  • Upload date:
  • Size: 102.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for struggle_annotator-0.2.5.tar.gz
Algorithm Hash digest
SHA256 29219814cf0a4d65e9202d7f3f65e6859402dcd9556cb795db6f4942190516a1
MD5 76c1829afc360ee96b380e29ac9428a4
BLAKE2b-256 f0b4ad4dc556cc3d9654041f793a8443667d8b5c8cff67e3c5632bb96c365ee5

See more details on using hashes here.

File details

Details for the file struggle_annotator-0.2.5-py3-none-any.whl.

File metadata

File hashes

Hashes for struggle_annotator-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 cb79a665df2f8e66378d4c7796b2188010234aba57ec21beea385ffb5fdc7d0d
MD5 26b396c1be66dbbff9e8d17864e933fc
BLAKE2b-256 c5da5ed66e8029c24133d152fedcca9130d78ca61e5022f4d2f64410c3e53333

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page