Simple CLI labeling tool for text classification

Project description

theme

theme lets you rapidly collect manually labeled texts without setting up a large-scale labeling solution.

With minimal requirements, you can build an initial dataset for training a text classification model.

Installation

pip install theme-label

Usage

To use theme you will need:

  • A .csv table with at least two columns: one with the texts and one with their ids
  • The following script:
from theme import Theme

# This dict maps what the user
# enters to the label that is
# written to the table
id2label = {
    '0': 'ham',
    '1': 'spam'
}

# Initialize the labeling session:
# the data is loaded and everything is prepared
t = Theme(
    id2label=id2label,
    text_col='text', # Name of the column with texts
    show_cols=['title'], # Additional fields to show during labeling
    unmarked_table='data.csv', # Our input table
    marked_table='markup.csv', # Output table will have same columns with additional one for label
    label_col='label', # The name of additional column
    id_col='id', # The name of id column
    select_label=None # Set this if label_col already contains labels and you want to relabel one of them
)

# Run the labeling session
t.run()
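
If you do not have an input table yet, the sketch below shows one way to create a minimal data.csv with the columns the script above expects (id, title, text). It uses only the Python standard library. The sample rows and the helper name write_input_table are hypothetical and are not part of theme.

```python
import csv

# Hypothetical sample rows. The column names match the script above:
# id_col='id', text_col='text', show_cols=['title'].
rows = [
    {"id": 1, "title": "Meeting notes", "text": "Agenda for Monday attached."},
    {"id": 2, "title": "You won!", "text": "Click here to claim your prize."},
]


def write_input_table(path, rows):
    """Write the rows to a CSV file with a header line (illustrative helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "title", "text"])
        writer.writeheader()
        writer.writerows(rows)


# Creates the file passed to Theme as unmarked_table
write_input_table("data.csv", rows)
```

Any other way of producing a .csv file with the same columns should work equally well.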

Labeling process

First, the user is shown the number of already marked, unmarked and skipped texts. Then the available options are printed, showing which input stands for which class.

Finally, the additional user-defined fields and the text to label are shown, and the user is prompted to choose a label.

If the entered label is empty, the text is marked as skipped and will not appear again in this session.
If the entered label is a space, the previously marked text is shown again instead of the current one.
If the label is not in id2label, the user is prompted to enter the label again.
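
The three input rules above can be summarized in a short sketch. This is not theme's actual implementation: interpret_input and the action names are hypothetical and only mirror the behaviour described here.

```python
def interpret_input(raw, id2label):
    """Map raw user input to an (action, label) pair.

    Hypothetical helper that mirrors the documented rules;
    it is not taken from theme's source code.
    """
    if raw == "":
        return ("skip", None)  # empty input: skip the text for this session
    if raw == " ":
        return ("previous", None)  # single space: return to the previous text
    if raw in id2label:
        return ("label", id2label[raw])  # known key: store the mapped label
    return ("retry", None)  # unknown key: prompt the user again


id2label = {"0": "ham", "1": "spam"}
for raw in ["1", "", " ", "x"]:
    print(repr(raw), "->", interpret_input(raw, id2label))
```

The empty and space checks come before the id2label lookup, so those two inputs always keep their special meaning in this sketch.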

Download files

Source Distribution

theme-label-0.1.0.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

theme_label-0.1.0-py3-none-any.whl (5.4 kB)

Uploaded Python 3
