Skip to main content

Clean minimal text-annotation tool - with buildt in control of Inter-Annotator Agreement

Project description

labton

What's this then?

Labton (LABeling TOol Now!) is an OpenSource annotation tool for creating text data sets for machine learning models. The length of the paragraphs to annotate is defined by you. Labton also allows for extraction of words in the given pargraph - thus the labels of the paragraph can be a word from the paragraph. You can also use Labton to help control Inter-Annotator Agreement (ie. securing that your annotaters agree on the defined classes).

But why?

This tool is thus used in order to create a labeled data set of text. As such this library is for you if you:

  1. Are aiming to build a text-classifiaction or -extraction algorithm, and are putting together a dataset.
  2. Need high quality of the labels (secure Inter-Annotator Agreement)
  3. Want a webpage that can be used to annotate your text both on PC and smartphone.
  4. Prioritize getting up and running fast and efficiently (can be done in seconds if the data is ready)
  5. Actually wanted to use prodigy but need to do a proof a concept before spending the money;)

The following is a standart structure for the development of a NLP-project: Review interface

Labton is developed such that there is a painless integreation from the stages before and after the data annotation step. This is done by making both data integration and data extration possible within the same environment ie. a python IDE such as Jupyter Notebook.

How does it work?

The Python library includes an easily deployable Flask web app that allows one to annotate the text in self defined classes. The app can be deployed directly from a Terminal or through a Jupyter Notebook for lighter annotation tasks. The classes, and many other aspects of the app, are easily configurable either via python or through YAML configguration files.

The data can be easily uploaded to the app. This can be done by directly pointing to a .cvs file (and specifying the delimiter) or by simply feeding in a Pandas Dataframe (and specifying the text column). The app will then structure the data and present one segment at a time for human classification in the UI.

How do I get started in a jupyter notebook (access localy)

Firstly pip install labton in your virtual environment of choice From there simply open jupyter notebook and run the following from a cell.

In order to test the interface, with sample data from the package, simply instantiate and run the App object without any data source or classes

from labton.annotation_app import App
App().run()

The app should now be running at localhost:8080
Type any name to login (no password needed for now)

When you've verified that Labton is useable for your usecase you can go ahead and boot up the app with your own data like so:

from labton.annotation_app import App
app = App(data_source='<your data source>, classes=<your list of classes>)

<your list of classes> is a list contatining the classes you want to classify the text into. For a tweet sentiment classifier this could look like: classes=["positive", "Negative", "Neutral"]
<your data source> can either be a path to the .csv file or alternatively a pandas.DataFrame.
If a it is a Datafame the column_name of the column contatining the text needs to be defined. If a .csv file only contating the text, then the delimiter of the files need to be fed to the App object as csv_sep. You can also set the project_name (used when running many projects from the same environment), the port, the directory you want the config- and data files created in, along with many other setings.

In order to boot up the server and start annotating simply run:

app.run()

The jupyter notebook will be occupied as long as the server is running. You can start the server again and it will continue where you left from.

At any point in the process you can extract a copy of the data as a pandas.DataFrame like so:

from labton.labton_backend.helper_funcs import get_labton_data
df_labeled_data = get_labton_data()

If you have given a your project a custom project_name hand this to the get_labton_data function in order to load the data from the correct database file.

How do I get started with running application on server (access from internet)

If you don't already have an intenet facing server I can recomend starting a free account on pythonanywhere) and booting up a server.

From here simply:

  1. Go to the terminal
  2. Create and activate a virtual environment (optional)
  3. pip install labton or python3 -m pip install labton

In order to test the interface, with sample data from the package, simply go to the folder where you want to initialise the project and run the library as so:

python -m labton

This will create a folder for the config files and a folder for data base files and start the server with a few test sentances.
The server can now be accessed locally for a testrun at localhost:8080

You can now configure the YAML file found in the projects_config_files with the port, host, etc. that mathces your firewall configurations.

TODO: Add easy functionality and walkthrough for how to setup a python anywhere app

annotation interface

Review interface

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

labton-0.2.5.tar.gz (348.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

labton-0.2.5-py3-none-any.whl (336.7 kB view details)

Uploaded Python 3

File details

Details for the file labton-0.2.5.tar.gz.

File metadata

  • Download URL: labton-0.2.5.tar.gz
  • Upload date:
  • Size: 348.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for labton-0.2.5.tar.gz
Algorithm Hash digest
SHA256 0ddf89d21a11ddd8cfb2076554458b0502fe82fd653141cc77da4521dac31faf
MD5 1b501c4bbf4d3c2a93b565e3fb2768e8
BLAKE2b-256 4c55bec1788e93089d584209cc5c34d838b637a71349e47c342bb2662b7a7105

See more details on using hashes here.

File details

Details for the file labton-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: labton-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 336.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for labton-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 ebdf50aa95d0264b0c45b72e1790145536396ecd2a5ef2192ca171af884f5165
MD5 72d0556e3888b8a7cbc49e6d38aa4c62
BLAKE2b-256 f7b743ba6466b5ef2876650c97d94a3da1a10b2779349541932f82678d859f26

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page