Customizable tool for easy manual annotation

Humannotator

Library for conveniently creating simple customizable annotators for manual annotation of your data
Jenia Kim, Lawrence Vriend

Works well with Jupyter notebooks.

Use case

The humannotator provides an easy way to set up custom annotators. This tool is for you if manual annotation is part of your workflow and you are looking for a solution that is:

Lightweight
Customizable
Easy to set up
Integrated with Jupyter/pandas/Python

Quick start

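    import pandas as pd
    from humannotator import Annotator, task_factory

    # Load the records to annotate; the first CSV column becomes the index.
    df = pd.read_csv('news.csv', index_col=0)
    cols = ['title', 'date', 'news_id']

    # Passing a dict as the kind creates a categorical task.
    choices = {
        '0': 'not adverse media',
        '1': 'adverse media',
        '3': 'exclude from dataset',
    }
    instruct = "What is the topic in the title?"
    task1 = task_factory(choices, 'Adverse media')
    # A nullable string task with its own instruction.
    task2 = task_factory(
        'str',
        'Topic',
        instruction=instruct,
        nullable=True,
    )

    # Combine both tasks with the selected columns of the data.
    annotator = Annotator([task1, task2], df[cols])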

Annotate your data

Use the annotator by calling it: annotator().
The annotator keeps track of where you were.
Highlight phrases with the 'phrases' argument.
The annotator stores user (if provided) and timestamp with the annotation.

Access your annotations

Access the annotations with the annotated attribute.
Return merged data and annotations with the merged method.
The annotations are conveniently stored in a pandas DataFrame.

Store your annotations

Store the annotator with the save method.
Load the annotator with the load method.

Load data

The annotator accepts list, dict, Series and DataFrame objects as data.
The data will be converted to a dataframe internally.

dataframes

By default, the annotator will use the dataframe's index and all of its columns.
Use load_data to create a data object if you need more control:
1. id_col sets the column to be used as index.
2. item_cols sets the column or columns to be displayed.
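
In plain pandas terms, these two options correspond roughly to setting an index and selecting columns. The sketch below is not the library's implementation: `select_items` is a hypothetical helper and the dataframe is made up, purely to illustrate the effect described for id_col and item_cols.

```python
import pandas as pd


def select_items(df, id_col=None, item_cols=None):
    """Hypothetical helper mimicking the described effect of id_col and item_cols."""
    if id_col is not None:
        # Use the given column as index instead of the dataframe's own index.
        df = df.set_index(id_col)
    if item_cols is not None:
        # Accept a single column name as well as a list of names.
        if isinstance(item_cols, str):
            item_cols = [item_cols]
        df = df[item_cols]
    return df


# Made-up example data.
news = pd.DataFrame({
    'news_id': [101, 102],
    'title': ['Bank fined', 'New bridge opens'],
    'date': ['2019-10-01', '2019-10-02'],
})
items = select_items(news, id_col='news_id', item_cols=['title', 'date'])
print(items)
```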

Define tasks

Tasks are set up using the task_factory. Create a task by passing it:

the kind of task
the name of the task
(optionally) an instruction
(optionally) a list of dependencies
whether it is nullable (default is False)
any kwargs necessary

Typically:

    task_factory(
        'kind',
        'name',
        instruction='instruction',
        dependencies=dependencies,
        nullable=False,  # or True
        **kwargs,
    )

Passing a dict or list to kind will create a categorical task.
In this case the categories kwarg is ignored.

Setting up tasks through subscription

It is also possible to instantiate an annotator and add tasks through subscription:

    a = Annotator()
    a.tasks['topic'] = ['economy', 'politics', 'media', 'other']
    a.tasks['factual'] = bool, "Is the article factual?", False

To add a task like this, you minimally need to provide the kind of task you are trying to create. Optionally, you can add an instruction, nullability, dependencies and any other kwargs (as a dictionary). Change the order in which tasks are prompted to the user with the order attribute on tasks.

Available tasks

kind	kwargs	dtype	description
str		object	String
regex	regex	object	String validated by regex
int		Int64	Nullable integer
float		float64	Float
bool		bool	Boolean
category	categories	CategoricalDtype	Categorical variable
date		datetime64[ns]	Date
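
The dtype column refers to pandas dtypes. As a rough illustration in plain pandas (independent of humannotator, with invented column names and values), a dataframe holding one column per kind could look like the sketch below. Note that `Int64` with a capital I is the pandas nullable integer type, which can hold missing values where a plain `int64` column cannot.

```python
import pandas as pd

# One made-up column per task kind, using the dtypes listed in the table.
example = pd.DataFrame({
    # str and regex tasks
    'topic': pd.Series(['economy', None], dtype='object'),
    # int task (nullable integer)
    'n_sources': pd.Series([3, None], dtype='Int64'),
    # float task
    'score': pd.Series([0.5, None], dtype='float64'),
    # bool task
    'factual': pd.Series([True, False], dtype='bool'),
    # category task
    'label': pd.Series(['a', None], dtype=pd.CategoricalDtype(['a', 'b'])),
    # date task
    'published': pd.Series(['2019-10-05', None], dtype='datetime64[ns]'),
})
print(example.dtypes)
```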

Dependencies

Dependencies consist of a condition and a value, which can be passed as a tuple:

    ("col1 == 'x'", False)

The condition is a pandas query statement. Before prompting the user for input, the condition is evaluated on the current annotation. If the query evaluates to True then the value will be assigned automatically.
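
To make the mechanism concrete, here is a minimal sketch of how such a condition could be evaluated with `DataFrame.query`. It is an illustration under assumptions, not the library's code: `apply_dependency` is a hypothetical helper, and the current annotation is represented as a plain dict.

```python
import pandas as pd


def apply_dependency(annotation, dependency):
    """Hypothetical helper: return (matched, value) for one dependency tuple."""
    condition, value = dependency
    # Put the current annotation in a one-row frame so that it can be queried.
    row = pd.DataFrame([annotation])
    matched = not row.query(condition).empty
    return matched, (value if matched else None)


dependency = ("col1 == 'x'", False)
print(apply_dependency({'col1': 'x'}, dependency))  # (True, False)
print(apply_dependency({'col1': 'y'}, dependency))  # (False, None)
```

In the first call the condition holds, so the value False would be assigned without asking the user. In the second call it does not hold, so the task would be prompted as usual.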

Annotator

Calling the annotator

The annotator detects if it is run from Jupyter. If so, the annotator will render itself in HTML and CSS. If not, the annotator will render itself as text. You can annotate a selection of records by passing a list of ids to the annotator call. If you want to reannotate ids that have already been annotated, then set redo to True when calling the annotator.
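
How the library detects Jupyter is not documented here. A commonly used heuristic, shown below as an assumption rather than as the library's actual check, is to look for the `get_ipython` function that IPython injects into the namespace and to inspect the class name of the shell it returns. The name `running_in_jupyter` is hypothetical.

```python
def running_in_jupyter():
    """Return True when executed inside a Jupyter kernel (common heuristic)."""
    try:
        # get_ipython is injected into the namespace by IPython; it does not
        # exist in a plain Python interpreter.
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return False
    # 'ZMQInteractiveShell' is the shell class used by Jupyter kernels;
    # 'TerminalInteractiveShell' is the IPython terminal.
    return shell == 'ZMQInteractiveShell'


# Pick a rendering mode the way the paragraph above describes.
mode = 'html' if running_in_jupyter() else 'text'
print(mode)
```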

Instantiating the annotator

arguments

tasks : task, list of task or DataFrame, default None
    Annotation task(s).
    If passed a DataFrame, then the tasks will be inferred from it.
    Annotation data in the dataframe will also be initialized.
data : data, list-/dict-like, Series or DataFrame, default None
    Data to be annotated.
    If `data` is not already a data object,
    then it will be passed through `load_data`.
    The annotator can be instantiated without data,
    but will only work after data is loaded.
user : str, default None
    Name of the user.
name : str, default 'HUMANNOTATOR'
    Name of the annotator.
save_data : boolean, default False
    Set flag to True if you want to store the data with the annotator.
    This will ensure that the pickled object will contain the data.

other parameters

DISPLAY
text_display : boolean, default None
    If True, the annotator is displayed in plain text instead of HTML.
DATA
item_cols : str or list of str, default None
    Name(s) of dataframe column(s) to display when annotating.
    By default: display all columns.
id_col : str, default None
    Name of dataframe column to use as index.
    By default: use the dataframe's index.
HIGHLIGHTER
phrases : str, list of str or dict, default None
    Phrases to highlight in the display.
    The phrases can be regexes.
    It is also possible to pass in a dict where:
    - the keys are the phrases
    - the values are the CSS styling
escape : boolean, default False
    Set escape to True in order to escape the phrases.
flags : int, default 0 (no flags)
    Flags to pass through to the re module, e.g. re.IGNORECASE.
TRUNCATER
truncate : boolean, default True
    Set to False to not truncate items.
trunc_limit : int, default 32
    The number of words beyond which an item will be truncated.
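
The highlighter and truncater parameters can be pictured with the standard library alone. The sketch below is a guess at the behaviour described above, not the library's implementation: the function names `highlight` and `truncate`, the default yellow styling and the trailing ' ...' marker are all assumptions.

```python
import re


def highlight(text, phrases, escape=False, flags=0):
    """Wrap every match of each phrase in a <span> carrying its CSS."""
    if isinstance(phrases, str):
        phrases = [phrases]
    if not isinstance(phrases, dict):
        # Assumed default styling for phrases passed without CSS.
        phrases = {p: 'background-color: yellow' for p in phrases}
    for phrase, css in phrases.items():
        # With escape=True the phrase is matched literally, not as a regex.
        pattern = re.escape(phrase) if escape else phrase
        # Limitation of this sketch: a later phrase could also match inside
        # markup inserted for an earlier phrase.
        text = re.sub(
            pattern,
            lambda m, css=css: f'<span style="{css}">{m.group(0)}</span>',
            text,
            flags=flags,
        )
    return text


def truncate(text, trunc_limit=32):
    """Cut the text after trunc_limit words (split on whitespace)."""
    words = text.split()
    if len(words) <= trunc_limit:
        return text
    return ' '.join(words[:trunc_limit]) + ' ...'


print(highlight('Bank fined for Fraud', {'fraud': 'color: red'}, flags=re.IGNORECASE))
print(highlight('Cost: $5 (est.)', '$5', escape=True))
print(truncate('one two three four', trunc_limit=2))
```

The second call shows why escape matters: without it, `$` would be read as a regex anchor and the phrase would not match the literal text.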


Release history

0.0.3 (Jan 7, 2020)
0.0.2 (Nov 4, 2019)
0.0.1 (Oct 5, 2019, this version)
