Data annotation done right: easy, fun, hyper-productive, and inducing insight.

Hover

Imagine editing a picture layer by layer, not pixel by pixel, nor by splashing paint.

We can apply this idea to datasets.


Hover is a machine teaching library that enables intuitive and efficient supervision. In other words, it provides a map where you hover over and label your data... differently. For instance, you can:

  • :seedling: annotate an intuitively selected group of data points at a time
  • :ferris_wheel: throw a model in the loop and exploit active learning
  • :whale: cross-check with Snorkel-based distant supervision

Check out @phurwicz/hover-binder for a complete list of demo apps.

:flags: Latest Announcements

  • Dec 26 Major update coming soon:

    • Low-level APIs in hover.core.explorer will undergo significant refactoring. Watch out if you've been using them directly!
    • There will be a change log :sunglasses:
  • Dec 16 We decided to make notebook tutorials instead of videos, because

    • hover is now embeddable in Jupyter notebooks, and
    • active development tends to outdate video tutorials rather quickly.
  • Dec 12 Common usage workflows are now available in hover.recipes. Much cleaner code!

:flight_departure: Quick Start

Step 0: load your dataset

from hover.core.dataset import SupervisableTextDataset

dataset = SupervisableTextDataset(
    raw_dictl=[{"content": "this is great"}],                  # the raw data to be supervised
    # train_dictl=[],                                          # train/dev/test sets can be empty
    dev_dictl=[{"content": "this is awesome", "mark": "A"}],
    test_dictl=[{"content": "this is meh", "mark": "B"}],
    feature_key="content",                                     # specify feature/label keys
    label_key="mark",
)

# define a vectorizer for your feature, then call dimensionality reduction
import spacy
nlp = spacy.load('en')  # 'en' is a spaCy 2.x shortcut link; on spaCy 3+ use a full model name, e.g. 'en_core_web_md'
vectorizer = lambda text: nlp(text).vector  # we recommend wrapping a @lru_cache around this
dataset.compute_2d_embedding(vectorizer, "umap")
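
The caching recommendation above can be sketched as follows. This is a minimal illustration, not hover's own code: `toy_embed` is a hypothetical stand-in for an expensive call such as `nlp(text).vector`, and the cache size is an arbitrary choice.

```python
from functools import lru_cache
import hashlib


def toy_embed(text, dim=8):
    """Hypothetical stand-in for an expensive embedding such as nlp(text).vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dim` bytes of the hash to floats in [0, 1].
    return tuple(byte / 255.0 for byte in digest[:dim])


@lru_cache(maxsize=10000)  # arbitrary size; strings are hashable, so they work as cache keys
def vectorizer(text):
    # With spaCy, this body would be `return nlp(text).vector` instead.
    return toy_embed(text)


texts = ["this is great", "this is meh", "this is great"]
vectors = [vectorizer(t) for t in texts]

# The repeated text is served from the cache instead of being recomputed.
info = vectorizer.cache_info()
print(info.hits, info.misses)
```

Repeated texts are then embedded only once per process. If the cached function returns a NumPy array, every caller gets the same object back, so it is safest not to modify the returned vectors in place.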

Step 1: choose a recipe

(or create your own with examples)

from hover.recipes import simple_annotator

handle = simple_annotator(dataset)

Step 2: fire it up

Hover uses bokeh to deliver its annotation interface:

option 1: in Jupyter

from bokeh.io import show, output_notebook
output_notebook()
show(handle)

option 2: with bokeh serve

from bokeh.io import curdoc
doc = curdoc()
handle(doc)

option 3: elsewhere as an embedded app

from bokeh.server.server import Server
server = Server({'/my-app': handle})
server.start()
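
When option 3 runs as a plain script rather than inside an application that already has a Tornado event loop, `server.start()` returns immediately and the script would exit. Below is a hedged sketch of one way to keep it alive, following the pattern in bokeh's standalone embedding example. The helper name `serve_standalone` and its defaults are illustrative, not part of hover.

```python
def serve_standalone(handle, path="/my-app", port=5006, open_browser=True):
    """Serve a bokeh app callable and block until interrupted.

    `handle` is any callable that takes a bokeh Document, such as the one
    returned by a hover recipe. `path`, `port` and `open_browser` are
    illustrative defaults, not hover settings.
    """
    # Imported inside the function so that defining this helper does not
    # by itself require bokeh to be importable.
    from bokeh.server.server import Server

    server = Server({path: handle}, port=port)
    server.start()  # installs the app on the IOLoop; does not block

    if open_browser:
        # Schedule a browser tab to open once the loop is running.
        server.io_loop.add_callback(server.show, path)

    # Blocks here; without a running loop, a plain script would simply exit.
    server.io_loop.start()
```

Calling `serve_standalone(handle)` at the end of a script should then serve the app at `http://localhost:5006/my-app` until the process is stopped.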

:package: Installation

Python: 3.6+

OS: tested on Mac & Linux

To get the latest release version, you can use pip:

pip install hover

Installation through conda is not yet available. Please open an issue if you would like conda or conda-forge support.

:flamingo: Features

Here we attempt a quick comparison with a few other packages that do machine teaching:

| Package | Hover | Prodigy | Snorkel |
| --- | --- | --- | --- |
| Core idea | supervise like editing a picture | scriptable active learning | programmatic distant supervision |
| Annotates per | batch of just the size you find right | piece predicted to be the most valuable | the whole dataset as long as it fits in |
| Supports | all classification (text only atm) | text & images, audio, video, & more | text classification (for the most part) |
| Status | open-source | proprietary | open-source |
| Devs | indie | Explosion AI | Stanford / Snorkel AI |
| Related | many imports of the awesome Bokeh | builds on the Thinc/SpaCy stack | Variants: Snorkel Drybell, MeTaL, DeepDive |
| Vanilla usage | define a vectorizer and annotate away | choose a base model and annotate away | define labeling functions and apply away |
| Advanced usage | combine w/ active learning & snorkel | patterns / transformers / custom models | transforming / slicing functions |
| Hardcore usage | exploit hover.core templates | custom @prodigy.recipe | the upcoming Snorkel Flow |

Hover claims the best trade-off between scale and precision, thanks to

  • the flexibility to use, or not use, any technique beyond annotating on a "map";
  • the speed, or coarseness, of annotation being literally at your fingertips;
  • the interaction between multiple "maps" that each serves a different but connected purpose.

:book: Resources

Documentation

  • Contains API references of the most crucial components.
  • A lot more is on the way!

:bell: Remarks

Shoutouts

  • Thanks to Bokeh because hover would not exist without linked plots and callbacks.
  • Thanks to Philip Vollet for sharing hover with the community!

Dependencies

  • ./requirements-dev.txt lists recommended packages for developers.
