Data annotation done right: easy, fun, hyper-productive, and inducing insight.
Project description
Imagine editing a picture layer by layer, not pixel by pixel, nor by splashing paint.
We can apply this idea to datasets.
Hover is a machine teaching library that enables intuitive and efficient supervision. In other words, it provides a map where you hover over and label your data... differently. For instance, you can:
- :seedling: annotate an intuitively selected group of data points at a time
- :ferris_wheel: throw a model in the loop and exploit active learning
- :whale: cross-check with Snorkel-based distant supervision
Check out @phurwicz/hover-binder for a complete list of demo apps.
:flags: Latest Announcements
- Dec 26 Major update coming soon:
  - Low-level APIs in hover.core.explorer will undergo significant refactoring. Watch out if you've been using them directly!
  - There will be a change log :sunglasses:
- Dec 16 We decided to make notebook tutorials instead of videos, because
  - hover is now embeddable in Jupyter notebooks, and
  - active development tends to outdate video tutorials rather quickly.
- Dec 12 Common usage workflows are now available in hover.recipes. Much cleaner code!
:flight_departure: Quick Start
Step 0: load your dataset
from hover.core.dataset import SupervisableTextDataset
dataset = SupervisableTextDataset(
    raw_dictl=[{"content": "this is great"}],  # the raw data to be supervised
    # train_dictl=[],  # train/dev/test sets can be empty
    dev_dictl=[{"content": "this is awesome", "mark": "A"}],
    test_dictl=[{"content": "this is meh", "mark": "B"}],
    feature_key="content",  # specify feature/label keys
    label_key="mark",
)
# define a vectorizer for your feature, then call dimensionality reduction
import spacy
nlp = spacy.load('en')  # on spaCy 3+, load a full package name instead, e.g. 'en_core_web_sm'
vectorizer = lambda text: nlp(text).vector # we recommend wrapping a @lru_cache around this
dataset.compute_2d_embedding(vectorizer, "umap")
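The comment above recommends wrapping the vectorizer in an lru_cache. A minimal sketch of that pattern follows. Here slow_vectorize is a hypothetical stand-in for an expensive call such as nlp(text).vector, used so the snippet runs without spaCy, and the cache size is an arbitrary choice.

```python
from functools import lru_cache

# Counts how often the expensive function really runs (for illustration only).
CALLS = {"count": 0}


def slow_vectorize(text):
    """Hypothetical stand-in for an expensive call such as nlp(text).vector."""
    CALLS["count"] += 1
    # Toy 3-dimensional "embedding": length, vowel count, space count.
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return (float(len(text)), float(vowels), float(text.count(" ")))


@lru_cache(maxsize=100_000)  # assumed size; tune to your dataset
def vectorizer(text):
    """Cached wrapper: repeated texts are only vectorized once."""
    return slow_vectorize(text)


vectorizer("this is great")
vectorizer("this is great")  # served from the cache
print(CALLS["count"])  # number of real computations
print(vectorizer.cache_info())  # hit/miss statistics from functools
```

With a real model, the body of slow_vectorize would be replaced by the actual embedding call. Keep in mind that a cache hit returns the very same object that was stored, so a cached array should not be mutated in place.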
Step 1: choose a recipe (or create your own with examples)
from hover.recipes import simple_annotator
handle = simple_annotator(dataset)
Step 2: fire it up
Hover uses bokeh to deliver its annotation interface:
option 1: in Jupyter
from bokeh.io import show, output_notebook
output_notebook()
show(handle)
option 2: with bokeh serve
from bokeh.io import curdoc
doc = curdoc()
handle(doc)
option 3: elsewhere as an embedded app
from bokeh.server.server import Server
server = Server({'/my-app': handle})
server.start()
:package: Installation
Python: 3.6+
OS: tested on Mac & Linux
To get the latest release version, you can use pip:
pip install hover
Installation through conda is not yet available. Please open an issue if you would like conda or conda-forge support.
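After installing, a quick sanity check is to ask Python whether it can locate the package in the current environment. The sketch below uses only the standard library; is_importable is a hypothetical helper name, not part of hover.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


if is_importable("hover"):
    print("hover is importable")
else:
    print("hover not found; try `pip install hover` in this environment")
```

This only confirms that the package can be found, not that every optional dependency is present.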
:flamingo: Features
Here we attempt a quick comparison with a few other packages that do machine teaching:
Package | Hover | Prodigy | Snorkel |
---|---|---|---|
Core idea | supervise like editing a picture | scriptable active learning | programmatic distant supervision |
Annotates per | batch of just the size you find right | piece predicted to be the most valuable | the whole dataset as long as it fits in |
Supports | all classification (text only atm) | text, images, audio, video, & more | text classification (for the most part) |
Status | open-source | proprietary | open-source |
Devs | indie | Explosion AI | Stanford / Snorkel AI |
Related | many imports of the awesome Bokeh | builds on the Thinc/spaCy stack | Variants: Snorkel Drybell, MeTaL, DeepDive |
Vanilla usage | define a vectorizer and annotate away | choose a base model and annotate away | define labeling functions and apply away |
Advanced usage | combine w/ active learning & Snorkel | patterns / transformers / custom models | transforming / slicing functions |
Hardcore usage | exploit hover.core templates | custom @prodigy.recipe | the upcoming Snorkel Flow |
Hover claims the best deal of scale vs. precision thanks to
- the flexibility to use, or not use, any technique beyond annotating on a "map";
- the speed, or coarseness, of annotation being literally at your fingertips;
- the interaction between multiple "maps" that each serve a different but connected purpose.
:book: Resources
Documentation
- Contains API references of the most crucial components.
- A lot more is on the way!
:bell: Remarks
Shoutouts
- Thanks to Bokeh because hover would not exist without linked plots and callbacks.
- Thanks to Philip Vollet for sharing hover with the community!
Dependencies
./requirements-dev.txt lists recommended packages for developers.
File details
Details for the file hover-0.4.0.tar.gz.
File metadata
- Download URL: hover-0.4.0.tar.gz
- Upload date:
- Size: 37.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.6.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2175fb7ecbf4e7a2d606cf606442263cfdd6f4e0b2886e7529927d2855dc8956 |
MD5 | 48096a94b50f60b1559f976c9091eeab |
BLAKE2b-256 | e33205dceb6d8a21db83dc0dc50acfa040a0219f2f4e4c491425a7270ebfdb1e |
File details
Details for the file hover-0.4.0-py3-none-any.whl.
File metadata
- Download URL: hover-0.4.0-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.6.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 5c44f392b259d906696eb9641138c53fedea6ac2e24a679b35a46a85912afdc5 |
MD5 | b65666cad4809360aa393188aeaf5eef |
BLAKE2b-256 | f3b22496b4176c17467f9db9f26abfcf6a4539c67c83d653a9b0a578935ca96e |
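If you download either file by hand, the SHA256 digests listed above can be checked locally. The following is a minimal sketch using only the standard library. The helper names sha256_of and matches_listed_digest are hypothetical, and the script assumes the downloaded files sit in the current working directory under their original names.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables above.
LISTED_SHA256 = {
    "hover-0.4.0.tar.gz": "2175fb7ecbf4e7a2d606cf606442263cfdd6f4e0b2886e7529927d2855dc8956",
    "hover-0.4.0-py3-none-any.whl": "5c44f392b259d906696eb9641138c53fedea6ac2e24a679b35a46a85912afdc5",
}


def sha256_of(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path):
    """Return True only if the file name is listed and its digest matches."""
    expected = LISTED_SHA256.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected


# Check whichever of the listed files are present in the current directory.
for filename in LISTED_SHA256:
    if os.path.exists(filename):
        status = "OK" if matches_listed_digest(filename) else "MISMATCH"
        print(filename, status)
```

A file whose name is not in the table is treated as unverified rather than as a match.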