Hovercraft-like machine learning
Hover
`Hover` is a machine learning helper library that enables smooth human supervision. In other words, it's an interface where you hover over and label your data... differently. Think driving a hovercraft compared with being on foot.
Hover caters to a variety of programming backgrounds: the "vanilla" use case should be friendly to a Python/Pandas beginner, and there are optional but powerful functionalities to pick up.
The vanilla use case
Docs
For documentation, please check out the mkdocs site.
Dependencies
`./requirements.txt` is for developers. Please refer to `hover/requirements.txt` for pip-installable dependencies. Note that dev dependencies and package dependencies are kept in separate files.
What is Hover?
Hover is designed for efficient machine teaching. It recognizes that efficiency can be measured differently depending on context, and accommodates your need to:
- coldstart a supervised model, with any amount of annotation (could be 0) available at first
- locate, investigate, and fix 'bad cases' of your model
- minimize costs (developer time) given desired gains (model performance)
- maximize gains (model performance) given budgeted costs (developer time)
The name comes from "patch a bunch of things together -- it works!" but it is also a reference to the World of Warcraft character Hover, a powerful amalgamation of vastly different pieces, alluding to an almost-alive intelligence built of not-so-alive components.
Here's a list of what currently gets patched together:
- distant supervision `LabelingFunction`s based on Snorkel but with additional attributes and functionality
- active learning, or just plain annotation, based on Prodigy but with much more flexible model architecture and format
- prior knowledge integration compatible with any pre-trained embedding / language model
- interactive visualization based on Bokeh specialized in data exploration, labeling function engineering, and neural net interpretation
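
To make the idea of a labeling function with "additional attributes" concrete, here is a minimal sketch in plain Python. It does not use Hover's or Snorkel's actual API: `SketchLabelingFunction`, `ABSTAIN`, and the attribute names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

ABSTAIN = None  # hypothetical sentinel meaning "this rule does not apply"


@dataclass
class SketchLabelingFunction:
    """Hypothetical wrapper: a labeling rule plus free-form attributes."""

    name: str
    rule: Callable[[str], Optional[str]]
    attributes: Dict[str, object] = field(default_factory=dict)

    def __call__(self, text: str) -> Optional[str]:
        return self.rule(text)


def mentions_refund(text: str) -> Optional[str]:
    # Vote "complaint" when the keyword appears, otherwise abstain.
    return "complaint" if "refund" in text.lower() else ABSTAIN


lf = SketchLabelingFunction(
    name="mentions_refund",
    rule=mentions_refund,
    attributes={"author": "user", "targets": ["complaint"]},
)

texts = ["I want a refund now", "Great service, thanks"]
votes = [lf(t) for t in texts]
print(votes)  # ['complaint', None]
```

A rule like this gives noisy labels cheaply; the attached attributes are one way a library could later filter, rank, or audit rules.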
Core Modules
- `hover.annotation` - for creating labeled data and accepting/rejecting `LabelingFunction`s.
- `hover.evaluation` - for assessing `LabelingFunction`s and samples for annotation.
- `hover.generation` - for creating `LabelingFunction`s and more.
- `hover.representation` - for representing collections of texts, vector transformations, and more.
- `hover.proposal` - for selecting `LabelingFunction`s and samples for annotation.
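
As a rough illustration of what assessing a labeling function can involve, the sketch below computes coverage and accuracy of a rule on a small labeled set. This is not the `hover.evaluation` API; `evaluate_rule` and the sample data are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def evaluate_rule(
    rule: Callable[[str], Optional[str]],
    labeled: List[Tuple[str, str]],
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy) of a rule on (text, label) pairs.

    Coverage is the fraction of samples where the rule does not abstain.
    Accuracy is computed over covered samples only; None if nothing is covered.
    """
    votes = [(rule(text), label) for text, label in labeled]
    covered = [(vote, label) for vote, label in votes if vote is not None]
    coverage = len(covered) / len(labeled) if labeled else 0.0
    if not covered:
        return coverage, None
    correct = sum(1 for vote, label in covered if vote == label)
    return coverage, correct / len(covered)


def has_question_mark(text: str) -> Optional[str]:
    # Hypothetical rule: any text containing "?" is voted a question.
    return "question" if "?" in text else None


dev_set = [
    ("Where is my order?", "question"),
    ("Thanks a lot", "praise"),
    ("Really? That is awful", "complaint"),
    ("It broke after a day", "complaint"),
]
coverage, accuracy = evaluate_rule(has_question_mark, dev_set)
print(coverage, accuracy)  # 0.5 0.5
```

A rule with high coverage but low accuracy is a candidate for rejection or refinement, while the samples it gets wrong are natural candidates for a closer look.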
High-level Usage
`hover.workflow` contains the lowest-level objects that most users have to interact with.
- `hover.workflow.Dataset` helps you manage your train/dev/test sets of data.
  - if your goal is to produce supervised data to feed to your own model, this is what you will eventually export.
- `hover.workflow.LabelingFunctionPopulation` maintains a healthy collection of both generated and user-defined labeling functions.
  - if your goal is to produce empirical rules or a `LabelModel` of Snorkel, this is what you will eventually export.
- `hover.workflow.Automated` puts `Dataset` and `LabelingFunctionPopulation` together, iterating and cross-checking back and forth.
  - it has a built-in model architecture, both for active learning and for establishing a baseline of model performance.
  - whether you care more about data or rules, this helps you greedily provide the most relevant supervision.
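
One common way to "greedily provide the most relevant supervision" in active learning is uncertainty sampling: annotate the samples the current model is least sure about. The sketch below shows that idea with predictive entropy. It is an assumption about what such a step could look like, not a description of how `Automated` is implemented; the function names and probabilities are made up.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions: Dict[str, List[float]], k: int) -> List[str]:
    """Greedily pick the k sample ids with the highest predictive entropy."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities from some baseline model.
predictions = {
    "a": [0.98, 0.01, 0.01],
    "b": [0.34, 0.33, 0.33],
    "c": [0.70, 0.20, 0.10],
}
to_annotate = most_uncertain(predictions, k=2)
print(to_annotate)  # ['b', 'c']
```

Sample `a` is skipped because the model is already confident about it, so annotating it would likely add little.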
Advanced Usage
- However, `Automated` should really be a template with replaceable parts -- that is, it incorporates the core modules by default, but allows users to extend any of the modules as long as the interfaces are compatible.
  - [dev] make an abstract base class as a parent class of `Automated`, thinking carefully about customization.
    - for example, `Automated` currently uses Prodigy for annotation, but one could use `hover.annotation.PromptCollector`, especially if Prodigy is not available. These two annotators behave quite differently, so the base class needs to find the greatest common factor.
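
The "template with replaceable parts" idea can be sketched with Python's `abc` module. Everything below is hypothetical: `BaseAnnotator`, `BaseWorkflow`, and the toy subclasses are illustrative names, not classes that exist in Hover, and the shared interface shown (samples in, id-to-label mapping out) is just one guess at a common factor between annotators.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Sample = Dict[str, str]


class BaseAnnotator(ABC):
    """Hypothetical common interface: take samples, return id -> label."""

    @abstractmethod
    def annotate(self, samples: Iterable[Sample]) -> Dict[str, str]:
        ...


class ScriptedAnnotator(BaseAnnotator):
    """Stand-in annotator that replays pre-written answers."""

    def __init__(self, answers: Dict[str, str]):
        self.answers = answers

    def annotate(self, samples: Iterable[Sample]) -> Dict[str, str]:
        return {s["id"]: self.answers[s["id"]] for s in samples if s["id"] in self.answers}


class BaseWorkflow(ABC):
    """Hypothetical template: fixed loop, replaceable annotator and proposal."""

    def __init__(self, annotator: BaseAnnotator):
        self.annotator = annotator
        self.labels: Dict[str, str] = {}

    @abstractmethod
    def propose(self, pool: List[Sample]) -> List[Sample]:
        """Choose which unlabeled samples to send to the annotator."""

    def step(self, pool: List[Sample]) -> Dict[str, str]:
        unlabeled = [s for s in pool if s["id"] not in self.labels]
        self.labels.update(self.annotator.annotate(self.propose(unlabeled)))
        return self.labels


class FirstTwoWorkflow(BaseWorkflow):
    def propose(self, pool: List[Sample]) -> List[Sample]:
        return pool[:2]  # trivial strategy, for illustration only


pool = [{"id": "1", "text": "hi"}, {"id": "2", "text": "refund"}, {"id": "3", "text": "bye"}]
answers = {"1": "greeting", "2": "complaint", "3": "greeting"}
workflow = FirstTwoWorkflow(ScriptedAnnotator(answers))
workflow.step(pool)
print(workflow.step(pool))  # {'1': 'greeting', '2': 'complaint', '3': 'greeting'}
```

Swapping in a different annotator only requires another `BaseAnnotator` subclass; the loop in `step` stays untouched.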
Project layout
    mkdocs.yml    # The configuration file.
    docs/
        index.md  # The documentation homepage.
        ...       # Other markdown pages, images and other files.