Hovercraft-like machine learning

Hover

Hover is a machine learning helper library that enables smooth human supervision. In other words, it is an interface where you hover over your data and label it, just differently: think of driving a hovercraft compared with being on foot.

Hover caters to a variety of programming backgrounds: the "vanilla" use case should be friendly to a Python/Pandas beginner, and there are optional but powerful functionalities to pick up.

The vanilla use case

Docs

For documentation, please check out the mkdocs site.

Dependencies

Development dependencies and package dependencies are kept in separate files:

./requirements.txt lists the dependencies for developers.

hover/requirements.txt lists the pip-installable package dependencies.

What is Hover?

Hover is designed for efficient machine teaching. It recognizes that efficiency is measured differently in different contexts, and accommodates your need to:

  • cold-start a supervised model, with any amount of annotation (possibly none) available at first
  • locate, investigate, and fix 'bad cases' of your model
  • minimize costs (developer time) given desired gains (model performance)
  • maximize gains (model performance) given budgeted costs (developer time)

The name comes from "patch a bunch of things together -- it works!" but it is also a reference to the World of Warcraft character Hover, a powerful amalgamation of vastly different pieces, alluding to an almost-alive intelligence built of not-so-alive components.

Here's a list of what currently gets patched together:

  • distant supervision LabelingFunctions based on Snorkel but with additional attributes and functionality
  • active learning, or just plain annotation, based on Prodigy but with much more flexible model architecture and format
  • prior knowledge integration compatible with any pre-trained embedding / language model
  • interactive visualization based on Bokeh specialized in data exploration, labeling function engineering, and neural net interpretation
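
To make the distant-supervision item more concrete, here is a minimal plain-Python sketch of the general idea behind a labeling function: it looks at one example and either votes for a label or abstains. This is not Hover's or Snorkel's API. The names `ABSTAIN`, `BILLING`, `OTHER`, `lf_contains_refund`, `lf_short_greeting`, and `apply_lfs` are all hypothetical and exist only for illustration.

```python
# Hypothetical label values, not taken from Hover or Snorkel.
ABSTAIN = -1  # sentinel meaning "this function casts no vote"
BILLING, OTHER = 0, 1


def lf_contains_refund(text):
    """Vote BILLING when the text mentions a refund, otherwise abstain."""
    return BILLING if "refund" in text.lower() else ABSTAIN


def lf_short_greeting(text):
    """Vote OTHER for very short texts that open with a greeting."""
    is_short = len(text.split()) <= 3
    is_greeting = text.lower().startswith(("hi", "hello"))
    return OTHER if is_short and is_greeting else ABSTAIN


def apply_lfs(lfs, texts):
    """Return one row of votes per text, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


votes = apply_lfs(
    [lf_contains_refund, lf_short_greeting],
    ["I want a refund", "Hello there", "Where is my order?"],
)
```

Each row of `votes` holds the opinions of all labeling functions on one text. A row made up entirely of `ABSTAIN` marks a sample that the current rules do not cover.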

Core Modules

  • hover.annotation - for creating labeled data and accepting/rejecting LabelingFunctions.
  • hover.evaluation - for assessing LabelingFunctions and samples for annotation.
  • hover.generation - for creating LabelingFunctions and more.
  • hover.representation - for representing collections of texts, vector transformations, and more.
  • hover.proposal - for selecting LabelingFunctions and samples for annotation.
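
As a rough illustration of what "assessing LabelingFunctions" can mean, the sketch below computes two common statistics for a single labeling function against a small hand-labeled set: coverage (how often it votes) and accuracy (how often its votes are right). This is an assumption about the kind of metric involved, not the actual contents of hover.evaluation. The function name `coverage_and_accuracy` and the `ABSTAIN` sentinel are hypothetical.

```python
ABSTAIN = -1  # hypothetical sentinel meaning "no vote"


def coverage_and_accuracy(votes, gold):
    """Score one labeling function.

    votes: labels emitted by the function, one per sample (ABSTAIN allowed).
    gold:  trusted labels for the same samples.
    Returns (coverage, accuracy); accuracy is None if the function never voted.
    """
    fired = [(v, g) for v, g in zip(votes, gold) if v != ABSTAIN]
    coverage = len(fired) / len(votes) if votes else 0.0
    accuracy = sum(v == g for v, g in fired) / len(fired) if fired else None
    return coverage, accuracy


coverage, accuracy = coverage_and_accuracy([0, -1, 1, 0], [0, 1, 1, 1])
```

Here the function votes on three of the four samples and two of those three votes match the trusted labels.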

High-level Usage

  • hover.workflow contains the lowest-level objects that most users have to interact with.
    • hover.workflow.Dataset helps you manage your train/dev/test sets of data.
      • if your goal is to produce supervised data to feed to your own model, this is what you will eventually export.
    • hover.workflow.LabelingFunctionPopulation maintains a healthy collection of both generated and user-defined labeling functions.
      • if your goal is to produce empirical rules or a Snorkel LabelModel, this is what you will eventually export.
    • hover.workflow.Automated puts Dataset and LabelingFunctionPopulation together, iterating and cross-checking back and forth.
      • it has a built-in model architecture, both for active learning and for establishing a baseline of model performance.
      • whether you care more about data or rules, this helps you greedily provide the most relevant supervision.
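
The text above does not say how the most relevant samples are chosen for annotation. One common active-learning strategy is uncertainty sampling, sketched below under that assumption: the samples whose predicted class probabilities have the highest entropy are proposed first. The names `entropy` and `most_uncertain` are hypothetical and are not part of hover.workflow.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions, k=1):
    """Pick the k sample ids whose predictions are least certain.

    predictions: dict mapping a sample id to its list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:k]


predictions = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.7, 0.3]}
picked = most_uncertain(predictions, k=2)
```

With these toy probabilities, sample "b" is ranked first because the model is split evenly between the two classes, followed by "c".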

Advanced Usage

  • However, Automated should really be a template with replaceable parts -- that is, it incorporates the core modules by default, but allows users to extend any of the modules as long as the interfaces are compatible.
    • [dev] make an abstract base class as a parent class of Automated, thinking carefully about customization.
      • for example, Automated currently uses Prodigy for annotation, but one could use hover.annotation.PromptCollector, especially if Prodigy is not available. These two annotators behave quite differently, so the base class needs to find the greatest common factor.
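
The [dev] note above can be pictured with a small sketch that uses Python's standard `abc` module: a workflow depends only on a minimal annotator interface, so different annotators can be swapped in. Everything here is hypothetical, including `BaseAnnotator`, `ScriptedAnnotator`, and `Workflow`. It does not reflect how Automated, Prodigy, or hover.annotation.PromptCollector are actually implemented.

```python
from abc import ABC, abstractmethod


class BaseAnnotator(ABC):
    """Hypothetical 'greatest common factor' shared by all annotators."""

    @abstractmethod
    def annotate(self, samples):
        """Return a dict mapping each sample to a label, or None if skipped."""


class ScriptedAnnotator(BaseAnnotator):
    """Stand-in that answers from a lookup table instead of asking a human."""

    def __init__(self, answers):
        self.answers = answers

    def annotate(self, samples):
        return {sample: self.answers.get(sample) for sample in samples}


class Workflow:
    """Toy workflow that accepts any annotator honoring the base interface."""

    def __init__(self, annotator):
        if not isinstance(annotator, BaseAnnotator):
            raise TypeError("annotator must implement BaseAnnotator")
        self.annotator = annotator
        self.labeled = {}

    def step(self, samples):
        """Ask the annotator about samples and keep only the labeled ones."""
        answers = self.annotator.annotate(samples)
        new = {s: label for s, label in answers.items() if label is not None}
        self.labeled.update(new)
        return new


workflow = Workflow(ScriptedAnnotator({"great product": "positive"}))
new_labels = workflow.step(["great product", "unclear text"])
```

Keeping the interface down to a single method is one way to accommodate annotators that behave very differently, at the cost of hiding any features that only one of them supports.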

Project layout

mkdocs.yml    # The configuration file.
docs/
    index.md  # The documentation homepage.
    ...       # Other markdown pages, images and other files.

Download files

Source Distribution

hover-0.1.0a0.tar.gz (50.3 kB)

Uploaded Source

Built Distribution

hover-0.1.0a0-py3-none-any.whl (59.8 kB)

Uploaded Python 3
