Hovercraft-like machine learning

Hover

Hover is a machine learning helper library that enables smooth human supervision. In other words, it's an interface where you hover over your data and label it in a different way. Think of driving a hovercraft compared with being on foot.

Hover caters to a variety of programming backgrounds: the "vanilla" use case should be friendly to a Python/Pandas beginner, and there are optional but powerful functionalities to pick up.

The vanilla use case

Docs

For documentation, please check out the mkdocs site.

Dependencies

Development and package dependencies are kept in separate files:

  • ./requirements.txt lists the dependencies for developers.
  • hover/requirements.txt lists the pip-installable package dependencies.

What is Hover?

Hover is designed for efficient machine teaching. It recognizes that efficiency is measured differently in different contexts, and accommodates your needs to:

  • cold-start a supervised model with any amount of annotation available at first (possibly none)
  • locate, investigate, and fix 'bad cases' of your model
  • minimize costs (developer time) given desired gains (model performance)
  • maximize gains (model performance) given budgeted costs (developer time)

The name comes from "patch a bunch of things together -- it works!" but it is also a reference to the World of Warcraft character Hover, a powerful amalgamation of vastly different pieces, alluding to an almost-alive intelligence built of not-so-alive components.

Here's a list of what currently gets patched together:

  • distant supervision LabelingFunctions based on Snorkel but with additional attributes and functionality
  • active learning, or just plain annotation, based on Prodigy but with much more flexible model architecture and format
  • prior knowledge integration compatible with any pre-trained embedding / language model
  • interactive visualization based on Bokeh specialized in data exploration, labeling function engineering, and neural net interpretation
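
To make the first item concrete, here is a minimal sketch of what a labeling function with extra attributes could look like. It uses only the standard library; `ToyLabelingFunction`, its `targets` attribute, and `keyword_refund` are hypothetical names for illustration and are not the classes provided by Hover or Snorkel.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

ABSTAIN = None  # convention in this sketch only: None means "no vote"


@dataclass
class ToyLabelingFunction:
    """Hypothetical stand-in for a labeling function (not Hover's class)."""

    name: str
    func: Callable[[str], Optional[str]]
    # Example of an "additional attribute": the labels this rule may emit.
    targets: List[str] = field(default_factory=list)

    def __call__(self, text: str) -> Optional[str]:
        return self.func(text)


def keyword_refund(text: str) -> Optional[str]:
    # Vote "billing" when the text mentions a refund; otherwise abstain.
    return "billing" if "refund" in text.lower() else ABSTAIN


lf_refund = ToyLabelingFunction(
    name="keyword_refund", func=keyword_refund, targets=["billing"]
)
votes = [lf_refund(t) for t in ["I want a REFUND now", "The app crashes on launch"]]
```

The rule votes on the first text and abstains on the second; distant supervision combines many such noisy, partial votes instead of relying on hand labels alone.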

Core Modules

  • hover.annotation - for creating labeled data and accepting/rejecting LabelingFunctions.
  • hover.evaluation - for assessing LabelingFunctions and samples for annotation.
  • hover.generation - for creating LabelingFunctions and more.
  • hover.representation - for representing collections of texts, vector transformations, and more.
  • hover.proposal - for selecting LabelingFunctions and samples for annotation.

High-level Usage

  • hover.workflow contains the lowest-level objects that most users have to interact with.
    • hover.workflow.Dataset helps you manage your train/dev/test sets of data.
      • if your goal is to produce supervised data to feed to your own model, this is what you will eventually export.
    • hover.workflow.LabelingFunctionPopulation maintains a healthy collection of both generated and user-defined labeling functions.
      • if your goal is to produce empirical rules or a Snorkel LabelModel, this is what you will eventually export.
    • hover.workflow.Automated puts Dataset and LabelingFunctionPopulation together, iterating and cross-checking back and forth.
      • it has a built-in model architecture, both for active learning and for establishing a baseline of model performance.
      • whether you care more about data or rules, this helps you greedily provide the most relevant supervision.
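
The cross-checking idea described above can be illustrated with a small, self-contained sketch. It does not use the hover.workflow API; the helper names (`rule_accuracy`, `prune_rules`, `uncovered_texts`) and the 0.8 accuracy threshold are assumptions made for this example. Labeled data is used to prune weak rules, and the surviving rules are used to find texts that still need annotation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rule = Callable[[str], Optional[str]]  # returns a label, or None to abstain


def rule_accuracy(rule: Rule, labeled: List[Tuple[str, str]]) -> Optional[float]:
    """Accuracy over the labeled examples the rule does not abstain on."""
    fired = [(rule(text), label) for text, label in labeled]
    fired = [(vote, label) for vote, label in fired if vote is not None]
    if not fired:
        return None  # the rule never fired, so it cannot be judged
    return sum(vote == label for vote, label in fired) / len(fired)


def prune_rules(
    rules: Dict[str, Rule], labeled: List[Tuple[str, str]], min_accuracy: float = 0.8
) -> Dict[str, Rule]:
    """Data checks rules: keep rules that are unjudged or accurate enough."""
    kept = {}
    for name, rule in rules.items():
        acc = rule_accuracy(rule, labeled)
        if acc is None or acc >= min_accuracy:
            kept[name] = rule
    return kept


def uncovered_texts(rules: Dict[str, Rule], unlabeled: List[str]) -> List[str]:
    """Rules check data: texts no rule votes on are candidates for annotation."""
    return [t for t in unlabeled if all(r(t) is None for r in rules.values())]


rules = {
    "has_refund": lambda t: "billing" if "refund" in t else None,
    "has_crash": lambda t: "bug" if "crash" in t else None,
}
dev = [
    ("refund please", "billing"),
    ("app crashes", "bug"),
    ("refund my crash report", "bug"),
]
kept = prune_rules(rules, dev)
to_annotate = uncovered_texts(kept, ["crash on login", "how do I change my email"])
```

Here `has_refund` is right on only one of the two examples it fires on and is pruned, while `has_crash` survives; the text that no surviving rule covers is the one worth sending to an annotator next.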

Advanced Usage

  • However, Automated should really be a template with replaceable parts -- that is, it incorporates the core modules by default, but allows users to extend any of the modules as long as the interfaces are compatible.
    • [dev] make an abstract base class as a parent class of Automated, thinking carefully about customization.
      • for example, Automated currently uses Prodigy for annotation, but one could use hover.annotation.PromptCollector, especially if Prodigy is not available. These two annotators behave quite differently, so the base class needs to find the greatest common factor.
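
One possible shape for the abstract base class mentioned in the [dev] note is sketched below. This is an illustration under stated assumptions, not the project's design: `BaseAnnotator`, `ScriptedAnnotator`, and `ToyWorkflow` are hypothetical names, and the shared interface is reduced to a single `annotate` method as a guess at the "greatest common factor" between annotators.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class BaseAnnotator(ABC):
    """Hypothetical common interface that any annotator would implement."""

    @abstractmethod
    def annotate(self, texts: Iterable[str]) -> Dict[str, str]:
        """Return a text -> label mapping for the texts that got labeled."""


class ScriptedAnnotator(BaseAnnotator):
    """Answers from a fixed lookup table, standing in for a human or a tool."""

    def __init__(self, answers: Dict[str, str]) -> None:
        self.answers = dict(answers)

    def annotate(self, texts: Iterable[str]) -> Dict[str, str]:
        # Texts without a scripted answer are skipped, like a rejected prompt.
        return {t: self.answers[t] for t in texts if t in self.answers}


class ToyWorkflow:
    """Template with a replaceable part: any BaseAnnotator can be plugged in."""

    def __init__(self, annotator: BaseAnnotator) -> None:
        self.annotator = annotator
        self.labeled: Dict[str, str] = {}

    def step(self, batch: Iterable[str]) -> int:
        new_labels = self.annotator.annotate(batch)
        self.labeled.update(new_labels)
        return len(new_labels)  # number of examples labeled in this step


workflow = ToyWorkflow(ScriptedAnnotator({"refund please": "billing"}))
added = workflow.step(["refund please", "app crashes"])
```

Because the workflow depends only on the abstract interface, swapping one annotator for another would mean writing a new subclass rather than changing the workflow itself.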

Project layout

mkdocs.yml    # The configuration file.
docs/
    index.md  # The documentation homepage.
    ...       # Other markdown pages, images and other files.
