Skip to main content

A tool for Deep Active Learning

Project description

Deep Active Learning Library

Introduction

Supervised machine learning models are trained to map input data to desired outputs. Often, unlabeled samples are abundant, but obtaining corresponding labels is costly. In particular, acquiring these labels may require human annotators, extensive compute, or a substantial amount of time. In these scenarios, it's best to think carefully about how we allocate these scarce resources, and to prioritize labeling samples that will, once labeled and trained on, facilitate the largest improvements in model quality. The problem of identifying these high-information samples, given a constraint on how much data we are willing to have labeled, is referred to as Active Learning.

Active learning problems are ubiquitous in practical, real-world machine learning deployments. Building on an array of state-of-the-art algorithms developed in our lab, this library provides a general-purpose tool for active learning with deep neural networks.

active learning

Why Does it Matter?

Active learning matters because it allows us to train higher-performing models at a reduced cost. While the potential applications are abundant, the underlying technology is somewhat general, allowing us to build one tool that can handle a wide array of use cases.

One big active learning success at Microsoft involves a Bing language model called "RankLM." RankLM predicts the quality of a search query-result pair, and obtaining these labels for training is costly — requiring either human annotators or compute-intensive models. By using active learning to construct an information-dense training set, the Bing team was able to obtain a significant boost in the predictive quality of RankLM.

Build & Installation

Users can build a wheel package and use it as follows.

python -m pip install --upgrade build
python -m build

pip install dist/active_learn-*.whl

Example Usage

Users can use any torch.nn module with the library. See a demo here.

pip install -r examples/requirements.txt
python examples/demo.py

Using a pretrained model (TBD)

from active_learn import ActiveSampler

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset

# 1) Load a pretrained sentiment classifier
model_name = "finiteautomata/bertweet-base-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 2) Load a new sentiment dataset unlike what the model was trained on.
#    -
#    Here we're pretending labels are not available, and we want to identify the
#    most useful 'n' samples to send to an expert to have labeled before
#    integrating # them into our training data
sentences = samples['train']['sentence']

# 3) Get 100 most valuable samples
sampler = ActiveSampler('classification', (model, tokenizer), 100)
valuable_samples = sampler.select(sentences)

# 4) Now get these new samples labeled and update your model!

API overview

See detailed API description here.

Roadmap

See a list of future work items here.

References

Ash, Jordan T., et al. "Deep batch active learning by diverse, uncertain gradient lower bounds." International Conference on Learning Representations. 2020.

Ash, Jordan T., et al. "Gone fishing: Neural active learning with fisher embeddings." Advances in Neural Information Processing Systems. 2021.

Saran, Akanksha, et al. "Streaming Active Learning with Deep Neural Networks." International Conference on Machine Learning. 2023.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

active_learn-0.0.0.tar.gz (805.5 kB view details)

Uploaded Source

Built Distribution

active_learn-0.0.0-py3-none-any.whl (7.8 kB view details)

Uploaded Python 3

File details

Details for the file active_learn-0.0.0.tar.gz.

File metadata

  • Download URL: active_learn-0.0.0.tar.gz
  • Upload date:
  • Size: 805.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for active_learn-0.0.0.tar.gz
Algorithm Hash digest
SHA256 6e409b6536fcafd2832ed05697364c4286cee51647c246e21ebbf19c3fa6037e
MD5 0c963b8874296da6cf27dedd13d99d68
BLAKE2b-256 472508971cb5685dc77a5d1fc7e00ce694949debf1809eb87cbfbc8c72e94da0

See more details on using hashes here.

File details

Details for the file active_learn-0.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for active_learn-0.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6e23f79e9e283fe5ef1b64d0887594356541f3431a4826c1699eaf64a0de043d
MD5 357d48a8d0eeb34aec2c49ca9c36395c
BLAKE2b-256 1698711ade516fc68bbefce55e42a6524aced6a219d2f8e5b31c81c3d1971703

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page