A tool for Deep Active Learning
Deep Active Learning Library
Introduction
Supervised machine learning models are trained to map input data to desired outputs. Often, unlabeled samples are abundant, but obtaining the corresponding labels is costly: it may require human annotators, extensive compute, or a substantial amount of time. In these scenarios, it pays to allocate labeling resources carefully and to prioritize the samples that, once labeled and trained on, will yield the largest improvements in model quality. The problem of identifying these high-information samples, given a constraint on how much data we are willing to have labeled, is referred to as active learning.
Active learning problems are ubiquitous in practical, real-world machine learning deployments. Building on an array of state-of-the-art algorithms developed in our lab, this library provides a general-purpose tool for active learning with deep neural networks.
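To make the setup concrete, below is a minimal sketch of one simple selection heuristic, entropy-based uncertainty sampling: score each unlabeled sample by the model's predictive entropy and send the highest-scoring ones for labeling. This sketch is purely illustrative and is not the library's own algorithm (the library implements the batch strategies cited under References); it assumes model(inputs) returns a tensor of class logits.
import torch

def select_most_uncertain(model, inputs, budget):
    """Return indices of the `budget` unlabeled samples the model is least certain about."""
    model.eval()
    with torch.no_grad():
        logits = model(inputs)                       # assumed shape: (N, num_classes)
        probs = torch.softmax(logits, dim=-1)
        # Predictive entropy per sample; higher entropy = more uncertain.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return torch.topk(entropy, k=budget).indices     # the most uncertain samples
In practice, batch-mode methods such as those listed in the References also encourage diversity within the selected batch rather than ranking samples by uncertainty alone.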
Why Does it Matter?
Active learning matters because it allows us to train higher-performing models at a reduced cost. While the potential applications are abundant, the underlying technology is somewhat general, allowing us to build one tool that can handle a wide array of use cases.
One big active learning success at Microsoft involves a Bing language model called "RankLM." RankLM predicts the quality of a search query-result pair, and obtaining these labels for training is costly — requiring either human annotators or compute-intensive models. By using active learning to construct an information-dense training set, the Bing team was able to obtain a significant boost in the predictive quality of RankLM.
Build & Installation
Users can build a wheel package and install it as follows.
python -m pip install --upgrade build
python -m build
pip install dist/active_learn-*.whl
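As an optional sanity check, you can confirm that the installed package imports correctly (this just exercises the import shown in the example below):
python -c "from active_learn import ActiveSampler; print(ActiveSampler)"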
Example Usage
Any torch.nn.Module can be used with the library. See a demo here.
pip install -r examples/requirements.txt
python examples/demo.py
Using a pretrained model (TBD)
from active_learn import ActiveSampler
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import load_dataset
# 1) Load a pretrained sentiment classifier
model_name = "finiteautomata/bertweet-base-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# 2) Load a new sentiment dataset unlike what the model was trained on.
#    Here we pretend labels are not available, and we want to identify the
#    most useful 'n' samples to send to an expert to have labeled before
#    integrating them into our training data.
samples = load_dataset("glue", "sst2")  # e.g. SST-2; any dataset with a text column works
sentences = samples['train']['sentence']
# 3) Get 100 most valuable samples
sampler = ActiveSampler('classification', (model, tokenizer), 100)
valuable_samples = sampler.select(sentences)
# 4) Now get these new samples labeled and update your model!
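One possible continuation of step 4 is sketched below. It assumes valuable_samples holds the selected sentences themselves (adapt accordingly if select returns indices), uses a hypothetical get_labels_from_annotator stand-in for the human labeling step, and fine-tunes with the standard Hugging Face Trainer rather than any API of this library.
# Hypothetical continuation of step 4: label the selected sentences, then fine-tune.
from datasets import Dataset
from transformers import Trainer, TrainingArguments

labels = get_labels_from_annotator(valuable_samples)  # hypothetical labeling oracle
data = dict(tokenizer(valuable_samples, truncation=True, padding=True))
data["label"] = labels
train_ds = Dataset.from_dict(data)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="updated_model", num_train_epochs=1),
    train_dataset=train_ds,
)
trainer.train()  # the model now reflects the newly labeled, high-value samples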
API Overview
See the detailed API description here.
Roadmap
See a list of future work items here.
References
Ash, Jordan T., et al. "Deep batch active learning by diverse, uncertain gradient lower bounds." International Conference on Learning Representations. 2020.
Ash, Jordan T., et al. "Gone fishing: Neural active learning with fisher embeddings." Advances in Neural Information Processing Systems. 2021.
Saran, Akanksha, et al. "Streaming Active Learning with Deep Neural Networks." International Conference on Machine Learning. 2023.