A Distributed MLOps System for Efficient Active Learning
ALaaS: Active Learning as a Service.
Active Learning as a Service (ALaaS) is a fast and scalable framework for automatically selecting a subset of a full dataset to be labeled, so as to reduce labeling cost. It provides an out-of-the-box, standalone experience that lets users apply active learning quickly.
ALaaS features:
- :hatching_chick: Easy-to-use Start the system and apply active learning with <10 lines of code.
- :rocket: Fast Uses stage-level parallelism to achieve over 10x speedup compared with an under-optimized active learning process.
- :collision: Elastic Scales the number of active workers up and down, depending on the number of GPU devices.
The project is still under active development. Welcome to join us!
Try It Out :coffee:
Free ALaaS demo on AWS
Use least confidence sampling with ResNet-18 to select images to be labeled for your tasks!
We have deployed ALaaS on AWS for demonstration. Try it yourself!
call ALaaS with HTTP 🌐 |
---|
curl \
-X POST http://13.213.29.8:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
"parameters": {"budget": 3},
"execEndpoint":"/query"}'
|
call ALaaS with gRPC 🔐 |
from alaas.client import Client
url_list = [
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('grpc://13.213.29.8:60035')
print(client.query_by_uri(url_list, budget=3))
|
You will then see that ALaaS has selected 3 data samples (the most informative ones) from the 5 data points.
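If curl is not available, the same HTTP call can be made from Python. The following is a minimal sketch using only the standard library; it mirrors the JSON body of the curl example above. The helper names `build_query_request` and `send_query` are illustrative and not part of the `alaas` package, the demo address may no longer be online, and the response schema is not documented here, so the sketch simply returns the decoded JSON.

```python
import json
import urllib.request

# Demo endpoint copied from the curl example; it may no longer be reachable.
ALAAS_HTTP_URL = "http://13.213.29.8:8081/post"


def build_query_request(uris, budget, url=ALAAS_HTTP_URL):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "data": [{"uri": uri} for uri in uris],
        "parameters": {"budget": budget},
        "execEndpoint": "/query",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(request, timeout=30):
    """Send the request and return the decoded JSON response (schema not assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_query(build_query_request(url_list, budget=3))` with the five CIFAR-10 sample URLs would then issue the same request as the curl command.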
Installation :construction:
You can easily install ALaaS from PyPI:
pip install alaas
The ALaaS package contains both the client and the server. You can build an active data selection service on your own servers, or just use the client to perform data selection.
:warning: Deep learning frameworks such as TensorFlow and PyTorch may need to be installed manually, since the version required by your deployment can differ.
Step by Step
0. Start the active learning server
You need to start an active learning server before conducting the data selection.
from alaas import Server
Server(config_path='./your_config.yml').start()
Instructions for customizing the configuration for your deployment scenario can be found here.
1. Query data from the client
You can start the data selection with the following code:
from alaas.client import Client
client = Client('http://0.0.0.0:60035')
queries = client.query_by_uri(<url_list>, budget=<budget number>)
The output is a subset of the URIs/data in your request: the samples selected for further labeling.
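As a rough illustration of how a budget turns a pool of URIs into a subset for labeling, the sketch below ranks items by least confidence (one of the supported strategies) and keeps the top `budget`. The probabilities are made up, and the helpers `least_confidence` and `select_for_labeling` are hypothetical; they are not part of the `alaas` package, where the server runs a model to obtain the predictions.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the top predicted class."""
    return 1.0 - max(probs)


def select_for_labeling(predictions, budget):
    """Return the `budget` URIs whose predictions are least confident.

    `predictions` maps each URI to a list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda uri: least_confidence(predictions[uri]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical softmax outputs for five unlabeled images (three classes).
predictions = {
    "airplane1.png": [0.98, 0.01, 0.01],
    "airplane2.png": [0.40, 0.35, 0.25],
    "airplane3.png": [0.70, 0.20, 0.10],
    "airplane4.png": [0.55, 0.30, 0.15],
    "airplane5.png": [0.90, 0.05, 0.05],
}

selected = select_for_labeling(predictions, budget=3)
print(selected)  # ['airplane2.png', 'airplane4.png', 'airplane3.png']
```

The confidently classified images (`airplane1.png`, `airplane5.png`) are left out, since labeling them is expected to teach the model the least.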
Supported Strategies :art:
Currently we support several active learning strategies, shown in the following table:
Type | Setting | Abbr | Strategy | Year | Reference |
---|---|---|---|---|---|
Random | Pool | RS | Random Sampling | - | - |
Uncertainty | Pool | LC | Least Confidence Sampling | 1994 | DD Lewis et al. |
Uncertainty | Pool | MC | Margin Confidence Sampling | 2001 | T Scheffer et al. |
Uncertainty | Pool | RC | Ratio Confidence Sampling | 2009 | B Settles et al. |
Uncertainty | Pool | VRC | Variation Ratios Sampling | 1965 | EH Johnson et al. |
Uncertainty | Pool | ES | Entropy Sampling | 2009 | B Settles et al. |
Uncertainty | Pool | MSTD | Mean Standard Deviation | 2016 | M Kampffmeyer et al. |
Uncertainty | Pool | BALD | Bayesian Active Learning by Disagreement | 2017 | Y Gal et al. |
Clustering | Pool | KCG | K-Center Greedy Sampling | 2017 | Ozan Sener et al. |
Clustering | Pool | KM | K-Means Sampling | 2011 | Z Bodó et al. |
Clustering | Pool | CS | Core-Set Selection Approach | 2018 | Ozan Sener et al. |
Diversity | Pool | DBAL | Diverse Mini-batch Sampling | 2019 | Fedor Zhdanov |
Adversarial | Pool | DFAL | DeepFool Active Learning | 2018 | M Ducoffe et al. |
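To make the difference between some of the uncertainty strategies in the table concrete, here is a minimal sketch of margin confidence, ratio confidence, and entropy scoring over class probabilities. The conventions used (higher score means more uncertain) and the helper names are assumptions for illustration; ALaaS may implement these strategies differently.

```python
import math


def margin_score(probs):
    """1 minus the gap between the two most likely classes (needs >= 2 classes)."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def ratio_score(probs):
    """Ratio of the second most likely class to the most likely one."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def entropy_score(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def top_k(pool, score_fn, budget):
    """Indices of the `budget` most uncertain items under `score_fn`."""
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    return ranked[:budget]


# Hypothetical predictions for three unlabeled items.
pool = [
    [0.50, 0.50, 0.00],  # two classes tied, third ruled out
    [0.40, 0.30, 0.30],  # probability spread over all classes
    [0.90, 0.05, 0.05],  # confident prediction
]

print(top_k(pool, margin_score, 1))   # [0]
print(top_k(pool, ratio_score, 1))    # [0]
print(top_k(pool, entropy_score, 1))  # [1]
```

Margin and ratio confidence look only at the top two classes, so they pick the tied item, while entropy takes the whole distribution into account and picks the item whose probability is spread over all three classes.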
Citation
Our tech report is available on arXiv. Please cite as:
@article{huang2022active,
title={Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI},
author={Huang, Yizheng and Zhang, Huaizheng and Li, Yuanming and Lau, Chiew Tong and You, Yang},
journal={arXiv preprint arXiv:2207.09109},
year={2022}
}
Contributors ✨
Thanks goes to these wonderful people (emoji key):
Yizheng Huang 🚇 ⚠️ 💻 |
Huaizheng 🖋 ⚠️ 📖 |
Yuanming Li ⚠️ 💻 |
This project follows the all-contributors specification. Contributions of any kind welcome!
Acknowledgement
- Jina - Build cross-modal and multimodal applications on the cloud
License
The project is available as open source under the terms of the Apache 2.0 License.