A Distributed MLOps System for Efficient Active Learning
ALaaS: Active Learning as a Service.
Active Learning as a Service (ALaaS) is a fast and scalable service framework for selecting data before human labeling. It can be easily integrated with existing data processing and labeling platforms as a microservice.
ALaaS features:
- :rocket: Fast: Uses stage-level parallelism to achieve over 10x speedup compared with a conventional active learning process.
- :collision: Elastic: Scales active workers up and down on single or multiple GPU devices.
- :hatching_chick: Easy-to-use: Start APIs that prototype an active learning workflow in fewer than 10 lines of code.
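Stage-level parallelism can be pictured as a pipeline in which each stage (for example download, preprocess, inference) runs in its own worker and hands items to the next stage through a queue, so one sample can be downloaded while another is being scored. The following is a minimal sketch of that idea with plain Python threads. It is not ALaaS's actual implementation: `run_pipeline` and the three stage functions are hypothetical placeholders, and error handling is omitted.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def _stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, stage_funcs):
    """Run items through stage_funcs, one thread per stage, preserving order."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stage_funcs)
    ]
    for thread in threads:
        thread.start()

    # Feed the first stage, then signal that no more items will arrive.
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)

    # Drain the last queue until the sentinel comes through.
    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for the download, preprocess, and inference stages.
def download(uri):
    return f"bytes({uri})"


def preprocess(raw):
    return raw.upper()


def infer(tensor):
    return len(tensor)
```

Calling `run_pipeline(uris, [download, preprocess, infer])` keeps all three stages busy at once; with one thread per stage and FIFO queues, results come back in input order.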
Try It Out
You may just want to use a pre-trained model as the active data selector to pick the most informative samples from an unlabeled data pool. We host a CPU-based demonstration server for data selection (least confidence strategy with ResNet-18). Try it yourself!
HTTP:
curl \
-X POST http://13.213.8.21:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
"parameters": {"budget": 3},
"execEndpoint":"/query"}'
gRPC:
from alaas.client import Client
url_list = [
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('grpc://13.213.8.21:60035')
print(client.query_by_uri(url_list, budget=3))
You will then see that the active learner has selected 3 of the 5 data samples.
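The HTTP request can also be sent from Python. The sketch below rebuilds the same JSON body (`data`, `parameters`, `execEndpoint`) using only the standard library. `build_query_payload` and `post_query` are hypothetical helper names, not part of the `alaas` package, and the code assumes the server replies with JSON, which this document does not confirm.

```python
import json
import urllib.request


def build_query_payload(uris, budget):
    """Build the JSON body used by the curl example."""
    return {
        "data": [{"uri": uri} for uri in uris],
        "parameters": {"budget": budget},
        "execEndpoint": "/query",
    }


def post_query(server_url, uris, budget, timeout=30):
    """POST the payload to server_url and return the parsed response.

    Assumption: the response body is JSON; its schema is not documented here.
    """
    body = json.dumps(build_query_payload(uris, budget)).encode("utf-8")
    request = urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_query("http://13.213.8.21:8081/post", url_list, budget=3)` would mirror the curl call, provided the demonstration server is reachable.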
Installation
Install ALaaS from PyPI:
pip install alaas
The ALaaS package contains both the client and the server. You can build an active data selection service on your own servers or use only the client to perform data selection.
Start the active learning server
You need to start an active learning server before conducting the data selection.
from alaas import Server
Server(config_path='./your_config.yml').start()
How to customize a configuration for your deployment scenarios can be found here.
Query data from the client
Start the data selection with the following code:
from alaas.client import Client
client = Client('http://0.0.0.0:60035')
queries = client.query_by_uri(<url_list>, budget=<budget number>)
The output is a subset of the URIs/data in your request: the samples selected for labeling.
Supported Strategies
ALaaS currently supports the active learning strategies listed in the following table:
Type | Setting | Abbr | Strategy | Year | Reference |
---|---|---|---|---|---|
Random | Pool-based | RS | Random Sampling | - | - |
Uncertainty | Pool-based | LC | Least Confidence Sampling | 1994 | DD Lewis et al. |
Uncertainty | Pool-based | MC | Margin Confidence Sampling | 2001 | T Scheffer et al. |
Uncertainty | Pool-based | RC | Ratio Confidence Sampling | 2009 | B Settles et al. |
Uncertainty | Pool-based | ES | Entropy Sampling | 2009 | B Settles et al. |
Uncertainty | Pool-based | BALD | Bayesian Active Learning by Disagreement | 2017 | Y Gal et al. |
Clustering | Pool-based | KCG | K-Center Greedy Sampling | 2017 | Ozan Sener et al. |
Clustering | Pool-based | KM | K-Means Sampling | 2011 | Z Bodó et al. |
Clustering | Pool-based | CS | Core-Set Selection Approach | 2018 | Ozan Sener et al. |
Diversity | Pool-based | DBAL | Diverse Mini-batch Sampling | 2019 | Fedor Zhdanov |
Adversarial | Pool-based | DFAL | DeepFool Active Learning | 2018 | M Ducoffe et al. |
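To make the four uncertainty strategies in the table concrete, here is a minimal sketch of how each one could score a sample from its predicted class probabilities, and how a budget of samples could then be picked. These are common textbook formulations written in pure Python, not necessarily how ALaaS implements them; the function names and the `select` helper are hypothetical. In every case a higher score means a more uncertain sample.

```python
import math


def least_confidence(probs):
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin_confidence(probs):
    """1 minus the gap between the two most likely classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def ratio_confidence(probs):
    """Ratio of the second most likely class to the most likely class."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def entropy(probs):
    """Shannon entropy of the predicted distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select(pool_probs, budget, score_fn=least_confidence):
    """Return indices of the `budget` highest-scoring samples in the pool."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical softmax outputs for three unlabeled samples.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # very uncertain
    [0.60, 0.30, 0.10],  # somewhat uncertain
]
print(select(pool, budget=2, score_fn=entropy))
```

On this small pool all four scoring functions rank the second sample as the most uncertain and the first as the least, so they select the same two samples; on real model outputs the strategies can disagree.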
License
The project is available as open source under the terms of the Apache 2.0 License.