
A Distributed MLOps System for Efficient Active Learning


ALaaS: Active Learning as a Service.


Active Learning as a Service (ALaaS) is a fast and scalable framework that automatically selects a subset of a full dataset to be labeled, so as to reduce labeling cost. It provides an out-of-the-box, standalone experience that lets users quickly apply active learning.

ALaaS features:

  • 🐣 Easy to use: start the system and employ active learning with fewer than 10 lines of code.
  • 🚀 Fast: stage-level parallelism achieves over 10x speedup compared with an under-optimized active learning process.
  • 💥 Elastic: scale the number of active workers up and down depending on the number of GPU devices.
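To illustrate what stage-level parallelism means, here is a minimal sketch that runs each stage in its own thread and connects the stages with queues, so one image can be downloaded while the previous one is being scored. The stage functions `download`, `preprocess` and `infer` are hypothetical stand-ins, and this is not ALaaS's actual implementation.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def _run_stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(items, stage_fns):
    """Run stage_fns as a pipeline with one thread per stage.

    While stage k processes item i, stage k-1 can already work on item i+1.
    Results come out in input order because each stage is a single FIFO worker.
    """
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=_run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (out := queues[-1].get()) is not _DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the real download / preprocess / inference stages.
def download(uri):
    return f"bytes({uri})"  # pretend to fetch the image


def preprocess(raw):
    return raw.upper()  # pretend to decode and resize


def infer(x):
    return (x, len(x))  # pretend to run the model


print(run_pipeline(["a.png", "b.png"], [download, preprocess, infer]))
```

Python threads only overlap well when stages wait on I/O or release the GIL (as network downloads and GPU inference typically do); error handling is omitted, so an exception inside a stage would stall this sketch.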

The project is still under active development. Contributions are welcome!

Try It Out ☕

Free ALaaS demo on AWS

Use least confidence sampling with ResNet-18 to select images to be labeled for your tasks!
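Least confidence sampling scores each sample by one minus the probability of its most likely class and keeps the `budget` highest-scoring samples. A minimal sketch of that selection rule, assuming you already have per-sample class probabilities (for example, softmax outputs of ResNet-18); `least_confidence_select` is an illustrative name, not part of the ALaaS API.

```python
def least_confidence_select(probs, budget):
    """Return indices of the `budget` samples the model is least confident about.

    probs: list of per-sample class-probability lists (each summing to 1).
    """
    # Uncertainty = 1 - max_c p(c | x); a higher score means lower confidence.
    scores = [1.0 - max(p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # most uncertain
    [0.80, 0.15, 0.05],
]
print(least_confidence_select(probs, budget=3))  # → [3, 1, 2]
```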

We have deployed ALaaS on AWS for demonstration. Try it yourself!

Call ALaaS with HTTP 🌐
curl \
-X POST http://13.213.29.8:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}], 
    "parameters": {"budget": 3},
    "execEndpoint":"/query"}'
Call ALaaS with gRPC 🔐
from alaas.client import Client

url_list = [
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('grpc://13.213.29.8:60035')
print(client.query_by_uri(url_list, budget=3))

You will then see that ALaaS has selected the 3 most informative samples out of the 5 data points.
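If you would rather not install the client, the HTTP request from the curl example can also be sent with Python's standard library. A minimal sketch: it rebuilds the same JSON body (`data`, `parameters.budget`, `execEndpoint`) and returns the parsed JSON response as-is, since the response layout is not documented here. The helper names `build_query_request` and `query` are hypothetical.

```python
import json
import urllib.request


def build_query_request(endpoint, uris, budget):
    """Build the same POST request as the curl example."""
    payload = {
        "data": [{"uri": uri} for uri in uris],
        "parameters": {"budget": budget},
        "execEndpoint": "/query",
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(endpoint, uris, budget, timeout=30):
    """Send the request and decode the JSON response body."""
    req = build_query_request(endpoint, uris, budget)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `query("http://13.213.29.8:8081/post", url_list, budget=3)` would target the demo endpoint used above, if it is still online.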

Installation 🚧

You can install ALaaS from PyPI:

pip install alaas

The ALaaS package contains both the client and the server. You can build an active data selection service on your own servers, or use only the client to perform data selection.

⚠️ Deep learning frameworks such as TensorFlow and PyTorch may need to be installed manually, since the version that fits your deployment can differ.
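Before starting a server, you can check which of these frameworks are importable in the current environment. A minimal sketch using only the standard library; the module names `torch` and `tensorflow` are assumptions about which frameworks your deployment needs, and `missing_frameworks` is a hypothetical helper.

```python
import importlib.util


def missing_frameworks(names=("torch", "tensorflow")):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_frameworks()
if missing:
    print("Install manually, matching your CUDA/driver setup:", ", ".join(missing))
```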

Step by Step

0. Start the active learning server

You need to start an active learning server before conducting the data selection.

from alaas import Server

# Start the server with your deployment configuration (placeholder path)
Server(config_path='./your_config.yml').start()

Instructions for customizing the configuration for your deployment scenario can be found here.

1. Query data from the client

Start the data selection with the following code:

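from alaas.client import Client

# URIs of the unlabeled data to select from (here, the demo images above)
url_list = [f'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane{i}.png' for i in range(1, 6)]
client = Client('http://0.0.0.0:60035')  # address of the server started in step 0
queries = client.query_by_uri(url_list, budget=3)  # keep the 3 most informative URIs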

The output is a subset of the URIs/data in your request: the samples selected for further labeling.
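In practice, selection usually runs in rounds: select, label, remove the labeled items from the pool, and typically retrain the model before the next round. A minimal sketch of that bookkeeping, assuming a hypothetical `select(uris, budget)` adapter that returns a list of URIs (for example, a thin wrapper around `client.query_by_uri`, whose exact return type is not documented here) and a hypothetical `label(uri)` annotation function. Retraining is left out.

```python
def active_learning_rounds(pool, budget, rounds, select, label):
    """Run several selection rounds, moving selected URIs from the pool to `labeled`.

    select(uris, budget) -> list of selected URIs (hypothetical adapter)
    label(uri) -> the label an annotator assigns (hypothetical)
    """
    pool = list(pool)
    labeled = {}
    for _ in range(rounds):
        if not pool:
            break
        chosen = select(pool, min(budget, len(pool)))
        # The selection must be a subset of the URIs that were sent.
        if not set(chosen) <= set(pool):
            raise ValueError("selection is not a subset of the pool")
        for uri in chosen:
            labeled[uri] = label(uri)
        chosen_set = set(chosen)
        pool = [uri for uri in pool if uri not in chosen_set]
        # A real loop would usually retrain the model on `labeled` here.
    return labeled, pool
```

For a dry run without a server, a stand-in such as `lambda uris, k: uris[:k]` can play the role of `select`.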

Supported Strategies 🎨

ALaaS currently supports the active learning strategies listed in the following table:

| Type | Setting | Abbr | Strategy | Year | Reference |
|------|---------|------|----------|------|-----------|
| Random | Pool-based | RS | Random Sampling | - | - |
| Uncertainty | Pool-based | LC | Least Confidence Sampling | 1994 | DD Lew et al. |
| Uncertainty | Pool-based | MC | Margin Confidence Sampling | 2001 | T Scheffer et al. |
| Uncertainty | Pool-based | RC | Ratio Confidence Sampling | 2009 | B Settles et al. |
| Uncertainty | Pool-based | VRC | Variation Ratios Sampling | 1965 | EH Johnson et al. |
| Uncertainty | Pool-based | ES | Entropy Sampling | 2009 | B Settles et al. |
| Uncertainty | Pool-based | MSTD | Mean Standard Deviation | 2016 | M Kampffmeyer et al. |
| Uncertainty | Pool-based | BALD | Bayesian Active Learning Disagreement | 2017 | Y Gal et al. |
| Clustering | Pool-based | KCG | K-Center Greedy Sampling | 2017 | Ozan Sener et al. |
| Clustering | Pool-based | KM | K-Means Sampling | 2011 | Z Bodó et al. |
| Clustering | Pool-based | CS | Core-Set Selection Approach | 2018 | Ozan Sener et al. |
| Diversity | Pool-based | DBAL | Diverse Mini-batch Sampling | 2019 | Fedor Zhdanov |
| Adversarial | Pool-based | DFAL | DeepFool Active Learning | 2018 | M Ducoffe et al. |
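The uncertainty strategies above differ mainly in how they turn a model's class probabilities into a score. A minimal sketch of four of them (LC, MC, RC, ES) using common textbook definitions; these are not taken from the ALaaS source, whose implementations may normalize or batch scores differently. Each function assumes at least two classes, and higher scores mean "more worth labeling".

```python
import math


def _top_two(p):
    """Return the largest and second-largest class probabilities."""
    first, second = sorted(p, reverse=True)[:2]
    return first, second


def least_confidence(p):
    return 1.0 - max(p)


def margin_confidence(p):
    first, second = _top_two(p)
    return 1.0 - (first - second)  # small gap between top-2 classes -> high score


def ratio_confidence(p):
    first, second = _top_two(p)
    return second / first


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def select(probs, budget, score):
    """Indices of the `budget` samples with the highest uncertainty score."""
    ranked = sorted(range(len(probs)), key=lambda i: score(probs[i]), reverse=True)
    return ranked[:budget]


probs = [[0.50, 0.49, 0.01], [0.40, 0.30, 0.30]]
for fn in (least_confidence, margin_confidence, ratio_confidence, entropy):
    print(fn.__name__, select(probs, 1, fn))
```

The two samples show why the choice matters: the first is nearly tied between two classes, so margin and ratio sampling pick it, while least confidence and entropy prefer the second, whose probability mass is spread over all three classes.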

Citation

Our tech report is available on arXiv. Please cite as:

@article{huang2022active,
  title={Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI},
  author={Huang, Yizheng and Zhang, Huaizheng and Li, Yuanming and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2207.09109},
  year={2022}
}

Contributors ✨

Thanks go to these wonderful people (emoji key):


Yizheng Huang

🚇 ⚠️ 💻

Huaizheng

🖋 ⚠️ 📖

Yuanming Li

⚠️ 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

Acknowledgement

  • Jina - Build cross-modal and multimodal applications on the cloud

License

The project is available as open source under the terms of the Apache 2.0 License.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

alaas-0.1.7.tar.gz (26.2 kB)


Built Distribution

alaas-0.1.7-py3-none-any.whl (31.5 kB)

