
A Distributed MLOps System for Efficient Active Learning


ALaaS: Active Learning as a Service.


Active Learning as a Service (ALaaS) is a fast and scalable service framework for selecting data before human labeling. It can be integrated with existing data processing and labeling platforms as a microservice.

ALaaS features:

  • :rocket: Fast: Uses stage-level parallelism to achieve over 10x speedup compared with a conventional active learning process.
  • :collision: Elastic: Scales multiple active workers up and down on single or multiple GPU devices.
  • :hatching_chick: Easy-to-use: Start APIs that prototype an active learning workflow with fewer than 10 lines of code.
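
To illustrate what stage-level parallelism can look like, here is a minimal sketch in which each stage runs in its own thread and hands results to the next stage through a queue, so downloading, preprocessing, and inference overlap in time. It is not ALaaS's actual implementation: the stage functions `download`, `preprocess`, and `infer` are hypothetical stand-ins, and the helper names `run_stage` and `run_pipeline` are assumptions made for this example.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the input stream


def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, stages):
    """Run each stage in its own thread so that stages overlap in time."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


# Hypothetical stand-ins for the download, preprocessing, and inference stages.
def download(uri):
    return {"uri": uri, "raw": uri.upper()}


def preprocess(sample):
    return {**sample, "tensor": len(sample["raw"])}


def infer(sample):
    return (sample["uri"], sample["tensor"] % 7)


scored = run_pipeline(["a.png", "bb.png", "ccc.png"], [download, preprocess, infer])
print(scored)
```

Because every stage is a single thread reading from a FIFO queue, results come out in input order. A real deployment would more likely use several workers per stage and batch the inference step.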

Try It Out

You may simply want to use a pre-trained model as the active data selector to pick the most informative samples from your unlabeled data pool. We host a CPU-based demonstration server for data selection (least confidence strategy with ResNet-18). Try it yourself!

HTTP:

curl \
-X POST http://13.213.8.21:8081/post \
-H 'Content-Type: application/json' \
-d '{"data":[{"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png"},
            {"uri": "https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png"}],
    "parameters": {"budget": 3},
    "execEndpoint":"/query"}'

gRPC:

from alaas.client import Client

url_list = [
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',
    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'
]
client = Client('grpc://13.213.8.21:60035')
print(client.query_by_uri(url_list, budget=3))

You should then see that the active learner has selected 3 of the 5 data samples.
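
If you prefer to send the HTTP request from Python rather than curl, the following sketch builds the same JSON body using only the standard library. The field names (`data`, `uri`, `parameters`, `budget`, `execEndpoint`) are copied from the curl example above. The helper names `build_query_payload` and `post_query` are hypothetical, and the response schema is not assumed: the decoded JSON is returned as-is.

```python
import json
import urllib.request


def build_query_payload(uris, budget):
    """Build the JSON body used by the HTTP example above."""
    return {
        "data": [{"uri": uri} for uri in uris],
        "parameters": {"budget": budget},
        "execEndpoint": "/query",
    }


def post_query(server_url, uris, budget, timeout=30):
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(build_query_payload(uris, budget)).encode("utf-8")
    request = urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) a payload to inspect its shape.
payload = build_query_payload(
    ["https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png"], budget=3
)
print(json.dumps(payload, indent=2))
```

Calling `post_query('http://13.213.8.21:8081/post', url_list, budget=3)` would send the request, assuming the demonstration server is still reachable at that address.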

Installation

You can install ALaaS from PyPI:

pip install alaas

The ALaaS package contains both the client and the server. You can build an active data selection service on your own servers, or use only the client to perform data selection.

Start the active learning server

You need to start an active learning server before conducting the data selection.

from alaas import Server

Server(config_path='./your_config.yml').start()

Instructions for customizing the configuration for your deployment scenario can be found here.

Querying data from client

You can start data selection with the following code:

from alaas.client import Client

client = Client('http://0.0.0.0:60035')
queries = client.query_by_uri(<url_list>, budget=<budget number>)

The output is a subset of the URIs/data in your request: the samples selected for subsequent labeling.
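
To make the role of the budget concrete, here is a minimal sketch of how a selector can turn per-sample informativeness scores into a subset of the requested URIs. It illustrates the idea only and is not the ALaaS server's code: the function name `select_top_budget` and the score values are hypothetical.

```python
def select_top_budget(scores, budget):
    """Return the `budget` URIs with the highest informativeness scores."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank URIs from most to least informative, then keep the first `budget`.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Hypothetical informativeness scores (higher means more worth labeling).
scores = {
    "airplane1.png": 0.12,
    "airplane2.png": 0.48,
    "airplane3.png": 0.05,
    "airplane4.png": 0.33,
    "airplane5.png": 0.27,
}
selected = select_top_budget(scores, budget=3)
print(selected)
```

With a budget of 3 and five candidates, the three highest-scoring URIs are returned and the remaining two are left unlabeled.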

Supported Strategies

ALaaS currently supports the active learning strategies listed in the following table:

| Type        | Setting    | Abbr | Strategy                                 | Year | Reference         |
|-------------|------------|------|------------------------------------------|------|-------------------|
| Random      | Pool-based | RS   | Random Sampling                          | -    | -                 |
| Uncertainty | Pool-based | LC   | Least Confidence Sampling                | 1994 | DD Lewis et al.   |
| Uncertainty | Pool-based | MC   | Margin Confidence Sampling               | 2001 | T Scheffer et al. |
| Uncertainty | Pool-based | RC   | Ratio Confidence Sampling                | 2009 | B Settles et al.  |
| Uncertainty | Pool-based | ES   | Entropy Sampling                         | 2009 | B Settles et al.  |
| Uncertainty | Pool-based | BALD | Bayesian Active Learning by Disagreement | 2017 | Y Gal et al.      |
| Clustering  | Pool-based | KCG  | K-Center Greedy Sampling                 | 2017 | Ozan Sener et al. |
| Clustering  | Pool-based | KM   | K-Means Sampling                         | 2011 | Z Bodó et al.     |
| Clustering  | Pool-based | CS   | Core-Set Selection Approach              | 2018 | Ozan Sener et al. |
| Diversity   | Pool-based | DBAL | Diverse Mini-batch Sampling              | 2019 | Fedor Zhdanov     |
| Adversarial | Pool-based | DFAL | DeepFool Active Learning                 | 2018 | M Ducoffe et al.  |
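
As a rough illustration of the four simplest uncertainty strategies in the table (LC, MC, RC, ES), the sketch below computes each score from a model's class-probability vector. These are common textbook definitions, oriented so that a higher score means a more uncertain sample. They are not taken from the ALaaS source, which may normalize or orient the scores differently, and the function names are chosen for this example only.

```python
import math


def least_confidence(probs):
    """1 - max probability: high when the top class is not confident."""
    return 1.0 - max(probs)


def margin_confidence(probs):
    """1 - (top1 - top2): high when the two best classes are close."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def ratio_confidence(probs):
    """top2 / top1: approaches 1 when the two best classes are close."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top2 / top1


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Two example softmax outputs over three classes (assumed to sum to 1).
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]

strategies = {
    "LC": least_confidence,
    "MC": margin_confidence,
    "RC": ratio_confidence,
    "ES": entropy,
}
for name, score in strategies.items():
    print(name, round(score(confident), 3), round(score(uncertain), 3))
```

Under every one of these scores the second sample ranks as more informative than the first, so it would be chosen first when the budget is limited.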

License

ALaaS is available as open source under the terms of the Apache 2.0 License.
