Active learning with tensorflow. Create custom and generic active learning loops. Export and share your experiments.

Active learning with tensorflow

*Currently only supports Bayesian active learning tasks.

Perform active learning in tensorflow/keras with extendable parts.

Index

  1. Installation
  2. Documentation
  3. Getting started
    1. Model wrapper
    2. Acquisition functions
    3. Basic active learning loop
  4. Development
    1. Setup
    2. Scripts
  5. Contribution
  6. Issues

Dependencies

python="^3.6"
tensorflow="^2.0.0"
scikit-learn="^0.24.2"
numpy="^1.0.0"
tqdm="^4.62.6"

Installation

$ pip install tf-al

*To use a specific version of tensorflow, or to get GPU support, install tensorflow manually. Otherwise this package automatically installs the latest version of tensorflow allowed by the dependencies listed above.

Getting started

Following the active learning paradigm, the most essential parts are the model and the pool of labeled/unlabeled data.

To enable modularity, tensorflow models are wrapped. The model wrapper acts as an interface between the active learning loop and the model: it defines methods that are called at different steps of the loop. The pool class manages the labeled and unlabeled datapoints and offers methods to label and select datapoints, labels and indices.
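
As a rough illustration of the bookkeeping a pool performs, here is a minimal sketch in plain Python. The class SimplePool and its method names are hypothetical and are not the tf_al pool API; the sketch only shows the idea of tracking which datapoints are labeled and moving datapoints between the two sets.

```python
class SimplePool:
    """Hypothetical stand-in for a pool; not the tf_al API."""

    def __init__(self, inputs, init_labels=None):
        self.inputs = list(inputs)
        # None marks a datapoint whose label is still unknown.
        self.labels = [None] * len(self.inputs)
        for index, label in (init_labels or {}).items():
            self.labels[index] = label

    def labeled_indices(self):
        return [i for i, label in enumerate(self.labels) if label is not None]

    def unlabeled_indices(self):
        return [i for i, label in enumerate(self.labels) if label is None]

    def annotate(self, indices, labels):
        """Move the given datapoints from the unlabeled to the labeled set."""
        for index, label in zip(indices, labels):
            self.labels[index] = label

    def labeled_data(self):
        """Return (inputs, labels) for all labeled datapoints."""
        indices = self.labeled_indices()
        return [self.inputs[i] for i in indices], [self.labels[i] for i in indices]


pool = SimplePool(inputs=["a", "b", "c", "d"], init_labels={0: 1})
pool.annotate([2], [0])  # an oracle provided the label 0 for datapoint 2
```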

Other parts of the library ease the setup of active learning loops. The active learning loop class uses a dataset and a model to create an iterator, which can then be used to perform active learning over a single experiment (one combination of model and query strategy).
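
The iterator idea can be sketched as a plain Python generator in which every iteration is one round of fit, evaluate, query and label. The function active_learning_rounds and all of its parameters are hypothetical and only illustrate the control flow; they do not reproduce the library's implementation.

```python
def active_learning_rounds(fit, evaluate, query, oracle,
                           unlabeled, labeled, step_size, max_rounds):
    """Yield one metrics dict per active learning round (illustrative only)."""
    for round_number in range(max_rounds):
        if not unlabeled:
            break  # nothing left to label
        model = fit(labeled)
        metrics = {"round": round_number, "eval": evaluate(model), "labeled": len(labeled)}
        # Ask the query strategy which datapoints to label next.
        selected = query(model, unlabeled, step_size)
        for index in selected:
            unlabeled.remove(index)
            labeled[index] = oracle(index)
        metrics["indices_selected"] = selected
        yield metrics


# Toy setup: the "model" is just the mean of the known labels.
targets = [i % 2 for i in range(10)]
labeled = {0: targets[0], 1: targets[1]}
unlabeled = list(range(2, 10))

history = list(active_learning_rounds(
    fit=lambda data: sum(data.values()) / len(data),
    evaluate=lambda model: abs(model - 0.5),
    query=lambda model, candidates, k: candidates[:k],  # take the first k candidates
    oracle=lambda index: targets[index],
    unlabeled=unlabeled, labeled=labeled, step_size=3, max_rounds=5,
))
```

Because the rounds are produced lazily, a caller can stop early or inspect the metrics of each round before continuing.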

The experiment suite (ExperimentSuit) can be used to run several experiments in a row, which is useful if, for example, you want to compare different acquisition functions.
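
Conceptually, running several experiments in a row means running one experiment for every combination of model and acquisition function. The sketch below shows that idea with a hypothetical helper run_suite; it is an assumption about the general pattern, not the library's code.

```python
from itertools import product


def run_suite(models, acquisition_functions, run_experiment):
    """Run one experiment per (model, acquisition function) pair."""
    results = {}
    for (model_name, model), acquisition in product(models.items(), acquisition_functions):
        results[(model_name, acquisition)] = run_experiment(model, acquisition)
    return results


# Toy usage: the "experiment" only records which combination was run.
results = run_suite(
    models={"small": 1, "large": 2},
    acquisition_functions=["random", "max_entropy"],
    run_experiment=lambda model, acquisition: f"{acquisition}@{model}",
)
```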

Model wrapper

Model wrappers create an interface between the tensorflow model and the active learning loop. Currently two wrappers are defined: Model, and McDropout for Bayesian active learning. The Model wrapper can be used as a base for custom model wrappers.
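
For readers unfamiliar with Monte Carlo dropout, the following plain-Python sketch shows the general technique: dropout stays active at prediction time and several stochastic forward passes are averaged. The toy classifier and the names make_dropout_classifier and mc_dropout_predict are hypothetical. That sample_size corresponds to the number of forward passes is an assumption, not something confirmed for the McDropout wrapper.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def make_dropout_classifier(weights, rate, rng):
    """Toy linear classifier that randomly drops input features on every call."""
    def forward(features):
        # Inverted dropout: zero a feature with probability `rate`, rescale the rest.
        kept = [f / (1.0 - rate) if rng.random() >= rate else 0.0 for f in features]
        logits = [sum(w * f for w, f in zip(row, kept)) for row in weights]
        return softmax(logits)
    return forward


def mc_dropout_predict(stochastic_forward, features, sample_size):
    """Average `sample_size` stochastic forward passes into one distribution."""
    passes = [stochastic_forward(features) for _ in range(sample_size)]
    num_classes = len(passes[0])
    return [sum(p[c] for p in passes) / sample_size for c in range(num_classes)]


forward = make_dropout_classifier(
    weights=[[1.0, -1.0], [-1.0, 1.0]], rate=0.5, rng=random.Random(0)
)
mean_probabilities = mc_dropout_predict(forward, [2.0, 1.0], sample_size=25)
```

With a Keras model, the stochastic forward pass is typically obtained by calling the model with training=True so that the Dropout layers remain active.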

Here is an example of how to create and wrap a basic McDropout model.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
from tf_al import Config
from tf_al.wrapper import McDropout

# Placeholders: adjust to your data (e.g. 28x28 grayscale images, 10 classes)
input_shape = (28, 28, 1)
output = 10

# Define the base model
base_model = Sequential([
    Conv2D(32, 3, activation=tf.nn.relu, padding="same", input_shape=input_shape),
    Conv2D(64, 3, activation=tf.nn.relu, padding="same"),
    MaxPooling2D(),
    Dropout(.25),
    Flatten(),
    Dense(128, activation=tf.nn.relu),
    Dropout(.5),
    Dense(output, activation="softmax")
])

# Wrap, configure and compile
model_config = Config(
    fit={"epochs": 200, "batch_size": 10},
    query={"sample_size": 25},
    eval={"batch_size": 900, "sample_size": 25}
)
model = McDropout(base_model, config=model_config)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=[keras.metrics.SparseCategoricalAccuracy()]
)

Basic methods

The model wrapper in essence can be used like a regular tensorflow model.

model = McDropout(base_model)
model.compile(
    optimizer="adam", 
    loss="sparse_categorical_crossentropy", 
    metrics=[keras.metrics.SparseCategoricalAccuracy()]
)

# Fit model to data
model.fit(inputs, targets, batch_size=25, epochs=100)

# Use model to predict output values
model(inputs)

# Evaluate model returning loss and accuracy
model.evaluate(some_inputs, some_targets)

To define a custom model wrapper, extend the Model class and overwrite methods as needed. The regular tensorflow model can be accessed via self._model.

To provide your model wrappers as a package, you can use the template on GitHub, which already offers a poetry package setup.

from tf_al import Model


class CustomModel(Model):

    def __init__(self, model, **kwargs):
        super().__init__(model, model_type="custom", **kwargs)


    def __call__(self, *args, **kwargs):
        # Custom __call__; here the standard tensorflow __call__ is used
        return self._model(*args, **kwargs)


    def predict(self, inputs, **kwargs):
        # Custom prediction method; here the standard tensorflow call model(inputs)
        return self._model(inputs, **kwargs)


    def evaluate(self, inputs, targets, **kwargs):
        # Custom evaluate method returning a dict of metrics,
        # else the standard evaluate method of tensorflow is used.
        some_value, some_other_value = 0.0, 0.0  # Placeholders: compute your metrics here
        return {"metric_1": some_value, "metric_2": some_other_value}


    def fit(self, *args, **kwargs):
        # Custom fitting procedure; here the tensorflow .fit() method is used
        return self._model.fit(*args, **kwargs)


    def compile(self, *args, **kwargs):
        # Custom compile method; here the tensorflow .compile(**kwargs) is used
        self._model.compile(*args, **kwargs)


    def reset(self, pool, dataset):
        # How to reset the network after each active learning round.
        # Here the default is kept (re-loading weights when enabled).
        super().reset(pool, dataset)
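
The idea of resetting a network by re-loading its weights can be sketched as follows. WeightSnapshot and FakeModel are hypothetical names used only for this illustration; the one thing assumed about the wrapped model is the get_weights()/set_weights() pair that Keras models provide. How the library itself stores and restores weights is not shown here.

```python
import copy


class WeightSnapshot:
    """Hypothetical helper: remembers initial weights and restores them on demand."""

    def __init__(self, model):
        self._model = model
        # Deep copy so later training cannot modify the stored snapshot.
        self._initial = copy.deepcopy(model.get_weights())

    def reset(self):
        self._model.set_weights(copy.deepcopy(self._initial))


class FakeModel:
    """Minimal stand-in exposing get_weights/set_weights like a Keras model."""

    def __init__(self, weights):
        self._weights = weights

    def get_weights(self):
        return self._weights

    def set_weights(self, weights):
        self._weights = weights


model = FakeModel([[0.1, 0.2], [0.3]])
snapshot = WeightSnapshot(model)
model.get_weights()[0][0] = 9.9  # pretend a training round changed the weights
snapshot.reset()                 # start the next round from the initial weights
```

Resetting between rounds keeps each round's training independent of the previous one, at the cost of retraining from scratch.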

Acquisition functions
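
The examples in this document refer to acquisition functions by the names "random" and "max_entropy". As a minimal sketch of what such functions generally compute, the plain-Python code below selects datapoints either at random or by highest predictive entropy. The function names and signatures are hypothetical and do not reflect the AcquisitionFunction class.

```python
import math
import random


def predictive_entropy(probabilities):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def max_entropy_query(predictions, step_size):
    """Indices of the `step_size` datapoints the model is least certain about."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: predictive_entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:step_size]


def random_query(num_candidates, step_size, rng):
    """Baseline: pick `step_size` distinct candidate indices uniformly at random."""
    return rng.sample(range(num_candidates), step_size)


# One predicted distribution per unlabeled datapoint.
predictions = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [1.0, 0.0]]
most_uncertain = max_entropy_query(predictions, step_size=2)
random_choice = random_query(len(predictions), step_size=2, rng=random.Random(0))
```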

Basic active learning loop

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten

from tf_al import ActiveLearningLoop, Dataset
from tf_al.wrapper import McDropout

# Load MNIST, add a channel axis and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# Pack into a dataset
initial_pool_size = 20
dataset = Dataset(x_train, y_train, test=(x_test, y_test), init_size=initial_pool_size)

# Create and wrap model
input_shape = (28, 28, 1)
output = 10  # Number of classes
base_model = Sequential([
    Conv2D(32, 3, activation=tf.nn.relu, padding="same", input_shape=input_shape),
    Conv2D(64, 3, activation=tf.nn.relu, padding="same"),
    MaxPooling2D(),
    Dropout(.25),
    Flatten(),
    Dense(128, activation=tf.nn.relu),
    Dropout(.5),
    Dense(output, activation="softmax")
])

mc_model = McDropout(base_model)
mc_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=[keras.metrics.SparseCategoricalAccuracy()]
)

# Create the active learning loop (a single experiment: model + query_strategy)
query_strategy = "random"
active_learning_loop = ActiveLearningLoop(
    mc_model,
    dataset,
    query_strategy,
    step_size=10, # Number of new datapoints to select after each round
    max_rounds=100 # How many active learning rounds per experiment?
)

# To completely run through the active learning loop
active_learning_loop.run()

# Manually iterate over active learning loop
for step in active_learning_loop:

    # Dict with accumulated metrics 
    # ["train", "train_time", "query_time", "optim", "optim_time", "eval", "eval_time", "indices_selected"]
    step["train"]


# Alternatively, call step() manually for each round
num_rounds = 10
for i in range(num_rounds):

    metrics = active_learning_loop.step()
    # ... do something with the metrics

Basic experiment suite setup

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
from sklearn.model_selection import train_test_split

from tf_al import Dataset, Config, ExperimentSuit, AcquisitionFunction
from tf_al.wrapper import McDropout

# Split data and put into a dataset
# (some_inputs, some_targets, test_set_size, input_shape and output are placeholders for your own values)
x_train, x_test, y_train, y_test = train_test_split(some_inputs, some_targets, test_size=test_set_size)

# Number of initial datapoints in pool of labeled data
initial_pool_size = 20 
dataset = Dataset(
    x_train, y_train,
    test=(x_test, y_test),
    init_size=initial_pool_size
)

# Define and wrap model (here McDropout)
base_model = Sequential([
    Conv2D(32, 3, activation=tf.nn.relu, padding="same", input_shape=input_shape),
    Conv2D(64, 3, activation=tf.nn.relu, padding="same"),
    MaxPooling2D(),
    Dropout(.25),
    Flatten(),
    Dense(128, activation=tf.nn.relu),
    Dropout(.5),
    Dense(output, activation="softmax")        
])

model_config = Config(
    fit={"epochs": 200, "batch_size": 10}, # Passed to fit() of the wrapper
    query={"sample_size": 25}, # Configuration passed to acquisition function during query step
    eval={"batch_size": 900, "sample_size": 25} # Parameters passed to evaluation method of the wrapper
)
model = McDropout(base_model, config=model_config)
model.compile(
    optimizer="adam", 
    loss="sparse_categorical_crossentropy", 
    metrics=[keras.metrics.SparseCategoricalAccuracy()]
)

# Models to run experiments on: a single model or a list [model_1, ..., model_n]
models = model

# Acquisition functions to apply in separate runs: a single one or a list [acquisition_1, ...]
acquisition_functions = ["random", AcquisitionFunction("max_entropy", batch_size=900)]
experiments = ExperimentSuit(
    models,
    acquisition_functions,
    step_size=10, # Number of new datapoints to select after each round
    max_rounds=100 # How many active learning rounds per experiment?
)

Development

Setup

  1. Fork and clone the forked repository
  2. Create a virtual env (optional)
  3. Install and Setup Poetry
  4. Install package dependencies using poetry or set them up manually
  5. Start development

Scripts

Create documentation

To create documentation for the ./tf_al directory, execute the following command in ./docs:

$ make html

To clear the generated documentation, use the following command:

$ make clean

Run tests

To run the automated unit tests, execute the following command in the root package directory:

$ pytest

To additionally generate coverage reports, run:

$ pytest --cov

Contribution

Issues
