
Package with encoder-based approaches to warm-starting Bayesian Hyperparameter Optimization.

Project description

WSMF

tl;dr

This package contains implementations of two novel encoder-based approaches to warm-starting Bayesian Hyperparameter Optimization. It supports both training and using meta-models that help with this meta-task.

Contents

Meta-models - The package currently contains two approaches to encoder-based warm-starting:

  • Metric learning (Dataset2VecMetricLearning) - Uses Dataset2Vec as the encoder, trained so that the distances between the dataset representations it produces correspond to the distances between the datasets' landmarkers (vectors of performances of a predefined set of hyperparameter configurations; see the sketch after this list)
  • Landmarker reconstruction (LandmarkerReconstructionTrainingInterface) - Uses Dataset2Vec as the encoder to produce a latent representation of the entire dataset (of any size), which is passed to an MLP that predicts the landmarker vector
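
Landmarkers are not produced by wsmf itself; they are performance vectors computed beforehand by evaluating a predefined portfolio of configurations on each dataset. As a rough illustration only (the estimator, metric, and portfolio below are arbitrary assumptions, not part of the wsmf API), a single landmarker vector could be obtained like this:

import torch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical predefined portfolio of hyperparameter configurations
portfolio = [
    {"n_estimators": 50, "max_depth": 3},
    {"n_estimators": 200, "max_depth": 10},
]

def compute_landmarkers(X, y):
    # Evaluate every configuration from the portfolio on one dataset and
    # return the vector of mean cross-validated scores as a torch.Tensor
    scores = [
        cross_val_score(RandomForestClassifier(**config), X, y, cv=3).mean()
        for config in portfolio
    ]
    return torch.tensor(scores)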

Selectors - For this meta-task, wsmf provides an API that uses the encoder to propose hyperparameter configurations. It contains the following selectors:

  • A selector that chooses configurations based on the learned representation, applicable in the metric learning approach (RepresentationBasedHpSelector)
  • A selector based on the reconstructed landmarkers (ReconstructionBasedHpSelector)
  • A random selector drawing from the predefined portfolio (RandomHpSelector)
  • A selector that chooses the configurations that are best on average (RankBasedHpSelector)
  • A selector that chooses configurations based on the landmarker vectors themselves (LandmarkerHpSelector)

Examples of usage

Training a metric-learning-based meta-model

# Dataset2VecMetricLearning, EncoderHpoDataset, EncoderMetricLearningLoader
# and GenericRepeatableDataLoader are imported from wsmf
import pytorch_lightning as pl

# tensors X, y are torch.Tensor objects which correspond to feature and target matrices
train_datasets = { # training meta-dataset
    "dataset_train_1": (tensor_X1, tensor_y1),
    "dataset_train_2": (tensor_X2, tensor_y2),
    ...
}
val_datasets = { # validation meta-dataset
    "dataset_val_1": (tensor_X1, tensor_y1),
    "dataset_val_2": (tensor_X2, tensor_y2),
    ...
}

# tensors l1, l2, ... correspond to landmarker vectors in torch.Tensor format
train_landmarkers = { # training meta-dataset
    "dataset_train_1": l1,
    "dataset_train_2": l2,
    ...
}
val_landmarkers = { # validation meta-dataset
    "dataset_val_1": l1,
    "dataset_val_2": l2,
    ...
}

train_dataset = EncoderHpoDataset(train_datasets, train_landmarkers)
train_dataloader = EncoderMetricLearningLoader(train_dataset, train_num_batches, train_batch_size)
val_dataset = EncoderHpoDataset(val_datasets, val_landmarkers)
val_dataloader = EncoderMetricLearningLoader(val_dataset, val_num_batches, val_batch_size)
val_dataloader = GenericRepeatableDataLoader(val_dataloader) # Loader which produces repeatable batches

model = Dataset2VecMetricLearning()
trainer = pl.Trainer()
trainer.fit(model, train_dataloader, val_dataloader)
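
Once trained, the meta-model's representations can drive the RepresentationBasedHpSelector. The snippet below is only a sketch: it assumes the selector takes the same (meta-model, datasets, landmarkers, configurations) arguments as the reconstruction-based selector shown in the next example, which may not match the actual wsmf signature.

# Hedged sketch - the constructor arguments are assumed by analogy with
# ReconstructionBasedHpSelector below; check the wsmf API before relying on it
selector = RepresentationBasedHpSelector(
    model,              # the Dataset2VecMetricLearning meta-model trained above
    train_datasets,     # datasets among which the closest ones are searched
    train_landmarkers,  # their landmarker vectors
    configurations,     # predefined portfolio of hyperparameter configurations
)
# (tensor_X_new, tensor_y_new) stands for a new dataset to warm-start on
proposed = selector.propose_configurations((tensor_X_new, tensor_y_new), 10)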

Using a reconstruction-based selector

datasets = { # datasets to search from (in this case used for closest dataset search)
    "dataset_1": (tensor_X1, tensor_y1),
    "dataset_2": (tensor_X2, tensor_y2),
    ...
}
landmarkers = { # landmarkers to search from (in this case used for proposing best configurations)
    "dataset_1": l1,
    "dataset_2": l2,
    ...
}
configurations = [
    {"hp1": val1, "hp2": val2},
    {"hp1": val3, "hp2": val4},
    ...
]

meta_model = Dataset2VecForLandmarkerReconstruction.load_from_checkpoint("path_to_meta_model.ckpt")
selector = ReconstructionBasedHpSelector(
    meta_model,
    datasets,
    landmarkers,
    configurations
)
# Usage
new_dataset = (X, y) # tuple of torch.Tensor objects (features, targets)
n_configurations = 10
configurations = selector.propose_configurations(new_dataset, n_configurations)
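
The returned configurations are meant to warm-start a Bayesian optimization run. Below is a minimal sketch with Optuna; the objective, the hp1/hp2 search space, and the train_and_evaluate helper are placeholders, not part of wsmf. Each proposed configuration is enqueued so it is evaluated before the sampler takes over.

import optuna

def objective(trial):
    # Placeholder search space matching the hp1/hp2 configurations above
    hp1 = trial.suggest_float("hp1", 0.0, 1.0)
    hp2 = trial.suggest_float("hp2", 0.0, 1.0)
    return train_and_evaluate(hp1, hp2)  # hypothetical user-defined evaluation

study = optuna.create_study(direction="maximize")
for config in configurations:    # configurations proposed by the selector above
    study.enqueue_trial(config)  # evaluated first, before the sampler explores
study.optimize(objective, n_trials=50)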

Development

Commands useful during development:

  • Setting env variables - export PYTHONPATH=`pwd`
  • Install dependencies - pip install -r requirements_dev.txt
  • To run unit tests - pytest
  • Check code quality - ./scripts/check_code.sh
  • Release - python -m build && twine upload dist/*

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wsmf-0.0.2.tar.gz (40.7 kB)

Uploaded Source

Built Distribution

wsmf-0.0.2-py3-none-any.whl (15.3 kB)

Uploaded Python 3

File details

Details for the file wsmf-0.0.2.tar.gz.

File metadata

  • Download URL: wsmf-0.0.2.tar.gz
  • Upload date:
  • Size: 40.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for wsmf-0.0.2.tar.gz

  • SHA256: 1966063a19ed4b845c15aac77960ebee26a8b6db95eb99f59aa912e1cbd6b5a8
  • MD5: 234c12e297c72af9f7df0d502a0a6dc4
  • BLAKE2b-256: 30f279c8e2ea3668f6246d98f9ce214b12fa465a48ae960265c4140f1270a40e


File details

Details for the file wsmf-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: wsmf-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 15.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for wsmf-0.0.2-py3-none-any.whl

  • SHA256: 58028e13448fe28e30f1f47ed45d52aa77567fc34f0b20c2d3a4ec4a12d27312
  • MD5: 03eb2f960123a3b884f6e1ebd6cc5c75
  • BLAKE2b-256: 5254fc1e91001ad8cf9905d845b5df77ea6512df862536f7fa1c351ab4bbfe9c

