PyePAL implements the epsilon-PAL active learning algorithm.
Documentation and tutorial
For more detailed documentation, see the project docs.
To install the latest stable release, use
    pip install pyepal
To install the latest development version from the head, use
    pip install git+https://github.com/kjappelbaum/pyepal.git
Developers can install the extras
[testing, docs, pre-commit]. Installation should take only a few minutes.
On macOS you might need to install libomp (brew install libomp) for multithreading in some of the models.
We currently support Python 3.7 and 3.8.
The main logic is implemented in the PALBase class. There are some prebuilt classes for common use cases (GPy, sklearn) that inherit from this class.
For more details about how to use the code and notes about the tutorials see the docs.
If you want to use a list of sklearn models, you can use the PALSklearn class. To run it for one step, follow the code snippet below. The basic principle is the same for all the different PAL classes.
    import numpy as np
    from pyepal import PALSklearn
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, Matern

    # For each objective, initialize a model
    gpr_objective_0 = GaussianProcessRegressor(RBF())
    gpr_objective_1 = GaussianProcessRegressor(RBF())

    # The minimal input to create a PAL instance is a list of models,
    # the design space (X, in ML terms "feature matrix") and the number of objectives
    palsklearn_instance = PALSklearn(X, [gpr_objective_0, gpr_objective_1], 2)

    # The next step is to provide some initial measurements.
    # You can do this with the update_train_set function, which you
    # can use throughout the active learning process to update the training set.
    # For this, provide a numpy array of indices in your design space
    # and the corresponding measurements
    sampled_indices = np.array([1, 2, 3])
    measurements = np.array([[1, 2], [0.8, 1], [7, 1]])
    palsklearn_instance.update_train_set(sampled_indices, measurements)

    # Now, you're ready to run the first iteration.
    # This will return the next index to sample and update all the attributes.
    # If there are no unclassified samples left, it will return None and
    # print a statement saying that the classification is completed
    index_to_sample = palsklearn_instance.run_one_step()
If you want to use a list of GPy models, you can use the PALGPy class.
Coregionalized GPR models can utilize correlations between the objectives and also work in the cases in which some of the objectives are not measured for all samples.
You will need to implement the _train() and _predict() functions if you inherit from PALBase. If you want to tune the hyperparameters of your models while new training points are added, you can implement a schedule by overriding the _should_optimize_hyperparameters() function and the _set_hyperparameters() function, which sets the hyperparameters for the model(s).
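As a rough illustration of what such a schedule could look like, the sketch below decides whether to re-optimize based only on the iteration count. The helper name `optimize_every_n` and its parameters are hypothetical and not part of PyePAL; a custom `_should_optimize_hyperparameters()` could delegate to a rule of this kind.

```python
def optimize_every_n(iteration: int, every: int = 10) -> bool:
    """Return True on every `every`-th iteration.

    Hypothetical helper for illustration only, not a PyePAL API.
    """
    if every < 1:
        raise ValueError("every must be a positive integer")
    return iteration % every == 0


# Collect the iterations (out of 0..24) on which optimization would run.
triggered = [i for i in range(25) if optimize_every_n(i, every=10)]
print(triggered)
```

A fixed period is only one possible choice; the period trades the cost of refitting hyperparameters against how quickly the models adapt to new training points.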
If you need to train a model, use self.design_space as the feature matrix and self.y as the target vector. Note that in self.y all objectives are turned into maximization problems. That is, if one of your problems is a minimization problem, PyePAL will flip its sign in self.y.
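The sign convention can be illustrated with plain NumPy. This is a minimal sketch of the idea, not PyePAL's internal code; the measurement values and the `signs` array are made up for the example.

```python
import numpy as np

# Hypothetical measurements: column 0 should be maximized, column 1 minimized.
y_raw = np.array([[1.0, 2.0], [0.8, 1.0], [7.0, 1.0]])

# +1 keeps an objective as is, -1 turns a minimization into a maximization.
signs = np.array([1.0, -1.0])
y_max = y_raw * signs

print(y_max)
```

After the flip, the best value of the minimized objective is the largest entry in its column, so every column can be treated as a maximization problem.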
A basic example of how a custom class can be implemented is the PALSklearn class:
    import numpy as np

    # PALBase and validate_number_models are provided by the pyepal package


    class PALSklearn(PALBase):
        """PAL class for a list of Sklearn (GPR) models, with one model per objective"""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            validate_number_models(self.models, self.ndim)

        def _train(self):
            # Fit one model per objective on the points sampled so far
            for i, model in enumerate(self.models):
                model.fit(
                    self.design_space[self.sampled],
                    self.y[self.sampled, i].reshape(-1, 1),
                )

        def _predict(self):
            means, stds = [], []
            for model in self.models:
                mean, std = model.predict(self.design_space, return_std=True)
                means.append(mean.reshape(-1, 1))
                stds.append(std.reshape(-1, 1))

            # One column per objective
            self.means = np.hstack(means)
            self.std = np.hstack(stds)
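The stacking step in `_predict()` yields one column per objective. The following sketch shows the resulting shape with made-up predictions for a design space of four points; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical per-objective mean predictions for 4 design points.
mean_obj0 = np.array([0.1, 0.2, 0.3, 0.4])
mean_obj1 = np.array([1.0, 0.9, 0.8, 0.7])

# Reshape each prediction to a column, then stack the columns side by side.
means = np.hstack([mean_obj0.reshape(-1, 1), mean_obj1.reshape(-1, 1)])

print(means.shape)  # (4, 2): one row per design point, one column per objective
```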
For scheduling of the hyperparameter optimization, we have some predefined schedules in the
Test the algorithms
If the full design space is known, you can use a while loop to fully explore the space with PyePAL. For the theoretical guarantees of PyePAL to hold, you'll need to sample until all uncertainties are below epsilon. In practice, it is usually enough to require as a termination criterion that there are no unclassified samples left. For this, you can use the following snippet:
    import numpy as np
    from pyepal import PALGPy
    from pyepal.utils import exhaust_loop
    from pyepal.models.gpr import build_model

    # X is the full design space, y the known measurements for every point

    # indices for initialization
    sample_idx = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 60, 70])

    # build one model per objective
    model_0 = build_model(X[sample_idx], y[sample_idx], 0)
    model_1 = build_model(X[sample_idx], y[sample_idx], 1)

    # initialize the PAL instance
    palinstance = PALGPy(X, [model_0, model_1], 2, beta_scale=1)
    palinstance.update_train_set(sample_idx, y[sample_idx])

    # This will run the sampling and training as long as there
    # are unclassified samples
    exhaust_loop(palinstance, y)
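For intuition, a loop of this kind might look roughly like the sketch below. It is not the implementation of `exhaust_loop`; it only assumes the two methods described earlier, namely that `run_one_step()` returns the next index or None and that `update_train_set()` accepts indices and measurements. The `explore` function and the `FakePAL` stand-in are hypothetical and exist only to make the example runnable without PyePAL.

```python
import numpy as np


def explore(pal, y, max_iterations=1000):
    """Call run_one_step() until it returns None; return the sampled indices."""
    sampled = []
    for _ in range(max_iterations):
        index = pal.run_one_step()
        if index is None:  # no unclassified samples left
            break
        sampled.append(index)
        # "Measure" the point by looking it up, then update the training set
        pal.update_train_set(np.array([index]), np.array([y[index]]))
    return sampled


class FakePAL:
    """Stand-in used only to run this sketch; NOT a PyePAL class."""

    def __init__(self, n_points):
        self.unclassified = list(range(n_points))
        self.train = {}

    def run_one_step(self):
        # Propose the first point that is still unclassified, if any
        return self.unclassified[0] if self.unclassified else None

    def update_train_set(self, indices, measurements):
        # Store each measurement and mark its point as classified
        for i, m in zip(indices, measurements):
            self.train[int(i)] = m
            self.unclassified.remove(int(i))


y_known = np.array([[1, 2], [0.8, 1], [7, 1]])
fake = FakePAL(len(y_known))
order = explore(fake, y_known)
print(order)
```

The `max_iterations` cap is a safety net added for the sketch, so that a misbehaving model cannot keep the loop running indefinitely.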
- Zuluaga, M.; Krause, A.; Püschel, M. E-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem. Journal of Machine Learning Research 2016, 17 (104), 1–32.
- Zuluaga, M.; Sergent, G.; Krause, A.; Püschel, M. Active Learning for Multi-Objective Optimization; Dasgupta, S., McAllester, D., Eds.; Proceedings of machine learning research; PMLR: Atlanta, Georgia, USA, 2013; Vol. 28, pp 462–470.
If you find this code useful for your work, please cite:
Our paper that describes the implementation and an application to materials discovery: Jablonka, K. M.; Giriprasad, M. J.; Wang, S.; Smit, B.; Yoo, B. Bias Free Multiobjective Active Learning for Materials Design and Discovery, ChemRxiv 2020 (10.26434/chemrxiv.13200197.v1).
The original paper that describes the ε-PAL algorithm: Zuluaga, M.; Krause, A.; Püschel, M. E-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem. Journal of Machine Learning Research 2016, 17 (104), 1–32.
The research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 666983, MaGic), by the NCCR-MARVEL, funded by the Swiss National Science Foundation, and by the Swiss National Science Foundation (SNSF) under Grant 200021_172759. Part of the work was performed as part of the Explore Together internship program at BASF.