Multi-objective optimization of chemical processes with automated machine learning workflows

Nomadic Exploratory Multi-objective Optimisation (NEMO)

Meet NEMO - our ‘Nomadic Explorer’. NEMO is quite the connoisseur when it comes to machine learning optimisation - only the best model types and model parameters will suffice. Given a single dataset, NEMO will scour the lands for the optimal model type and parameters to fit it. A range of outputs will then be generated for you to assess, interpret and utilise your newly created model.

If you decide to take your analyses a step further into the realms of multi-objective Bayesian optimisation, then our Nomadic Explorer will tirelessly search for the best model type and parameters at each iteration of the optimisation. At each stage, the optimal set of conditions will be provided to aid your pursuit of the elusive multi-dimensional Pareto front.
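For readers new to the term, a Pareto front is the set of results that cannot be improved in one objective without getting worse in another. The following is a minimal pure-Python sketch of that idea, not NEMO code: the function name `pareto_front` and the (yield, cost) values are hypothetical and only serve the illustration.

```python
def pareto_front(points, maximise):
    """Return the non-dominated points.

    points   : list of tuples, one objective value per position
    maximise : list of bools, True where a larger value is better
    """
    # Flip the sign of minimised objectives so that larger is always better.
    signed = [
        tuple(v if mx else -v for v, mx in zip(p, maximise)) for p in points
    ]

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(
            x > y for x, y in zip(a, b)
        )

    return [
        p
        for p, s in zip(points, signed)
        if not any(dominates(other, s) for other in signed)
    ]


# Hypothetical (yield %, cost) results: maximise yield, minimise cost.
results = [(90.0, 12.0), (85.0, 8.0), (70.0, 9.0), (95.0, 20.0)]
front = pareto_front(results, maximise=[True, False])
print(front)
```

Here the point (70.0, 9.0) drops out because (85.0, 8.0) has both a higher yield and a lower cost; the other three points each represent a different trade-off.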

NEMO is prepared for the journey with a cavernous bag of tools. However, if your aspirations are more exotic, then NEMO supports the inclusion of custom models, samplers and functions.

Check out the examples to see NEMO in action and get started with your own ML workflows.

What is NEMO?

NEMO is a package designed for Bayesian optimisation of a single objective, or of several objectives simultaneously, with a focus on chemical processes.

Installation

To install NEMO via pip:

pip install nemo-bo

How does NEMO work?

First, the parameters (variables) and targets (objectives) of a chemical process are provided to the algorithm. Given a dataset from prior experiments, NEMO then identifies the relationship between the parameters and the targets and suggests the most promising parameters to use in the next optimisation iteration.

In comparison to other open-source optimisation libraries, NEMO automatically optimises the hyperparameters of various machine learning models and, for each objective, selects the model with the best predictive accuracy. This ensures that the model is continuously re-optimised over the course of an optimisation campaign. Furthermore, NEMO natively supports objectives that can be calculated directly when the exact relationship between the parameters and the target (e.g. materials cost) is known.
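The idea of selecting a model by predictive accuracy can be sketched with cross-validation. The example below is only an illustration under simplifying assumptions: it compares two toy models (a constant mean and a straight line) on made-up one-dimensional data, whereas NEMO works with the model types listed further down and also tunes their hyperparameters. None of the names here come from NEMO.

```python
import math


def fit_mean(xs, ys):
    # Toy model 1: always predict the training mean.
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    # Toy model 2: ordinary least-squares straight line.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_rmse(fit, xs, ys, folds=4):
    # k-fold cross-validation: every point is predicted by a model
    # that was fitted without it.
    squared_error = 0.0
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        test = [i for i in range(len(xs)) if i % folds == k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((model(xs[i]) - ys[i]) ** 2 for i in test)
    return math.sqrt(squared_error / len(xs))


# Hypothetical data, roughly linear in x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.1, 13.0, 14.9]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lowest cross-validated error wins
print(best, scores)
```

Repeating this selection whenever new experimental data arrive is what keeps the chosen model matched to the data as a campaign progresses.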

What features are in NEMO?

NEMO includes many machine learning models, acquisition functions, constraints, and sample generators. The base classes for all of these are also included and can be used as templates for adding your own custom solutions.

The features natively found in NEMO are the following:

Machine learning models available

  1. Gaussian processes (GPs) using the BoTorch library

  2. Various neural networks from the Deeply Uncertain code repository:

    1. Bayesian neural networks
    2. Concrete dropout
    3. Deep ensembles
  3. Various decision-tree based models:

    1. XGBoost Distribution
    2. NGBoost
    3. Random forest using the forest-confidence-interval library

Variable types available

  1. Continuous variables (ContinuousVariable)
  2. Categorical variables with discrete values (CategoricalVariableDiscreteValues)
  3. Categorical variables with descriptors (CategoricalVariableWithDescriptors)

Categorical variables without any description (e.g. one-hot encoding) are not currently supported.

Objective types available

  1. Objectives modelled using machine learning models (RegressionObjective)
  2. Calculated objectives using a user-provided function (CalculableObjective)

Classification objectives are not currently supported.
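As an illustration of the kind of user-provided function a calculated objective could be built around, the sketch below computes a materials cost directly from two process parameters. The function name, its arguments and the unit prices are all hypothetical. How such a function is handed to CalculableObjective is not shown, because that signature is not given in this description; the calculable objectives tutorial listed below covers it.

```python
# Hypothetical unit prices (currency per mmol), for illustration only.
REAGENT_PRICE = 0.50
CATALYST_PRICE = 12.0


def materials_cost(reagent_equiv, catalyst_mol_percent, substrate_mmol=1.0):
    """Cost of reagent and catalyst for one experiment.

    reagent_equiv        : equivalents of reagent relative to substrate
    catalyst_mol_percent : catalyst loading in mol% relative to substrate
    substrate_mmol       : scale of the experiment in mmol
    """
    reagent_mmol = reagent_equiv * substrate_mmol
    catalyst_mmol = catalyst_mol_percent / 100.0 * substrate_mmol
    return reagent_mmol * REAGENT_PRICE + catalyst_mmol * CATALYST_PRICE


# Because the relationship is known exactly, no model has to be fitted:
# the objective value follows directly from the suggested parameters.
cost = materials_cost(reagent_equiv=1.5, catalyst_mol_percent=2.0)
print(round(cost, 2))
```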

User-selectable acquisition functions available

  1. Expected improvement based methods (ExpectedImprovement)

    1. A modified single-objective expected improvement algorithm that is better at exploration than the standard analytical method
    2. A modified multi-objective expected hypervolume improvement algorithm that is better at exploration than the standard analytical method
    3. qNEI and qNEHVI BoTorch methods (only compatible with GP models)
  2. A method based on the unified evolutionary optimisation algorithm U-NSGA-III that derives uncertainty in the inference by sampling from a distribution (NSGAImprovement)

  3. A fully explorative method that identifies the candidates that have the highest uncertainty in the objective predictions (HighestUncertainty)
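To make the difference between these strategies concrete, the sketch below implements the standard analytical expected improvement for a maximised objective with a Gaussian prediction, and contrasts it with simply picking the most uncertain candidate. This is the textbook formula, not NEMO's modified algorithm, and the candidate names and predictions are hypothetical.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Standard analytical EI for a maximised objective.

    EI = (mean - best) * Phi(z) + std * phi(z), with z = (mean - best) / std,
    where Phi and phi are the standard normal CDF and PDF.
    """
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# Hypothetical model predictions (mean, standard deviation) per candidate.
predictions = {"A": (82.0, 1.0), "B": (78.0, 6.0), "C": (60.0, 15.0)}
best_so_far = 80.0

ei = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in predictions.items()
}
ei_choice = max(ei, key=ei.get)

# A purely explorative rule ignores the mean and looks only at uncertainty.
uncertainty_choice = max(predictions, key=lambda name: predictions[name][1])

print(ei_choice, uncertainty_choice)
```

With these numbers, expected improvement favours candidate A (a likely small gain), while the uncertainty-only rule favours candidate C (little is known about it), which is the exploitation versus exploration trade-off the list above refers to.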

Input constraints available

  1. Linear equality and inequality constraints (LinearConstraint)
  2. Basic non-linear equality and inequality constraints that incorporate an exponent for each input variable (NonLinearPowerConstraint)
  3. Equality and inequality constraints that allow the user to pass a function to calculate the left-hand side of the constraint (FunctionalConstraint)
  4. Stoichiometry constraints that force the ratio between two input variables to be equal to or greater than a specified value (StoichiometricConstraint)
  5. A constraint type to limit the number of active variables (MaxActiveFeaturesConstraint)
  6. A constraint type that prevents certain categorical constraints from being selected simultaneously (CategoricalConstraint)
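The first few constraint types above can be pictured as simple checks on a candidate set of parameters. The sketch below is a plain-Python illustration of those checks; the helper names, coefficients and limits are hypothetical, and it does not use or reproduce NEMO's constraint classes.

```python
def linear_lhs(values, coefficients):
    """Left-hand side of a linear constraint: sum(c_i * x_i)."""
    return sum(c * x for c, x in zip(coefficients, values))


def power_lhs(values, coefficients, exponents):
    """Left-hand side with an exponent per variable: sum(c_i * x_i ** p_i)."""
    return sum(c * x ** p for c, x, p in zip(coefficients, values, exponents))


def ratio_at_least(numerator, denominator, min_ratio):
    """Stoichiometry-style check, assuming a positive denominator."""
    return numerator >= min_ratio * denominator


def is_feasible(candidate):
    x1, x2 = candidate
    return (
        linear_lhs(candidate, [1.0, 1.0]) <= 100.0  # x1 + x2 <= 100
        and power_lhs(candidate, [1.0, 1.0], [2, 1]) <= 1000.0  # x1**2 + x2 <= 1000
        and ratio_at_least(x1, x2, 1.0)  # x1 / x2 >= 1
    )


# Hypothetical candidates (x1, x2); only those passing every check remain.
candidates = [(10.0, 20.0), (60.0, 50.0), (30.0, 10.0), (40.0, 5.0)]
feasible = [c for c in candidates if is_feasible(c)]
print(feasible)
```

A functional constraint generalises the same pattern: the user supplies the function that computes the left-hand side, and it is compared against the right-hand side in the same way.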

Benchmarking functionality available

Benchmark functions are typically used to simulate the outcomes of experiments in a closed-loop manner. They can therefore help to evaluate the quality of an optimisation, inferred from how effectively the chosen model(s) and/or acquisition function identify the optimum.

  1. Machine learning model based on a provided dataset (ModelBenchmark)
  2. Single objective synthetic functions (SingleObjectiveSyntheticBenchmark)
  3. Multi-objective synthetic functions (MultiObjectiveSyntheticBenchmark)
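The closed-loop idea can be sketched in a few lines: a benchmark function stands in for the laboratory, so each suggested set of parameters is "run" by evaluating the function and the result is fed back into the history. Everything below is a hypothetical stand-in rather than NEMO code; the benchmark is a made-up smooth function, and the suggestion step is plain random sampling where an acquisition function would normally sit.

```python
import random


def synthetic_benchmark(x1, x2):
    # Hypothetical stand-in for an experiment: a smooth function whose
    # maximum of 100 lies at (x1, x2) = (30, 70).
    return 100.0 - 0.01 * ((x1 - 30.0) ** 2 + (x2 - 70.0) ** 2)


def random_suggest(rng, history):
    # Placeholder for a model plus acquisition function; ignores the history.
    return rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)


def closed_loop(suggest, iterations, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        x1, x2 = suggest(rng, history)
        y = synthetic_benchmark(x1, x2)  # simulated experiment outcome
        history.append(((x1, x2), y))
    return history


history = closed_loop(random_suggest, iterations=20)
best_point, best_value = max(history, key=lambda item: item[1])
print(best_point, best_value)
```

Because the true optimum of the benchmark is known, the gap between `best_value` and 100 after a fixed number of iterations gives a simple measure for comparing different suggestion strategies.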

Sample generators available

Methods for generating samples of parameter values during an optimisation. These can also be used independently, outside of an optimisation, by calling the generate_samples function.

  1. Latin hypercube sampling (with a mixed-integer implementation for efficient sampling of categorical variables) (LatinHyperCubeSampling)
  2. Sobol sampling (SobolSampling)
  3. Polytope sampling (PolytopeSampling)
  4. Random sampling (RandomSampling)
  5. Pool-based sampling using a user-defined set of data points. Typically used as an alternative to a machine learning model benchmark function (PoolBased)
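As a rough picture of what Latin hypercube sampling does, the sketch below splits each variable's range into as many equal-width strata as there are samples and places exactly one sample in each stratum. It is a basic continuous-only version written for illustration: the function name and signature are hypothetical, it does not call NEMO's generate_samples function, and it does not reproduce the mixed-integer handling of categorical variables mentioned above.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Basic Latin hypercube sample.

    n_samples : number of points to generate
    bounds    : list of (lower, upper) tuples, one per variable
    """
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # One random point inside each of n equal-width strata of [0, 1) ...
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... shuffled so the strata are paired randomly across variables.
        rng.shuffle(unit)
        columns.append([lower + u * (upper - lower) for u in unit])
    # Transpose the per-variable columns into one tuple per sample.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(5, bounds=[(0.0, 100.0), (0.0, 100.0)], seed=1)
for point in samples:
    print(point)
```

Compared with purely random sampling, this guarantees that every fifth of each variable's range is visited once, which is why it is a common choice for the initial experiments of a campaign.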

Other utilities/functions available

  1. An included template for providing the dataset, with automated extraction
  2. Scatter and bar chart plotting functionality for displaying model quality and optimisation progress

Getting started

The following code demonstrates how to set up a simple Bayesian optimisation using a user-provided dataset containing two continuous variables (X) and two objectives (Y):

# Import the variable, objectives, sampler, acquisition function, and the optimisation classes
from nemo_bo.opt.variables import ContinuousVariable, VariablesList
from nemo_bo.opt.objectives import RegressionObjective, ObjectivesList
from nemo_bo.acquisition_functions.expected_improvement.expected_improvement import ExpectedImprovement
from nemo_bo.opt.samplers import LatinHyperCubeSampling
from nemo_bo.opt.optimisation import Optimisation

# Create the variable objects
var1 = ContinuousVariable(name="variable1", lower_bound=0.0, upper_bound=100.0)
var2 = ContinuousVariable(name="variable2", lower_bound=0.0, upper_bound=100.0)
var_list = VariablesList([var1, var2])

# Create the objective objects
# obj_max_bool=True means the objective is to be maximised; obj_max_bool=False means it is to be minimised
obj1 = RegressionObjective(name="objective1", obj_max_bool=True, lower_bound=0.0, upper_bound=100.0)
obj2 = RegressionObjective(name="objective2", obj_max_bool=False, lower_bound=0.0, upper_bound=100.0)
obj_list = ObjectivesList([obj1, obj2])

# Instantiate the sampler
sampler = LatinHyperCubeSampling()

# Instantiate the acquisition function
# num_candidates defines how many sets of parameters to return at each optimisation iteration
acq_func = ExpectedImprovement(num_candidates=4)

# Set up the optimisation instance
optimisation = Optimisation(var_list, obj_list, acq_func, sampler=sampler)

# Start the optimisation using the convenient run function that will run for the specified number of iterations
# X and Y arrays represent a hypothetical user-provided dataset
optimisation_data = optimisation.run(X, Y, number_of_iterations=50)
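The X and Y arrays in the example above are left to the user. One possible way to assemble them from a table of prior experiments is sketched below, under two assumptions that are not stated in this description: each row is one experiment, and the columns follow the order of var_list and obj_list. The numbers are placeholder values for illustration, not real measurements.

```python
import csv
import io

# Hypothetical table of prior experiments; the column names match the
# variable and objective names used in the example above.
raw = """variable1,variable2,objective1,objective2
10.0,80.0,42.5,61.0
35.0,60.0,71.2,48.3
70.0,20.0,55.9,30.7
"""

variable_names = ["variable1", "variable2"]
objective_names = ["objective1", "objective2"]

rows = list(csv.DictReader(io.StringIO(raw)))

# One row per experiment; columns ordered as in var_list and obj_list.
X = [[float(row[name]) for name in variable_names] for row in rows]
Y = [[float(row[name]) for name in objective_names] for row in rows]

print(X)
print(Y)
```

Since the example refers to arrays, the nested lists would most likely need converting with numpy.asarray before being passed to optimisation.run. Reading from a file instead of a string only requires replacing io.StringIO(raw) with an opened file object.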

More tutorials

We encourage you to look through the tutorials in the tutorials folder to see how to use other NEMO functions:

  1. Setting up a single objective optimisation
  2. How to select specific machine learning model types for the objectives
  3. How to use calculable objectives
  4. How to define transformers for variables and objectives
  5. How to define categorical variables with descriptors
  6. Utilising the machine learning model fitting in NEMO without Bayesian optimisation
  7. How to create a closed-loop optimisation using a machine learning model as the benchmark function
  8. How to create a closed-loop optimisation using a multi-objective synthetic function as the benchmark function
  9. How to create a closed-loop optimisation using a single objective synthetic function as the benchmark function
  10. How to create a closed-loop optimisation using a pool-based sampler as the benchmark
  11. Setting up an optimisation with input constraints
  12. Generating samples without needing to perform an optimisation
  13. How to set up a manual optimisation
  14. How to resume an optimisation run
  15. How to use the BoTorch (quasi-) Monte-Carlo based acquisition functions in NEMO
  16. How to set up an optimisation that uses U-NSGA-III as the acquisition function
  17. Using the Excel input template file to import the variables and objectives data
  18. How to set up an optimisation that uses the highest uncertainty acquisition function

What to do if you find any issues?

Leave a message in the issues section and we will get back to you as soon as we can.

Acknowledgements

Much of the functionality in NEMO is built on top of the work by the authors of the features we incorporate. We are grateful to them for continuously supporting their libraries and establishing their platforms for optimisation work. We reference the works throughout the .py files.

