
Bayesian optimization of formulations via Adaptive Experimentation (Ax) platform.


Adaptive Experimentation (Ax) Platform for Chemistry

AxForChemistry is an unofficial wrapper for the Ax platform geared towards materials science/materials informatics and chemistry/cheminformatics optimization tasks where the datasets are often characterized as:

small, sparse, noisy, multiscale, heterogeneous, high-dimensional,
nonlinearly correlated, discontinuous, and nonlinearly constrained

The goal of this codebase is to expose state-of-the-art Bayesian optimization techniques to the materials informatics and cheminformatics communities for both experts and non-experts with minimal barriers to usage/modification while retaining advanced features. This is done through classes and scripts primarily based on real experimental and computational research within the Sparks group across a range of subjects in both industry and academia. While we are not affiliated with Ax, Ax developers have contributed extensively to development and troubleshooting that led to this codebase.

Why another materials adaptive design platform?

There are many existing domain- and non-domain-specific adaptive design packages. A nonexhaustive list of materials discovery resources is given as follows:

A nonexhaustive list of general optimization resources is given as follows:

RayTune, while geared primarily towards hyperparameter optimization, supports a wide variety of search algorithms including Ax and Dragonfly. We recommend looking through the descriptions to see which ones stand out to you.

So, why another platform? The answer lies in the features of the Adaptive Experimentation (Ax) platform. It is relatively easy to use, modular, well-documented, open-source, actively maintained and expanding, and, importantly, contains state-of-the-art models for a wide variety of tasks. For example, Ax supports noisy, high-dimensional, multi-objective, nonlinearly constrained optimization, all at once. However, implementing these features simultaneously for a single materials design problem is non-trivial; this is the motivation for our implementation, AxForChemistry. As mentioned before, our goal is to:

expose state-of-the-art Bayesian optimization techniques to the materials informatics and cheminformatics communities for both experts and non-experts with minimal barriers to usage/modification while retaining advanced features.

What are AxForChemistry's use-cases?

In line with the course of development for this codebase, perhaps the best way to introduce its features is by describing the materials informatics tasks that brought it about. Each of the following tasks links to a tutorial. We encourage you to focus on the tutorials most relevant to your priorities. As an outline:

  • Neural Network Hyperparameter Optimization (crabnet_hyperopt.ipynb, Colab)
  • Industry: Multi-objective Optimization of Dental Resin Formulations (dental_bayesopt.ipynb, Colab)
  • ARPA-E: High-temperature Multi-Principal Element Alloy (MPEA) discovery using domain knowledge and predefined candidates (mpea_predefined.ipynb, Colab)
  • Industry: Maximize packing fraction for solid rocket fuel particle packing simulations under compositional constraints (particle_packing.ipynb, Colab)
  • CrabNet as a pseudo-materials discovery benchmark problem with fake compositional constraints (pseudo_discovery_validation.ipynb, Colab)
  • Experimental validation of materials discovery via Open Citrine Platform, DiSCoVeR, and AxForChemistry (expt_validation_comparison.ipynb, Colab)
  • Vickers Hardness adaptive design - let's consult the literature, again and again (hardness_literature.ipynb, Colab)
  • Sparse, multi-objective, heterogeneous, heteroskedastic, multi-fidelity Bayesian optimization (sparse_moo.ipynb, Colab)
  • Optimizing in a latent space: discovering high-performing crystal structures using VAEs (crystal_bayesopt_vae.ipynb, Colab)

Neural Network Hyperparameter Optimization

Let's begin with the first publication related to this work, a high-dimensional hyperparameter optimization study of 23 neural network hyperparameters, including both numerical and categorical parameters. We used a recently introduced high-dimensional Bayesian optimization scheme within the Ax platform called Sparse Axis-Aligned Subspaces Bayesian Optimization (SAASBO) to set a new state-of-the-art benchmark on a Matbench task (matbench_expt_gap) with no prior knowledge other than (generous) bounds on the search space. See the submission, notebook, and paper for additional details.
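As a rough illustration of a mixed numerical/categorical search space, the sketch below lists a few parameters in the dictionary style that Ax's Service API accepts for parameter definitions, as we understand it. The hyperparameter names and bounds are hypothetical and are not the 23 hyperparameters from the study; the sampler only draws random configurations and is not SAASBO.

```python
import math
import random

# Hypothetical hyperparameters for illustration only (not the 23 from the study).
search_space = [
    {"name": "learning_rate", "type": "range", "bounds": [1e-4, 1e-1], "log_scale": True},
    {"name": "n_layers", "type": "range", "bounds": [1, 8], "value_type": "int"},
    {"name": "dropout", "type": "range", "bounds": [0.0, 0.5]},
    {"name": "optimizer", "type": "choice", "values": ["adam", "lamb", "sgd"]},
]


def sample_configuration(space, rng):
    """Draw one random configuration that respects the bounds and choices."""
    config = {}
    for param in space:
        name = param["name"]
        if param["type"] == "choice":
            config[name] = rng.choice(param["values"])
            continue
        low, high = param["bounds"]
        if param.get("value_type") == "int":
            config[name] = rng.randint(low, high)
        elif param.get("log_scale"):
            # Sample uniformly in log space, then clamp against rounding error.
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
            config[name] = min(high, max(low, value))
        else:
            config[name] = rng.uniform(low, high)
    return config


example_config = sample_configuration(search_space, random.Random(0))
```

Only generous bounds are specified here, which mirrors the "no prior knowledge" setup described above; a model-based method would replace the random sampler.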

Baird, S. G.; Liu, M.; Sparks, T. D. High-Dimensional Bayesian Optimization of Hyperparameters for an Attention-Based Network to Predict Materials Property: A Case Study on CrabNet Using Ax and SAASBO. arXiv:2203.12597 [cond-mat] 2022.

Multi-objective Optimization of Dental Resin Formulations (Industry)

Dental resins are made up of monomer resins, fillers, dyes, and inhibitors.

Monomer optimization, without max_components constraint

We fix filler, dye, and inhibitor contributions and optimize over 16 distinct monomers in a continuous sense.

Monomer optimization, with max_components constraint

We apply the same optimization, except with the constraint that suggested candidates may contain no more than n components out of k monomers. This reframes the problem as an nchoosek problem where each of the k parameters is a continuous variable.
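One simple way to picture the max_components idea is a post-hoc projection: keep the n largest fractions, zero the rest, and renormalize. The sketch below assumes fractions that sum to 1 and uses a hypothetical helper name; it is not necessarily how AxForChemistry enforces the constraint during optimization.

```python
def enforce_max_components(fractions, max_components):
    """Keep the largest `max_components` fractions, zero the rest, renormalize."""
    if max_components < 1:
        raise ValueError("max_components must be at least 1")
    # Indices sorted from largest to smallest fraction.
    ranked = sorted(range(len(fractions)), key=lambda i: fractions[i], reverse=True)
    keep = set(ranked[:max_components])
    trimmed = [f if i in keep else 0.0 for i, f in enumerate(fractions)]
    total = sum(trimmed)
    if total <= 0.0:
        raise ValueError("at least one kept fraction must be positive")
    return [f / total for f in trimmed]


# Four monomers (k = 4), at most two allowed in a candidate (n = 2).
projected = enforce_max_components([0.5, 0.3, 0.15, 0.05], max_components=2)
```

In this example the two smallest contributions are dropped and the remaining two are rescaled so the composition still sums to 1.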

Multiple compositional constraints

Each of the categories (resins, fillers, dyes, and inhibitors) can be restricted to total contribution ranges as well as maximum allowable number of components. For example, we can restrict the total resin contribution to 15-30% and the total filler contribution to 50-70% while also constraining the total contribution of all components to 100%.
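The constraints described above can be sketched as a feasibility check on a single candidate. The component names, category assignments, and component limits below are hypothetical; only the 15-30% resin range, the 50-70% filler range, and the 100% total come from the text.

```python
def is_feasible(composition, categories, total_ranges, max_components, tol=1e-9):
    """Check category totals, per-category component counts, and the overall total."""
    totals, counts = {}, {}
    for name, fraction in composition.items():
        if fraction < -tol:
            return False
        category = categories[name]
        totals[category] = totals.get(category, 0.0) + fraction
        if fraction > tol:
            counts[category] = counts.get(category, 0) + 1
    # All components together must make up 100% of the formulation.
    if abs(sum(composition.values()) - 1.0) > 1e-6:
        return False
    for category, (low, high) in total_ranges.items():
        if not (low - tol <= totals.get(category, 0.0) <= high + tol):
            return False
    for category, limit in max_components.items():
        if counts.get(category, 0) > limit:
            return False
    return True


# Hypothetical components and category assignments.
categories = {"resin_a": "resin", "resin_b": "resin", "filler_a": "filler",
              "dye_a": "dye", "inhibitor_a": "inhibitor"}
total_ranges = {"resin": (0.15, 0.30), "filler": (0.50, 0.70)}
max_components = {"resin": 2, "filler": 1}

good = {"resin_a": 0.15, "resin_b": 0.10, "filler_a": 0.60,
        "dye_a": 0.05, "inhibitor_a": 0.10}
bad = {"resin_a": 0.40, "resin_b": 0.0, "filler_a": 0.60,
       "dye_a": 0.0, "inhibitor_a": 0.0}
good_ok = is_feasible(good, categories, total_ranges, max_components)
bad_ok = is_feasible(bad, categories, total_ranges, max_components)
```

The second candidate fails because its total resin contribution (40%) lies outside the 15-30% range.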

High-temperature Multi-Principal Element Alloy (MPEA) discovery using domain knowledge and predefined candidates (ARPA-E)

Discovering new, high-temperature multi-principal element alloys can help unlock a new generation of efficient turbine engines. We limit the search to a max of n elements out of k possible elements (nchoosek) where the individual component contributions vary continuously from 0 to 1.

Maximize packing fraction for solid rocket fuel particle packing simulations under compositional constraints (Industry)

Particle packing fraction of solid rocket fuel affects the combustion process through properties such as density, reactivity, surface area, and mechanical properties.

Concurrent optimization for a large, initial training set (15000)

To reduce memory consumption, an exact (as opposed to noisy) acquisition function is applied during the search for candidates with high packing fractions. RayTune's integrations with Ax are used to perform task scheduling. Out of a pool of CPUs, as soon as one CPU becomes inactive, it is assigned a new task based on all available data (including recently generated data). This maximization of resource efficiency is especially important since simulation times can range from 20 min to over 20 hours. A CPU that completes a 20 minute simulation can run additional, adaptively suggested tasks while another CPU continues to run a 20 hour simulation.
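The scheduling pattern described above (hand a free worker a new task based on all data available so far) can be sketched with the Python standard library. This is a minimal illustration using threads rather than RayTune, and both the objective and the `suggest` function are hypothetical placeholders rather than a Bayesian model.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_simulation(x):
    """Stand-in for a long-running simulation (hypothetical objective)."""
    return 1.0 - (x - 0.3) ** 2


def suggest(history, rng):
    """Placeholder suggestion: perturb the best point seen so far."""
    if not history:
        return rng.random()
    best_x, _ = max(history, key=lambda pair: pair[1])
    return min(1.0, max(0.0, best_x + rng.gauss(0.0, 0.1)))


def optimize_async(n_workers, budget, seed=0):
    """Keep every worker busy; each new task uses all results gathered so far."""
    rng = random.Random(seed)
    history, pending, submitted = [], {}, 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while submitted < budget or pending:
            # Fill idle workers with freshly suggested tasks.
            while submitted < budget and len(pending) < n_workers:
                x = suggest(history, rng)
                pending[pool.submit(run_simulation, x)] = x
                submitted += 1
            # Block only until the first task finishes, then record it.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                history.append((pending.pop(future), future.result()))
    return history


results = optimize_async(n_workers=4, budget=20)
```

Because the loop waits for the first completed task rather than for a whole batch, a worker that finishes a short simulation is reassigned immediately while longer simulations keep running.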

SAASBO, training from scratch

We test the performance of SAASBO on the particle packing simulations by hiding all training data, allowing SAASBO to search from scratch (SAASBO is limited to small datasets).


Multi-fidelity

Dragonfly is used to perform multi-fidelity optimization of the particle packing simulations. Multi-fidelity in this context refers to the fact that simulation results tend to converge when a larger number of particles is used (slow, high-fidelity) and tend to have more noise when a smaller number of particles is used (fast, low-fidelity). Dragonfly interprets the number of particles as the fidelity parameter and seeks to maximize search efficiency by leveraging both (fast) low-fidelity and (slow) high-fidelity simulations running concurrently.
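To make the fidelity idea concrete, here is a toy simulator in which the noise shrinks as the number of particles grows. The objective, the noise level, and the 1/sqrt(n_particles) scaling are all assumptions chosen for illustration; this does not use Dragonfly and does not reproduce the actual packing simulations.

```python
import random
import statistics


def true_packing_fraction(x):
    """Hypothetical noise-free objective, used only for illustration."""
    return 0.6 + 0.1 * x * (1.0 - x)


def simulate(x, n_particles, rng, base_noise=0.5):
    """Toy simulation: noise is assumed to shrink as 1/sqrt(n_particles)."""
    noise = rng.gauss(0.0, base_noise / n_particles ** 0.5)
    return true_packing_fraction(x) + noise


rng = random.Random(0)
# Same input, two fidelities: few particles (fast) vs. many particles (slow).
low_fidelity = [simulate(0.5, 100, rng) for _ in range(200)]
high_fidelity = [simulate(0.5, 10_000, rng) for _ in range(200)]
low_spread = statistics.pstdev(low_fidelity)
high_spread = statistics.pstdev(high_fidelity)
```

Comparing `low_spread` with `high_spread` shows the trade-off a multi-fidelity optimizer exploits: cheap evaluations are noisier, expensive ones are more reliable.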

Maximize (qualitative) coating quality of metal coated polymers (Industry)

Electroless deposition of metals on polymers requires careful recipe generation to produce adhesive, uniform coatings. Researchers assign qualitative rankings of the coatings as the optimization objective.

CrabNet as a pseudo-materials discovery benchmark problem with fake compositional constraints

Here, we compare Ax models to other state-of-the-art techniques on a fake materials discovery validation problem.

Experimental validation of materials discovery via Open Citrine Platform, DiSCoVeR, and AxForChemistry

Vickers Hardness adaptive design - let's consult the literature, again and again

Rather than immediately venture to the laboratory upon obtaining suggested candidates, we implement a loop where we consult the literature for the top k suggested compounds and update the model/suggestions until n of the k compounds do not contain data within the literature.
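The loop described above can be sketched as follows. The three callables are hypothetical stand-ins for the model's suggestion step, a literature search, and a model update; the compound labels and values are made up for the toy example.

```python
def literature_loop(suggest_top_k, lookup_literature, update_model, k, n, max_rounds=100):
    """Repeat until at least n of the top-k suggestions have no literature data."""
    for _ in range(max_rounds):
        candidates = suggest_top_k(k)
        unknown = []
        for candidate in candidates:
            value = lookup_literature(candidate)
            if value is None:
                unknown.append(candidate)
            else:
                # Literature data found: fold it into the model instead of the lab queue.
                update_model(candidate, value)
        if len(unknown) >= n:
            return unknown
    raise RuntimeError("no sufficiently novel candidate set found")


# Toy stand-ins: a fixed ranking and a small literature "database".
literature = {"A": 10.0, "B": 12.0, "C": 9.0}
ranked = ["A", "B", "D", "C", "E", "F"]
known = {}


def suggest_top_k(k):
    return [c for c in ranked if c not in known][:k]


def lookup_literature(candidate):
    return literature.get(candidate)


def update_model(candidate, value):
    known[candidate] = value


novel = literature_loop(suggest_top_k, lookup_literature, update_model, k=3, n=3)
```

In this toy run the compounds with literature values are absorbed into the model over successive rounds, and the loop stops once all three suggested compounds are absent from the literature.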

Sparse, multi-objective, heterogeneous, heteroskedastic, multi-fidelity Bayesian optimization

Not implemented yet. How to deal with sparsity in the Ax framework? (open an issue).

Optimizing in a latent space: discovering high-performing crystal structures using VAEs

Not implemented yet. Consider using https://github.com/PV-Lab/FTCP.

Download files

Source Distribution: axforchemistry-0.1.5.tar.gz (1.4 MB)

Built Distribution: axforchemistry-0.1.5-py3-none-any.whl (25.9 kB, Python 3)
