

🤖 BO-LIFT: Bayesian Optimization using in-context learning


BO-LIFT does regression with uncertainty estimates using frozen large language models by extracting token probabilities. It uses LangChain to select examples from the training data and build in-context learning prompts. Because it selects examples, it can draw on more training data than fits in the model's context window. Being able to predict uncertainty, in turn, enables techniques such as Bayesian optimization.
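As a rough illustration of the underlying idea (a sketch, not bolift's actual implementation), the top-k numeric completions and their token probabilities can be combined into a weighted mean and standard deviation:

import numpy as np

# Hypothetical top-k numeric completions and their token probabilities
# (illustrative values, not produced by bolift)
values = np.array([-2.5, -3.0, -2.8, -3.2, -2.6])
probs = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
probs = probs / probs.sum()  # renormalize over the top-k candidates

mean = np.sum(probs * values)                        # probability-weighted mean
std = np.sqrt(np.sum(probs * (values - mean) ** 2))  # weighted standard deviation
print(f"{mean:.2f} +/- {std:.2f}")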


Install 📦

bolift can simply be installed using pip:

pip install bolift

Some additional requirements are needed to use the Gaussian Process Regressor (GPR) module. They can also be installed using pip:

pip install bolift[gpr]

Usage 💻

You need to set your OpenAI API key to use BO-LIFT. You can do that using Python's os module:

import os
os.environ["OPENAI_API_KEY"] = "<your-key-here>"

Quickstart 🔥

bolift provides a simple interface to use the model.

import bolift

# Create the model object
asktell = bolift.AskTellFewShotTopk()

# Tell some points to the model
asktell.tell("1-bromopropane", -1.730)
asktell.tell("1-bromopentane", -3.080)
asktell.tell("1-bromooctane", -5.060)
asktell.tell("1-bromonaphthalene", -4.35)

# Make a prediction
yhat = asktell.predict("1-bromobutane")
print(yhat.mean(), yhat.std())

This prediction returns $-2.92 \pm 1.27$.

The predictions can be further improved using Bayesian optimization.

# Create a list of candidate points to select from
pool_list = [
  "1-bromoheptane",
  "1-bromohexane",
  "1-bromo-2-methylpropane",
  "butan-1-ol"
]

# Create the pool object
pool = bolift.Pool(pool_list)

# Ask for the next point
asktell.ask(pool)

# Output:
(['1-bromo-2-methylpropane'], [-1.284916344093158], [-1.92])

The first value is the selected point, the second is the value of the acquisition function, and the third is the predicted mean.
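Assuming the return format shown above, the result can be unpacked directly (a small usage sketch):

# Unpack the (points, acquisition values, predicted means) tuple
points, acq_values, means = asktell.ask(pool)
next_point = points[0]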

Let's tell this point to the model with its correct label and make a prediction:

asktell.tell("1-bromo-2-methylpropane", -2.430)

yhat = asktell.predict("1-bromobutane")
print(yhat.mean(), yhat.std())

This prediction returns $-1.866 \pm 0.012$, which is closer to the label of $-2.370$ for 1-bromobutane, and the uncertainty also decreased.
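Putting the pieces together, an ask/tell loop might look like the sketch below. The measure_solubility oracle is hypothetical and stands in for an experiment or a lookup in your dataset; pool bookkeeping is omitted.

# Hypothetical oracle: replace with an experiment or a dataset lookup
def measure_solubility(name):
  raise NotImplementedError("return the true label for `name` here")

for _ in range(3):
  # Ask for the most promising candidate according to the acquisition function
  points, acq_values, means = asktell.ask(pool)
  candidate = points[0]
  # Measure the candidate and feed the result back to the model
  asktell.tell(candidate, measure_solubility(candidate))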

Customising the model

bolift provides different models depending on the prompt you want to use. An example of usage is shown below:

import bolift
asktell = bolift.AskTellFewShotTopk(
  x_formatter=lambda x: f"iupac name {x}",
  y_name="measured log solubility in mols per litre",
  y_formatter=lambda y: f"{y:.2f}",
  model="gpt-4",
  selector_k=5,
  temperature=0.7,
)

Other arguments can be used to customize the prompt (prefix, prompt_template, suffix) and the in-context learning procedure (use_quantiles, n_quantiles); a short sketch follows the list below. Additionally, we implemented other models, briefly listed here:

  • AskTellFewShotMulti;
  • AskTellFewShotTopk;
  • AskTellFinetuning;
  • AskTellRidgeKernelRegression;
  • AskTellGPR;
  • AskTellNearestNeighbor.
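For instance, a hedged sketch that enables the quantile-based in-context learning procedure (the argument names come from the customization options mentioned above; the values are placeholders rather than recommendations):

import bolift

asktell = bolift.AskTellFewShotTopk(
  x_formatter=lambda x: f"iupac name {x}",
  y_name="measured log solubility in mols per litre",
  use_quantiles=True,  # switch the in-context learning procedure to quantiles
  n_quantiles=5,       # number of quantiles used in the prompt
)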

Refer to the notebooks in the paper directory for examples of how to use bolift, and to the paper for a detailed description of the classes.

Inverse design

bolift also implements an inverse design approach to propose new data. After telling datapoints to the model as before, inv_predict can be used to make an inverse prediction: we query the label we want, and the model generates a datapoint that corresponds to that label:

data_x = [
"A 15 wt% tungsten carbide catalyst was prepared with Fe dopant metal at 0.5 wt% and carburized at 835 °C. The reaction was run at 280 °C, resulting in a CO yield of",
"A 15 wt% tungsten carbide catalyst was prepared with Fe dopant metal at 0.5 wt% and carburized at 835 °C. The reaction was run at 350 °C, resulting in a CO yield of",
...
]

data_y = [
1.66,
3.03,
...
]


for i in range(len(data_x)):
  asktell.tell(data_x[i], data_y[i])

asktell.inv_predict(20.0)

The data for this example is available in the paper directory. This query generated the following procedure:

the synthesis procedure:"A 30 wt% tungsten carbide catalyst was prepared with Cu dopant metal at 5 wt% and carburized at 835 C. The reaction was run at 350 ºC"

Citation

Please cite Ramos et al.:

@misc{ramos2023bayesian,
      title={Bayesian Optimization of Catalysts With In-context Learning}, 
      author={Mayk Caldas Ramos and Shane S. Michtavy and Marc D. Porosoff and Andrew D. White},
      year={2023},
      eprint={2304.05341},
      archivePrefix={arXiv},
      primaryClass={physics.chem-ph}
}

