Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models

☂️ BoCoEL

🤔 Why BoCoEL?

Large language models are expensive and slow behemoths, and evaluating them on gigantic modern datasets only makes it worse.

If only there were a way to select a small but meaningful subset of the corpus and still obtain a highly accurate evaluation...

Wait, sounds like Bayesian Optimization!

Bocoel works in the following steps:

  1. Encode each corpus entry into an embedding (far cheaper and faster than running an LLM, and reusable).
  2. Use Bayesian optimization to select queries to evaluate.
  3. Use the queries to retrieve from our corpus (with the encoded embeddings).
  4. Profit.
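The steps above can be sketched in plain NumPy. This is a minimal illustration, not BoCoEL's actual API: the names `select_subset`, `gp_posterior`, and `fake_llm_score` are hypothetical, the embeddings are random stand-ins, and the upper-confidence-bound acquisition (mean plus `beta` times standard deviation) is just one common choice assumed here for demonstration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x_new."""
    y_mean = y_obs.mean()  # center scores so a zero-mean prior is reasonable
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_new)
    mean = y_mean + k_s.T @ np.linalg.solve(k, y_obs - y_mean)
    var = 1.0 - np.einsum("ij,ij->j", k_s, np.linalg.solve(k, k_s))
    return mean, np.sqrt(np.clip(var, 0.0, None))


def select_subset(corpus_emb, score_fn, budget, beta=2.0, n_candidates=256, seed=0):
    """Pick `budget` corpus indices (budget >= 1) by querying the embedding space."""
    rng = np.random.default_rng(seed)
    low, high = corpus_emb.min(axis=0), corpus_emb.max(axis=0)
    chosen = [int(rng.integers(len(corpus_emb)))]  # random starting entry
    scores = [score_fn(chosen[0])]
    while len(chosen) < budget:
        # Step 2: score random candidate queries with the acquisition function.
        candidates = rng.uniform(low, high, size=(n_candidates, corpus_emb.shape[1]))
        mean, std = gp_posterior(corpus_emb[chosen], np.asarray(scores), candidates)
        query = candidates[np.argmax(mean + beta * std)]
        # Step 3: retrieve the nearest corpus entry not evaluated yet.
        dist = np.linalg.norm(corpus_emb - query, axis=1)
        dist[chosen] = np.inf
        index = int(np.argmin(dist))
        chosen.append(index)
        scores.append(score_fn(index))  # the only expensive call
    return chosen, scores


# Step 1: stand-in for real embeddings of a 500-entry corpus (hypothetical data).
rng = np.random.default_rng(0)
corpus_emb = rng.normal(size=(500, 2))


def fake_llm_score(index):
    """Placeholder for running the LLM on one entry and scoring its output."""
    x = corpus_emb[index]
    return float(np.sin(x[0]) + 0.5 * np.cos(x[1]))


indices, scores = select_subset(corpus_emb, fake_llm_score, budget=20)
print(len(indices), "entries evaluated out of", len(corpus_emb))
```

In this sketch the expensive scoring function is called once per selected entry, so the evaluation cost is set by `budget` rather than by the corpus size.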

The evaluations generated are easily managed by the provided manager utility.

To our knowledge, this is the first work aiming to reduce computation costs during evaluation (benchmarking) with a (possibly dynamic) budget.

🚀 Features

  • 🎯 Accurately evaluate large language models with just tens of samples from your selected corpus.
  • 💂‍♂️ Uses the power of Bayesian optimization to select an optimal subset of samples for the language model to evaluate.
  • 💯 Evaluate the corpus on the model in addition to evaluating the model on the corpus.
  • 🤗 Support for GPT-2, Pythia, LLaMA, and more through integration with Hugging Face transformers and datasets.
  • 🧩 Modular design.
  • 🔎 Efficient representation of the corpus / dataset such as N-sphere representation or whitening of the latent space to augment evaluation quality.
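As a rough illustration of the last feature, the sketch below shows one common way to whiten an embedding matrix (PCA whitening) and one simple reading of an N-sphere representation (normalizing each vector to unit length). The function names `whiten` and `to_sphere` are hypothetical, and BoCoEL's own transforms may differ from what is assumed here.

```python
import numpy as np


def whiten(emb, eps=1e-8):
    """PCA whitening: decorrelate dimensions and scale each to unit variance."""
    centered = emb - emb.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    # Rotate onto the principal axes, then divide by each axis's std deviation.
    return centered @ eigvec / np.sqrt(eigval + eps)


def to_sphere(emb, eps=1e-12):
    """Project each embedding onto the unit hypersphere."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, eps)


# Correlated toy embeddings (hypothetical data, fixed mixing matrix).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
emb = rng.normal(size=(1000, 3)) @ mixing

whitened = whiten(emb)
on_sphere = to_sphere(emb)
```

After whitening, the sample covariance is approximately the identity, so no single direction dominates distances in the latent space. After the sphere projection, only the direction of each embedding matters.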

⭐ Give us a star!

Like what you see? Please consider giving this a star (★)!

♾️ Bayesian Optimization

Simply put, Bayesian optimization aims to optimize either the exploration objective (the purple area in the image) or the exploitation objective (the height of the black dots). It uses Gaussian processes as a backbone for inference and an acquisition function to decide where to sample next. See here for a more in-depth introduction.

Since Bayesian optimization works well with an expensive-to-evaluate black-box model (read: an LLM), it is perfect for this particular use case. Bocoel uses Bayesian optimization as a backbone for exploring the embedding space given by our corpus, which allows it to select a good subset acting as a mini snapshot of the corpus.
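For reference, the textbook form of the two ingredients named above can be written down as follows. The symbols are assumptions introduced here, not notation from BoCoEL: X holds the already-evaluated points in embedding space, y their scores, k is a kernel with K = k(X, X) and k_*(x) = k(X, x), σ_n² is the observation noise, and β trades exploration against exploitation. A zero-mean prior is assumed, and the upper confidence bound is only one common acquisition function; which one BoCoEL uses is not stated here.

```latex
\begin{aligned}
\mu(x) &= k_*(x)^\top \left(K + \sigma_n^2 I\right)^{-1} y, \\
\sigma^2(x) &= k(x, x) - k_*(x)^\top \left(K + \sigma_n^2 I\right)^{-1} k_*(x), \\
\alpha_{\mathrm{UCB}}(x) &= \mu(x) + \beta\,\sigma(x), \qquad
x_{\mathrm{next}} = \arg\max_{x}\, \alpha_{\mathrm{UCB}}(x).
\end{aligned}
```

A large β favors points where the posterior is uncertain (exploration), while a small β favors points where the predicted score is already high (exploitation).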

🏎️ Performance Implications

LLMs are painfully slow, especially generative ones (which is what "LLM" usually refers to), since sequence generation is sequential by nature.

Although bocoel must encode the entire corpus with an embedder, embedders are orders of magnitude faster than LLMs, so the encoding time is recovered by practically any reduction in the number of LLM evaluations.

⬇️ Installation

I don't want optional dependencies:

pip install bocoel

Give me the full experience (all optional dependencies):

pip install "bocoel[all]"

🔬 Usage

See the folder examples/getting_started for a simple example that gets you started with the library in just a few lines of code.

✍️ Develop with BoCoEL

Usage examples are under the folder examples. API reference can be found here.

🥰 Contributing

Contributors wanted! Don't be shy. Feel free to file issues and PRs. For PRs, please follow the guide on contributing and the code of conduct. Openness and inclusiveness are taken very seriously.

🗺️ Roadmap: work in progress

  • 🪑 Simpler usage. I should provide a high-level wrapper for the entire library so that evaluations can be run in one line.
  • 📊 Visualization module of the evaluation.
  • 🎲 Integration of alternative methods (random, k-medoids...) with Gaussian process.
  • 🥨 Integration with more backends such as vLLM and OpenAI's API.
  • 🆕 Support for Python 3.12+

🏷️ License and Citation

The code is available under the BSD-3 License.

If you find this project helpful in your research, please cite this work as:

@misc{bocoel2024,
    title = {BoCoEL: Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models},
    url = {https://bocoel.rentruewang.com/research/},
    author = {Wang, RenChu},
    month = {January},
    year = {2024}
}
