BanditBench: A Bandit Benchmark to Evaluate Self-Improving LLM Algorithms

EVOLvE

EVOLvE is a framework for experimenting with Large Language Models (LLMs) in multi-armed and contextual bandit scenarios. This repository contains the code to reproduce the results from the EVOLvE paper.

🚀 Features

  • Flexible framework for bandit experiments with LLMs
  • Support for both multi-armed and contextual bandit scenarios
  • Mixin-based design for highly customizable LLM agents
  • Built-in support for few-shot learning and demonstration
  • Includes popular benchmark environments (e.g., MovieLens)

🎯 Bandit Scenario Example

We provide two types of bandit scenarios:

  1. Multi-Armed Bandit Scenario

    • Classic exploration-exploitation problem with stochastic rewards sampled from fixed distributions
    • Agent learns to select the best arm without any contextual information
    • Example: Choosing which of 5 different TikTok videos to show, without knowing at first which one is most popular
  2. Contextual Bandit Scenario

    • Reward distributions depend on a context (e.g., user features)
    • Agent learns to map contexts to optimal actions
    • Example: Recommending movies to users based on their age, location, and past viewing history (e.g., suggesting "The Dark Knight" to a 25-year-old who enjoys action movies and lives in an urban area)
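
To make the difference between the two scenarios concrete, here is a minimal, self-contained sketch in plain Python. It does not use the banditbench API: the class names `BernoulliBandit` and `ContextualBernoulliBandit`, the `pull` method, and the `epsilon_greedy` helper are all assumptions made for illustration, and epsilon-greedy stands in for whatever agent you would actually evaluate.

```python
import random


class BernoulliBandit:
    """Multi-armed bandit: each arm pays 1 with a fixed, hidden probability."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1 if self.rng.random() < self.probs[arm] else 0


class ContextualBernoulliBandit:
    """Contextual bandit: the payout probability depends on (context, arm)."""

    def __init__(self, probs_by_context, seed=0):
        self.probs_by_context = probs_by_context
        self.rng = random.Random(seed)

    def sample_context(self):
        # Draw the context (e.g., a user group) for the next round.
        return self.rng.choice(sorted(self.probs_by_context))

    def pull(self, context, arm):
        p = self.probs_by_context[context][arm]
        return 1 if self.rng.random() < p else 0


def epsilon_greedy(env, num_arms, steps, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a multi-armed bandit.

    Returns per-arm pull counts and running mean-reward estimates.
    """
    rng = random.Random(seed)
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(num_arms)  # explore
        else:
            arm = max(range(num_arms), key=lambda a: means[a])  # exploit
        reward = env.pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


env = BernoulliBandit([0.2, 0.5, 0.8])
counts, means = epsilon_greedy(env, num_arms=3, steps=2000)
print(counts, means)
```

In the multi-armed case the agent only chooses an arm. In the contextual case it first observes a context from `sample_context()` and then chooses an arm for that context, so a good policy has to learn one arm preference per context.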

📋 Requirements

  • Python >= 3.9
  • TensorFlow (required for TensorFlow Datasets)
  • Other dependencies will be automatically installed

🛠️ Installation

Option 1: Install from PyPI (Recommended for Users)

pip install banditbench

Option 2: Install from Source (Recommended for Developers)

git clone https://github.com/yourusername/evolve.git
cd evolve
pip install -e .  # Install in editable mode for development

🎮 Quick Start

Using Existing Multi-Armed Bandit Scenarios

(Add code example here)

Using Contextual Bandit Scenarios

(Add code example here)

🧩 Architecture

Decision-Making Context

The framework represents decision-making contexts in four segments:

{Task Description + Instruction} (provided by the environment)
{Few-shot demonstrations from historical interactions}
{Current history of interaction} (decided by the agent)
{Query prompt for the next decision} (provided by the environment)
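
As a rough illustration of how these segments could be joined into a single prompt, consider the sketch below. The function name `build_decision_context` and its parameters are hypothetical and are not taken from the banditbench codebase; the sketch only assumes that each segment is plain text and that empty segments are skipped.

```python
def build_decision_context(task_description, history, query_prompt,
                           demonstrations=None):
    """Join the context segments in order, skipping any that are empty."""
    segments = [
        task_description,                 # provided by the environment
        "\n".join(demonstrations or []),  # few-shot demonstrations, if any
        "\n".join(history),               # interaction history kept by the agent
        query_prompt,                     # provided by the environment
    ]
    return "\n\n".join(s for s in segments if s)


prompt = build_decision_context(
    task_description="You are choosing among 3 videos to recommend.",
    history=["Video 1, reward 0", "Video 3, reward 1"],
    query_prompt="Which video should be shown next?",
)
print(prompt)
```

Keeping the history as its own argument mirrors the split described above: the environment owns the task description and the query, while the agent decides how the interaction history is rendered.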

LLM Agents

We use a Mixin-based design pattern to provide maximum flexibility and customization options for agent implementation. This allows you to:

  • Combine different agent behaviors
  • Customize prompt engineering strategies
  • Implement new decision-making algorithms
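
The mixin idea can be sketched in a few lines of plain Python. All class and method names here (`BaseAgent`, `RawHistoryMixin`, `SummaryHistoryMixin`, `render_history`) are invented for illustration and should not be read as the framework's actual classes; the point is only that a behavior such as history rendering can be swapped by changing which mixin an agent inherits from.

```python
class BaseAgent:
    """Stores the interaction history; mixins decide how it is rendered."""

    def __init__(self, num_arms):
        self.num_arms = num_arms
        self.history = []  # list of (arm, reward) pairs

    def update(self, arm, reward):
        self.history.append((arm, reward))


class RawHistoryMixin:
    """Renders every past interaction on its own line."""

    def render_history(self):
        return "\n".join(f"arm {a} -> reward {r}" for a, r in self.history)


class SummaryHistoryMixin:
    """Renders one summary line per arm (pull count and mean reward)."""

    def render_history(self):
        lines = []
        for arm in range(self.num_arms):
            rewards = [r for a, r in self.history if a == arm]
            mean = sum(rewards) / len(rewards) if rewards else 0.0
            lines.append(
                f"arm {arm}: pulled {len(rewards)} times, mean reward {mean:.2f}"
            )
        return "\n".join(lines)


# Combining a mixin with the base class yields a concrete agent.
class RawHistoryAgent(RawHistoryMixin, BaseAgent):
    pass


class SummaryAgent(SummaryHistoryMixin, BaseAgent):
    pass


agent = SummaryAgent(num_arms=2)
agent.update(0, 1)
agent.update(0, 0)
agent.update(1, 1)
print(agent.render_history())
```

Because each mixin only touches `render_history`, a new prompt strategy can be added as one more mixin without editing `BaseAgent` or the existing agents.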

🔧 Customization

Adding Custom Multi-Armed Bandit Scenarios

To create a custom bandit scenario:

  1. Inherit from the base scenario class
  2. Implement required methods (Add more specific instructions)
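
Since the required methods are not listed here, the following is only a generic sketch of the inherit-and-implement pattern. `BaseBanditScenario`, `reset`, and `step` are hypothetical stand-ins defined inside the example itself; check the banditbench source for the real base class and method names before adapting it.

```python
import random
from abc import ABC, abstractmethod


class BaseBanditScenario(ABC):
    """Hypothetical base class; NOT the one shipped with banditbench."""

    @abstractmethod
    def reset(self):
        """Start a new episode."""

    @abstractmethod
    def step(self, action):
        """Play one arm and return (reward, done)."""


class GaussianBandit(BaseBanditScenario):
    """Custom scenario: each arm pays a Gaussian reward around a fixed mean."""

    def __init__(self, means, std=1.0, horizon=100, seed=0):
        self.means = list(means)
        self.std = std
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0

    def step(self, action):
        if not 0 <= action < len(self.means):
            raise ValueError(f"invalid arm index: {action}")
        self.t += 1
        reward = self.rng.gauss(self.means[action], self.std)
        done = self.t >= self.horizon  # episode ends after `horizon` pulls
        return reward, done


env = GaussianBandit(means=[0.0, 0.5, 1.0], std=0.1, horizon=3)
env.reset()
for arm in (0, 1, 2):
    print(arm, env.step(arm))
```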

Creating Custom Agents

(Add instructions for creating custom agents)

⚠️ Known Issues

  1. TFDS Issues: There is a known issue with TensorFlow Datasets when using multiple Jupyter notebooks sharing the same kernel. The kernel may crash when loading datasets, even with different save locations.

  2. TensorFlow Dependency: The project currently requires TensorFlow due to TFDS usage. We plan to remove this dependency in future releases.

🤝 Contributing

We welcome contributions! Please start by reporting an issue or a feature request.

📄 License

This project is licensed under the [LICENSE NAME] - see the LICENSE file for details.
