AutoGen Testbed Tools

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

AutoGenBench

AutoGenBench is a tool for repeatedly running a set of pre-defined AutoGen scenarios in a setting with tightly-controlled initial conditions. With each run, AutoGen will start from a blank slate, working out what code needs to be written, and what libraries or dependencies to install. The results of each run are logged, and can be ingested by analysis or metrics scripts (see the HumanEval example later in this README). By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.

AutoGenBench is known to work with all AutoGen 0.1.* and 0.2.* versions.

Installation and Setup

To get the most out of AutoGenBench, the autogenbench package should be installed. At present, the best way to do this is to clone the autogen repository with git and then, from the repository root, execute:

pip install -e samples/tools/testbed

or, from within the samples/tools/testbed folder (e.g., if reading this README):

pip install -e .

After installation, you must configure your API keys. As with other AutoGen applications, AutoGenBench will look for the OpenAI keys in the OAI_CONFIG_LIST file in the current working directory, or in the OAI_CONFIG_LIST environment variable. If neither is provided, it will use the environment variable OPENAI_API_KEY. This behavior can be overridden using a command-line parameter described later.
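The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not AutoGenBench's actual code: the function name `resolve_config_list` is hypothetical, and the sketch assumes that OAI_CONFIG_LIST (file or environment variable) holds a JSON list of configuration dictionaries.

```python
import json
import os
from pathlib import Path


def resolve_config_list(name="OAI_CONFIG_LIST", cwd=".", env=None):
    """Hypothetical helper: return model configs using the documented lookup order."""
    env = os.environ if env is None else env

    # 1. A file named OAI_CONFIG_LIST in the current working directory.
    config_file = Path(cwd) / name
    if config_file.is_file():
        return json.loads(config_file.read_text())

    # 2. An environment variable of the same name (assumed to contain JSON).
    if env.get(name):
        return json.loads(env[name])

    # 3. Fall back to a bare OPENAI_API_KEY.
    if env.get("OPENAI_API_KEY"):
        return [{"api_key": env["OPENAI_API_KEY"]}]

    raise RuntimeError("No OpenAI credentials found.")
```

Passing `env` explicitly keeps the sketch easy to try without touching the real process environment.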

For some scenarios, additional keys may be required (e.g., keys for the Bing Search API). These can be added to an ENV.json file in the current working folder. An example ENV.json file is provided below:

{
    "BING_API_KEY": "xxxyyyzzz"
}
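As a rough illustration of how such a file could be consumed, the sketch below reads ENV.json and returns its entries as strings. How AutoGenBench itself passes these values to a scenario is not described here, so the helper name `load_env_file` and the idea of merging the values into `os.environ` are assumptions.

```python
import json
import os
from pathlib import Path


def load_env_file(path="ENV.json"):
    """Hypothetical helper: read extra keys from ENV.json, or {} if the file is absent."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    with env_path.open() as f:
        values = json.load(f)
    if not isinstance(values, dict):
        raise ValueError("ENV.json must contain a single JSON object")
    # Environment variables are strings, so coerce keys and values.
    return {str(key): str(value) for key, value in values.items()}


if __name__ == "__main__":
    # Assumption: exposing the keys as environment variables is sufficient.
    os.environ.update(load_env_file())
```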

AutoGenBench also requires Docker (Desktop or Engine). It will not run in GitHub Codespaces unless you opt for native execution (which is strongly discouraged). To install Docker Desktop, see https://www.docker.com/products/docker-desktop/.

Cloning Benchmarks

To clone an existing benchmark, simply run:

autogenbench clone [BENCHMARK]

For example,

autogenbench clone HumanEval

To see which existing benchmarks are available to clone, run:

autogenbench clone --list

Running AutoGenBench

To run a benchmark (which executes the tasks, but does not compute metrics), simply execute:

cd [BENCHMARK]
autogenbench run Tasks

For example,

cd HumanEval
autogenbench run Tasks

The default is to run each task once. To run each scenario 10 times, use:

autogenbench run --repeat 10 Tasks

The autogenbench command-line tool allows a number of command-line arguments to control various parameters of execution. Type autogenbench -h to explore these options:

'autogenbench run' will run the specified autogen scenarios for a given number of repetitions and record all logs and trace information. When running in a Docker environment (default), each run will begin from a common, tightly controlled, environment. The resultant logs can then be further processed by other scripts to produce metrics.

positional arguments:
  scenario      The JSONL scenario file to run. If a directory is specified,
                then all JSONL scenarios in the directory are run. (default:
                ./scenarios)

options:
  -h, --help    show this help message and exit

  -r REPEAT, --repeat REPEAT
                The number of repetitions to run for each scenario (default: 1).

  -c CONFIG, --config CONFIG
                The environment variable name or path to the OAI_CONFIG_LIST (default: OAI_CONFIG_LIST).

  --requirements REQUIREMENTS
                The requirements file to pip install before running the scenario.

  -d DOCKER_IMAGE, --docker-image DOCKER_IMAGE
                The Docker image to use when running scenarios. Can not be used together with --native.
                (default: 'autogen/testbed:default', which will be created if not present)

  --native      Run the scenarios natively rather than in docker.
                NOTE: This is not advisable, and should be done with great caution.

Results

By default, AutoGenBench stores results in a folder hierarchy with the following template:

./results/[scenario]/[instance_id]/[repetition]

For example, consider the following folders:

./results/default_two_agents_gpt35/two_agent_stocks/0
./results/default_two_agents_gpt35/two_agent_stocks/1
...
./results/default_two_agents_gpt35/two_agent_stocks/9

These folders hold the results for the two_agent_stocks instance of the default_two_agents_gpt35 scenario. The 0 folder contains the results of the first run, the 1 folder contains the results of the second run, and so on. You can think of the instance as mapping to a prompt, or a unique set of parameters, while the scenario defines the template into which those parameters are substituted.
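The path template can be made concrete with a small sketch. The helper name `result_dir` is hypothetical and only mirrors the ./results/[scenario]/[instance_id]/[repetition] layout described above.

```python
from pathlib import Path


def result_dir(scenario, instance_id, repetition, root="results"):
    """Hypothetical helper: build the folder for one repetition of one instance."""
    return Path(root) / scenario / instance_id / str(repetition)


# Ten repetitions of the two_agent_stocks instance, numbered 0 through 9.
dirs = [
    result_dir("default_two_agents_gpt35", "two_agent_stocks", repetition)
    for repetition in range(10)
]
print(dirs[0].as_posix())  # results/default_two_agents_gpt35/two_agent_stocks/0
```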

Within each folder, you will find the following files:

- timestamp.txt: records the date and time of the run, along with the version of the pyautogen library installed.
- console_log.txt: all console output produced by Docker when running AutoGen. Read this as you would a regular console.
- [agent]_messages.json: for each agent, a log of its message dictionaries.
- ./coding: a directory containing all code written by AutoGen, and all artifacts produced by that code.

Scenario Templating

All scenarios are stored in JSONL files (in subdirectories under ./scenarios). Each line of a scenario file is a JSON object. The schema varies slightly depending on whether "template" specifies a file or a directory.

If "template" points to a file, the format is:

{
   "id": string,
   "template": filename,
   "substitutions": {
       "find_string1": replace_string1,
       "find_string2": replace_string2,
       ...
       "find_stringN": replace_stringN
   }
}

For example:

{
    "id": "two_agent_stocks_gpt4",
    "template": "default_two_agents.py",
    "substitutions": {
        "__MODEL__": "gpt-4",
        "__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
    }
}

If "template" points to a directory, the format is:

{
   "id": string,
   "template": dirname,
   "substitutions": {
       "filename1": {
           "find_string1_1": replace_string1_1,
           "find_string1_2": replace_string1_2,
           ...
           "find_string1_M": replace_string1_M
       },
       "filename2": {
           "find_string2_1": replace_string2_1,
           "find_string2_2": replace_string2_2,
           ...
           "find_string2_N": replace_string2_N
       }
   }
}

For example:

{
    "id": "two_agent_stocks_gpt4",
    "template": "default_two_agents",
    "substitutions": {
        "scenario.py": {
            "__MODEL__": "gpt-4"
        },
        "prompt.txt": {
            "__PROMPT__": "Plot and save to disk a chart of NVDA and TESLA stock price YTD."
        }
    }
}

In this example, the string __MODEL__ will be replaced in the file scenario.py, while the string __PROMPT__ will be replaced in the prompt.txt file.
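To illustrate how the two schema variants relate, the sketch below parses one JSONL line and normalizes its substitutions into a per-file mapping before applying them as plain find-and-replace. The function names are hypothetical. Mapping a file template to scenario.py follows the expansion rule given in the next section, and treating each substitution as a literal string replacement is an assumption.

```python
import json


def per_file_substitutions(scenario, template_is_dir):
    """Hypothetical helper: return {filename: {find: replace}} for either schema."""
    subs = scenario.get("substitutions", {})
    # A file template is copied as scenario.py, so its substitutions apply there.
    return subs if template_is_dir else {"scenario.py": subs}


def apply_substitutions(text, pairs):
    """Replace every find string with its replacement (literal matching assumed)."""
    for find, replace in pairs.items():
        text = text.replace(find, replace)
    return text


line = (
    '{"id": "two_agent_stocks_gpt4", "template": "default_two_agents.py", '
    '"substitutions": {"__MODEL__": "gpt-4", "__PROMPT__": '
    '"Plot and save to disk a chart of NVDA and TESLA stock price YTD."}}'
)
scenario = json.loads(line)
mapping = per_file_substitutions(scenario, template_is_dir=False)
print(apply_substitutions('model = "__MODEL__"', mapping["scenario.py"]))
# prints: model = "gpt-4"
```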

Scenario Expansion Algorithm

When AutoGenBench runs a scenario, it creates a local folder to share with Docker. As noted above, each instance and repetition gets its own folder along the path: ./results/[scenario]/[instance_id]/[repetition]

For the sake of brevity we will refer to this folder as the DEST_FOLDER.

The algorithm for populating the DEST_FOLDER is as follows:

1. Pre-populate DEST_FOLDER with all the basic starter files for running a scenario.
2. Recursively copy the scenario folder (if template in the JSON scenario definition points to a folder) to DEST_FOLDER. If the template instead points to a file, copy the file, but rename it to scenario.py.
3. Apply any templating, as outlined in the prior section.
4. Write a run.sh file to DEST_FOLDER that will be executed by Docker when it is loaded.
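The copy and templating steps of this algorithm can be sketched as follows. This is a simplified illustration rather than AutoGenBench's implementation: the function name `expand_scenario` and its parameters are hypothetical, substitutions are assumed to be literal string replacements, and the starter files and run.sh are omitted because their contents are not described here.

```python
import shutil
from pathlib import Path


def expand_scenario(scenario, scenarios_dir, dest_folder):
    """Hypothetical sketch: copy a template into dest_folder and apply substitutions."""
    template = Path(scenarios_dir) / scenario["template"]
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    substitutions = scenario.get("substitutions", {})

    if template.is_dir():
        # Directory template: copy recursively; substitutions are keyed by filename.
        shutil.copytree(template, dest, dirs_exist_ok=True)
        per_file = substitutions
    else:
        # File template: copy the file under the fixed name scenario.py.
        shutil.copyfile(template, dest / "scenario.py")
        per_file = {"scenario.py": substitutions}

    # Apply the find/replace pairs to each named file.
    for filename, pairs in per_file.items():
        target = dest / filename
        text = target.read_text()
        for find, replace in pairs.items():
            text = text.replace(find, replace)
        target.write_text(text)
    return dest
```

Note that `dirs_exist_ok` requires Python 3.8 or later.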

Scenario Execution Algorithm

Once the scenario has been expanded it is run (via run.sh). This script will execute the following steps:

1. If a file named global_init.sh is present, run it.
2. If a file named scenario_init.sh is present, run it.
3. Install the requirements file (if running in Docker).
4. Run the AutoGen scenario via python scenario.py.
5. Clean up (delete cache, etc.).
6. If a file named scenario_finalize.sh is present, run it.
7. If a file named global_finalize.sh is present, run it.
8. echo "SCENARIO COMPLETE !#!#", signaling that all steps completed.

Notably, this means that scenarios can add custom init and teardown logic by including scenario_init.sh and scenario_finalize.sh files.
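Because the final step echoes a fixed marker, a post-processing script could use it to tell finished runs from interrupted ones. The sketch below assumes that the echoed line ends up in console_log.txt, which holds the console output when running in Docker. The function names are hypothetical.

```python
from pathlib import Path

COMPLETION_MARKER = "SCENARIO COMPLETE !#!#"


def run_completed(run_dir):
    """Return True if the run's console log contains the completion marker."""
    log = Path(run_dir) / "console_log.txt"
    return log.is_file() and COMPLETION_MARKER in log.read_text(errors="replace")


def completion_summary(results_root="results"):
    """Map each [scenario]/[instance_id]/[repetition] folder to its completion status."""
    runs = [p for p in Path(results_root).glob("*/*/*") if p.is_dir()]
    return {p.as_posix(): run_completed(p) for p in sorted(runs)}
```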

Release history

0.0.3

Mar 28, 2024

0.0.2

Mar 15, 2024

0.0.2a4 pre-release

Feb 16, 2024

0.0.2a3 pre-release

Feb 4, 2024

0.0.2a2 pre-release

Feb 3, 2024

0.0.2a1 pre-release

Jan 31, 2024

0.0.1

Jan 31, 2024

0.0.1a12 pre-release

Jan 26, 2024

0.0.1a11 pre-release

Jan 25, 2024

0.0.1a10 pre-release

Jan 24, 2024

0.0.1a9 pre-release

Jan 24, 2024

0.0.1a8 pre-release

Jan 19, 2024

0.0.1a7 pre-release

Jan 19, 2024

0.0.1a6 pre-release

Jan 19, 2024

0.0.1a5 pre-release

Dec 22, 2023

0.0.1a4 pre-release

Dec 22, 2023

This version

0.0.1a3 pre-release

Dec 21, 2023

0.0.1a2 pre-release

Dec 20, 2023

0.0.1a1 pre-release

Dec 18, 2023

0.0.1a0 pre-release

Dec 16, 2023

Download files

Source Distribution

autogenbench-0.0.1a3.tar.gz (22.2 kB)

Uploaded Dec 21, 2023 Source

Built Distribution

autogenbench-0.0.1a3-py3-none-any.whl (23.0 kB)

Uploaded Dec 21, 2023 Python 3

File details

Details for the file autogenbench-0.0.1a3.tar.gz.

File metadata

Download URL: autogenbench-0.0.1a3.tar.gz
Upload date: Dec 21, 2023
Size: 22.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for autogenbench-0.0.1a3.tar.gz
Algorithm	Hash digest
SHA256	`725b2764428f4c075984e8e8a38dd22a02ab8e4de6ae7355e58a586627165003`
MD5	`74b91478ba7ae0980862214d27825c7b`
BLAKE2b-256	`0c73f0e3ff6e243ac1203fd92338e6a196561746584d3a124f85402399814737`

File details

Details for the file autogenbench-0.0.1a3-py3-none-any.whl.

File metadata

Download URL: autogenbench-0.0.1a3-py3-none-any.whl
Upload date: Dec 21, 2023
Size: 23.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for autogenbench-0.0.1a3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9691311f8f0d9c8f688671e9f86efe4c1785aca3996abd208b005734f93ec5fc`
MD5	`2b30d414160a03575620837fc5186e18`
BLAKE2b-256	`ae87fd02672a0b96335c4a5c0a2ee618057d22ec32f6d9381ed16ad7c8a6c521`
