AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup requires nothing beyond installing the package.

Please note that AssistantBench has a hidden test set, so test set predictions will need to be uploaded to the official leaderboard.

Setting up

  • Install the package (still a work in progress)
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs a demo on a simple toy task:
python demo_agent/run_demo.py --task_name ab.imp.0
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
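
Before uploading, it can help to confirm that the predictions file is well formed. The following is a minimal sketch, assuming only that the file follows the usual JSON Lines convention (one JSON value per line), as its .jsonl extension suggests. The record schema is not described here, so the sketch makes no assumption about field names; load_predictions is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Parse a JSON Lines file: one JSON value per non-empty line.

    Hypothetical helper for illustration; it does not check the
    record schema, only that every line is valid JSON.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(
                    f"{path}:{line_no}: invalid JSON ({err.msg})"
                ) from err
    return records


if __name__ == "__main__":
    # Path taken from the setup notes above.
    path = Path("./assistantbench-predictions-test.jsonl")
    if path.exists():
        predictions = load_predictions(path)
        print(f"{len(predictions)} prediction records parse as valid JSON")
    else:
        print(f"{path} not found; run inference on test tasks first")
```

A failure here points at the offending line number, which is easier to fix locally than after a rejected upload.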

Download files

Source Distribution

browsergym_assistantbench-0.10.1.tar.gz (8.1 kB)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.10.1-py3-none-any.whl (10.4 kB)

Uploaded Python 3

File details

Details for the file browsergym_assistantbench-0.10.1.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.10.1.tar.gz
Algorithm Hash digest
SHA256 c2cacd70566dbecaa7da968b974a7afd637d6082a660f2a625ab6396c4d6bc6e
MD5 2b7d091455472707cccd07baa40f2317
BLAKE2b-256 c6a3e73feaf8e999df3985931c5a7564cf049732e628ff79081161a94267106a

File details

Details for the file browsergym_assistantbench-0.10.1-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.10.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9029afe1a716720b1e509f0328cec014d03edefb8867b68389d065a82e9f87f2
MD5 fc89379a39f31b6ab0848ed60265e971
BLAKE2b-256 1457cb54d966bf7d58956fa4d84f30e13dd2cf77dc71933c13e983f3b8b79f41
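
The digests listed for the two files can be checked against a local download. Below is a minimal sketch using Python's standard hashlib module; it assumes the files were downloaded into the current directory under the names shown above. The helper names sha256_of and verify are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details listed above.
EXPECTED_SHA256 = {
    "browsergym_assistantbench-0.10.1.tar.gz":
        "c2cacd70566dbecaa7da968b974a7afd637d6082a660f2a625ab6396c4d6bc6e",
    "browsergym_assistantbench-0.10.1-py3-none-any.whl":
        "9029afe1a716720b1e509f0328cec014d03edefb8867b68389d065a82e9f87f2",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, skipping")
        elif verify(path, expected):
            print(f"{name}: OK")
        else:
            print(f"{name}: MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode (a requirements file with --hash entries, installed with --require-hashes).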