AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup requires nothing beyond installing the package.

Note that AssistantBench has a hidden test set, so test set predictions must be uploaded to the official leaderboard to be scored.

Setting up

  • Install the package (still a work in progress)
pip install browsergym-assistantbench
  • Run inference, e.g., run the following command for a demo on a simple toy task
python demo_agent/run_demo.py --task_name ab.imp.0
  • Test set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
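
Before uploading, it can help to confirm that the predictions file parses cleanly. The sketch below is not part of this package: `load_predictions` is a hypothetical helper, and it assumes only that the file follows the JSON Lines convention (one JSON value per non-empty line), as the `.jsonl` extension suggests. It makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Parse a JSON Lines file into a list of records, one per non-empty line.

    Hypothetical helper for illustration; the record schema is not assumed.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                # Report the offending line so it can be fixed before upload.
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON: {error}"
                ) from error
    return records


if __name__ == "__main__":
    # Path taken from the setup steps above.
    predictions_path = Path("./assistantbench-predictions-test.jsonl")
    if predictions_path.exists():
        records = load_predictions(predictions_path)
        print(f"{len(records)} prediction records parsed from {predictions_path}")
    else:
        print(f"{predictions_path} not found; run inference first")
```

A parse failure names the offending line number. This check covers only JSON syntax, not whatever format the leaderboard expects.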
