AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, there is nothing extra to configure: setup only requires installing the package.

Note that AssistantBench has a hidden test set, so test-set predictions must be uploaded to the official leaderboard to be scored.

Setting up

  • Install the package (this is still a work in progress)
pip install browsergym-assistantbench
  • Run inference. For example, run the following command to try the demo on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test-set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
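
Before uploading, it can help to sanity-check the predictions file. The sketch below assumes only that the file is in JSON Lines format (one JSON value per line), as the .jsonl extension suggests. It makes no assumption about the fields inside each record, and the helper name check_jsonl is hypothetical, not part of this package.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return the number of records in a JSON Lines file.

    Raises ValueError naming the first line that is not valid JSON.
    Blank lines are skipped.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(
                    f"{path}: line {lineno} is not valid JSON: {err}"
                ) from err
            count += 1
    return count


if __name__ == "__main__":
    # Path taken from the step above; the file exists only after a test-set run.
    predictions = Path("./assistantbench-predictions-test.jsonl")
    if predictions.exists():
        print(f"{check_jsonl(predictions)} predictions found")
    else:
        print(f"{predictions} not found")
```

This check only confirms that the file parses. Whether the records match what the leaderboard expects still has to be confirmed against the leaderboard's own instructions.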

Download files

Source Distribution

browsergym_assistantbench-0.13.0.tar.gz (8.3 kB)

Built Distribution

browsergym_assistantbench-0.13.0-py3-none-any.whl

File details

Details for the file browsergym_assistantbench-0.13.0.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.13.0.tar.gz

Algorithm    Hash digest
SHA256       28081b58ee9d3a38c1d05317a11c60d981a7c163943dbe2a8e356cb889a48522
MD5          2f576a86845ca51e94da64c940d1a57e
BLAKE2b-256  9451575ed4ecc87c74e12d5448f00450d3619d9b6827e6277ed832c0a1a0b369
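
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library. The helper name sha256_of and the local file location are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for browsergym_assistantbench-0.13.0.tar.gz.
EXPECTED_SHA256 = "28081b58ee9d3a38c1d05317a11c60d981a7c163943dbe2a8e356cb889a48522"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed download location (hypothetical; adjust to where the file was saved).
    archive = Path("browsergym_assistantbench-0.13.0.tar.gz")
    if archive.exists():
        matches = sha256_of(archive) == EXPECTED_SHA256
        print("hash matches" if matches else "hash MISMATCH")
    else:
        print(f"{archive} not found")
```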

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.13.0.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym

File details

Details for the file browsergym_assistantbench-0.13.0-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.13.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       35d78f7e9d3bb03f5447f5e31b51b64455a172289e605da1d6eba9242bfae3e2
MD5          51268d0103b37cfd1d16157fe16e8d31
BLAKE2b-256  927b73c9b32d9b3ff3fbab56875b33868e311c1821b843afdc7b1ced76994fc8

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.13.0-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym
