
AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.
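As a rough illustration of what using the benchmark in BrowserGym can look like from Python, the sketch below creates one task as a gymnasium environment and resets it. It assumes that importing `browsergym.assistantbench` registers the tasks and that environment IDs carry a `browsergym/` prefix, as with other BrowserGym benchmarks. The helpers `env_id` and `reset_task` are hypothetical names introduced only for this example.

```python
def env_id(task_name: str) -> str:
    """Build a gymnasium environment ID from a task name.

    Assumption: BrowserGym registers tasks under the "browsergym/" prefix.
    """
    return f"browsergym/{task_name}"


def reset_task(task_name: str = "assistantbench.validation.3") -> None:
    """Create the environment for one task, reset it, and close it again."""
    # Imported lazily so this file can be loaded without the packages installed.
    import gymnasium as gym
    import browsergym.assistantbench  # noqa: F401  (assumed to register the tasks on import)

    env = gym.make(env_id(task_name))
    try:
        obs, info = env.reset()
        # The observation is expected to be a dict; list its keys for inspection.
        print(list(obs))
    finally:
        env.close()


if __name__ == "__main__":
    print(env_id("assistantbench.validation.3"))
```

Calling `reset_task()` launches a browser, so it needs the package and its browser dependencies installed. The `__main__` block only prints the environment ID and runs without them.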

Because AssistantBench tasks are open-ended, setup requires nothing beyond installing the package.

Note that AssistantBench has a hidden test set, so test-set predictions must be uploaded to the official leaderboard to be evaluated.

Setting up

  • Install the package (still a work in progress)
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
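Before uploading, it may be worth confirming that the predictions file parses cleanly. The sketch below is a minimal check that makes no assumption about the fields in each record. It only verifies that every non-empty line is valid JSON, which is what the `.jsonl` extension suggests. The function name `load_predictions` is hypothetical and is not part of the package.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Read a JSON Lines file and return one parsed object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON: {error}"
                ) from error
    return records


if __name__ == "__main__":
    predictions_path = Path("./assistantbench-predictions-test.jsonl")
    if predictions_path.exists():
        count = len(load_predictions(predictions_path))
        print(f"{count} prediction records parsed")
    else:
        print(f"{predictions_path} not found; run inference on test tasks first")
```

A `ValueError` naming the offending line indicates a truncated or corrupted write, which is easier to catch locally than after an upload.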
