AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, there is nothing extra to configure: setup only requires installing the package.

Please note that AssistantBench has a hidden test set, so test set predictions will need to be uploaded to the official leaderboard.

Setting up

  • Install the package (still a work in progress):
pip install browsergym-assistantbench
  • Run inference. For example, from a checkout of the BrowserGym repository, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
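
Before uploading, it can help to confirm that the predictions file is well-formed. The sketch below is only a sanity check: the record schema is not described here, so it makes no assumption about field names and merely verifies that every non-blank line parses as JSON. The helper name count_jsonl_records is hypothetical and is not part of browsergym-assistantbench.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Return the number of JSON records in a JSON Lines file.

    Hypothetical helper, not part of the package. Raises ValueError
    naming the first line that is not valid JSON.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON"
                ) from exc
            count += 1
    return count


if __name__ == "__main__":
    # Path taken from the setup steps above.
    predictions = Path("./assistantbench-predictions-test.jsonl")
    if predictions.exists():
        print(f"{count_jsonl_records(predictions)} records in {predictions}")
    else:
        print(f"{predictions} not found; run inference on test tasks first")
```

A clean run only shows that the file is valid JSON Lines; whether the leaderboard accepts it still depends on the official submission format.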

Download files

Source Distribution

browsergym_assistantbench-0.14.1.tar.gz (8.9 kB)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.14.1-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file browsergym_assistantbench-0.14.1.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.14.1.tar.gz
Algorithm Hash digest
SHA256 3825be7a45ae7ba43a40ae6dc3c2d726aee1f39f7b4c7aa6f2f60bba2bf5690b
MD5 99cfe12a362ce0ea82b10dc62e985c99
BLAKE2b-256 7348dc7807e2f5496367bf7077a4a6064b3fc90769c14965bbdfbbea8e09e93b
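
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names sha256_of_file and matches_published_hash are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for browsergym_assistantbench-0.14.1.tar.gz.
EXPECTED_SHA256 = "3825be7a45ae7ba43a40ae6dc3c2d726aee1f39f7b4c7aa6f2f60bba2bf5690b"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash("browsergym_assistantbench-0.14.1.tar.gz") should return True for an intact download of the source distribution, and False for any other file.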

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.14.1.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file browsergym_assistantbench-0.14.1-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.14.1-py3-none-any.whl
Algorithm Hash digest
SHA256 dd5b3c587e757cd034a04c63dfb2ae4343cd436f0402f50f6dc166ea68ed180f
MD5 d29a6c93be725febb27d5a4500af5dcd
BLAKE2b-256 d1379c7c70dc55600ce45dd18d2defdf66328ab607096070f1130619a13b37bf

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.14.1-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym
