
AssistantBench benchmark for BrowserGym


AssistantBench <> BrowserGym

This package integrates the AssistantBench benchmark into BrowserGym, so that AssistantBench tasks can be run as BrowserGym environments.

Because AssistantBench consists of open-ended tasks, setup is straightforward: installing the package is all that is required.

Note that the AssistantBench test set is hidden: predictions on it must be uploaded to the official leaderboard to be scored.
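Because the test-set answers are hidden, a script that drives many tasks may want to tell test tasks apart from validation tasks. The sketch below assumes task names follow the `assistantbench.<split>.<index>` pattern of the demo task `assistantbench.validation.3` used in the setup steps below, and that the hidden split is named `test` (suggested by the predictions file name `assistantbench-predictions-test.jsonl`). Both `parse_task_name` and `requires_leaderboard` are hypothetical helpers, not part of the package.

```python
from typing import NamedTuple


class TaskName(NamedTuple):
    benchmark: str
    split: str
    index: int


def parse_task_name(name: str) -> TaskName:
    """Hypothetical helper: split 'assistantbench.<split>.<index>' into parts.

    The naming pattern is an assumption inferred from one example task name.
    """
    parts = name.split(".")
    if len(parts) != 3 or parts[0] != "assistantbench":
        raise ValueError(f"unexpected task name: {name!r}")
    benchmark, split, index = parts
    return TaskName(benchmark, split, int(index))  # int() raises ValueError on a bad index


def requires_leaderboard(name: str) -> bool:
    """Hypothetical helper: True for tasks in the hidden test split (assumed name 'test')."""
    return parse_task_name(name).split == "test"
```

For example, `parse_task_name("assistantbench.validation.3")` returns `TaskName(benchmark='assistantbench', split='validation', index=3)`, and `requires_leaderboard` is False for it.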

Setting up

  • Install the package (still a work in progress):
pip install browsergym-assistantbench
  • Run inference. For example, run the following command to demo the agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test-set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
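The predictions file uses the `.jsonl` extension, so it is presumably in JSON Lines format (one JSON object per line). Before uploading, it can help to check that the file parses. A minimal sketch follows; `load_predictions` is a hypothetical helper, and since the fields of each record are not documented here, it only checks that every non-empty line is a JSON object.

```python
import json
from pathlib import Path

# Output path used by the demo for test-set predictions (from the setup steps above).
PREDICTIONS_PATH = Path("./assistantbench-predictions-test.jsonl")


def load_predictions(path: Path) -> list[dict]:
    """Hypothetical helper: read a JSON Lines file into a list of dicts.

    Makes no assumption about the keys of each record.
    """
    records = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)  # raises json.JSONDecodeError on malformed lines
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            records.append(record)
    return records
```

After a run, `len(load_predictions(PREDICTIONS_PATH))` gives the number of predictions that were written.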


Download files

Source distribution: browsergym_assistantbench-0.14.0.tar.gz (8.9 kB)

Built distribution: browsergym_assistantbench-0.14.0-py3-none-any.whl (11.2 kB, Python 3)

File details: browsergym_assistantbench-0.14.0.tar.gz

Hashes:
Algorithm    Hash digest
SHA256       23971b41e4cc3d0835c008a5ab0c60bbeab26e03a0d766017e84ec181d07b089
MD5          f74980ed3ce4a2e074df1f58f81abf17
BLAKE2b-256  815263caaf1ca1153d432a0a9edf049de506c1d0dbb9a99fe364b901933beee1

Provenance: attestation bundles were made for this file. Publisher: pypi.yml on ServiceNow/BrowserGym. Attestation values reflect the state when the release was signed and may no longer be current.

File details: browsergym_assistantbench-0.14.0-py3-none-any.whl

Hashes:
Algorithm    Hash digest
SHA256       f2411f8071dd43f3778b8a2b7f292cda5ac0c4087ca553b7ee8c783c8835efc7
MD5          b91873f43bfa53dd4bc7d9e6d2f7f6a1
BLAKE2b-256  2890c581537f508ecdf83d47de9688a5f50e64ed5ce6c18bc8abcd12cdd9c221

Provenance: attestation bundles were made for this file. Publisher: pypi.yml on ServiceNow/BrowserGym. Attestation values reflect the state when the release was signed and may no longer be current.
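The published digests can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's standard `hashlib`, with the SHA256 values copied from the hash listings above; `sha256_of` and `verify` are illustrative helpers, not part of the package.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for release 0.14.0 (copied from the listings above).
EXPECTED_SHA256 = {
    "browsergym_assistantbench-0.14.0.tar.gz":
        "23971b41e4cc3d0835c008a5ab0c60bbeab26e03a0d766017e84ec181d07b089",
    "browsergym_assistantbench-0.14.0-py3-none-any.whl":
        "f2411f8071dd43f3778b8a2b7f292cda5ac0c4087ca553b7ee8c783c8835efc7",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against its published digest.

    Raises KeyError if the file name is not one of the two release files.
    """
    return sha256_of(path) == EXPECTED_SHA256[path.name]
```

pip can also enforce this itself in hash-checking mode, for example with a requirements line `browsergym-assistantbench==0.14.0 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.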
