AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench consists of open-ended tasks, setup requires nothing more than installing the package.

Note that the AssistantBench test set is hidden, so test-set predictions must be uploaded to the official leaderboard for evaluation.

Setting up

  • Install the package (this is still a work in progress):
pip install browsergym-assistantbench
  • Run inference, e.g., run the following command for a demo on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
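
Before uploading, it can be useful to sanity-check the predictions file. The sketch below is not part of the package: `load_predictions` is a hypothetical helper, and it assumes only that the file follows the usual JSONL convention of one JSON object per line. It makes no assumption about which fields each prediction contains.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Return the JSON objects stored one per line in a JSONL file.

    Hypothetical helper: assumes one JSON object per non-empty line and
    raises ValueError otherwise.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {line_number}: invalid JSON ({error})") from error
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            records.append(record)
    return records


if __name__ == "__main__":
    predictions_file = Path("./assistantbench-predictions-test.jsonl")
    if predictions_file.exists():
        print(f"{len(load_predictions(predictions_file))} predictions found")
    else:
        print(f"{predictions_file} not found; run inference on test tasks first")
```

If the file parses cleanly, the printed count can be compared against the number of test tasks that were run.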

Source Distribution

browsergym_assistantbench-0.14.3.dev0.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.14.3.dev0-py3-none-any.whl (11.3 kB view details)

Uploaded Python 3

File details

Details for the file browsergym_assistantbench-0.14.3.dev0.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.14.3.dev0.tar.gz
Algorithm    Hash digest
SHA256       2c4943e64fe5ac7e6f7819b20def2e57c0019ed2fe0030e89b06eb45e396d96e
MD5          a2903c968375bda4a2f58994b01502d4
BLAKE2b-256  09fcccd7f3e83fb64b7ff951acb556fc65a11724be16ec3d9a0e33ca8b6104ff
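
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `matches_expected` are illustrative, and the script assumes the archive was saved in the current directory under its published file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "2c4943e64fe5ac7e6f7819b20def2e57c0019ed2fe0030e89b06eb45e396d96e"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location: the current working directory.
    archive = Path("browsergym_assistantbench-0.14.3.dev0.tar.gz")
    if archive.exists():
        print("digest matches" if matches_expected(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same helpers work for the wheel by passing its file path and its own SHA256 digest as `expected`.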

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.14.3.dev0.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym

File details

Details for the file browsergym_assistantbench-0.14.3.dev0-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.14.3.dev0-py3-none-any.whl
Algorithm    Hash digest
SHA256       c565eee45af25f47895770d735be3e524e6518ba78e8ac7785cf906224f4fac5
MD5          2976fd19313435a800dbf15c08cd6713
BLAKE2b-256  1585d473d6b290360c045d5a7beaeba2263026bcc30412d57e10494a7c2de8df

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.14.3.dev0-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym
