
AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench consists of open-ended tasks, setup only requires installing the package.

Please note that AssistantBench has a hidden test set, so test set predictions will need to be uploaded to the official leaderboard.

Setting up

  • Install the package (this is still a work in progress)
pip install browsergym-assistantbench
  • Run inference, e.g., run the following command for a demo on a simple toy task
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
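
Before uploading, it can help to confirm that the predictions file is well-formed. The sketch below is not part of this package: it only checks that every non-empty line of ./assistantbench-predictions-test.jsonl parses as JSON and counts the records. The helper name check_jsonl is hypothetical, and the assumption that each record is a JSON object is ours; no field names are assumed, since the leaderboard's expected schema is not described here.

```python
import json
from pathlib import Path

# Path taken from the setup notes above.
PREDICTIONS_PATH = Path("./assistantbench-predictions-test.jsonl")


def check_jsonl(path):
    """Return the number of records in a JSON Lines file.

    Raises ValueError naming the first line that is not valid JSON.
    Assumption: each record is a JSON object; no field names are checked.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"line {line_number}: invalid JSON ({exc.msg})"
                ) from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            count += 1
    return count


if __name__ == "__main__":
    if PREDICTIONS_PATH.exists():
        total = check_jsonl(PREDICTIONS_PATH)
        print(f"{total} predictions found in {PREDICTIONS_PATH}")
    else:
        print(f"{PREDICTIONS_PATH} not found")
```

A passing check only means the file is syntactically valid JSON Lines; whether its contents match what the leaderboard expects still has to be confirmed against the leaderboard's own instructions.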


