
AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup only requires installing the package.

Note that the AssistantBench test set is hidden, so test set predictions must be uploaded to the official leaderboard to be scored.

Setting up

  • Install the package (still a work in progress)
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
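
Before uploading, it can help to sanity-check the predictions file. The sketch below is not part of this package: `check_predictions` is a hypothetical helper, and it assumes only that the file follows the JSON Lines convention implied by the `.jsonl` extension (one JSON value per line). The record schema expected by the leaderboard is not described here, so the sketch does not check field names.

```python
import json
from pathlib import Path


def check_predictions(path):
    """Return the number of valid JSON records in a JSON Lines file.

    Hypothetical helper, not part of browsergym-assistantbench.
    Raises ValueError on the first line that is not valid JSON.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"line {line_number} is not valid JSON: {exc}"
                ) from exc
            count += 1
    return count


if __name__ == "__main__":
    # Path taken from the setup steps above.
    predictions = Path("./assistantbench-predictions-test.jsonl")
    if predictions.exists():
        print(f"{check_predictions(predictions)} predictions found")
    else:
        print(f"{predictions} not found; run inference on test tasks first")
```

Run the script from the directory where inference was launched. A count that does not match the number of test tasks you ran suggests that some runs did not write a prediction.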


