AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup requires nothing beyond installing the package.
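As a rough illustration of what "using the benchmark in BrowserGym" could look like from Python, here is a minimal sketch. It assumes this package follows the usual BrowserGym convention, where importing the benchmark module registers its tasks as Gymnasium environments under a `browsergym/` prefix. The helper names `env_id` and `make_task_env` are hypothetical and not part of the package.

```python
def env_id(task_name: str) -> str:
    """Build the Gymnasium environment id for a task name.

    Assumption: tasks are registered under the "browsergym/" prefix,
    as with other BrowserGym benchmarks.
    """
    return f"browsergym/{task_name}"


def make_task_env(task_name: str):
    """Create the environment for one task (hypothetical helper).

    Imports are deferred so this file can be loaded even when the
    packages are not installed.
    """
    import gymnasium as gym
    import browsergym.assistantbench  # noqa: F401  (assumed to register the tasks on import)

    return gym.make(env_id(task_name))


if __name__ == "__main__":
    # Same toy task as the demo command in the setup steps.
    env = make_task_env("assistantbench.validation.3")
    obs, info = env.reset()
    env.close()
```

Running the `__main__` part needs the package installed along with a working browser backend, so treat it as a starting point rather than a verified recipe.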

Please note that AssistantBench has a hidden test set, so test set predictions will need to be uploaded to the official leaderboard.

Setting up

  • Install the package (this is still a work in progress)
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
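Before uploading, it can help to confirm that the predictions file is well-formed JSONL. The sketch below only checks that each non-blank line parses as a JSON object. It makes no assumption about which fields the leaderboard expects, and `check_predictions` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def check_predictions(path):
    """Return (n_valid, bad_lines) for a JSONL file.

    n_valid counts lines that parse as JSON objects; bad_lines lists the
    1-based numbers of lines that fail to parse or are not objects.
    """
    n_valid = 0
    bad_lines = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(lineno)
                continue
            if isinstance(record, dict):
                n_valid += 1
            else:
                bad_lines.append(lineno)
    return n_valid, bad_lines


if __name__ == "__main__":
    # Default output path named in the setup steps above.
    path = Path("./assistantbench-predictions-test.jsonl")
    if path.exists():
        n_valid, bad_lines = check_predictions(path)
        print(f"{n_valid} valid records, malformed lines: {bad_lines}")
    else:
        print(f"{path} not found; run inference on the test split first")
```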
