AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench consists of open-ended tasks, setup only requires installing the package.

Note that AssistantBench has a hidden test set, so test set predictions must be uploaded to the official leaderboard to be scored.

Setting up

  • Install the package (this is still a work in progress):
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
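
Before uploading, it can help to confirm that the predictions file is well-formed JSONL. The following is a minimal sketch, not part of the package: the helper name load_predictions is hypothetical, and because the fields of each record are not described here, it only checks that every non-empty line parses as a JSON object.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Read a JSONL file and return one dict per non-empty line.

    Hypothetical helper for illustration; it does not assume any
    particular field names inside each record.
    """
    records = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path}:{line_no}: not valid JSON ({exc})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"{path}:{line_no}: expected a JSON object")
        records.append(record)
    return records


if __name__ == "__main__":
    # Default output location named in the setup steps above.
    path = Path("./assistantbench-predictions-test.jsonl")
    if path.exists():
        predictions = load_predictions(path)
        print(f"{len(predictions)} predictions parsed from {path}")
    else:
        print(f"{path} not found; run inference on test tasks first")
```

A malformed line raises ValueError with the file name and line number, which makes it easier to spot a truncated write before submitting.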

Download files

Source Distribution

browsergym_assistantbench-0.14.3.dev1.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.14.3.dev1-py3-none-any.whl

File details

Details for the file browsergym_assistantbench-0.14.3.dev1.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.14.3.dev1.tar.gz
Algorithm Hash digest
SHA256 733868bedf7fa27b5b24f2c612bcca7d1b6e9ff3be46ebb1e0fa81f5b6761744
MD5 a6b39880dad734b563bc2c4ee55d8970
BLAKE2b-256 307104b2930d35817e8c59d20dd2c4bc811ec99074b095dc26466e97108983bb

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.14.3.dev1.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym

File details

Details for the file browsergym_assistantbench-0.14.3.dev1-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.14.3.dev1-py3-none-any.whl
Algorithm Hash digest
SHA256 9eb75ac487b0b1d74f18ab234d8e5beaa72abe51065354cdac4457790b9ccc47
MD5 ca3f1b1a5cd277330b63dbfc9baee414
BLAKE2b-256 b5e835a65c75b6a2737232b4f863915631bd9db204bfe005b2488b59ef10b508

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.14.3.dev1-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym
