AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench consists of open-ended tasks, setup requires nothing beyond installing the package.

Note that the AssistantBench test set is hidden, so test set predictions must be uploaded to the official leaderboard to be scored.

Setting up

  • Install the package (still a work in progress):
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name ab.imp.0
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
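
Before uploading, it can help to confirm that the predictions file is well-formed JSON Lines. The following is a minimal sketch, not part of the package: the helper names `load_predictions` and `summarize` are hypothetical, and the script makes no assumption about which fields the leaderboard expects. It only reports the number of records and the field names it finds.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Read a JSON Lines file and return one dict per non-empty line.

    Hypothetical helper for illustration; not provided by the package.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(
                    f"line {line_number} is not valid JSON: {error}"
                ) from error
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number} is not a JSON object")
            records.append(record)
    return records


def summarize(records):
    """Return the record count and the sorted list of keys seen across records."""
    keys = sorted({key for record in records for key in record})
    return len(records), keys


if __name__ == "__main__":
    # Path taken from the step above; the file exists only after inference has run.
    predictions_path = Path("./assistantbench-predictions-test.jsonl")
    if predictions_path.exists():
        count, keys = summarize(load_predictions(predictions_path))
        print(f"{count} predictions, fields: {keys}")
    else:
        print(f"{predictions_path} not found; run inference first")
```

If the script raises a ValueError, the reported line number points at the first malformed record, which is usually easier to fix locally than after a rejected upload.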
