AssistantBench benchmark for BrowserGym

Project description

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup requires nothing beyond installing the package.

Please note that AssistantBench has a hidden test set, so test set predictions will need to be uploaded to the official leaderboard.

Setting up

  • Install the package (still a work in progress):
    pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
    python demo_agent/run_demo.py --task_name ab.imp.0
  • Test set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
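
Before uploading, it can help to confirm that the predictions file is well-formed JSON Lines. The following is a minimal sketch, not part of the package: the helper names load_predictions and summarize are hypothetical, and the record schema is deliberately not assumed, so the script only reports which keys it finds.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_no}: expected a JSON object")
            records.append(record)
    return records


def summarize(records):
    """Report the record count and the union of keys seen (schema not assumed)."""
    keys = sorted({key for record in records for key in record})
    return {"count": len(records), "keys": keys}


if __name__ == "__main__":
    # Path taken from the setup steps above.
    path = Path("./assistantbench-predictions-test.jsonl")
    if path.exists():
        print(summarize(load_predictions(path)))
    else:
        print(f"{path} not found; run inference first")
```

A malformed line raises a ValueError that names the offending line number, which is usually easier to act on than a rejected leaderboard upload.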

Download files

Source Distribution

browsergym_assistantbench-0.11.2.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.11.2-py3-none-any.whl (10.4 kB view details)

Uploaded Python 3

File details

Details for the file browsergym_assistantbench-0.11.2.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.11.2.tar.gz
Algorithm Hash digest
SHA256 b21027c6010e199eb8aeb41950595d08be550aa6da8ab2d93f60c01bfa0caa9c
MD5 37d02e647372f8300a3ea34476f895a8
BLAKE2b-256 6040cab13ecf28a6ebc383a2d0b174e305348a8a3da2c45c33487b417ffe244a
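
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library. It assumes the source distribution was saved to the current directory under its published file name, and the helper names sha256_of and verify are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the 0.11.2 source distribution.
EXPECTED_SHA256 = "b21027c6010e199eb8aeb41950595d08be550aa6da8ab2d93f60c01bfa0caa9c"


def sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True when the file's SHA256 hex digest matches the expected one."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved under its published name.
    archive = Path("browsergym_assistantbench-0.11.2.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel by substituting its file name and its own SHA256 digest.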

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.11.2.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym

File details

Details for the file browsergym_assistantbench-0.11.2-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.11.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3730c5b6901611c1886099572b43a59a1ff2823bef15928f99ecd481eea806a6
MD5 6804cbf057b27dc340a7279354aead15
BLAKE2b-256 b60761fa277721ee667eca98f615840ceafd062b379eefbdc04a67065d695d93

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.11.2-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym
