
AssistantBench benchmark for BrowserGym


AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup requires nothing beyond installing the package.

Please note that AssistantBench has a hidden test set, so test set predictions will need to be uploaded to the official leaderboard.

Setting up

  • Install the package (still a work in progress):
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test set predictions will be saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload these predictions to the official leaderboard.
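Before uploading, it can help to confirm that the predictions file is well-formed JSONL. The following is a minimal sketch, not part of the package: `check_predictions` is a hypothetical helper, and `required_keys` is left empty by default because the field names expected by the leaderboard are not stated here.

```python
import json
from pathlib import Path


def check_predictions(path, required_keys=()):
    """Count the records in a JSONL file; raise ValueError on a malformed line.

    Hypothetical helper for illustration. `required_keys` is optional and
    empty by default, since the expected field names are an assumption.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no}: invalid JSON ({exc})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing keys {missing}")
            count += 1
    return count


if __name__ == "__main__":
    # Path taken from the setup notes above.
    predictions = Path("assistantbench-predictions-test.jsonl")
    if predictions.exists():
        print(f"{check_predictions(predictions)} predictions found in {predictions}")
    else:
        print(f"{predictions} not found; run inference first")
```

The check only validates structure; scoring still happens on the official leaderboard.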



Download files


Source Distribution

browsergym_assistantbench-0.13.2.tar.gz (8.9 kB)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.13.2-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file browsergym_assistantbench-0.13.2.tar.gz.


File hashes

Hashes for browsergym_assistantbench-0.13.2.tar.gz
Algorithm Hash digest
SHA256 5432b8216307a836228472a9ebbac28f2b3e6d4e8a11986f269bb98ef7cc2281
MD5 f2076eec2c986f4e0df820c5f375f142
BLAKE2b-256 ad0575ac541f6a5fdcd3030eaa4105445910b456bd6410b7f94373d27d2b6531
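
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using only the Python standard library; it assumes the archive sits in the current directory under its published filename, and `sha256_of` and `matches` are illustrative helper names rather than part of the package.

```python
import hashlib
from pathlib import Path

# Digest copied from the hash table above for the 0.13.2 source distribution.
EXPECTED_SHA256 = "5432b8216307a836228472a9ebbac28f2b3e6d4e8a11986f269bb98ef7cc2281"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local filename; adjust to wherever the file was downloaded.
    archive = Path("browsergym_assistantbench-0.13.2.tar.gz")
    if archive.exists():
        print("digest OK" if matches(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same function works for the wheel by substituting its filename and SHA256 digest.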


Provenance

The following attestation bundles were made for browsergym_assistantbench-0.13.2.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym


File details

Details for the file browsergym_assistantbench-0.13.2-py3-none-any.whl.


File hashes

Hashes for browsergym_assistantbench-0.13.2-py3-none-any.whl
Algorithm Hash digest
SHA256 05b6bfdbdc8e10af1e9fa89138c880e0668508f0480b7b8599279f0fd4f19f3c
MD5 25bcba3d52545ecc6d3acc214a36971d
BLAKE2b-256 b32db21e0832d01beff0e3b307fc708a03a3fdc7b3a527fdb0effb7ac13a3340


Provenance

The following attestation bundles were made for browsergym_assistantbench-0.13.2-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym

