AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup only requires installing the package.

Note that AssistantBench has a hidden test set, so test set scores can only be obtained by uploading your predictions to the official leaderboard.

Setting up

  • Install the package (still a work in progress)
pip install browsergym-assistantbench
  • Run inference. For example, run the following command for a demo on a simple toy task:
python demo_agent/run_demo.py --task_name ab.imp.0
  • Test set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
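
Before uploading, it can help to sanity-check the predictions file. The sketch below is not part of the package: it only assumes that the file is in JSON Lines format (one JSON value per line, as the .jsonl extension suggests). The record schema is not documented here, so the script reports whichever top-level keys it finds rather than expecting specific fields. The helper names `load_jsonl` and `key_counts` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Path taken from the setup steps above. The record schema is an unknown,
# so nothing below assumes particular field names.
PREDICTIONS_PATH = Path("./assistantbench-predictions-test.jsonl")


def load_jsonl(path):
    """Parse a JSON Lines file into a list of values, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
    return records


def key_counts(records):
    """Count how often each top-level key appears across dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    if PREDICTIONS_PATH.exists():
        records = load_jsonl(PREDICTIONS_PATH)
        print(f"{len(records)} prediction record(s)")
        for key, n in sorted(key_counts(records).items()):
            print(f"  {key}: present in {n} record(s)")
    else:
        print(f"{PREDICTIONS_PATH} not found; run inference first")
```

A key that appears in fewer records than the total is a hint that some predictions were written incompletely and may be worth re-running before upload.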

Download files

Source distribution: browsergym_assistantbench-0.10.2.tar.gz (8.1 kB)

Built distribution: browsergym_assistantbench-0.10.2-py3-none-any.whl (10.4 kB, Python 3)

Hashes for browsergym_assistantbench-0.10.2.tar.gz
Algorithm Hash digest
SHA256 de18eb7c010403d5d467b927b4713b56f6e97a59493bee4c42599d4d7cb54dce
MD5 37a09a2f64bc9166dd7ad0ff5d1c94dc
BLAKE2b-256 d9ae0c4b91beab9316662d3e10bc70142859a4ebb9f526c7b4ba3aae23405c1f
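
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; it assumes the source distribution has already been downloaded locally, and the function name `sha256_of` is a hypothetical helper, not something the package provides.

```python
import hashlib
import sys

# SHA256 digest listed above for browsergym_assistantbench-0.10.2.tar.gz.
EXPECTED_SHA256 = "de18eb7c010403d5d467b927b4713b56f6e97a59493bee4c42599d4d7cb54dce"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Only runs when exactly one file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    actual = sha256_of(sys.argv[1])
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```

Saved under a name of your choice, the script takes the path to the downloaded .tar.gz file as its only argument. To check the wheel instead, swap in the wheel's SHA256 digest.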

Hashes for browsergym_assistantbench-0.10.2-py3-none-any.whl
Algorithm Hash digest
SHA256 af0d3a3e23686066b070feca38f8740262bed6d65ccf9098f393334a005987c0
MD5 fd2e8037b6c20d3421aade2e26d9473b
BLAKE2b-256 fe9069bff6647419eb42575716368272da1a04779953b99b13d27eabd824107f
