
AssistantBench benchmark for BrowserGym


AssistantBench <> BrowserGym

This package provides an implementation of the AssistantBench benchmark for use in BrowserGym.

Because AssistantBench consists of open-ended tasks, setup is straightforward: it only requires installing the package.

Note that AssistantBench has a hidden test set, so test-set predictions must be uploaded to the official leaderboard to be scored.

Setting up

  • Install the package (still a work in progress):
    pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
    python demo_agent/run_demo.py --task_name assistantbench.validation.3
  • Test-set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
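Before uploading, it can help to sanity-check the predictions file. The sketch below assumes only that the file is JSON Lines (one JSON object per line), as the .jsonl extension suggests; it does not check field names, because the expected schema is not described here. `count_valid_records` and `check_predictions` are hypothetical helpers, not part of browsergym-assistantbench.

```python
import json
from collections.abc import Iterable
from pathlib import Path

# Default output path named in the setup steps.
PREDICTIONS_PATH = Path("./assistantbench-predictions-test.jsonl")


def count_valid_records(lines: Iterable[str], source: str = "<input>") -> int:
    """Count JSON-object records in JSON Lines text.

    Raises ValueError on the first non-blank line that is not valid JSON
    or is not a JSON object. Field names are deliberately NOT checked,
    because the leaderboard's expected schema is an unknown here.
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{source}:{lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"{source}:{lineno}: expected a JSON object")
        count += 1
    return count


def check_predictions(path: Path = PREDICTIONS_PATH) -> int:
    """Validate a predictions file on disk and return its record count."""
    with path.open(encoding="utf-8") as f:
        return count_valid_records(f, source=str(path))


if __name__ == "__main__":
    if PREDICTIONS_PATH.exists():
        n = check_predictions()
        print(f"{PREDICTIONS_PATH}: {n} well-formed record(s)")
    else:
        print(f"{PREDICTIONS_PATH} not found; run inference first")
```

Run it from the directory where inference wrote the file. Splitting the line-level check from file I/O keeps the validation logic usable on any iterable of strings.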



Download files


Source Distribution

browsergym_assistantbench-0.13.1.tar.gz (8.9 kB)

Built Distribution

browsergym_assistantbench-0.13.1-py3-none-any.whl (11.2 kB)

File details

Details for the file browsergym_assistantbench-0.13.1.tar.gz.

Hashes for browsergym_assistantbench-0.13.1.tar.gz:

  Algorithm     Hash digest
  SHA256        b89b257c6aa3ae56bf856737558ce4c725cce49d4d2ce2ba578346ac4a02b7b5
  MD5           c2168b553696dda1d958e1f019f089c8
  BLAKE2b-256   201eb0711dbdab5fceb003e40278b9dad3d4b0ef8c4c08ac68e53eae637a1bc2


Provenance

The following attestation bundles were made for browsergym_assistantbench-0.13.1.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym


File details

Details for the file browsergym_assistantbench-0.13.1-py3-none-any.whl.

Hashes for browsergym_assistantbench-0.13.1-py3-none-any.whl:

  Algorithm     Hash digest
  SHA256        31bc4e2d27feaff7b43495c384aedfbedf0c8c68da176e3f3d2d792d9778e3d9
  MD5           96d73732f1be73413c2d59e50fbbd74e
  BLAKE2b-256   d11342d8eebf0ff0855bd7aacac82ffcc397378566ea14376b2d92e2c374ae10
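To check a downloaded file against the SHA256 digests listed in these file details, a minimal standard-library sketch follows. It assumes both files sit in the current directory under their published names; `sha256_of` and `verify_sha256` are illustrative helpers, not part of any package. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for release 0.13.1.
PUBLISHED_SHA256 = {
    "browsergym_assistantbench-0.13.1.tar.gz":
        "b89b257c6aa3ae56bf856737558ce4c725cce49d4d2ce2ba578346ac4a02b7b5",
    "browsergym_assistantbench-0.13.1-py3-none-any.whl":
        "31bc4e2d27feaff7b43495c384aedfbedf0c8c68da176e3f3d2d792d9778e3d9",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    for name, expected in PUBLISHED_SHA256.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found in current directory, skipped")
        else:
            status = "OK" if verify_sha256(path, expected) else "MISMATCH"
            print(f"{name}: {status}")
```

A mismatch means the local file differs from what was published and should not be installed.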


Provenance

The following attestation bundles were made for browsergym_assistantbench-0.13.1-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym

