AssistantBench benchmark for BrowserGym

AssistantBench <> BrowserGym

This package provides an implementation for using the AssistantBench benchmark in BrowserGym.

Because AssistantBench tasks are open-ended, setup only requires installing the package.

Please note that AssistantBench has a hidden test set, so test set predictions must be uploaded to the official leaderboard to be evaluated.

Setting up

  • Install the package (this is still a work in progress):
pip install browsergym-assistantbench
  • Run inference. For example, the following command runs the demo agent on a simple toy task:
python demo_agent/run_demo.py --task_name ab.imp.0
  • Test set predictions are saved to ./assistantbench-predictions-test.jsonl. To evaluate on the official test set, upload this file to the official leaderboard.
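
Before uploading, it can help to check that the predictions file is well-formed. The sketch below is not part of this package. It assumes only that the file is in JSON Lines format (one JSON value per line), as the .jsonl extension suggests, and it makes no assumption about which fields each record contains. The helper name load_predictions is hypothetical.

```python
import json
from pathlib import Path


def load_predictions(path="./assistantbench-predictions-test.jsonl"):
    """Parse a JSON Lines file and return one Python object per non-empty line.

    Hypothetical helper: it only checks that each line is valid JSON and
    does not validate the record schema expected by the leaderboard.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so it can be fixed before upload.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err
    return records
```

Calling load_predictions() after an inference run returns the parsed records, so len(load_predictions()) gives a quick count of saved predictions. A ValueError points at the first line that does not parse.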

Download files

Source Distribution

browsergym_assistantbench-0.11.0.tar.gz (8.1 kB)

Uploaded Source

Built Distribution

browsergym_assistantbench-0.11.0-py3-none-any.whl (10.4 kB)

Uploaded Python 3

File details

Details for the file browsergym_assistantbench-0.11.0.tar.gz.

File hashes

Hashes for browsergym_assistantbench-0.11.0.tar.gz
Algorithm Hash digest
SHA256 7fc6cc52619d9ba5f3e273f0b33d26112720d1e846448588c7fa028b0b26cba1
MD5 5a24fdfaa78480ba8170fdd47f39b596
BLAKE2b-256 70697051e25b8a85769a9a80853700380c68d7f8c289e29d10c113ce00a92df0

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.11.0.tar.gz:

Publisher: pypi.yml on ServiceNow/BrowserGym

File details

Details for the file browsergym_assistantbench-0.11.0-py3-none-any.whl.

File hashes

Hashes for browsergym_assistantbench-0.11.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c93a3cb40cb33cc601838d195f8307c10483a502b88f8629b0cf248b0d6752c
MD5 49f051f253b7a4b867591f9c035a8837
BLAKE2b-256 90c7c939ca3d0f611d9ad55b64871c6239c1b92d5fbe9e307a328a4f99b16728

Provenance

The following attestation bundles were made for browsergym_assistantbench-0.11.0-py3-none-any.whl:

Publisher: pypi.yml on ServiceNow/BrowserGym
