Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents.

🦀 CRAB: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents

Overview

CRAB is a framework for building LLM agent benchmark environments in a Python-centric way.

Key Features

🌐 Cross-platform and Multi-environment

  • Build agent environments that support various deployment options, including in-memory, Docker-hosted, virtual machines, or distributed physical machines, provided they are accessible via Python functions.
  • Let the agent access all the environments at the same time through a unified interface.
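
To make the idea of a unified interface concrete, here is a minimal sketch in plain Python. It does not use CRAB's API: `UnifiedInterface`, `register`, `step`, and the two toy environments are hypothetical names chosen for illustration, and each environment is assumed to be nothing more than a table of Python functions.

```python
from typing import Any, Callable, Dict, List

# Assumption: an environment is modeled as a mapping from action name to a
# plain Python callable. This is an illustration, not CRAB's data model.
ActionTable = Dict[str, Callable[..., Any]]


class UnifiedInterface:
    """Routes (environment, action) requests to the matching Python function."""

    def __init__(self) -> None:
        self._envs: Dict[str, ActionTable] = {}

    def register(self, env_name: str, actions: ActionTable) -> None:
        """Make one environment reachable under the given name."""
        self._envs[env_name] = actions

    def available(self) -> Dict[str, List[str]]:
        """List the action names offered by every registered environment."""
        return {env: sorted(actions) for env, actions in self._envs.items()}

    def step(self, env_name: str, action_name: str, **kwargs: Any) -> Any:
        """Run one action in one environment and return its result."""
        return self._envs[env_name][action_name](**kwargs)


# Two toy in-memory "platforms" standing in for real deployments.
notes: List[str] = []


def write_note(text: str) -> int:
    """Desktop-side action: store a note and return how many notes exist."""
    notes.append(text)
    return len(notes)


def read_clipboard() -> str:
    """Phone-side action: return a fixed clipboard value."""
    return "copied text"


ui = UnifiedInterface()
ui.register("desktop", {"write_note": write_note})
ui.register("phone", {"read_clipboard": read_clipboard})

# One agent step on the phone feeds the next step on the desktop.
text = ui.step("phone", "read_clipboard")
count = ui.step("desktop", "write_note", text=text)
print(ui.available())
```

In a real deployment the callables would wrap whatever transport reaches the Docker container, virtual machine, or physical device; the routing layer stays the same.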

⚙️ Easy-to-use Configuration

  • Add a new action by simply adding an @action decorator to a Python function.
  • Define the environment by integrating several actions together.
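
The decorator pattern described above can be sketched as follows. This is a self-contained stand-in, not CRAB's implementation: the `action` decorator, the `Action` wrapper, and the `Environment` class defined here are hypothetical and only show how a decorated function can carry the metadata an agent needs and how several actions can be grouped into an environment.

```python
import inspect
from typing import Any, Callable, Dict, List


class Action:
    """Wraps a plain function and records metadata an agent could read."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.name = func.__name__
        self.description = inspect.getdoc(func) or ""
        self.parameters = list(inspect.signature(func).parameters)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


def action(func: Callable[..., Any]) -> Action:
    """Hypothetical stand-in for a decorator in the style of @action."""
    return Action(func)


class Environment:
    """Hypothetical environment: a named collection of actions."""

    def __init__(self, name: str, actions: List[Action]) -> None:
        self.name = name
        self.actions: Dict[str, Action] = {a.name: a for a in actions}

    def step(self, action_name: str, **kwargs: Any) -> Any:
        """Look up an action by name and execute it."""
        return self.actions[action_name](**kwargs)


@action
def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b


@action
def echo(text: str) -> str:
    """Return the text unchanged."""
    return text


env = Environment("calculator", [add, echo])
result = env.step("add", a=2, b=3)
print(result)  # prints 5
```

Reading the name, docstring, and parameter list from the function itself is what keeps the configuration short: the Python definition is the single source of truth for what the agent is told about the action.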

📐 Novel Benchmarking Suite

  • Define tasks and the corresponding evaluators in an intuitive Python-native way.
  • Introduces a novel graph evaluator method that provides fine-grained metrics.
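
As a rough illustration of why a graph of evaluators gives finer-grained feedback than a single pass/fail check, the sketch below models a task as a directed acyclic graph of checkpoints. It is an assumption-laden toy, not CRAB's evaluator: `GraphEvaluator`, `update`, `completion_ratio`, and the example checkpoints are hypothetical, and the metric shown (fraction of completed checkpoints) is just one plausible choice.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, object]
Check = Callable[[State], bool]


class GraphEvaluator:
    """Tracks which checkpoints of a task have been reached.

    A checkpoint can only complete after all of its predecessors have.
    """

    def __init__(self, checks: Dict[str, Check], edges: List[Tuple[str, str]]) -> None:
        if not checks:
            raise ValueError("at least one checkpoint is required")
        self.checks = checks
        self.preds: Dict[str, List[str]] = {name: [] for name in checks}
        for src, dst in edges:
            self.preds[dst].append(src)
        self.done: Set[str] = set()

    def update(self, state: State) -> None:
        """Mark every checkpoint that is now satisfied and unblocked."""
        changed = True
        while changed:  # repeat so a chain of checkpoints can finish in one call
            changed = False
            for name, check in self.checks.items():
                if name in self.done:
                    continue
                unblocked = all(p in self.done for p in self.preds[name])
                if unblocked and check(state):
                    self.done.add(name)
                    changed = True

    def completion_ratio(self) -> float:
        """Fraction of checkpoints completed so far."""
        return len(self.done) / len(self.checks)


# Hypothetical three-step task: create a file, write text, then send it.
evaluator = GraphEvaluator(
    checks={
        "file_created": lambda s: s.get("file_exists") is True,
        "text_written": lambda s: s.get("content") == "hello",
        "file_sent": lambda s: s.get("sent") is True,
    },
    edges=[("file_created", "text_written"), ("text_written", "file_sent")],
)

evaluator.update({"file_exists": True})
evaluator.update({"file_exists": True, "content": "hello"})
ratio = evaluator.completion_ratio()
print(f"{ratio:.2f}")
```

An agent that creates the file and writes the text but never sends it scores two thirds here, whereas a single end-state check would report a plain failure.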

Installation

Prerequisites

  • Python 3.10 or newer

pip install crab-framework[client]
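
To confirm the prerequisites after installing, a small check like the one below may help. It relies only on the standard library (`sys` and `importlib.metadata`); the helper names `installed_version` and `meets_python_requirement` are made up for this sketch, and the distribution name `crab-framework` is taken from the install command above.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def meets_python_requirement(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Check that the running interpreter is at least the given version."""
    return sys.version_info >= minimum


print("Python 3.10 or newer:", meets_python_requirement())
print("crab-framework:", installed_version("crab-framework") or "not installed")
```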

Experiment on CRAB-Benchmark-v0

All datasets and experiment code are in the crab-benchmark-v0 directory. Please read the benchmark tutorial carefully before using our benchmark.

Examples

Run the template environments with an OpenAI agent:

export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py

Cite

Please cite our paper if you use anything related in your work:

@misc{xu2024crab,
      title={CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents}, 
      author={Tianqi Xu and Linyao Chen and Dai-Jie Wu and Yanjun Chen and Zecheng Zhang and Xiang Yao and Zhiqiang Xie and Yongchao Chen and Shilong Liu and Bochen Qian and Philip Torr and Bernard Ghanem and Guohao Li},
      year={2024},
      eprint={2407.01511},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.01511}, 
}
