Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents.
🦀 CRAB: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents
Overview
CRAB is a framework for building LLM agent benchmark environments in a Python-centric way.
Key Features
🌐 Cross-platform and Multi-environment
- Build agent environments that support various deployment options, including in-memory environments, Docker containers, virtual machines, or distributed physical machines, provided they are accessible via Python functions.
- Let the agent access all environments at the same time through a unified interface.
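A minimal sketch of what "one interface over several environments" can look like, assuming each environment is just a set of Python callables (the README's only requirement is that environments be reachable via Python functions). MultiEnv, register, and step are hypothetical names chosen for illustration, not CRAB's API; in a real deployment the callables might proxy to a Docker container, a VM, or a remote machine.

```python
from typing import Any, Callable, Dict

# Hypothetical illustration: an action is any Python callable.
Action = Callable[..., Any]


class MultiEnv:
    """Hypothetical router: one entry point for actions in several named environments."""

    def __init__(self) -> None:
        self._envs: Dict[str, Dict[str, Action]] = {}

    def register(self, env_name: str, actions: Dict[str, Action]) -> None:
        # Each environment is modeled as a name -> callable mapping.
        self._envs[env_name] = dict(actions)

    def step(self, env_name: str, action_name: str, **kwargs: Any) -> Any:
        # Resolve the target environment and action, then invoke it.
        try:
            action = self._envs[env_name][action_name]
        except KeyError as exc:
            raise ValueError(f"unknown action {env_name}.{action_name}") from exc
        return action(**kwargs)


# Usage: one agent drives two stand-in environments with the same call shape.
notes: Dict[str, str] = {}


def write_file(path: str, text: str) -> str:
    notes[path] = text
    return "ok"


def read_sms(index: int) -> str:
    return ["meeting at 3pm", "call mom"][index]


env = MultiEnv()
env.register("ubuntu", {"write_file": write_file})
env.register("android", {"read_sms": read_sms})

msg = env.step("android", "read_sms", index=0)
env.step("ubuntu", "write_file", path="/tmp/note.txt", text=msg)
print(notes)  # {'/tmp/note.txt': 'meeting at 3pm'}
```

Keeping the call shape identical across environments is what lets a single agent plan steps that hop between, say, a desktop and a phone without special-casing either.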
⚙️ Easy-to-use Configuration
- Add a new action by simply adding an @action decorator to a Python function.
- Define the environment by integrating several actions together.
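The sketch below shows, under stated assumptions, how a decorator can turn a plain Python function into an action an agent can discover (name, docstring, parameter names), and how an environment can be assembled from a list of such actions. The action decorator, Action, and Environment shown here are hypothetical stand-ins written for illustration; they do not reproduce CRAB's own classes or signatures.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Action:
    """Hypothetical action record: the callable plus metadata an agent can read."""
    name: str
    description: str
    params: List[str]
    fn: Callable[..., Any]

    def __call__(self, **kwargs: Any) -> Any:
        return self.fn(**kwargs)


def action(fn: Callable[..., Any]) -> Action:
    """Hypothetical decorator: wrap a plain function as an Action."""
    return Action(
        name=fn.__name__,
        description=inspect.getdoc(fn) or "",
        params=list(inspect.signature(fn).parameters),
        fn=fn,
    )


@dataclass
class Environment:
    """Hypothetical environment: a named collection of actions."""
    name: str
    actions: List[Action] = field(default_factory=list)

    def schema(self) -> List[Dict[str, Any]]:
        # The tool descriptions an LLM agent would be shown.
        return [
            {"name": a.name, "description": a.description, "params": a.params}
            for a in self.actions
        ]

    def take(self, action_name: str, **kwargs: Any) -> Any:
        for a in self.actions:
            if a.name == action_name:
                return a(**kwargs)
        raise KeyError(action_name)


# Usage: two decorated functions composed into one environment.
counter = {"value": 0}


@action
def increment(step: int) -> int:
    """Add step to the counter and return the new value."""
    counter["value"] += step
    return counter["value"]


@action
def reset() -> int:
    """Set the counter back to zero."""
    counter["value"] = 0
    return 0


env = Environment(name="counter_env", actions=[increment, reset])
print(env.schema()[0])
env.take("increment", step=2)
```

Reading metadata from the function itself (its name, docstring, and signature) means the description the agent sees stays in sync with the code, with no separate schema to maintain.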
📐 Novel Benchmarking Suite
- Define tasks and the corresponding evaluators in an intuitive, Python-native way.
- Introduce a novel graph evaluator method providing fine-grained metrics.
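One way to picture a graph evaluator, assuming a task is split into sub-goals whose checks form a directed acyclic graph: a sub-goal is only checked once all of its prerequisites have passed, and the fraction of satisfied sub-goals gives a finer signal than a single pass/fail. GraphEvaluator, step, and completion_ratio are hypothetical names, and the completion-ratio metric here is an assumption for illustration, not the benchmark's exact definition.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# A check inspects an environment-state snapshot and reports success.
Check = Callable[[Dict[str, Any]], bool]


class GraphEvaluator:
    """Hypothetical sketch: sub-goal checks arranged as a DAG."""

    def __init__(self, checks: Dict[str, Check], deps: Dict[str, Set[str]]) -> None:
        referenced = set(deps) | set().union(*deps.values())
        unknown = referenced - set(checks)
        if unknown:
            raise ValueError(f"unknown sub-goals: {sorted(unknown)}")
        self.checks = dict(checks)
        self.deps = {name: set(deps.get(name, ())) for name in checks}
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        self.order = list(TopologicalSorter(self.deps).static_order())
        self.passed: Set[str] = set()

    def step(self, state: Dict[str, Any]) -> None:
        """Check each unmet sub-goal whose prerequisites have all passed.

        Walking in topological order lets a chain of sub-goals pass in one call.
        """
        for name in self.order:
            if name in self.passed:
                continue
            if self.deps[name] <= self.passed and self.checks[name](state):
                self.passed.add(name)

    def completion_ratio(self) -> float:
        return len(self.passed) / len(self.checks) if self.checks else 0.0


# Usage: "download a file, open it, then email it".
ev = GraphEvaluator(
    checks={
        "downloaded": lambda s: s.get("file_on_disk", False),
        "opened": lambda s: s.get("file_opened", False),
        "sent": lambda s: s.get("email_sent", False),
    },
    deps={"opened": {"downloaded"}, "sent": {"opened"}},
)
ev.step({"file_on_disk": True})
print(ev.completion_ratio())  # one of three sub-goals satisfied
ev.step({"file_on_disk": True, "file_opened": True})
print(ev.completion_ratio())  # two of three sub-goals satisfied
```

Gating each check on its predecessors prevents an agent from being credited for a later sub-goal (for example "sent") that it reached without completing the steps the task requires first.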
Installation
Prerequisites
- Python 3.10 or newer
pip install "crab-framework[client]"
Experiment on CRAB-Benchmark-v0
All datasets and experiment code are in the crab-benchmark-v0 directory. Please read the benchmark tutorial carefully before using our benchmark.
Examples
Run the template environment with an OpenAI agent
export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py
Cite
Please cite our paper if you use anything related in your work:
@misc{xu2024crab,
title={CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents},
author={Tianqi Xu and Linyao Chen and Dai-Jie Wu and Yanjun Chen and Zecheng Zhang and Xiang Yao and Zhiqiang Xie and Yongchao Chen and Shilong Liu and Bochen Qian and Philip Torr and Bernard Ghanem and Guohao Li},
year={2024},
eprint={2407.01511},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2407.01511},
}
Download files
Source Distribution
crab_framework-0.1.2.tar.gz (41.8 kB)
Built Distribution
crab_framework-0.1.2-py3-none-any.whl (71.4 kB)
File details
Details for the file crab_framework-0.1.2.tar.gz.
File metadata
- Download URL: crab_framework-0.1.2.tar.gz
- Upload date:
- Size: 41.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.0 Linux/6.5.0-1024-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2cc51b27ec9a016345d38cf60b263f9c0e18d7e79adf9f3acbebec80d25490e8
MD5 | ee1fd55eedd7b63dba5dc20939325312
BLAKE2b-256 | 5cb40a0496fc06c5c836b10a9d446a4953138e1cc980b6a8b36cac9fd7ec1796
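Before installing from a manually downloaded archive, the listed digest can be checked locally. This is a minimal sketch using only Python's standard library (hashlib, pathlib); the function names sha256_of and verify are illustrative, and EXPECTED_SHA256 is the SHA256 value listed for crab_framework-0.1.2.tar.gz in the hash table for that file.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for crab_framework-0.1.2.tar.gz.
EXPECTED_SHA256 = "2cc51b27ec9a016345d38cf60b263f9c0e18d7e79adf9f3acbebec80d25490e8"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True only if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, verify(Path("crab_framework-0.1.2.tar.gz")) returns True only when the local file matches the published digest. pip can enforce the same kind of check itself through its hash-checking mode (--require-hashes, with --hash=sha256:<digest> entries in a requirements file).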
File details
Details for the file crab_framework-0.1.2-py3-none-any.whl.
File metadata
- Download URL: crab_framework-0.1.2-py3-none-any.whl
- Upload date:
- Size: 71.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.0 Linux/6.5.0-1024-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | cc98f216d2b104e397dd6eb5f52a1630bec7b271b37765cf0211fd9d4ee2f603
MD5 | f4854fb372097793329ad1c289fbe385
BLAKE2b-256 | 8ff76d211aec4a0eb6b694b5b9c74ad02608af7c8f6d561a2d93ad956cb1aed6