Cross-platform Agent Benchmark Framework for Multimodal Embodied Language Model Agents
🦀 Crab: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents
Overview
Crab is a framework for building LLM agent benchmark environments in a Python-centric way.
Key Features
🌐 Cross-platform
- Build agent environments on a range of deployment targets, including in-memory, Docker-hosted, virtual machines, or distributed physical machines, as long as they are accessible via Python functions.
- Let the agent access all the environments at the same time through a unified interface.
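The following minimal sketch makes the idea of a unified interface concrete in plain Python. It does not use Crab's API: `Environment`, `UnifiedInterface`, and `step` are hypothetical names chosen for illustration, and each "platform" is simulated in memory. The point is that once every action is a Python function, an agent can address several environments through one call signature.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Environment:
    """Hypothetical stand-in for one deployment target (in-memory, Docker, VM, ...).

    Each action is a plain Python callable; how the call reaches the
    target (local call, RPC, SSH, ...) stays hidden inside the function.
    """
    name: str
    actions: dict[str, Callable[..., Any]] = field(default_factory=dict)


class UnifiedInterface:
    """Hypothetical router: sends an agent's action call to the right environment."""

    def __init__(self, *envs: Environment) -> None:
        self.envs = {env.name: env for env in envs}

    def step(self, env_name: str, action: str, **kwargs: Any) -> Any:
        # Look up the target environment, then the named action inside it.
        return self.envs[env_name].actions[action](**kwargs)


# Toy "desktop" platform backed by an in-memory file store.
_files: dict[str, str] = {}


def write_file(path: str, text: str) -> None:
    _files[path] = text


def read_file(path: str) -> str:
    return _files[path]


# Toy "phone" platform with a single read-only action.
def battery_level() -> int:
    return 87


desktop = Environment("desktop", {"write_file": write_file, "read_file": read_file})
phone = Environment("phone", {"battery_level": battery_level})

interface = UnifiedInterface(desktop, phone)
interface.step("desktop", "write_file", path="notes.txt", text="hi")
print(interface.step("desktop", "read_file", path="notes.txt"))  # hi
print(interface.step("phone", "battery_level"))  # 87
```

In this sketch, replacing an in-memory function with one that forwards the call to a remote machine would leave the agent-facing `step` call unchanged.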
⚙️ Easy-to-use Configuration
- Add a new action by simply adding an @action decorator to a Python function.
- Define the environment by integrating several actions together.
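The sketch below shows one way a decorator of this kind could work. It wraps a function and records the name, docstring, and parameter names an agent needs to choose and call it. This is a toy reimplementation for illustration, not Crab's actual @action decorator. The `Action` class and the dict-based "environment" are assumptions.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Action:
    """Hypothetical wrapper pairing a function with metadata an agent can read."""
    name: str
    description: str
    parameters: tuple[str, ...]
    func: Callable[..., Any]

    def __call__(self, **kwargs: Any) -> Any:
        return self.func(**kwargs)


def action(func: Callable[..., Any]) -> Action:
    """Toy decorator: name from the function, description from its docstring,
    parameter names from its signature."""
    return Action(
        name=func.__name__,
        description=inspect.getdoc(func) or "",
        parameters=tuple(inspect.signature(func).parameters),
        func=func,
    )


@action
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


@action
def echo(text: str) -> str:
    """Return the text unchanged."""
    return text


# In this sketch an "environment" is just a named collection of actions.
calculator_env = {act.name: act for act in (add, echo)}

for act in calculator_env.values():
    print(act.name, act.parameters, "-", act.description)
print(calculator_env["add"](a=2, b=3))  # 5
```

Keeping the metadata next to the function means, in this sketch, the same object can be described to the model and then executed when the model selects it.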
📐 Novel Benchmarking Suite
- Define tasks and their corresponding evaluators in an intuitive, Python-native way.
- Introduce a novel graph evaluator that provides fine-grained metrics.
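This rough illustration shows why a graph-based evaluator can give finer-grained scores than a single pass/fail check. It assumes a task is split into sub-goal checks arranged as a dependency graph, and it scores a run by the fraction of sub-goals reached in dependency order. The function `completion_ratio`, the checks, and the example state are hypothetical. This is not Crab's evaluator implementation.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# A check inspects the final environment state and reports whether a sub-goal holds.
Checker = Callable[[dict[str, Any]], bool]


def completion_ratio(
    checks: dict[str, Checker],
    deps: dict[str, set[str]],
    state: dict[str, Any],
) -> float:
    """Fraction of sub-goal nodes reached, respecting dependency order.

    A node counts only if all of its predecessors counted and its own
    check passes on the given state.
    """
    sorter = TopologicalSorter()
    for node in checks:
        sorter.add(node, *deps.get(node, set()))

    done: set[str] = set()
    for node in sorter.static_order():
        # Predecessors are yielded first, so `done` already reflects them.
        if deps.get(node, set()) <= done and checks[node](state):
            done.add(node)
    return len(done) / len(checks)


# Hypothetical task: "download report.pdf and open it in the editor".
checks: dict[str, Checker] = {
    "file_downloaded": lambda s: "report.pdf" in s["files"],
    "editor_running": lambda s: s["editor_running"],
    "file_opened": lambda s: s["editor_file"] == "report.pdf",
}
deps = {"file_opened": {"file_downloaded", "editor_running"}}

partial_run = {"files": ["report.pdf"], "editor_running": True, "editor_file": None}
print(round(completion_ratio(checks, deps, partial_run), 3))  # 0.667
```

A binary success check would score this partial run 0, while the ratio credits the two sub-goals that were reached. Because each node is gated on its predecessors, a downstream check that happens to pass without its prerequisites earns no credit.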
Installation
Prerequisites
- Python 3.10 or newer
- pip
pip install crab-framework[visual-prompt]
Examples
Run the template environment with an OpenAI agent
You can run the examples using the following commands.
export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py
Run the desktop environment with an OpenAI agent
You can run the example using the following commands.
export OPENAI_API_KEY=<your api key>
python examples/desktop_env.py "Open Firefox"
Download files
Source Distribution
crab_framework-0.1.0.tar.gz (29.4 kB)
Built Distribution
Hashes for crab_framework-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 999f3fa8cf0ab1478091636dd994523c2a178e580aead8b00d34b6352ab00309
MD5 | 6ee5ab422a775d30d570e79de5df31c7
BLAKE2b-256 | c578387a8843dcab23fcdc054d46c6a933aaff3f3a0c5e2ff09300bca177ef9c