Cross-platform Agent Benchmark Framework for Multimodal Embodied Language Model Agents
Reason this release was yanked:
Doesn't support Python 3.9
Project description
🦀 Crab: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents
Overview
Crab is a framework for building LLM agent benchmark environments in a Python-centric way.
Key Features
🌐 Cross-platform
- Build agent environments that support various deployment options, including in-memory environments, Docker containers, virtual machines, or distributed physical machines, as long as they are accessible through Python functions.
- Let the agent access all the environments at the same time through a unified interface.
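To make the idea of a unified interface concrete, here is a minimal sketch, assuming each backend (in-memory, Docker, VM, physical machine) can be wrapped so that its actions are plain Python callables. The names ToyEnvironment and UnifiedInterface are hypothetical and are not Crab's API; the sketch only shows how an agent could address several environments through one entry point by naming the target environment and action.

```python
from typing import Any, Callable, Dict


class ToyEnvironment:
    """Hypothetical stand-in for one environment (in-memory, Docker, VM, ...)."""

    def __init__(self, name: str, actions: Dict[str, Callable[..., Any]]):
        self.name = name
        self.actions = actions

    def step(self, action: str, **kwargs: Any) -> Any:
        # A backend only needs to expose its actions as Python callables.
        return self.actions[action](**kwargs)


class UnifiedInterface:
    """Hypothetical router: sends an (environment, action) pair to the right backend."""

    def __init__(self, *envs: ToyEnvironment):
        self.envs = {env.name: env for env in envs}

    def step(self, env_name: str, action: str, **kwargs: Any) -> Any:
        return self.envs[env_name].step(action, **kwargs)


# Usage: two toy environments exposed to the agent through one interface.
desktop = ToyEnvironment("desktop", {"echo": lambda text: f"desktop:{text}"})
phone = ToyEnvironment("phone", {"echo": lambda text: f"phone:{text}"})
agent_view = UnifiedInterface(desktop, phone)
print(agent_view.step("phone", "echo", text="hi"))  # phone:hi
```

Because routing happens by name, adding another machine only means registering one more environment object; the agent-facing call shape stays the same.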
⚙️ Easy-to-use Configuration
- Add a new action by simply adding an @action decorator to a Python function.
- Define the environment by integrating several actions together.
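The decorator-based workflow above can be illustrated with a self-contained sketch. It does not import Crab: the action decorator, Action class, and Environment class below are hypothetical stand-ins written for illustration, and Crab's real decorator may record different metadata. The assumption here is that an action is a plain function whose name, docstring, and parameters are captured so an agent can be told what it may call.

```python
import inspect
from typing import Any, Callable, Dict, List


class Action:
    """Hypothetical wrapper recording a function's name, docstring, and parameters."""

    def __init__(self, func: Callable[..., Any]):
        self.func = func
        self.name = func.__name__
        self.description = inspect.getdoc(func) or ""
        self.parameters = list(inspect.signature(func).parameters)

    def __call__(self, **kwargs: Any) -> Any:
        return self.func(**kwargs)


def action(func: Callable[..., Any]) -> Action:
    """Hypothetical decorator: turns a plain Python function into an Action."""
    return Action(func)


@action
def click(x: int, y: int) -> str:
    """Click the screen at pixel coordinates (x, y)."""
    return f"clicked at ({x}, {y})"


@action
def type_text(text: str) -> str:
    """Type the given text."""
    return f"typed {text!r}"


class Environment:
    """Hypothetical environment defined by integrating several actions."""

    def __init__(self, name: str, actions: List[Action]):
        self.name = name
        self.actions: Dict[str, Action] = {a.name: a for a in actions}

    def describe(self) -> Dict[str, Any]:
        # Names, docstrings, and parameter lists: material for an agent prompt.
        return {n: (a.description, a.parameters) for n, a in self.actions.items()}

    def step(self, name: str, **kwargs: Any) -> Any:
        return self.actions[name](**kwargs)


env = Environment("toy_desktop", [click, type_text])
print(env.step("click", x=10, y=20))  # clicked at (10, 20)
print(env.describe()["type_text"])  # ('Type the given text.', ['text'])
```

Capturing the docstring and signature at decoration time is what lets the environment describe its actions without any separate configuration file.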
📐 Novel Benchmarking Suite
- Define tasks and the corresponding evaluators in an intuitive, Python-native way.
- Introduce a novel graph evaluator method providing fine-grained metrics.
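One way to picture a graph evaluator is the following sketch, assuming a task is decomposed into sub-goal checks (boolean functions over the observed state) connected by prerequisite edges in a directed acyclic graph. The function evaluate_graph, the example node names, and the exact scoring rule are hypothetical and are not Crab's implementation. A node counts as completed only when all its prerequisites are completed and its own check passes, which yields a fine-grained completion ratio instead of a single pass/fail bit.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

Check = Callable[[Dict[str, Any]], bool]


def evaluate_graph(
    checks: Dict[str, Check],
    prerequisites: Dict[str, Set[str]],
    state: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical graph evaluator: visit sub-goals in dependency order."""
    graph = {node: prerequisites.get(node, set()) for node in checks}
    completed: Set[str] = set()
    for node in TopologicalSorter(graph).static_order():
        # A sub-goal counts only if every prerequisite is already completed.
        if graph[node] <= completed and checks[node](state):
            completed.add(node)
    return {
        "completed": sorted(completed),
        "completion_ratio": len(completed) / len(checks),
        "success": len(completed) == len(checks),
    }


# Usage: a three-step browsing task where the final page never loaded.
checks = {
    "firefox_open": lambda s: s.get("firefox_open", False),
    "url_typed": lambda s: s.get("url") == "example.com",
    "page_loaded": lambda s: s.get("page_loaded", False),
}
prerequisites = {"url_typed": {"firefox_open"}, "page_loaded": {"url_typed"}}
state = {"firefox_open": True, "url": "example.com", "page_loaded": False}
result = evaluate_graph(checks, prerequisites, state)
print(result["completed"], round(result["completion_ratio"], 2), result["success"])
# ['firefox_open', 'url_typed'] 0.67 False
```

The partial score of about 0.67 distinguishes an agent that got two of three steps right from one that did nothing, which a binary success flag cannot do.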
Installation
Prerequisites
- Python 3.10 or newer
- pip
pip install "crab-framework[visual-prompt]"
Examples
Run the template environment with an OpenAI agent
You can run the examples using the following commands.
export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py
Run the desktop environment with an OpenAI agent
You can run the example using the following commands.
export OPENAI_API_KEY=<your api key>
python examples/desktop_env.py "Open Firefox"
Download files
Download the file for your platform.
Source Distribution
crab_framework-0.1.1.tar.gz (29.4 kB)
Built Distribution
Hashes for crab_framework-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d18e2a38bf41e2b100c1b6bfeec4c17f74f7c96db5a8c731ce671ef3c5993b41 |
| MD5 | 52181f0f0d4ce16fb820d11726d57f33 |
| BLAKE2b-256 | 3d326185e7fffc3d274070bfaad8b9695defe7cd9d11a95b28ba3b006b63f8ed |