Cross-platform Agent Benchmark Framework for Multimodal Embodied Language Model Agents

Project description

🦀 Crab: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents

Overview

Crab is a framework for building LLM agent benchmark environments in a Python-centric way.

Key Features

🌐 Cross-platform

  • Build agent environments that support various deployment options, including in-memory, Docker-hosted, virtual-machine, and distributed physical-machine setups, provided they are accessible via Python functions.
  • Let the agent access all environments at the same time through a unified interface.
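
The idea of one interface over several environments can be illustrated with a minimal, self-contained sketch. It does not use Crab's API: `Backend`, `UnifiedInterface`, and both `*_execute` functions are hypothetical names invented for this illustration, and the "remote" backend is only a stand-in for a Docker container, VM, or physical machine reached through a Python function.

```python
from typing import Any, Callable

# Hypothetical illustration only; none of these names come from Crab.
Executor = Callable[[str, dict[str, Any]], Any]


class Backend:
    """Wraps any Python callable that can execute a named action somewhere."""

    def __init__(self, name: str, execute: Executor) -> None:
        self.name = name
        self.execute = execute


class UnifiedInterface:
    """Routes every agent request to the environment it names."""

    def __init__(self, backends: list[Backend]) -> None:
        self._backends = {b.name: b for b in backends}

    def step(self, env: str, action: str, **params: Any) -> Any:
        if env not in self._backends:
            raise KeyError(f"unknown environment: {env}")
        return self._backends[env].execute(action, params)


def in_memory_execute(action: str, params: dict[str, Any]) -> dict[str, Any]:
    # Runs in the current process.
    return {"where": "local", "action": action, "params": params}


def fake_remote_execute(action: str, params: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for a call that would reach another machine.
    return {"where": "remote", "action": action, "params": params}


interface = UnifiedInterface(
    [Backend("desktop", in_memory_execute), Backend("phone", fake_remote_execute)]
)
result = interface.step("phone", "tap", x=10, y=20)
```

Because the agent only ever calls `step`, swapping an in-memory backend for a remote one would not change the agent-side code in this sketch.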

⚙️ Easy-to-use Configuration

  • Add a new action by simply adding the @action decorator to a Python function.
  • Define the environment by combining several actions.
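
To make the decorator idea concrete, here is a minimal sketch of how a decorator can turn a plain function into a described action, and how a set of actions can form an environment. This is not Crab's implementation: the `Action` and `Environment` classes, the `action` decorator defined here, and the `click` and `type_text` functions are all hypothetical stand-ins written for this illustration.

```python
import inspect
from typing import Any, Callable


class Action:
    """Hypothetical wrapper that records metadata about a function."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.name = func.__name__
        # The docstring doubles as the description shown to the agent.
        self.description = inspect.getdoc(func) or ""
        self.parameters = list(inspect.signature(func).parameters)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


def action(func: Callable[..., Any]) -> Action:
    """Hypothetical decorator, defined locally for this sketch."""
    return Action(func)


@action
def click(x: int, y: int) -> str:
    """Click at screen coordinates (x, y)."""
    return f"clicked ({x}, {y})"


@action
def type_text(text: str) -> str:
    """Type the given text."""
    return f"typed {text!r}"


class Environment:
    """Hypothetical environment: a named collection of actions."""

    def __init__(self, name: str, actions: list[Action]) -> None:
        self.name = name
        self.actions = {a.name: a for a in actions}

    def describe(self) -> dict[str, str]:
        return {name: a.description for name, a in self.actions.items()}

    def step(self, action_name: str, **params: Any) -> Any:
        return self.actions[action_name](**params)


desktop = Environment("desktop", [click, type_text])
outcome = desktop.step("click", x=3, y=4)
```

In this sketch the function signature and docstring are the only configuration, which is the property the bullets above describe.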

📐 Novel Benchmarking Suite

  • Define tasks and the corresponding evaluators in an intuitive Python-native way.
  • Introduce a novel graph evaluator method providing fine-grained metrics.
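
One way to picture a graph-based evaluator is as a set of checkpoint functions arranged in a dependency graph, where partial progress counts toward the score. The sketch below is an assumption about the general technique, not Crab's evaluator: `evaluate_graph`, the checkpoint names, and the `state` dictionary are hypothetical, and the completion ratio is only one possible fine-grained metric.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Check = Callable[[dict[str, Any]], bool]


def evaluate_graph(
    checks: dict[str, Check],
    depends_on: dict[str, set[str]],
    state: dict[str, Any],
) -> dict[str, Any]:
    """Evaluate checkpoints in dependency order (hypothetical sketch).

    A checkpoint counts as completed only if all of its prerequisites
    are completed and its own check passes on the observed state.
    Every name in depends_on is assumed to also appear in checks.
    """
    if not checks:
        raise ValueError("at least one checkpoint is required")
    graph = {node: depends_on.get(node, set()) for node in checks}
    completed: set[str] = set()
    for node in TopologicalSorter(graph).static_order():
        if graph[node] <= completed and checks[node](state):
            completed.add(node)
    return {
        "completed": completed,
        "completion_ratio": len(completed) / len(checks),
        "success": len(completed) == len(checks),
    }


# Hypothetical checkpoints for a browser task.
checks = {
    "browser_open": lambda s: s.get("browser_open", False),
    "page_loaded": lambda s: s.get("page_loaded", False),
    "query_typed": lambda s: s.get("query_typed", False),
}
depends_on = {"page_loaded": {"browser_open"}, "query_typed": {"page_loaded"}}

report = evaluate_graph(
    checks,
    depends_on,
    {"browser_open": True, "page_loaded": True, "query_typed": False},
)
```

Unlike a single pass/fail check, the report in this sketch distinguishes an agent that completed two of three checkpoints from one that completed none.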

Installation

Prerequisites

  • Python 3.10 or newer
  • pip

pip install crab-framework[visual-prompt]

Examples

Run the template environment with an OpenAI agent

You can run the examples with the following commands.

export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py

Run the desktop environment with an OpenAI agent

You can run the example with the following commands.

export OPENAI_API_KEY=<your api key>
python examples/desktop_env.py "Open Firefox"

