A framework for evaluating and optimizing agents and models using sandboxed environments.

Marina

Marina is a fork of Harbor, a framework for evaluating and optimizing AI agents and language models. You can use Marina to:

  • Evaluate arbitrary agents like Claude Code, OpenHands, Codex CLI, and more.
  • Build and share your own benchmarks and environments.
  • Conduct experiments in thousands of environments in parallel through providers like Daytona and Modal.
  • Generate rollouts for RL optimization.

For example, this runs the OpenHands SDK agent on the bundled hello-screenshot vision task:

marina run -p examples/tasks/hello-screenshot --agent openhands-sdk --model openrouter/openai/gpt-5.5 --ae LLM_SUPPORTS_VISION=true

Changes

We track upstream Harbor closely, with a few additions of our own.

Agent kwargs (--ak key=value)

  • disable_builtin_tools — drop the default terminal/file-editor/task-tracker tools
  • disable_stuck_detection — turn off the SDK's stuck-agent detection

Environment variables (--ae NAME=value)

  • LLM_SUPPORTS_VISION — force vision support for models LiteLLM misclassifies
  • OPENROUTER_REASONING_ENABLED / _EXCLUDE — control OpenRouter reasoning (enabled automatically for Opus 4.7)
  • OPENROUTER_VERBOSITY — set OpenRouter verbosity
  • LLM_THINKING_DISPLAY — thinking display mode (default: summarized)
  • SYSTEM_MESSAGE_SUFFIX / USER_MESSAGE_SUFFIX — append text to the OpenHands SDK system / user prompts
  • AWS_BEARER_TOKEN_BEDROCK — bearer-token auth for Bedrock

With vision support enabled, MCP image observations are also written to /logs/agent/trajectory-images/ and referenced from the ATIF trajectory.

Installation

Marina is published to PyPI as chakra-marina.

To install the CLI and put marina on your PATH:

uv tool install chakra-marina

Or with pip:

pip install chakra-marina

Add extras for cloud providers, e.g. chakra-marina[daytona] for Daytona, chakra-marina[cloud] for all cloud providers, or chakra-marina[all] for everything.

Example: Running a task

Run the bundled hello-screenshot task locally with Docker. It's a vision smoke test: the agent calls an MCP take_screenshot tool, sees a solid-color image, and reports the color.

export LLM_API_KEY=<YOUR-KEY>
marina run -p examples/tasks/hello-screenshot \
   --agent openhands-sdk \
   --model openrouter/openai/gpt-5.5 \
   --ae LLM_SUPPORTS_VISION=true \
   --ae SYSTEM_MESSAGE_SUFFIX="You run fully autonomously — never ask for confirmation."

--ae LLM_SUPPORTS_VISION=true enables vision (required for this task), and --ae SYSTEM_MESSAGE_SUFFIX="..." appends text to the OpenHands SDK system prompt. Both apply only to the openhands-sdk agent (see Changes).
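
When a run combines several --ae and --ak flags, collecting them in a bash array keeps values that contain spaces (such as SYSTEM_MESSAGE_SUFFIX) as single arguments. This is a minimal sketch that assumes bash: marina_args, add_ae and add_ak are hypothetical names, not part of Marina, and passing true to the disable_stuck_detection agent kwarg assumes the agent accepts that spelling of a boolean.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the argument vector for `marina run` in an array
# so each --ae/--ak value stays a single word, even when it contains spaces.
marina_args=(
  -p examples/tasks/hello-screenshot
  --agent openhands-sdk
  --model openrouter/openai/gpt-5.5
)

add_ae() { marina_args+=(--ae "$1=$2"); }  # environment variable for the agent
add_ak() { marina_args+=(--ak "$1=$2"); }  # agent kwarg

add_ae LLM_SUPPORTS_VISION true
add_ae SYSTEM_MESSAGE_SUFFIX "You run fully autonomously. Never ask for confirmation."
add_ak disable_stuck_detection true  # assumption: "true" is parsed as a boolean

# Print the exact command, shell-quoted, so it can be inspected before launching.
printf '%q ' marina run "${marina_args[@]}"
echo
```

Once the printed command looks right, launch it with marina run "${marina_args[@]}". Because the array is expanded with quotes, no value is split on its spaces.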

To run on a cloud provider (like Daytona) instead of local Docker, pass the --env flag:

export LLM_API_KEY=<YOUR-KEY>
export DAYTONA_API_KEY=<YOUR-KEY>
marina run -p examples/tasks/hello-screenshot \
   --agent openhands-sdk \
   --model openrouter/openai/gpt-5.5 \
   --ae LLM_SUPPORTS_VISION=true \
   --env daytona
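
Checking that the required keys are set before launching can give a clearer error than a failure partway through a run. A minimal bash sketch; require_env is a hypothetical helper, not part of Marina, and it only checks that the variables are non-empty, not that the keys are valid.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: reports every missing or empty variable,
# then returns non-zero if any were missing.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before the Daytona command above:
#   require_env LLM_API_KEY DAYTONA_API_KEY || exit 1
```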

When running a whole benchmark (many tasks), raise --n-concurrent to fan out across hundreds or thousands of environments in parallel.

To see all supported agents and other options, run:

marina run --help


Download files

Source Distribution

chakra_marina-0.2.0.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

chakra_marina-0.2.0-py3-none-any.whl (1.3 MB)

Uploaded Python 3

File details

Details for the file chakra_marina-0.2.0.tar.gz.

File metadata

  • Download URL: chakra_marina-0.2.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.15

File hashes

Hashes for chakra_marina-0.2.0.tar.gz
Algorithm Hash digest
SHA256 aef530d9f58ba299bbfe2b1b2b6de99bd8e6ad4408a13607878b784b64905d15
MD5 1f5e21389adfeb48894ea5fc088aba49
BLAKE2b-256 e37a82477b9deb8429edd235e2c7b75b2d3e7b1ac2837d5cdd05beca17bc0262

File details

Details for the file chakra_marina-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for chakra_marina-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d270239b439c5a72e8333041a06ce83c08bca0e0672e07ea99c9661a58727f01
MD5 29ba9ee7611dab9380511f9e1c2c104f
BLAKE2b-256 8ef45bedbc81c5dd8371abc76216ed89543cff17233e70ae3633d0c049fc8a13
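
The SHA256 digests above can be checked after downloading a file. A minimal bash sketch, assuming GNU coreutils' sha256sum is available (on macOS, shasum -a 256 prints the same "digest  filename" format); verify_sha256 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA256 digest with an expected hex string.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <file>"; keep only the digest field.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
    return 0
  fi
  echo "MISMATCH: $file (got $actual)" >&2
  return 1
}

# Usage with the wheel listed above:
#   verify_sha256 chakra_marina-0.2.0-py3-none-any.whl \
#     d270239b439c5a72e8333041a06ce83c08bca0e0672e07ea99c9661a58727f01
```

pip can also enforce the digests itself: put chakra-marina==0.2.0 followed by one --hash=sha256:<digest> option per file in a requirements file and install with pip install --require-hashes -r requirements.txt. In that mode pip also requires every dependency to be pinned with hashes.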
