Skip to main content

Agentic Research and Evaluation Suite

Project description

ARES

ARES (Agentic Research and Evaluation Suite) is an RL-first framework for training and evaluating agents.

Quick Start

Get ARES running in minutes - no API keys required!

Prerequisites

  • Python 3.12 or higher
  • Docker - For running code agents in containers
  • uv - Fast Python package installer and resolver

To install uv, follow the instructions at https://docs.astral.sh/uv/getting-started/installation/

Installation

For now, we recommend running ARES locally from this directory:

uv sync --all-groups

and you're ready to get started.

Your First ARES "Agent"

No API keys needed!

Run the Hello World example to see the RL loop in action:

uv run -m examples.00_hello_world

This example uses:

  • ✓ Local Docker containers (no cloud account needed)
  • ✓ A mock LLM (no API keys needed)

You'll see how ARES treats code agent interactions as a reinforcement learning problem, with LLM requests as observations and LLM responses as actions.

Examples

ARES includes several examples that demonstrate different usage patterns:

1. Minimal Loop (Local Docker + Real LLM)

File: examples/01_minimal_loop.py What you'll need: Docker, Martian API key

# Set up your API key first (see Cloud Setup below)
uv run -m examples.01_minimal_loop

Shows the RL loop with a real LLM (via Martian API) and local Docker containers.

2. Local LLM (Fully Local)

File: examples/02_local_llm.py What you'll need: Docker

uv run -m examples.02_local_llm

Demonstrates running ARES completely locally using a local LLM (Qwen2.5-3B-Instruct). No cloud services required.

Cloud Setup (Optional)

For production use or larger-scale experiments, you can use cloud containers and API-based LLMs.

Option 1: Using Martian API for LLM Inference

  1. Create an account at https://app.withmartian.com
  2. Copy the example environment file: cp .env.example .env
  3. Add your Martian API key: CHAT_COMPLETION_API_KEY=your_key_here

Option 2: Using Daytona for Cloud Containers

By default, ARES uses Daytona for container management. To set this up:

  1. Create a Daytona account at https://www.daytona.io
  2. Copy the example environment file: cp .env.example .env
  3. Add your Daytona credentials:
    • DAYTONA_API_KEY=your_key_here
    • DAYTONA_API_URL=your_url_here

See .env.example for all available configuration options.

API Usage

ARES environments use an async version of the dm_env spec. Here's a complete example:

import asyncio

from ares.code_agents import mini_swe_agent
from ares.containers import docker  # Use local Docker, or import daytona for cloud
from ares.environments import swebench_env
from ares.llms import chat_completions_compatible


async def main():
    # Create an LLM client (requires CHAT_COMPLETION_API_KEY in .env)
    llm_client = chat_completions_compatible.ChatCompletionCompatibleLLMClient(
        model="openai/gpt-4o-mini"
    )

    # Load SWE-bench tasks
    all_tasks = swebench_env.swebench_verified_tasks()
    tasks = [all_tasks[0]]  # Run on only one task for now

    # Create environment with local Docker and MiniSWE agent
    async with swebench_env.SweBenchEnv(
        tasks=tasks,
        container_factory=docker.DockerContainer,  # Use local Docker
        code_agent_factory=mini_swe_agent.MiniSWECodeAgent,
    ) as env:
        # The RL loop
        ts = await env.reset()
        while not ts.last():
            # Environment sends observation (LLM request) to agent
            action = await llm_client(ts.observation)

            # Environment processes action (LLM response) and returns next state
            ts = await env.step(action)
            print(f"Step complete. Reward: {ts.reward}")


if __name__ == "__main__":
    asyncio.run(main())

This example uses:

  • Container backend: Local Docker (change to daytona.DaytonaContainer for cloud)
  • LLM backend: Martian API (or any OpenAI-compatible API)
  • Code agent: MiniSWE agent from the mini-swe-agent library

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

martian_ares-0.0.1.tar.gz (32.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

martian_ares-0.0.1-py3-none-any.whl (31.7 kB view details)

Uploaded Python 3

File details

Details for the file martian_ares-0.0.1.tar.gz.

File metadata

  • Download URL: martian_ares-0.0.1.tar.gz
  • Upload date:
  • Size: 32.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for martian_ares-0.0.1.tar.gz
Algorithm Hash digest
SHA256 b1a45dfbd20f8017e4b9852ac06757e0c0ea61589f0528168a3a918724f8cfac
MD5 f6a47cb60bb05da3f7df5d84a3d12f2d
BLAKE2b-256 8941e59fa30f197eebc22e413e47101bb913481336eb7f290939198c5b268f0a

See more details on using hashes here.

File details

Details for the file martian_ares-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: martian_ares-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 31.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for martian_ares-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bfdf50357589e871334441b44b9a1d6e617b6d89d1f84cdace9e2ef96f859654
MD5 bdf8e0276b4f0d952ac98b574e0a3a9e
BLAKE2b-256 f7052aa4344301d0cbae6f64632f547d6ebd7e1b5acd33fa936b4f6c6c4cbec9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page