
Agentic Research and Evaluation Suite


ARES: Agentic Research & Evaluation Suite



ARES is an RL-first framework for training and evaluating LLM agents, especially coding agents.

It is a modern gym: the environment layer powering RL research.

ARES treats LLM requests as observations and LLM responses as actions within the environment, so you can focus on training just the LLM, not the code agent surrounding it. The interface is entirely async and supports scaling to hundreds or thousands of parallel environments; check out example 3 to run this yourself.
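The scaling claim rests on asyncio: while one environment is waiting on the LLM or its sandbox, the event loop can advance the others. The sketch below shows bounded fan-out with `asyncio.Semaphore` and `asyncio.gather`. It is not ARES's own parallel runner (example 3 covers that); `run_episode` is a hypothetical stand-in that simulates awaited I/O with `asyncio.sleep`, where a real rollout would run an `env.reset()`/`env.step()` loop like the one in the quick start.

```python
import asyncio


async def run_episode(episode_id: int) -> int:
    """Hypothetical stand-in for one reset()/step() episode.

    Each step awaits simulated I/O; in a real rollout this is where the
    LLM call and the environment's sandbox work would be awaited.
    Returns the number of steps taken (3, 4, or 5 here, for variety).
    """
    num_steps = 3 + episode_id % 3
    for _ in range(num_steps):
        await asyncio.sleep(0.001)  # yields control so other episodes can run
    return num_steps


async def run_many(num_episodes: int, max_concurrency: int) -> tuple[list[int], int]:
    """Run episodes concurrently, never more than max_concurrency at once.

    Returns per-episode step counts (in episode order) and the peak number
    of episodes that were active at the same time.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    active = 0
    peak = 0

    async def bounded(episode_id: int) -> int:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await run_episode(episode_id)
            finally:
                active -= 1

    # gather returns results in the same order as its inputs.
    results = await asyncio.gather(*(bounded(i) for i in range(num_episodes)))
    return list(results), peak


if __name__ == "__main__":
    steps, peak = asyncio.run(run_many(num_episodes=100, max_concurrency=32))
    print(f"episodes={len(steps)} total_steps={sum(steps)} peak_concurrency={peak}")
```

Capping concurrency with a semaphore keeps resource use (containers, API rate limits) bounded while still overlapping the waits; `max_concurrency` is the knob to raise as your infrastructure allows.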

Quick Start

Pre-requisites

  • Python >= 3.12

Getting Started

Install with uv:

uv add martian-ares

ARES ships with presets for different combinations of code agent and environment. List them with:

uv run python -c "import ares; print(ares.list_presets())"

You can get started by using this minimal loop to run mini-swe-agent on SWE-bench Verified sequentially.

Note: to run this particular example you will need:

  • Docker (with the daemon running)
  • A Martian API key (see below)

import asyncio

import ares
from ares import llms

async def main():
    # This requires `CHAT_COMPLETION_API_KEY` to be set with a Martian API key--see below.
    agent = llms.ChatCompletionCompatibleLLMClient(model="openai/gpt-5-mini")

    async with ares.make("sbv-mswea") as env:
        ts = await env.reset()
        while not ts.last():
            action = await agent(ts.observation)   # observation = LLM request
            ts = await env.step(action)            # action = LLM response
            print(f"{action}\n{ts}")

if __name__ == "__main__":
    asyncio.run(main())

To run the example above you'll need a Martian API key set in your .env file. To get a key:

  1. Go to https://app.withmartian.com
  2. On the Billing tab, add a payment method and top up some credits.
  3. On the API Keys tab, create an API key.
  4. Add CHAT_COMPLETION_API_KEY={your-key} to your .env file, replacing {your-key} with the key itself.
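If the key does not seem to be picked up, it can help to check what your .env file actually contains. The sketch below is a minimal stdlib parser for simple `KEY=value` lines; `parse_dotenv`, `apply_env`, and `load_dotenv_minimal` are hypothetical helpers, not how ARES itself loads .env (the widely used `python-dotenv` package handles quoting, escapes, and interpolation far more completely). It skips blank lines and `#` comments, tolerates an `export ` prefix, strips one pair of matching quotes, does not handle trailing inline comments, and by default leaves variables already set in the process environment untouched.

```python
import os
from collections.abc import MutableMapping
from pathlib import Path


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines (hypothetical helper, not python-dotenv)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without '='
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def apply_env(
    values: dict[str, str],
    environ: MutableMapping[str, str] = os.environ,
    override: bool = False,
) -> None:
    """Copy parsed values into environ; existing variables win unless override=True."""
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value


def load_dotenv_minimal(path: str = ".env", override: bool = False) -> dict[str, str]:
    p = Path(path)
    values = parse_dotenv(p.read_text()) if p.exists() else {}
    apply_env(values, override=override)
    return values


if __name__ == "__main__":
    load_dotenv_minimal()
    # Report presence only; never print the secret itself.
    print("CHAT_COMPLETION_API_KEY set:", bool(os.environ.get("CHAT_COMPLETION_API_KEY")))
```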

Alternatively, you can use another chat-completions-compatible endpoint by setting both:

  • CHAT_COMPLETION_API_BASE_URL
  • CHAT_COMPLETION_API_KEY
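One reading of the two variables above: the API key is always required, and the base URL only matters when you point ARES at an endpoint other than Martian's. The pre-flight check below encodes that reading as a hypothetical helper, `check_llm_env`; the assumption that an unset `CHAT_COMPLETION_API_BASE_URL` means the default Martian endpoint is inferred from this README, not taken from ARES's source.

```python
import os
from collections.abc import Mapping


def check_llm_env(environ: Mapping[str, str] = os.environ) -> str:
    """Describe which chat-completions endpoint the environment points at.

    Raises if the API key is missing or the base URL is obviously malformed.
    """
    key = environ.get("CHAT_COMPLETION_API_KEY", "").strip()
    base_url = environ.get("CHAT_COMPLETION_API_BASE_URL", "").strip()
    if not key:
        raise RuntimeError("CHAT_COMPLETION_API_KEY is not set (add it to your .env file)")
    if not base_url:
        # Assumption: with no base URL configured, the client uses Martian.
        return "default endpoint (Martian)"
    if not base_url.startswith(("http://", "https://")):
        raise ValueError(f"CHAT_COMPLETION_API_BASE_URL is not an http(s) URL: {base_url!r}")
    return f"custom endpoint: {base_url}"


if __name__ == "__main__":
    try:
        print(check_llm_env())
    except (RuntimeError, ValueError) as err:
        print(f"LLM endpoint misconfigured: {err}")
```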

Next Steps

  1. Check out the examples
  2. Read the docs to understand ARES and its key abstractions
  3. Read our blog post about why ARES and what we hope to see


Download files

Source Distribution

martian_ares-0.1.0.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

martian_ares-0.1.0-py3-none-any.whl (141.8 kB)

Uploaded Python 3

File details

Details for the file martian_ares-0.1.0.tar.gz.

File metadata

  • Download URL: martian_ares-0.1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for martian_ares-0.1.0.tar.gz

  • SHA256: 18cca8ad2d1c862843af3c9066f0982434e0c0fa7cd0c7a99b5cd35e9336b31c
  • MD5: d0ba2d9ef653cc0c6951237d6964c079
  • BLAKE2b-256: 964f6e6bddb577c690e8cccc4fe40ba4ca1e0ac554fc3dae28f8a2f781f4bea5

File details

Details for the file martian_ares-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: martian_ares-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 141.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for martian_ares-0.1.0-py3-none-any.whl

  • SHA256: 50d6a23c7bc6704b517f6bfc99b5f38aa35bb9c4b73f89fd75a1428bd03df39b
  • MD5: edd6fb2bcf4720ee9abaebbc2ba499a8
  • BLAKE2b-256: 036b2e792a0d16d5143fcbfc000cf11b5923c1662881ebb7d65bebbadd975797
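To check that a downloaded distribution matches its published digest, you can hash the file yourself and compare. The sketch below uses only `hashlib`; the two expected SHA256 values are copied from the file details on this page, and `sha256_of` and `verify_download` are hypothetical helper names. (pip can enforce the same check automatically in its hash-checking mode, `--require-hashes`.)

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests as published for martian-ares 0.1.0.
EXPECTED_SHA256 = {
    "martian_ares-0.1.0.tar.gz": "18cca8ad2d1c862843af3c9066f0982434e0c0fa7cd0c7a99b5cd35e9336b31c",
    "martian_ares-0.1.0-py3-none-any.whl": "50d6a23c7bc6704b517f6bfc99b5f38aa35bb9c4b73f89fd75a1428bd03df39b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Usage: python verify.py martian_ares-0.1.0-py3-none-any.whl
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(f"{p.name}: {'OK' if verify_download(p) else 'MISMATCH'}")
```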
