
Benchmarking suite for evaluating autonomous agents in real-world domains.



A2Perf is a benchmark for evaluating agents on sequential decision problems that are relevant to the real world. This repository contains code for running and evaluating participants' submissions on the benchmark platform.

Environments

A2Perf provides benchmark environments in the following domains:

  • Web Navigation - This environment supports compositional tasks represented as dependency graphs, in which the trained agent completes tasks on automatically generated websites.
  • Quadruped Locomotion - This environment trains a legged robot with 18 degrees of freedom to reproduce animal-like behaviors, imitating real-world motion data to build a diverse repertoire of skills.
  • Circuit Training - Chip floorplanning, a complex and traditionally manual process, is addressed by Google's open-source Circuit Training framework, which uses reinforcement learning to optimize chip layouts for multiple objectives.

Installation

A2Perf can be installed on your local machine:

git clone https://github.com/Farama-Foundation/A2Perf.git
cd A2Perf
git submodule sync --recursive
git submodule update --init --recursive
pip install -e ".[all]"

Specific package installation

To install the dependencies for a single domain only, use one of the following commands:

pip install -e ".[web_navigation]"
pip install -e ".[quadruped_locomotion]"
pip install -e ".[circuit_training]"

Both x86-64 and AArch64 (ARM64) architectures are supported.
Please note that the Windows version is less thoroughly tested than the Linux and macOS versions. It can be used for development and testing, but for serious (time- and resource-intensive) experiments on Windows, please consider using Docker or WSL with the Linux version.

API

Environments in A2Perf are registered under specific names for each domain and task. Here are the available environments:

  1. Quadruped Locomotion:

    • QuadrupedLocomotion-DogPace-v0
    • QuadrupedLocomotion-DogTrot-v0
    • QuadrupedLocomotion-DogSpin-v0
  2. Web Navigation:

    • WebNavigation-Difficulty-01-v0
    • WebNavigation-Difficulty-02-v0
    • WebNavigation-Difficulty-03-v0
  3. Circuit Training:

    • CircuitTraining-ToyMacro-v0
    • CircuitTraining-Ariane-v0
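Each ID combines the domain, the task, and a "-v<N>" version suffix; for web navigation, the task part encodes the difficulty level as a zero-padded two-digit number. The sketch below builds the three web navigation IDs listed above from their levels. `web_navigation_env_id` is a hypothetical helper, not part of A2Perf, and it does not check which levels are actually registered.

```python
def web_navigation_env_id(level: int, version: int = 0) -> str:
    """Build a web navigation environment ID (hypothetical helper, not part of A2Perf).

    The difficulty level is zero-padded to two digits, matching the IDs listed above.
    """
    if level < 1:
        raise ValueError("difficulty level must be a positive integer")
    return f"WebNavigation-Difficulty-{level:02d}-v{version}"


# Build the three web navigation IDs listed above.
ids = [web_navigation_env_id(level) for level in range(1, 4)]
print(ids)
# ['WebNavigation-Difficulty-01-v0', 'WebNavigation-Difficulty-02-v0', 'WebNavigation-Difficulty-03-v0']
```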

For example, you can create an instance of the WebNavigation-Difficulty-01-v0 environment as follows:

import gymnasium as gym

# Importing the domain package registers its environments with Gymnasium.
from a2perf.domains import web_navigation  # noqa: F401

env = gym.make("WebNavigation-Difficulty-01-v0", num_websites=10, seed=0)
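Once created, the environment can be driven with the standard Gymnasium loop, in which `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. The sketch below assumes A2Perf environments follow that standard API; `run_episode`, `policy_fn`, and the `max_steps` cap are illustrative names, not part of A2Perf.

```python
from typing import Any, Callable, Optional


def run_episode(
    env: Any,
    policy_fn: Callable[[Any], Any],
    max_steps: int = 1000,
    seed: Optional[int] = None,
) -> float:
    """Roll out one episode in a Gymnasium-style environment and return the summed reward."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy_fn(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        # Stop when the task ends (terminated) or a time limit cuts it off (truncated).
        if terminated or truncated:
            break
    return total_reward


# Example with the environment created above (requires A2Perf to be installed);
# sampling from env.action_space stands in for a trained policy.
# total = run_episode(env, lambda obs: env.action_space.sample(), seed=0)
```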

User Submission

A beginner's guide to benchmarking with A2Perf is available here.

  • Users can clone the template repository at https://github.com/Farama-Foundation/a2perf-benchmark-submission
    • The submission repository must include:
      • train.py - defines a global train function with the following signature:
        def train():
          """Trains the user's model."""
        
      • inference.py - defines the following functions:
        def load_policy(env, **load_kwargs):
          """Loads a trained policy model from the specified directory."""
        def infer_once(policy, observation):
          """Runs a single inference step using the given policy and observation."""
        def preprocess_observation(observation):
          """Preprocesses a raw observation from the environment into a format compatible with the policy."""
        
      • requirements.txt - lists the required Python packages and their versions for running the user's code
      • __init__.py - an empty file that allows the submission to be imported as a Python module
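A minimal `inference.py` satisfying the three required signatures might look like the sketch below. It assumes the environment exposes a Gymnasium-style `action_space` with a `sample()` method and uses a random policy as a stand-in for a trained model; the `RandomPolicy` class is hypothetical, and a real submission would load its trained weights inside `load_policy`.

```python
from typing import Any


class RandomPolicy:
    """Hypothetical stand-in for a trained model: samples a random valid action."""

    def __init__(self, action_space: Any):
        self.action_space = action_space

    def __call__(self, observation: Any) -> Any:
        # A trained policy would map the observation to an action here.
        return self.action_space.sample()


def load_policy(env, **load_kwargs):
    """Loads a trained policy model from the specified directory."""
    # A real submission would restore a checkpoint here (e.g. from a path passed
    # in load_kwargs); this sketch ignores load_kwargs entirely.
    return RandomPolicy(env.action_space)


def preprocess_observation(observation):
    """Preprocesses a raw observation from the environment into a format compatible with the policy."""
    # Identity transform; replace with normalization, flattening, etc. as needed.
    return observation


def infer_once(policy, observation):
    """Runs a single inference step using the given policy and observation."""
    return policy(observation)
```

Assuming the benchmark applies `preprocess_observation` before `infer_once` (the order is not specified above), a single step would be `action = infer_once(policy, preprocess_observation(observation))`.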

Gin Configuration Files

Under a2perf/submission/configs, there are default gin configuration files for training and inference for each domain. These files define various settings and parameters for benchmarking.

Here's an example of a training.gin file for web navigation:

# ----------------------
# IMPORTS
# ----------------------
import a2perf.submission.submission_util

# ----------------------
# SUBMISSION SETUP
# ----------------------
# Set up submission object
Submission.mode = %BenchmarkMode.TRAIN
Submission.domain = %BenchmarkDomain.WEB_NAVIGATION
Submission.run_offline_metrics_only = False
Submission.measure_emissions = True

# ----------------------
# SYSTEM METRICS SETUP
# ----------------------
# Set up codecarbon for system metrics
track_emissions_decorator.project_name = 'a2perf_web_navigation_train'
track_emissions_decorator.measure_power_secs = 5
track_emissions_decorator.save_to_file = True  # Save data to file
track_emissions_decorator.save_to_logger = False  # Do not save data to logger
track_emissions_decorator.gpu_ids = None  # Enter list of specific GPU IDs to track if desired
track_emissions_decorator.log_level = 'info'  # Log level set to 'info'
track_emissions_decorator.country_iso_code = 'USA'
track_emissions_decorator.region = 'Massachusetts'
track_emissions_decorator.offline = True
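Each non-import line in such a file binds one parameter of a configurable (such as `Submission` or `track_emissions_decorator`) to a value, while `%Name` refers to a gin macro or constant. To inspect which values a file sets without loading A2Perf, here is a rough stdlib-only sketch. `read_gin_bindings` is hypothetical and is not a replacement for the gin-config parser: it ignores scopes, multi-line values, `@` references, and `#` characters inside strings.

```python
import ast
from typing import Dict


def read_gin_bindings(text: str) -> Dict[str, Dict[str, object]]:
    """Collect simple `configurable.param = value` bindings from gin-style text (hypothetical helper)."""
    bindings: Dict[str, Dict[str, object]] = {}
    for raw_line in text.splitlines():
        # Drop trailing comments; this naive split would also cut a '#' inside a string.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("import ") or "=" not in line:
            continue
        target, value = (part.strip() for part in line.split("=", 1))
        configurable, _, param = target.rpartition(".")
        try:
            # Python literals: numbers, quoted strings, True/False/None, lists.
            parsed: object = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Anything else (e.g. %BenchmarkMode.TRAIN) is kept as raw text.
            parsed = value
        bindings.setdefault(configurable, {})[param] = parsed
    return bindings


example = """
import a2perf.submission.submission_util
Submission.mode = %BenchmarkMode.TRAIN
track_emissions_decorator.measure_power_secs = 5
track_emissions_decorator.gpu_ids = None  # Enter list of specific GPU IDs
track_emissions_decorator.country_iso_code = 'USA'
"""
bindings = read_gin_bindings(example)
print(bindings["track_emissions_decorator"])
# {'measure_power_secs': 5, 'gpu_ids': None, 'country_iso_code': 'USA'}
```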

Baselines

Baselines for all tasks are provided and are described in the article supporting A2Perf.

Environment Versioning

A2Perf uses strict versioning for reproducibility. Every environment ID ends in a version suffix such as "-v0". Whenever a change to an environment might affect learning results, the version number is incremented by one to avoid confusion. This follows the Gymnasium convention.
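Because the version is simply the trailing "-v<N>" suffix, tooling can split an ID into its base name and version number, for example to find the newest version of a task among a set of IDs. A minimal stdlib sketch follows; `split_env_id` and `latest_version` are hypothetical helpers, and the regular expression is a simplified form of Gymnasium's ID convention that ignores namespaces.

```python
import re
from typing import Iterable, Optional, Tuple

# Simplified "name[-v<version>]" pattern; Gymnasium namespaces ("ns/name") are not handled.
_ENV_ID = re.compile(r"(?P<name>.+?)(?:-v(?P<version>\d+))?")


def split_env_id(env_id: str) -> Tuple[str, Optional[int]]:
    """Split an environment ID into its base name and version number (None if unversioned)."""
    match = _ENV_ID.fullmatch(env_id)
    if match is None:
        raise ValueError(f"malformed environment ID: {env_id!r}")
    version = match.group("version")
    return match.group("name"), int(version) if version is not None else None


def latest_version(env_ids: Iterable[str], name: str) -> Optional[str]:
    """Return the highest-versioned ID for `name`, or None if no versioned ID matches."""
    candidates = []
    for env_id in env_ids:
        base, version = split_env_id(env_id)
        if base == name and version is not None:
            candidates.append((version, env_id))
    return max(candidates)[1] if candidates else None


print(split_env_id("QuadrupedLocomotion-DogPace-v0"))
# ('QuadrupedLocomotion-DogPace', 0)
```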



Download files


Source Distribution

a2perf-0.1.0.tar.gz (53.5 kB)

Uploaded Source

Built Distribution

a2perf-0.1.0-py3-none-any.whl (67.7 kB)

Uploaded Python 3

File details

Details for the file a2perf-0.1.0.tar.gz.

File metadata

  • Download URL: a2perf-0.1.0.tar.gz
  • Upload date:
  • Size: 53.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for a2perf-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ba43c30fb0aa96c8e4455889aa67d42a857c98b6a1f76346f8f8a0905a140563
MD5 0127ce64319671c937c7a8b7701f910b
BLAKE2b-256 8c7619d8587b5e34b717933eb7e1191f571b5f4fd6c12d3589e302d8bee30734


File details

Details for the file a2perf-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: a2perf-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 67.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for a2perf-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 51a77ffce2cd2cb1e8be16f32f437ffd756fcd223498cbe9198799391ffed85f
MD5 a1b108e5883f9c4f4981107c6d540769
BLAKE2b-256 42c007a7294ff1afe5258b55f2da22a7b2dcde61fb9c0bb68f102b3c0761c1d8

