Bipedal Skills RL Benchmark

The Bipedal Skills Benchmark

The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.

Tasks Overview

This benchmark was introduced in Hierarchical Skills for Efficient Exploration.

Usage

In order to run the environments, a working MuJoCo setup (version 2.0 or higher) is required. You can follow the respective installation steps of dm_control for that.

Afterwards, install the Python package with pip:

pip install bipedal-skills

To install the package from a working copy, do:

pip install .

All tasks are exposed and registered as Gym environments once the bisk module is imported:

import gym
import bisk

env = gym.make('BiskHurdles-v1', robot='Walker')
# Alternatively
env = gym.make('BiskHurdlesWalker-v1')
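
The two calls above suggest that a combined environment ID is the task ID with the robot name inserted before the version suffix. The following is a minimal sketch of that naming pattern. The helper `bisk_env_id` is hypothetical and not part of the bisk package, and the pattern is inferred only from the single example shown here, so treat it as an assumption rather than a documented guarantee.

```python
def bisk_env_id(task: str, robot: str, version: int = 1) -> str:
    """Compose a combined ID such as 'BiskHurdlesWalker-v1'.

    Hypothetical helper: the 'Bisk<Task><Robot>-v<N>' pattern is an
    assumption inferred from the example above, not from the bisk API.
    """
    return f"Bisk{task}{robot}-v{version}"


# Reproduces the combined ID used in the example above.
print(bisk_env_id("Hurdles", "Walker"))
```

If the assumption holds, passing the resulting string to `gym.make` would be equivalent to the second call above.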

A detailed description of the tasks can be found in the corresponding publication.

Evaluation Protocol

For evaluating agents, we recommend estimating returns on 50 environment instances with distinct seeds. This can be achieved sequentially or by using one of Gym's vector wrappers:

# Sequential evaluation
env = gym.make('BiskHurdlesWalker-v1')
retrns = []
for i in range(50):
    # Use a distinct seed for every episode
    obs, _ = env.reset(seed=i)
    retrn = 0.0
    while True:
        # Placeholder: replace with the action chosen by your agent
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        retrn += reward
        if terminated or truncated:
            # End of episode: record the accumulated return
            retrns.append(retrn)
            break
print(f'Average return: {sum(retrns)/len(retrns)}')

# Batched evaluation
from gym.vector import SyncVectorEnv
import numpy as np
n = 50
env = SyncVectorEnv([lambda: gym.make('BiskHurdlesWalker-v1')] * n)
retrns = np.zeros(n)
dones = np.zeros(n, dtype=bool)
# An integer seed is expanded so that each sub-environment gets its own seed
obs, _ = env.reset(seed=0)
while not dones.all():
    # Placeholder: replace with the batch of actions chosen by your agent
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # Count rewards only for environments that have not finished an episode yet
    retrns += reward * np.logical_not(dones)
    dones |= (terminated | truncated)
print(f'Average return: {retrns.mean()}')
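
The masking in the batched loop matters because Gym's vector wrappers typically reset a sub-environment automatically once its episode ends, so later rewards belong to a new episode and should not be counted. The sketch below replays that bookkeeping on hand-written toy data in plain Python, without MuJoCo or Gym. The helpers `masked_returns` and `summarize` are hypothetical names used for illustration only. Reporting a standard error next to the mean is an addition of this sketch; the protocol above only asks for the average.

```python
import math
import statistics


def masked_returns(reward_steps, done_steps):
    """Sum each environment's rewards up to and including its first 'done' step."""
    n = len(reward_steps[0])
    returns = [0.0] * n
    dones = [False] * n
    for rewards, step_dones in zip(reward_steps, done_steps):
        for i in range(n):
            if not dones[i]:
                # Same effect as `retrns += reward * np.logical_not(dones)`
                returns[i] += rewards[i]
            # Same effect as `dones |= (terminated | truncated)`
            dones[i] = dones[i] or step_dones[i]
        if all(dones):
            break
    return returns


def summarize(returns):
    """Return the mean and the standard error of the mean."""
    mean = statistics.fmean(returns)
    if len(returns) < 2:
        return mean, 0.0
    return mean, statistics.stdev(returns) / math.sqrt(len(returns))


# Toy data: 2 environments, 3 steps. Environment 1 finishes at step 2,
# so its step-3 reward (5.0) belongs to a fresh episode and is ignored.
rewards = [[1.0, 1.0], [1.0, 1.0], [1.0, 5.0]]
dones = [[False, False], [False, True], [True, False]]

returns = masked_returns(rewards, dones)
mean, stderr = summarize(returns)
print(returns, mean, stderr)
```

The same `summarize` call can be applied to the `retrns` list or array produced by either evaluation loop above.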

License

The bipedal skills benchmark is MIT licensed, as found in the LICENSE file.

Model definitions have been adapted from:
