Ball & Beam Gym

Ball & beam simulation as OpenAI gym environments.


Installation

Run the following command:

pip install ballbeam-gym

or clone the repository and run the following inside the folder:

pip install -e .

System Dynamics

Simulated as a frictionless second-order system that takes the beam angle as input. The equations that describe the system are:

dx/dt = v(t)
dv/dt = -m*g*sin(theta(t))/((I + 1)*m)

where x is the ball position on the beam, v the ball velocity, theta the beam angle, g gravity and I the ball's moment of inertia normalized by m*r^2, so the acceleration reduces to -g*sin(theta)/(I + 1).
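
A minimal sketch of stepping these equations forward with explicit Euler integration (an illustration only, not the library's internal code; treating I as the normalized inertia, 2/5 for a solid ball, is an assumption):

import numpy as np

# one explicit Euler step of the ball & beam dynamics
# x: ball position, v: ball velocity, theta: beam angle (radians)
# I: ball moment of inertia normalized by m*r^2 (2/5 for a solid sphere, assumed)
def step_dynamics(x, v, theta, dt, g=9.81, I=2/5):
    dv = -g * np.sin(theta) / (1 + I)  # dv/dt = -m*g*sin(theta(t))/((I + 1)*m)
    return x + v * dt, v + dv * dt

# ball starts at rest on a beam tilted by 0.1 rad
x, v = 0.0, 0.0
for _ in range(10):
    x, v = step_dynamics(x, v, theta=0.1, dt=0.05)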

[visualization: animated rendering of the ball & beam environment]


Environments

  • BallBeamBalanceEnv - The objective is to keep the ball from falling off the beam.
  • BallBeamSetpointEnv - The objective is to keep the ball as close as possible to a set position on the beam.

BallBeamBalanceEnv

The ball is given a random initial velocity and it is the agent's job to stabilize it on the beam.

Parameters

  • timestep - Length of one simulation timestep.
  • beam_length - Length of the beam.
  • max_angle - Maximum/minimum angle of the beam.

Observation Space

  • Beam angle
  • Ball position on beam
  • Ball velocity

Action Space

  • Beam angle

Rewards

A reward of +1 is given for each timestep the ball stays on the beam.

Reset

Resets when the ball falls off the beam.
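
A minimal interaction loop for this environment might look like the following (a sketch assuming the keyword names match the parameters listed above; the agent just samples random beam angles):

import gym
import ballbeam_gym

kwargs = {'timestep': 0.05,
          'beam_length': 1.0,
          'max_angle': 0.2}

env = gym.make('BallBeamBalance-v0', **kwargs)

obs = env.reset()
total_reward = 0
for _ in range(1000):
    # random beam angle; each surviving timestep collects the +1 reward
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    if done:
        break

print('ball stayed on the beam for', total_reward, 'timesteps')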


BallBeamSetpointEnv

The agent's job is to keep the ball's position as close as possible to a setpoint.

Parameters

  • timestep - Length of one simulation timestep.
  • setpoint - Ball setpoint position on the beam (None for a random setpoint).
  • beam_length - Length of the beam.
  • max_angle - Maximum/minimum angle of the beam.

Observation Space

  • Beam angle
  • Ball position
  • Ball velocity
  • Setpoint position

Action Space

  • Beam angle

Rewards

At each timestep the agent is rewarded with the squared proximity between the ball and the setpoint:

reward = (1 - |setpoint - ball_position|/beam_length)^2

The reward is 1 when the ball sits exactly on the setpoint and falls towards 0 as the distance approaches a full beam length.
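
Written out as code (a sketch of the formula above, not necessarily the library's exact implementation):

# squared proximity: 1.0 with the ball exactly on the setpoint,
# approaching 0.0 one full beam length away
def setpoint_reward(ball_position, setpoint, beam_length):
    return (1.0 - abs(setpoint - ball_position) / beam_length) ** 2

setpoint_reward(0.4, 0.4, 1.0)  # -> 1.0
setpoint_reward(0.9, 0.4, 1.0)  # -> 0.25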

Reset

Resets when the ball falls off the beam.


API

The environments share the same API and inherit from OpenAI Gym's Env class.

  • step(action) - Simulate one timestep.
  • reset() - Reset environment to start conditions.
  • render() - Visualize one timestep.
  • seed(seed) - Make environment deterministic.
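
Put together, the calls compose like any other OpenAI gym environment. A minimal sketch using random actions:

import gym
import ballbeam_gym

env = gym.make('BallBeamSetpoint-v0', timestep=0.05, setpoint=0.4,
               beam_length=1.0, max_angle=0.2)

env.seed(42)       # make the episode reproducible
obs = env.reset()  # start conditions

for _ in range(100):
    # random actions just to exercise the API
    obs, reward, done, info = env.step(env.action_space.sample())
    env.render()
    if done:
        obs = env.reset()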

Example: PID Controller

import gym
import ballbeam_gym

# pass env arguments as kwargs
kwargs = {'timestep': 0.05,
          'setpoint': 0.4,
          'beam_length': 1.0,
          'max_angle': 0.2}

# create env
env = gym.make('BallBeamSetpoint-v0', **kwargs)

# gains for the controller (P and D terms only, i.e. a PD controller)
Kp = 2.0
Kd = 1.0

# simulate 1000 steps
for i in range(1000):
    # control theta with the PD controller
    theta = Kp*(env.bb.x - env.setpoint) + Kd*(env.bb.v)
    obs, reward, done, info = env.step(theta)
    env.render()
    if done:
        env.reset()

Example: Reinforcement Learning

import gym
import ballbeam_gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2

# pass env arguments as kwargs
kwargs = {'timestep': 0.05,
          'setpoint': 0.4,
          'beam_length': 1.0,
          'max_angle': 0.2}

# create env
#env = gym.make('BallBeamBalance-v0', **kwargs)
env = gym.make('BallBeamSetpoint-v0', **kwargs)

# train an MLP policy agent
env = DummyVecEnv([lambda: env])
model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=20000)

obs = env.reset()
env.render()

# test agent on 1000 steps
for i in range(1000):
    action, _ = model.predict(obs)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
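
After training, the model can be saved and reloaded with stable-baselines (the file name here is only an example):

# persist the trained policy to disk
model.save('ppo2_ballbeam')

# ...and restore it later without retraining
model = PPO2.load('ppo2_ballbeam', env=env)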
