Industrial Benchmark for OpenAI Gym
gym-industrial is a standalone Python re-implementation of the Industrial Benchmark (IB) for OpenAI Gym.
D. Hein et al., 2017
A benchmark environment motivated by industrial control problems.
In IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8).
pip install gym-industrial
To register the environments in Gym, simply import the package at any point before calling gym.make.
import gym
import gym_industrial
env = gym.make(<environment id>, **kwargs)
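A random rollout is a quick way to check that an environment is registered and steps correctly. The sketch below assumes only the classic Gym interface (`reset`, `step`, `action_space.sample`). `random_rollout` is a hypothetical helper, not part of gym-industrial, and it tolerates both the 4-tuple and the 5-tuple `step` return conventions.

```python
def random_rollout(env, max_steps=100):
    """Step `env` with uniformly sampled actions and return the summed reward."""
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        result = env.step(env.action_space.sample())
        total_reward += float(result[1])
        # Older Gym returns (obs, reward, done, info); newer versions return
        # (obs, reward, terminated, truncated, info). In both cases the
        # episode-end flags sit between the reward and the trailing info dict.
        if any(bool(flag) for flag in result[2:-1]):
            break
    return total_reward
```

With the package installed, something like `random_rollout(gym.make("IndustrialBenchmark-v0"))` should exercise the main environment. The `max_steps` cap is there because the helper makes no assumption about whether an episode ever terminates.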
The main environment is registered in Gym as IndustrialBenchmark-v0. The IB's sub-dynamics have also been implemented as Gym environments. Each contributes different challenges to the overall task.
|Industrial Benchmark||IndustrialBenchmark-v0||All of the following|
|Operational Cost||IBOperationalCost-v0||Delayed, blurred, nonlinear rewards|
|Mis-calibration||IBMisCalibration-v0||Partial observability, non-stationary dynamics|
|Fatigue||IBFatigue-v0||Heteroscedastic noise, self-amplifying processes|
Dynamics as Stochastic Computation Graphs
The following are views of the Industrial Benchmark sub-dynamics, plus the reward function, as stochastic computation graphs (SCG).
The graphical notation used and the SCG definition are taken from Gradient Estimation Using Stochastic Computation Graphs.
Definition 1 (Stochastic Computation Graph). A directed, acyclic graph, with three types of nodes:
- Input nodes, which are set externally, including the parameters we differentiate with respect to.
- Deterministic nodes, which are functions of their parents.
- Stochastic nodes, which are distributed conditionally on their parents.
Each parent v of a non-input node w is connected to it by a directed edge (v, w).
Squares denote deterministic nodes and circles, stochastic nodes. A special type of deterministic node, denoted by diamonds, indicates that a variable is a cost/reward and thus not part of the observation/state.
Node labels use the notation from the Industrial Benchmark paper and correspond to the variables in the equations therein.
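To make Definition 1 concrete, here is a minimal sketch of a stochastic computation graph with the three node types, evaluated parents-first. The `Node` class, the `evaluate` function, and the toy graph are all hypothetical illustrations and are not taken from gym-industrial or the cited papers. It relies on `graphlib`, which requires Python 3.9 or later.

```python
import random
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    kind: str                      # "input", "deterministic", or "stochastic"
    parents: Tuple[str, ...] = ()  # names of the parent nodes
    fn: Optional[Callable] = None  # deterministic: fn(*parents); stochastic: fn(rng, *parents)


def evaluate(graph: Dict[str, Node], inputs: Dict[str, float],
             rng: random.Random) -> Dict[str, float]:
    """Evaluate every node once, visiting parents before children."""
    order = TopologicalSorter({name: node.parents for name, node in graph.items()})
    values = {}
    for name in order.static_order():  # raises graphlib.CycleError on a cycle
        node = graph[name]
        parent_values = [values[p] for p in node.parents]
        if node.kind == "input":
            values[name] = inputs[name]               # set externally
        elif node.kind == "deterministic":
            values[name] = node.fn(*parent_values)    # function of its parents
        else:
            values[name] = node.fn(rng, *parent_values)  # sampled given its parents
    return values


# Toy graph: x (input) -> y ~ Normal(x, 1) (stochastic) -> cost = y**2 (deterministic)
toy_graph = {
    "x": Node("input"),
    "y": Node("stochastic", ("x",), lambda rng, x: rng.gauss(x, 1.0)),
    "cost": Node("deterministic", ("y",), lambda y: y ** 2),
}
values = evaluate(toy_graph, {"x": 0.5}, random.Random(0))
```

The topological sort enforces the "directed, acyclic" part of the definition: a graph with a cycle cannot be ordered and is rejected before any node is evaluated.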
The sub-dynamics of operational cost are influenced by the external driver setpoint p and two of the three steerings, velocity v and gain g.
The observation of operational cost is delayed and blurred by a convolution of past operational costs. In the graph below, a single node holds a vector of the past 10 values of the hidden operational cost.
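The delay-and-blur effect can be sketched as a weighted sum over a fixed-length history of hidden costs. The class name and the kernel weights below are assumptions chosen for illustration (zero weight on recent steps gives the delay, a triangular bump over older steps gives the blur). Consult the paper or the implementation for the actual coefficients.

```python
from collections import deque

HISTORY_LENGTH = 10  # the past 10 hidden values, as described above

# Hypothetical kernel, index 0 = newest value. The zeros delay the signal,
# the triangular weights blur it. The weights sum to 1.
KERNEL = [0.0, 0.0, 0.0, 0.0, 0.0, 1 / 9, 2 / 9, 3 / 9, 2 / 9, 1 / 9]


class DelayedBlurredCost:
    """Turns a stream of hidden costs into delayed, blurred observations."""

    def __init__(self, initial_cost=0.0):
        self.history = deque([initial_cost] * HISTORY_LENGTH, maxlen=HISTORY_LENGTH)

    def observe(self, hidden_cost):
        """Push the newest hidden cost and return the convolved observation."""
        self.history.appendleft(hidden_cost)  # the oldest value falls off the right
        return sum(w * c for w, c in zip(KERNEL, self.history))
```

Feeding a single spike followed by zeros makes the effect visible: the observation stays at zero for several steps, then rises and falls smoothly instead of reproducing the spike.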
The motivation for this dynamical behavior is that it is non-linear, it depends on more than one influence, and it is delayed and blurred. All those effects have been observed in industrial applications, like the heating process observable during combustion.
The sub-dynamics of mis-calibration are influenced by external driver setpoint p and steering shift h. The goal is to reward an agent for oscillating in h at a pre-defined frequency around a specific operation point determined by setpoint p. The reward topology is inspired by an example from quantum physics, namely Goldstone's "Mexican hat" potential.
The Goldstone potential-inspired reward is denoted below by a single node for ease of presentation. Details of the function can be found in the implementation or in Appendix B of the paper.
Below is a visual description, taken from the paper, of the penalty landscape and oscillating dynamics.
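For intuition about the shape that inspired this reward, the sketch below evaluates the textbook one-dimensional "Mexican hat" potential V(phi) = -mu^2 phi^2 + lambda phi^4. This is not the IB's penalty function, which is defined in Appendix B of the paper. The function names and default parameters are illustrative assumptions.

```python
import math


def mexican_hat(phi, mu=1.0, lam=1.0):
    """Textbook potential V(phi) = -mu^2 * phi^2 + lam * phi^4."""
    return -(mu ** 2) * phi ** 2 + lam * phi ** 4


def minima(mu=1.0, lam=1.0):
    """Locations of the two symmetric minima, phi = -/+ mu / sqrt(2 * lam)."""
    phi_min = mu / math.sqrt(2.0 * lam)
    return -phi_min, phi_min


left, right = minima()
center_value = mexican_hat(0.0)     # local maximum at the center
valley_value = mexican_hat(right)   # equals -mu^4 / (4 * lam)
```

The center is a local maximum and the lowest values lie in a ring (here, two points) away from it, which is the qualitative feature the reward topology borrows.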
The sub-dynamics of fatigue are influenced by the same variables as the sub-dynamics of operational cost, i.e., setpoint p, velocity v, and gain g. The IB is designed in such a way that changing the steerings velocity v and gain g so as to reduce the operational cost increases fatigue. This leads to the desired multi-criterial task, with two reward components showing opposite dependencies on the actions.
The following SCG highlights the complex stochasticity of the Fatigue dynamics. The random variables don't have dedicated equations in the paper, but are sampled using the exponential distribution and the logistic function.
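As a rough illustration of how an exponential sample and the logistic function can be combined into bounded noise, consider the sketch below. The function names, the rate parameter, and the rescaling are assumptions made for this example and should not be read as the values used by the IB.

```python
import math
import random


def logistic(x):
    """Standard logistic function, mapping the reals into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_bounded_noise(rng, rate=10.0):
    """Draw x ~ Exp(rate) and squash it into [0, 1) with a rescaled logistic."""
    x = rng.expovariate(rate)          # x >= 0, with mean 1 / rate
    return 2.0 * (logistic(x) - 0.5)   # logistic(0) = 0.5, so x = 0 maps to 0


rng = random.Random(0)
noise_samples = [sample_bounded_noise(rng) for _ in range(1000)]
```

Because the exponential distribution is skewed toward zero, most samples are small, with occasional larger values that the logistic keeps below 1.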
In the real-world tasks that motivated the IB, the reward function has always been known explicitly. In some cases the reward function itself was subject to optimization and had to be adjusted to properly express the optimization goal. For the IB we therefore assume that the reward function is known and all variables influencing it are observable.
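A known reward over observable quantities can be sketched as a negative weighted sum of the two cost components discussed above. The function name and the weights are placeholders for illustration. The paper and the implementation define the actual reward.

```python
def combined_reward(operational_cost, fatigue, cost_weight=1.0, fatigue_weight=3.0):
    """Negative weighted sum of two observable cost components (weights are placeholders)."""
    return -(cost_weight * operational_cost + fatigue_weight * fatigue)


# Two hypothetical operating points: lowering one component raises the other.
low_cost_high_fatigue = combined_reward(operational_cost=10.0, fatigue=8.0)
high_cost_low_fatigue = combined_reward(operational_cost=20.0, fatigue=2.0)
```

Since the two components respond in opposite directions to the steerings, the better operating point depends on the weights, which is what makes the task multi-criterial.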