Gym environments that allow for coarse but fast testing of AI agents.

gym-quickcheck

Many bugs and implementation errors can already be spotted by running an agent in relatively simple environments. This gym extension provides environments that run fast even on low-spec VMs and can therefore be used in Continuous Integration tests. The project aims to improve the code quality and stability of Reinforcement Learning algorithms by providing additional means for automated testing.

Installation

You can install the package using pip:

pip install gym-quickcheck

Quick Start

Random Walk

A random agent navigating the random walk environment, rendering a textual representation to the standard output:

import gym

env = gym.make('gym_quickcheck:random-walk-v0')
done = False
observation = env.reset()
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
    env.render()
    print(f"Observation: {observation}, Reward: {reward}")

Running the example should produce an output similar to this:

...
(Left)
#######
Observation: [0. 0. 0. 0. 0. 1. 0.], Reward: -1
(Right)
#######
Observation: [0. 0. 0. 0. 0. 0. 1.], Reward: 1

Alternation

A random agent navigating the alternation environment, rendering a textual representation to the standard output:

import gym

env = gym.make('gym_quickcheck:alternation-v0')
done = False
observation = env.reset()
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
    env.render()
    print(f"Observation: {observation}, Reward: {reward}")

Running the example should produce an output similar to this:

...
(Right)
##
Observation: [0 1], Reward: -0.9959229664071392
(Left)
##
Observation: [1 0], Reward: 0.8693727604523271

N-Knob

A random agent guessing random values for the correct knob settings, rendering a textual representation to the standard output:

import gym

env = gym.make('gym_quickcheck:n-knob-v0')
done = False
observation = env.reset()
while not done:
    observation, reward, done, info = env.step(env.action_space.sample())
    env.render()
    print(f"Observation: {observation}, Reward: {reward}")

Running the example should produce an output similar to this:

...
Observation: [-1. -1. -1. -1. -1. -1. -1.], Reward: -1
(0.315/-0.791) (0.111/0.905) (-0.198/0.278) (-0.008/-0.918) (-0.848/0.477) (-0.447/0.510) (0.642/0.665)
Observation: [ 1. -1. -1.  1. -1. -1. -1.], Reward: -1
(0.315/0.648) (0.111/-0.968) (-0.198/0.666) (-0.008/0.404) (-0.848/0.652) (-0.447/-0.453) (0.642/-0.497)
Observation: [-1.  1. -1. -1. -1.  0.  1.], Reward: -1

Random Walk

This random walk environment is similar to the one described in Reinforcement Learning: An Introduction by Sutton and Barto. It differs in two ways: episodes have a maximum length instead of terminating at both ends, and every step is penalized except the one that reaches the goal.

(Figure: random walk graph)

The agent receives a reward of 1 when it reaches the goal, which is the rightmost cell, and -1 on reaching any other cell. The episode terminates either when the agent reaches the goal or after a maximum number of steps. First, this puts an upper bound on the number of steps an episode takes to complete, which makes testing faster. Second, because the total negative reward of an episode is bounded below and that bound is reached quickly, reasonable baseline estimates should improve learning significantly. Since baselines have such a noticeable effect, this environment is well suited for testing algorithms that make use of baseline estimates.
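The reward and termination rules above can be sketched as a small stand-alone environment. This is a hypothetical re-implementation for illustration, not the gym-quickcheck source: the class name `RandomWalkSketch`, the middle start cell, the `max_steps=20` cap, the action encoding (0 = left, 1 = right) and the behaviour at the left wall are all assumptions. Only the 7-cell one-hot observation and the +1/-1 rewards come from the description and sample output above. It uses the same `(observation, reward, done, info)` step signature as the Quick Start examples.

```python
import random


class RandomWalkSketch:
    """Hypothetical stand-in for the random walk environment (not the library's code).

    Assumed: 7 cells, start in the middle cell, action 0 = left, 1 = right,
    moving left from the leftmost cell keeps the agent in place, max_steps = 20.
    """

    def __init__(self, n_cells=7, max_steps=20):
        self.n_cells = n_cells
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.position = self.n_cells // 2
        self.steps = 0
        return self._observation()

    def _observation(self):
        # One-hot encoding of the current cell, as in the sample output.
        obs = [0.0] * self.n_cells
        obs[self.position] = 1.0
        return obs

    def step(self, action):
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.n_cells - 1)
        self.steps += 1
        at_goal = self.position == self.n_cells - 1
        reward = 1 if at_goal else -1  # +1 only in the rightmost cell
        # The episode ends at the goal or when the step budget runs out.
        done = at_goal or self.steps >= self.max_steps
        return self._observation(), reward, done, {}


def episode_return(env, policy):
    """Run one episode with `policy` (a zero-argument callable) and sum its rewards."""
    env.reset()
    total, done = 0, False
    while not done:
        _, reward, done, _ = env.step(policy())
        total += reward
    return total


env = RandomWalkSketch()
print(episode_return(env, lambda: 1))                     # always right: -1 (two -1 steps, then +1)
print(episode_return(env, lambda: random.randint(0, 1)))  # random agent: somewhere in [-20, -1]
```

Because every non-goal step costs -1, the return of any episode under these assumptions lies between -max_steps and the optimal -1, which is the tightly bounded range that lets a baseline estimate make such a noticeable difference.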

Alternation

The alternation environment is straightforward: the agent only has to alternate between its two possible states to achieve the maximum reward.

(Figure: alternation graph)

The agent receives a normally distributed reward with a mean of 1 when switching from one state to the other, and a normally distributed penalty with a mean of -1 when staying in its current state. The environment terminates after a fixed number of steps. The rewards scale linearly with performance: each additional switch earns the agent exactly one more reward. This makes it less likely for agents to get stuck in local optima, so most agents should learn the optimal policy quickly. A random agent, however, only achieves a total reward of around zero on average. This makes the environment well suited for sanity checks that confirm an algorithm learns at all, and its simple setup makes any obvious problems with an algorithm easier to understand.
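A minimal sketch of this reward scheme shows why an always-alternating policy and a random policy end up so far apart. It is a hypothetical re-implementation, not the gym-quickcheck source: the class name `AlternationSketch`, the episode length of 100, the noise standard deviation of 0.1, the start state and the action encoding (the action selects the next state, 0 = left, 1 = right) are assumptions. The one-hot observations `[1, 0]` / `[0, 1]` follow the sample output above.

```python
import random


class AlternationSketch:
    """Hypothetical stand-in for the alternation environment (not the library's code).

    Assumed: the action selects the next state (0 = left, 1 = right), the agent
    starts in state 0, episodes last 100 steps, and the reward noise has a
    standard deviation of 0.1.
    """

    def __init__(self, episode_length=100, noise_std=0.1, seed=None):
        self.episode_length = episode_length
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.state = 0
        self.steps = 0
        return self._observation()

    def _observation(self):
        # One-hot encoding of the current state, as in the sample output.
        return [1, 0] if self.state == 0 else [0, 1]

    def step(self, action):
        switched = action != self.state
        self.state = action
        self.steps += 1
        # Mean +1 for switching, mean -1 for staying, plus Gaussian noise.
        reward = self.rng.gauss(1.0 if switched else -1.0, self.noise_std)
        done = self.steps >= self.episode_length
        return self._observation(), reward, done, {}


def episode_return(env, policy):
    """Run one episode with `policy(observation) -> action` and sum its rewards."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


def alternate(obs):
    # Always move to the state the agent is not currently in.
    return 1 if obs == [1, 0] else 0


policy_rng = random.Random(1)


def random_policy(obs):
    return policy_rng.randint(0, 1)


env = AlternationSketch(seed=0)
print(episode_return(env, alternate))  # expected value +100: 100 switches with mean reward 1
mean_random = sum(episode_return(env, random_policy) for _ in range(200)) / 200
print(mean_random)                     # expected value 0: switching and staying are equally likely
```

Under these assumptions each step contributes independently, so the expected return is simply the number of switches minus the number of stays, which is the linear relationship between behaviour and reward described above.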

N-Knob

The knob environment initially chooses random floating-point values as the correct "knob" settings, and the agent's goal is to recover them. To make this possible, the environment returns hints about the direction of each correct value as observations. For instance, if the correct knob value is -0.3 and the agent sets the action value to 0.5, the observation is -1, indicating that the value is too high. The environment is easy to solve efficiently; however, a purely random agent cannot solve it within the time limit of 200 steps. This makes it a good environment for checking the learning behaviour of algorithms.
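The hint mechanism, and why it separates a directed search from random guessing, can be sketched without the library. Everything here is hypothetical: `knob_hints`, `bisection_agent` and `random_agent` are illustrative names, and the `[-1, 1]` action range and 7 knobs are read off the sample output rather than the source. The `0` hint and its tolerance of 0.01 are also assumptions, inferred from the sample output where the knob rendered as `(-0.447/-0.453)` is followed by an observation of 0; the render appears to show `(target/action)` pairs.

```python
import random


def knob_hints(targets, actions, tolerance=0.01):
    """Direction hint per knob: +1 if the action is too low, -1 if too high,
    0 if within `tolerance` of the target (the 0 case and tolerance are assumed).
    """
    hints = []
    for target, action in zip(targets, actions):
        if abs(target - action) <= tolerance:
            hints.append(0)
        elif action < target:
            hints.append(1)
        else:
            hints.append(-1)
    return hints


def bisection_agent(targets, max_steps=200, tolerance=0.01):
    """Bisect every knob's range [-1, 1] in parallel.

    Returns the step at which all hints are 0, or None if max_steps runs out.
    """
    low = [-1.0] * len(targets)
    high = [1.0] * len(targets)
    for step in range(1, max_steps + 1):
        actions = [(lo + hi) / 2 for lo, hi in zip(low, high)]
        hints = knob_hints(targets, actions, tolerance)
        if all(h == 0 for h in hints):
            return step
        for i, hint in enumerate(hints):
            if hint == 1:      # too low: keep searching the upper half
                low[i] = actions[i]
            elif hint == -1:   # too high: keep searching the lower half
                high[i] = actions[i]
    return None


def random_agent(targets, max_steps=200, tolerance=0.01, rng=random):
    """Guess uniformly at random each step; returns the solving step or None."""
    for step in range(1, max_steps + 1):
        actions = [rng.uniform(-1.0, 1.0) for _ in targets]
        if all(h == 0 for h in knob_hints(targets, actions, tolerance)):
            return step
    return None


rng = random.Random(0)
targets = [rng.uniform(-1.0, 1.0) for _ in range(7)]
print(knob_hints([-0.3], [0.5]))       # [-1]: the action is too high
print(bisection_agent(targets))        # at most 8 steps under these assumptions
print(random_agent(targets, rng=rng))  # almost surely None
```

Bisection halves each knob's interval in parallel, so under these assumptions it finishes within 8 steps (1/2^7 < 0.01), whereas a uniform random guess puts all 7 knobs within tolerance with a probability of at most about 0.01^7 per step.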

Release history

1.2.1 (Feb 21, 2020)
1.2.0 (Feb 18, 2020)
1.1.0 (Oct 27, 2019)
1.0.2 (Oct 6, 2019)
1.0.1 (Oct 6, 2019)
1.0.0 (Oct 6, 2019)

