

Project description

skinner

Skinner, a new reinforcement learning framework in Python

It is built for RL beginners.

It is still under development: the APIs are not yet well designed, but it runs stably. For grid worlds, it is already mature enough.

Enjoy skinner!

Requirements

  • gym
  • numpy

Download

Download it from GitHub, or install it from PyPI with pip: pip install skinner

Design

We adopt the observer design pattern: the env and the agents in it generally observe each other. The agents observe the env to decide how to act and to get their rewards. The env observes the agents and other objects to render the viewer and record information.
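As a rough illustration of this two-way observation, here is a minimal observer-pattern sketch in plain Python. The class and method names (Observable, attach, notify, update, reward_for, move_to) are hypothetical and are not skinner's API. In this sketch the agent pulls its reward from the env, and the env has updates pushed to it by the agent so it can record them.

```python
# Minimal observer-pattern sketch. Observable, attach, notify, update,
# reward_for and move_to are hypothetical names, not skinner's API.


class Observable:
    """An object that notifies its observers when something happens."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        # Push the event to every attached observer.
        for observer in self._observers:
            observer.update(self, event)


class Env:
    """The env observes its agents to record what they do."""

    GOAL = (2, 2)

    def __init__(self):
        self.log = []

    def update(self, source, event):
        # Called by an agent's notify(); a real env might also re-render here.
        self.log.append((source.name, event))

    def reward_for(self, position):
        # Queried by agents: +1 at the goal cell, 0 elsewhere.
        return 1 if position == self.GOAL else 0


class Agent(Observable):
    def __init__(self, name, env):
        super().__init__()
        self.name = name
        self.env = env
        self.position = (1, 1)
        self.total_reward = 0
        self.attach(env)  # the env watches this agent

    def move_to(self, position):
        self.position = position
        # The agent observes the env to obtain its reward ...
        self.total_reward += self.env.reward_for(position)
        # ... and the env is notified so it can record the move.
        self.notify(("moved", position))


env = Env()
robot = Agent("robot", env)
robot.move_to((1, 2))
robot.move_to((2, 2))
print(robot.total_reward)  # 1
print(env.log)  # [('robot', ('moved', (1, 2))), ('robot', ('moved', (2, 2)))]
```

Keeping the env as a plain observer means agents do not need to know how it renders or logs. They only call notify().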

Features

Very easy to use.

Use

Quick start

Run demo.py in the examples folder.

Other examples: demo1.py, demo2.py.

Define envs

If you just want to build a simple env, a grid world like the following is one option.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Demo of RL

An env with some traps and a piece of gold.
"""

from skinner import *
from gym.envs.classic_control import rendering

from objects import *

class MyGridWorld(GridMaze, SingleAgentEnv):
    """Grid world
    
    A robot plays the grid world and tries to find the gold (yellow circle),
    while avoiding the traps (black circles).
    Extends:
        GridMaze: grid world with walls
        SingleAgentEnv: there is only one agent
    """
    
    # configure the env
    
    # get the positions of the objects (done automatically)
    CHARGER = ...
    TRAPS = ...
    DEATHTRAPS = ...
    GOLD = ...

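    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # conf, traps, deathtraps, charger and gold are expected to come from
        # the configuration (see conf.yaml below); loading is not shown here
        self.add_walls(conf['walls'])
        self.add_objects((*traps, *deathtraps, charger, gold))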

    # An episode ends when the agent falls into a deathtrap,
    # reaches the gold, or runs out of power.
    def is_terminal(self):
        return (self.agent.position in self.DEATHTRAPS
                or self.agent.position == self.GOLD
                or self.agent.power <= 0)

    def is_successful(self):
        return self.agent.position == self.GOLD

    # The following methods are optional; they only record the training history.
    def post_process(self):
        if self.is_successful():
            self.history['n_steps'].append(self.agent.n_steps)
        else:
            self.history['n_steps'].append(self.max_steps)
        self.history['reward'].append(self.agent.total_reward)
        self.agent.post_process()

    def pre_process(self):
        self.history['n_steps'] = []
        self.history['reward'] = []

    def end_process(self):
        import pandas as pd
        data = pd.DataFrame(self.history)
        data.to_csv('history.csv')
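The three recording hooks suggest a lifecycle. pre_process runs once before training, post_process runs after every episode, and end_process runs once at the end. The sketch below shows one plausible order in which a training loop could call them. The ToyEnv class, run_episode, and train are hypothetical stand-ins, and skinner's real loop may differ. To stay dependency-free, it writes the CSV with the standard csv module instead of pandas.

```python
# Hypothetical driver showing one plausible order for the recording hooks;
# ToyEnv, run_episode and train are illustrative, not skinner's API.
import csv
import io
import random


class ToyEnv:
    max_steps = 10

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.history = {}

    def pre_process(self):
        # once, before the first episode
        self.history['n_steps'] = []
        self.history['reward'] = []

    def run_episode(self):
        # stand-in for a real episode: random length and outcome
        n_steps = self.rng.randint(1, self.max_steps)
        success = n_steps < self.max_steps
        reward = 1.0 if success else -1.0
        return n_steps, success, reward

    def post_process(self, n_steps, success, reward):
        # after each episode, mirroring MyGridWorld.post_process above
        self.history['n_steps'].append(n_steps if success else self.max_steps)
        self.history['reward'].append(reward)

    def end_process(self):
        # once, after training: dump the history as CSV text
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(['n_steps', 'reward'])
        writer.writerows(zip(self.history['n_steps'], self.history['reward']))
        return buf.getvalue()


def train(env, n_episodes):
    env.pre_process()
    for _ in range(n_episodes):
        env.post_process(*env.run_episode())
    return env.end_process()


report = train(ToyEnv(), n_episodes=5)
print(report)
```

The output has one header row followed by one row per episode. Failed episodes are recorded with max_steps, just as in post_process above.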

Configure env and its objects

See conf.yaml for an example. The object classes are defined in objects.py.

# Grid Maze: 
# n_cols * n_rows: size of the maze, the number of squares
# edge: the length of the edge of each square
# walls: the positions of walls as the components of the environment


## number of grid squares (columns and rows)
n_cols: 7
n_rows: 7
## edge length of each square
edge: 80


## positions of walls
walls: !!set
  {
  !!python/tuple [2, 6],
  !!python/tuple [3, 6],
  ...
  !!python/tuple [4, 2]}


## objects in environment (excluding the agent)
## traps, not terminal
traps: !!python/object:objects.ObjectGroup
  name: 'traps'
  members:
    - !!python/object:objects.Trap
      position: !!python/tuple [3, 5]
      color: [1,0.5,0]
      size: 30
    - !!python/object:objects.Trap
      position: !!python/tuple [1, 3]
      color: [1,0.5,0]
      size: 30
    - !!python/object:objects.Trap
      position: !!python/tuple [7, 1]
      color: [1,0.5,0]
      size: 30

## deathtraps, terminal
deathtraps: !!python/object:objects.ObjectGroup
  name: 'deathtraps'
  members:
    - !!python/object:objects.DeathTrap
      position: !!python/tuple [6, 5]
      color: [.8,0,0.5]
      size: 35

    - !!python/object:objects.DeathTrap
      position: !!python/tuple [2, 1]
      color: [.8,0,0.5]
      size: 35

## gold, terminal
gold: !!python/object:objects.Gold
  name: 'gold'
  position: !!python/tuple [7, 7]
  color: [1,0.8,0]
  size: 30
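It can help to sanity-check a configuration like this before training. The sketch below writes the example config as the Python data it would roughly parse to, and checks three things: every position lies on the grid, no object sits on a wall, and no two objects share a square. It assumes 1-based (col, row) positions, inferred from the gold at [7, 7] on a 7 × 7 grid. Only the walls spelled out above are included, and the elided ones are left out. The check_config function is hypothetical and not part of skinner.

```python
# Sanity check for a grid-world config like conf.yaml, written as plain
# Python data. check_config is hypothetical; assumes 1-based (col, row).
config = {
    'n_cols': 7,
    'n_rows': 7,
    'edge': 80,
    # only the walls spelled out in the example; the elided ones are omitted
    'walls': {(2, 6), (3, 6), (4, 2)},
    'traps': [(3, 5), (1, 3), (7, 1)],
    'deathtraps': [(6, 5), (2, 1)],
    'gold': (7, 7),
}


def check_config(conf):
    """Return a list of problems found (empty if the config looks fine)."""
    problems = []

    def in_bounds(pos):
        col, row = pos
        return 1 <= col <= conf['n_cols'] and 1 <= row <= conf['n_rows']

    for wall in conf['walls']:
        if not in_bounds(wall):
            problems.append(f'wall {wall} is outside the grid')

    objects = ([('trap', p) for p in conf['traps']]
               + [('deathtrap', p) for p in conf['deathtraps']]
               + [('gold', conf['gold'])])
    seen = {}  # position -> kind of the first object placed there
    for kind, pos in objects:
        if not in_bounds(pos):
            problems.append(f'{kind} at {pos} is outside the grid')
        if pos in conf['walls']:
            problems.append(f'{kind} at {pos} sits on a wall')
        if pos in seen:
            problems.append(f'{kind} at {pos} overlaps a {seen[pos]}')
        seen.setdefault(pos, kind)
    return problems


print(check_config(config))  # []
```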

Define objects

For each object, you can define:

  1. the shape of the object (a circle by default)
  2. the method used to plot it (no need to override it if the shape is simple)
from skinner import *
from gym.envs.classic_control import rendering

class _Object(Object):
    props = ('name', 'position', 'color', 'size')
    default_position = (0, 0)  # default value, so you can omit it when creating an object

    @property
    def coordinate(self):
        # the coordinate where the object is plotted
        ...

class Gold(_Object):
    def draw(self, viewer):
        '''This method gives the most direct control over how the object is plotted.
        You should define the shape and the coordinate.
        '''
        ...

class Charger(_Object):
    def create_shape(self):
        '''Redefine the shape: here, a square with edge length 40.
        The default shape is a circle.
        '''
        a = 20
        self.shape = rendering.make_polygon([(-a,-a), (a,-a), (a,a), (-a,a)])
        self.shape.set_color(*self.color)
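The body of the coordinate property is elided above. One way a grid position could map to the pixel point where an object is drawn is to place it at the centre of its square. The sketch below makes three assumptions: positions are 1-based (col, row), the viewer's origin is its bottom-left corner, and each square has side length edge as in conf.yaml. The GridObject class is hypothetical, and skinner's actual coordinate may be computed differently.

```python
# Hypothetical mapping from a grid position to a drawing coordinate.
# Assumes 1-based (col, row) positions and a bottom-left viewer origin.
from dataclasses import dataclass


@dataclass
class GridObject:
    position: tuple  # (col, row), 1-based
    edge: int = 80   # side length of one grid square, as in conf.yaml

    @property
    def coordinate(self):
        col, row = self.position
        # centre of the square: half an edge in from its lower-left corner
        return ((col - 0.5) * self.edge, (row - 0.5) * self.edge)


gold = GridObject(position=(7, 7))
print(gold.coordinate)  # (520.0, 520.0)
```

With this convention, a 7 × 7 maze with edge 80 spans 560 × 560 pixels, and the top-right square is centred at (520, 520).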

Define agents

  1. transition function $f(s,a)$
  2. reward function $r(s,a,s')$
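To make these two functions concrete, here is a standalone sketch of a transition function and a reward function for the example maze. They are written as plain functions rather than MyRobot methods. Several choices are illustrative assumptions and not skinner defaults: the action encoding (0 up, 1 right, 2 down, 3 left), the rule that a blocked move leaves the agent in place, and all reward values. The walls, traps and gold positions are taken from the conf.yaml example, using only the walls listed there.

```python
# Hypothetical transition and reward functions for the example maze.
# Action encoding and reward values are assumptions, not skinner defaults.
N_COLS, N_ROWS = 7, 7
WALLS = {(2, 6), (3, 6), (4, 2)}
TRAPS = {(3, 5), (1, 3), (7, 1)}
DEATHTRAPS = {(6, 5), (2, 1)}
GOLD = (7, 7)

# action index -> (d_col, d_row): 0 up, 1 right, 2 down, 3 left
MOVES = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}


def next_state(state, action):
    """Transition function f(s, a) -> s'."""
    col, row = state
    d_col, d_row = MOVES[action]
    candidate = (col + d_col, row + d_row)
    # stay put when the move would leave the grid or hit a wall
    if not (1 <= candidate[0] <= N_COLS and 1 <= candidate[1] <= N_ROWS):
        return state
    if candidate in WALLS:
        return state
    return candidate


def get_reward(state0, action, state1):
    """Reward function r(s, a, s')."""
    if state1 == GOLD:
        return 10
    if state1 in DEATHTRAPS:
        return -10
    if state1 in TRAPS:
        return -3
    if state1 == state0:
        return -2   # bumped into a wall or the border
    return -1       # ordinary step cost


s0 = (7, 6)
s1 = next_state(s0, 0)
print(s1, get_reward(s0, 0, s1))  # (7, 7) 10
```

The small per-step cost pushes the agent toward short paths. The extra penalty for bumping into walls discourages wasted moves.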
from skinner import *

class MyRobot(StandardAgent):
    actions = Discrete(4)
    
    # define the appearance (size and color)
    size = 30
    color = (0.8, 0.6, 0.4)

    def _reset(self):
        # define the initial state
        ...
        
    def _next_state(self, state, action):
        """transition function: s, a -> s'
        """
        ...


    def _get_reward(self, state0, action, state1):
        """reward function: s,a,s'->r
        """
        ...


# create the agent with learning rate alpha and discount factor gamma
agent = MyRobot(alpha=0.3, gamma=0.9)
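In tabular RL, alpha and gamma usually play the roles shown in the standard Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. Whether StandardAgent uses exactly this rule is an assumption; it may use a variant such as SARSA. The q_update function below is a hypothetical sketch, not skinner's implementation.

```python
# Tabular Q-learning update: one standard rule in which alpha (learning rate)
# and gamma (discount factor) appear. q_update is hypothetical, not skinner's.
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.3, gamma=0.9, done=False):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # a terminal next state has no future value
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)      # unseen (state, action) pairs start at 0.0
actions = range(4)          # matches Discrete(4)
Q[((6, 6), 0)] = 5.0        # pretend the next state is already valued
q_update(Q, (5, 6), 1, -1, (6, 6), actions)             # ordinary step
q_update(Q, (7, 6), 0, 10, (7, 7), actions, done=True)  # reached the gold
print(Q[((5, 6), 1)], Q[((7, 6), 0)])
```

With alpha = 0.3, each update moves the estimate 30% of the way toward the target. With gamma = 0.9, a reward one step away counts for 90% of its face value.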

Example

Code

See the scripts in examples.


Commemoration

In memory of B. F. Skinner (1904-1990), a great American psychologist. RL is largely inspired by his behaviorism.



Download files


Source Distribution

skinner-0.1.2.tar.gz (13.1 kB)

Built Distribution

skinner-0.1.2-py3-none-any.whl (12.5 kB)

File details

Details for the file skinner-0.1.2.tar.gz.

File metadata

  • Download URL: skinner-0.1.2.tar.gz
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.10 CPython/3.8.5 Darwin/19.5.0

File hashes

Hashes for skinner-0.1.2.tar.gz
Algorithm Hash digest
SHA256 57631e1f275702b95e1b5d0a23b20276bf6756bab92edc24f17fc0bbb67a3169
MD5 0e666c45bfca70117286054aaaaf1f8d
BLAKE2b-256 558f05ae56eed785b3503a3db6ac4a4435fb27a9fa697939479dcb7d3542483e


File details

Details for the file skinner-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: skinner-0.1.2-py3-none-any.whl
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.10 CPython/3.8.5 Darwin/19.5.0

File hashes

Hashes for skinner-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2688c2662f79108a19959ee4c30eb03b1069d3784948cee89c6f7cf580b371aa
MD5 c8ed0d1c9eef907e8493181af1395c1d
BLAKE2b-256 3aca4a77a9b330f7c22cdf275d7370962b566bfea354f05644c564cec9bd2e73

