
A package that provides RL capabilities for tasks.

Project description

skyline_rl_lab

We are going to implement and experiment with RL algorithms in this repo for our research and tutoring purposes. Below we explain how to use this repo with a simple example.

Environment

For RL (Reinforcement learning) to work, we need an environment to interact with. From Skyline lab, we can list the supported environments as follows:

>>> from skyline import lab
>>> lab.list_env()
===== GridWorld =====
This is an environment to showcase Skyline lab. The environment is a grid world where you can move up, down, right and left if you don't encounter an obstacle. When you obtain a reward (-1, 1, 2), the game is over. You can use env.info() to learn more.

Then we use the function make to create the desired environment, e.g.:

>>> grid_env = lab.make(lab.Env.GridWorld)
>>> grid_env.info()
- environment is a grid world
- x means you can't go there
- s means start position
- number means reward at that state
===========
.  .  .  1
.  x  . -1
.  .  .  x
s  x  .  2
===========

Available actions are indicated as follows:

>>> grid_env.available_actions()
['U', 'D', 'L', 'R']

To get the current state of an environment:

>>> grid_env.current_state
GridState(i=3, j=0)

In this specific scenario, the starting position (s) is located at coordinates (3, 0).

Let's take an action and check how the state changes in the environment:

>>> grid_env.step('U')  # Take action 'Up'
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)

>>> grid_env.current_state  # Get current state
GridState(i=2, j=0)

After taking action U, we expect the i coordinate to move up from 3 to 2, and we can confirm this from the returned action result. Let's reset the environment by calling the reset method, which brings the environment back to its initial state GridState(i=3, j=0):

>>> grid_env.reset()
>>> grid_env.current_state
GridState(i=3, j=0)

Experiments of RL algorithms

Here we are going to test some well-known RL algorithms and demonstrate the usage of this lab. Every RL method we implement must conform to the protocol RLAlgorithmProto in rl_protos.py. We will take a look at the implementation of a few RL methods to see how they are used.
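As a rough illustration, such a protocol could declare fit and play methods as sketched below. This is a hypothetical sketch only; the actual definitions in rl_protos.py may differ:

from typing import Any, Protocol


class RLAlgorithmProto(Protocol):
    # Hypothetical sketch of the interface each RL method is expected to follow;
    # the real protocol in rl_protos.py may declare more members.

    def fit(self, env: Any) -> None:
        """Learn from the given environment (training)."""
        ...

    def play(self, env: Any) -> Any:
        """Take one action in the environment and return the action result."""
        ...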

Monte Carlo Method

In this method, we simply simulate many trajectories (decision processes) and calculate the average returns (see the wiki page on Monte Carlo methods).
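To make the idea concrete, here is a tiny, self-contained sketch of every-visit Monte Carlo value estimation over a couple of hand-written episodes. The GAMMA constant and the (state, reward) episode format are illustrative only and not part of this package:

from collections import defaultdict

GAMMA = 0.9  # illustrative discount factor

# Each episode is a list of (state, reward) pairs collected by following some
# policy, where the reward is received when entering that state.
episodes = [
    [('s0', 0), ('s1', 0), ('s2', 2)],
    [('s0', 0), ('s1', 0), ('s2', -1)],
]

returns = defaultdict(list)
for episode in episodes:
    g = 0.0
    # Walk each episode backwards, accumulating the discounted return G.
    for state, reward in reversed(episode):
        g = reward + GAMMA * g
        returns[state].append(g)

# The value estimate of a state is the average of the returns observed from it.
values = {state: sum(gs) / len(gs) for state, gs in returns.items()}
print(values)  # roughly {'s2': 0.5, 's1': 0.45, 's0': 0.405}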

We implement this algorithm in monte_carlo.py. The code snippet below will initialize this RL method:

>>> from skyline.alg import monte_carlo
>>> mc_alg = monte_carlo.MonteCarlo()

Each RL method object supports the method fit to learn from the given environment object. For example:

>>> mc_alg.fit(grid_env)

Then we can leverage the utility module gridworld_utils.py to print out the learned knowledge. Below is the value function learned by the Monte Carlo method:

>>> from skyline.lab import gridworld_utils
>>> gridworld_utils.print_values(mc_alg._state_2_value, grid_env)
---------------------------
 1.18| 1.30| 1.46| 1.00|
---------------------------
 1.31| 0.00| 1.62|-1.00|
---------------------------
 1.46| 1.62| 1.80| 0.00|
---------------------------
 1.31| 0.00| 2.00| 2.00|

Then let's check the learned policy:

>>> gridworld_utils.print_policy(mc_alg._policy, grid_env)
---------------------------
  D  |  R  |  D  |  ?  |
---------------------------
  D  |  x  |  D  |  ?  |
---------------------------
  R  |  R  |  D  |  x  |
---------------------------
  U  |  x  |  R  |  ?  |

Finally, we can use the trained Monte Carlo method object to interact with the environment. Below is sample code for reference:

# Play the game until done
grid_env.reset()

print(f'Begin state={grid_env.current_state}')
step_count = 0
while not grid_env.is_done:
    result = mc_alg.play(grid_env)
    step_count += 1
    print(result)

print(f'Final reward={result.reward} with {step_count} step(s)')

The execution would look like:

Begin state=GridState(i=3, j=0)
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=2, j=1), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=2, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='D', state=GridState(i=3, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=3, j=3), reward=2, is_done=True, is_truncated=False, info=None)
Final reward=2 with 5 step(s)

Random Method

This method takes random action(s) in the given environment. It is often used as a baseline to evaluate other RL methods. The code below will instantiate a Random RL method:

from skyline.alg import random_rl

random_alg = random_rl.RandomRL()

The Random RL method doesn't require any training, so calling fit on random_alg returns immediately:

# Training
random_alg.fit(grid_env)
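Conceptually, all this method needs to do at each step is pick one of the available actions uniformly at random. The snippet below is a minimal sketch of that idea, not the actual implementation in random_rl.py:

import random

def random_play(env):
    """Illustrative sketch: pick a uniformly random action and take one step."""
    action = random.choice(env.available_actions())
    return env.step(action)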

Since this is a random process, each run of the game will very likely produce a different result:

# Play the game until done
grid_env.reset()

print(f'Begin state={grid_env.current_state}')
step_count = 0
while not grid_env.is_done:
    result = random_alg.play(grid_env)
    step_count += 1
    print(result)
print(f'Final reward={result.reward} with {step_count} step(s)')

Below is one execution example:

Begin state=GridState(i=3, j=0)
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)
...
ActionResult(action='R', state=GridState(i=0, j=3), reward=1, is_done=True, is_truncated=False, info=None)
Final reward=1 with 16 step(s)

From the result above, the random RL method took more steps and is not guaranteed to obtain the best reward. Therefore, the Monte Carlo method clearly performs much better than the Random RL method!

How to rank RL methods

Before we introduce how the scoreboard works, we need to understand RLExaminer first. Basically, the scoreboard is designed to help you rank different RL methods.

RLExaminer

Every environment can have more than one examiner to calculate the score of an RL method. Each examiner may evaluate the RL method from its own aspect (time, reward, etc.). Let's check the one used to calculate the average reward in the grid environment:

from skyline.lab import gridworld_env

# This examiner considers both reward and number of steps.
examiner = gridworld_env.GridWorldExaminer()
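For intuition, an examiner of this kind could be sketched as a small class that plays the environment to completion and divides the final reward by the number of steps taken. This is a hypothetical sketch, not the actual GridWorldExaminer code:

class RewardPerStepExaminer:
    """Illustrative examiner: score = final reward / number of steps taken."""

    def score(self, rl_method, env):
        env.reset()
        step_count = 0
        result = None
        while not env.is_done:
            result = rl_method.play(env)
            step_count += 1
        return result.reward / step_count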

Then, what is the score of the Monte Carlo method?

# Monte Carlo will get reward 2 by taking 5 steps.
# So the score will be reward / steps: 2 / 5 = 0.4
examiner.score(mc_alg, grid_env)

The Monte Carlo method got a score of 0.4. Let's check another RL method, the Random method:

# The number of steps required by random RL method is unknown.
# Also the best reward is not guaranteed. So the score here will be random.
examiner.score(random_alg, grid_env)

The Random RL method usually gets a lower score than the Monte Carlo method.

Scoreboard

The scoreboard calculates the scores of the given RL methods according to a specific examiner and then ranks those RL methods accordingly:

from skyline import lab

score_board = lab.Scoreboard()
sorted_scores = score_board.rank(
    examiner=examiner, env=grid_env, rl_methods=[random_alg, mc_alg])

The following output will be produced:

+-------+------------+---------------------+
| Rank. |  RL Name   |        Score        |
+-------+------------+---------------------+
|   1   | MonteCarlo |         0.4         |
|   2   |  RandomRL  | 0.13333333333333333 |
+-------+------------+---------------------+
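Under the hood, ranking is conceptually just scoring each method with the examiner and sorting the results. A rough sketch of that idea (assuming the names used above; this is not the actual Scoreboard code) might be:

def rank(examiner, env, rl_methods):
    """Illustrative sketch of ranking: score every method, then sort descending."""
    scores = [(type(m).__name__, examiner.score(m, env)) for m in rl_methods]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)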

Resources
