
## Project description

`pyqlearning` is a Python library for implementing Reinforcement Learning,
especially Q-Learning.

## Description

The Q-learning paradigm has many variable parts and functional
extensions, so I implemented these Python scripts as demonstrations of
*commonality/variability analysis* for designing the models.

## Documentation

Full documentation is available at https://code.accel-brain.com/Reinforcement-Learning/ . It covers functional reusability, functional scalability, and functional extensibility.

## Installation

Install using pip:

```sh
pip install pyqlearning
```

### Python Package Index (PyPI)

Installers for the latest released version are available at the Python package index.

### Dependencies

- numpy: v1.13.3 or higher.
- pandas: v0.22.0 or higher.

## Demonstration: Simple Maze Solving by Q-Learning (Jupyter notebook)

The details of this library are described in my Jupyter notebook: search_maze_by_q_learning.ipynb. This notebook demonstrates a simple maze-solving algorithm based on Q-Learning or Epsilon-Greedy Q-Learning, loosely coupled with a Deep Boltzmann Machine (DBM).

## Demonstration: Q-Learning

Q-Learning is a kind of
`Temporal Difference learning` (`TD Learning`), which can be
considered a hybrid of the `Monte Carlo method` and the
`Dynamic Programming Method`. Like the `Monte Carlo method`,
a `TD Learning` algorithm can learn from experience without a model of
the environment. And like the `Dynamic Programming Method`, this
learning algorithm is *functionally equivalent* to a bootstrap method.
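As a concrete illustration of this bootstrap idea, the tabular Q-Learning update can be sketched as follows. This is a hand-rolled sketch, not pyqlearning's API; the names `td_update`, `alpha`, and `gamma` are illustrative.

```python
from collections import defaultdict

# Tabular TD update: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
alpha, gamma = 0.1, 0.9        # learning rate and discount factor
q = defaultdict(float)         # Q-table keyed by (state, action), default 0.0

def td_update(state, action, reward, next_state, next_actions):
    # Bootstrap: estimate the remaining return from the current Q-Values of s'.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

td_update("s0", "right", 1.0, "s1", ["left", "right"])
```

Because the target `r + γ max Q(s', a')` reuses the table's own current estimates, no model of the environment's transition probabilities is needed.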

The `Epsilon-Greedy Q-Learning` algorithm is `off-policy`. In this
paradigm, *stochastic* searching and *deterministic* searching
coexist, balanced by the hyperparameter ε (0 < ε < 1), the probability
that the agent searches greedily. Greedy searching is *deterministic*
in the sense that the agent's policy follows the selection that
maximizes the Q-Value.
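The selection rule can be sketched like this, following the convention stated above that ε is the probability of greedy searching. This is a hypothetical sketch, not the library's implementation; `select_action` and `q_values` are illustrative names.

```python
import random

def select_action(q_values, epsilon):
    """q_values maps each candidate action to its current Q-Value;
    epsilon is the probability of greedy (deterministic) searching."""
    if random.random() < epsilon:
        return max(q_values, key=q_values.get)   # deterministic: maximize the Q-Value
    return random.choice(list(q_values))         # stochastic: search at random

action = select_action({"up": 0.2, "down": 0.8, "left": 0.1}, epsilon=0.7)
```

With ε close to 1 the agent exploits its current Q-table; with ε close to 0 it keeps exploring, which is what lets the two search modes coexist.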

`demo_maze_greedy_q_learning.py`
is a simple maze-solving demonstration. `MazeGreedyQLearning` in
devsample/maze_greedy_q_learning.py
is a `Concrete Class` in the `Template Method Pattern` that runs the
Q-Learning algorithm for this task. `GreedyQLearning` in
pyqlearning/qlearning/greedy_q_learning.py
is also a `Concrete Class`, for the epsilon-greedy method. The
`Abstract Class` that defines the skeleton of the Q-Learning algorithm
and declares its algorithm placeholders is
pyqlearning/q_learning.py.
So
`demo_maze_greedy_q_learning.py`
is a kind of `Client` in the `Template Method Pattern`.
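The division of labor in the `Template Method Pattern` can be sketched as below. This is a simplified sketch; the class name `QLearningSkeleton` and the three placeholders are illustrative, and the real `Abstract Class` in pyqlearning/q_learning.py declares more of them.

```python
from abc import ABC, abstractmethod

class QLearningSkeleton(ABC):
    """Abstract Class: fixes the invariant control flow of learning
    and declares the variable parts as placeholders."""

    def learn(self, state, limit=100):
        # Template method: Concrete Classes never override this loop.
        for _ in range(limit):
            action = self.select_action(state)
            reward, next_state = self.observe(state, action)
            self.update_q(state, action, reward, next_state)
            state = next_state

    @abstractmethod
    def select_action(self, state):
        """Placeholder: e.g. epsilon-greedy selection in a Concrete Class."""

    @abstractmethod
    def observe(self, state, action):
        """Placeholder: interact with the environment, return (reward, next_state)."""

    @abstractmethod
    def update_q(self, state, action, reward, next_state):
        """Placeholder: the TD update of the Q-Value."""
```

A `Concrete Class` such as `MazeGreedyQLearning` only fills in the placeholders; the `Client` just instantiates it and calls the template method.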

This algorithm allows the *agent* to search for the goal in the maze
by the *reward value* at each point on the map.

The following is an example map:

```
[['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']
 ['#' 'S' 4 8 8 4 9 6 0 '#']
 ['#' 2 26 2 5 9 0 6 6 '#']
 ['#' 2 '@' 38 5 8 8 1 2 '#']
 ['#' 3 6 0 49 8 3 4 9 '#']
 ['#' 9 7 4 6 55 7 0 3 '#']
 ['#' 1 8 4 8 2 69 8 2 '#']
 ['#' 1 0 2 1 7 0 76 2 '#']
 ['#' 2 8 0 1 4 7 5 'G' '#']
 ['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']]
```

`#` is a wall in the maze, `S` is the start point, `G` is the goal, and `@` is the agent.

In terms of reinforcement learning theory, the *state* of the *agent*
is its 2D position coordinates and the *action* is to decide the
direction of movement. Within the walls, the *agent* can move in the
four cross directions, advancing by one point at a time. After moving
to a new position, the *agent* obtains a *reward*. In greedy
searching, this extrinsically motivated agent acts so as to obtain a
*reward* as high as possible. Each *reward value* is plotted on the
map.
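A single move under these rules can be sketched as follows. The `step` function, `MOVES` table, and the miniature map are illustrative, not the demo script's actual code.

```python
# The state is a (row, col) pair; the action is one of four cross directions;
# the reward is the value at the point the agent moves onto.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(maze, state, action):
    row, col = state
    d_row, d_col = MOVES[action]
    candidate = (row + d_row, col + d_col)
    if maze[candidate[0]][candidate[1]] == "#":   # blocked by a wall: stay put
        return state, 0
    reward = maze[candidate[0]][candidate[1]]     # reward value at the new point
    return candidate, reward

maze = [["#", "#", "#", "#"],
        ["#", "S", 4,   "#"],
        ["#", 2,   26,  "#"],
        ["#", "#", "#", "#"]]
state, reward = step(maze, (1, 1), "right")   # move right from the start point
```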

To see how the *agent* searches for and reaches the goal, run the
batch program `demo_maze_greedy_q_learning.py`:

```sh
python demo_maze_greedy_q_learning.py
```

## Demonstration: Q-Learning, loosely coupled with Deep Boltzmann Machine

`demo_maze_deep_boltzmann_q_learning.py`
is a demonstration of how *Q-Learning* can be *deepened*. The
so-called *Deep Q-Network* (DQN) is only one example of such
deepening. In this demonstration, I instead couple *Q-Learning*
loosely with a **Deep Boltzmann Machine** (DBM). As the API
documentation of the
pydbm
library points out, a DBM is functionally equivalent to a stacked
auto-encoder; the main function I observe is the same dimensionality
reduction (or pre-training). Here, the function of this DBM is
dimensionality reduction of the *reward value* matrix.

Q-Learning, loosely coupled with a Deep Boltzmann Machine (DBM), is a
more effective way to solve the maze. The pre-training by the DBM
allows the Q-Learning *agent* to abstract the features of the
`reward value` matrix and to observe the map from a bird's-eye view.
The *agent* can then reach the goal in a smaller number of trials.

To realize the power of DBM, I performed a simple experiment.

### Feature engineering

For instance, a feature of each coordinate can be transformed and
extracted from the reward values, as so-called *observed data points*,
at its adjoining points. More formally, see
search_maze_by_q_learning.ipynb.
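As an illustration of this transformation, the *observed data point* of a coordinate might simply be the vector of reward values at its four adjoining points. This is a simplification I am assuming for exposition; the notebook's actual formula may differ.

```python
def observed_data_point(reward_map, row, col):
    """Collect the reward values of the four adjoining points as the
    feature of coordinate (row, col); walls contribute 0."""
    neighbors = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [reward_map[r][c] if reward_map[r][c] != "#" else 0
            for r, c in neighbors]

reward_map = [["#", "#", "#", "#"],
              ["#", 4,   8,   "#"],
              ["#", 2,   26,  "#"],
              ["#", "#", "#", "#"]]
feature = observed_data_point(reward_map, 1, 1)   # neighbors of the cell holding 4
```

Feeding such vectors into the DBM is what reduces the dimensionality of the *reward value* matrix during pre-training.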

The feature representation can then be calculated. After this
pre-training, the DBM has extracted the *feature points* below.

```
[['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']
 ['#' 'S' 0.22186305563593528 0.22170599483791015 0.2216928599218454 0.22164807496640074 0.22170371283788584 0.22164021608623224 0.2218165339471332 '#']
 ['#' 0.22174745260072407 0.221880094307873 0.22174244728061343 0.2214709292493749 0.22174626768015263 0.2216756589222596 0.22181057818975275 0.22174525714311788 '#']
 ['#' 0.22177496678085065 0.2219122743656551 0.22187543599733664 0.22170745588799798 0.2215226084843615 0.22153827385193636 0.22168466277729898 0.22179391402965035 '#']
 ['#' 0.2215341770250964 0.22174315536140118 0.22143149966676515 0.22181685688674144 0.22178215385805333 0.2212249704384472 0.22149210148879617 0.22185413678274837 '#']
 ['#' 0.22162363223483128 0.22171313373253035 0.2217109987501002 0.22152432841656014 0.22175562457887335 0.22176040052504634 0.22137688854285298 0.22175365642579478 '#']
 ['#' 0.22149515807715153 0.22169199881701832 0.22169558478042856 0.2216904005450013 0.22145368271014734 0.2217144069625017 0.2214896100292738 0.221398594191006 '#']
 ['#' 0.22139837944992058 0.22130176116356184 0.2215414328019404 0.22146667964656613 0.22164354506366127 0.22148685616333666 0.22162822887193126 0.22140174437162474 '#']
 ['#' 0.22140060918518528 0.22155145714201702 0.22162929776464463 0.22147466752374162 0.22150300682310872 0.22162775291471243 0.2214233075299188 'G' '#']
 ['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']]
```

To see how the *agent* searches for and reaches the goal, install the
pydbm
library and run the batch program `demo_maze_deep_boltzmann_q_learning.py`:

```sh
python demo_maze_deep_boltzmann_q_learning.py
```

### More detailed demos

- Web crawler-type artificial intelligence: specification of the Chimera Network
- 20001 bots are running as 20001 web-crawlers and 20001 web-scrapers.

## License

- GNU General Public License v2.0


## Download files


| Filename (size) | File type | Python version | Upload date |
|---|---|---|---|
| pyqlearning-1.0.8-py3-none-any.whl (19.6 kB) | Wheel | py3 | May 13, 2018 |
| pyqlearning-1.0.8.tar.gz (15.2 kB) | Source | None | May 13, 2018 |