pyqlearning is a Python library for implementing Reinforcement Learning, especially Q-Learning.
Description
Because the Q-Learning paradigm has many variable parts and functional extensions, I implemented these Python scripts to demonstrate a commonality/variability analysis for designing the models.
Documentation
Full documentation is available at https://code.accel-brain.com/Reinforcement-Learning/ . This document contains information on functional reusability, functional scalability and functional extensibility.
Installation
Install using pip:
pip install pyqlearning
Source code
The source code is currently hosted on GitHub.
Python Package Index (PyPI)
Installers for the latest released version are available at the Python Package Index.
Dependencies
numpy: v1.13.3 or higher.
pandas: v0.22.0 or higher.
Demonstration: Simple Maze Solving by Q-Learning (Jupyter notebook)
Details of this library are in my Jupyter notebook: search_maze_by_q_learning.ipynb. This notebook demonstrates a simple maze-solving algorithm based on Epsilon-Greedy Q-Learning, or on Q-Learning loosely coupled with a Deep Boltzmann Machine (DBM).
Demonstration: Q-Learning
Q-Learning is a kind of Temporal Difference learning (TD Learning), which can be considered a hybrid of the Monte Carlo method and the Dynamic Programming method. Like the Monte Carlo method, a TD Learning algorithm can learn from experience without a model of the environment. Like the Dynamic Programming method, it is functionally equivalent to a bootstrap method: it updates its estimates from other learned estimates.
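The bootstrapping idea can be made concrete with a minimal sketch of a single Q-Learning update. This is not the pyqlearning API: the function name td_update, the dictionary-based Q-table, and the default values of alpha (learning rate) and gamma (discount rate) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def td_update(q_table, state, action, reward, next_state, next_actions,
              alpha=0.1, gamma=0.9):
    """Apply one Q-Learning (TD) update and return the new Q-value.

    alpha and gamma are illustrative defaults, not values from the library.
    """
    # Bootstrap: the value of the next state is taken from current estimates,
    # not from a complete episode as in the Monte Carlo method.
    best_next = max((q_table[(next_state, a)] for a in next_actions),
                    default=0.0)
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return q_table[(state, action)]


# Hypothetical usage: states are (row, col) positions, actions are targets.
q_table = defaultdict(float)
new_q = td_update(q_table, state=(1, 1), action=(1, 2), reward=4.0,
                  next_state=(1, 2), next_actions=[(1, 3), (2, 2)])
```

With an empty Q-table the bootstrapped term is zero, so the first update moves the estimate a fraction alpha of the way toward the observed reward.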
The Epsilon-Greedy Q-Learning algorithm is off-policy. In this paradigm, stochastic searching and deterministic searching can coexist through the hyperparameter ε (0 < ε < 1), which is the probability that the agent searches greedily. Greedy searching is deterministic in the sense that the agent's policy follows the selection that maximizes the Q-Value.
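A minimal sketch of this selection rule follows, using the convention stated above that ε is the probability of the greedy choice (many textbooks define ε the other way round, as the probability of a random choice). The function name select_action and the dictionary of Q-values are assumptions for illustration, not part of the library.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action -> Q-value.

    epsilon is the probability of searching greedily (this document's
    convention); otherwise an action is drawn uniformly at random.
    """
    if rng.random() < epsilon:
        # Deterministic search: follow the action with the maximum Q-value.
        return max(q_values, key=q_values.get)
    # Stochastic search: any action, including the greedy one.
    return rng.choice(list(q_values))


# Hypothetical Q-values for the four directions at one position.
q_values = {"up": 0.1, "down": 0.7, "left": 0.3, "right": 0.2}
rng = random.Random(0)
picks = [select_action(q_values, epsilon=0.75, rng=rng) for _ in range(1000)]
greedy_ratio = picks.count("down") / len(picks)
```

Because the random branch can also land on the best action, the observed share of greedy picks is somewhat higher than ε itself.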
demo_maze_greedy_q_learning.py is a simple maze-solving algorithm. MazeGreedyQLearning in devsample/maze_greedy_q_learning.py is a Concrete Class in the Template Method Pattern that runs the Q-Learning algorithm for this task. GreedyQLearning in pyqlearning/qlearning/greedy_q_learning.py is also a Concrete Class, for the epsilon-greedy method. The Abstract Class that defines the skeleton of the Q-Learning algorithm and declares the algorithm placeholders is in pyqlearning/q_learning.py. So demo_maze_greedy_q_learning.py is a kind of Client in the Template Method Pattern.
This algorithm allows the agent to search for the goal in the maze using the reward value at each point on the map.
The following is an example of a map.
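For readers unfamiliar with the Template Method Pattern, the division of roles can be sketched as follows. Every class and method name here (QLearningSkeleton, GreedyCorridor, possible_actions, and so on) is hypothetical, and the toy corridor task is an assumption; the sketch only illustrates the pattern, not the actual classes in pyqlearning.

```python
from abc import ABC, abstractmethod


class QLearningSkeleton(ABC):
    """Hypothetical Abstract Class: owns the loop, defers the variable steps."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha, self.gamma = alpha, gamma
        self.q_table = {}  # (state, action) -> Q-value

    def learn(self, state, limit=50):
        """Template method: the invariant skeleton of one episode."""
        for _ in range(limit):
            actions = self.possible_actions(state)
            action = self.select_action(state, actions)
            reward = self.observe_reward(state, action)
            next_state = action  # assumption: an action names the next state
            best_next = max((self.q_table.get((next_state, a), 0.0)
                             for a in self.possible_actions(next_state)),
                            default=0.0)
            old = self.q_table.get((state, action), 0.0)
            self.q_table[(state, action)] = old + self.alpha * (
                reward + self.gamma * best_next - old)
            state = next_state
            if self.is_goal(state):
                break
        return state

    @abstractmethod
    def possible_actions(self, state): ...

    @abstractmethod
    def select_action(self, state, actions): ...

    @abstractmethod
    def observe_reward(self, state, action): ...

    @abstractmethod
    def is_goal(self, state): ...


class GreedyCorridor(QLearningSkeleton):
    """Hypothetical Concrete Class: a corridor of cells 0..4, goal at cell 4."""

    def possible_actions(self, state):
        return [s for s in (state - 1, state + 1) if 0 <= s <= 4]

    def select_action(self, state, actions):
        # Purely greedy; ties are broken toward the larger cell index.
        return max(actions,
                   key=lambda a: (self.q_table.get((state, a), 0.0), a))

    def observe_reward(self, state, action):
        return 1.0 if action == 4 else 0.0

    def is_goal(self, state):
        return state == 4


# Client code: it only calls the template method on a concrete class.
agent = GreedyCorridor()
final_state = agent.learn(state=0)
```

The client never touches the loop itself; swapping in a different concrete class changes the task or the selection strategy without changing the skeleton.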
[['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']
 ['#' 'S' 4 8 8 4 9 6 0 '#']
 ['#' 2 26 2 5 9 0 6 6 '#']
 ['#' 2 '@' 38 5 8 8 1 2 '#']
 ['#' 3 6 0 49 8 3 4 9 '#']
 ['#' 9 7 4 6 55 7 0 3 '#']
 ['#' 1 8 4 8 2 69 8 2 '#']
 ['#' 1 0 2 1 7 0 76 2 '#']
 ['#' 2 8 0 1 4 7 5 'G' '#']
 ['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']]
# is a wall in the maze.
S is a start point.
G is a goal.
@ is the agent.
In terms of reinforcement learning theory, the state of the agent is its 2D position coordinates, and the action is to decide the direction of movement. Within the walls, the agent can move in the four cross directions (up, down, left, right), advancing one point at a time. After moving to a new position, the agent obtains a reward. In greedy searching, this extrinsically motivated agent acts to obtain as high a reward as possible. Each reward value is plotted on the map.
To see how the agent searches for and reaches the goal, run the batch program: demo_maze_greedy_q_learning.py
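The state and action definitions can be sketched on a smaller map written in the same notation. The map below, the helper names possible_moves and reward_at, and the choice to give the S and G cells a reward of 0.0 are all assumptions for illustration; the library's own maze classes may work differently.

```python
# A smaller hypothetical map in the same notation as the example map.
maze = [
    ["#", "#", "#", "#", "#"],
    ["#", "S", 4, 8, "#"],
    ["#", 2, 26, 2, "#"],
    ["#", 3, 6, "G", "#"],
    ["#", "#", "#", "#", "#"],
]

# The four cross directions as (row, col) offsets: up, down, left, right.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def possible_moves(maze, position):
    """Return the positions reachable in one step, excluding walls."""
    row, col = position
    candidates = [(row + dr, col + dc) for dr, dc in MOVES]
    return [(r, c) for r, c in candidates
            if 0 <= r < len(maze) and 0 <= c < len(maze[r])
            and maze[r][c] != "#"]


def reward_at(maze, position):
    """Return the reward plotted at a position (0.0 for 'S' and 'G' here)."""
    cell = maze[position[0]][position[1]]
    return float(cell) if isinstance(cell, (int, float)) else 0.0


start = (1, 1)                       # the state: 2D position of 'S'
moves = possible_moves(maze, start)  # the actions available from 'S'
rewards = {move: reward_at(maze, move) for move in moves}
```

From the start cell only two of the four directions are open, because the other two lead into walls.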
python demo_maze_greedy_q_learning.py
Demonstration: Q-Learning, loosely coupled with Deep Boltzmann Machine
demo_maze_deep_boltzmann_q_learning.py is a demonstration of how Q-Learning can be deepened. The so-called Deep Q-Network (DQN) is only one example of this. In this demonstration, I present Q-Learning loosely coupled with a Deep Boltzmann Machine (DBM). As the API documentation of the pydbm library points out, a DBM is functionally equivalent to a stacked auto-encoder. The main function I focus on is the same: dimensionality reduction (or pre-training). So the function of this DBM is dimensionality reduction of the reward value matrix.
Q-Learning loosely coupled with a Deep Boltzmann Machine (DBM) is a more effective way to solve the maze. Pre-training by the DBM allows the Q-Learning agent to abstract features of the reward value matrix and to observe the map from a bird’s-eye view. The agent can then reach the goal in fewer trials.
To demonstrate the power of the DBM, I performed a simple experiment.
Feature engineering
For instance, the feature at each coordinate can be extracted from the reward values of its adjoining points, which are treated as the so-called observed data points. For the formal definition, see search_maze_by_q_learning.ipynb.
The feature representation can then be calculated. After this pre-training, the DBM has extracted the feature points below.
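One way to read "adjoining points" is sketched below: the eight cells surrounding a coordinate, with cells outside the map filled with 0.0. Both the 8-neighbourhood and the fill value are assumptions, as is the function name neighbor_features; the notebook defines the actual procedure.

```python
def neighbor_features(reward_map, row, col, fill=0.0):
    """Return the rewards of the 8 cells around (row, col) as a flat list.

    Cells that fall outside the map are replaced by `fill` (an assumption).
    """
    features = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the coordinate itself
            r, c = row + dr, col + dc
            inside = 0 <= r < len(reward_map) and 0 <= c < len(reward_map[0])
            features.append(float(reward_map[r][c]) if inside else fill)
    return features


# A hypothetical 3x3 patch of reward values.
reward_map = [
    [4, 8, 8],
    [2, 26, 2],
    [2, 7, 38],
]
center = neighbor_features(reward_map, 1, 1)
corner = neighbor_features(reward_map, 0, 0)
```

Each coordinate then yields a fixed-length vector of observed data points, which is the kind of input a pre-training model could consume.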
[['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']
 ['#' 'S' 0.22186305563593528 0.22170599483791015 0.2216928599218454 0.22164807496640074 0.22170371283788584 0.22164021608623224 0.2218165339471332 '#']
 ['#' 0.22174745260072407 0.221880094307873 0.22174244728061343 0.2214709292493749 0.22174626768015263 0.2216756589222596 0.22181057818975275 0.22174525714311788 '#']
 ['#' 0.22177496678085065 0.2219122743656551 0.22187543599733664 0.22170745588799798 0.2215226084843615 0.22153827385193636 0.22168466277729898 0.22179391402965035 '#']
 ['#' 0.2215341770250964 0.22174315536140118 0.22143149966676515 0.22181685688674144 0.22178215385805333 0.2212249704384472 0.22149210148879617 0.22185413678274837 '#']
 ['#' 0.22162363223483128 0.22171313373253035 0.2217109987501002 0.22152432841656014 0.22175562457887335 0.22176040052504634 0.22137688854285298 0.22175365642579478 '#']
 ['#' 0.22149515807715153 0.22169199881701832 0.22169558478042856 0.2216904005450013 0.22145368271014734 0.2217144069625017 0.2214896100292738 0.221398594191006 '#']
 ['#' 0.22139837944992058 0.22130176116356184 0.2215414328019404 0.22146667964656613 0.22164354506366127 0.22148685616333666 0.22162822887193126 0.22140174437162474 '#']
 ['#' 0.22140060918518528 0.22155145714201702 0.22162929776464463 0.22147466752374162 0.22150300682310872 0.22162775291471243 0.2214233075299188 'G' '#']
 ['#' '#' '#' '#' '#' '#' '#' '#' '#' '#']]
To see how the agent searches for and reaches the goal, install the pydbm library and run the batch program: demo_maze_deep_boltzmann_q_learning.py
python demo_maze_deep_boltzmann_q_learning.py
More detailed demos
20001 bots are running as 20001 web-crawlers and 20001 web-scrapers.
License
GNU General Public License v2.0
Download files
Hashes for pyqlearning-1.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9b5b68baa574ccc3fb0534ea6facf41c9ad0c0c35644d671e2c0d50bc9c332f6
MD5 | 3f64a70fe033b2deb1c749f283984259
BLAKE2b-256 | 66c98b53586ada39ab1c266b77a5c4ab8c7120b614c566f59a73faa18fafc742