A fast solver for Markov Decision Processes
MDPSolver
MDPSolver is a Python package for solving large, infinite-horizon Markov Decision Processes (MDPs).
Features
- Fast solver: Our C++-based solver is substantially faster than other MDP packages available for Python. See details in the documentation.
- Two optimality criteria: Discounted and Average reward.
- Three optimization algorithms: Value iteration, Policy iteration, and Modified policy iteration.
- Three value-update methods: Standard, Gauss–Seidel, and Successive over-relaxation.
- Supports sparse matrices.
- Employs parallel computing.
Installation
Linux
Install directly from PyPI with:
pip install mdpsolver
MDPSolver works out of the box on Linux.
Windows
Requires Visual Studio 2022 (17.9) with the MSVC C++ compiler and libraries installed.
After installing these prerequisites, install directly from PyPI with:
pip install mdpsolver
Quick start guide
The following shows how to quickly get started with mdpsolver.
Usage
Start by specifying the reward function and transition probabilities as lists. The following is an example of a simple MDP containing three states and two actions in each state.
#Import packages
import mdpsolver
#Rewards (3 states x 2 actions)
#e.g. choosing the second action in the first state gives reward=-1
rewards = [[5,-1],
           [1,-2],
           [50,0]]
#Transition probabilities (3 from_states x 2 actions x 3 to_states)
#e.g. choosing the first action in the third state gives a probability of 0.6 of staying in the third state
tranMatWithZeros = [[[0.9,0.1,0.0],[0.1,0.9,0.0]],
                    [[0.4,0.5,0.1],[0.3,0.5,0.2]],
                    [[0.2,0.2,0.6],[0.5,0.5,0.0]]]
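Each row of the transition matrix must sum to 1 for every state-action pair. A quick sanity check in plain Python (this helper is not part of the mdpsolver API):

#Sanity check: every (state, action) row of probabilities should sum to 1
for s, actions in enumerate(tranMatWithZeros):
    for a, row in enumerate(actions):
        assert abs(sum(row) - 1.0) < 1e-9, f"Row (state={s}, action={a}) sums to {sum(row)}"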
Now, create the model object and insert the problem parameters.
#Create model object
mdl = mdpsolver.model()
#Insert the problem parameters
mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatWithZeros=tranMatWithZeros)
We can now optimize the policy.
mdl.solve()
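The plain solve() call above uses the solver's defaults. The optimality criterion, optimization algorithm, and value-update method listed under Features are selected through arguments to solve(). The argument names and values in the sketch below are illustrative assumptions, not the confirmed signature; see the wiki for the exact options.

#Hypothetical configuration sketch (argument names/values are assumptions)
mdl.solve(algorithm="mpi",        #e.g. value iteration, policy iteration, or modified policy iteration
          update="standard",      #e.g. standard, Gauss-Seidel, or successive over-relaxation
          criterion="discounted") #e.g. discounted or average reward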
The optimized policy can be returned in a variety of ways. Here, we return the policy as a list and print it directly in the terminal. Each entry is the index of the chosen action in the corresponding state.
print(mdl.getPolicy())
#[1, 1, 0]
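For reference, here is the complete quick-start example gathered into a single script (same calls as above):

import mdpsolver

#Problem parameters (3 states, 2 actions per state)
rewards = [[5,-1],
           [1,-2],
           [50,0]]

tranMatWithZeros = [[[0.9,0.1,0.0],[0.1,0.9,0.0]],
                    [[0.4,0.5,0.1],[0.3,0.5,0.2]],
                    [[0.2,0.2,0.6],[0.5,0.5,0.0]]]

#Build the model, solve, and inspect the policy
mdl = mdpsolver.model()
mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatWithZeros=tranMatWithZeros)
mdl.solve()
print(mdl.getPolicy())  #[1, 1, 0]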
Sparse transition matrix?
mdpsolver has three alternative formats for large and highly sparse transition probability matrices.
(1) Elementwise representation (zero-probability elements omitted):
#[from_state,action,to_state,probability]
tranMatElementwise = [[0,0,0,0.9],
                      [0,0,1,0.1],
                      [0,1,0,0.1],
                      [0,1,1,0.9],
                      [1,0,0,0.4],
                      [1,0,1,0.5],
                      [1,0,2,0.1],
                      [1,1,0,0.3],
                      [1,1,1,0.5],
                      [1,1,2,0.2],
                      [2,0,0,0.2],
                      [2,0,1,0.2],
                      [2,0,2,0.6],
                      [2,1,0,0.5],
                      [2,1,1,0.5]]
mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatElementwise=tranMatElementwise)
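If you already have the dense matrix with zeros, the elementwise list above can be generated from it with a few lines of plain Python (this helper is not part of the mdpsolver API):

#Convert a dense transition matrix (with zeros) to the elementwise format
def to_elementwise(tranMatWithZeros):
    elements = []
    for from_state, actions in enumerate(tranMatWithZeros):
        for action, row in enumerate(actions):
            for to_state, prob in enumerate(row):
                if prob > 0.0:
                    elements.append([from_state, action, to_state, prob])
    return elements

tranMatElementwise = to_elementwise(tranMatWithZeros)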
(2) Probabilities and column (to_state) indices in separate lists:
tranMatProbs = [[[0.9,0.1],[0.1,0.9]],
                [[0.4,0.5,0.1],[0.3,0.5,0.2]],
                [[0.2,0.2,0.6],[0.5,0.5]]]

tranMatColumns = [[[0,1],[0,1]],
                  [[0,1,2],[0,1,2]],
                  [[0,1,2],[0,1]]]
mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatProbs=tranMatProbs,
        tranMatColumns=tranMatColumns)
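This format can likewise be derived from the dense matrix in plain Python (again, the helper is not part of the mdpsolver API):

#Split the dense matrix into nonzero probabilities and their column (to_state) indices
def to_probs_and_columns(tranMatWithZeros):
    probs, columns = [], []
    for actions in tranMatWithZeros:
        probs.append([[p for p in row if p > 0.0] for row in actions])
        columns.append([[c for c, p in enumerate(row) if p > 0.0] for row in actions])
    return probs, columns

tranMatProbs, tranMatColumns = to_probs_and_columns(tranMatWithZeros)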
(3) Load the elementwise representation from a file:
mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatFromFile="transitions.csv")
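The file holds the elementwise representation. Below is a sketch of writing such a file with Python's csv module; the exact delimiter and header conventions expected by mdpsolver are an assumption here, so verify them in the documentation before relying on this.

import csv

#Write the elementwise representation to a CSV file
#(assumed layout: one "from_state,action,to_state,probability" row per line;
# check the documentation for the expected delimiter/header)
with open("transitions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(tranMatElementwise)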
Documentation
The documentation can be found in the wiki for MDPSolver (https://github.com/areenberg/MDPSolver/wiki).
How to cite