msdm: Models of Sequential Decision-Making
Goals
msdm aims to simplify the design and evaluation of models of sequential decision-making. The library can be used for cognitive science or computer science research/teaching.
Approach
msdm provides standardized interfaces and implementations for common constructs in sequential decision-making. This includes algorithms used in single-agent reinforcement learning as well as those used in planning, partially observable environments, and multi-agent games.
The library is organized around different problem classes and algorithms that operate on problem instances. We take inspiration from existing libraries such as scikit-learn that enable users to transparently mix and match components. For instance, a standard way to define a problem, solve it, and examine the results would be:
# create a problem instance
mdp = make_russell_norvig_grid(
    discount_rate=0.95,
    slip_prob=0.8,
)

# solve the problem
vi = ValueIteration()
res = vi.plan_on(mdp)

# print the value function
print(res.V)
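Because algorithms are meant to be interchangeable on the same problem instance, switching solvers should only require constructing a different algorithm object. The snippet below is a sketch of that pattern, assuming PolicyIteration (listed under the algorithms below) supports the same plan_on interface as ValueIteration:

# solve the same problem with a different planner
# (assumes PolicyIteration follows the same plan_on interface as ValueIteration)
pi = PolicyIteration()
res = pi.plan_on(mdp)
print(res.V)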
The library is under active development. Currently, we support the following problem classes:
- Markov Decision Processes (MDPs)
- Partially Observable Markov Decision Processes (POMDPs)
- Markov Games
- Partially Observable Stochastic Games (POSGs)
The following algorithms have been implemented and tested:
- Classical Planning
  - Breadth-First Search (Zuse, 1945)
  - A* (Hart, Nilsson & Raphael, 1968)
- Stochastic Planning
  - Value Iteration (Bellman, 1957; sketched after this list)
  - Policy Iteration (Howard, 1960)
  - Labeled Real-time Dynamic Programming (Bonet & Geffner, 2003)
  - LAO* (Hansen & Zilberstein, 2003)
- Partially Observable Planning
  - QMDP (Littman, Cassandra & Kaelbling, 1995)
  - Point-based Value Iteration (Pineau, Gordon & Thrun, 2003)
  - Finite-state controller gradient ascent (Meuleau, Kim, Kaelbling & Cassandra, 1999)
  - Bounded finite-state controller policy iteration (Poupart & Boutilier, 2003)
  - Wrappers for POMDPs.jl solvers (requires a Julia installation)
- Reinforcement Learning
  - Q-Learning (Watkins, 1992)
  - Double Q-Learning (van Hasselt, 2010)
  - SARSA (Rummery & Niranjan, 1994)
  - Expected SARSA (van Seijen, van Hasselt, Whiteson & Wiering, 2009)
  - R-MAX (Brafman & Tennenholtz, 2002)
- Multi-agent Reinforcement Learning (in progress)
  - Correlated Q-Learning (Greenwald & Hall, 2002)
  - Nash Q-Learning (Hu & Wellman, 2003)
  - Friend-or-Foe Q-Learning (Littman, 2001)
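As a point of reference for the planning algorithms above, the sketch below shows generic tabular value iteration in plain Python: repeatedly applying the Bellman optimality backup to a small hand-written MDP until the value function stops changing. The toy MDP and the code are illustrative only; they are not msdm's implementation or API.

# Generic tabular value iteration (Bellman, 1957) on a toy two-state MDP.
# Illustrative sketch only; not msdm's implementation or interface.
# transitions[state][action] is a list of (next_state, probability, reward).
transitions = {
    "s0": {
        "stay": [("s0", 1.0, 0.0)],
        "go": [("s1", 0.9, 1.0), ("s0", 0.1, 0.0)],
    },
    "s1": {
        "stay": [("s1", 1.0, 0.0)],
        "go": [("s0", 1.0, 0.0)],
    },
}
discount_rate = 0.95

V = {s: 0.0 for s in transitions}
for _ in range(1000):
    delta = 0.0
    for state, actions in transitions.items():
        # Bellman optimality backup: best expected one-step reward plus discounted value
        new_value = max(
            sum(p * (r + discount_rate * V[ns]) for ns, p, r in outcomes)
            for outcomes in actions.values()
        )
        delta = max(delta, abs(new_value - V[state]))
        V[state] = new_value
    if delta < 1e-8:  # stop once a full sweep barely changes the values
        break

print(V)  # converges to roughly {'s0': 9.70, 's1': 9.22}

The ValueIteration planner shown earlier computes the same kind of fixed point, but over problem instances defined through msdm's problem classes.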
We aim to add implementations for other algorithms in the near future (e.g., inverse RL, deep learning, multi-agent learning and planning).
Installation
It is recommended to use a virtual environment.
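For example, a virtual environment can be created and activated with Python's built-in venv module (commands for macOS/Linux):

$ python -m venv .venv
$ source .venv/bin/activate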
Installing from pip
$ pip install msdm
Installing from GitHub
$ pip install --upgrade git+https://github.com/markkho/msdm.git
Installing the package in edit mode
After downloading the repository, go into the folder and install the package locally (with a symlink, so changes to the source files take effect without reinstalling):
$ pip install -e .
Contributing
We welcome contributions in the form of implementations of algorithms for common problem classes that are well-documented in the literature. Please first post an issue and/or reach out to mark.ho.cs@gmail.com to check if a proposed contribution is within the scope of the library.
Running tests, etc.
To run all tests: make test
To run tests for a specific file: python -m py.test msdm/tests/$TEST_FILE_NAME.py
To lint the code: make lint
File details
Details for the file msdm-0.11.tar.gz.
File metadata
- Download URL: msdm-0.11.tar.gz
- Upload date:
- Size: 130.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 17159a7a6d2fe503bee7e41bba86213b77925fad10dc7c3d5c4c090715b50bf6
MD5 | 04d2571300cb59922bd6eb626d013fad
BLAKE2b-256 | f6cdab6b38a33e60f3b9c78d76880ff630734a962263e7e2295306e0795e3748