Reinforcement Learning with Model Predictive Control


Model Predictive Control-based Reinforcement Learning (mpcrl, for short) is a library for training model-based Reinforcement Learning (RL) [1] agents with Model Predictive Control (MPC) [2] as function approximation.

Documentation https://mpc-reinforcement-learning.readthedocs.io/en/stable/

Download https://pypi.python.org/pypi/mpcrl/

Source code https://github.com/FilippoAiraldi/mpc-reinforcement-learning/

Report issues https://github.com/FilippoAiraldi/mpc-reinforcement-learning/issues/



Introduction

This framework, also referred to as RL with/using MPC, was first proposed in [3] and has since been shown to be effective in various applications, with different learning algorithms and sounder theory, e.g., [4, 5, 7, 8]. It merges two powerful control techniques into a single data-driven one:

MPC, a well-known control methodology that exploits a prediction model to predict the future behaviour of the environment and compute the optimal action,
and RL, a Machine Learning paradigm that has shown many successes in recent years (in games such as chess, Go, etc.) and is highly adaptable to unknown and complex-to-model environments.

The figure below shows the main idea behind this learning-based control approach. The MPC controller, parametrized in its objective, predictive model and constraints (or a subset of these), acts both as policy provider (i.e., it provides an action to the environment, given the current state) and as function approximator for the state and action value functions (i.e., it predicts the expected return obtained by following the current control policy from the given state or state-action pair). Concurrently, an RL algorithm is employed to tune this parametrization of the MPC so as to improve the controller's performance and achieve a (sub)optimal policy. Different algorithms can be employed for this purpose, two of the most successful being Q-learning [4] and Deterministic Policy Gradient (DPG) [5].
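In symbols, the dual role of the MPC can be sketched as follows. This is the usual construction in this line of work, written with placeholder names introduced here for illustration: $\ell_\theta$ for the parametrized stage cost, $f_\theta$ for the prediction model, $h_\theta$ for the constraints, and $N$ for the prediction horizon.

$$
\begin{aligned}
V_\theta(s) &= \min_{x_{0:N},\, u_{0:N-1}} \ \sum_{i=0}^{N-1} \ell_\theta(x_i, u_i) \quad \textrm{s.t.} \quad x_0 = s, \ \ x_{i+1} = f_\theta(x_i, u_i), \ \ h_\theta(x_i, u_i) \le 0, \\
Q_\theta(s, a) &= \text{the same problem with the additional constraint } u_0 = a, \\
\pi_\theta(s) &= \arg\min_{a} \ Q_\theta(s, a).
\end{aligned}
$$

With these definitions, $V_\theta(s) = \min_a Q_\theta(s, a)$, and the RL algorithm adjusts $\theta$ rather than the weights of, e.g., a neural network.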

Installation

Using `pip`

You can use pip to install mpcrl with the command

pip install mpcrl

mpcrl has the following dependencies

Python 3.9 or higher (though support and testing for 3.9 are deprecated)
csnlp
SciPy
Gymnasium
Numba
typing_extensions (only for Python 3.9)

If you'd like to play around with the source code instead, run

git clone https://github.com/FilippoAiraldi/mpc-reinforcement-learning.git

The main branch contains the main releases of the package (and the occasional post release). The experimental branch is reserved for the implementation and testing of new features and hosts the release candidates. You can then install the package in editable mode, so that you can modify it as you wish, with

pip install -e /path/to/mpc-reinforcement-learning
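To check which version ended up in the active environment, a minimal sketch using only the Python standard library is shown below. The helper name `installed_version` is hypothetical and introduced here for illustration; it is not part of mpcrl.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# prints the installed version, or None if mpcrl is not installed here
print("mpcrl version:", installed_version("mpcrl"))
```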

Getting started

Here we provide the skeleton of a simple application of the library. The aim of the code below is to let an MPC control strategy learn how to optimally control a simple Linear Time Invariant (LTI) system. The cost (i.e., the opposite of the reward) of controlling this system in state $s \in \mathbb{R}^{n_s}$ with action $a \in \mathbb{R}^{n_a}$ is given by

$$ L(s,a) = s^\top Q s + a^\top R a, $$

where $Q \in \mathbb{R}^{n_s \times n_s}$ and $R \in \mathbb{R}^{n_a \times n_a}$ are suitable positive definite matrices. This is a very well-known problem in optimal control theory. However, here, in the context of RL, these matrices are not known, and we can only observe realizations of the cost for each state-action pair our controller visits. The underlying system dynamics are described by the usual state-space model

$$ s_{k+1} = A s_k + B a_k, $$

whose matrices $A \in \mathbb{R}^{n_s \times n_s}$ and $B \in \mathbb{R}^{n_s \times n_a}$ could again in general be unknown. The control action $a_k$ is assumed bounded in the interval $[-1,1]$. In what follows we will go through the usual steps in setting up and solving such a task.
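As a concrete illustration of the two equations above, the sketch below simulates one step of such a system with NumPy. All numerical values are hypothetical and chosen only for illustration, and clipping the action to $[-1,1]$ is an assumption of this sketch (the environment skeleton further down asserts the bound instead).

```python
import numpy as np

# Hypothetical 2-state, 1-input system; all numbers are illustrative only.
A = np.array([[0.9, 0.35], [0.0, 1.1]])
B = np.array([[0.1], [0.2]])
Q = np.eye(2)          # state-cost weight (positive definite)
R = 0.5 * np.eye(1)    # action-cost weight (positive definite)


def stage_cost(s, a):
    """Quadratic cost L(s, a) = s' Q s + a' R a."""
    return float(s @ Q @ s + a @ R @ a)


def step(s, a):
    """One step of s_{k+1} = A s_k + B a_k, with the action clipped to [-1, 1]."""
    a = np.clip(a, -1.0, 1.0)
    return A @ s + B @ a, stage_cost(s, a)


s = np.array([1.0, -1.0])
s_next, cost = step(s, np.array([2.0]))  # the action 2.0 is clipped to 1.0
print(s_next, cost)
```

In the RL setting described above, the learner only ever sees `s`, `s_next` and `cost`; the matrices themselves stay hidden inside the environment.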

Environment

The first ingredient to implement is the LTI system in the form of a gymnasium.Env class. Feel free to fill in the missing parts based on your needs. The gymnasium.Env.reset method should initialize the state of the system, while the gymnasium.Env.step method should update the state of the system based on the action provided and return, most importantly, the new state and the cost.

from gymnasium import Env
from gymnasium.spaces import Box
from gymnasium.wrappers import TimeLimit
import numpy as np


class LtiSystem(Env):
    ns = ...  # number of states (must be continuous)
    na = ...  # number of actions (must be continuous)
    A = ...  # state-space matrix A
    B = ...  # state-space matrix B
    Q = ...  # state-cost matrix Q
    R = ...  # action-cost matrix R
    action_space = Box(-1.0, 1.0, (na,), np.float64)  # action space

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed, options=options)
        self.s = ...  # set initial state
        return self.s, {}

    def step(self, action):
        a = np.reshape(action, self.action_space.shape)
        assert self.action_space.contains(a)
        c = self.s.T @ self.Q @ self.s + a.T @ self.R @ a
        self.s = self.A @ self.s + self.B @ a
        return self.s, c, False, False, {}


# lastly, wrap the environment in a time limit to ensure the simulation finishes
env = TimeLimit(LtiSystem(), max_episode_steps=5000)

Controller

As mentioned above, we'd like to control this system via an MPC controller, so the next step is to craft one. To do so, we leverage the csnlp package, in particular its csnlp.wrappers.Mpc class. Under the hood, we also rely on this package to compute the sensitivities of the MPC controller w.r.t. its parametrization, which are crucial in calculating the RL updates. In mathematical terms, the MPC looks like this:

$$
\begin{aligned}
\min_{x_{0:N}, u_{0:N-1}} \quad & \sum_{i=0}^{N-1}{ x_i^\top \tilde{Q} x_i + u_i^\top \tilde{R} u_i } & \\
\textrm{s.t.} \quad & x_0 = s_k \\
& x_{i+1} = \tilde{A} x_i + \tilde{B} u_i, \quad & i=0,\dots,N-1 \\
& -1 \le u_i \le 1, \quad & i=0,\dots,N-1
\end{aligned}
$$

where $\tilde{Q}, \tilde{R}, \tilde{A}, \tilde{B}$ do not necessarily have to match the environment's $Q, R, A, B$, as they represent possibly approximate a priori knowledge of the system. In code, we can implement this as follows.

import casadi as cs
from csnlp import Nlp
from csnlp.wrappers import Mpc

N = ...  # prediction horizon
mpc = Mpc[cs.SX](Nlp(), N)

# create the parametrization of the controller
nx, nu = LtiSystem.ns, LtiSystem.na
Atilde = mpc.parameter("Atilde", (nx, nx))
Btilde = mpc.parameter("Btilde", (nx, nu))
Qtilde = mpc.parameter("Qtilde", (nx, nx))
Rtilde = mpc.parameter("Rtilde", (nu, nu))

# create the variables of the controller
x, _ = mpc.state("x", nx)
u, _ = mpc.action("u", nu, lb=-1.0, ub=1.0)

# set the dynamics
mpc.set_linear_dynamics(Atilde, Btilde)

# set the objective
mpc.minimize(
    sum(cs.bilin(Qtilde, x[:, i]) + cs.bilin(Rtilde, u[:, i]) for i in range(N))
)

# initialize the solver with some options
opts = {
    "print_time": False,
    "bound_consistency": True,
    "calc_lam_x": True,
    "calc_lam_p": False,
    "ipopt": {"max_iter": 500, "sb": "yes", "print_level": 0},
}
mpc.init_solver(opts, solver="ipopt")
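When the input bounds are inactive, the MPC problem above is a finite-horizon LQR problem without terminal cost, so its optimal value can be cross-checked with a backward Riccati recursion. The sketch below does this with plain NumPy, independently of csnlp. The function name `finite_horizon_lqr` and all numerical values are hypothetical stand-ins for $\tilde{A}, \tilde{B}, \tilde{Q}, \tilde{R}$.

```python
import numpy as np


def finite_horizon_lqr(A, B, Q, R, N):
    """Backward Riccati recursion for sum_{i=0}^{N-1} x_i'Qx_i + u_i'Ru_i.

    Returns the gains K_0..K_{N-1} (with u_i = -K_i x_i) and P_0, such that the
    optimal unconstrained cost from x_0 is x_0' P_0 x_0.
    """
    P = np.zeros_like(Q)  # no terminal cost, so the cost-to-go starts at zero
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # the recursion runs backward in time
    return gains, P


# Hypothetical values, for illustration only
A = np.array([[1.0, 0.25], [0.0, 1.0]])
B = np.array([[0.03], [0.25]])
Q = np.eye(2)
R = np.array([[0.5]])
N = 10
x0 = np.array([0.2, -0.1])

gains, P0 = finite_horizon_lqr(A, B, Q, R, N)

# roll the closed loop forward and accumulate the stage costs
x, cost, inputs = x0.copy(), 0.0, []
for K in gains:
    u = -K @ x
    inputs.append(u)
    cost += float(x @ Q @ x + u @ R @ u)
    x = A @ x + B @ u

print("predicted optimal cost:", float(x0 @ P0 @ x0))
print("rollout cost:          ", cost)
print("bounds inactive:       ", all(abs(u[0]) <= 1.0 for u in inputs))
```

If the last line prints `True`, the value returned by the MPC solver for the same parameters and initial state should agree with the predicted cost up to solver tolerance; if it prints `False`, the bounds are active and the two values are expected to differ.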

Learning

The last step is to train the controller using an RL algorithm. For instance, here we use Q-Learning. The idea is to let the controller interact with the environment, observe the cost, and update the MPC parameters accordingly. This can be achieved by computing the temporal difference error

$$ \delta_k = L(s_k, a_k) + \gamma V_\theta(s_{k+1}) - Q_\theta(s_k, a_k), $$

where $\gamma$ is the discount factor, and $V_\theta$ and $Q_\theta$ are the state and state-action value functions, both provided by the parametrized MPC controller with $\theta = \{\tilde{A}, \tilde{B}, \tilde{Q}, \tilde{R}\}$. The update rule for the parameters is then given by

$$ \theta \gets \theta + \alpha \delta_k \nabla_\theta Q_\theta(s_k, a_k), $$

where $\alpha$ is the learning rate, and $\nabla_\theta Q_\theta(s_k, a_k)$ is the sensitivity of the state-action value function w.r.t. the parameters. All of this can be implemented as follows.

from mpcrl import LearnableParameter, LearnableParametersDict, LstdQLearningAgent
from mpcrl.optim import GradientDescent

# give some initial values to the learnable parameters (shapes must match!)
learnable_pars_init = {"Atilde": ..., "Btilde": ..., "Qtilde": ..., "Rtilde": ...}

# create the set of parameters that should be learnt
learnable_pars = LearnableParametersDict(
    (
        LearnableParameter(name, val.shape, val)
        for name, val in learnable_pars_init.items()
    )
)

# instantiate the learning agent
agent = LstdQLearningAgent(
    mpc=mpc,
    learnable_parameters=learnable_pars,
    discount_factor=...,  # a number in (0,1], e.g.,  1.0
    update_strategy=...,  # an integer, e.g., 1
    optimizer=GradientDescent(learning_rate=...),
    record_td_errors=True,
)

# finally, launch the training for 5000 timesteps. The method will return an array of
# (hopefully) decreasing costs
costs = agent.train(env=env, episodes=1, seed=69)
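To see the temporal difference error and the update rule from this section at work in isolation, the sketch below applies one update by hand. It deliberately does not use mpcrl: a simple quadratic function of a scalar state and action stands in for the MPC-based $Q_\theta$, and its analytic gradient plays the role of the MPC sensitivity. All function names and numbers are hypothetical.

```python
import numpy as np


def q_value(theta, s, a):
    """Stand-in approximator Q_theta(s, a) = th0*s^2 + th1*s*a + th2*a^2 (not an MPC)."""
    return theta[0] * s**2 + theta[1] * s * a + theta[2] * a**2


def q_gradient(s, a):
    """Gradient of Q_theta(s, a) w.r.t. theta; plays the role of the MPC sensitivity."""
    return np.array([s**2, s * a, a**2])


def greedy_action(theta, s):
    """argmin_a Q_theta(s, a) over [-1, 1], assuming theta[2] > 0 (convex in a)."""
    return float(np.clip(-theta[1] * s / (2.0 * theta[2]), -1.0, 1.0))


def v_value(theta, s):
    """V_theta(s) = min_a Q_theta(s, a)."""
    return q_value(theta, s, greedy_action(theta, s))


def td_update(theta, s, a, cost, s_next, gamma, alpha):
    """One step of theta <- theta + alpha * delta * grad_theta Q_theta(s, a)."""
    delta = cost + gamma * v_value(theta, s_next) - q_value(theta, s, a)
    return theta + alpha * delta * q_gradient(s, a), delta


theta = np.array([1.0, 0.0, 1.0])
s, a, cost, s_next = 1.0, 0.5, 2.0, 0.5  # one observed transition
theta_new, delta = td_update(theta, s, a, cost, s_next, gamma=1.0, alpha=0.1)
print(delta, theta_new)
```

Here the observed cost is larger than what the approximator predicted, so the TD error is positive and the update raises $Q_\theta(s_k, a_k)$ towards the observed target.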

Examples

Our examples subdirectory contains examples of how to use the library on an academic, small-scale application (a small linear time-invariant (LTI) system), tackled with on-policy Q-learning, off-policy Q-learning and DPG. While the aforementioned algorithms are all gradient-based, we also provide an example of how to use Bayesian Optimization (BO) [6] to tune the MPC parameters in a gradient-free way.

License

The repository is provided under the MIT License. See the LICENSE file included with this repository.

Author

Filippo Airaldi, PhD Candidate [f.airaldi@tudelft.nl | filippoairaldi@gmail.com]

Delft Center for Systems and Control in Delft University of Technology

Copyright notice: Technische Universiteit Delft hereby disclaims all copyright interest in the program “mpcrl” (Reinforcement Learning with Model Predictive Control) written by the Author(s). Prof. Dr. Ir. Fred van Keulen, Dean of ME.

References

[1] Sutton, R.S. and Barto, A.G. (2018). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

[2] Rawlings, J.B., Mayne, D.Q. and Diehl, M. (2017). Model Predictive Control: theory, computation, and design (Vol. 2). Madison, WI: Nob Hill Publishing.

[3] Gros, S. and Zanon, M. (2020). Data-Driven Economic NMPC Using Reinforcement Learning. IEEE Transactions on Automatic Control, 65(2), 636-648.

[4] Esfahani, H.N., Kordabad, A.B. and Gros, S. (2021). Approximate Robust NMPC using Reinforcement Learning. European Control Conference (ECC), 132-137.

[5] Cai, W., Kordabad, A.B., Esfahani, H.N., Lekkas, A.M. and Gros, S. (2021). MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles. 60th IEEE Conference on Decision and Control (CDC), 2990-2995.

[6] Garnett, R. (2023). Bayesian Optimization. Cambridge University Press.

[7] Gros, S. and Zanon, M. (2022). Learning for MPC with stability & safety guarantees. Automatica, 146, 110598.

[8] Zanon, M. and Gros, S. (2021). Safe Reinforcement Learning Using Robust MPC. IEEE Transactions on Automatic Control, 66(8), 3638-3652.
