Dynamic Programming with Neural Networks (nndp)
By: Marc de la Barrera i Bardalet, Tim de Silva
Overview
nndp provides a framework for solving finite-horizon dynamic programming problems with neural networks, implemented in the JAX functional programming paradigm using Haiku. This solution technique, introduced and described in detail by Duarte, Fonseca, Goodman, and Parker (2021), applies to problems of the following form:
$$V(s_0)=\max_{a_t\in\Gamma(s_t)} E_0\left[\sum_{t=0}^T u(s_t,a_t)\right],$$
$$s_{t+1}=m(s_{t},a_{t},\epsilon_t), $$
$$s_0 \sim F(\cdot).$$
The state vector is denoted by $s_t=(k_t, x_t)$, where $k_t$ are exogenous states and $x_t$ are endogenous states. We adopt the convention that the first exogenous state in $k_t$ is $t$. The goal is to find a policy function $\pi(s_t)$ that satisfies:
$$\hat V(s_0,\pi)=E_0\left[\sum_{t=0}^T u(s_t,\pi(s_t))\right],$$
$$s_{t+1}=m(s_{t},\pi(s_{t}),\epsilon_t),$$
$$V(s_0)=\hat V(s_0,\pi)\quad \forall s_0.$$
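For concreteness, a stylized consumption-savings (income fluctuations) problem fits this template as follows; the functional forms and shock process below are illustrative assumptions, not necessarily the exact specification used in the package's example notebook:

$$u(s_t,a_t)=\beta^t\frac{c_t^{1-\gamma}}{1-\gamma},\qquad s_t=(t,w_t),\quad a_t=c_t,$$

$$m(s_t,a_t,\epsilon_t)=\bigl(t+1,\;R\,(w_t-c_t)+e^{\epsilon_t}\bigr),\qquad \Gamma(s_t)=[0,w_t],\qquad \epsilon_t\sim N(0,\sigma^2).$$

Note that discounting enters through $u$ itself (via $\beta^t$), which is possible because $t$ is included as the first exogenous state.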
We parametrize $\pi(s_t)=\tilde\pi(s_t,\theta)$ as a fully connected feedforward neural network and update the network's parameters, $\theta$, using stochastic gradient descent. To use this framework, the user only needs to write the following functions, which are defined by the dynamic programming problem of interest (a sketch for a stylized problem follows the list):
- `u(state, action)`: reward function for $s_t$ = `state` and $a_t$ = `action`.
- `m(key, state, action)`: state evolution equation for $s_{t+1}$ if $s_t$ = `state` and $a_t$ = `action`. `key` is a JAX RNG key used to simulate any shocks present in the model.
- `Gamma(state)`: defines the set of possible actions, $a_t$, at $s_t$ = `state`.
- `F(key, N)`: samples `N` observations from the distribution of $s_0$. `key` is a JAX RNG key used to simulate any shocks present in the model.
- `nn_to_action(state, params, nn)`: defines how the output of a Haiku neural network, `nn`, with parameters `params`, is mapped into an action at $s_t$ = `state`.
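Below is a minimal sketch of what these functions might look like for the stylized consumption-savings problem above. The function names and argument names match the interface listed above; everything else (the state layout, the return convention for `Gamma`, parameter values, and the assumption that `nn` is a `hk.without_apply_rng(hk.transform(...))` object) is an illustrative assumption, not part of the package's API.

```python
import jax
import jax.numpy as jnp

# Illustrative parameters (assumptions, not package defaults).
T = 10       # terminal period
BETA = 0.96  # discount factor
R = 1.03     # gross interest rate

# Assumed convention: state = [t, wealth], action = [consumption share of wealth].

def u(state, action):
    # Discounted log utility of consumption; discounting enters via beta**t.
    t, wealth = state[0], state[1]
    consumption = action[0] * wealth
    return BETA ** t * jnp.log(consumption)

def m(key, state, action):
    # Next state: time advances, savings earn interest, plus a lognormal income shock.
    t, wealth = state[0], state[1]
    savings = (1.0 - action[0]) * wealth
    income = jnp.exp(0.1 * jax.random.normal(key))
    return jnp.array([t + 1.0, R * savings + income])

def Gamma(state):
    # Feasible actions: consume between 0% and 100% of wealth
    # (returned here as [lower, upper] bounds -- an assumed convention).
    return jnp.array([[0.0, 1.0]])

def F(key, N):
    # Initial states: t = 0 and wealth drawn uniformly from [1, 2].
    wealth0 = jax.random.uniform(key, (N, 1), minval=1.0, maxval=2.0)
    return jnp.hstack([jnp.zeros((N, 1)), wealth0])

def nn_to_action(state, params, nn):
    # Map the network output to a consumption share in (0, 1) with a sigmoid.
    # Assumes nn = hk.without_apply_rng(hk.transform(...)), so apply(params, x).
    return jax.nn.sigmoid(nn.apply(params, state))
```

See docs/source/notebooks/income_fluctuations/main.ipynb for the package's own, complete specification.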
We provide an example application to the income fluctuations problem in docs/source/notebooks/income_fluctuations/main.ipynb to illustrate how this framework can be used.
Installation
nndp requires JAX and Haiku to be installed. To install with pip, run `pip install nndp`.
References
Duarte, Victor, Julia Fonseca, Aaron Goodman, and Jonathan A. Parker (2021), "Simple Allocation Rules and Optimal Portfolio Choice Over the Lifecycle," Working Paper.
File details
Details for the file nndp-0.1.0.tar.gz.
File metadata
- Download URL: nndp-0.1.0.tar.gz
- Upload date:
- Size: 11.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce6140ad892038f4a0c5224434d0789739800154247511dc74296c98be98c246 |
| MD5 | 630b5ff8632effd043b9365621f96a68 |
| BLAKE2b-256 | 413959302e675fd88794d646415a66240d5f22b7e3eeec63993cf482d85e2364 |
File details
Details for the file nndp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nndp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c70c238152b75ac3481eb690bb36a67b27c687daeefc60534a095ee6b4d09dc9 |
| MD5 | a201ee62d4520c3d37d40aa1a1d2f4e4 |
| BLAKE2b-256 | bd961c9497af81fa2f906896e843f854d78152a9b184801b0330843d1a013c67 |