tf-mdp

Probabilistic planning in continuous state-action MDPs using TensorFlow.

tf-mdp is an implementation based on the paper:

Thiago P. Bueno; Leliane N. de Barros; Denis D. Mauá; Scott Sanner
Deep Reactive Policies for Planning in Stochastic Nonlinear Domains
In AAAI, 2019.

Quickstart

tf-mdp is a Python 3.6+ package available on PyPI.

$ pip3 install tf-mdp

Please make sure you have a working TensorFlow installation on your system before pip-installing this package.
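As a quick pre-install check, the following minimal sketch reports whether a TensorFlow package is importable in the current environment. The helper name tensorflow_available is hypothetical and not part of tf-mdp; the check only confirms that a package named tensorflow can be located, not that its version is compatible with tf-mdp.

```python
import importlib.util


def tensorflow_available() -> bool:
    """Return True if a top-level 'tensorflow' package can be located.

    Hypothetical helper (not part of tf-mdp). It does not import
    TensorFlow and does not check the installed version.
    """
    return importlib.util.find_spec("tensorflow") is not None


if __name__ == "__main__":
    if tensorflow_available():
        print("TensorFlow found; you can now run: pip3 install tf-mdp")
    else:
        print("TensorFlow not found; install it before installing tf-mdp")
```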

Features

tf-mdp solves discrete-time continuous state-action MDPs.

The domains/instances are specified using the RDDL language.

It is built on the following packages from the Python3 RDDL toolkit:

  • pyrddl: RDDL lexer/parser.
  • rddlgym: A toolkit for working with RDDL domains.
  • rddl2tf: RDDL2TensorFlow compiler.
  • tf-rddlsim: An RDDL simulator running in TensorFlow.

Please refer to each project's documentation for further details.

Usage

$ tfmdp --help

usage: tfmdp [-h] [-l LAYERS [LAYERS ...]]
             [-a {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}]
             [-iln] [-b BATCH_SIZE] [-hr HORIZON] [-e EPOCHS]
             [-lr LEARNING_RATE]
             [-opt {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}]
             [-lfn {linear,mse}] [-ld LOGDIR] [-v]
             rddl

Probabilistic planning in continuous state-action MDPs using TensorFlow.

positional arguments:
  rddl                  RDDL file or rddlgym domain id

optional arguments:
  -h, --help            show this help message and exit
  -l LAYERS [LAYERS ...], --layers LAYERS [LAYERS ...]
                        number of units in each hidden layer in policy network
  -a {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}, --activation {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}
                        activation function for hidden layers in policy
                        network
  -iln, --input-layer-norm
                        input layer normalization flag
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        number of trajectories in a batch (default=256)
  -hr HORIZON, --horizon HORIZON
                        number of timesteps (default=40)
  -e EPOCHS, --epochs EPOCHS
                        number of timesteps (default=200)
  -lr LEARNING_RATE, --learning-rate LEARNING_RATE
                        optimizer learning rate (default=0.001)
  -opt {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}, --optimizer {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}
                        loss optimizer (default=RMSProp)
  -lfn {linear,mse}, --loss-fn {linear,mse}
                        loss function (default=linear)
  -ld LOGDIR, --logdir LOGDIR
                        log directory for data summaries (default=/tmp/tfmdp)
  -v, --verbose         verbosity mode
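To make the option set easier to explore from Python, here is a minimal argparse sketch that mirrors the flags, choices, and defaults listed in the help text above. It is a reconstruction for illustration only, not the actual tf-mdp source: the function name build_parser, the int/float argument types, and the absence of defaults for --layers and --activation (the help text states none) are assumptions.

```python
import argparse

# Choices copied from the help text above.
ACTIVATIONS = ["none", "sigmoid", "tanh", "relu", "relu6", "crelu",
               "elu", "selu", "softplus", "softsign"]
OPTIMIZERS = ["Adadelta", "Adagrad", "Adam", "GradientDescent",
              "ProximalGradientDescent", "ProximalAdagrad", "RMSProp"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented tfmdp options."""
    parser = argparse.ArgumentParser(
        prog="tfmdp",
        description="Probabilistic planning in continuous state-action "
                    "MDPs using TensorFlow.")
    parser.add_argument("rddl", help="RDDL file or rddlgym domain id")
    # Types below are assumptions; the help text does not state them.
    parser.add_argument("-l", "--layers", nargs="+", type=int)
    parser.add_argument("-a", "--activation", choices=ACTIVATIONS)
    parser.add_argument("-iln", "--input-layer-norm", action="store_true")
    parser.add_argument("-b", "--batch-size", type=int, default=256)
    parser.add_argument("-hr", "--horizon", type=int, default=40)
    parser.add_argument("-e", "--epochs", type=int, default=200)
    parser.add_argument("-lr", "--learning-rate", type=float, default=0.001)
    parser.add_argument("-opt", "--optimizer", choices=OPTIMIZERS,
                        default="RMSProp")
    parser.add_argument("-lfn", "--loss-fn", choices=["linear", "mse"],
                        default="linear")
    parser.add_argument("-ld", "--logdir", default="/tmp/tfmdp")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


if __name__ == "__main__":
    # Parse a sample command line; options left out keep their defaults.
    sample = "Reservoir-20 -l 2048 -iln -a elu -lfn mse -v".split()
    print(build_parser().parse_args(sample))
```

Parsing a command line this way shows which settings fall back to their defaults, for example the RMSProp optimizer when -opt is omitted.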

Examples

$ tfmdp Reservoir-20 -l 2048 -iln -a elu -b 256 -hr 40 -e 200 -lr 0.001 -lfn mse -v

Running tf-mdp v0.5.4 ...

>> RDDL:   Reservoir-20
>> logdir: /tmp/tfmdp

>> Policy Net:
layers = [2048]
activation = elu
input  layer norm = True

>> Hyperparameters:
epochs        = 200
learning rate = 0.001
batch size    = 256
horizon       = 40

>> Optimization:
optimizer     = RMSProp
loss function = mse

>> Loading model ...
Done in 0.018952 sec.

>> Optimizing...
2021-06-23 22:56:18.873731: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:56:18.895765: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:56:18.896462: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x46628b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:56:18.896514: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch   199: loss = 1201677952.000000
Done in 28.525183 sec.

>> Performance:
total reward = -3653.9695, reward per timestep = -91.3492

$ tfmdp HVAC-3 -l 256 128 64 32 -iln -a elu -b 256 -hr 40 -e 200 -lr 0.0001 -lfn mse -v

Running tf-mdp v0.5.4 ...

>> RDDL:   HVAC-3
>> logdir: /tmp/tfmdp

>> Policy Net:
layers = [256,128,64,32]
activation = elu
input  layer norm = True

>> Hyperparameters:
epochs        = 200
learning rate = 0.0001
batch size    = 256
horizon       = 40

>> Optimization:
optimizer     = RMSProp
loss function = mse

>> Loading model ...
Done in 0.017646 sec.

>> Optimizing...
2021-06-23 22:54:05.766434: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:54:05.787832: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:54:05.788607: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x49a4d00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:54:05.788690: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch   199: loss = 103798661120.0000000
Done in 15.748765 sec.

>> Performance:
total reward = -315724.4688, reward per timestep = -7893.1117

$ tfmdp Navigation-v2 -l 256 128 64 32 -a elu -b 128 -hr 20 -e 200 -lr 0.001 -lfn mse -v

Running tf-mdp v0.5.4 ...

>> RDDL:   Navigation-v2
>> logdir: /tmp/tfmdp

>> Policy Net:
layers = [256,128,64,32]
activation = elu
input  layer norm = False

>> Hyperparameters:
epochs        = 200
learning rate = 0.001
batch size    = 128
horizon       = 20

>> Optimization:
optimizer     = RMSProp
loss function = mse

>> Loading model ...
Done in 0.012209 sec.

>> Optimizing...
2021-06-23 22:50:59.732002: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:50:59.751959: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:50:59.752494: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5bc6a20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:50:59.752514: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch   199: loss = 6452.3613285
Done in 6.466699 sec.

>> Performance:
total reward = -78.3427, reward per timestep = -3.9171
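
In the three runs above, the reported reward per timestep appears to be the total reward divided by the horizon. The sketch below parses the ">> Performance" lines and checks that relationship. The regular expression and the helper name parse_performance are illustrative assumptions based only on the output shown here, not a documented tf-mdp interface.

```python
import re

# Pattern inferred from the ">> Performance" lines shown above.
PERFORMANCE_RE = re.compile(
    r"total reward = (?P<total>-?\d+(?:\.\d+)?), "
    r"reward per timestep = (?P<per_step>-?\d+(?:\.\d+)?)"
)


def parse_performance(line):
    """Return (total_reward, reward_per_timestep) parsed from a log line."""
    match = PERFORMANCE_RE.search(line)
    if match is None:
        raise ValueError("not a performance line: %r" % line)
    return float(match["total"]), float(match["per_step"])


# (domain id, horizon passed via -hr, performance line) from the runs above.
RUNS = [
    ("Reservoir-20", 40,
     "total reward = -3653.9695, reward per timestep = -91.3492"),
    ("HVAC-3", 40,
     "total reward = -315724.4688, reward per timestep = -7893.1117"),
    ("Navigation-v2", 20,
     "total reward = -78.3427, reward per timestep = -3.9171"),
]

for name, horizon, line in RUNS:
    total, per_step = parse_performance(line)
    # The logged value is rounded to 4 decimals, so compare with a tolerance.
    assert abs(total / horizon - per_step) < 1e-3
    print("%s: %.4f / %d = %.4f" % (name, total, horizon, total / horizon))
```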

Documentation

Please refer to https://tf-mdp.readthedocs.io/ for the code documentation.

Support

If you are having issues with tf-mdp, please let me know at: thiago.pbueno@gmail.com.

License

Copyright (c) 2018-2021 Thiago Pereira Bueno. All Rights Reserved.

tf-mdp is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

tf-mdp is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with tf-mdp. If not, see http://www.gnu.org/licenses/.
