Elegant Implementations of Offline Safe Reinforcement Learning Algorithms
OSRL (Offline Safe Reinforcement Learning) offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions.
The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes DSRL and FSRL, and is built to facilitate the development of robust and reliable offline safe RL solutions.
To learn more, please visit our project website.
Structure
The structure of this repo is as follows:
├── examples
│   ├── configs  # the training configs of each algorithm
│   ├── eval     # the evaluation scripts
│   ├── train    # the training scripts
├── osrl
│   ├── algorithms  # offline safe RL algorithms
│   ├── common      # base networks and utils
The implemented offline safe RL and imitation learning algorithms include:
Algorithm | Type | Description |
---|---|---|
BCQ-Lag | Q-learning | BCQ with PID Lagrangian |
BEAR-Lag | Q-learning | BEAR with PID Lagrangian |
CPQ | Q-learning | Constraints Penalized Q-learning |
COptiDICE | Distribution Correction Estimation | Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation |
CDT | Sequential Modeling | Constrained Decision Transformer |
BC-All | Imitation Learning | Behavior Cloning with all datasets |
BC-Safe | Imitation Learning | Behavior Cloning with safe trajectories |
BC-Frontier | Imitation Learning | Behavior Cloning with high-reward trajectories |
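The "PID Lagrangian" entries in the table refer to adjusting a Lagrange multiplier on the cost constraint with a PID controller. The following is a minimal sketch of that idea, not OSRL's own class: the name `PIDLagrangian`, the gains, and the `cost_limit` default are illustrative assumptions.

```python
class PIDLagrangian:
    """Illustrative PID controller for a Lagrange multiplier (hypothetical, not OSRL's API)."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.01, cost_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit  # assumed episode cost threshold
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, episode_cost):
        """Return a non-negative multiplier given the latest estimated episode cost."""
        error = episode_cost - self.cost_limit
        # Integral term accumulates violations and is clipped at zero.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only when the violation is growing.
        derivative = max(0.0, error - self.prev_error)
        self.prev_error = error
        multiplier = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, multiplier)


controller = PIDLagrangian(kp=1.0, ki=0.0, kd=0.0, cost_limit=10.0)
print(controller.update(15.0))  # cost above the limit: positive penalty weight
print(controller.update(5.0))   # cost below the limit: weight clipped to zero
```

In a Lagrangian method the returned multiplier would weight the cost term in the policy objective, so the penalty grows while the constraint is violated and relaxes once it is satisfied.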
Installation
Pull the repo and install:
git clone https://github.com/liuzuxin/OSRL.git
cd OSRL
pip install -e .
pip install OApackage==2.7.6
How to use OSRL
Example usage is in the examples folder, where you can find the training and evaluation scripts for all the algorithms.
All parameters and their default values for each algorithm are available in the examples/configs folder.
OSRL uses the WandbLogger from FSRL. The offline datasets and offline environments are provided by DSRL, so make sure you install both packages first.
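A quick way to confirm that the dependencies are in place before training is to check that the packages can be found. This is a small sketch using only the standard library; the import names `osrl`, `dsrl`, and `fsrl` are assumed to match the project names.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names below are assumptions based on the project names.
missing = missing_packages(["osrl", "dsrl", "fsrl"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```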
Training
For example, to train the bcql method, run the training script and override the default parameters as needed:
python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...
By default, the config file and the training logs are written to the logs/ folder, and the training plots can be viewed online using Wandb.
You can also launch a sequence of experiments, or run them in parallel, via the EasyRunner package; see examples/train_all_tasks.py for details.
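As a rough illustration of launching several training runs in sequence, the sketch below builds one command per task with the standard library's `subprocess` module. It does not use EasyRunner and is not the content of examples/train_all_tasks.py. The helper names, the `param1` override, and the second task name are placeholders; check DSRL for the tasks it actually registers.

```python
import subprocess
import sys


def build_command(script, task, **overrides):
    """Build the argument list for one training run."""
    cmd = [sys.executable, script, "--task", task]
    for key, value in overrides.items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_all(script, tasks, dry_run=True, **overrides):
    """Run (or just print, when dry_run is True) one training command per task."""
    commands = [build_command(script, task, **overrides) for task in tasks]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


# Task names other than OfflineCarCircle-v0 are assumptions.
run_all(
    "examples/train/train_bcql.py",
    ["OfflineCarCircle-v0", "OfflineAntRun-v0"],
    dry_run=True,
    param1="args1",
)
```

With `dry_run=True` the commands are only printed, which makes it easy to inspect them before starting long training jobs.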
Evaluation
To evaluate a trained agent, for example a BCQ agent, run:
python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20
It will load the config file from path_to_model/config.yaml and the model file from path_to_model/checkpoints/model.pt, run 20 episodes, and print the average normalized reward and cost.
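The file layout and the averaging step described above can be sketched as follows. This is not the evaluation script itself. The normalization formulas (min-max scaling of the return and division of the cost by a threshold) are assumptions for illustration, and the function names and reference values are hypothetical.

```python
from pathlib import Path
from statistics import mean


def resolve_run_files(path_to_model):
    """Return the (config, checkpoint) paths expected under a run directory."""
    root = Path(path_to_model)
    return root / "config.yaml", root / "checkpoints" / "model.pt"


def normalized_reward(episode_return, min_return, max_return):
    # Assumed min-max scaling; the reference returns are hypothetical inputs.
    return (episode_return - min_return) / (max_return - min_return)


def normalized_cost(episode_cost, cost_limit):
    # Assumed scaling by the cost threshold; values <= 1 satisfy the limit.
    return episode_cost / cost_limit


def summarize(returns, costs, min_return, max_return, cost_limit):
    """Average the normalized reward and cost over all evaluation episodes."""
    rewards = [normalized_reward(r, min_return, max_return) for r in returns]
    scaled_costs = [normalized_cost(c, cost_limit) for c in costs]
    return mean(rewards), mean(scaled_costs)


config_path, model_path = resolve_run_files("path_to_model")
print(config_path.as_posix(), model_path.as_posix())
```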