Datasets for Offline Safe Reinforcement Learning

Project description


Requires Python 3.8+.


DSRL (Datasets for Safe Reinforcement Learning) provides a rich collection of datasets specifically designed for offline Safe Reinforcement Learning (RL). Created with the objective of fostering progress in offline safe RL research, DSRL bridges a crucial gap in the availability of safety-centric public benchmarks and datasets.

DSRL provides:

  1. Diverse datasets: 38 datasets across different safe RL environments and difficulty levels in SafetyGymnasium, BulletSafetyGym, and MetaDrive, all prepared with safety considerations.
  2. Consistent API with D4RL: For easy use and evaluation of offline learning methods.
  3. Data post-processing filters: Allowing alteration of data density, noise level, and reward distributions to simulate various data collection conditions.
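To make the idea of post-processing filters concrete, here is a hedged sketch in plain NumPy of what a density filter and a noise filter might do to a dataset dictionary. The function names (`subsample_dataset`, `add_observation_noise`) are hypothetical illustrations, not the DSRL filter API; see `env.pre_process_data` in the usage example below for the actual entry point.

```python
import numpy as np

def subsample_dataset(dataset, keep_fraction, seed=0):
    """Hypothetical density filter: keep a random fraction of transitions."""
    rng = np.random.default_rng(seed)
    n = len(dataset["rewards"])
    idx = rng.choice(n, size=int(n * keep_fraction), replace=False)
    return {k: v[idx] for k, v in dataset.items()}

def add_observation_noise(dataset, scale, seed=0):
    """Hypothetical noise filter: add Gaussian noise to observations."""
    rng = np.random.default_rng(seed)
    noisy = dict(dataset)
    noisy["observations"] = dataset["observations"] + rng.normal(
        0.0, scale, size=dataset["observations"].shape)
    return noisy

# Toy dataset with the same keys DSRL uses
data = {
    "observations": np.zeros((100, 4)),
    "actions": np.zeros((100, 2)),
    "rewards": np.ones(100),
}
small = subsample_dataset(data, keep_fraction=0.5)
print(small["rewards"].shape)  # (50,)
```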

This package is a part of a comprehensive benchmarking suite that includes FSRL and OSRL and aims to promote advancements in the development and evaluation of safe learning algorithms.

To learn more, please visit our project website.

Installation

Pull this repo and install:

git clone https://github.com/liuzuxin/DSRL.git
cd DSRL
# install bullet_safety_gym only (by default)
pip install -e .
# install mujoco-based safety_gymnasium
pip install -e .[mujoco]
# install metadrive
pip install -e .[metadrive]
# install all in once
pip install -e .[all]

How to use DSRL

DSRL uses the Gymnasium API. Tasks are created via the gymnasium.make function. Each task is associated with a fixed offline dataset, which can be obtained with the env.get_dataset() method. This method returns a dictionary with:

  • observations: An N × obs_dim array of observations.
  • next_observations: An N × obs_dim array of next observations.
  • actions: An N × act_dim array of actions.
  • rewards: An N dimensional array of rewards.
  • costs: An N dimensional array of costs.
  • terminals: An N dimensional array of episode termination flags. This is true when episodes end due to termination conditions such as falling over.
  • timeouts: An N dimensional array of timeout flags. This is true when episodes end due to reaching the maximum episode length.
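A common first step with such a dictionary is recovering episode boundaries: an episode ends wherever either terminals or timeouts is true. A minimal sketch with synthetic flag arrays (stand-ins for `dataset['terminals']` and `dataset['timeouts']`):

```python
import numpy as np

# Synthetic stand-ins for dataset['terminals'] and dataset['timeouts']
terminals = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0], dtype=bool)
timeouts  = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 1], dtype=bool)

done = terminals | timeouts
ends = np.flatnonzero(done)                    # indices where an episode ends
starts = np.concatenate(([0], ends[:-1] + 1))  # each episode starts after the last end
episodes = list(zip(starts, ends))
print(episodes)  # [(0, 2), (3, 5), (6, 9)]
```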

The usage is similar to D4RL. Here is an example code:

import gymnasium as gym
import dsrl

# Create the environment
env = gym.make('OfflineCarCircle-v0')

# Each task is associated with a dataset
# dataset contains observations, next_observations, actions, rewards, costs, terminals, timeouts
dataset = env.get_dataset()
print(dataset['observations']) # An N x obs_dim Numpy array of observations

# dsrl follows the Gymnasium interface
obs, info = env.reset()
obs, reward, terminal, timeout, info = env.step(env.action_space.sample())
cost = info["cost"]

# Apply dataset filters [optional]
# dataset = env.pre_process_data(dataset, filter_cfgs)

Datasets are automatically downloaded to the ~/.dsrl/datasets directory when get_dataset() is called. If you would like to change the location of this directory, you can set the $DSRL_DATASET_DIR environment variable to the directory of your choosing, or pass the dataset filepath directly to the get_dataset method.
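For example, to redirect dataset downloads to a custom cache directory (the path below is illustrative):

```shell
# Point DSRL at a custom dataset cache before launching Python
export DSRL_DATASET_DIR="$HOME/data/dsrl_datasets"
echo "$DSRL_DATASET_DIR"
```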

You can run the following example scripts to play with the offline datasets of all the supported environments:

python examples/run_mujoco.py --agent [your_agent] --task [your_task]
python examples/run_bullet.py --agent [your_agent] --task [your_task]
python examples/run_metadrive.py --road [your_road] --traffic [your_traffic] 

Normalizing Scores

  • Set the target cost with the env.set_target_cost(target_cost) function, where target_cost is the undiscounted sum of costs allowed per episode.
  • Use the env.get_normalized_score(return, cost_return) function to compute the normalized reward and cost for an episode, where return and cost_return are the undiscounted sums of rewards and costs, respectively, of that episode.
  • The per-task min and max reference returns are stored in dsrl/infos.py for reference.
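The normalization itself can be sketched in a few lines. This is a hedged illustration of the D4RL-style convention (reward mapped onto the min/max reference returns, cost scaled by the target budget); the exact per-task reference values live in dsrl/infos.py, and the numbers below are made up:

```python
# Sketch of D4RL-style score normalization; the reference returns and the
# cost convention here are illustrative assumptions, not DSRL's exact code.
def normalized_reward(ret, ret_min, ret_max):
    """Map an episode return onto [0, 1] relative to reference returns."""
    return (ret - ret_min) / (ret_max - ret_min)

def normalized_cost(cost_return, target_cost, eps=1e-8):
    """Scale an episode's cost return by the target cost budget."""
    return cost_return / (target_cost + eps)

print(normalized_reward(300.0, ret_min=0.0, ret_max=400.0))  # 0.75
print(normalized_cost(5.0, target_cost=10.0))
```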

License

All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dsrl-0.1.0.tar.gz (22.8 kB)

Uploaded Source

Built Distribution

dsrl-0.1.0-py3-none-any.whl (21.9 kB)

Uploaded Python 3

File details

Details for the file dsrl-0.1.0.tar.gz.

File metadata

  • Download URL: dsrl-0.1.0.tar.gz
  • Upload date:
  • Size: 22.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for dsrl-0.1.0.tar.gz:

  • SHA256: 69bc8b3c6130285a0dc82a77521af7b9a47af80c62b2c47b51a0007f78e3f8c1
  • MD5: 2d7f713bdb3f68d04d85e26db56c2f42
  • BLAKE2b-256: 275d995bbda71be452e99f871c2ad37f3eb1cfac0098434a2c97859e2b8c19a8

See more details on using hashes here.

File details

Details for the file dsrl-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dsrl-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for dsrl-0.1.0-py3-none-any.whl:

  • SHA256: 1d2f4b019fb884b29a783e53eb03a765e4f080e809c9055a367beab9b875baea
  • MD5: 03f201f30084cccb78ec3560694d9c8a
  • BLAKE2b-256: 0b0778733a92f5181a50bc6cbfc0fc0ca112d3b0f57a3c1f54ec5ef5d8aac331

See more details on using hashes here.
