
ReplayBuffer for Reinforcement Learning written in C++ and Cython



Overview

cpprb is a Python (CPython) module providing replay buffer classes for reinforcement learning.

The main target users are researchers and library developers.

You can build your own reinforcement learning algorithms together with your favorite deep learning library (e.g. TensorFlow, PyTorch).

cpprb focuses on speed, flexibility, and memory efficiency.

By utilizing Cython, complicated calculations (e.g. segment tree for prioritized experience replay) are offloaded onto C++. (The name cpprb comes from "C++ Replay Buffer".)
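To make the segment tree idea concrete, here is a minimal pure-Python sketch of a sum tree as commonly used for prioritized experience replay. It is an illustration only and not cpprb's C++ implementation; the class name SumTree and its methods are hypothetical.

```python
import random


class SumTree:
    """Array-backed sum segment tree over `capacity` leaf priorities (hypothetical sketch)."""

    def __init__(self, capacity):
        # Round up to a power of two so the tree is a perfect binary tree.
        self.n = 1
        while self.n < capacity:
            self.n *= 2
        # Node i has children 2*i and 2*i + 1; leaves live at [n, 2n).
        self.tree = [0.0] * (2 * self.n)

    def update(self, index, priority):
        """Set one leaf priority and refresh the sums on the path to the root."""
        i = index + self.n
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        """Sum of all priorities (stored at the root)."""
        return self.tree[1]

    def find_prefix(self, value):
        """Return the leaf index whose cumulative priority range contains `value`."""
        i = 1
        while i < self.n:
            left = 2 * i
            if value < self.tree[left]:
                i = left
            else:
                value -= self.tree[left]
                i = left + 1
        return i - self.n

    def sample(self, rng=random):
        """Draw one index with probability proportional to its priority."""
        return self.find_prefix(rng.random() * self.total())
```

Both update and find_prefix touch one node per tree level, so each costs O(log N) instead of the O(N) needed to rescan a flat priority array. That per-step cost is the kind of work that benefits from being moved out of Python.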

The API was initially modeled on OpenAI Baselines' implementation. The current version of cpprb is much more flexible. Any number of values of any NumPy-compatible type can be stored (as long as memory capacity is sufficient). For example, you can also store the next action and the next next observation.

Installation

cpprb requires the following software before installation.

  • C++17 compiler (for installation from source)
  • Python 3
  • pip

Currently, clang, the default Xcode C/C++ compiler on Apple macOS, cannot compile cpprb.

If you are a macOS user, you need to install GCC and set the environment variables CC and CXX to g++, or use a virtual environment (e.g. Docker). Step-by-step installation is described here.

Additionally, here is helpful user feedback on installation on macOS and Ubuntu. (Thanks!)

Install from PyPI (Recommended)

The following command installs cpprb together with its dependencies.

pip install cpprb

Depending on your environment, you might need sudo or the --user flag for installation.

On supported platforms (Linux x86-64 and Windows amd64), binary packages hosted on PyPI are used, so you don't need a C++ compiler. On other platforms, such as macOS and 32-bit or ARM-based Linux and Windows, no binary package is available and you need to compile cpprb yourself. Please be patient; we plan to support more platforms in the future.

If you have any trouble installing from a binary package, you can fall back to source installation by passing the --no-binary option to the above pip command. (To avoid building NumPy from source as well, it is better to install NumPy beforehand.)

pip install numpy
pip install --no-binary cpprb cpprb

Install from source code

First, download the source code manually or clone the repository:

git clone https://gitlab.com/ymd_h/cpprb.git

Then you can install it in the same way:

cd cpprb
pip install .

This installation converts the Cython sources (.pyx) to C++ (.cpp) during the build, so it takes longer than installation from PyPI.

Usage

Basic Usage

Basic usage follows these steps:

  1. Create replay buffer (ReplayBuffer.__init__)
  2. Add transitions (ReplayBuffer.add)
    1. Reset at episode end (ReplayBuffer.on_episode_end)
  3. Sample transitions (ReplayBuffer.sample)

Example Code

Here is a simple example that stores standard transitions (i.e. obs, act, rew, next_obs, and done).

import numpy as np
from cpprb import ReplayBuffer

buffer_size = 256
obs_shape = 3
act_dim = 1

# Declare every stored value (name and shape) when creating the buffer
rb = ReplayBuffer(buffer_size,
                  env_dict={"obs": {"shape": obs_shape},
                            "act": {"shape": act_dim},
                            "rew": {},
                            "next_obs": {"shape": obs_shape},
                            "done": {}})

# Dummy transition values for illustration
obs = np.ones(shape=(obs_shape))
act = np.ones(shape=(act_dim))
rew = 0
next_obs = np.ones(shape=(obs_shape))
done = 0

for i in range(500):
    # Values must be passed by keyword, matching the keys of env_dict
    rb.add(obs=obs, act=act, rew=rew, next_obs=next_obs, done=done)

    if done:
        # Together with resetting environment, call ReplayBuffer.on_episode_end()
        rb.on_episode_end()

batch_size = 32
sample = rb.sample(batch_size)
# sample is a dictionary whose keys are 'obs', 'act', 'rew', 'next_obs', and 'done'

Construction Parameters

(See also API reference)

| Name | Type | Optional | Description |
|---|---|---|---|
| size | int | No | Buffer size |
| env_dict | dict | Yes (but the buffer is unusable without it) | Environment definition (See here) |
| next_of | str or array-like of str | Yes | Memory compression (See here) |
| stack_compress | str or array-like of str | Yes | Memory compression (See here) |
| default_dtype | numpy.dtype | Yes | Fallback data type |
| Nstep | dict | Yes | Nstep configuration (See here) |
| mmap_prefix | str | Yes | mmap file prefix (See here) |

Notes

Flexible environment values are defined by env_dict when the buffer is created. Details are described in the documentation.

Since the stored values have flexible names, you have to pass them to ReplayBuffer.add as keyword arguments.

Features

cpprb provides buffer classes for building the following algorithms.

| Algorithm | cpprb class | Paper |
|---|---|---|
| Experience Replay | ReplayBuffer | L. J. Lin |
| Prioritized Experience Replay | PrioritizedReplayBuffer | T. Schaul et al. |
| Multi-step (Nstep) Learning | ReplayBuffer, PrioritizedReplayBuffer | |
| Multiprocess Learning (Ape-X) | MPReplayBuffer, MPPrioritizedReplayBuffer | D. Horgan et al. |
| Large Batch Experience Replay (LaBER) | LaBERmean, LaBERlazy, LaBERmax | T. Lahire et al. |
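As background for the multi-step (Nstep) entry above, the following sketch computes the standard N-step discounted return for a short window of rewards. It only illustrates the quantity that multi-step learning relies on. The function name nstep_return is hypothetical and this is not cpprb's internal code.

```python
def nstep_return(rewards, dones, gamma):
    """Discounted sum of rewards, truncated at the first terminal step."""
    total = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        total += discount * reward
        if done:
            # Do not accumulate rewards across an episode boundary.
            break
        discount *= gamma
    return total


# Three-step window without termination: 1 + 0.5 * 1 + 0.25 * 1
three_step = nstep_return([1.0, 1.0, 1.0], [0, 0, 0], gamma=0.5)
```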

cpprb's features and their usage are described on the following pages:

Design

Column-oriented and Flexible

One of the most distinctive design choices of cpprb is its column-oriented, flexibly defined transitions. As far as we know, other replay buffer implementations adopt either row-oriented flexible transitions (i.e. an array of transition objects) or column-oriented non-flexible transitions.

In deep reinforcement learning, a sampled batch is split into variables (i.e. obs, act, etc.). If the sampled batch is row-oriented, users (or the library) need to convert it into a column-oriented one. (See doc, too)
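The conversion that a row-oriented buffer forces on its users can be sketched in a few lines of plain Python. The helper rows_to_columns and the sample data are hypothetical and serve only to show the two layouts side by side.

```python
def rows_to_columns(rows):
    """Convert a list of per-transition dicts into one list per variable."""
    if not rows:
        return {}
    keys = rows[0].keys()
    return {key: [row[key] for row in rows] for key in keys}


# Row-oriented batch: one record per transition.
row_batch = [
    {"obs": [0.0, 0.1], "act": 1, "rew": 0.5},
    {"obs": [0.2, 0.3], "act": 0, "rew": -1.0},
]

# Column-oriented batch: one sequence per variable.
column_batch = rows_to_columns(row_batch)
```

A column-oriented buffer can hand out the second layout directly, so this per-sample regrouping step is not needed.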

Batch Insertion

cpprb can add multiple transitions at once. This design is convenient when batches of transitions are moved from local buffers to a global buffer. It is also more efficient, because it not only removes a pure-Python for loop but also suppresses unnecessary priority updates for PER. (See doc, too)
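The loop-removal part of this argument can be illustrated with a small NumPy ring buffer that writes a whole batch in one vectorized assignment. This is a sketch under simplifying assumptions (a single float32 column, no priorities). The class RingColumn is hypothetical and is not how cpprb is implemented.

```python
import numpy as np


class RingColumn:
    """Fixed-size ring buffer for one variable (hypothetical sketch)."""

    def __init__(self, size, shape=()):
        self.size = size
        self.data = np.zeros((size, *shape), dtype=np.float32)
        self.next_index = 0  # where the next row will be written
        self.stored = 0      # number of valid rows

    def add_batch(self, values):
        """Write all rows of `values` at once and return the indices used."""
        values = np.asarray(values, dtype=self.data.dtype)
        n = values.shape[0]
        if n > self.size:
            raise ValueError("batch is larger than the buffer")
        # Wrap indices around the end of the buffer, overwriting the oldest rows.
        idx = (self.next_index + np.arange(n)) % self.size
        self.data[idx] = values
        self.next_index = (self.next_index + n) % self.size
        self.stored = min(self.stored + n, self.size)
        return idx
```

Returning the written indices is a deliberate choice in this sketch: a prioritized buffer could use them to set the priorities of all new rows in one pass rather than once per transition.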

Minimum Dependency

We try to minimize dependencies. Only NumPy is required at run time. A small set of dependencies is always preferable to avoid dependency hell.

Contributing to cpprb

Any contributions are very welcome!

Making Community Larger

A bigger community makes development more active and improves cpprb.

Q & A at Forum

When you have any problems or requests, you can check Discussions on GitHub.com. If you still cannot find the information you need, you can post your own question.

We keep issues on GitLab.com, and users are still allowed to open issues there; however, we mainly use it as a development issue tracker.

Merge Request (Pull Request)

cpprb follows these local rules:

  • Branch Name
    • "HotFix***" for bug fix
    • "Feature***" for new feature implementation
  • docstring
  • Unit Test
    • Put test code under "test/" directory
    • Run tests with the python -m unittest <Your Test Code> command
    • Continuous Integration on GitLab CI configured by .gitlab-ci.yaml
  • Open an issue and associate it to Merge Request
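As a rough illustration of the unit test rule above, a test file placed under "test/" might look like the following sketch. The file name (for example test/test_example.py), the class name, and the checked values are hypothetical; only ReplayBuffer, add, and sample come from the usage section. The skip guard is an addition for this sketch so that the file can still be loaded on a machine where cpprb is not built yet.

```python
import importlib.util
import unittest

# Assumption for this sketch: skip the tests when cpprb is not installed.
HAS_CPPRB = importlib.util.find_spec("cpprb") is not None


@unittest.skipUnless(HAS_CPPRB, "cpprb is not installed")
class TestAddAndSample(unittest.TestCase):
    def test_sample_keys_and_shape(self):
        from cpprb import ReplayBuffer

        rb = ReplayBuffer(8, env_dict={"obs": {"shape": 3}, "rew": {}})
        rb.add(obs=[1.0, 2.0, 3.0], rew=0.5)

        batch = rb.sample(4)

        # The sampled batch is a dict with one array per env_dict key.
        self.assertEqual(set(batch.keys()), {"obs", "rew"})
        self.assertEqual(batch["obs"].shape, (4, 3))
```

Such a file can then be run with the python -m unittest command mentioned in the list above.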

Step-by-step instructions for beginners are described here.

Links

cpprb sites

cpprb users' repositories

Example usage at Kaggle competition

Japanese Documents

License

cpprb is available under the MIT license.

MIT License

Copyright (c) 2019 Yamada Hiroyuki

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Citation

We would be very happy if you cite cpprb in your papers.

@misc{Yamada_cpprb_2019,
author = {Yamada, Hiroyuki},
month = {1},
title = {{cpprb}},
url = {https://gitlab.com/ymd_h/cpprb},
year = {2019}
}
