Extra buffer classes for Stable-Baselines3 that reduce memory usage with minimal overhead.

sb3-extra-buffers

Unofficial implementation of extra Stable-Baselines3 buffer classes. Aims to reduce memory usage drastically with minimal overhead.

Main Goal: Reduce the memory consumption of memory buffers in Reinforcement Learning while adding minimal overhead.

TO-DO List:

Compression Methods (Essential)
- rle (with numpy)
  - allergic-to-for-loops version (faster implementation)
- rle-jit (with numba jit compilation)
  - include initialization of jit in buffer dtype calculation
- gzip (with gzip)
- igzip (with isal.igzip)
Compressed Buffers (Essential)
- Compressed Rollout Buffer
- Compressed Replay Buffer
Compressed Buffers (Extras)
- Other buffers in SB3
  - DictRolloutBuffer
  - DictReplayBuffer
  - NStepReplayBuffer
- Buffers in SB3-Contrib
Compressed Buffer Tests (via pytest)
- Parallel testing (pytest-xdist) support
Compressed Array (maybe can make porting easier)
- Essential np.ndarray operations
- np.ndarray slicing / view operations
- Full np.ndarray operations coverage??
Recording Buffers for game episodes
- Compressed Recording Buffers
Buffer warm-up and model evaluation utility functions
Example train / eval scripts with compressed buffers
- Atari
- ViZDoom
Report results for example train / eval scripts
- Atari
- ViZDoom
Report memory saving
- Atari
- ViZDoom
Documentation & better readme
Define a standard bytes-out (compress) bytes-in (decompress) interface and store compressed obs in np.ndarray[bytes]

Motivation: Reinforcement Learning is quite memory-hungry due to massive buffer sizes, so let's try to tackle it by not storing raw frame buffers in full np.float32 or np.uint8 directly and finding something smaller instead. For input data that is sparse and contains large contiguous regions of repeated values, lossless compression techniques can reduce the memory footprint.
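To put rough numbers on the buffer sizes involved, here is a back-of-the-envelope sketch. The observation shape (4 stacked 84x84 frames) and the buffer size of 1M transitions are assumptions chosen for illustration, not values taken from this project, and `raw_buffer_bytes` is a hypothetical helper.

```python
from math import prod


def raw_buffer_bytes(buffer_size: int, obs_shape: tuple, bytes_per_elem: int) -> int:
    """Bytes needed to store `buffer_size` uncompressed observations."""
    return buffer_size * prod(obs_shape) * bytes_per_elem


# Assumed setup: 1M transitions, each holding 4 stacked 84x84 frames.
obs_shape = (4, 84, 84)
uint8_bytes = raw_buffer_bytes(1_000_000, obs_shape, bytes_per_elem=1)
float32_bytes = raw_buffer_bytes(1_000_000, obs_shape, bytes_per_elem=4)

print(f"uint8:   {uint8_bytes / 2**30:.1f} GiB")
print(f"float32: {float32_bytes / 2**30:.1f} GiB")
```

This counts observations only; a replay buffer that also stores next observations would need roughly twice as much.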

Applicable Input Types:

Semantic Segmentation masks (1 color channel)
Color Palette game frames from retro video games
Grayscale observations
RGB (Color) observations
For noisy input with a lot of variation (mostly RGB), gzip1 or igzip0 is recommended; run-length encoding is less effective on such data and can even increase memory usage.
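The difference between sparse and noisy input can be illustrated with a minimal loop-free run-length encoder in NumPy. This is a sketch for intuition only, not the implementation used by this package; `rle_encode`, `rle_decode`, and the choice of uint16 run lengths are assumptions made for the example.

```python
import numpy as np


def rle_encode(arr: np.ndarray):
    """Return (values, run_lengths) for the flattened array, without Python loops."""
    flat = arr.ravel()
    if flat.size == 0:
        return flat.copy(), np.zeros(0, dtype=np.int64)
    # A run starts at index 0 and wherever a value differs from its predecessor.
    starts = np.concatenate(([0], np.flatnonzero(flat[1:] != flat[:-1]) + 1))
    lengths = np.diff(np.concatenate((starts, [flat.size])))
    return flat[starts], lengths


def rle_decode(values: np.ndarray, lengths: np.ndarray, shape: tuple) -> np.ndarray:
    """Inverse of rle_encode: repeat each value by its run length."""
    return np.repeat(values, lengths).reshape(shape)


rng = np.random.default_rng(0)
mask = np.zeros((84, 84), dtype=np.uint8)
mask[20:40, 30:60] = 3  # one rectangular segment, like a segmentation mask
noisy = rng.integers(0, 256, size=(84, 84), dtype=np.uint8)

sizes = {}
for name, frame in (("mask", mask), ("noisy", noisy)):
    values, lengths = rle_encode(frame)
    # Lossless round trip
    assert np.array_equal(rle_decode(values, lengths, frame.shape), frame)
    # Assumes every run is shorter than 65536 so lengths fit in uint16
    sizes[name] = values.nbytes + lengths.astype(np.uint16).nbytes
    print(f"{name}: {frame.nbytes} bytes raw -> {sizes[name]} bytes encoded")
```

The mask collapses into a few dozen runs, while the noisy frame has almost as many runs as pixels, so storing a length next to every value makes it larger than the raw frame.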

Implemented Compression Methods:

rle: Vectorized Run-Length Encoding for compression.
rle-jit: JIT-compiled version of rle, uses numba library.
gzip: Gzip compression via gzip.
igzip: Intel accelerated variant via isal.igzip, uses python-isal library.
none: No compression other than casting to elem_type and storing as bytes.

gzip supports compression levels 0-9, where 0 is no compression and 1 is the lightest compression.

igzip supports compression levels 0-3, where 0 is the lightest compression.

Shorthands are supported, e.g. igzip3 = igzip at level 3.
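As a rough illustration of how a shorthand maps to a method and a level, the sketch below splits the trailing digits off the name and feeds the level to the standard-library gzip module. `parse_compression` is a hypothetical helper written for this example; it is not part of sb3-extra-buffers, and the package may parse shorthands differently.

```python
import gzip
import re

# Hypothetical helper: split "igzip3" into ("igzip", 3).
_SHORTHAND = re.compile(r"^(?P<method>[a-z-]+)(?P<level>\d+)?$")


def parse_compression(name: str):
    """Return (method, level); level is None when no digits are given."""
    match = _SHORTHAND.match(name)
    if match is None:
        raise ValueError(f"Unrecognized compression spec: {name!r}")
    level = match.group("level")
    return match.group("method"), (int(level) if level is not None else None)


payload = bytes(84 * 84)  # stand-in for an all-zero 84x84 uint8 frame

method, level = parse_compression("gzip1")
compressed = gzip.compress(payload, compresslevel=level)
assert gzip.decompress(compressed) == payload  # lossless round trip

# Level 0 only wraps the data in a gzip container, so it is slightly larger than the input.
stored = gzip.compress(payload, compresslevel=0)
print(len(payload), len(compressed), len(stored))
```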

Installation

Install via PyPI:

pip install "sb3-extra-buffers[fast,extra]"

Other install options:

pip install "sb3-extra-buffers"          # only installs minimum requirements
pip install "sb3-extra-buffers[extra]"   # installs extra dependencies for SB3
pip install "sb3-extra-buffers[fast]"    # installs python-isal and numba
pip install "sb3-extra-buffers[isal]"    # only installs python-isal
pip install "sb3-extra-buffers[numba]"   # only installs numba
pip install "sb3-extra-buffers[vizdoom]" # installs vizdoom

Current Project Structure

sb3_extra_buffers
    |- compressed
    |    |- CompressedRolloutBuffer: RolloutBuffer with compression
    |    |- CompressedReplayBuffer: ReplayBuffer with compression
    |    |- CompressedArray: Compressed numpy.ndarray subclass
    |
    |- recording
    |    |- RecordBuffer: A buffer for recording game states
    |    |- FramelessRecordBuffer: RecordBuffer but not recording game frames
    |    |- DummyRecordBuffer: Dummy RecordBuffer, records nothing
    |
    |- training_utils
         |- eval_model: Evaluate models in vectorized environment
         |- warmup: Perform buffer warmup for off-policy algorithms

Example Scripts

Example scripts are included and have been tested to confirm they work properly.

Evaluation results for example training scripts:

PPO on PongNoFrameskip-v4, trained for 10M steps using rle-jit, framestack: None

(Best ) Evaluated 10000 episodes, mean reward: 21.0 +/- 0.00
Q1:   21 | Q2:   21 | Q3:   21 | Relative IQR: 0.00 | Min: 21 | Max: 21
(Final) Evaluated 10000 episodes, mean reward: 21.0 +/- 0.02
Q1:   21 | Q2:   21 | Q3:   21 | Relative IQR: 0.00 | Min: 20 | Max: 21

PPO on MsPacmanNoFrameskip-v4, trained for 10M steps using rle-jit, framestack: 4

(Best ) Evaluated 10000 episodes, mean reward: 

(Final) Evaluated 10000 episodes, mean reward:

DQN on MsPacmanNoFrameskip-v4, trained for 10M steps using rle-jit, framestack: 4

(Best ) Evaluated 10000 episodes, mean reward: 3300.0 +/- 770.79
Q1: 2490 | Q2: 4020 | Q3: 4020 | Relative IQR: 0.38 | Min: 2460 | Max: 4020
(Final) Evaluated 10000 episodes, mean reward: 3379.2 +/- 453.78
Q1: 2690 | Q2: 3400 | Q3: 3880 | Relative IQR: 0.35 | Min: 1230 | Max: 4090
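The reported "Relative IQR" values are consistent with (Q3 - Q1) / Q2, for example (4020 - 2490) / 4020 ≈ 0.38. The sketch below computes summary statistics of that form from a list of episode rewards. It is an assumption about how such a report could be produced, not the package's eval_model implementation; `summarize`, the use of the population standard deviation for the "+/-" figure, and the quantile method are all illustrative choices.

```python
import statistics


def summarize(rewards):
    """Summary statistics in the style of the evaluation report above."""
    q1, q2, q3 = statistics.quantiles(rewards, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(rewards),
        "std": statistics.pstdev(rewards),  # assumed to match the "+/-" value
        "q1": q1,
        "q2": q2,
        "q3": q3,
        # IQR normalized by the median; undefined when the median is 0
        "relative_iqr": (q3 - q1) / q2 if q2 else float("nan"),
        "min": min(rewards),
        "max": max(rewards),
    }


# Toy rewards for illustration only, not real evaluation data
stats = summarize([10, 20, 30, 40, 50])
print(f"mean reward: {stats['mean']:.1f} +/- {stats['std']:.2f}")
print(f"Q1: {stats['q1']:g} | Q2: {stats['q2']:g} | Q3: {stats['q3']:g} | "
      f"Relative IQR: {stats['relative_iqr']:.2f} | Min: {stats['min']} | Max: {stats['max']}")
```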

Pytest

Make sure pytest and optionally pytest-xdist are already installed. Tests are compatible with pytest-xdist since DummyVecEnv is used for all tests.

# pytest
pytest tests -v --durations=0 --tb=short
# pytest-xdist
pytest tests -n auto -v --durations=0 --tb=short

Compressed Buffers

Defined in sb3_extra_buffers.compressed

JIT Before Multi-Processing: When using rle-jit, remember to trigger JIT compilation by calling find_buffer_dtypes or init_jit before any multi-processing code is executed.

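from stable_baselines3.common.vec_env import SubprocVecEnv
from sb3_extra_buffers.compressed import find_buffer_dtypes
from sb3_extra_buffers.training_utils.atari import make_env

ATARI_GAME = "MsPacmanNoFrameskip-v4"

# Get observation space from environment
obs = make_env(env_id=ATARI_GAME, n_envs=1, framestack=4).observation_space

# Get the buffer datatype settings via find_buffer_dtypes (this also triggers JIT compilation for rle-jit)
compression = "rle-jit"
buffer_dtypes = find_buffer_dtypes(obs_shape=obs.shape, elem_dtype=obs.dtype, compression_method=compression)

# Now it is safe to initialize multi-processing environments
env = SubprocVecEnv(...)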

Example Usage:

from stable_baselines3 import PPO
from stable_baselines3.common.utils import get_linear_fn
from stable_baselines3.common.callbacks import EvalCallback
from sb3_extra_buffers.compressed import CompressedRolloutBuffer, find_buffer_dtypes
from sb3_extra_buffers.training_utils.atari import make_env

ATARI_GAME = "MsPacmanNoFrameskip-v4"

if __name__ == "__main__":
    obs = make_env(env_id=ATARI_GAME, n_envs=1, framestack=4).observation_space
    compression = "rle-jit"
    buffer_dtypes = find_buffer_dtypes(obs_shape=obs.shape, elem_dtype=obs.dtype, compression_method=compression)

    env = make_env(env_id=ATARI_GAME, n_envs=8, framestack=4)
    eval_env = make_env(env_id=ATARI_GAME, n_envs=10, framestack=4)

    # Create PPO model using CompressedRolloutBuffer
    # Note: device="mps" targets Apple Silicon; use "cuda", "cpu", or "auto" on other hardware
    model = PPO("CnnPolicy", env, verbose=1, learning_rate=get_linear_fn(2.5e-4, 0, 1), n_steps=128,
                batch_size=256, clip_range=get_linear_fn(0.1, 0, 1), n_epochs=4, ent_coef=0.01, vf_coef=0.5,
                seed=1970626835, device="mps", rollout_buffer_class=CompressedRolloutBuffer,
                rollout_buffer_kwargs=dict(dtypes=buffer_dtypes, compression_method=compression))

    # Evaluation callback (optional)
    eval_callback = EvalCallback(eval_env, n_eval_episodes=20, eval_freq=8192, log_path=f"./logs/{ATARI_GAME}/ppo/eval",
                                 best_model_save_path=f"./logs/{ATARI_GAME}/ppo/best_model")

    # Training
    model.learn(total_timesteps=10_000_000, callback=eval_callback, progress_bar=True)

    # Save the final model
    model.save("ppo_MsPacman_4.zip")

    # Cleanup
    env.close()
    eval_env.close()

Recording Buffers

Defined in sb3_extra_buffers.recording. Mainly used in combination with SegDoom to record game episodes.

WIP

Training Utils

Defined in sb3_extra_buffers.training_utils. Provides buffer warm-up and model evaluation utilities.

WIP
