
PyTorch version of Stable Baselines: implementations of reinforcement learning algorithms.


Stable Baselines3

Stable Baselines3 is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines.

These algorithms will make it easier for the research community and industry to replicate, refine, and identify new ideas, and will create good baselines to build projects on top of. We expect these tools will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones. We also hope that the simplicity of these tools will allow beginners to experiment with a more advanced toolset, without being buried in implementation details.

Links

Repository: https://github.com/DLR-RM/stable-baselines3

Blog post: https://araffin.github.io/post/sb3/

Documentation: https://stable-baselines3.readthedocs.io/en/master/

RL Baselines3 Zoo: https://github.com/DLR-RM/rl-baselines3-zoo

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

SBX (SB3 + Jax): https://github.com/araffin/sbx

Quick example

Most of the library follows a scikit-learn-like syntax for its reinforcement learning algorithms, with environments provided through Gymnasium.

Here is a quick example of how to train and run PPO on the CartPole environment:

import gymnasium

from stable_baselines3 import PPO

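# Create the environment; render_mode="human" opens a window for visualization
env = gymnasium.make("CartPole-v1", render_mode="human")

# Train a PPO agent with a multilayer perceptron policy
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# The model wraps the environment in a VecEnv; retrieve it to run the trained agent
vec_env = model.get_env()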
obs = vec_env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render()
    # VecEnv resets automatically
    # if done:
    #   obs = vec_env.reset()

Or train a model in one line, provided that both the environment (in Gymnasium) and the policy are registered:

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1").learn(10_000)
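
The "VecEnv resets automatically" comment in the first example describes an auto-reset convention: when an episode ends, the vectorized environment resets itself and returns the first observation of the next episode, while the final observation of the finished episode is kept in the info dict under the `terminal_observation` key. The sketch below illustrates that convention in plain Python. `CountdownEnv` and `AutoResetEnv` are hypothetical toy classes written for this illustration, not part of Stable Baselines3, and they handle a single environment rather than a batch.

```python
class CountdownEnv:
    """Hypothetical toy environment: every episode lasts `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is simply the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class AutoResetEnv:
    """Toy wrapper (an assumption, not SB3 code) that mimics auto-reset."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Keep the last observation of the finished episode, then
            # return the first observation of the next episode instead.
            info = dict(info, terminal_observation=obs)
            obs = self.env.reset()
        return obs, reward, done, info


env = AutoResetEnv(CountdownEnv(length=3))
obs = env.reset()
trace = []
for _ in range(7):
    # The loop never calls reset(): the wrapper does it when done is True.
    obs, reward, done, info = env.step(action=0)
    trace.append((obs, done, info.get("terminal_observation")))

for entry in trace:
    print(entry)
```

Because of this convention, the observation returned on a step where `done` is true already belongs to the new episode, which is why the explicit reset in the first example is commented out.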


Download files


Source Distribution

stable_baselines3-2.9.0.tar.gz (221.3 kB)

Uploaded Source

Built Distribution


stable_baselines3-2.9.0-py3-none-any.whl (187.6 kB)

Uploaded Python 3

File details

Details for the file stable_baselines3-2.9.0.tar.gz.

File metadata

  • Download URL: stable_baselines3-2.9.0.tar.gz
  • Upload date:
  • Size: 221.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stable_baselines3-2.9.0.tar.gz
Algorithm Hash digest
SHA256 92b46c6099a0e8f99163ff09e26729e4d0a68b33dc8598626ca13ade3c0b3a61
MD5 7327fd03dffd20e169290c09c2fbce8e
BLAKE2b-256 89f89c1901a42e55b21b9e9559a720cb33dffdeb4c4215b1dfd10d63224b19c0
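
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_expected` and the local file path are assumptions made for this illustration. The expected digest is copied from the table above.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the table above for stable_baselines3-2.9.0.tar.gz
EXPECTED_SHA256 = "92b46c6099a0e8f99163ff09e26729e4d0a68b33dc8598626ca13ade3c0b3a61"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded file; adjust to your download path.
    archive = Path("stable_baselines3-2.9.0.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same approach works for the wheel by substituting its file name and SHA256 digest. pip can also enforce hashes itself through its hash-checking mode when requirements are pinned with `--hash` entries.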


Provenance

The following attestation bundles were made for stable_baselines3-2.9.0.tar.gz:

Publisher: pypi-publish.yml on DLR-RM/stable-baselines3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file stable_baselines3-2.9.0-py3-none-any.whl.

File hashes

Hashes for stable_baselines3-2.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 95f39a473dce081d1abe31acaf7cee446dcf223dcd74093d6ea460bc37e2e748
MD5 04d41ad2153f00f90433a5f12aae1b81
BLAKE2b-256 a670d18c8278223d4928378c4af822b66d3c87e19e73bcb011e05e2f0b43f4e5


Provenance

The following attestation bundles were made for stable_baselines3-2.9.0-py3-none-any.whl:

Publisher: pypi-publish.yml on DLR-RM/stable-baselines3

