Soft Q-target reinforcement learning algorithms for Stable-Baselines3

sb3-soft

sb3-soft provides reinforcement learning algorithms with soft Q-targets, implemented on top of Stable-Baselines3.

Current scope:

  • Discrete action spaces
  • SQL (Soft Q-Learning)
  • SDSAC (Stable Discrete Soft Actor-Critic)

Why sb3-soft?

  • Familiar SB3-style API (learn, predict, save, load)
  • Drop-in usage for Gymnasium discrete environments
  • Algorithm-focused implementation with class-level docstrings

Installation

Install from PyPI:

pip install sb3-soft
# or
uv add sb3-soft

Install the latest development version:

pip install git+https://github.com/miki-yuasa/sb3-soft.git
# or
uv add git+https://github.com/miki-yuasa/sb3-soft.git

Quick Start

SQL

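from sb3_soft import SQL
from stable_baselines3.common.env_util import make_vec_env

# Single-copy vectorized CartPole environment (discrete actions)
env = make_vec_env("CartPole-v1", n_envs=1)

# Soft Q-Learning agent with an MLP Q-network
model = SQL(
    "MlpPolicy",
    env,
    learning_rate=1e-4,
    buffer_size=100_000,
    verbose=1,
)

# Train, then save the model to disk
model.learn(total_timesteps=100_000)
model.save("sql_cartpole")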

SDSAC

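from sb3_soft import SDSAC
from stable_baselines3.common.env_util import make_vec_env

# Single-copy vectorized CartPole environment (discrete actions)
env = make_vec_env("CartPole-v1", n_envs=1)

# Stable Discrete SAC agent with an MLP actor and critics
model = SDSAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    buffer_size=100_000,
    batch_size=256,
    verbose=1,
)

# Train, then save the model to disk
model.learn(total_timesteps=100_000)
model.save("sdsac_cartpole")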

Algorithms

SQL (Soft Q-Learning)

  • Entropy-regularized Bellman backups via soft value targets
  • Boltzmann (softmax) sampling over Q-values
  • Optional automatic entropy-coefficient tuning
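
The first two points above can be illustrated with a small standard-library sketch. It is not sb3-soft's implementation. It only shows the textbook form of a soft value target and Boltzmann sampling for a single state. The function names (`soft_value`, `boltzmann_policy`, `soft_q_target`, `sample_action`) and the default values of `gamma` and `alpha` are assumptions made for illustration.

```python
import math
import random


def soft_value(q_values, alpha):
    """Soft state value: alpha * log(sum_a exp(Q(s, a) / alpha))."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    return alpha * (m + math.log(sum(math.exp(x - m) for x in scaled)))


def boltzmann_policy(q_values, alpha):
    """Softmax over Q-values with temperature alpha."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def soft_q_target(reward, next_q_values, done, gamma=0.99, alpha=0.1):
    """Entropy-regularized backup: r + gamma * (1 - done) * V_soft(s')."""
    bootstrap = soft_value(next_q_values, alpha)
    return reward + gamma * (1.0 - float(done)) * bootstrap


def sample_action(q_values, alpha, rng=random):
    """Draw one action index from the Boltzmann policy."""
    probs = boltzmann_policy(q_values, alpha)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    next_q = [1.0, 2.0, 0.5]  # illustrative Q-values for the next state
    print("policy:", boltzmann_policy(next_q, alpha=0.5))
    print("target:", soft_q_target(1.0, next_q, done=False, alpha=0.5))
    print("action:", sample_action(next_q, alpha=0.5))
```

As `alpha` approaches zero, the soft value approaches the maximum Q-value and the policy becomes greedy. A larger `alpha` spreads probability more evenly across actions.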

SDSAC (Stable Discrete SAC)

  • Categorical actor + twin critics for discrete actions
  • Double-average Q-learning (mean twin target)
  • Q-clip critic loss and entropy-penalty term for stability
  • Replay buffers that store per-transition old policy entropy
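
As a rough illustration of the mean twin target and the Q-clip idea listed above, here is a minimal standard-library sketch for a single transition. It is an assumption-laden illustration and not sb3-soft's code. The function names, the default `gamma`, `alpha`, and `clip_range`, and the PPO-style form of the clipped loss are all hypothetical choices. The entropy-penalty term and the stored old-policy entropy are not covered.

```python
import math


def soft_state_value(probs, q1, q2, alpha):
    """Expected mean-twin Q-value plus entropy bonus under a categorical policy."""
    value = 0.0
    for p, a, b in zip(probs, q1, q2):
        if p > 0.0:  # skip zero-probability actions to avoid log(0)
            mean_q = 0.5 * (a + b)  # average the twin critics instead of taking the min
            value += p * (mean_q - alpha * math.log(p))
    return value


def td_target(reward, done, next_probs, next_q1, next_q2, gamma=0.99, alpha=0.2):
    """Bootstrapped target: r + gamma * (1 - done) * V(s')."""
    v_next = soft_state_value(next_probs, next_q1, next_q2, alpha)
    return reward + gamma * (1.0 - float(done)) * v_next


def q_clip_loss(q, q_target_net, target, clip_range=0.5):
    """Clipped squared error (assumed form, analogous to PPO value clipping).

    The critic estimate q is limited to within clip_range of the target
    network's estimate, and the larger of the two squared errors is used.
    """
    delta = max(-clip_range, min(clip_range, q - q_target_net))
    clipped_q = q_target_net + delta
    return max((q - target) ** 2, (clipped_q - target) ** 2)


if __name__ == "__main__":
    y = td_target(
        reward=1.0,
        done=False,
        next_probs=[0.7, 0.3],
        next_q1=[1.0, 0.5],
        next_q2=[1.2, 0.3],
    )
    print("target:", y)
    print("loss:", q_clip_loss(q=1.5, q_target_net=1.0, target=y))
```

Averaging the twin critics is the "double-average" choice named above. The standard SAC alternative takes the minimum of the two estimates.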

Documentation

Build docs locally:

uv sync --group dev
cd docs
uv run sphinx-build -b html . _build/html

Development

Set up a local development environment:

uv sync --group dev --group lint

Run tests:

uv run pytest

Citation

@misc{yuasa2026sb3soft,
  author = {Yuasa, Mikihisa},
  title = {sb3-soft},
  year = {2026},
  howpublished = {\url{https://github.com/miki-yuasa/sb3-soft}}
}
