Soft Q-target reinforcement learning algorithms for Stable-Baselines3
sb3-soft
sb3-soft provides reinforcement learning algorithms with soft Q-targets, implemented on top of Stable-Baselines3.
Current scope:
- Discrete action spaces
- SQL (Soft Q-Learning)
- SDSAC (Stable Discrete Soft Actor-Critic)
Why sb3-soft?
- Familiar SB3-style API (learn, predict, save, load)
- Drop-in usage for Gymnasium discrete environments
- Algorithm-focused implementation with class-level docstrings
Installation
Install from PyPI:
pip install sb3-soft
# or
uv add sb3-soft
Install the latest development version:
pip install git+https://github.com/miki-yuasa/sb3-soft.git
# or
uv add git+https://github.com/miki-yuasa/sb3-soft.git
Quick Start
SQL
from sb3_soft import SQL
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("CartPole-v1", n_envs=1)

model = SQL(
    "MlpPolicy",
    env,
    learning_rate=1e-4,
    buffer_size=100_000,
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("sql_cartpole")
SDSAC
from sb3_soft import SDSAC
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("CartPole-v1", n_envs=1)

model = SDSAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    buffer_size=100_000,
    batch_size=256,
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("sdsac_cartpole")
Algorithms
SQL (Soft Q-Learning)
- Entropy-regularized Bellman backups via soft value targets
- Boltzmann (softmax) sampling over Q-values
- Optional automatic entropy-coefficient tuning
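To make the SQL bullets above concrete, here is a minimal pure-Python sketch of a textbook soft value target and Boltzmann policy. It is not taken from the sb3-soft source. The function names (`soft_value`, `boltzmann_policy`, `soft_q_target`) and the fixed temperature `alpha` are assumptions for illustration, and the library's actual implementation may differ.

```python
import math


def soft_value(q_values, alpha):
    """Soft state value V(s) = alpha * log(sum_a exp(Q(s, a) / alpha))."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    return alpha * (m + math.log(sum(math.exp(x - m) for x in scaled)))


def boltzmann_policy(q_values, alpha):
    """Softmax over Q-values with temperature alpha."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def soft_q_target(reward, next_q_values, done, gamma=0.99, alpha=0.1):
    """Entropy-regularized Bellman target: r + gamma * (1 - done) * V_soft(s')."""
    return reward + gamma * (1.0 - float(done)) * soft_value(next_q_values, alpha)


# Hypothetical Q-values for a three-action state.
next_q = [1.0, 2.0, 0.5]
probs = boltzmann_policy(next_q, alpha=0.5)
target = soft_q_target(1.0, next_q, done=False, gamma=0.99, alpha=0.5)
```

As `alpha` shrinks, the soft value approaches `max(q_values)` and the policy approaches greedy action selection; larger values spread probability more evenly across actions.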
SDSAC (Stable Discrete SAC)
- Categorical actor + twin critics for discrete actions
- Double-average Q-learning (mean twin target)
- Q-clip critic loss and entropy-penalty term for stability
- Replay buffers that store per-transition old policy entropy
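The mean twin target and Q-clip loss listed above can be sketched in plain Python as follows. This is one plausible reading, with the clipping modeled on PPO-style value clipping, and is not the sb3-soft implementation. All names (`mean_twin_target`, `q_clip_loss`, `clip_range`) and the default hyperparameters are assumptions, and the entropy-penalty term and the stored old-policy entropy are left out.

```python
import math


def entropy(probs):
    """Shannon entropy of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_twin_target(reward, done, next_probs, next_q1, next_q2,
                     gamma=0.99, alpha=0.2):
    """Target built from the mean of the twin critics rather than their minimum.

    y = r + gamma * (1 - done) * (E_pi[(Q1 + Q2) / 2] + alpha * H(pi(.|s')))
    """
    expected_q = sum(
        p * 0.5 * (q1 + q2)
        for p, q1, q2 in zip(next_probs, next_q1, next_q2)
    )
    soft_v = expected_q + alpha * entropy(next_probs)
    return reward + gamma * (1.0 - float(done)) * soft_v


def q_clip_loss(q_pred, q_target_net, y, clip_range=0.5):
    """Clipped critic loss (assumed form).

    The prediction is limited to within clip_range of the target-network
    estimate, and the larger of the clipped and unclipped squared errors
    is used as the loss.
    """
    delta = max(-clip_range, min(clip_range, q_pred - q_target_net))
    clipped = q_target_net + delta
    return max((q_pred - y) ** 2, (clipped - y) ** 2)


# Hypothetical two-action example.
y = mean_twin_target(
    reward=1.0, done=False,
    next_probs=[0.5, 0.5], next_q1=[1.0, 3.0], next_q2=[2.0, 2.0],
)
loss = q_clip_loss(q_pred=1.1, q_target_net=2.0, y=1.0, clip_range=0.5)
```

Averaging the twin critics avoids the systematic underestimation that a minimum can introduce, while taking the maximum of the clipped and unclipped errors keeps the critic from moving far from the target-network estimate in a single update.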
Documentation
- API and usage docs: https://miki-yuasa.github.io/sb3-soft/
- Documentation is generated from in-code docstrings using Sphinx.
Build docs locally:
uv sync --group dev
cd docs
uv run sphinx-build -b html . _build/html
Development
Set up a local development environment:
uv sync --group dev --group lint
Run tests:
uv run pytest
Citation
@misc{yuasa2026sb3soft,
  author = {Yuasa, Mikihisa},
  title = {sb3-soft},
  year = {2026},
  howpublished = {\url{https://github.com/miki-yuasa/sb3-soft}}
}
File details
Details for the file sb3_soft-0.1.1.tar.gz.
File metadata
- Download URL: sb3_soft-0.1.1.tar.gz
- Upload date:
- Size: 120.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae19dc9265934e7f9c46b7fd1b0aa05a60e6eb63a9192521b31a119e239a1c2b |
| MD5 | 6c67926505c43820c23ce1115fbc96f2 |
| BLAKE2b-256 | 325e220f8717b6e7e7f0d1bb9e7762dbfa283efb9b9cc6093d8b7a9fa2d29c20 |
File details
Details for the file sb3_soft-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sb3_soft-0.1.1-py3-none-any.whl
- Upload date:
- Size: 22.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b7f255cd243e5013f9dd0d215a4eca8a069eaa58fc339d1847a863601e6e7dc |
| MD5 | eea971212eb0a8ece67e6abebee3966e |
| BLAKE2b-256 | e26ff8de401d84be65a1f2e80a23831dbe821954ab9c1ce656ed0bb857212691 |