
Soft Actor Critic - Pytorch

Project description

SAC (Soft Actor Critic) - Pytorch (wip)

Implementation of Soft Actor Critic and some of its improvements in Pytorch. Interest came from watching this lecture.

import torch
from SAC_pytorch import (
  SAC,
  Actor,
  Critic,
  MultipleCritics
)

# two critics over a 5-dimensional state, 2 continuous actions, and two 5-way discrete actions, each predicting 3 quantiles
critic1 = Critic(
  dim_state = 5,
  num_cont_actions = 2,
  num_discrete_actions = (5, 5),
  num_quantiles = 3
)

critic2 = Critic(
  dim_state = 5,
  num_cont_actions = 2,
  num_discrete_actions = (5, 5),
  num_quantiles = 3
)

# actor over the same state and action spaces
actor = Actor(
  dim_state = 5,
  num_cont_actions = 2,
  num_discrete_actions = (5, 5)
)

# SAC wrapper around the actor; here the two critics are specified as keyword dicts
agent = SAC(
  actor = actor,
  critics = [
    dict(dim_state = 5, num_cont_actions = 2, num_discrete_actions = (5, 5)),
    dict(dim_state = 5, num_cont_actions = 2, num_discrete_actions = (5, 5)),
  ],
  quantiled_critics = False
)

# sample continuous and discrete actions (plus log probabilities) for a batch of 3 states
state = torch.randn(3, 5)
cont_actions, discrete, cont_logprob, discrete_logprob = actor(state, sample = True)

# pass a batch of transitions (states, actions, rewards, done flags, next states) to the agent
agent(
  states = state,
  cont_actions = cont_actions,
  discrete_actions = discrete,
  rewards = torch.randn(1),
  done = torch.zeros(1).bool(),
  next_states = state + 1
)
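
For orientation, below is a minimal smoke-test loop that reuses only the calls shown above; treating each agent(...) call as one learning step on a batch of transitions, and using random tensors in place of an environment, are assumptions for illustration rather than documented behavior of the library.

# hedged sketch, not part of the library: repeatedly sample actions from the
# actor and hand synthetic transitions to the agent, reusing only the calls
# demonstrated above
for _ in range(10):
  state = torch.randn(3, 5)

  cont_actions, discrete, cont_logprob, discrete_logprob = actor(state, sample = True)

  agent(
    states = state,
    cont_actions = cont_actions,
    discrete_actions = discrete,
    rewards = torch.randn(1),       # placeholder reward; a real environment would supply this
    done = torch.zeros(1).bool(),   # placeholder termination flag
    next_states = state + 1         # placeholder next state
  )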

Citations

@article{Haarnoja2018SoftAA,
    title   = {Soft Actor-Critic Algorithms and Applications},
    author  = {Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and G. Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and P. Abbeel and Sergey Levine},
    journal = {ArXiv},
    year    = {2018},
    volume  = {abs/1812.05905},
    url     = {https://api.semanticscholar.org/CorpusID:55703664}
}
@article{Hiraoka2021DropoutQF,
    title   = {Dropout Q-Functions for Doubly Efficient Reinforcement Learning},
    author  = {Takuya Hiraoka and Takahisa Imagawa and Taisei Hashimoto and Takashi Onishi and Yoshimasa Tsuruoka},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2110.02034},
    url     = {https://api.semanticscholar.org/CorpusID:238353966}
}
@inproceedings{ObandoCeron2024MixturesOE,
    title   = {Mixtures of Experts Unlock Parameter Scaling for Deep RL},
    author  = {Johan S. Obando-Ceron and Ghada Sokar and Timon Willi and Clare Lyle and Jesse Farebrother and Jakob Foerster and Gintare Karolina Dziugaite and Doina Precup and Pablo Samuel Castro},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:267637059}
}
@inproceedings{Kumar2023MaintainingPI,
    title   = {Maintaining Plasticity in Continual Learning via Regenerative Regularization},
    author  = {Saurabh Kumar and Henrik Marklund and Benjamin Van Roy},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:261076021}
}
@inproceedings{Kuznetsov2020ControllingOB,
    title   = {Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics},
    author  = {Arsenii Kuznetsov and Pavel Shvechikov and Alexander Grishin and Dmitry P. Vetrov},
    booktitle = {International Conference on Machine Learning},
    year    = {2020},
    url     = {https://api.semanticscholar.org/CorpusID:218581840}
}
@article{Zagoruyko2017DiracNetsTV,
    title   = {DiracNets: Training Very Deep Neural Networks Without Skip-Connections},
    author  = {Sergey Zagoruyko and Nikos Komodakis},
    journal = {ArXiv},
    year    = {2017},
    volume  = {abs/1706.00388},
    url     = {https://api.semanticscholar.org/CorpusID:1086822}
}
@article{Abbas2023LossOP,
    title   = {Loss of Plasticity in Continual Deep Reinforcement Learning},
    author  = {Zaheer Abbas and Rosie Zhao and Joseph Modayil and Adam White and Marlos C. Machado},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2303.07507},
    url     = {https://api.semanticscholar.org/CorpusID:257504763}
}
@article{Nauman2024BiggerRO,
    title   = {Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control},
    author  = {Michal Nauman and Mateusz Ostaszewski and Krzysztof Jankowski and Piotr Miłoś and Marek Cygan},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2405.16158},
    url     = {https://api.semanticscholar.org/CorpusID:270063045}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sac_pytorch-0.0.3.tar.gz (11.1 kB)

Uploaded Source

Built Distribution

sac_pytorch-0.0.3-py3-none-any.whl (10.2 kB)

Uploaded Python 3

File details

Details for the file sac_pytorch-0.0.3.tar.gz.

File metadata

  • Download URL: sac_pytorch-0.0.3.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for sac_pytorch-0.0.3.tar.gz
  • SHA256: 269b7ab35cff6e38b178c43b8249f2829b287a0fb11942306bd5cd3e6ca6e986
  • MD5: ed443eea208df346084d424973482c25
  • BLAKE2b-256: 85306a9cc1f2d5c3cfb6992a21bc1a5526058086ad17222c74f1ebfdb0d1bf0b

See more details on using hashes here.
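
As a short illustration of using these hashes, the sketch below recomputes the SHA256 digest of a locally downloaded sdist and compares it against the value listed above; the local filename and download location are assumptions.

import hashlib

# expected SHA256 for sac_pytorch-0.0.3.tar.gz, copied from the list above
EXPECTED_SHA256 = "269b7ab35cff6e38b178c43b8249f2829b287a0fb11942306bd5cd3e6ca6e986"

# assumes the sdist has been downloaded into the current working directory
with open("sac_pytorch-0.0.3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED_SHA256, "downloaded file does not match the published SHA256"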

File details

Details for the file sac_pytorch-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: sac_pytorch-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for sac_pytorch-0.0.3-py3-none-any.whl
  • SHA256: 7f3121d644318b989a2ff54be098e4732a876f03a076dfc33e68aa1aaef2ce9b
  • MD5: 41a3b718a51cfd2c66944933d88d4db5
  • BLAKE2b-256: 252e35f4263c1c4bb01fde4952af0e009a8175141041f33acc4a9bf876a99e29

See more details on using hashes here.
