Maximum a Posteriori Policy Optimization and Related Algorithms

DMPO (wip)

Implementation of, and explorations into, MPO / DMPO
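
The project description does not document the package's own API, so the following is only a minimal, dependency-free sketch of the non-parametric E-step of MPO as described by Abdolmaleki et al. (2018), not the dmpo implementation. For N actions a_j sampled from the current policy at each state s, the target weights are proportional to exp(Q(s, a_j)/η), and the temperature η minimizes the convex dual g(η) = ηε + η · mean_s log((1/N) Σ_j exp(Q(s, a_j)/η)), where ε bounds the KL divergence between the target and the current policy. All function names are hypothetical, and golden-section search stands in for the gradient-based optimization of η that implementations commonly use.

```python
import math
from typing import List, Sequence


def log_mean_exp(xs: Sequence[float]) -> float:
    """Numerically stable log((1/N) * sum_j exp(x_j))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def temperature_dual(eta: float, q_values: Sequence[Sequence[float]], epsilon: float) -> float:
    """Sample estimate of the MPO temperature dual g(eta), averaged over states.

    q_values[i][j] is Q(s_i, a_j) for the j-th action sampled from the
    current policy at state s_i; epsilon is the KL bound of the E-step.
    """
    per_state = [log_mean_exp([q / eta for q in row]) for row in q_values]
    return eta * epsilon + eta * sum(per_state) / len(per_state)


def solve_temperature(
    q_values: Sequence[Sequence[float]],
    epsilon: float,
    lo: float = 1e-3,
    hi: float = 1e3,
    iters: int = 200,
) -> float:
    """Minimize g(eta) on [lo, hi] by golden-section search (hypothetical helper).

    g is convex in eta > 0, hence unimodal on the bracket, so the search
    converges to its minimizer (or to a bracket end if the minimizer lies
    outside [lo, hi]).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if temperature_dual(c, q_values, epsilon) < temperature_dual(d, q_values, epsilon):
            b = d  # minimizer lies in [a, d]
        else:
            a = c  # minimizer lies in [c, b]
    return 0.5 * (a + b)


def e_step_weights(q_row: Sequence[float], eta: float) -> List[float]:
    """Target weights q(a_j | s) proportional to exp(Q(s, a_j) / eta), normalized over the samples."""
    scaled = [q / eta for q in q_row]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with `q = [[0.0, 1.0, 2.0, 3.0]]`, calling `eta = solve_temperature(q, epsilon=0.1)` and then `e_step_weights(q[0], eta)` gives weights whose KL divergence from the uniform distribution over the four samples is approximately 0.1. This follows from the derivative of g, which equals ε minus that (state-averaged) KL, so an interior minimizer places the target exactly on the trust-region boundary. The M-step, a KL-regularized weighted maximum-likelihood fit of the parametric policy to these weights, is not shown.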

Citations

@article{Haarnoja_2024,
    title     = {Learning agile soccer skills for a bipedal robot with deep reinforcement learning},
    volume    = {9},
    ISSN      = {2470-9476},
    url       = {http://dx.doi.org/10.1126/scirobotics.adi8022},
    DOI       = {10.1126/scirobotics.adi8022},
    number    = {89},
    journal   = {Science Robotics},
    publisher = {American Association for the Advancement of Science (AAAS)},
    author    = {Haarnoja, Tuomas and Moran, Ben and Lever, Guy and Huang, Sandy H. and Tirumala, Dhruva and Humplik, Jan and Wulfmeier, Markus and Tunyasuvunakool, Saran and Siegel, Noah Y. and Hafner, Roland and Bloesch, Michael and Hartikainen, Kristian and Byravan, Arunkumar and Hasenclever, Leonard and Tassa, Yuval and Sadeghi, Fereshteh and Batchelor, Nathan and Casarini, Federico and Saliceti, Stefano and Game, Charles and Sreendra, Neil and Patel, Kushal and Gwira, Marlon and Huber, Andrea and Hurley, Nicole and Nori, Francesco and Hadsell, Raia and Heess, Nicolas},
    year      = {2024},
    month     = {Apr}
}
@misc{abdolmaleki2018maximumposterioripolicyoptimisation,
    title         = {Maximum a Posteriori Policy Optimisation},
    author        = {Abbas Abdolmaleki and Jost Tobias Springenberg and Yuval Tassa and Remi Munos and Nicolas Heess and Martin Riedmiller},
    year          = {2018},
    eprint        = {1806.06920},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/1806.06920}
}
@misc{song2019vmpoonpolicymaximumposteriori,
    title         = {V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control},
    author        = {H. Francis Song and Abbas Abdolmaleki and Jost Tobias Springenberg and Aidan Clark and Hubert Soyer and Jack W. Rae and Seb Noury and Arun Ahuja and Siqi Liu and Dhruva Tirumala and Nicolas Heess and Dan Belov and Martin Riedmiller and Matthew M. Botvinick},
    year          = {2019},
    eprint        = {1909.12238},
    archivePrefix = {arXiv},
    primaryClass  = {cs.AI},
    url           = {https://arxiv.org/abs/1909.12238}
}
@InProceedings{pmlr-v235-li24z,
    title     = {Value-Evolutionary-Based Reinforcement Learning},
    author    = {Li, Pengyi and Hao, Jianye and Tang, Hongyao and Zheng, Yan and Barez, Fazl},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages     = {27875--27889},
    year      = {2024},
    editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume    = {235},
    series    = {Proceedings of Machine Learning Research},
    month     = {21--27 Jul},
    publisher = {PMLR},
    pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24z/li24z.pdf},
    url       = {https://proceedings.mlr.press/v235/li24z.html}
}
@article{kaddour2026target,
    title   = {Target Policy Optimization},
    author  = {Kaddour, Jean},
    journal = {arXiv preprint arXiv:2604.06159},
    year    = {2026}
}
@misc{qu2026listwisepolicyoptimizationgroupbased,
    title         = {Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex},
    author        = {Yun Qu and Qi Wang and Yixiu Mao and Heming Zou and Yuhang Jiang and Yingyue Li and Wutong Xu and Lizhou Cai and Weijie Liu and Clive Bai and Kai Yang and Yangkun Chen and Saiyong Yang and Xiangyang Ji},
    year          = {2026},
    eprint        = {2605.06139},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/2605.06139}
}

Download files

Source Distribution

dmpo-0.0.5.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

dmpo-0.0.5-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file dmpo-0.0.5.tar.gz.

File metadata

  • Download URL: dmpo-0.0.5.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for dmpo-0.0.5.tar.gz
Algorithm      Hash digest
SHA256         87f52ffa19f1bc3c94bb6cfdfdf62a29310847cc9a4b2e170b82bb96df701266
MD5            cbf39a192f8ce13428b5f3fa9c9a2347
BLAKE2b-256    2e20f7c06946518cdd474f5a0068a4677e3f317dc9ea0d4e6bfa8043ca9d1d83

File details

Details for the file dmpo-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: dmpo-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for dmpo-0.0.5-py3-none-any.whl
Algorithm      Hash digest
SHA256         b2b6c384f98d11bedceb26d91849bf3278cb092d0d486fed9f3d70539d7c5851
MD5            3af8fa659bd478c41aa442ca73c4c828
BLAKE2b-256    537a1b2e38b78ae013906ea1c4a2b8ba81c036b0b4e856b4dacae0be3e320e3f
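
The published digests can be checked locally before installing. The following is a minimal sketch using only the standard library's `hashlib`; the helper names `sha256_of` and `verify` and the `EXPECTED_SHA256` mapping are illustrative and not part of dmpo, and the digests are the SHA256 values copied from the two tables on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for dmpo 0.0.5
# (illustrative mapping, not shipped with the package).
EXPECTED_SHA256 = {
    "dmpo-0.0.5.tar.gz": "87f52ffa19f1bc3c94bb6cfdfdf62a29310847cc9a4b2e170b82bb96df701266",
    "dmpo-0.0.5-py3-none-any.whl": "b2b6c384f98d11bedceb26d91849bf3278cb092d0d486fed9f3d70539d7c5851",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published SHA256 digest for {name!r}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify("dmpo-0.0.5.tar.gz")` returns True only for the exact published archive. pip can enforce the same check at install time in hash-checking mode, using a requirements line of the form `dmpo==0.0.5 --hash=sha256:<digest>` together with `--require-hashes`.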
