Maximum a Posteriori Policy Optimization and Related Algorithms
DMPO (wip)
Implementation of, and explorations into, MPO / DMPO
Citations
@article{Haarnoja_2024,
title = {Learning agile soccer skills for a bipedal robot with deep reinforcement learning},
volume = {9},
ISSN = {2470-9476},
url = {http://dx.doi.org/10.1126/scirobotics.adi8022},
DOI = {10.1126/scirobotics.adi8022},
number = {89},
journal = {Science Robotics},
publisher = {American Association for the Advancement of Science (AAAS)},
author = {Haarnoja, Tuomas and Moran, Ben and Lever, Guy and Huang, Sandy H. and Tirumala, Dhruva and Humplik, Jan and Wulfmeier, Markus and Tunyasuvunakool, Saran and Siegel, Noah Y. and Hafner, Roland and Bloesch, Michael and Hartikainen, Kristian and Byravan, Arunkumar and Hasenclever, Leonard and Tassa, Yuval and Sadeghi, Fereshteh and Batchelor, Nathan and Casarini, Federico and Saliceti, Stefano and Game, Charles and Sreendra, Neil and Patel, Kushal and Gwira, Marlon and Huber, Andrea and Hurley, Nicole and Nori, Francesco and Hadsell, Raia and Heess, Nicolas},
year = {2024},
month = {Apr}
}
@misc{abdolmaleki2018maximumposterioripolicyoptimisation,
title = {Maximum a Posteriori Policy Optimisation},
author = {Abbas Abdolmaleki and Jost Tobias Springenberg and Yuval Tassa and Remi Munos and Nicolas Heess and Martin Riedmiller},
year = {2018},
eprint = {1806.06920},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/1806.06920}
}
@misc{song2019vmpoonpolicymaximumposteriori,
title = {V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control},
author = {H. Francis Song and Abbas Abdolmaleki and Jost Tobias Springenberg and Aidan Clark and Hubert Soyer and Jack W. Rae and Seb Noury and Arun Ahuja and Siqi Liu and Dhruva Tirumala and Nicolas Heess and Dan Belov and Martin Riedmiller and Matthew M. Botvinick},
year = {2019},
eprint = {1909.12238},
archivePrefix = {arXiv},
primaryClass = {cs.AI},
url = {https://arxiv.org/abs/1909.12238}
}
@InProceedings{pmlr-v235-li24z,
title = {Value-Evolutionary-Based Reinforcement Learning},
author = {Li, Pengyi and Hao, Jianye and Tang, Hongyao and Zheng, Yan and Barez, Fazl},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {27875--27889},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24z/li24z.pdf},
url = {https://proceedings.mlr.press/v235/li24z.html}
}
@article{kaddour2026target,
title = {Target Policy Optimization},
author = {Kaddour, Jean},
journal = {arXiv preprint arXiv:2604.06159},
year = {2026}
}
@misc{qu2026listwisepolicyoptimizationgroupbased,
title = {Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex},
author = {Yun Qu and Qi Wang and Yixiu Mao and Heming Zou and Yuhang Jiang and Yingyue Li and Wutong Xu and Lizhou Cai and Weijie Liu and Clive Bai and Kai Yang and Yangkun Chen and Saiyong Yang and Xiangyang Ji},
year = {2026},
eprint = {2605.06139},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2605.06139}
}