Maximum a Posteriori Policy Optimization and Related Algorithms

Project description

DMPO (work in progress)

Implementation of, and explorations into, MPO (Maximum a Posteriori Policy Optimization) and DMPO.
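For orientation on the MPO side, the non-parametric E-step from Abdolmaleki et al. (2018) can be sketched in a few lines. For each state, actions sampled from the current policy are reweighted by exp(Q(s,a)/η). The temperature η is chosen by minimizing the convex dual g(η) = ηε + η · mean_s log mean_a exp(Q(s,a)/η), which enforces a KL bound ε between the reweighted target and the current policy. This is a minimal pure-Python sketch under those assumptions, not this package's code. The helper names (`e_step_weights`, `temperature_dual`, `fit_temperature`) are hypothetical. The grid search over η stands in for the gradient-based optimization of the dual. The M-step, which fits the parametric policy to these weights by weighted maximum likelihood, is omitted.

```python
import math
from typing import List, Optional, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def e_step_weights(q_values: Sequence[float], eta: float) -> List[float]:
    """Non-parametric E-step for one state (hypothetical helper).

    Actions are assumed to be sampled from the current policy, so the target
    q(a|s) ∝ pi(a|s) exp(Q(s,a)/eta) reduces to a softmax of Q/eta over the samples.
    """
    scaled = [q / eta for q in q_values]
    lse = logsumexp(scaled)
    return [math.exp(s - lse) for s in scaled]


def temperature_dual(q_per_state: Sequence[Sequence[float]], eta: float, epsilon: float) -> float:
    """Sample estimate of g(eta) = eta*epsilon + eta * mean_s log mean_a exp(Q(s,a)/eta)."""
    total = 0.0
    for q_values in q_per_state:
        scaled = [q / eta for q in q_values]
        # log of the *mean* over sampled actions, hence the -log(N) term
        total += logsumexp(scaled) - math.log(len(q_values))
    return eta * epsilon + eta * total / len(q_per_state)


def fit_temperature(
    q_per_state: Sequence[Sequence[float]],
    epsilon: float,
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Pick eta by grid search on the convex dual (a stand-in for gradient descent)."""
    if candidates is None:
        candidates = [10 ** (k / 10) for k in range(-30, 31)]  # log-spaced, 1e-3 .. 1e3
    return min(candidates, key=lambda eta: temperature_dual(q_per_state, eta, epsilon))


if __name__ == "__main__":
    # Toy Q-values: 2 states x 3 actions sampled from the current policy
    batch = [[0.0, 1.0, 2.0], [0.5, 0.5, 3.0]]
    eta = fit_temperature(batch, epsilon=0.1)
    for q_values in batch:
        print([round(w, 3) for w in e_step_weights(q_values, eta)])
```

Increasing ε loosens the KL bound, so the fitted η shrinks and the weights concentrate on the highest-Q actions. A small ε keeps the target close to the current policy.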

Citations

@article{Haarnoja_2024,
    title     = {Learning agile soccer skills for a bipedal robot with deep reinforcement learning},
    volume    = {9},
    ISSN      = {2470-9476},
    url       = {http://dx.doi.org/10.1126/scirobotics.adi8022},
    DOI       = {10.1126/scirobotics.adi8022},
    number    = {89},
    journal   = {Science Robotics},
    publisher = {American Association for the Advancement of Science (AAAS)},
    author    = {Haarnoja, Tuomas and Moran, Ben and Lever, Guy and Huang, Sandy H. and Tirumala, Dhruva and Humplik, Jan and Wulfmeier, Markus and Tunyasuvunakool, Saran and Siegel, Noah Y. and Hafner, Roland and Bloesch, Michael and Hartikainen, Kristian and Byravan, Arunkumar and Hasenclever, Leonard and Tassa, Yuval and Sadeghi, Fereshteh and Batchelor, Nathan and Casarini, Federico and Saliceti, Stefano and Game, Charles and Sreendra, Neil and Patel, Kushal and Gwira, Marlon and Huber, Andrea and Hurley, Nicole and Nori, Francesco and Hadsell, Raia and Heess, Nicolas},
    year      = {2024},
    month     = {Apr}
}
@misc{abdolmaleki2018maximumposterioripolicyoptimisation,
    title         = {Maximum a Posteriori Policy Optimisation},
    author        = {Abbas Abdolmaleki and Jost Tobias Springenberg and Yuval Tassa and Remi Munos and Nicolas Heess and Martin Riedmiller},
    year          = {2018},
    eprint        = {1806.06920},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/1806.06920}
}
@misc{song2019vmpoonpolicymaximumposteriori,
    title         = {V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control},
    author        = {H. Francis Song and Abbas Abdolmaleki and Jost Tobias Springenberg and Aidan Clark and Hubert Soyer and Jack W. Rae and Seb Noury and Arun Ahuja and Siqi Liu and Dhruva Tirumala and Nicolas Heess and Dan Belov and Martin Riedmiller and Matthew M. Botvinick},
    year          = {2019},
    eprint        = {1909.12238},
    archivePrefix = {arXiv},
    primaryClass  = {cs.AI},
    url           = {https://arxiv.org/abs/1909.12238}
}
@InProceedings{pmlr-v235-li24z,
    title     = {Value-Evolutionary-Based Reinforcement Learning},
    author    = {Li, Pengyi and Hao, Jianye and Tang, Hongyao and Zheng, Yan and Barez, Fazl},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages     = {27875--27889},
    year      = {2024},
    editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume    = {235},
    series    = {Proceedings of Machine Learning Research},
    month     = {21--27 Jul},
    publisher = {PMLR},
    pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24z/li24z.pdf},
    url       = {https://proceedings.mlr.press/v235/li24z.html}
}
@article{kaddour2026target,
    title   = {Target Policy Optimization},
    author  = {Kaddour, Jean},
    journal = {arXiv preprint arXiv:2604.06159},
    year    = {2026}
}
@misc{qu2026listwisepolicyoptimizationgroupbased,
    title         = {Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex},
    author        = {Yun Qu and Qi Wang and Yixiu Mao and Heming Zou and Yuhang Jiang and Yingyue Li and Wutong Xu and Lizhou Cai and Weijie Liu and Clive Bai and Kai Yang and Yangkun Chen and Saiyong Yang and Xiangyang Ji},
    year          = {2026},
    eprint        = {2605.06139},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/2605.06139}
}


Download files

Source Distribution

dmpo-0.0.2.tar.gz (8.9 kB)

Built Distribution

dmpo-0.0.2-py3-none-any.whl (8.2 kB, Python 3)

File details

Details for the file dmpo-0.0.2.tar.gz.

File metadata

  • Download URL: dmpo-0.0.2.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for dmpo-0.0.2.tar.gz
SHA256: 706807569e480931dd09171d2c854a1a040a0b9f14a13108582f08767235d82f
MD5: e61b6f52a3a51494356856f1b4e7db78
BLAKE2b-256: 8dd8c339258c6803d5a6cd45c9b5afda4c244efea983ccb2c94a488b9595173e

File details

Details for the file dmpo-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: dmpo-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for dmpo-0.0.2-py3-none-any.whl
SHA256: c83c0710e2bae0785788724858da8704574792cdd2bdccc3f9279ae008d43311
MD5: 4a38f285f47c0c51001fd0fa8b849860
BLAKE2b-256: ffd3dc8a844be3df001a419b34cc65e6edc324cedfdf810bd4d6ef15b0f2db57