Maximum a Posteriori Policy Optimization and Related Algorithms

DMPO (wip)

Implementation and explorations into MPO / DMPO

Citations

@article{Haarnoja_2024,
    title     = {Learning agile soccer skills for a bipedal robot with deep reinforcement learning},
    volume    = {9},
    ISSN      = {2470-9476},
    url       = {http://dx.doi.org/10.1126/scirobotics.adi8022},
    DOI       = {10.1126/scirobotics.adi8022},
    number    = {89},
    journal   = {Science Robotics},
    publisher = {American Association for the Advancement of Science (AAAS)},
    author    = {Haarnoja, Tuomas and Moran, Ben and Lever, Guy and Huang, Sandy H. and Tirumala, Dhruva and Humplik, Jan and Wulfmeier, Markus and Tunyasuvunakool, Saran and Siegel, Noah Y. and Hafner, Roland and Bloesch, Michael and Hartikainen, Kristian and Byravan, Arunkumar and Hasenclever, Leonard and Tassa, Yuval and Sadeghi, Fereshteh and Batchelor, Nathan and Casarini, Federico and Saliceti, Stefano and Game, Charles and Sreendra, Neil and Patel, Kushal and Gwira, Marlon and Huber, Andrea and Hurley, Nicole and Nori, Francesco and Hadsell, Raia and Heess, Nicolas},
    year      = {2024},
    month     = {Apr}
}
@misc{abdolmaleki2018maximumposterioripolicyoptimisation,
    title         = {Maximum a Posteriori Policy Optimisation},
    author        = {Abbas Abdolmaleki and Jost Tobias Springenberg and Yuval Tassa and Remi Munos and Nicolas Heess and Martin Riedmiller},
    year          = {2018},
    eprint        = {1806.06920},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/1806.06920}
}
@misc{song2019vmpoonpolicymaximumposteriori,
    title         = {V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control},
    author        = {H. Francis Song and Abbas Abdolmaleki and Jost Tobias Springenberg and Aidan Clark and Hubert Soyer and Jack W. Rae and Seb Noury and Arun Ahuja and Siqi Liu and Dhruva Tirumala and Nicolas Heess and Dan Belov and Martin Riedmiller and Matthew M. Botvinick},
    year          = {2019},
    eprint        = {1909.12238},
    archivePrefix = {arXiv},
    primaryClass  = {cs.AI},
    url           = {https://arxiv.org/abs/1909.12238}
}
@InProceedings{pmlr-v235-li24z,
    title     = {Value-Evolutionary-Based Reinforcement Learning},
    author    = {Li, Pengyi and Hao, Jianye and Tang, Hongyao and Zheng, Yan and Barez, Fazl},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages     = {27875--27889},
    year      = {2024},
    editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume    = {235},
    series    = {Proceedings of Machine Learning Research},
    month     = {21--27 Jul},
    publisher = {PMLR},
    pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24z/li24z.pdf},
    url       = {https://proceedings.mlr.press/v235/li24z.html}
}
@article{kaddour2026target,
    title   = {Target Policy Optimization},
    author  = {Kaddour, Jean},
    journal = {arXiv preprint arXiv:2604.06159},
    year    = {2026}
}
@misc{qu2026listwisepolicyoptimizationgroupbased,
    title         = {Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex},
    author        = {Yun Qu and Qi Wang and Yixiu Mao and Heming Zou and Yuhang Jiang and Yingyue Li and Wutong Xu and Lizhou Cai and Weijie Liu and Clive Bai and Kai Yang and Yangkun Chen and Saiyong Yang and Xiangyang Ji},
    year          = {2026},
    eprint        = {2605.06139},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/2605.06139}
}
