dmpo · PyPI

Maximum a Posteriori Policy Optimization and Related Algorithms

These details have not been verified by PyPI

Project links

Repository

Project description

DMPO (wip)

Implementation of, and explorations into, MPO (Maximum a Posteriori Policy Optimization) and DMPO

Citations

@article{Haarnoja_2024,
    title     = {Learning agile soccer skills for a bipedal robot with deep reinforcement learning},
    volume    = {9},
    ISSN      = {2470-9476},
    url       = {http://dx.doi.org/10.1126/scirobotics.adi8022},
    DOI       = {10.1126/scirobotics.adi8022},
    number    = {89},
    journal   = {Science Robotics},
    publisher = {American Association for the Advancement of Science (AAAS)},
    author    = {Haarnoja, Tuomas and Moran, Ben and Lever, Guy and Huang, Sandy H. and Tirumala, Dhruva and Humplik, Jan and Wulfmeier, Markus and Tunyasuvunakool, Saran and Siegel, Noah Y. and Hafner, Roland and Bloesch, Michael and Hartikainen, Kristian and Byravan, Arunkumar and Hasenclever, Leonard and Tassa, Yuval and Sadeghi, Fereshteh and Batchelor, Nathan and Casarini, Federico and Saliceti, Stefano and Game, Charles and Sreendra, Neil and Patel, Kushal and Gwira, Marlon and Huber, Andrea and Hurley, Nicole and Nori, Francesco and Hadsell, Raia and Heess, Nicolas},
    year      = {2024},
    month     = {Apr}
}

@misc{abdolmaleki2018maximumposterioripolicyoptimisation,
    title         = {Maximum a Posteriori Policy Optimisation},
    author        = {Abbas Abdolmaleki and Jost Tobias Springenberg and Yuval Tassa and Remi Munos and Nicolas Heess and Martin Riedmiller},
    year          = {2018},
    eprint        = {1806.06920},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/1806.06920}
}

@misc{song2019vmpoonpolicymaximumposteriori,
    title         = {V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control},
    author        = {H. Francis Song and Abbas Abdolmaleki and Jost Tobias Springenberg and Aidan Clark and Hubert Soyer and Jack W. Rae and Seb Noury and Arun Ahuja and Siqi Liu and Dhruva Tirumala and Nicolas Heess and Dan Belov and Martin Riedmiller and Matthew M. Botvinick},
    year          = {2019},
    eprint        = {1909.12238},
    archivePrefix = {arXiv},
    primaryClass  = {cs.AI},
    url           = {https://arxiv.org/abs/1909.12238}
}

@InProceedings{pmlr-v235-li24z,
    title     = {Value-Evolutionary-Based Reinforcement Learning},
    author    = {Li, Pengyi and Hao, Jianye and Tang, Hongyao and Zheng, Yan and Barez, Fazl},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages     = {27875--27889},
    year      = {2024},
    editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume    = {235},
    series    = {Proceedings of Machine Learning Research},
    month     = {21--27 Jul},
    publisher = {PMLR},
    pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24z/li24z.pdf},
    url       = {https://proceedings.mlr.press/v235/li24z.html}
}

@article{kaddour2026target,
    title   = {Target Policy Optimization},
    author  = {Kaddour, Jean},
    journal = {arXiv preprint arXiv:2604.06159},
    year    = {2026}
}

@misc{qu2026listwisepolicyoptimizationgroupbased,
    title         = {Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex},
    author        = {Yun Qu and Qi Wang and Yixiu Mao and Heming Zou and Yuhang Jiang and Yingyue Li and Wutong Xu and Lizhou Cai and Weijie Liu and Clive Bai and Kai Yang and Yangkun Chen and Saiyong Yang and Xiangyang Ji},
    year          = {2026},
    eprint        = {2605.06139},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG},
    url           = {https://arxiv.org/abs/2605.06139}
}

Release history

0.0.5 (Jun 4, 2026)
0.0.3 (Jun 4, 2026, this version)
0.0.2 (Jun 4, 2026)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dmpo-0.0.3.tar.gz (8.9 kB)

Uploaded Jun 4, 2026 Source

Built Distribution

dmpo-0.0.3-py3-none-any.whl (8.3 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file dmpo-0.0.3.tar.gz.

File metadata

Download URL: dmpo-0.0.3.tar.gz
Upload date: Jun 4, 2026
Size: 8.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for dmpo-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`4be26446c7374f2b916bf9a10a91ae300d1f45ce97611303443cd42542826433`
MD5	`2652a7f150dd97e11e208cfd5fdab662`
BLAKE2b-256	`3ab25a5ac7aa5bb93a416bd991ee3e51f3e47025dfa159b48d42bb1bcd3acafa`
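
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_download` are hypothetical, and the local path is an assumption (it presumes the file was saved to the current directory under its published name).

```python
import hashlib
from pathlib import Path

# SHA256 digest published on this page for dmpo-0.0.3.tar.gz.
EXPECTED_SHA256 = "4be26446c7374f2b916bf9a10a91ae300d1f45ce97611303443cd42542826433"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local location of the downloaded source distribution.
    archive = Path("dmpo-0.0.3.tar.gz")
    if archive.exists():
        status = "OK" if verify_download(archive) else "MISMATCH"
        print(f"{archive}: {status}")
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel by passing its path and its own SHA256 digest as `expected`.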

File details

Details for the file dmpo-0.0.3-py3-none-any.whl.

File metadata

Download URL: dmpo-0.0.3-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 8.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for dmpo-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`84919fa467e2a85708c7af07a7d59ad48d6fc1218463e6ebc6a29c8f66f59cc6`
MD5	`51be118187cc1a9b65f346cd4af79bc4`
BLAKE2b-256	`9a3be3ad0a87048eb48dc3898b853d652ec54da493db99c1c1bbc2795e0f70fe`
