The Cross-Entropy Method for rare-event sampling and optimization.
The Cross Entropy Method
The Cross-Entropy Method (CE or CEM) is an approach to optimization or rare-event sampling within a given family of distributions {D_p}, guided by a score function R(x).
- In its sampling version, given a reference parameter p0, it aims to sample from the tail of the distribution, x ~ (D_p0 | R(x) <= q), where the threshold q is specified either directly as a numeric value or as a quantile level alpha (in which case q = q_alpha(R)).
- In its optimization version, it aims to find argmin_x R(x).
The exact implementation of the CEM depends on the family of distributions {D_p} defined by the problem.
This repo provides a general implementation as an abstract class; a concrete use case only requires writing a small inherited class.
The attached tutorial.ipynb provides a more detailed background on the CEM and on this package, along with usage examples.
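To make the method concrete, below is a minimal, self-contained sketch of the optimization variant over a one-dimensional Gaussian family, written in plain NumPy. Note that the function and parameter names here are illustrative only and are not this package's API (see the tutorial for actual usage):

```python
import numpy as np

def cem_minimize(score, mu0, sigma0, n_samples=100, elite_frac=0.1, n_iters=50):
    """Minimal CEM sketch: repeatedly sample, keep the lowest-score
    'elite' fraction, and re-fit the Gaussian parameters to the elites."""
    mu, sigma = float(mu0), float(sigma0)
    n_elite = max(1, int(elite_frac * n_samples))
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    for _ in range(n_iters):
        x = rng.normal(mu, sigma, size=n_samples)      # sample from current D_p
        elite = x[np.argsort(score(x))[:n_elite]]      # lowest-score tail
        mu, sigma = elite.mean(), elite.std() + 1e-12  # re-fit parameters
    return mu

# e.g., minimizing R(x) = (x - 3)^2 drives mu towards 3
```

The same loop turns into the sampling variant by fixing the threshold (a value q or a quantile of R) instead of shrinking the elite set around the minimum.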
Installation: `pip install cross-entropy-method`
CEM for sampling (left): the mean of the sample distribution (green) shifts from the mean of the original distribution (blue) towards its 10%-tail (orange). CEM for optimization (right): the mean of the sample distribution is driven towards the minimizer of the score. (Images from tutorial.ipynb.)
Supporting non-stationary score functions
On top of the standard CEM, we also support a non-stationary score function R. A changing R changes the reference distribution of the scores, and thus also the quantile threshold q (when q is specified as a quantile). Hence, q has to be repeatedly re-estimated, using an importance-sampling correction to compensate for the CEM's distributional shift.
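The importance-sampling correction can be illustrated as follows: scores observed under the shifted CEM distribution p_t are reweighted by w_i = p0(x_i) / p_t(x_i), and the alpha-quantile of the reference distribution is read off the weighted empirical CDF. This is a generic sketch of the idea, not this package's internal implementation:

```python
import numpy as np

def weighted_quantile(scores, weights, alpha):
    """Estimate the alpha-quantile of the *reference* score distribution
    from samples drawn under the shifted CEM distribution, using
    importance weights w_i = p0(x_i) / p_t(x_i)."""
    order = np.argsort(scores)
    s = np.asarray(scores, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)  # IS estimate of the reference CDF
    return s[np.searchsorted(cdf, alpha)]
```

For example, with samples drawn from N(-2, 1) but weighted back to a N(0, 1) reference (w = exp(2x + 2)), the weighted 10%-quantile of R(x) = x approaches the true value q_0.1 ≈ -1.28.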
Application to risk-averse reinforcement learning
In our separate work (available as code and as a NeurIPS paper, with Yinlam Chow, Mohammad Ghavamzadeh and Shie Mannor), we demonstrate the use of the CEM for the realistic problem of sampling high-risk environment conditions in risk-averse reinforcement learning. There, D_p determines the distribution of the environment conditions, p0 corresponds to the original (test) distribution, and R(x; agent) is the return of the agent given the conditions x. Since the agent evolves during training, the score function is indeed non-stationary.
Cite us
This repo (the non-stationary Cross-Entropy Method):
@misc{cross_entropy_method,
  title={Cross Entropy Method with Non-stationary Score Function},
  author={Ido Greenberg},
  howpublished={\url{https://pypi.org/project/cross-entropy-method/}},
  year={2022}
}
Application to risk-averse reinforcement learning:
@inproceedings{cesor,
  title={Efficient Risk-Averse Reinforcement Learning},
  author={Ido Greenberg and Yinlam Chow and Mohammad Ghavamzadeh and Shie Mannor},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}