The multi-armed bandit with Thompson Sampling, the Upper Confidence Bound (UCB), and randomized sampling.
Project description
Multi-armed bandit
- thompson is a Python package for evaluating the multi-armed bandit problem. In addition to Thompson Sampling, the Upper Confidence Bound (UCB) algorithm and a randomized baseline are also implemented.
- In probability theory, the multi-armed bandit problem is a problem in which a fixed, limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to the choice. This is a classic reinforcement learning problem that exemplifies the exploration-exploitation tradeoff (Wikipedia).
- In the problem, each machine provides a random reward from a probability distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls. The crucial tradeoff the gambler faces at each trial is between "exploitation" of the machine that has the highest expected payoff and "exploration" to get more information about the expected payoffs of the other machines. The same trade-off between exploration and exploitation arises throughout machine learning. In practice, multi-armed bandits have been used to model problems such as managing research projects in a large organization, like a science foundation or a pharmaceutical company (Wikipedia).
Contents
Installation
- Install thompson from PyPI (recommended). thompson is compatible with Python 3.6+ and runs on Linux, macOS, and Windows.
- Distributed under the MIT license.
Requirements
pip install matplotlib numpy pandas
Quick Start
pip install thompson
- Alternatively, install thompson from the GitHub source:
git clone https://github.com/erdogant/thompson.git
cd thompson
python setup.py install
Import the thompson package
import thompson as mab
Load example data:
df = mab.example_data()
Compute the multi-armed bandit using Thompson Sampling
out = mab.thompson(df)
fig = mab.plot(out)
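For intuition, Thompson Sampling itself can be sketched in a few lines of plain numpy. This is a standalone illustration with hypothetical click-through rates, not the package's internal implementation: each ad keeps a Beta posterior over its unknown rate, and at every round the ad with the highest posterior draw is played.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.10, 0.25, 0.15])  # assumed, unknown to the sampler
n_arms = len(true_rates)

wins = np.ones(n_arms)    # Beta alpha: observed clicks + 1
losses = np.ones(n_arms)  # Beta beta: observed non-clicks + 1
pulls = np.zeros(n_arms, dtype=int)

for _ in range(5000):
    samples = rng.beta(wins, losses)  # one plausible rate per ad
    arm = int(np.argmax(samples))     # play the most promising draw
    reward = rng.random() < true_rates[arm]
    wins[arm] += reward
    losses[arm] += 1 - reward
    pulls[arm] += 1
```

After a few thousand rounds the posterior for the weaker ads stays wide but is rarely sampled above the best ad's posterior, so play concentrates on the ad with the highest true rate.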
Compute the multi-armed bandit using UCB (Upper Confidence Bound)
out = mab.UCB(df)
fig = mab.plot(out)
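The UCB1 rule can likewise be sketched directly on a 0/1 reward matrix of the same shape as the example data. The rates and horizon below are assumptions for illustration; this is not the package's internal code:

```python
import numpy as np

rng = np.random.default_rng(1)
rates = np.array([0.05, 0.20, 0.10, 0.12])  # hypothetical click rates
rewards = (rng.random((20000, 4)) < rates).astype(int)

n_rounds, n_arms = rewards.shape
pulls = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(n_rounds):
    if t < n_arms:
        arm = t  # play each arm once to initialize the estimates
    else:
        mean = sums / pulls
        bonus = np.sqrt(2.0 * np.log(t) / pulls)  # exploration bonus
        arm = int(np.argmax(mean + bonus))        # UCB1 index
    pulls[arm] += 1
    sums[arm] += rewards[t, arm]
```

The bonus term shrinks as an arm is pulled more often, so under-explored arms keep getting tried until their optimistic estimate falls below the best arm's average.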
Compute the multi-armed bandit using randomized sampling
out = mab.UCB_random(df)
fig = mab.plot(out)
df looks like this:
Ad 1 Ad 2 Ad 3 Ad 4 Ad 5 Ad 6 Ad 7 Ad 8 Ad 9 Ad 10
0 1 0 0 0 1 0 0 0 1 0
1 0 0 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ...
9995 0 0 1 0 0 0 0 1 0 0
9996 0 0 0 0 0 0 0 0 0 0
9997 0 0 0 0 0 0 0 0 0 0
9998 1 0 0 0 0 0 0 1 0 0
9999 0 1 0 0 0 0 0 0 0 0
[10000 rows x 10 columns]
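A DataFrame with the same shape as the example data (10,000 impressions of 10 ads, 1 meaning a click) can be simulated with pandas and numpy. The per-ad click rates below are assumptions, not the rates behind the bundled example:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
click_rates = rng.uniform(0.02, 0.20, size=10)  # assumed per-ad rates
df = pd.DataFrame(
    (rng.random((10000, 10)) < click_rates).astype(int),
    columns=[f"Ad {i}" for i in range(1, 11)],
)
```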
Citation
Please cite thompson in your publications if it has been useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019thompson,
title={thompson},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/thompson}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- All kinds of contributions are welcome!
© Copyright
See LICENSE for details.
Download files
- Source Distribution: thompson-0.1.2.tar.gz (49.5 kB)
- Built Distribution: thompson-0.1.2-py3-none-any.whl (31.6 kB)