# exp_mixture_model

Maximum likelihood estimation and model selection for the exponential mixture model (EMM), i.e., a mixture of exponential distributions

When you use this code, please cite the following paper:

Makoto Okada, Kenji Yamanishi, Naoki Masuda. Long-tailed distributions of inter-event times as mixtures of exponential distributions. arXiv:19xx.xxxxx

## Installation

`exp_mixture_model` is hosted on PyPI, so you can install it by running

```
pip install exp_mixture_model
```

If you want to install from source, clone the exp_mixture_model git repository by running

```
git clone https://github.com/naokimas/exp_mixture_model.git
```

Then, navigate to the top-level directory of the cloned repository and run

```
python setup.py install
```

You can test our code by running

```
python setup.py test
```

If you use Anaconda, you can install the required packages by running

```
conda install --file requirements.txt
```

## Quick Use

Fit an EMM to data stored in a file (e.g. sample.dat) by running

```
python emmfit.py -f sample.dat -k 10
```

- '10' is the initial number of components.
- 'sample.dat' is provided as part of this package. It is synthetic data that we generated by running

```
from exp_mixture_model import generate_emm
x = generate_emm(1000, 10)
```
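For intuition, the call above presumably draws 1000 observations from a 10-component EMM whose parameters are chosen internally. The following is a minimal numpy-only sketch of sampling from an EMM with given (hypothetical) weights and means, not the package's `generate_emm` itself:

```python
import numpy as np

def sample_emm(n, pi, mu, seed=0):
    """Draw n samples from an exponential mixture: pick component j with
    probability pi[j], then draw from an exponential with mean mu[j]."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(pi), size=n, p=pi)  # component index per sample
    return rng.exponential(np.asarray(mu)[comp])

# hypothetical parameters: two components with very different means
x = sample_emm(1000, pi=[0.5, 0.5], mu=[1.0, 100.0])
```

Mixing exponentials with widely separated means like this is what produces the long-tailed distributions studied in the paper.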

To select the best model among EMMs with different numbers of components, do not specify '-k'; instead, specify the model selection criterion with '-c' as follows.

```
python emmfit.py -f sample.dat -c DNML
```
- One can specify 'marginal_log_likelihood', 'joint_log_likelihood', 'AIC', 'BIC', 'AIC_LVC', 'BIC_LVC', 'NML_LVC', or 'DNML' as the argument of '-c'.
- The default is 'DNML'.

Check details of the usage by running

```
python emmfit.py --help
```
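Maximum likelihood estimation of a mixture model is typically carried out with the EM algorithm. The following is an illustrative numpy sketch of EM for an EMM, not the package's actual implementation (initialization, convergence checks, and numerical safeguards are simplified here):

```python
import numpy as np

def em_exp_mixture(x, k, n_iter=300):
    """Illustrative EM for a k-component exponential mixture."""
    x = np.asarray(x, dtype=float)
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means over the data
    for _ in range(n_iter):
        # E-step: responsibility of component j for observation i
        dens = pi / mu * np.exp(-x[:, None] / mu)  # densities, shape (n, k)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and means from the responsibilities
        pi = r.mean(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return pi, mu
```

With well-separated components this recovers the weights and means approximately; the package additionally estimates an effective number of components 'k_final', which a bare EM sketch like this does not do.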

## Usage

Fit an EMM to data by running

```
from exp_mixture_model import EMM

x = [1.5, 2.3, ...]  # data can be either a list or a numpy array
#
# Alternatively, one can load the data from a file using numpy as follows:
#
# import numpy as np
# x = np.loadtxt('sample.dat')
#
model = EMM()
pi, mu = model.fit(x)  # estimate the parameters
model.print_result()  # print 'k_final' (i.e., the estimated effective number of components) and the estimated parameters
model.plot_survival_probability()  # plot the survival probability (i.e., the complementary cumulative distribution function) of the estimated EMM and the given data 'x'
```
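Given the mixing weights `pi` and component means `mu` returned by `fit`, the survival probability that `plot_survival_probability` visualizes is S(t) = &Sigma;<sub>j</sub> pi<sub>j</sub> exp(-t/mu<sub>j</sub>). A small numpy sketch of evaluating it directly (the parameter values below are hypothetical):

```python
import numpy as np

def emm_survival(t, pi, mu):
    """Survival probability S(t) = sum_j pi[j] * exp(-t / mu[j]) of an EMM."""
    t = np.asarray(t, dtype=float)
    return np.exp(-t[:, None] / np.asarray(mu)) @ np.asarray(pi)

# hypothetical estimated parameters
S = emm_survival(np.linspace(0.0, 10.0, 5), pi=[0.7, 0.3], mu=[1.0, 5.0])
```

On log-log axes this mixture survival function can look like a power law over a wide range, which is the phenomenon the accompanying paper examines.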

Select the number of components based on a model selection criterion by running

```
from exp_mixture_model import EMMs

x = [1.5, 2.3, ...]
emms = EMMs()
emms.fit(x)  # fit EMMs with different values of 'k' (i.e., the number of components). The default uses 13 values of 'k'. This process is computationally heavy.
best_model = emms.select('DNML')  # select the best number of components under the 'DNML' criterion. Any of the criteria listed in the Quick Use section can be passed instead.
emms.print_result_table()  # print the values of 'k_final', likelihoods, and 'DNML' for each value of 'k'
best_model.print_result()  # print 'k_final' and the estimated parameters of the selected EMM
```
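The criteria above are computed by the package itself. To illustrate the idea behind penalized-likelihood criteria such as AIC (AIC = -2 log L + 2p, where p is the number of free parameters), here is a self-contained numpy sketch comparing a single exponential with a two-component EMM on synthetic heavy-tailed data. For simplicity the mixture is evaluated at the data-generating parameters rather than at the MLE, so this is only a rough illustration of the trade-off, not the package's computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# synthetic data: two-component exponential mixture with means 1 and 100
x = np.where(rng.random(n) < 0.5, rng.exponential(1.0, n), rng.exponential(100.0, n))

# model 1: single exponential; the MLE of the mean is the sample mean (1 free parameter)
mu_hat = x.mean()
ll1 = np.sum(-np.log(mu_hat) - x / mu_hat)

# model 2: two-component mixture at the generating parameters (3 free parameters)
dens = (0.5 / 1.0) * np.exp(-x / 1.0) + (0.5 / 100.0) * np.exp(-x / 100.0)
ll2 = np.sum(np.log(dens))

aic1 = -2 * ll1 + 2 * 1
aic2 = -2 * ll2 + 2 * 3  # the mixture wins despite its larger parameter penalty
```

The other criteria (BIC, NML, DNML, and their latent-variable-completion variants) penalize complexity differently, which is why the selected 'k' can depend on the choice of criterion.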

## Project details

This is version 1.0.0.