SPRT-TANDEM: sequential density ratio estimation that simultaneously optimizes the speed and accuracy of early classification.
SPRT-TANDEM-PyTorch
This repository contains the official PyTorch implementation of SPRT-TANDEM (ICASSP2023, ICML2021, and ICLR2021). SPRT-TANDEM is a neuroscience-inspired sequential density ratio estimation (SDRE) algorithm that estimates log-likelihood ratios of two or more hypotheses for fast and accurate sequential data classification. For an intuitive understanding, please refer to the SPRT-TANDEM tutorial.
Quickstart
1. To create a new SDRE dataset, run the Generate_sequential_Gaussian_as_LMDB.ipynb notebook.
2. Edit the user-editable block of config_definition.py (see the sketch after this list). Specify the path to the dataset file created in step 1. Other frequently used entries include SUBPROJECT_NAME_PREFIX (to tag your experiment) and EXP_PHASE (to specify whether you are trying, tuning, or running statistics; see Hyperparameter Tuning for details).
3. Execute sprt_tandem_main.py.
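As an illustration, the user-editable block might contain entries like the following. SUBPROJECT_NAME_PREFIX and EXP_PHASE are the entries documented above; the dataset-path variable name is a hypothetical placeholder:

```python
# Illustrative sketch of the user-editable block in config_definition.py.
DATA_PATH = "/path/to/sequential_gaussian.lmdb"  # placeholder name: path to the LMDB dataset from step 1
SUBPROJECT_NAME_PREFIX = "my_experiment"  # tag for this experiment
EXP_PHASE = "try"  # one of "try", "tuning", or "stat"
```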
Tested Environment
- python 3.8.10
- torch 2.0.0
- notebook 6.5.3
- optuna 3.1.0
Supported Network Architectures
We support two major architectures for processing time series data: long short-term memory (LSTM, [1]) and Transformer [2]. To avoid the likelihood ratio saturation problem and approach asymptotic optimality (for details, see Ebihara+, ICASSP2023), we developed two novel models based on these architectures: B2Bsqrt-TANDEM (based on LSTM) and TANDEMformer (based on Transformer).
LSTM (B2Bsqrt-TANDEM, ICASSP2023)
The LSTM with the back-to-back square root (B2Bsqrt) activation function can be used by setting the following variables:
- MODEL_BACKBONE: "LSTM"
- ACTIVATION_OUTPUT: "B2Bsqrt"
Note that setting ACTIVATION_OUTPUT to "tanh" results in a vanilla LSTM. The B2Bsqrt function was introduced in the ICASSP2023 paper to avoid the likelihood ratio saturation problem in SDRE:
\begin{align} f_{\mathrm{B2Bsqrt}}(x) := \mathrm{sign}(x)(\sqrt{\alpha+|x|}-\sqrt{\alpha}) \end{align}
where $\alpha$ is a hyperparameter.
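For intuition, here is a minimal PyTorch sketch of the activation; the default value of $\alpha$ below is illustrative, not the repository's setting:

```python
import torch

def b2bsqrt(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Back-to-back square root (B2Bsqrt) activation:
    f(x) = sign(x) * (sqrt(alpha + |x|) - sqrt(alpha)).
    Unlike tanh, it is unbounded, which helps avoid saturation of the
    estimated log-likelihood ratios while keeping f(0) = 0.
    """
    alpha_t = x.new_tensor(alpha)
    return torch.sign(x) * (torch.sqrt(alpha_t + x.abs()) - torch.sqrt(alpha_t))
```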
Transformer (TANDEMformer, ICASSP2023)
The Transformer is equipped with the Normalized Summation Pooling (NSP) layer, which is incorporated by default.
Let $X_i^{(t, t+w)}$ be subtokens sampled with a sliding window of size $w \in [N]$, and let $Z_i^{(t, t+w)} := \{ z_i^{(s)} \}_{s=t}^{t+w}$ be the subtokens mixed with self-attention. Given the Markov order $N$, the NSP layer is defined as:
\begin{align} NSP(Z_i^{(t, t+w)}) := \sum_{s=t}^{t+w}\frac{z_i^{(s)}}{N+1}. \end{align}
To use it, set the following variable:
- MODEL_BACKBONE: "Transformer"
Supported Loss Functions for SDRE
SPRT-TANDEM is trained with both a loss for sequential density ratio estimation (SDRE) and the (multiplet-) cross-entropy loss (ICLR2021). Two loss functions are supported for SDRE: LSEL and LLLR. To choose one, set the following variable:
- LLLR_VERSION: "LSEL" or "LLLR"
Additionally, modify the values of PARAM_LLR_LOSS and PARAM_MULTIPLET_LOSS to achieve the desired balance between the likelihood ratio estimation loss and the cross-entropy loss.
Log-sum exponential loss (LSEL, ICML2021)
\begin{align} \hat{L}_{\mathrm{LSEL}} (\theta; S) := \mathbb{E} \left[ \log \left( 1 + \sum_{l (\neq k)} e^{ -\hat{\lambda}_{k,l} ( X_i^{(1,t)}; \theta ) } \right) \right] \end{align}
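An illustrative PyTorch sketch of the LSEL, assuming the model outputs an LLR matrix $\hat{\lambda}_{k,l}$ per sample (shapes and names are assumptions, not the repository's API):

```python
import torch

def lsel_loss(llrs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """LSEL sketch.

    llrs:   (batch, num_classes, num_classes) estimated LLR matrix, where
            llrs[i, k, l] approximates lambda_hat_{k,l} for sample i.
    labels: (batch,) ground-truth class indices k.
    """
    batch = llrs.shape[0]
    idx = torch.arange(batch)
    lam = llrs[idx, labels].clone()  # rows lambda_hat_{k, :}, shape (batch, num_classes)
    lam[idx, labels] = float("inf")  # exclude l == k: e^{-inf} contributes 0
    zeros = lam.new_zeros(batch, 1)  # the "+1" term inside the log
    # log(1 + sum_{l != k} exp(-lambda_hat_{k,l})), averaged over the batch
    return torch.logsumexp(torch.cat([zeros, -lam], dim=1), dim=1).mean()
```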
Loss for log-likelihood ratio estimation (LLLR, ICLR2021)
\begin{align} \hat{L}_{\mathrm{LLLR}} (\theta; S) := \mathbb{E} \left[ \left| y - \sigma \left( \hat{\lambda}_{k,l} ( X_i^{(1,t)}; \theta ) \right) \right| \right] \end{align}
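Similarly, an illustrative binary-case sketch of the LLLR (names hypothetical):

```python
import torch

def lllr_loss(llr: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """LLLR sketch for two classes.

    llr: (batch,) estimated LLR lambda_hat of class 1 vs. class 0.
    y:   (batch,) binary labels in {0, 1}.
    The sigmoid maps the LLR to a score in (0, 1), and the loss is the
    mean absolute deviation of that score from the label.
    """
    return (y.float() - torch.sigmoid(llr)).abs().mean()
```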
Order N of Markov assumption
The Markov order $N$ determines the length of the sliding window that extracts a subset from the entire feature vector of a time series. $N$ is a convenient hyperparameter for incorporating prior knowledge of the time series. An optimal $N$ can be found either from the *specific time scale* of the data or through hyperparameter tuning. The specific time scale characterizes the data class: a long temporal action (e.g., in UCF101) has a long specific time scale, while a spoofing attack (e.g., in SiW) has a short one, because a single frame can carry sufficient information about the attack. Setting $N$ equal to the specific time scale usually works best. Alternatively, $N$ can be chosen objectively with a hyperparameter tuning algorithm such as Optuna, just like other hyperparameters. Because $N$ only affects the temporal integrator after feature extraction, optimizing it is not computationally expensive.
The log-likelihood ratio is estimated from a subset of the feature vectors extracted using a sliding window of size $N$. This estimation is classification-based. Specifically, the temporal integrator is trained to output class logits, which are then used to update the log-likelihood ratio at each time step based on the TANDEM formula.
TANDEM formula (ICLR2021)
\begin{align} &\log \left( \frac{p(x^{(1)}, x^{(2)}, \dots, x^{(t)} \mid y=1)}{p(x^{(1)}, x^{(2)}, \dots, x^{(t)} \mid y=0)} \right) \nonumber \newline =& \sum_{s=N+1}^{t} \log \left( \frac{ p(y=1 \mid x^{(s-N)}, \dots, x^{(s)}) }{ p(y=0 \mid x^{(s-N)}, \dots, x^{(s)}) } \right) - \sum_{s=N+2}^{t} \log \left( \frac{ p(y=1 \mid x^{(s-N)}, \dots, x^{(s-1)}) }{ p(y=0 \mid x^{(s-N)}, \dots, x^{(s-1)}) } \right) \nonumber \newline &- \log \left( \frac{p(y=1)}{p(y=0)} \right) \end{align}
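In code, the formula reduces to sums of per-window posterior log-ratios. A binary-case sketch (the tensor layout and names are assumptions for illustration):

```python
import torch

def tandem_llr_binary(log_ratio_np1: torch.Tensor,
                      log_ratio_n: torch.Tensor,
                      log_prior_ratio: float = 0.0) -> torch.Tensor:
    """TANDEM formula sketch for two classes.

    log_ratio_np1: (t - N,) values of log p(y=1 | x^{(s-N)}, ..., x^{(s)})
                   - log p(y=0 | x^{(s-N)}, ..., x^{(s)}) for s = N+1, ..., t.
    log_ratio_n:   (t - N - 1,) the same for the shorter windows
                   x^{(s-N)}, ..., x^{(s-1)} with s = N+2, ..., t.
    log_prior_ratio: log p(y=1) - log p(y=0); 0 for balanced classes.
    Returns the estimated LLR of the full sequence x^{(1)}, ..., x^{(t)}.
    """
    return log_ratio_np1.sum() - log_ratio_n.sum() - log_prior_ratio
```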
Experiment Phases
EXP_PHASE must be set to one of the following:
- try: All the hyperparameters are fixed as defined in config_definition.py. Use it for debugging purposes.
- tuning: Enter hyperparameter tuning mode. Hyperparameters with corresponding search spaces will be overwritten with suggested parameters. See the Hyperparameter Tuning section for more details.
- stat: All the hyperparameters are fixed as defined in config_definition.py. Training is repeated NUM_TRIALS times to test reproducibility (e.g., to plot error bars or run a statistical test). The subproject name is suffixed with EXP_PHASE to prevent contamination of results from different phases.
Hyperparameter Tuning
Our project supports Optuna [3] for hyperparameter tuning. To begin, edit the following variables in config_definition.py:
- EXP_PHASE: set as "tuning" to enter hyperparameter tuning mode.
- NUM_TRIALS: set an integer that specifies the number of hyperparameter sets to experiment with.
- PRUNER_NAME (optional): select a pruner supported by Optuna, or set it to "None."
Also set PRUNER_STARTUP_TRIALS, PRUNER_WARMUP_STEPS, and PRUNER_INTERVAL_STEPS. For details, see the official Optuna docs.
Next, customize the hyperparameter search space defined by variables with the prefix "SPACE_". For example, config_definition.py contains an entry like this:
"SPACE_ORDER_SPRT": {
"PARAM_SPACE": "int",
"LOW": 0,
"HIGH": 5, # 10
"STEP": 1,
"LOG": False,
}
The above entry specifies the search space of the hyperparameter ORDER_SPRT. The key "PARAM_SPACE" must be one of the following:
- float: use suggest_float to suggest a float in the range [LOW, HIGH], discretized by STEP. If LOG=True, the float is sampled from log space; in that case, set STEP=None.
- int: use suggest_int to suggest an integer in the range [LOW, HIGH], separated by STEP. STEP should be a divisor of the range; otherwise, HIGH is automatically modified. If LOG=True, the integer is sampled from log space; in that case, set STEP=None.
- categorical: use suggest_categorical to select one category from CATEGORY_SET. Note that if the parameter is continuous (e.g., 1, 2, 3, ... or 1.0, 0.1, 0.001, ...), it is advisable to use the float or int space, because suggest_categorical treats each category independently.
For more information, please refer to the official Optuna docs.
Each "SPACE_" entry controls the hyperparameter whose name follows the prefix (in the example above, ORDER_SPRT); the suggested value is assigned to that hyperparameter, as sketched below.
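For illustration, here is a sketch of how such an entry could be translated into Optuna's suggest API; the helper name suggest_from_space is hypothetical, and the repository's actual code may differ:

```python
import optuna

def suggest_from_space(trial: optuna.Trial, name: str, space: dict):
    """Map a SPACE_ entry (in the config_definition.py format above) to a suggestion."""
    kind = space["PARAM_SPACE"]
    if kind == "int":
        # suggest_int requires an integer step; STEP=None (used with LOG=True) falls back to 1.
        return trial.suggest_int(name, space["LOW"], space["HIGH"],
                                 step=space["STEP"] or 1, log=space["LOG"])
    if kind == "float":
        return trial.suggest_float(name, space["LOW"], space["HIGH"],
                                   step=space["STEP"], log=space["LOG"])
    if kind == "categorical":
        return trial.suggest_categorical(name, space["CATEGORY_SET"])
    raise ValueError(f"unknown PARAM_SPACE: {kind}")
```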
Command-line Arguments
Frequently-used variables can be overwritten by specifying command-line arguments.
```
options:
  -h, --help            show this help message and exit
  -g GPU, --gpu GPU     set GPU, the GPU number to use
  -t NUM_TRIALS, --num_trials NUM_TRIALS
                        set NUM_TRIALS, the number of trials
  -i NUM_ITER, --num_iter NUM_ITER
                        set NUM_ITER, the number of iterations
  -e EXP_PHASE, --exp_phase EXP_PHASE
                        phase of an experiment: "try", "tuning", or "stat"
  -m MODEL, --model MODEL
                        set the model backbone: "LSTM" or "Transformer"
  -o OPTIMIZE, --optimize OPTIMIZE
                        set the optimization target: "MABS", "MacRec",
                        "ausat_confmx", or "ALL"
  -n NAME, --name NAME  set the subproject name
  --flip_memory_loading
                        set a boolean flag indicating whether to load the
                        dataset onto memory
```
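For example, a tuning run on GPU 0 with the Transformer backbone could be launched as follows (the flag values are illustrative):

```
python sprt_tandem_main.py --gpu 0 --num_trials 100 --exp_phase tuning --model Transformer --name my_tuning_run
```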
Logging
Under the logs folder, you will see a subfolder like this:
{SUBPROJECT_SUFFIX}_offset{DATA_SEPARATION}_optim{OPTIMIZATION_TARGET}_{EXP_PHASE}
inside which the following four folders are created:
- Optuna_databases: Optuna .db files are stored here.
- TensorBoard_events: TensorBoard event files are saved here.
- checkpoints: trained parameters are saved as .pt files whenever the best optimization target value is updated.
- stdout_logs: standard output strings are saved as .log files.
The plot below shows an example image saved in a TensorBoard event file (you can disable figure saving by setting IS_SAVE_FIGURE=False). "Class $a$ vs. $b$ at $y=a$" indicates that the plotted LLR is $\log \left( p(X \mid y=a) / p(X \mid y=b) \right)$ computed on sequences whose ground truth label is $y=a$.
Citation
Please cite the original paper(s) if you use all or part of our code.
# ICASSP2023
@inproceedings{saturation_problem,
title = {Toward Asymptotic Optimality: Sequential Unsupervised Regression of Density Ratio for Early Classification},
author = {Akinori F Ebihara and Taiki Miyagawa and Kazuyuki Sakurai and Hitoshi Imaoka},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
year = {2023},
}
# ICML2021
@inproceedings{MSPRT-TANDEM,
title = {The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization},
author = {Miyagawa, Taiki and Ebihara, Akinori F},
booktitle = {Proceedings of the 38th International Conference on Machine Learning},
pages = {7792--7804},
year = {2021},
url = {http://proceedings.mlr.press/v139/miyagawa21a.html}
}
# ICLR2021
@inproceedings{SPRT-TANDEM,
title={Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy},
author={Akinori F Ebihara and Taiki Miyagawa and Kazuyuki Sakurai and Hitoshi Imaoka},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=Rhsu5qD36cL}
}
References
- [1] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
- [2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in NeurIPS, 2017, vol. 30, pp. 5998–6008.
- [3] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in KDD, 2019, pp. 2623–2631.
Contacts
SPRT-TANDEM marks its 4th anniversary. What started as a small project has now become a huge undertaking that we never imagined. Due to its complexity, it is difficult for me to explain all the details in this README section. Please feel free to reach out to me anytime if you have any questions.