SPRT-TANDEM for sequential density ratio estimation to simultaneously optimize both speed and accuracy of early classification.
SPRT-TANDEM-PyTorch
This repository contains the official PyTorch implementation of SPRT-TANDEM (ICASSP2023, ICML2021, and ICLR2021). SPRT-TANDEM is a neuroscience-inspired sequential density ratio estimation (SDRE) algorithm that estimates log-likelihood ratios of two or more hypotheses for fast and accurate sequential data classification. For an intuitive understanding, please refer to the SPRT-TANDEM tutorial.
Quickstart
 1. To create a new SDRE dataset, run the Generate_sequential_Gaussian_as_LMDB.ipynb notebook.
 2. Edit the user editable block of config_definition.py. Specify the path to the dataset file created in step 1. Other frequently used entries include SUBPROJECT_NAME_PREFIX (to tag your experiment) and EXP_PHASE (to specify whether you are trying, tuning, or running statistics; see Hyperparameter Tuning for details).
 3. Execute sprt_tandem_main.py.
Tested Environment
python 3.8.10
torch 2.0.0
notebook 6.5.3
optuna 3.1.0
Supported Network Architectures
We support the two major architectures for processing time series data: long short-term memory (LSTM, [1]) and Transformer [2]. To avoid the likelihood ratio saturation problem and approach asymptotic optimality (for details, see Ebihara+, ICASSP2023), we developed two novel models based on these architectures: B2Bsqrt-TANDEM (based on LSTM) and TANDEMformer (based on Transformer).
LSTM (B2Bsqrt-TANDEM, ICASSP2023)
The LSTM with the back-to-back square root (B2Bsqrt) activation function can be used by setting the following variables:
 MODEL_BACKBONE: "LSTM"
 ACTIVATION_OUTPUT: "B2Bsqrt"
Note that setting ACTIVATION_OUTPUT to "tanh" results in a vanilla LSTM. The B2Bsqrt function was introduced in the ICASSP2023 paper to avoid the likelihood ratio saturation problem in SDRE.
\begin{align} f_{\mathrm{B2Bsqrt}}(x) := \mathrm{sign}(x)\left(\sqrt{\alpha+|x|}-\sqrt{\alpha}\right) \end{align}
Where $\alpha$ is a hyperparameter.
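As a minimal illustration of the formula above, the following plain-Python sketch evaluates B2Bsqrt on scalars. It is not the repository's PyTorch implementation; the function name `b2bsqrt` and the default `alpha=1.0` are assumptions made for this example.

```python
import math


def b2bsqrt(x: float, alpha: float = 1.0) -> float:
    """Scalar B2Bsqrt: sign(x) * (sqrt(alpha + |x|) - sqrt(alpha)).

    `alpha` is the hyperparameter from the formula; 1.0 is an arbitrary
    default chosen for this sketch.
    """
    magnitude = math.sqrt(alpha + abs(x)) - math.sqrt(alpha)
    # copysign attaches the sign of x to the non-negative magnitude.
    return math.copysign(magnitude, x)


# Unlike tanh, the output keeps growing with |x| instead of saturating.
for value in (-100.0, -3.0, 0.0, 3.0, 100.0):
    print(value, b2bsqrt(value), math.tanh(value))
```

Because the output is unbounded and odd-symmetric, large positive or negative inputs are not squashed toward a fixed limit, which is the property the text associates with avoiding saturation.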
Transformer (TANDEMformer, ICASSP2023)
The Transformer is equipped with the Normalized Summation Pooling (NSP) layer, which is incorporated by default.
Let $X_i^{(t, t+w)}$ be subtokens sampled with a sliding window of size $w \in [N]$, and let $Z_i^{(t, t+w)}:=\{z_i^{(s)}\}^{t+w}_{s=t}$ be the subtokens mixed with self-attention. Given the Markov order $N$, the \texttt{NSP} layer is defined as:
\begin{align} NSP(Z_i^{(t, t+w)}) := \sum_{s=t}^{t+w}\frac{z_i^{(s)}}{N+1}. \end{align}
To use it, set the following variable:
 MODEL_BACKBONE: "Transformer"
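The NSP definition above can be sketched in plain Python as follows. This is an illustrative sketch only, not the layer used in the repository; the function name `normalized_summation_pooling` and the list-of-lists token representation are assumptions.

```python
from typing import List, Sequence


def normalized_summation_pooling(
    tokens: Sequence[Sequence[float]], markov_order: int
) -> List[float]:
    """Sum the token vectors in the window and divide by the constant N + 1.

    `tokens` holds the self-attention outputs z^(s) for s = t, ..., t + w,
    each as a sequence of floats of equal length.
    """
    if not tokens:
        raise ValueError("at least one token is required")
    dim = len(tokens[0])
    pooled = [0.0] * dim
    for token in tokens:
        for d in range(dim):
            # The divisor is N + 1 regardless of how many tokens are present.
            pooled[d] += token[d] / (markov_order + 1)
    return pooled


# Two tokens pooled under Markov order N = 3 (divisor 4).
print(normalized_summation_pooling([[1.0, 2.0], [3.0, 4.0]], markov_order=3))
```

Note the difference from mean pooling: the divisor is fixed at $N+1$, so a window with fewer than $N+1$ tokens yields a proportionally smaller output, and the two coincide only when the window is full.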
Supported Loss Functions for SDRE
SPRT-TANDEM uses both a loss for sequential density ratio estimation (SDRE) and the (multiplet) cross-entropy loss (ICLR2021). Two loss functions are supported for SDRE: LSEL and LLLR. To choose the loss function, set the following variable:
 LLLR_VERSION: "LSEL" or "LLLR"
Additionally, modify the values of PARAM_LLR_LOSS and PARAM_MULTIPLET_LOSS to achieve the desired balance between likelihood estimation and cross-entropy loss.
Log-sum exponential loss (LSEL, ICML2021)
\begin{align} \hat{L}_{\mathrm{LSEL}} (\theta; S) := \mathbb{E} \left[ \log \left( 1 + \sum_{l(\neq k)} e^{ - \hat{\lambda}_{k,l} ( X_i^{(1,t)}; \theta) }\right) \right] \end{align}
Loss for log-likelihood ratio estimation (LLLR, ICLR2021)
\begin{align} \hat{L}_{\mathrm{LLLR}} (\theta; S) := \mathbb{E} \left[ \left| y - \sigma\left( \hat{\lambda}_{k,l} ( X_i^{(1,t)}; \theta) \right) \right| \right] \end{align}
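To make the two loss definitions concrete, here is a hedged plain-Python sketch of the per-sample terms inside the expectations. It is not the repository's loss code. The helper names are hypothetical, and building the LLR matrix as differences of class logits is an assumption made only to have example inputs.

```python
import math
from typing import List, Sequence


def llr_matrix_from_logits(logits: Sequence[float]) -> List[List[float]]:
    """Assumed toy estimator: lambda[k][l] = logit_k - logit_l."""
    return [[a - b for b in logits] for a in logits]


def lsel_single(llr_matrix: Sequence[Sequence[float]], true_class: int) -> float:
    """Per-sample LSEL term: log(1 + sum over l != k of exp(-lambda[k][l]))."""
    k = true_class
    total = sum(
        math.exp(-llr_matrix[k][l]) for l in range(len(llr_matrix)) if l != k
    )
    return math.log(1.0 + total)


def lllr_single(llr: float, label: int) -> float:
    """Per-sample LLLR term for two classes: |y - sigmoid(lambda)|."""
    sigmoid = 1.0 / (1.0 + math.exp(-llr))
    return abs(label - sigmoid)


llrs = llr_matrix_from_logits([2.0, 0.5, -1.0])
print(lsel_single(llrs, true_class=0))
print(lllr_single(llr=0.0, label=1))
```

In practice these terms would be averaged over samples and time steps and then weighted against the multiplet cross-entropy loss through PARAM_LLR_LOSS and PARAM_MULTIPLET_LOSS. The direct `math.exp` calls are kept for readability and can overflow for extreme inputs.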
Order N of Markov assumption
The Markov order $N$ determines the length of the sliding window that extracts a subset from the entire feature vector of a time series. $N$ is a convenient hyperparameter that incorporates prior knowledge of the time series. An optimal $N$ can be found either from the \textit{specific time scale} or through hyperparameter tuning. The specific time scale characterizes the data class: for example, long temporal actions such as those in UCF101 have a long specific time scale, while spoofing attacks such as those in SiW have a short one (because a single frame can carry sufficient information about the attack). Setting $N$ equal to the specific time scale usually works best. Alternatively, $N$ can be chosen objectively with a hyperparameter tuning algorithm such as Optuna, just like other hyperparameters. Because $N$ affects only the temporal integrator after feature extraction, optimizing it is not computationally expensive.
The log-likelihood ratio is estimated from a subset of the feature vectors extracted using a sliding window of size $N$. This estimation is classification-based. Specifically, the temporal integrator is trained to output class logits, which are then used to update the log-likelihood ratio at each time step based on the TANDEM formula.
TANDEM formula (ICLR2021)
\begin{align} &\ \log \left( \frac{p(x^{(1)},x^{(2)}, ..., x^{(t)} | y=1)}{p(x^{(1)},x^{(2)}, ..., x^{(t)} | y=0)} \right)\nonumber \newline = &\sum_{s=N+1}^{t} \log \left( \frac{ p(y=1 | x^{(s-N)}, ...,x^{(s)}) }{ p(y=0 | x^{(s-N)}, ...,x^{(s)}) } \right) - \sum_{s=N+2}^{t} \log \left( \frac{ p(y=1 | x^{(s-N)}, ...,x^{(s-1)}) }{ p(y=0 | x^{(s-N)}, ...,x^{(s-1)}) } \right) \nonumber \newline & - \log\left( \frac{p(y=1)}{p(y=0)} \right) \end{align}
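The arithmetic of the TANDEM formula can be sketched as below, assuming the posterior log-odds of each window have already been computed (for example from the temporal integrator's logits). The function name `tandem_llr` and its argument layout are hypothetical and do not mirror the repository's code.

```python
from typing import Sequence


def tandem_llr(
    long_window_logodds: Sequence[float],
    short_window_logodds: Sequence[float],
    prior_logodds: float = 0.0,
) -> float:
    """Combine windowed posterior log-odds into a log-likelihood ratio.

    long_window_logodds:  log p(y=1 | x^(s-N..s)) / p(y=0 | x^(s-N..s))
                          for s = N+1, ..., t.
    short_window_logodds: log p(y=1 | x^(s-N..s-1)) / p(y=0 | x^(s-N..s-1))
                          for s = N+2, ..., t.
    prior_logodds:        log p(y=1) / p(y=0).
    """
    return sum(long_window_logodds) - sum(short_window_logodds) - prior_logodds


# Toy check with N = 0 and unit-variance Gaussians with means 1 (y=1) and 0 (y=0):
# the per-frame LLR is x - 0.5, so the per-frame posterior log-odds is
# (x - 0.5) + prior, and the "short" windows are empty, leaving only the prior.
frames = [0.2, 1.4, -0.3]
prior = 0.3
long_terms = [(x - 0.5) + prior for x in frames]
short_terms = [prior] * (len(frames) - 1)
print(tandem_llr(long_terms, short_terms, prior), sum(x - 0.5 for x in frames))
```

In this toy case the two printed values agree up to floating-point rounding, which shows how the second sum and the prior term cancel the prior contributions that each posterior carries.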
Experiment Phases
EXP_PHASE must be set to one of the following:
 try: All the hyperparameters are fixed as defined in config_definition.py. Use it for debugging purposes.
 tuning: Enter hyperparameter tuning mode. Hyperparameters with corresponding search spaces will be overwritten with suggested parameters. See the Hyperparameter Tuning section for more details.
 stat: All the hyperparameters are fixed as defined in config_definition.py. Training is repeated NUM_TRIALS times to test reproducibility (e.g., to plot error bars or run a statistical test). The subproject name will be suffixed with the EXP_PHASE to prevent contamination of results from different phases.
Hyperparameter Tuning
Our project supports Optuna [3] for hyperparameter tuning. To begin, edit the following variables in config_definition.py:
 EXP_PHASE: set as "tuning" to enter hyperparameter tuning mode.
 NUM_TRIALS: set an integer that specifies the number of hyperparameter sets to experiment with.
 PRUNER_NAME (optional): select a pruner supported by Optuna, or set it to "None."
Also, set PRUNER_STARTUP_TRIALS, PRUNER_WARMUP_STEPS, and PRUNER_INTERVAL_STEPS. For details, see the official Optuna docs.
Next, customize the hyperparameter space defined with variables that have prefix "SPACE_". For example, config_definition.py contains an entry like this:
    "SPACE_ORDER_SPRT": {
        "PARAM_SPACE": "int",
        "LOW": 0,
        "HIGH": 5,  # 10
        "STEP": 1,
        "LOG": False,
    }
The above entry specifies the search space of a hyperparameter "ORDER_SPRT." The key "PARAM_SPACE" must be one of the following:
 float: use suggest_float to suggest a float of range [LOW, HIGH], separated by STEP. If LOG=True, a float is sampled from log space. However, if LOG=True, set STEP=None.
 int: use suggest_int to suggest an integer of range [LOW, HIGH], separated by STEP. STEP should be a divisor of the range; otherwise, HIGH will be automatically modified. If LOG=True, an int is sampled from log space. However, if LOG=True, set STEP=None.
 categorical: use suggest_categorical to select one category from CATEGORY_SET. Note that if the parameter is continuous (e.g., 1, 2, 3, ..., or 1.0, 0.1, 0.001, ...), it is advisable to use float or int space because suggest_categorical treats each category independently.
For more information, please refer to the official Optuna docs.
To select specific values for a hyperparameter, use entries that start with "SPACE_". These values will be assigned to the hyperparameter whose name is defined after "SPACE_" (for example, in the above example, "ORDER_SPRT").
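The mapping from a "SPACE_" entry to an Optuna suggestion can be sketched as follows. This is an assumption about how such a dispatch could look, not the repository's code: `suggest_from_space`, `apply_search_spaces`, and the `LowestValueTrial` stand-in are hypothetical names. Only the `suggest_float`, `suggest_int`, and `suggest_categorical` methods correspond to the real `optuna.trial.Trial` API, and the stand-in class exists solely so the sketch runs without Optuna installed.

```python
from typing import Any, Dict


def suggest_from_space(trial: Any, name: str, space: Dict[str, Any]) -> Any:
    """Dispatch one search-space entry to the matching trial.suggest_* call."""
    kind = space["PARAM_SPACE"]
    if kind == "float":
        return trial.suggest_float(
            name, space["LOW"], space["HIGH"],
            step=space.get("STEP"), log=space.get("LOG", False),
        )
    if kind == "int":
        # Assumption: fall back to step=1 when STEP is None (e.g. with LOG=True).
        step = space.get("STEP") or 1
        return trial.suggest_int(
            name, space["LOW"], space["HIGH"],
            step=step, log=space.get("LOG", False),
        )
    if kind == "categorical":
        return trial.suggest_categorical(name, space["CATEGORY_SET"])
    raise ValueError(f"unknown PARAM_SPACE: {kind!r}")


def apply_search_spaces(trial: Any, config: Dict[str, Any]) -> Dict[str, Any]:
    """Overwrite each hyperparameter that has a matching SPACE_<name> entry."""
    updated = dict(config)
    for key, space in config.items():
        if key.startswith("SPACE_"):
            name = key[len("SPACE_"):]
            updated[name] = suggest_from_space(trial, name, space)
    return updated


class LowestValueTrial:
    """Hypothetical stand-in for optuna.trial.Trial: returns LOW or the first choice."""

    def suggest_float(self, name, low, high, *, step=None, log=False):
        return float(low)

    def suggest_int(self, name, low, high, *, step=1, log=False):
        return int(low)

    def suggest_categorical(self, name, choices):
        return choices[0]


config = {
    "ORDER_SPRT": 2,
    "SPACE_ORDER_SPRT": {
        "PARAM_SPACE": "int", "LOW": 0, "HIGH": 5, "STEP": 1, "LOG": False,
    },
}
print(apply_search_spaces(LowestValueTrial(), config)["ORDER_SPRT"])
```

With a real study, the `trial` argument would be the object Optuna passes to the objective function, and the suggested value would replace the fixed value of "ORDER_SPRT" for that trial.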
Command-line Arguments
Frequently used variables can be overwritten by specifying command-line arguments.
options:
  -h, --help            show this help message and exit
  -g GPU, --gpu         set GPU, gpu number
  -t NUM_TRIALS, --num_trials
                        set NUM_TRIALS, number of trials
  -i NUM_ITER, --num_iter
                        set NUM_ITER, number of iterations
  -e EXP_PHASE, --exp_phase EXP_PHASE
                        phase of an experiment, "try," "tuning," or "stat"
  -m MODEL, --model MODEL
                        set model backbone, "LSTM", or "Transformer"
  -o OPTIMIZE, --optimize OPTIMIZE
                        set optimization target: "MABS", "MacRec", "ausat_confmx", or "ALL"
  -n NAME, --name NAME  set the subproject name
  --flip_memory_loading
                        set a boolean flag indicating whether to load onto memory
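For readers who want to see how such overrides typically work, the sketch below declares a parser with the same option names and applies any supplied values on top of a config dictionary. It is an illustration under assumptions, not the parser in sprt_tandem_main.py: the argument types, the `override_config` helper, and the mapping from option names to config keys (for example `--name` to SUBPROJECT_NAME_PREFIX) are guesses.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    """Declare options mirroring the help text above (types are assumed)."""
    parser = argparse.ArgumentParser(description="command-line override sketch")
    parser.add_argument("-g", "--gpu", type=int, help="set GPU, gpu number")
    parser.add_argument("-t", "--num_trials", type=int, help="number of trials")
    parser.add_argument("-i", "--num_iter", type=int, help="number of iterations")
    parser.add_argument("-e", "--exp_phase", choices=["try", "tuning", "stat"])
    parser.add_argument("-m", "--model", choices=["LSTM", "Transformer"])
    parser.add_argument(
        "-o", "--optimize", choices=["MABS", "MacRec", "ausat_confmx", "ALL"]
    )
    parser.add_argument("-n", "--name", help="set the subproject name")
    parser.add_argument("--flip_memory_loading", action="store_true")
    return parser


# Assumed mapping from parsed attribute names to config keys.
ARG_TO_CONFIG_KEY = {
    "num_trials": "NUM_TRIALS",
    "num_iter": "NUM_ITER",
    "exp_phase": "EXP_PHASE",
    "model": "MODEL_BACKBONE",
    "name": "SUBPROJECT_NAME_PREFIX",
}


def override_config(config: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    """Return a copy of config where options given on the command line win."""
    updated = dict(config)
    for arg_name, config_key in ARG_TO_CONFIG_KEY.items():
        value = getattr(args, arg_name)
        if value is not None:  # None means the option was not supplied
            updated[config_key] = value
    return updated


defaults = {"EXP_PHASE": "try", "NUM_TRIALS": 1, "NUM_ITER": 100}
args = build_parser().parse_args(["-e", "tuning", "-t", "5"])
print(override_config(defaults, args))
```

Options left unspecified keep the values from config_definition.py, which matches the description of the arguments as overrides for frequently used variables.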
Logging
Under the logs folder, you will see a subfolder like this:
{SUBPROJECT_SUFFIX}_offset{DATA_SEPARATION}_optim{OPTIMIZATION_TARGET}_{EXP_PHASE}
inside of which the following four folders will be created.
 Optuna_databases: Optuna .db file is stored here.
 TensorBoard_events: TensorBoard event files are saved here.
 checkpoints: trained parameters are saved as .py files when the best optimization target value is updated.
 stdout_logs: standard output strings are saved as .log files.
The plot below shows an example image saved in a TensorBoard event file. Note that you can avoid saving figures by setting IS_SAVE_FIGURE=False.
Note that "Class $a$ vs. $b$ at $y=a$" indicates that the plotted LLR shows $\log{p(X|y=a) / p(X|y=b)}$, when the ground truth label is $y=a$.
Citation
Please cite the original paper(s) if you use the whole or a part of our code.
# ICASSP2023
@inproceedings{saturation_problem,
title = {Toward Asymptotic Optimality: Sequential Unsupervised Regression of Density Ratio for Early Classification},
author = {Akinori F Ebihara and Taiki Miyagawa and Kazuyuki Sakurai and Hitoshi Imaoka},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
year = {2023},
}
# ICML2021
@inproceedings{MSPRT-TANDEM,
title = {The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization},
author = {Miyagawa, Taiki and Ebihara, Akinori F},
booktitle = {Proceedings of the 38th International Conference on Machine Learning},
pages = {7792--7804},
year = {2021},
url = {http://proceedings.mlr.press/v139/miyagawa21a.html}
}
# ICLR2021
@inproceedings{SPRT-TANDEM,
title={Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy},
author={Akinori F Ebihara and Taiki Miyagawa and Kazuyuki Sakurai and Hitoshi Imaoka},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=Rhsu5qD36cL}
}
References
 [1] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
 [2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in NeurIPS, 2017, vol. 30, pp. 5998–6008.
 [3] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in KDD, 2019, pp. 2623–2631.
Contacts
SPRT-TANDEM marks its 4th anniversary. What started as a small project has now become a huge undertaking that we never imagined. Due to its complexity, it is difficult for me to explain all the details in this README section. Please feel free to reach out to me anytime if you have any questions.