TSRBench
EVT-based noise injection toolkit for evaluating time series forecasting robustness
TSRBench generates realistic corrupted time series datasets by injecting level shifts and exponential spikes whose amplitudes are calibrated using Extreme Value Theory (EVT). This produces noise that respects the statistical properties of each series, providing a principled benchmark for evaluating how robust time series forecasting models are to data corruption.
Figure: from left to right, the original signal, level shift corruption, exponential spike corruption, and combined corruption.
Key Features
- EVT-calibrated amplitudes — noise magnitudes are derived from the SPOT algorithm (Streaming Peaks-over-Threshold), ensuring corruptions are statistically grounded in the tail behavior of each time series
- Two corruption types — level shifts (sustained deviations) and exponential spikes (transient peaks), plus their combination
- 5 severity levels — progressively increasing frequency, duration, and amplitude for systematic robustness evaluation
- Column-wise processing — each column is corrupted independently with its own EVT thresholds
- StandardScaler pipeline — data is normalized before injection and inverse-transformed after, preserving original scale
- pip-installable — pip install -e . for instant use as a library or CLI tool
- Supports 4 SPOT variants — SPOT, biSPOT, dSPOT, bidSPOT for different data characteristics
Installation
git clone https://github.com/dongbeank/TSRBench.git
cd TSRBench
pip install -e .
Dependencies
- numpy, pandas, scikit-learn, matplotlib
- ads-evt (>=0.0.4) — Extreme Value Theory implementation
Quick Start
Generate benchmark corruptions (one command)
# Generate all 5 severity levels for a single dataset
python -m tsrbench \
--data-path ETTh1.csv \
--root-path ./dataset/ETT-small/ \
--output-path ./dataset/ETT-small/ETTh1_noise/
This produces 15 files (5 levels x 3 types):
ETTh1_level_1_type_shift.csv
ETTh1_level_1_type_spike.csv
ETTh1_level_1_type_combined.csv
...
ETTh1_level_5_type_shift.csv
ETTh1_level_5_type_spike.csv
ETTh1_level_5_type_combined.csv
Python API (5 lines)
from tsrbench import CollectiveNoise
import numpy as np
cn = CollectiveNoise(seed=2025)
signal = np.random.randn(10000) # your 1D time series
shift_noise = cn.inject_level_shift(signal, noise_level=3)
corrupted = signal + shift_noise
Reproduce paper benchmarks
bash scripts/generate_benchmark.sh
Generates corrupted data for all 6 datasets: ETTm1, ETTm2, ETTh1, ETTh2, Electricity, Weather.
How It Works
TSRBench injects two types of realistic corruptions into time series data. The key innovation is using Extreme Value Theory (EVT) to calibrate noise amplitudes, so corruptions are proportional to the statistical extremes of each individual series.
Pipeline Overview
Original CSV
│
▼
StandardScaler (fit + transform)
│
▼
For each column:
├── SPOT algorithm → EVT thresholds (upper/lower bounds)
├── Poisson process → anomaly occurrence times
├── Geometric distribution → anomaly durations
│
├── Level Shift injection (sustained deviations)
├── Exp Spike injection (transient peaks)
└── Combined (max of |shift|, |spike| at each point)
│
▼
StandardScaler (inverse_transform)
│
▼
Corrupted CSV (same format as input)
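The scaling wrapper around injection can be sketched as follows (a minimal illustration assuming the first column holds timestamps, as described in the CSV Format section; TSRBench's internal implementation may differ):
import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.read_csv("ETTh1.csv")
values = df.iloc[:, 1:].to_numpy()                  # skip the timestamp column
scaler = StandardScaler()
scaled = scaler.fit_transform(values)               # noise is injected in normalized space
# ... per-column EVT thresholding and noise injection would happen here ...
df.iloc[:, 1:] = scaler.inverse_transform(scaled)   # restore the original scale
df.to_csv("ETTh1_corrupted.csv", index=False)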
Step 1: EVT Amplitude Calibration (SPOT)
The SPOT (Streaming Peaks-over-Threshold) algorithm analyzes the tail distribution of each time series column to determine realistic anomaly thresholds. Given a risk parameter q (the amp parameter), SPOT finds threshold values that would be exceeded with probability q.
- For unidirectional variants (SPOT, dSPOT): computes upper thresholds only
- For bidirectional variants (biSPOT, bidSPOT): computes both upper and lower thresholds, enabling both positive and negative corruptions
The EVT thresholds are computed per-column and cached across severity levels for efficiency.
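The idea behind the calibration can be illustrated with a generic peaks-over-threshold fit (a conceptual sketch with scipy, not the ads-evt API; the initial quantile mirrors the default --spot-init-level of 0.98):
import numpy as np
from scipy.stats import genpareto
rng = np.random.default_rng(0)
x = rng.standard_normal(10000)                       # one scaled column
u = np.quantile(x, 0.98)                             # initial high threshold
excesses = x[x > u] - u
shape, _, scale = genpareto.fit(excesses, floc=0.0)  # fit the tail model
q = 0.0004                                           # the amp parameter
# Invert P(X > z) = q using the fitted Generalized Pareto tail
threshold = u + genpareto.ppf(1 - q * len(x) / len(excesses), shape, scale=scale)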
Step 2: Anomaly Occurrence (Poisson Process)
The number and location of anomalies are determined by a Poisson process:
N ~ Poisson(freq × T)
where freq controls the anomaly rate and T = 2L - 1 (with L being the series length). A steady-state mechanism filters start points to ensure anomalies are distributed across the second half of the time window.
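A minimal sketch of this step (hypothetical; the exact filtering logic inside TSRBench may differ):
import numpy as np
rng = np.random.default_rng(2025)
L = 10000                                          # series length
T = 2 * L - 1                                      # extended window
freq = 0.004                                       # severity level 3
n_events = rng.poisson(freq * T)                   # on average ~80 anomalies
starts = np.sort(rng.integers(0, T, size=n_events))
starts = starts[starts >= L - 1] - (L - 1)         # keep starts in the second half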
Step 3: Anomaly Duration (Geometric Distribution)
Each anomaly's duration is drawn from a geometric distribution:
- Level shift: d ~ Geometric(1/(dur-1)) + 1
- Exponential spike: two durations d1, d2 ~ Geometric(2/dur) for the ascending and descending phases
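In numpy terms (a sketch following the formulas above, with dur = 12 from severity level 3):
import numpy as np
rng = np.random.default_rng(2025)
dur = 12
d_shift = rng.geometric(1.0 / (dur - 1)) + 1       # level-shift duration
d1, d2 = rng.geometric(2.0 / dur, size=2)          # spike ascent / descent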
Step 4: Noise Injection
Level Shift: A sustained deviation where the signal is shifted by the EVT threshold value for the anomaly's duration.
Exponential Spike: A transient peak shaped by an exponential curve:
╱╲
╱ ╲
╱ ╲
╱ ╲
─────╱ ╲─────
← d1 →← d2 →
The peak height equals the EVT threshold at the peak position, and the curve decays exponentially on both sides.
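One plausible realization of such a profile (a sketch; the decay rate is an assumption, not necessarily the value TSRBench uses):
import numpy as np
def exp_spike(d1, d2, peak, rate=0.5):
    # Exponential rise over d1 steps to the EVT-calibrated peak,
    # then exponential decay over d2 steps back toward zero.
    rise = peak * np.exp(-rate * np.arange(d1, 0, -1))
    fall = peak * np.exp(-rate * np.arange(1, d2 + 1))
    return np.concatenate([rise, [peak], fall])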
Combined: At each time step, the corruption with the larger absolute value is selected:
combined[t] = spike[t] if |spike[t]| > |shift[t]| else shift[t]
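Equivalently, with the noise arrays returned by the Python API (see Quick Start; this mirrors the rule above rather than calling internal code):
import numpy as np
from tsrbench import CollectiveNoise
cn = CollectiveNoise(seed=2025)
signal = np.random.randn(10000)
shift = cn.inject_level_shift(signal, noise_level=3)
spike = cn.inject_exp_spike(signal, noise_level=3)
# Keep whichever corruption has the larger magnitude at each time step
combined = signal + np.where(np.abs(spike) > np.abs(shift), spike, shift)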
Step 5: Bidirectional Noise
For bidirectional SPOT variants (biSPOT, bidSPOT), each anomaly is randomly assigned as positive (upward) or negative (downward) with equal probability, using the appropriate upper or lower threshold.
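Sketched per anomaly (upper_thr and lower_thr are placeholders standing in for the column's EVT thresholds):
import numpy as np
rng = np.random.default_rng(2025)
upper_thr, lower_thr = 3.1, -2.8       # placeholder thresholds from biSPOT/bidSPOT
direction = rng.choice([1, -1])        # up or down with equal probability
amplitude = upper_thr if direction == 1 else lower_thr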
Noise Parameters
The 5 default severity levels use the following parameters:
| Level | freq | dur | amp (SPOT q) | Description |
|---|---|---|---|---|
| 1 | 0.002 | 6 | 0.0016 | Minimal — rare, short, conservative amplitude |
| 2 | 0.004 | 9 | 0.0016 | Mild — more frequent, slightly longer |
| 3 | 0.004 | 12 | 0.0004 | Moderate — longer duration, more extreme amplitude |
| 4 | 0.008 | 12 | 0.0004 | Strong — frequent, long, extreme |
| 5 | 0.008 | 15 | 0.0001 | Severe — most frequent, longest, most extreme |
Note: Lower amp values in SPOT correspond to more extreme thresholds (lower exceedance probability = more extreme quantile).
Parameter Interpretation
- freq: Controls lambda in the Poisson process. Higher = more anomaly events.
- dur: Controls the geometric distribution parameter. Higher = longer anomalies.
- amp: The SPOT risk parameter q. Lower = more extreme EVT threshold = larger noise amplitude.
Custom Dataset Guide
CSV Format
Your CSV must have:
- First column: Timestamps or index (string/numeric, not used for injection)
- Remaining columns: Numeric time series values
date,temperature,humidity,pressure
2020-01-01 00:00,21.3,65.2,1013.2
2020-01-01 01:00,20.8,66.1,1013.5
...
One-Command Generation
python -m tsrbench \
--data-path my_data.csv \
--root-path ./my_dataset/ \
--output-path ./my_dataset/noisy/
Or use the generic script:
bash scripts/generate_noise.sh ./my_dataset/ my_data.csv ./my_dataset/noisy/
SPOT Parameter Tuning
| Parameter | Default | When to Adjust |
|---|---|---|
| --spot-type | bidspot | Use bispot for short series (<1000 pts); use dspot/bidspot for non-stationary data |
| --spot-n-points | 8 | Increase (10-20) for noisy data; decrease (4-6) for clean data |
| --spot-depth | 0.01 | Increase (0.02-0.05) for highly non-stationary series |
| --spot-init-points | 0.05 | Increase if SPOT fails to converge; decrease for very long series |
| --spot-init-level | 0.98 | Lower (0.95) for more conservative thresholds |
| --zero-clip | False | Set True for non-negative data (e.g., electricity consumption) |
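For example, a plausible invocation overriding a few of these defaults (paths and values are illustrative):
python -m tsrbench \
    --data-path my_data.csv \
    --root-path ./my_dataset/ \
    --output-path ./my_dataset/noisy/ \
    --spot-type bispot \
    --spot-n-points 12 \
    --spot-depth 0.02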
Custom Noise Definitions
from tsrbench import CollectiveNoise
# Define your own severity levels
custom_shift = {
1: {'freq': 0.001, 'dur': 4, 'amp': 0.002},
2: {'freq': 0.003, 'dur': 8, 'amp': 0.001},
3: {'freq': 0.005, 'dur': 12, 'amp': 0.0005},
}
custom_spike = {
1: {'freq': 0.001, 'dur': 4, 'amp': 0.002},
2: {'freq': 0.003, 'dur': 8, 'amp': 0.001},
3: {'freq': 0.005, 'dur': 12, 'amp': 0.0005},
}
cn = CollectiveNoise(
seed=2025,
level_shift_args=custom_shift,
exp_spike_args=custom_spike,
)
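With these definitions, injection works as before, using the custom keys as severity levels (usage sketch):
import numpy as np
signal = np.random.randn(10000)
shift_noise = cn.inject_level_shift(signal, noise_level=2)   # custom level 2
corrupted = signal + shift_noise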
Visualization
TSRBench includes visualization utilities for inspecting corruptions:
from tsrbench import plot_corruption_comparison, plot_severity_levels, plot_noise_only
# Side-by-side: Original | Shift | Spike | Combined
plot_corruption_comparison(
"dataset/ETT-small/ETTh1.csv",
"dataset/ETT-small/ETTh1_noise/",
column="HUFL", level=3,
save_path="figures/comparison.png"
)
# All 5 severity levels for one noise type
plot_severity_levels(
"dataset/ETT-small/ETTh1.csv",
"dataset/ETT-small/ETTh1_noise/",
column="HUFL", noise_type="combined",
save_path="figures/severity.png"
)
# Isolated noise signal (corrupted - original)
plot_noise_only(
"dataset/ETT-small/ETTh1.csv",
"dataset/ETT-small/ETTh1_noise/",
column="HUFL", level=3,
save_path="figures/noise_only.png"
)
See examples/visualize_corruptions.py for a complete example.
API Reference
CollectiveNoise
from tsrbench import CollectiveNoise
cn = CollectiveNoise(
seed=2025, # Random seed
level_shift_args=None, # Dict {level: {freq, dur, amp}} or None for defaults
exp_spike_args=None, # Dict {level: {freq, dur, amp}} or None for defaults
spot_args=None, # Dict {type, n_points, depth, init_points, init_level} or None for defaults
)
Methods
| Method | Description |
|---|---|
| inject_level_shift(X, noise_level) | Inject level shift noise into 1D signal X at the given severity level (1-5). Returns noise array. |
| inject_exp_spike(X, noise_level) | Inject exponential spike noise into 1D signal X. Returns noise array. |
| inject_noise(X, noise_level) | Inject both shift and spike noise. Returns (shift_noise, spike_noise). |
| custom_inject_level_shift(X, freq, dur, amp) | Inject level shift with custom parameters. |
| custom_inject_exp_spike(X, freq, dur, amp) | Inject exponential spike with custom parameters. |
| make_noise_datasets(args) | Generate all corrupted CSVs from an input dataset. See CLI args for the args object fields. |
SPOT Algorithm Variants
| Variant | Class | Handles Non-Stationarity | Bidirectional | Best For |
|---|---|---|---|---|
| SPOT | SPOT | No | No | Stationary, one-sided data |
| biSPOT | biSPOT | No | Yes | Stationary, symmetric data |
| dSPOT | dSPOT | Yes | No | Non-stationary, one-sided data |
| bidSPOT | bidSPOT | Yes | Yes | Non-stationary, symmetric data (default) |
- Non-stationarity handling (dSPOT, bidSPOT): Uses a sliding window (depth parameter) to adapt thresholds to local statistics
- Bidirectional (biSPOT, bidSPOT): Computes both upper and lower thresholds, allowing both positive and negative corruptions
Data Validation
For large datasets (e.g., Electricity with 321 columns), some columns may produce extreme corruptions due to unusual distributions. TSRBench includes a validation module to detect and fix these:
from tsrbench.validate import DataValidationAndRegeneration
validator = DataValidationAndRegeneration(seed=2025)
# Check for problematic columns
problems = validator.check_problematic_columns(
data_name='electricity',
dataset_path='./dataset/electricity/',
level=3,
threshold_multiplier=3
)
# Regenerate noise for problematic columns
if problems:
validator.extract_problematic_columns('electricity', './dataset/electricity/', problems)
validator.regenerate_noise_data('electricity2.csv', './dataset/electricity/')
See tsrbench/validate.py for the full API.
Citation
If you find this repo useful for your research, please cite our paper:
@inproceedings{
kim2026local,
title={Local Geometry Attention for Time Series Forecasting under Realistic Corruptions},
author={Dongbin Kim and Youngjoo Park and Woojin Jeong and Jaewook Lee},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=NCQPCxN7ds}
}
License
MIT License. See LICENSE for details.