
TSRBench

EVT-based noise injection toolkit for evaluating time series forecasting robustness

Python 3.8+ | MIT License

TSRBench generates realistic corrupted time series datasets by injecting level shifts and exponential spikes whose amplitudes are calibrated using Extreme Value Theory (EVT). This produces noise that respects the statistical properties of each series, providing a principled benchmark for evaluating how robust time series forecasting models are to data corruption.

From left to right: Original signal, Level Shift corruption, Exponential Spike corruption, Combined corruption


Key Features

  • EVT-calibrated amplitudes — noise magnitudes are derived from the SPOT algorithm (Streaming Peaks-over-Threshold), ensuring corruptions are statistically grounded in the tail behavior of each time series
  • Two corruption types — level shifts (sustained deviations) and exponential spikes (transient peaks), plus their combination
  • 5 severity levels — progressively increasing frequency, duration, and amplitude for systematic robustness evaluation
  • Column-wise processing — each column is corrupted independently with its own EVT thresholds
  • StandardScaler pipeline — data is normalized before injection and inverse-transformed after, preserving original scale
  • pip-installable: pip install -e . for instant use as a library or CLI tool
  • Supports 4 SPOT variants — SPOT, biSPOT, dSPOT, bidSPOT for different data characteristics

Installation

git clone https://github.com/dongbeank/TSRBench.git
cd TSRBench
pip install -e .

Dependencies

  • numpy, pandas, scikit-learn, matplotlib
  • ads-evt (>=0.0.4) — Extreme Value Theory implementation

Quick Start

Generate benchmark corruptions (one command)

# Generate all 5 severity levels for a single dataset
python -m tsrbench \
    --data-path ETTh1.csv \
    --root-path ./dataset/ETT-small/ \
    --output-path ./dataset/ETT-small/ETTh1_noise/

This produces 15 files (5 levels x 3 types):

ETTh1_level_1_type_shift.csv
ETTh1_level_1_type_spike.csv
ETTh1_level_1_type_combined.csv
...
ETTh1_level_5_type_shift.csv
ETTh1_level_5_type_spike.csv
ETTh1_level_5_type_combined.csv

Python API (5 lines)

from tsrbench import CollectiveNoise
import numpy as np

cn = CollectiveNoise(seed=2025)
signal = np.random.randn(10000)  # your 1D time series
shift_noise = cn.inject_level_shift(signal, noise_level=3)
corrupted = signal + shift_noise
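
To apply both corruption types in one call, inject_noise returns the shift and spike components separately (see the API Reference below). A minimal sketch of combining them with the same larger-absolute-value rule used for the combined CSVs:

shift_noise, spike_noise = cn.inject_noise(signal, noise_level=3)
# keep whichever corruption is larger in magnitude at each time step
combined = np.where(np.abs(spike_noise) > np.abs(shift_noise), spike_noise, shift_noise)
corrupted_combined = signal + combined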

Reproduce paper benchmarks

bash scripts/generate_benchmark.sh

Generates corrupted data for all 6 datasets: ETTm1, ETTm2, ETTh1, ETTh2, Electricity, Weather.


How It Works

TSRBench injects two types of realistic corruptions into time series data. The key innovation is using Extreme Value Theory (EVT) to calibrate noise amplitudes, so corruptions are proportional to the statistical extremes of each individual series.

Pipeline Overview

Original CSV
    │
    ▼
StandardScaler (fit + transform)
    │
    ▼
For each column:
    ├── SPOT algorithm → EVT thresholds (upper/lower bounds)
    ├── Poisson process → anomaly occurrence times
    ├── Geometric distribution → anomaly durations
    │
    ├── Level Shift injection (sustained deviations)
    ├── Exp Spike injection (transient peaks)
    └── Combined (max of |shift|, |spike| at each point)
    │
    ▼
StandardScaler (inverse_transform)
    │
    ▼
Corrupted CSV (same format as input)

Step 1: EVT Amplitude Calibration (SPOT)

The SPOT (Streaming Peaks-over-Threshold) algorithm analyzes the tail distribution of each time series column to determine realistic anomaly thresholds. Given a risk parameter q (the amp parameter), SPOT finds threshold values that would be exceeded with probability q.

  • For unidirectional variants (SPOT, dSPOT): computes upper thresholds only
  • For bidirectional variants (biSPOT, bidSPOT): computes both upper and lower thresholds, enabling both positive and negative corruptions

The EVT thresholds are computed per-column and cached across severity levels for efficiency.
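
The idea behind this calibration, sketched here with scipy's generalized Pareto fit rather than the ads-evt API (illustrative only): fit the tail of the data above a high initial quantile, then extrapolate to the level exceeded with probability q.

# Conceptual peaks-over-threshold calibration (not the ads-evt API; a hypothetical helper)
import numpy as np
from scipy.stats import genpareto

def pot_threshold(x, q=0.0004, init_level=0.98):
    t = np.quantile(x, init_level)                   # initial high threshold
    excesses = x[x > t] - t                          # peaks over the threshold
    xi, _, sigma = genpareto.fit(excesses, floc=0)   # GPD shape and scale of the tail
    n, n_t = len(x), len(excesses)
    if abs(xi) < 1e-8:                               # exponential-tail limit
        return t + sigma * np.log(n_t / (q * n))
    # standard POT quantile: level exceeded with probability q
    return t + (sigma / xi) * ((q * n / n_t) ** (-xi) - 1)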

Step 2: Anomaly Occurrence (Poisson Process)

The number and location of anomalies are determined by a Poisson process:

N ~ Poisson(freq × T)

where freq controls the anomaly rate and T = 2L - 1 (with L being the series length). A steady-state mechanism filters start points to ensure anomalies are distributed across the second half of the time window.
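
A minimal sketch of this occurrence sampling (variable names and the exact steady-state filter are illustrative, not TSRBench internals):

import numpy as np

rng = np.random.default_rng(2025)
L = 10_000                                    # series length
freq = 0.004                                  # severity-level frequency
T = 2 * L - 1                                 # extended window, T = 2L - 1
n_events = rng.poisson(freq * T)              # expected ~80 anomaly events here
starts = np.sort(rng.integers(0, T, size=n_events))
starts = starts[starts >= L - 1] - (L - 1)    # keep the steady-state second half, mapped back into [0, L)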

Step 3: Anomaly Duration (Geometric Distribution)

Each anomaly's duration is drawn from a geometric distribution:

  • Level shift: d ~ Geometric(1/(dur-1)) + 1
  • Exponential spike: two durations d1, d2 ~ Geometric(2/dur) for the ascending and descending phases
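
Numpy's geometric sampler takes the success probability directly, so these draws can be sketched as follows (illustrative, not the exact internals):

import numpy as np

rng = np.random.default_rng(2025)
dur = 12                                            # severity-level duration parameter
d_shift = rng.geometric(1.0 / (dur - 1)) + 1        # level shift duration
d_asc, d_desc = rng.geometric(2.0 / dur, size=2)    # spike ascending / descending phases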

Step 4: Noise Injection

Level Shift: A sustained deviation where the signal is shifted by the EVT threshold value for the anomaly's duration.

Exponential Spike: A transient peak shaped by an exponential curve:

         ╱╲
        ╱  ╲
       ╱    ╲
      ╱      ╲
─────╱        ╲─────
     ← d1 →← d2 →

The peak height equals the EVT threshold at the peak position, and the curve decays exponentially on both sides.

Combined: At each time step, the corruption with the larger absolute value is selected:

combined[t] = spike[t] if |spike[t]| > |shift[t]| else shift[t]

Step 5: Bidirectional Noise

For bidirectional SPOT variants (biSPOT, bidSPOT), each anomaly is randomly assigned as positive (upward) or negative (downward) with equal probability, using the appropriate upper or lower threshold.
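
A sketch of the sign assignment (threshold names and values are illustrative):

import numpy as np

rng = np.random.default_rng(2025)
upper_thr, lower_thr = 2.7, -2.5          # per-column EVT thresholds (example values)
go_up = rng.random() < 0.5                # positive or negative with equal probability
amplitude = upper_thr if go_up else lower_thr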


Noise Parameters

The 5 default severity levels use the following parameters:

| Level | freq  | dur | amp (SPOT q) | Description                                        |
|-------|-------|-----|--------------|----------------------------------------------------|
| 1     | 0.002 | 6   | 0.0016       | Minimal — rare, short, conservative amplitude      |
| 2     | 0.004 | 9   | 0.0016       | Mild — more frequent, slightly longer              |
| 3     | 0.004 | 12  | 0.0004       | Moderate — longer duration, more extreme amplitude |
| 4     | 0.008 | 12  | 0.0004       | Strong — frequent, long, extreme                   |
| 5     | 0.008 | 15  | 0.0001       | Severe — most frequent, longest, most extreme      |

Note: Lower amp values in SPOT correspond to more extreme thresholds (lower exceedance probability = more extreme quantile).

Parameter Interpretation

  • freq: Controls lambda in the Poisson process. Higher = more anomaly events.
  • dur: Controls the geometric distribution parameter. Higher = longer anomalies.
  • amp: The SPOT risk parameter q. Lower = more extreme EVT threshold = larger noise amplitude.

Custom Dataset Guide

CSV Format

Your CSV must have:

  • First column: Timestamps or index (string/numeric, not used for injection)
  • Remaining columns: Numeric time series values
date,temperature,humidity,pressure
2020-01-01 00:00,21.3,65.2,1013.2
2020-01-01 01:00,20.8,66.1,1013.5
...
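
A quick way to check that a file matches this layout before generating corruptions (a hypothetical snippet, not part of the TSRBench API):

import pandas as pd

df = pd.read_csv("my_data.csv")
assert df.shape[1] >= 2, "need a timestamp column plus at least one value column"
# every column after the first should be numeric
assert df.iloc[:, 1:].apply(pd.api.types.is_numeric_dtype).all()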

One-Command Generation

python -m tsrbench \
    --data-path my_data.csv \
    --root-path ./my_dataset/ \
    --output-path ./my_dataset/noisy/

Or use the generic script:

bash scripts/generate_noise.sh ./my_dataset/ my_data.csv ./my_dataset/noisy/

SPOT Parameter Tuning

| Parameter          | Default | When to Adjust                                                                      |
|--------------------|---------|-------------------------------------------------------------------------------------|
| --spot-type        | bidspot | Use bispot for short series (<1000 pts); use dspot/bidspot for non-stationary data  |
| --spot-n-points    | 8       | Increase (10-20) for noisy data; decrease (4-6) for clean data                       |
| --spot-depth       | 0.01    | Increase (0.02-0.05) for highly non-stationary series                                |
| --spot-init-points | 0.05    | Increase if SPOT fails to converge; decrease for very long series                    |
| --spot-init-level  | 0.98    | Lower (0.95) for more conservative thresholds                                        |
| --zero-clip        | False   | Set True for non-negative data (e.g., electricity consumption)                       |

Custom Noise Definitions

from tsrbench import CollectiveNoise

# Define your own severity levels
custom_shift = {
    1: {'freq': 0.001, 'dur': 4, 'amp': 0.002},
    2: {'freq': 0.003, 'dur': 8, 'amp': 0.001},
    3: {'freq': 0.005, 'dur': 12, 'amp': 0.0005},
}
custom_spike = {
    1: {'freq': 0.001, 'dur': 4, 'amp': 0.002},
    2: {'freq': 0.003, 'dur': 8, 'amp': 0.001},
    3: {'freq': 0.005, 'dur': 12, 'amp': 0.0005},
}

cn = CollectiveNoise(
    seed=2025,
    level_shift_args=custom_shift,
    exp_spike_args=custom_spike,
)
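
The custom levels are then used exactly like the defaults, with noise_level referring to the keys of the dictionaries above:

import numpy as np

signal = np.random.randn(10_000)
shift_noise = cn.inject_level_shift(signal, noise_level=2)   # custom level 2: freq=0.003, dur=8, amp=0.001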

Visualization

TSRBench includes visualization utilities for inspecting corruptions:

from tsrbench import plot_corruption_comparison, plot_severity_levels, plot_noise_only

# Side-by-side: Original | Shift | Spike | Combined
plot_corruption_comparison(
    "dataset/ETT-small/ETTh1.csv",
    "dataset/ETT-small/ETTh1_noise/",
    column="HUFL", level=3,
    save_path="figures/comparison.png"
)

# All 5 severity levels for one noise type
plot_severity_levels(
    "dataset/ETT-small/ETTh1.csv",
    "dataset/ETT-small/ETTh1_noise/",
    column="HUFL", noise_type="combined",
    save_path="figures/severity.png"
)

# Isolated noise signal (corrupted - original)
plot_noise_only(
    "dataset/ETT-small/ETTh1.csv",
    "dataset/ETT-small/ETTh1_noise/",
    column="HUFL", level=3,
    save_path="figures/noise_only.png"
)

See examples/visualize_corruptions.py for a complete example.


API Reference

CollectiveNoise

from tsrbench import CollectiveNoise

cn = CollectiveNoise(
    seed=2025,                  # Random seed
    level_shift_args=None,      # Dict {level: {freq, dur, amp}} or None for defaults
    exp_spike_args=None,        # Dict {level: {freq, dur, amp}} or None for defaults
    spot_args=None,             # Dict {type, n_points, depth, init_points, init_level} or None for defaults
)

Methods

| Method | Description |
|--------|-------------|
| inject_level_shift(X, noise_level) | Inject level shift noise into 1D signal X at the given severity level (1-5). Returns noise array. |
| inject_exp_spike(X, noise_level) | Inject exponential spike noise into 1D signal X. Returns noise array. |
| inject_noise(X, noise_level) | Inject both shift and spike noise. Returns (shift_noise, spike_noise). |
| custom_inject_level_shift(X, freq, dur, amp) | Inject level shift with custom parameters. |
| custom_inject_exp_spike(X, freq, dur, amp) | Inject exponential spike with custom parameters. |
| make_noise_datasets(args) | Generate all corrupted CSVs from an input dataset. See CLI args for the args object fields. |
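
For one-off experiments, the custom_inject_* methods take the parameters directly instead of a severity level (a sketch following the positional signatures listed above):

shift_noise = cn.custom_inject_level_shift(signal, 0.004, 12, 0.0004)   # freq, dur, amp
spike_noise = cn.custom_inject_exp_spike(signal, 0.004, 12, 0.0004)     # freq, dur, amp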

SPOT Algorithm Variants

| Variant | Class   | Handles Non-Stationarity | Bidirectional | Best For                                 |
|---------|---------|--------------------------|---------------|------------------------------------------|
| SPOT    | SPOT    | No                       | No            | Stationary, one-sided data               |
| biSPOT  | biSPOT  | No                       | Yes           | Stationary, symmetric data               |
| dSPOT   | dSPOT   | Yes                      | No            | Non-stationary, one-sided data           |
| bidSPOT | bidSPOT | Yes                      | Yes           | Non-stationary, symmetric data (default) |

  • Non-stationarity handling (dSPOT, bidSPOT): Uses a sliding window (depth parameter) to adapt thresholds to local statistics
  • Bidirectional (biSPOT, bidSPOT): Computes both upper and lower thresholds, allowing both positive and negative corruptions
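
The variant can also be selected programmatically through the spot_args dictionary of CollectiveNoise. The key names follow the constructor shown above; the lowercase variant strings are assumed here to mirror the --spot-type CLI values:

from tsrbench import CollectiveNoise

cn = CollectiveNoise(
    seed=2025,
    spot_args={
        'type': 'bispot',        # assumed values: spot / bispot / dspot / bidspot
        'n_points': 8,
        'depth': 0.01,
        'init_points': 0.05,
        'init_level': 0.98,
    },
)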

Data Validation

For large datasets (e.g., Electricity with 321 columns), some columns may produce extreme corruptions due to unusual distributions. TSRBench includes a validation module to detect and fix these:

from tsrbench.validate import DataValidationAndRegeneration

validator = DataValidationAndRegeneration(seed=2025)

# Check for problematic columns
problems = validator.check_problematic_columns(
    data_name='electricity',
    dataset_path='./dataset/electricity/',
    level=3,
    threshold_multiplier=3
)

# Regenerate noise for problematic columns
if problems:
    validator.extract_problematic_columns('electricity', './dataset/electricity/', problems)
    validator.regenerate_noise_data('electricity2.csv', './dataset/electricity/')

See tsrbench/validate.py for the full API.


Citation

If you find this repo useful for your research, please cite our paper:

@inproceedings{kim2026local,
  title={Local Geometry Attention for Time Series Forecasting under Realistic Corruptions},
  author={Dongbin Kim and Youngjoo Park and Woojin Jeong and Jaewook Lee},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=NCQPCxN7ds}
}

License

MIT License. See LICENSE for details.
