
A series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations.





Starting from the view that time series are external manifestations of complex dynamical systems, we propose a bimodal generative mechanism for time series data that integrates the symbolic and series modalities. This mechanism can generate an unlimited number of complex systems, represented as symbolic expressions $f(\cdot)$, together with excitation time series $X$. Feeding an excitation into a complex system yields the corresponding response time series $Y=f(X)$. In this way, high-quality time series data for pre-training time series foundation models can be created without limit.
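To make $Y=f(X)$ concrete, here is a minimal sketch, not the package's implementation, that samples a random expression tree, prints it in the same `add`/`sub`/`mul` word-operator notation used in the Usage outputs, and evaluates it on a white-noise excitation. The helper names `random_tree`, `evaluate`, and `to_string`, the operator sets, the branching probabilities, and the affine leaf form $b + a\,x_i$ are all illustrative assumptions.

```python
import numpy as np

# Hypothetical operator sets; the package's real vocabulary and sampling
# probabilities may differ.
UNARY = {"cos": np.cos, "sin": np.sin, "arctan": np.arctan}
BINARY = {"add": np.add, "sub": np.subtract, "mul": np.multiply}


def random_tree(rng, n_inputs, depth):
    """Sample a random expression tree stored as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: an affine term b + a * x_i on one input channel
        i = int(rng.integers(n_inputs))
        a, b = (float(v) for v in rng.normal(size=2).round(3))
        return ("leaf", i, a, b)
    if rng.random() < 0.4:
        op = str(rng.choice(list(UNARY)))
        return (op, random_tree(rng, n_inputs, depth - 1))
    op = str(rng.choice(list(BINARY)))
    return (op, random_tree(rng, n_inputs, depth - 1),
            random_tree(rng, n_inputs, depth - 1))


def evaluate(node, X):
    """Evaluate the tree on an excitation X of shape (length, n_inputs)."""
    kind = node[0]
    if kind == "leaf":
        _, i, a, b = node
        return b + a * X[:, i]
    if kind in UNARY:
        return UNARY[kind](evaluate(node[1], X))
    return BINARY[kind](evaluate(node[1], X), evaluate(node[2], X))


def to_string(node):
    """Render the tree in the word-operator notation (add, sub, mul)."""
    kind = node[0]
    if kind == "leaf":
        _, i, a, b = node
        return f"({b} add ({a} mul x_{i}))"
    if kind in UNARY:
        return f"{kind}({to_string(node[1])})"
    return f"({to_string(node[1])} {kind} {to_string(node[2])})"


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 1))   # excitation series X (white noise here)
f = random_tree(rng, n_inputs=1, depth=3)
Y = evaluate(f, X)                  # response series Y = f(X), shape (256,)
print(to_string(f))
```

Storing the tree as nested tuples keeps the symbolic form (for `to_string`) and the numeric form (for `evaluate`) in one structure, which is the essence of pairing a series with its symbol.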

🔥 News

[Jun. 2026] We extend the learnable white-noise-to-signal simulator family with KalmanFilterSimulator (state-space AR + Kalman filtering) and MarkovSwitchingSimulator (Markov-switching autoregression for regime-dependent dynamics).

[Feb. 2026] By the Wold decomposition, any covariance-stationary time series (after removing its deterministic component) can be represented as white noise passed through a linear time-invariant filter. Building on this, we propose a learnable series generation method based on the ARIMA model, designed so that the generated series closely matches the input series in autocorrelation and power spectral density.

[Sep. 2025] Our paper "Synthetic Series-Symbol Data Generation for Time Series Foundation Models" has been accepted by NeurIPS 2025. SymTime, pre-trained on the $S^2$ synthetic dataset, achieved SOTA results when fine-tuned for forecasting, classification, imputation, and anomaly detection tasks.
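The idea behind the Feb. 2026 item can be illustrated with a deliberately simplified sketch: fit an AR(p) model to an input series with the Yule-Walker equations, then drive the fitted filter with fresh white noise to obtain a new series with similar autocorrelation. This is an assumption-laden stand-in, not the package's ARIMA-based simulator: it omits differencing and moving-average terms, and the helper names `autocov`, `acf`, `fit_ar_yule_walker`, and `simulate_ar` are hypothetical.

```python
import numpy as np


def autocov(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])


def acf(x, lag):
    """Sample autocorrelation at a single lag."""
    r = autocov(x, lag)
    return r[lag] / r[0]


def fit_ar_yule_walker(x, p):
    """Estimate AR(p) coefficients and noise variance via Yule-Walker."""
    r = autocov(x, p)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
    phi = np.linalg.solve(R, r[1 : p + 1])
    sigma2 = r[0] - phi @ r[1 : p + 1]
    return phi, sigma2


def simulate_ar(phi, sigma2, length, rng, burn_in=200):
    """Drive the AR(p) filter with Gaussian white noise."""
    p = len(phi)
    e = rng.normal(scale=np.sqrt(sigma2), size=length + burn_in)
    y = np.zeros(length + burn_in)
    for t in range(p, length + burn_in):
        # y[t] = phi_1*y[t-1] + ... + phi_p*y[t-p] + e[t]
        y[t] = phi @ y[t - p : t][::-1] + e[t]
    return y[burn_in:]  # drop the start-up transient


rng = np.random.default_rng(0)
x = simulate_ar(np.array([0.6, -0.3]), 1.0, 5000, rng)  # stand-in "input" series
phi, sigma2 = fit_ar_yule_walker(x, p=2)
y = simulate_ar(phi, sigma2, 5000, rng)                 # new synthetic series
print(phi, acf(x, 1), acf(y, 1))
```

Because the power spectral density of an AR process is determined by its coefficients and noise variance, matching the autocovariance through Yule-Walker also approximately matches the spectrum.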

🚀 Installation

The algorithm is packaged as a library and published on PyPI, so it can be installed with `pip install s2generator`.

We use [NumPy](https://numpy.org/), [Pandas](https://pandas.pydata.org/), and [SciPy](https://scipy.org/) to build the data science environment, [Matplotlib](https://matplotlib.org/) for data visualization, and [Statsmodels](https://www.statsmodels.org/stable/index.html) for time series analysis and statistical processing.

✨ Usage

We provide a unified data generation interface, [Generator](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/generators.py); two parameter modules, [SeriesParams](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/params/series_params.py) and [SymbolParams](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/params/symbol_params.py); and auxiliary modules for generating excitation time series and complex systems. First, create the parameter objects, either by specifying parameters or by using the defaults, and pass them to the Generator. After instantiation, call its run method to start data generation.

(73.5 add (x_0 mul (((9.38 mul cos((-0.092 add (-6.12 mul x_0)))) add (87.1 mul arctan((-0.965 add (0.973 mul rand))))) sub (8.89 mul exp(((4.49 mul log((-29.3 add (-86.2 mul x_0)))) add (-2.57 mul ((51.3 add (-55.6 mul x_0)))**2)))))))

The input and output dimensions of the multivariate time series, as well as the length of the sampled series, can be set in the run method.

(-9.45 add ((((0.026 mul rand) sub (-62.7 mul cos((4.79 add (-6.69 mul x_1))))) add (-0.982 mul sqrt((4.2 add (-0.14 mul x_0))))) sub (0.683 mul x_1))) | (67.6 add ((-9.0 mul x_1) add (2.15 mul sqrt((0.867 add (-92.1 mul x_1))))))

When the system has several outputs, the symbolic expression for each output dimension is separated by " | "; the example above has two.
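To see what such a string computes, the sketch below translates the word operators into Python and evaluates each `" | "`-separated output on a sampled excitation, returning one column per output. The operator mapping (in particular `div`, which does not appear in the examples), the treatment of `rand` as a fresh standard-normal series, and the helper names `to_python` and `evaluate_system` are assumptions, not the package's own parser. `eval` is used purely for illustration; even with builtins stripped it is not safe for untrusted input.

```python
import re

import numpy as np

# Assumed mapping from word operators to Python operators ("div" is a guess).
WORD_OPS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}
# Functions that appear in the example outputs, mapped to NumPy ufuncs.
FUNCS = {"cos": np.cos, "sin": np.sin, "arctan": np.arctan,
         "exp": np.exp, "log": np.log, "sqrt": np.sqrt}


def to_python(expr):
    """Replace whole-word operators such as 'add' with Python operators."""
    return re.sub(r"\b(add|sub|mul|div)\b", lambda m: WORD_OPS[m.group(1)], expr)


def evaluate_system(expr, X, rng):
    """Evaluate a ' | '-separated expression on X of shape (length, n_inputs)."""
    length, n_inputs = X.shape
    namespace = {"__builtins__": {}, **FUNCS}
    namespace.update({f"x_{i}": X[:, i] for i in range(n_inputs)})
    namespace["rand"] = rng.standard_normal(length)  # assumption about `rand`
    outputs = []
    with np.errstate(all="ignore"):  # log/sqrt of negative values give nan
        for part in expr.split(" | "):
            outputs.append(eval(to_python(part), namespace))
    return np.stack(outputs, axis=1)  # shape (length, n_outputs)


expr = ("(-9.45 add ((((0.026 mul rand) sub (-62.7 mul cos((4.79 add (-6.69 mul x_1))))) "
        "add (-0.982 mul sqrt((4.2 add (-0.14 mul x_0))))) sub (0.683 mul x_1))) | "
        "(67.6 add ((-9.0 mul x_1) add (2.15 mul sqrt((0.867 add (-92.1 mul x_1))))))")
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))   # two input channels: x_0, x_1
Y = evaluate_system(expr, X, rng)   # two output channels
```

Note that the second output takes `sqrt` of a quantity that is negative for many inputs, so it contains `nan` values; a generator would typically need to detect and reject or repair such samples.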

🧮 Algorithm

The advantage of $S^2$ data lies in its diversity and unlimited generation capacity. On the one hand, we build diverse complex systems from binary expression trees (right); on the other hand, we combine five different methods to generate excitation series:

  • [MixedDistribution](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/excitation/mixed_distribution.py): sampling from a mixture of distributions captures the randomness of time series;
  • [ARMA](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/excitation/autoregressive_moving_average.py): autoregressive and moving-average processes exhibit clear temporal dependencies;
  • [ForecastPFN](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/excitation/forecast_pfn.py) and [KernelSynth](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/excitation/kernel_synth.py): decomposition-and-combination methods reflect the dynamics of time series;
  • [IntrinsicModeFunction](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/excitation/intrinsic_mode_functions.py): excitations built by combining intrinsic modes show clear periodicity.
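Three of these excitation families can be approximated with short NumPy sketches: an i.i.d. Gaussian mixture (randomness), an ARMA process (temporal dependence), and a sum of random sinusoids as a rough stand-in for combining intrinsic modes (periodicity). These are simplified illustrations, not the package's `MixedDistribution`, `ARMA`, or `IntrinsicModeFunction` classes; ForecastPFN and KernelSynth are omitted, and the function names, parameter ranges, and the choice to stack one channel per method are assumptions.

```python
import numpy as np


def gaussian_mixture(rng, length, n_comp=3):
    """i.i.d. samples from a random Gaussian mixture (randomness, no memory)."""
    weights = rng.dirichlet(np.ones(n_comp))
    comp = rng.choice(n_comp, size=length, p=weights)
    means = rng.uniform(-2.0, 2.0, n_comp)
    stds = rng.uniform(0.1, 1.0, n_comp)
    return rng.normal(means[comp], stds[comp])


def arma(rng, length, ar=(0.5,), ma=(0.3,), burn_in=100):
    """ARMA(p, q) process driven by standard-normal white noise."""
    p, q = len(ar), len(ma)
    e = rng.standard_normal(length + burn_in)
    y = np.zeros(length + burn_in)
    for t in range(max(p, q), length + burn_in):
        y[t] = (sum(ar[i] * y[t - 1 - i] for i in range(p))
                + e[t]
                + sum(ma[j] * e[t - 1 - j] for j in range(q)))
    return y[burn_in:]


def sinusoid_modes(rng, length, n_modes=3):
    """Sum of random sinusoids, a rough stand-in for combining oscillatory modes."""
    t = np.arange(length)
    freqs = rng.uniform(0.005, 0.1, n_modes)      # cycles per sample
    amps = rng.uniform(0.5, 2.0, n_modes)
    phases = rng.uniform(0.0, 2 * np.pi, n_modes)
    modes = amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None])
    return modes.sum(axis=0)


rng = np.random.default_rng(42)
# One channel per method, stacked into a multivariate excitation X of shape (512, 3)
X = np.stack([g(rng, 512) for g in (gaussian_mixture, arma, sinusoid_modes)], axis=1)
```

Each channel of `X` could then be fed into a sampled symbolic system to produce the response series.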

By generating diverse complex systems and combining multiple excitation generation methods, we can obtain high-quality, diverse time series data without limit. For details on the data generation process, please refer to our paper or documentation.

🎖️ Citation

If you find this $S^2$ data generation method helpful, please cite our NeurIPS 2025 paper "Synthetic Series-Symbol Data Generation for Time Series Foundation Models".



