
BeTiSe — Benchmark Time Series Generator for synthetic dataset creation


BeTiSe — Benchmark Time Series Generator

A modular Python library for generating synthetic time series datasets with rich, reproducible metadata.

License: MIT | Python 3.8+ | DOI: 10.5281/zenodo.18513505

Overview

BeTiSe provides a comprehensive toolkit for generating synthetic time series data with configurable statistical properties. It is designed for researchers, data scientists, and ML practitioners who need reproducible, well-documented time series datasets for benchmarking, model training, or educational purposes.

Published Dataset

A large-scale benchmark dataset generated with this library has been published on Zenodo.

Access: https://zenodo.org/records/18513505

Installation

pip install betise

Or install from source:

git clone https://github.com/ismailguzel/betise.git
cd betise
pip install -e .

Quick Start

from betise import generate_dataframe, load_config

# In-memory — no file written
cfg = load_config(dataset={"base_series": "arma", "num_series": 5, "length_range": [300, 500]})
df, ctx = generate_dataframe(cfg)

# Save to parquet
from betise import run

cfg = load_config(dataset={
    "base_series":  "ar",
    "num_series":   10,
    "length_range": [200, 500],
    "output_dir":   "output",
    "output_name":  "ar_demo.parquet",
    "features": {
        "linear_trend": {"enabled": True, "direction": "upward"},
    },
})
run(cfg)

Load generated data

import pandas as pd

df = pd.read_parquet("output/ar_demo.parquet")
print(df[["series_id", "time", "data", "primary_category", "sub_category"]].head())

For full loading examples (numpy, sklearn, PyTorch), see examples/06_load_and_use.py.
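The columns printed above suggest a long layout, with one row per (series_id, time) observation. Verify this against your own file. If that assumption holds, the sketch below turns the frame into one array per series. A dict suits this layout because series lengths vary within `length_range`. The toy DataFrame stands in for the parquet file so the snippet runs on its own. `to_series_dict` is a hypothetical helper and is not part of BeTiSe.

```python
import pandas as pd

# Toy stand-in for pd.read_parquet("output/ar_demo.parquet").
# Assumption: long format, one row per (series_id, time) observation.
df = pd.DataFrame({
    "series_id": [0, 0, 0, 1, 1],
    "time":      [0, 1, 2, 0, 1],
    "data":      [0.5, 0.7, 0.2, -1.0, -0.4],
})

def to_series_dict(frame: pd.DataFrame) -> dict:
    """Return {series_id: NumPy array of values ordered by time}."""
    ordered = frame.sort_values(["series_id", "time"])
    return {
        sid: group["data"].to_numpy()
        for sid, group in ordered.groupby("series_id", sort=True)
    }

series = to_series_dict(df)
for sid, values in series.items():
    print(sid, len(values), values.tolist())
```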

Series Types

Category     Base types
Stationary   ar, ma, arma, white_noise
Stochastic   random_walk, random_walk_drift, ari, ima, arima
Seasonal     sarma, sarima
Volatility   arch, garch, egarch, aparch

Feature overlays (trend, seasonality, anomaly, structural break) can be combined on top of any base type. See USAGE.md for the full feature reference.
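The sketch below makes "base type plus overlay" concrete in plain Python. It builds an AR(1) base series and adds a linear upward trend on top. It illustrates the idea only and is not BeTiSe's generator. The coefficient `phi`, the `slope` and the noise scale are arbitrary example values.

```python
from __future__ import annotations

import random
from typing import List

def ar1(n: int, phi: float, sigma: float, rng: random.Random) -> List[float]:
    """AR(1): x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, sigma^2), x_0 = 0."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def add_linear_trend(series: List[float], slope: float) -> List[float]:
    """Overlay a deterministic trend slope * t on an existing series."""
    return [value + slope * t for t, value in enumerate(series)]

rng = random.Random(42)
base = ar1(300, phi=0.6, sigma=1.0, rng=rng)   # illustrative parameters
trended = add_linear_trend(base, slope=0.05)
print(len(trended), round(trended[-1] - base[-1], 2))  # 300 14.95
```

Because the overlay is added after the base process is simulated, the same base series can be reused with different overlays.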

Examples

examples/
├── 00_introduction.ipynb          # Interactive getting-started notebook
├── 01_quickstart.py               # In-memory generation, save to disk, feature combinations
├── 02_benchmark_dataset.py        # All base types × 3 length buckets (~495 series)
├── 03_feature_suite.py            # All base types × all feature types, phased (~4,200 series)
├── 04_pretraining_dataset.py      # Large-scale fixed-length dataset (default 75k, scalable)
├── 05_classification_dataset.py   # Balanced 7-class ML dataset (14,000 series)
├── 06_load_and_use.py             # Load parquet → numpy / sklearn / PyTorch
├── 07_feature_gallery.py          # PDF gallery: all 15 base types + all 12 features
├── 08_combinations_gallery.py     # PDF gallery: every base × feature combination (545 plots)
├── configs/
│   └── classification_config.json # Class / sub-type config for script 05
└── data/
    └── combinations.csv           # Combination definitions for script 08

Run any example:

python examples/01_quickstart.py
python examples/07_feature_gallery.py   # produces feature_gallery.pdf
python examples/08_combinations_gallery.py  # produces combinations_gallery.pdf
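The series counts in the listing follow from simple grid arithmetic. There are 15 base types and 3 length buckets, giving 45 cells, and 495 series is 11 series per cell. The sketch below enumerates such a grid with `itertools.product`. It builds one `dataset` override per cell, using the `base_series`, `num_series` and `length_range` keys from Quick Start. The bucket names, their length ranges and the 11-per-cell figure are assumptions for illustration, not values read from 02_benchmark_dataset.py.

```python
from itertools import product

# The 15 base types listed under "Series Types".
BASE_TYPES = [
    "ar", "ma", "arma", "white_noise",
    "random_walk", "random_walk_drift", "ari", "ima", "arima",
    "sarma", "sarima",
    "arch", "garch", "egarch", "aparch",
]

# Hypothetical length buckets; the script's real ranges may differ.
LENGTH_BUCKETS = {"short": [50, 100], "medium": [300, 500], "long": [1000, 2000]}
SERIES_PER_CELL = 11  # assumption: 495 / (15 * 3)

def benchmark_grid():
    """Yield (bucket, dataset override) for every base type x bucket cell."""
    for base, (bucket, length_range) in product(BASE_TYPES, LENGTH_BUCKETS.items()):
        yield bucket, {
            "base_series": base,
            "num_series": SERIES_PER_CELL,
            "length_range": length_range,
        }

cells = list(benchmark_grid())
total = sum(override["num_series"] for _, override in cells)
print(len(cells), total)  # 45 495
```

Each override could then be passed as `load_config(dataset=override)`. Check the script itself for the real bucket definitions.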

Project Structure

betise/
├── betise/
│   ├── __init__.py                 # Public API: run, generate_dataframe, load_config
│   ├── dataset_generation.py       # generate_dataframe() / run() pipeline
│   ├── config/
│   │   ├── __init__.py             # load_config() with deep merge
│   │   ├── dataset.json            # Default dataset settings
│   │   └── params.json             # Default process parameters
│   ├── core/
│   │   ├── generator.py            # TimeSeriesGenerator
│   │   └── metadata.py             # create_metadata_record()
│   └── utils/
│       └── helpers.py              # Internal helpers
├── examples/                       # Ready-to-run scenarios (see above)
├── tests/                          # Test suite
├── USAGE.md                        # Full feature & config reference
├── pyproject.toml
└── requirements.txt
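According to the structure above, `load_config()` performs a deep merge of your overrides into the defaults. In a deep merge, nested dictionaries are combined key by key instead of being replaced wholesale. With a deep merge, overriding `features.linear_trend.enabled` keeps the default `direction`. The sketch below shows that idea. `deep_merge` and the default values are illustrative and are neither BeTiSe's own code nor the shipped contents of dataset.json.

```python
import copy

def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively merge overrides into a copy of defaults.

    Nested dicts are merged key by key; any other override value
    replaces the default outright. Neither input is mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Illustrative defaults only; the real dataset.json may differ.
defaults = {
    "base_series": "ar",
    "num_series": 100,
    "features": {"linear_trend": {"enabled": False, "direction": "upward"}},
}
user = {"num_series": 10, "features": {"linear_trend": {"enabled": True}}}

cfg = deep_merge(defaults, user)
print(cfg["features"]["linear_trend"])  # {'enabled': True, 'direction': 'upward'}
```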

Reproducibility

Default seed is 42. ARCH/GARCH models may show minor non-determinism (~1–2%) due to upstream library behaviour.
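A quick way to confirm that a generator is reproducible is to run it twice with the same seed and compare the outputs. The sketch below does this with stand-in generators built on Python's `random` module. `runs_match`, `toy` and `unseeded` are hypothetical helpers and are not part of BeTiSe. A nonzero `rel_tol` relaxes the comparison for models with minor non-determinism; 0.02 is only an example value.

```python
import math
import random
from typing import Callable, List

def runs_match(generate: Callable[[int], List[float]], seed: int = 42,
               rel_tol: float = 0.0) -> bool:
    """Call generate(seed) twice and compare the outputs element-wise.

    rel_tol=0.0 requires exact equality; a small positive value
    (e.g. 0.02) tolerates minor run-to-run drift.
    """
    first, second = generate(seed), generate(seed)
    return len(first) == len(second) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(first, second)
    )

def toy(seed: int) -> List[float]:
    # A private RNG seeded on every call: identical output each time.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(100)]

def unseeded(seed: int) -> List[float]:
    # Ignores the seed and uses the global RNG, so calls almost surely differ.
    return [random.random() for _ in range(100)]

print(runs_match(toy), runs_match(unseeded))
```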

Dependencies

Package       Min version   Purpose
numpy         1.21          Array operations
pandas        1.3           DataFrame output
statsmodels   0.13          ARIMA/SARIMA generation
arch          5.0           ARCH/GARCH generation
pyarrow       7.0           Parquet I/O
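To check an existing environment against the table above, you can use the standard library's `importlib.metadata` (Python 3.8+). The sketch below compares only the leading numeric release segment, so it ignores pre-release tags. The `packaging` library's `Version` class is the more robust choice if it is available. `release_tuple`, `at_least` and `check` are hypothetical helpers, not part of BeTiSe.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the Dependencies table.
MINIMUMS = {"numpy": "1.21", "pandas": "1.3", "statsmodels": "0.13",
            "arch": "5.0", "pyarrow": "7.0"}

def release_tuple(v: str) -> tuple:
    """Leading numeric release segment, e.g. '2.0.0rc1' -> (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()

def at_least(installed: str, minimum: str) -> bool:
    """Compare release tuples after zero-padding to equal length."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))

def check(minimums: dict) -> dict:
    """Map each package name to 'ok', 'too old (x.y)' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report

for name, status in check(MINIMUMS).items():
    print(f"{name:12} {status}")
```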

Citation

If you use BeTiSe or the published dataset in your research, please cite:

@dataset{betise2026,
  author    = {Gür, Kerem and Yazıcı, Pınar Cemre and Erkaya, Pelin and Türkmen, Yağmur and Baytak, Berke and Güzel, İsmail and Karagöz, Pınar and Yozgatlıgil, Ceylan},
  title     = {{BeTiSe: A Benchmark Time Series Dataset for Stationarity
                and Structural Analysis}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18513505},
  url       = {https://doi.org/10.5281/zenodo.18513505}
}

Funding

  • TÜBİTAK — Grant No. 124F095
  • METU Scientific Research Projects — Grant No. GAP-109-2023-11361

Contributors

Name                 Role
İsmail Güzel         Library design, implementation & maintenance
Pınar Cemre Yazıcı   Core development
Pelin Erkaya         Core development
Yağmur Türkmen       Core development

The broader research team (Kerem Gür, Berke Baytak, Pınar Karagöz, Ceylan Yozgatlıgil) contributed to the research project and are credited in the dataset publication.

Contact

For questions, bug reports, or collaboration inquiries:
İsmail Güzel (ismailgzel@gmail.com)

Contributing

Issues and pull requests are welcome. See CONTRIBUTING.md.

License

MIT — see LICENSE.


Version: 0.2.0 | License: MIT

