A toolkit for working with large time series network traffic datasets.

Project description

The goal of the cesnet-tszoo project is to provide time series datasets together with useful tools for preprocessing and reproducibility, such as:

  • An API for downloading, configuring, and loading the CESNET-TimeSeries24 and CESNET-AGG23 datasets, each with various sources and aggregations.
  • Configuration options, for example:
    • Data can be split into train/val/test sets, either by time series or by time periods.
    • Data can be transformed with built-in or custom transformers.
    • Missing values can be handled with built-in or custom fillers.
  • Creation and import of benchmarks for easy reproducibility of experiments.
  • Creation and import of annotations. Annotations can be created for specific time series, for specific times, or for specific times within specific time series.
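The filler and transformer options above can be pictured with a small, library-independent sketch. The names `forward_fill` and `MinMaxTransformer` and their `fit`/`transform` methods are hypothetical and are not the cesnet-tszoo interface. The sketch only shows the usual order of operations: fill missing values first, then fit the transformer on the training data alone and apply it to every split.

```python
import math


def forward_fill(values, default=0.0):
    """Replace NaN entries with the most recent non-missing value."""
    filled, last = [], default
    for value in values:
        if math.isnan(value):
            filled.append(last)
        else:
            filled.append(value)
            last = value
    return filled


class MinMaxTransformer:
    """Hypothetical custom transformer; not the cesnet-tszoo interface."""

    def fit(self, values):
        # Remember the range of the training data only.
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # avoid division by zero
        return [(value - self.lo) / span for value in values]


train = forward_fill([10.0, float("nan"), 30.0])
# Carry the last training value into the test split if it starts with a gap.
test = forward_fill([float("nan"), 50.0], default=train[-1])

scaler = MinMaxTransformer().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
print(train_scaled, test_scaled)
```

Because the transformer is fitted on the training split only, test values may fall outside the 0 to 1 range, which is expected.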

Datasets

| Name | CESNET-TimeSeries24 | CESNET-AGG23 |
|---|---|---|
| Published in | 2025 | 2023 |
| Collection duration | 40 weeks | 10 weeks |
| Collection period | 9.10.2023 - 14.7.2024 | 25.2.2023 - 3.5.2023 |
| Aggregation window | 1 day, 1 hour, 10 min | 1 min |
| Sources | CESNET3: Institutions, Institution subnets, IP addresses | CESNET2 |
| Number of time series | Institutions: 849, Institution subnets: 1644, IP addresses: 825372 | 1 |
| Cite | https://doi.org/10.1038/s41597-025-04603-x | https://doi.org/10.23919/CNSM59352.2023.10327823 |
| Zenodo URL | https://zenodo.org/records/13382427 | https://zenodo.org/records/8053021 |
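As a rough guide to how many time steps each aggregation of CESNET-TimeSeries24 could contain, the collection period and aggregation windows listed above can be combined with plain date arithmetic. This is a sketch that assumes the period is inclusive and gap-free; the counts are an upper bound implied by the listed dates, not figures reported by the library.

```python
from datetime import date, timedelta

# Collection period of CESNET-TimeSeries24 as listed above (assumed inclusive).
start, end = date(2023, 10, 9), date(2024, 7, 14)
duration = (end - start) + timedelta(days=1)

windows = {
    "1 day": timedelta(days=1),
    "1 hour": timedelta(hours=1),
    "10 min": timedelta(minutes=10),
}

# Integer division of two timedeltas gives the number of whole windows.
steps = {name: duration // window for name, window in windows.items()}
print(steps)  # {'1 day': 280, '1 hour': 6720, '10 min': 40320}
```

Under this assumption, time periods such as `range(150, 250)` fit within the daily aggregation.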

Installation

Install the package from PyPI with pip:

pip install cesnet-tszoo

or, for an editable install from the repository:

pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo
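To confirm which version ended up in the active environment, the standard library can be queried directly. This is a minimal sketch; `installed_version` is a helper name introduced here for illustration and is not part of cesnet-tszoo.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string after a successful install, otherwise None.
print(installed_version("cesnet-tszoo"))
```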

Examples

Initialize dataset to create train, validation, and test dataframes

Using TimeBasedCesnetDataset dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import TimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED)
config = TimeBasedConfig(
    ts_ids=50, # number of randomly selected time series from dataset
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Time-based datasets are configured with TimeBasedConfig.
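Conceptually, a time-based split uses the same time series in every set and separates the sets by time period. The sketch below illustrates that idea with plain indices; it is not the library's implementation, and `time_based_split` is a hypothetical helper. The series count of 849 comes from the Institutions source in the table above, while 280 daily steps is an assumption derived from the 40-week collection duration.

```python
import random


def time_based_split(n_series, n_times, ts_ids, train, val, test, seed=0):
    """Pick `ts_ids` random series and pair them with each split's time period."""
    periods = {"train": train, "val": val, "test": test}
    for name, period in periods.items():
        if len(period) == 0 or period[0] < 0 or period[-1] >= n_times:
            raise ValueError(f"{name} period is outside 0..{n_times - 1}")
    # The same randomly chosen series are used for every split.
    chosen = sorted(random.Random(seed).sample(range(n_series), ts_ids))
    return {name: (chosen, list(period)) for name, period in periods.items()}


splits = time_based_split(
    n_series=849,   # Institutions source
    n_times=280,    # assumed number of daily steps
    ts_ids=50,
    train=range(0, 100),
    val=range(100, 150),
    test=range(150, 250),
)
for name, (series, times) in splits.items():
    print(name, len(series), "series,", len(times), "time steps")
```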

Using DisjointTimeBasedCesnetDataset dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import DisjointTimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset("/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.DISJOINT_TIME_BASED)
config = DisjointTimeBasedConfig(
    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Disjoint-time-based datasets are configured with DisjointTimeBasedConfig.
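The comments in the configuration above state that the series chosen for each set do not appear in the other two. One simple way to obtain such sets, shown here only as an illustration and not as the library's selection procedure, is to draw all the required IDs in a single sample without replacement and then slice the result. `sample_disjoint` is a hypothetical helper.

```python
import random


def sample_disjoint(n_series, train_ts, val_ts, test_ts, seed=0):
    """Draw three mutually disjoint sets of series indices."""
    total = train_ts + val_ts + test_ts
    if total > n_series:
        raise ValueError("not enough time series for disjoint sets")
    # Sampling without replacement guarantees that no index repeats.
    ids = random.Random(seed).sample(range(n_series), total)
    train_ids = ids[:train_ts]
    val_ids = ids[train_ts:train_ts + val_ts]
    test_ids = ids[train_ts + val_ts:]
    return train_ids, val_ids, test_ids


train_ids, val_ids, test_ids = sample_disjoint(849, train_ts=50, val_ts=20, test_ts=10)
print(len(train_ids), len(val_ids), len(test_ids))  # 50 20 10
```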

Using SeriesBasedCesnetDataset dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import SeriesBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.SERIES_BASED)
config = SeriesBasedConfig(
    time_period=range(0, 250), 
    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Series-based datasets are configured with SeriesBasedConfig.
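In the series-based configuration, all sets cover the same time period but contain different time series. The sketch below makes that property explicit by checking two splits for shared series and shared time steps. The `overlap` helper and the index ranges are illustrative only; the library selects series randomly.

```python
def overlap(split_a, split_b):
    """Return (shares_series, shares_time) for two (series_ids, time_period) splits."""
    series_a, period_a = split_a
    series_b, period_b = split_b
    shares_series = not set(series_a).isdisjoint(series_b)
    shares_time = not set(period_a).isdisjoint(period_b)
    return shares_series, shares_time


# Illustrative, hand-picked indices standing in for randomly selected series.
period = range(0, 250)
train = (range(0, 50), period)
val = (range(50, 70), period)
test = (range(70, 80), period)

print(overlap(train, test))  # (False, True)
```

A time-based split would give the opposite result, `(True, False)`, and a disjoint-time-based split would give `(False, False)`.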

Using load_benchmark

from cesnet_tszoo.benchmarks import load_benchmark

benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset()

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Whether the loaded dataset is time-based, disjoint-time-based, or series-based depends on the benchmark; it is always one of the dataset types shown above.
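Since the dataset type is decided by the benchmark, it can help to inspect what was loaded. The following hypothetical helper, `split_shapes`, relies only on the three `get_*_df` methods used above and assumes that they return objects with a `.shape` attribute, as pandas DataFrames do. A benchmark that does not define all three sets may need different handling.

```python
def split_shapes(dataset):
    """Return the (rows, columns) shape of each split's dataframe."""
    getters = {
        "train": dataset.get_train_df,
        "val": dataset.get_val_df,
        "test": dataset.get_test_df,
    }
    return {name: getter().shape for name, getter in getters.items()}
```

With a loaded benchmark, this could be called as `split_shapes(benchmark.get_initialized_dataset())`.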
