A toolkit for working with large time series network traffic datasets.
The goal of the cesnet-tszoo project is to provide time series datasets together with tools for preprocessing and reproducible experiments, including:
- An API for downloading, configuring, and loading the CESNET-TimeSeries24 and CESNET-AGG23 datasets, each available with various sources and aggregations.
- Configuration options, for example:
  - Splitting data into train/val/test sets, either by time series or by time periods.
  - Transforming data with built-in or custom transformers.
  - Handling missing values with built-in or custom fillers.
- Creation and import of benchmarks, for easy reproducibility of experiments.
- Creation and import of annotations, which can be attached to specific time series, to specific times, or to specific times within specific time series.
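To make the filler and transformer options above more concrete, here is a minimal plain-Python sketch of what the two steps do conceptually: missing values are filled first, and a scaler fitted on the training data is then applied to other data. The names `forward_fill` and `SketchScaler` are hypothetical and are not part of the cesnet-tszoo API; the built-in fillers and transformers may behave differently.

```python
from statistics import mean, pstdev


def forward_fill(values, default=0.0):
    """Replace None with the last seen value, or `default` before any value is seen."""
    filled, last = [], default
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


class SketchScaler:
    """Hypothetical transformer: scales with the mean and std learned in fit()."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Fall back to 1.0 so a constant series does not cause division by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Fill gaps first, then fit the scaler on the training part only.
train = forward_fill([10.0, None, 14.0, None])
test = forward_fill([16.0, None])

scaler = SketchScaler().fit(train)
scaled_test = scaler.transform(test)
```

Fitting on the training part only is a common way to avoid leaking statistics from validation or test data into preprocessing.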
Datasets
| Name | CESNET-TimeSeries24 | CESNET-AGG23 |
|---|---|---|
| Published in | 2025 | 2023 |
| Collection duration | 40 weeks | 10 weeks |
| Collection period | 9.10.2023 - 14.7.2024 | 25.2.2023 - 3.5.2023 |
| Aggregation window | 1 day, 1 hour, 10 min | 1 min |
| Sources | CESNET3: Institutions, Institution subnets, IP addresses | CESNET2 |
| Number of time series | Institutions: 849, Institution subnets: 1644, IP addresses: 825372 | 1 |
| Cite | https://doi.org/10.1038/s41597-025-04603-x | https://doi.org/10.23919/CNSM59352.2023.10327823 |
| Zenodo URL | https://zenodo.org/records/13382427 | https://zenodo.org/records/8053021 |
| Related papers |
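As a rough guide to how long each time series can be, the sketch below derives the number of time steps per aggregation window for CESNET-TimeSeries24 from the collection period in the table. It assumes that both end dates are inclusive and that the time index has no gaps; the real datasets may differ, so treat the results as upper bounds.

```python
from datetime import date, timedelta

# Collection period of CESNET-TimeSeries24, taken from the table above.
start, end = date(2023, 10, 9), date(2024, 7, 14)

# Assumption: both end dates are part of the collection period.
n_days = (end - start).days + 1

windows = {
    "1 day": timedelta(days=1),
    "1 hour": timedelta(hours=1),
    "10 min": timedelta(minutes=10),
}

# Dividing one timedelta by another with // yields an integer count.
steps = {name: timedelta(days=n_days) // window for name, window in windows.items()}

for name, count in steps.items():
    print(f"{name}: at most {count} time steps per series")
```

Under these assumptions the period covers 280 days, which matches the stated 40 weeks, so a daily aggregation offers at most 280 time steps per series.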
Installation
Install the package from pip with:
pip install cesnet-tszoo
or, for an editable install, with:
pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo
Examples
Initialize dataset to create train, validation, and test dataframes
Using TimeBasedCesnetDataset dataset
from cesnet_tszoo.datasets import CESNET_TimeSeries24
# DatasetType is used below; importing it from the same enums module is an assumption
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import TimeBasedConfig
dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.TIME_BASED,
)
config = TimeBasedConfig(
    ts_ids=50,  # number of randomly selected time series from dataset
    train_time_period=range(0, 100),
    val_time_period=range(100, 150),
    test_time_period=range(150, 250),
    features_to_take=["n_flows", "n_packets"],
)
dataset.set_dataset_config_and_initialize(config)
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Time-based datasets are configured with TimeBasedConfig.
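In the time-based example above, all three sets share the same time series and differ only in their time periods. The sketch below is a sanity check one might run on the `range` objects before building the config. The helper `check_time_periods` is hypothetical, and whether the library itself enforces ordering or forbids overlap is not stated here.

```python
def check_time_periods(train, val, test):
    """Return the total number of time steps; raise if periods overlap or are out of order."""
    periods = [("train", train), ("val", val), ("test", test)]
    # Compare each period with the one that follows it.
    for (name_a, a), (name_b, b) in zip(periods, periods[1:]):
        if a.stop > b.start:
            raise ValueError(f"{name_a} period must end before {name_b} period starts")
    return sum(len(period) for _, period in periods)


# Same periods as in the TimeBasedConfig example.
total = check_time_periods(range(0, 100), range(100, 150), range(150, 250))
print(total)
```

Keeping the periods ordered and non-overlapping means the validation and test sets contain only time steps that come after the training data.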
Using DisjointTimeBasedCesnetDataset dataset
from cesnet_tszoo.datasets import CESNET_TimeSeries24
# DatasetType is used below; importing it from the same enums module is an assumption
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import DisjointTimeBasedConfig
dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.DISJOINT_TIME_BASED,
)
config = DisjointTimeBasedConfig(
    train_ts=50,  # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20,  # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10,  # number of randomly selected time series from dataset that are not in train_ts and val_ts
    train_time_period=range(0, 100),
    val_time_period=range(100, 150),
    test_time_period=range(150, 250),
    features_to_take=["n_flows", "n_packets"],
)
dataset.set_dataset_config_and_initialize(config)
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Disjoint-time-based datasets are configured with DisjointTimeBasedConfig.
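The comments on `train_ts`, `val_ts`, and `test_ts` describe randomly selected time series that do not overlap between sets. The sketch below shows one way such a disjoint selection could be drawn. The helper `sample_disjoint_ids` is hypothetical, the IDs `0..848` are only a stand-in for the 849 institution series, and the library's own sampling may work differently.

```python
import random


def sample_disjoint_ids(all_ids, sizes, seed=0):
    """Draw non-overlapping ID lists, one per set, from a single shuffled sample."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    picked = rng.sample(list(all_ids), sum(sizes.values()))
    splits, offset = {}, 0
    for name, size in sizes.items():
        # Consecutive slices of one sample without replacement cannot overlap.
        splits[name] = picked[offset:offset + size]
        offset += size
    return splits


# Stand-in IDs for the 849 institution time series (assumption: IDs are 0..848).
splits = sample_disjoint_ids(range(849), {"train": 50, "val": 20, "test": 10}, seed=42)
print({name: len(ids) for name, ids in splits.items()})
```

Sampling once without replacement and slicing the result guarantees disjoint sets without any explicit membership checks.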
Using SeriesBasedCesnetDataset dataset
from cesnet_tszoo.datasets import CESNET_TimeSeries24
# DatasetType is used below; importing it from the same enums module is an assumption
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import SeriesBasedConfig
dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.SERIES_BASED,
)
config = SeriesBasedConfig(
    time_period=range(0, 250),
    train_ts=50,  # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20,  # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10,  # number of randomly selected time series from dataset that are not in train_ts and val_ts
    features_to_take=["n_flows", "n_packets"],
)
dataset.set_dataset_config_and_initialize(config)
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Series-based datasets are configured with SeriesBasedConfig.
Using load_benchmark
from cesnet_tszoo.benchmarks import load_benchmark
benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset()
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Whether the loaded dataset is time-based, disjoint-time-based, or series-based depends on the benchmark. The loaded dataset corresponds to one of the dataset types shown in the previous examples.
Papers