top-level package for netflow-stan

Project description

STAN: Synthetic Network Traffic Generation using Autoregressive Neural Models

Overview

Implementation of our submitted paper, Network Traffic Data Generation using Autoregressive Neural Models.

STAN is an autoregressive data synthesizer that generates synthetic multivariate time-series data. Its flexible architecture supports data with any combination of continuous and discrete attributes. Tool documentation is [here].

  • Dependency capturing: STAN learns dependencies within a rectangular time-window context, covering both temporal dependencies and attribute dependencies.
  • Network structure: STAN uses a CNN to extract features from the dependent context, mixture density layers to predict continuous attributes, and softmax layers to predict discrete attributes.
  • Application dataset: UGR'16: A New Dataset for the Evaluation of Cyclostationarity-Based Network IDSs [link]
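
The rectangular context mentioned above can be pictured with a small NumPy sketch. It is only an illustration and assumes one particular reading of the description: to predict attribute j of row t, the model sees the previous `window` rows in full (temporal dependency) plus the attributes of row t that come before j (attribute dependency). The function name `context_window` and the zero-filling of hidden cells are assumptions for this example, not part of the stannetflow API.

```python
import numpy as np

def context_window(data, t, j, window):
    """Return the (window + 1, n_cols) context for predicting data[t, j].

    Rows t-window .. t-1 are fully visible (temporal dependency); in row t
    only columns 0 .. j-1 are visible (attribute dependency). Hidden or
    out-of-range cells are filled with zeros.
    """
    n_cols = data.shape[1]
    ctx = np.zeros((window + 1, n_cols), dtype=float)
    for k in range(window):
        src = t - window + k
        if src >= 0:              # rows before the start of the series stay zero
            ctx[k] = data[src]
    ctx[window, :j] = data[t, :j]  # earlier attributes of the current row
    return ctx

# Toy table with 4 rows and 3 attributes (values are made up).
data = np.arange(1, 13, dtype=float).reshape(4, 3)
print(context_window(data, t=2, j=1, window=2))
```

In this toy call the last row of the context keeps only the first attribute of row 2, because the target attribute and everything after it have not been generated yet.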

STAN Structure

Installation

Install the package with pip:

pip install stannetflow

Play with model

Data Format

STAN expects the input data to be a table given as either a numpy.ndarray or a pandas.DataFrame object with two types of columns:

  • Continuous columns: Columns that contain numerical values and can take any value.
  • Discrete columns: Columns that contain only a finite number of possible values, whether these are string values or not.
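
As a rough illustration of the two column types, the sketch below builds a small pandas.DataFrame and separates its columns. The column names, the values, and the helper `split_columns` are hypothetical, and the dtype rule (float columns are continuous, everything else is discrete) is a simple heuristic chosen for the example, not something STAN applies itself.

```python
import pandas as pd

# Hypothetical flow table: column names and values are made up for illustration.
df = pd.DataFrame({
    "duration": [0.52, 1.30, 0.07, 2.41],
    "bytes": [1500.0, 320.0, 64.0, 9800.0],
    "protocol": ["TCP", "UDP", "TCP", "ICMP"],
    "dst_port": [443, 53, 443, 0],
})

def split_columns(frame):
    """Heuristic split: float columns are continuous, all others discrete."""
    continuous, discrete = [], []
    for name in frame.columns:
        if pd.api.types.is_float_dtype(frame[name]):
            continuous.append(name)
        else:
            discrete.append(name)
    return continuous, discrete

continuous_cols, discrete_cols = split_columns(df)
print("continuous:", continuous_cols)
print("discrete:", discrete_cols)
```

Note that a discrete column need not hold strings: `dst_port` is numeric here but takes only a finite set of values.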

Standard tabular (simulated) data with the number-based sampler:

from stannetflow import STANSynthesizer
from stannetflow.artificial.datamaker import artificial_data_generator

def test_artificial():
  adg = artificial_data_generator(weight_list=[0.9, 0.9])
  df_naive = adg.sample(row_num=100)
  X, y = adg.agg(agg=1)

  stan = STANSynthesizer(dim_in=2, dim_window=1)
  stan.fit(X, y)
  samples = stan.sample(10)
  print(samples)

Netflow data with continuous, discrete, and categorical column settings and the condition-based sampler (with delta-time generation and a target time-length condition). Discrete and categorical columns can be set explicitly to improve modeling performance. For large datasets, use .batch_fit() and .time_series_sample() instead of .fit() and .sample(). In addition, Netflow data needs NetflowFormatTransformer().rev_transfer() to translate the generated model output back into the real Netflow format.

from stannetflow import STANSynthesizer, STANCustomDataLoader, NetflowFormatTransformer

def test_ugr16(train_file, load_checkpoint=False):
  train_loader = STANCustomDataLoader(train_file, 6, 16).get_loader()
  ugr16_n_col, ugr16_n_agg, ugr16_arch_mode = 16, 5, 'B'
  # Column indices: discrete columns are listed as one-hot groups, and categorical
  # columns map a column index to its number of types. An execution order can also be given.
  ugr16_discrete_columns = [[11,12], [13, 14, 15]]
  ugr16_categorical_columns = {5:1670, 6:1670, 7:256, 8:256, 9:256, 10:256}
  ugr16_execute_order = [0,1,13,11,5,6,7,8,9,10,3,2,4]

  stan = STANSynthesizer(dim_in=ugr16_n_col, dim_window=ugr16_n_agg, 
          discrete_columns=ugr16_discrete_columns,
          categorical_columns=ugr16_categorical_columns,
          execute_order=ugr16_execute_order,
          arch_mode=ugr16_arch_mode
          )

  if not load_checkpoint:
    stan.batch_fit(train_loader, epochs=2)
  else:
    stan.load_model('ep998') # checkpoint name
    # validation
    # stan.validate_loss(test_loader, loaded_ep='ep998')

  ntt = NetflowFormatTransformer()
  samples = stan.time_series_sample(864)
  df_rev = ntt.rev_transfer(samples)
  print(df_rev)
  return df_rev
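
The phrase "delta time generation and target time length condition" can be illustrated with a short standalone sketch. It assumes the general idea that each generated row carries a time gap to the previous row, and that generation stops once the accumulated time exceeds a target length. How .time_series_sample() interprets its argument is not documented here, so the helper `truncate_to_time_length` and its parameters are hypothetical and independent of the stannetflow API.

```python
from itertools import accumulate

def truncate_to_time_length(delta_times, target_length):
    """Keep rows while the cumulative delta time stays within target_length.

    Returns the time offsets (from the start of the series) of the rows kept.
    """
    timestamps = []
    for ts in accumulate(delta_times):
        if ts > target_length:
            break  # the target time length has been reached
        timestamps.append(ts)
    return timestamps

# Made-up delta times between consecutive generated flows.
deltas = [0.5, 1.0, 0.25, 2.0, 0.75]
print(truncate_to_time_length(deltas, target_length=3.0))
```

Conditioning on elapsed time rather than on a row count lets the number of generated rows vary with how dense the synthetic traffic is.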

Examples of data preparation and model training can be run with:

python test.py
