top-level package for netflow-stan
STAN: Synthetic Network Traffic Generation using Autoregressive Neural Models
- Documentation: https://docs.google.com/document/d/1haSCXQRti21X08B9otwk4nVeB_zRoaKYFGoGXFvP3kc/edit?usp=sharing
- Repository: https://github.com/ShengzheXu/stan
Overview
Implementation of our submitted paper, "Network Traffic Data Generation using Autoregressive Neural Models".
STAN is an autoregressive data synthesizer that generates synthetic multivariate time-series data. Its flexible architecture supports generating data with any combination of continuous and discrete attributes. The tool documentation is [here].
- Dependency capturing: STAN learns dependencies within a rectangular time-window context, covering both temporal dependencies and attribute dependencies.
- Network structure: STAN uses a CNN to extract dependent context features, mixture density layers to predict continuous attributes, and softmax layers to predict discrete attributes.
- Application dataset: UGR'16: A New Dataset for the Evaluation of Cyclostationarity-Based Network IDSs [link]
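The rectangular time-window context mentioned above can be pictured with a small sketch. This is a simplified illustration, not STAN's actual preprocessing: the helper name `make_windows` and the toy rows are assumptions made for this example, and it ignores any ordering of attributes within the current row.

```python
from typing import List, Tuple

Row = List[float]


def make_windows(rows: List[Row], window: int) -> List[Tuple[List[Row], Row]]:
    """Pair each row with the `window` rows that precede it.

    Each context is a (window x n_attributes) rectangle, so a model reading it
    sees earlier time steps and all attributes of those steps at once.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for t in range(window, len(rows)):
        context = rows[t - window:t]  # rectangle of past rows
        target = rows[t]              # the row to be predicted
        pairs.append((context, target))
    return pairs


# Toy table: 4 time steps, 2 attributes (values invented for illustration).
rows = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0]]
pairs = make_windows(rows, window=2)
```

With a window of 2, the first two rows have no full context, so four rows yield two (context, target) pairs.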
STAN Structure
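As a rough illustration of the two kinds of output heads named above, the sketch below samples a discrete attribute from softmax probabilities and a continuous attribute from a mixture density. It assumes the mixture components are Gaussian, which the text does not state, and all function names are hypothetical rather than part of the `stannetflow` API.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete(logits: Sequence[float], rng: random.Random) -> int:
    """Softmax head: draw a class index according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_continuous(weight_logits: Sequence[float],
                      means: Sequence[float],
                      stds: Sequence[float],
                      rng: random.Random) -> float:
    """Mixture density head (Gaussian components assumed):
    pick a component by its mixture weight, then sample from that component."""
    k = sample_discrete(weight_logits, rng)
    return rng.gauss(means[k], stds[k])


rng = random.Random(0)
protocol_index = sample_discrete([2.0, 0.5, -1.0], rng)
byte_count = sample_continuous([0.0, 1.0], [100.0, 1500.0], [10.0, 50.0], rng)
```

In a trained model the logits, means, and standard deviations would come from the network given the context features; here they are fixed numbers chosen only to make the sketch runnable.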
Installation
Install the package from PyPI:
pip install stannetflow
Play with model
Data Format
STAN expects the input data to be a table given as either a numpy.ndarray or a pandas.DataFrame object with two types of columns:
- Continuous columns: Columns that contain numerical values and can take any value.
- Discrete columns: Columns that contain only a finite number of possible values, whether these are string values or not.
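To make the two column types concrete, here is a minimal sketch that keeps a continuous column as-is and expands a discrete column into a one-hot group. It uses plain Python lists so it runs without dependencies; the column names, the values, and the helper `one_hot_encode` are invented for illustration and are not part of the package.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Expand a discrete column into one 0/1 column per distinct value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = [
        [1 if index[v] == i else 0 for i in range(len(categories))]
        for v in values
    ]
    return encoded, categories


# Invented example data: one discrete and one continuous column.
protocol = ["TCP", "UDP", "TCP", "ICMP"]    # discrete: finite set of values
byte_count = [120.0, 64.0, 1500.0, 84.0]    # continuous: any numerical value

encoded, categories = one_hot_encode(protocol)

# Final table: column 0 is continuous, columns 1..3 form one one-hot group.
table = [[b] + onehot for b, onehot in zip(byte_count, encoded)]
```

In a table laid out this way, columns 1 to 3 would be described as a single group of discrete column indices, similar in spirit to the `discrete_columns` argument in the Netflow example further down.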
Standard tabular (simulated) data with a number-based sampler:
from stannetflow import STANSynthesizer
from stannetflow.artificial.datamaker import artificial_data_generator

def test_artificial():
    adg = artificial_data_generator(weight_list=[0.9, 0.9])
    df_naive = adg.sample(row_num=100)
    X, y = adg.agg(agg=1)

    stan = STANSynthesizer(dim_in=2, dim_window=1)
    stan.fit(X, y)
    samples = stan.sample(10)
    print(samples)
Netflow data with continuous/discrete/categorical column settings and a condition-based sampler (with delta-time generation and a target time length condition). Discrete and categorical columns can be set explicitly to improve modeling performance.
Instead of .fit() and .sample(), use .batch_fit() and .time_series_sample() for large datasets. In addition, Netflow data needs NetflowFormatTransformer().rev_transfer() to translate the generated model output back into the real Netflow form.
from stannetflow import STANSynthesizer, STANCustomDataLoader, NetflowFormatTransformer

def test_ugr16(train_file, load_checkpoint=False):
    train_loader = STANCustomDataLoader(train_file, 6, 16).get_loader()
    ugr16_n_col, ugr16_n_agg, ugr16_arch_mode = 16, 5, 'B'
    # Indices of the discrete columns (one list per one-hot group),
    # the categorical columns (index: number of types),
    # and, optionally, the order in which columns are executed.
    ugr16_discrete_columns = [[11, 12], [13, 14, 15]]
    ugr16_categorical_columns = {5: 1670, 6: 1670, 7: 256, 8: 256, 9: 256, 10: 256}
    ugr16_execute_order = [0, 1, 13, 11, 5, 6, 7, 8, 9, 10, 3, 2, 4]

    stan = STANSynthesizer(dim_in=ugr16_n_col, dim_window=ugr16_n_agg,
                           discrete_columns=ugr16_discrete_columns,
                           categorical_columns=ugr16_categorical_columns,
                           execute_order=ugr16_execute_order,
                           arch_mode=ugr16_arch_mode
                           )

    if not load_checkpoint:
        stan.batch_fit(train_loader, epochs=2)
    else:
        stan.load_model('ep998')  # checkpoint name
        # validation
        # stan.validate_loss(test_loader, loaded_ep='ep998')

    ntt = NetflowFormatTransformer()
    samples = stan.time_series_sample(864)
    df_rev = ntt.rev_transfer(samples)
    print(df_rev)
    return df_rev
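The idea of a condition-based sampler with delta-time generation and a target time length can be sketched in isolation. The code below is an assumption-laden illustration, not the logic inside `time_series_sample()`: the function `sample_until`, the `sample_delta` callback, and the exponential inter-arrival times are all hypothetical stand-ins for whatever the trained model would produce.

```python
import random
from typing import Callable, List


def sample_until(target_seconds: float,
                 sample_delta: Callable[[], float],
                 max_rows: int = 100000) -> List[float]:
    """Accumulate sampled delta times until the target time length is reached.

    Returns the timestamp (seconds since start) of each generated row.
    `max_rows` is a safety cap in case the deltas are extremely small.
    """
    timestamps: List[float] = []
    elapsed = 0.0
    while len(timestamps) < max_rows:
        elapsed += sample_delta()      # time gap to the next generated row
        if elapsed > target_seconds:   # stop once the condition is met
            break
        timestamps.append(elapsed)
    return timestamps


# Stand-in for a model: exponential gaps with a mean of one second.
rng = random.Random(0)
stamps = sample_until(60.0, lambda: rng.expovariate(1.0))

# With a constant 3-second gap and a 10-second target, rows land at 3, 6, 9.
fixed = sample_until(10.0, lambda: 3.0)
```

The number of generated rows is therefore not fixed in advance; it depends on how quickly the sampled gaps add up to the requested time length.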
Example data making and model training cases
python test.py