Online Transition-Based Feature Generation for Anomaly Detection in Concurrent Data Streams.

Description

The feature generator takes a process log (event log) as input and generates transition matrices as features.
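To make the idea of a transition matrix concrete, here is a minimal sketch that counts transitions between consecutive events in a single sequence. It is an illustration only and not tfgen's implementation: the function name `transition_counts` and the sample event classes are assumptions, and tfgen itself works on concurrent cases within a window.

```python
def transition_counts(events, classes):
    """Count transitions between consecutive events as a dense matrix."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for prev, curr in zip(events, events[1:]):
        matrix[index[prev]][index[curr]] += 1
    return matrix


# Hypothetical event classes and one short event sequence.
classes = ["SYN", "ACK", "FIN"]
events = ["SYN", "ACK", "ACK", "FIN"]
matrix = transition_counts(events, classes)

# Rows are the source class, columns are the destination class.
for name, row in zip(classes, matrix):
    print(name, row)
```

The order of `classes` fixes the row and column order of the matrix, so the same event log yields a different matrix if that order changes.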

How to use

Installation

pip install tfgen    # normal install
pip install --upgrade tfgen  # update tfgen

How to use

First, obtain the observable event classes. Save the result for future use, because changing the event classes or their order changes the generated features. The parameter is an array or a list of attributes. See release v0.2.1 for the datasets used below.

import pandas as pd
from tfgen.observe_event_classes import get_observable_ec

data_for_ec = pd.read_csv('test_data_for_ec.csv')
ec = get_observable_ec(data_for_ec[['Flags', 'S/C']])  # Flags and S/C are the attributes
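One way to keep the event classes stable between runs is to write them to disk and reload them later. The sketch below assumes the event classes can be represented as a nested list of strings (the actual return type of `get_observable_ec` is not stated here, so a conversion step may be needed); the sample values and the file name are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical event classes: each one is a combination of attribute values.
event_classes = [["000.SYN.", "C"], ["000.ACK.", "S"], ["000.ACK.FIN.", "C"]]

# Save the event classes; JSON lists preserve element order.
path = os.path.join(tempfile.mkdtemp(), "event_classes.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(event_classes, f)

# Reload them later, in the same order, before creating the generator.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == event_classes)
```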

Now we can create the TFGen object. The first parameter is the list of all possible event classes. The second parameter is the window size.

from tfgen import TFGen
tfgen = TFGen(ec, window_size=500)

Now we load the data for feature generation. Make sure the data are already in chronological order. Each case must end with an EOT marker row, with EOT placed under every attribute. Without EOT, the TET will keep growing. The input should look something like this:

Example of input data.

Case_ID  Flags         S/C
13       000.ACK.FIN.  C
13       000.ACK.      S
14       000.SYN.      C
13       000.ACK.RST.  S
13       EOT           EOT
14       000.ACK.SYN.  S

data_for_feature = pd.read_csv('test_data_with_eot.csv')
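Since a missing EOT row is easy to overlook, a quick check before loading can help. The following is a minimal sketch, not part of tfgen: the helper `open_cases` is hypothetical, and it assumes each row is a tuple of the case ID followed by the attribute values.

```python
def open_cases(rows, eot="EOT"):
    """Return case IDs, in first-seen order, whose latest row is not an EOT row."""
    status = {}  # case_id -> True if the last row seen for the case was EOT
    for case_id, *attrs in rows:
        status[case_id] = all(a == eot for a in attrs)
    return [case_id for case_id, closed in status.items() if not closed]


# The rows from the example input table.
rows = [
    (13, "000.ACK.FIN.", "C"),
    (13, "000.ACK.", "S"),
    (14, "000.SYN.", "C"),
    (13, "000.ACK.RST.", "S"),
    (13, "EOT", "EOT"),
    (14, "000.ACK.SYN.", "S"),
]
missing = open_cases(rows)
print(missing)
```

In the example table, case 13 is closed by its EOT row while case 14 is still open, which is expected for a stream that has not finished.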

We can load the dataset in offline mode or in online streaming mode. To load the dataset in offline mode:

tfgen.load_from_dataframe(data_for_feature, case_id_col='Case_ID', attributes_cols=['Flags', 'S/C'])
output = tfgen.get_output_list()  # this will return a list of data.

Note that the output is a list (or other iterable) of tuples (case_id, transition_table). The case_id comes from the last processed event and can be used to label the data for supervised learning or validation. get_output_list() can only be used in offline mode.
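As an illustration of using the case_id for labelling, the sketch below splits a list of (case_id, transition_table) tuples into features and binary labels. The helper `label_outputs`, the sample tables, and the set of anomalous case IDs are all hypothetical; real transition tables may be arrays rather than nested lists.

```python
def label_outputs(outputs, anomalous_case_ids):
    """Split (case_id, table) tuples into features and 0/1 labels."""
    features, labels = [], []
    for case_id, table in outputs:
        features.append(table)
        labels.append(1 if case_id in anomalous_case_ids else 0)
    return features, labels


# Stand-in for the list returned in offline mode.
outputs = [
    (13, [[0, 1], [0, 0]]),
    (14, [[1, 0], [0, 1]]),
    (13, [[0, 1], [1, 0]]),
]
features, labels = label_outputs(outputs, anomalous_case_ids={14})
print(labels)
```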

Use the generator as an input for the online streaming.

# replace this generator with your own generator
def replace_with_the_actual_generator():
    while True:
        for rows in data_for_feature.values:
            case_id = rows[0]
            event_attrs = rows[[2, 3]]

            yield case_id, event_attrs  # event_attrs is an iterable with multiple attributes.

# Use the generator as an input for the online streaming.
tfgen.load_from_generator(replace_with_the_actual_generator)
out = tfgen.get_output_generator()  # this will return a generator as the output.

get_output_generator() can only be used with load_from_dataframe() or load_from_generator().
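A generator for streaming does not have to come from a DataFrame. The sketch below reads rows from CSV text one at a time and yields (case_id, event_attrs) pairs in the same shape as the example generator above. The function `stream_events` and the inline CSV are assumptions for illustration; note that the `csv` module yields every value, including the case ID, as a string.

```python
import csv
import io
from itertools import islice


def stream_events(lines, case_id_col, attribute_cols):
    """Yield (case_id, event_attrs) pairs from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield row[case_id_col], [row[c] for c in attribute_cols]


# Inline CSV standing in for a file or socket that delivers lines over time.
csv_text = """Case_ID,Flags,S/C
13,000.ACK.FIN.,C
13,000.ACK.,S
14,000.SYN.,C
"""
events = stream_events(io.StringIO(csv_text), "Case_ID", ["Flags", "S/C"])

# islice bounds the number of items taken, which matters for endless streams.
first_two = list(islice(events, 2))
print(first_two)
```

Bounding consumption with `islice` is also useful on the output side when the input generator never terminates, as with the `while True` loop in the example above.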

We can also feed the data into TFGen one event at a time. Note that an output is not guaranteed for every event, because TFGen needs several events to initialise. Handle the exception if you want to use this method.

import queue
data_for_feature_array = data_for_feature.values
for sample in data_for_feature_array:
    case_id = sample[0]
    event_attrs = sample[[2, 3]]

    # Feed one event: a case_id plus event_attrs,
    # where event_attrs is an iterable with multiple attributes.
    tfgen.load_next(case_id, event_attrs)
    try:
        print(tfgen.get_output_next())
    except InitialisingException:
        # No output yet: TFGen is still initialising.
        # InitialisingException must be imported first; its import path is not shown here.
        continue

get_output_next() is compatible with all input methods.
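The load-then-try pattern can be tried out without tfgen using a toy stand-in. In the sketch below, `WindowedProducer` and `NotReadyError` are hypothetical names, and the rule that output starts once the window is full is an assumption made for the demo; how many events tfgen actually needs is not stated here.

```python
from collections import deque


class NotReadyError(Exception):
    """Hypothetical stand-in for an initialisation exception."""


class WindowedProducer:
    """Toy producer with no output until `window_size` events are loaded."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def load_next(self, event):
        self.window.append(event)

    def get_output_next(self):
        if len(self.window) < self.window.maxlen:
            raise NotReadyError("still filling the first window")
        return list(self.window)


producer = WindowedProducer(window_size=3)
outputs = []
skipped = 0
for event in ["a", "b", "c", "d"]:
    producer.load_next(event)
    try:
        outputs.append(producer.get_output_next())
    except NotReadyError:
        skipped += 1  # no output for this event; keep feeding

print(outputs, skipped)
```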

Varieties

Currently two methods are available: Classic and ClassicLargeSparse. By default, the Classic method is used. For event logs that have many event classes, use the ClassicLargeSparse method.

from tfgen import TFGen
tfgen = TFGen(ec, window_size=500, method=TFGen.ClassicLargeSparse)
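To see why a sparse representation can help with many event classes, compare the number of cells in a dense matrix with the number of transitions that actually occur. This is a general sketch and not a description of how ClassicLargeSparse stores its data; the function name and the sample data are assumptions.

```python
from collections import defaultdict


def sparse_transition_counts(events):
    """Store only the transitions that occur, keyed by (source, destination)."""
    counts = defaultdict(int)
    for prev, curr in zip(events, events[1:]):
        counts[(prev, curr)] += 1
    return dict(counts)


# Hypothetical log: 1000 possible event classes, but only 5 ever appear.
n_classes = 1000
events = [f"class_{i % 5}" for i in range(20)]

sparse = sparse_transition_counts(events)
dense_cells = n_classes * n_classes

print(len(sparse), dense_cells)
```

A dense matrix grows with the square of the number of event classes, while the dictionary grows only with the number of distinct transitions observed.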
