
Online Transition-Based Feature Generation for Anomaly Detection in Concurrent Data Streams.

Project description


Description

TFGen takes a process log (event log) as input and generates transition matrices from it as features.
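To make "transition matrix" more concrete, here is a minimal sketch. It assumes that a transition matrix counts how often event class a is directly followed by event class b within the same case, while events from different cases arrive interleaved. This illustrates the general idea only. It is not TFGen's "Classic" algorithm, and the function name `transition_matrix` is hypothetical.

```python
def transition_matrix(events, event_classes):
    """Count directly-follows transitions between event classes, tracked per case.

    events: iterable of (case_id, event_class) pairs in chronological order;
            events from different cases may be interleaved.
    event_classes: ordered list whose order fixes the row/column layout.
    """
    index = {ec: i for i, ec in enumerate(event_classes)}
    n = len(event_classes)
    matrix = [[0] * n for _ in range(n)]
    previous = {}  # case_id -> row index of that case's previous event class
    for case_id, ec in events:
        j = index[ec]
        if case_id in previous:
            matrix[previous[case_id]][j] += 1
        previous[case_id] = j
    return matrix


# Two interleaved cases: case 1 goes a -> b -> c, case 2 goes a -> c.
events = [(1, "a"), (2, "a"), (1, "b"), (2, "c"), (1, "c")]
print(transition_matrix(events, ["a", "b", "c"]))
# [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

Keeping the previous event class per case, rather than globally, is what stops transitions from leaking between concurrent cases.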

How to use

Installation

pip install tfgen    # normal install
pip install --upgrade tfgen  # update tfgen

How to use

First, obtain the observable event classes with the method “get_observable_ec()”. The generated features change if the event classes or their order change, so it is best to save the observable event classes for later use. The method's first parameter should be a DataFrame, list, or array of attributes. The datasets used in the following code examples are available in release v0.2.1 on GitHub.

import pandas as pd
from tfgen.observe_event_classes import get_observable_ec

data_for_ec = pd.read_csv('test_data_for_ec.csv')
ec = get_observable_ec(data_for_ec[['Flags', 'S/C']])  # Flags and S/C are the attributes
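The point about event-class order can be illustrated with a small sketch. It assumes, hypothetically, that the event classes behave like an ordered list of attribute-value tuples that gets turned into a position lookup; the actual object returned by “get_observable_ec()” may be structured differently. The helper names are made up for this example. Pickle works for any picklable object, so it is one simple way to save `ec` for later runs (only load pickle files you trust).

```python
import pickle


def build_lookup(event_classes):
    """Map each event class to a position; the list order decides the position."""
    return {ec: i for i, ec in enumerate(event_classes)}


def save_event_classes(event_classes, path):
    """Persist the event classes so later runs reuse exactly the same order."""
    with open(path, "wb") as f:
        pickle.dump(event_classes, f)


def load_event_classes(path):
    """Load previously saved event classes."""
    with open(path, "rb") as f:
        return pickle.load(f)


classes = [("000.ACK.", "S"), ("000.SYN.", "C")]
print(build_lookup(classes)[("000.SYN.", "C")])        # 1
print(build_lookup(classes[::-1])[("000.SYN.", "C")])  # 0: same class, different position
```

With the real library, the same idea would be `save_event_classes(ec, 'ec.pkl')` after the first run and `ec = load_event_classes('ec.pkl')` afterwards, instead of recomputing `ec`.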

Next, create the TFGen object. The first parameter is the list of observable event classes obtained in the previous step; the second parameter is the window size.

from tfgen import TFGen
tfgen = TFGen(ec, window_size=500)
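This README does not define exactly what `window_size` counts. As a rough mental model, here is a sketch of a count-based sliding window, assuming the window holds the most recent `window_size` events. The `SlidingWindow` class is hypothetical and not part of tfgen.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical helper: keep only the `size` most recent events."""

    def __init__(self, size):
        self.events = deque(maxlen=size)

    def push(self, event):
        """Add an event; the oldest one is dropped automatically once the window is full."""
        self.events.append(event)
        return len(self.events) == self.events.maxlen  # True once the window is full


window = SlidingWindow(3)
for event in ["e1", "e2", "e3", "e4"]:
    window.push(event)
print(list(window.events))  # ['e2', 'e3', 'e4']
```

`deque(maxlen=...)` gives constant-time appends with automatic eviction, which suits an unbounded stream.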

Next, load the event log data from which features will be generated. Before passing the data to TFGen, make sure it is in chronological order. Each case must end with an EOT marker, and the marker must be placed under every attribute. Without EOT, the TET will keep expanding. For example:

Example of input data.

Case_ID   Flags           S/C
13        000.ACK.FIN.    C
13        000.ACK.        S
14        000.SYN.        C
13        000.ACK.RST.    S
13        EOT             EOT
14        000.ACK.SYN.    S

data_for_feature = pd.read_csv('test_data_with_eot.csv')
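If your log does not yet contain EOT rows, a sketch like the following could add them offline. It assumes every case in the log is complete, so the last event of each case can be marked; the function name `add_eot_markers` is hypothetical. In a live stream you would instead emit the EOT row whenever your own logic decides a case has ended.

```python
def add_eot_markers(events, marker="EOT"):
    """Insert an EOT row right after the last event of every case.

    events: list of (case_id, event_attrs) in chronological order.
    The EOT row repeats `marker` under every attribute.
    """
    last_index = {}
    for i, (case_id, _) in enumerate(events):
        last_index[case_id] = i  # later events overwrite earlier ones
    marked = []
    for i, (case_id, attrs) in enumerate(events):
        marked.append((case_id, tuple(attrs)))
        if last_index[case_id] == i:
            marked.append((case_id, (marker,) * len(attrs)))
    return marked


events = [
    (13, ("000.ACK.FIN.", "C")),
    (13, ("000.ACK.", "S")),
    (14, ("000.SYN.", "C")),
    (13, ("000.ACK.RST.", "S")),
    (14, ("000.ACK.SYN.", "S")),
]
for row in add_eot_markers(events):
    print(row)
```

A DataFrame can be converted to this shape with `[(r[0], r[1:]) for r in df[['Case_ID', 'Flags', 'S/C']].itertuples(index=False, name=None)]`, and back with `pd.DataFrame([(c, *a) for c, a in marked], columns=['Case_ID', 'Flags', 'S/C'])`.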

The dataset can be loaded either offline or in online streaming mode. To load it in offline mode:

tfgen.load_from_dataframe(data_for_feature, case_id_col='Case_ID', attributes_cols=['Flags', 'S/C'])
output = tfgen.get_output_list()  # this will return a list of data.

Note that the output is a list (or other iterable) of tuples, each containing two elements: (case_id, transition_table). The case_id comes from the last processed event and can be used to label the data for supervised learning or validation. “get_output_list()” can only be used in offline mode.
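As an illustration of the labelling use of case_id, the sketch below flattens each transition table into a feature row and looks up a label per case. The `labels` mapping and the function name are hypothetical: supply your own ground truth. The flattening works for nested lists and 2-D NumPy arrays; sparse matrices from “ClassicLargeSparse” would need `.toarray()` first.

```python
def to_training_set(output, labels):
    """Turn (case_id, transition_table) pairs into feature rows X and labels y.

    labels: mapping case_id -> label (your own ground truth).
    """
    X, y = [], []
    for case_id, table in output:
        X.append([value for row in table for value in row])  # row-major flattening
        y.append(labels[case_id])
    return X, y


output = [(13, [[0, 1], [2, 0]]), (14, [[1, 0], [0, 0]])]
X, y = to_training_set(output, {13: 0, 14: 1})
print(X)  # [[0, 1, 2, 0], [1, 0, 0, 0]]
print(y)  # [0, 1]
```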

The following example uses a generator as the input for online streaming mode.

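# Replace this generator with your actual data source.
def replace_with_the_actual_generator():
    while True:  # loop over the sample data forever to simulate an endless stream
        for rows in data_for_feature.values:
            case_id = rows[0]
            event_attrs = rows[[2, 3]]  # select the attribute columns by position; adjust to your CSV layout

            yield case_id, event_attrs  # event_attrs is an iterable with multiple attributes.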

# Use the generator as an input for online streaming.
tfgen.load_from_generator(replace_with_the_actual_generator)
out = tfgen.get_output_generator()  # this will return a generator as the output.
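A real source could read events incrementally instead of looping over a DataFrame. The sketch below, a hypothetical `csv_event_stream` helper, yields `(case_id, event_attrs)` tuples from any iterable of CSV text lines, such as an open file that is still being appended to. It assumes the column names from the example data; note that `csv` yields every value as a string, so case IDs arrive as `"13"` rather than `13`. The example above passes a zero-argument function to “load_from_generator()”, so one option would be to wrap the helper, e.g. `lambda: csv_event_stream(open('events.csv'))`.

```python
import csv


def csv_event_stream(lines, case_col="Case_ID", attr_cols=("Flags", "S/C")):
    """Yield (case_id, event_attrs) tuples from CSV text lines, one event at a time.

    lines: an open file or any iterable of strings; the first line is the header.
    """
    for row in csv.DictReader(lines):
        yield row[case_col], [row[c] for c in attr_cols]


import io

sample = io.StringIO("Case_ID,Flags,S/C\n13,000.ACK.FIN.,C\n13,EOT,EOT\n")
for case_id, event_attrs in csv_event_stream(sample):
    print(case_id, event_attrs)
```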

The output method “get_output_generator()” can only be used with the input methods “load_from_dataframe()” or “load_from_generator()”.

Data can also be fed into TFGen one event at a time. Because TFGen needs several events to initialise, an output is not guaranteed after each event, so handle the InitialisingException exception when using this method.

from tfgen import InitialisingException

data_for_feature_array = data_for_feature.values
for sample in data_for_feature_array:
    case_id = sample[0]
    event_attrs = sample[[2, 3]]

    # tfgen.load_next(<case_id>, <event_attrs>): pass the case ID and the event's
    # attributes, where event_attrs is an iterable with multiple attributes.
    tfgen.load_next(case_id, event_attrs)
    try:
        print(tfgen.get_output_next())
    except InitialisingException:
        continue

The output method “get_output_next()” is compatible with all input methods.
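The per-event loop can be packaged as a reusable generator that yields only the outputs that are ready. This is a sketch: `features_one_by_one` is a hypothetical name, and the exception class is passed in as a parameter so the helper does not depend on tfgen itself (with tfgen you would pass `InitialisingException`).

```python
def features_one_by_one(tfgen, events, initialising_exc):
    """Feed events one at a time and yield each output once one is available.

    tfgen: an object with load_next(case_id, event_attrs) and get_output_next().
    initialising_exc: the exception raised while no output is ready yet.
    """
    for case_id, event_attrs in events:
        tfgen.load_next(case_id, event_attrs)
        try:
            yield tfgen.get_output_next()
        except initialising_exc:
            continue  # still initialising: skip this event's output
```

Usage with the objects from the example above would be `for output in features_one_by_one(tfgen, ((s[0], s[[2, 3]]) for s in data_for_feature_array), InitialisingException): ...`.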

Methods

Currently, two feature-generation methods are available: “Classic” and “ClassicLargeSparse”. “Classic” is used by default. “ClassicLargeSparse” outputs SciPy sparse matrices and is intended for event logs that contain a large number of event classes.

from tfgen import TFGen

tfgen = TFGen(ec, window_size=500, method=TFGen.METHOD_CLASSIC_LARGE_SPARSE)
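A back-of-the-envelope calculation shows why sparse output helps as the number of event classes grows. It assumes, for illustration only, that each feature is a square matrix over the event classes stored as 64-bit floats, and that the sparse form is COO with one 64-bit value plus two 32-bit indices per non-zero entry. Real SciPy formats and dtypes may differ.

```python
def dense_bytes(n_classes, itemsize=8):
    """Memory for one dense n x n matrix with `itemsize`-byte values."""
    return n_classes * n_classes * itemsize


def coo_bytes(nnz, value_size=8, index_size=4):
    """Approximate COO memory: one value plus a row and a column index per non-zero."""
    return nnz * (value_size + 2 * index_size)


print(dense_bytes(1000))  # 8000000 bytes (about 8 MB) per feature
print(coo_bytes(5000))    # 80000 bytes for 5,000 observed transitions
```

Dense storage grows with the square of the number of event classes, whereas sparse storage grows only with the number of transitions actually observed.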

Implementing New Methods

New methods can be added by deriving a class from “BaseMethod” in “tfgen/methods/base_method.py”. All method classes must be placed in the “tfgen/methods/” directory. Use “self.get_next_data()” to obtain the next event sample and “self.send_data()” to send the generated feature to the output. “self.finished” becomes “True” when the input stream reaches its end.

from tfgen.methods.base_method import BaseMethod


class NewMethod(BaseMethod):
    def __init__(self, ec_lookup_table, window_size, input_stream, output_stream):
        super().__init__(ec_lookup_table, window_size, input_stream, output_stream)

    # Entry point: processes the input stream until it is finished.
    def start_processing(self):
        while True:
            # event is a tuple of (case_id, event_attrs)
            event = self.get_next_data()
            processed_data = event  # placeholder: compute your feature from `event` here
            self.send_data(processed_data)
            if self.finished:
                break
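As an example of what could replace the placeholder in “start_processing()”, here is a sketch of a simple per-case feature: a running count of each event class, which is emitted and discarded when the case's EOT row arrives. It assumes an event class is the tuple of an event's attribute values; the class `CaseFrequencyState` is hypothetical and is kept independent of “BaseMethod” so the logic can be tested on its own.

```python
class CaseFrequencyState:
    """Hypothetical per-event logic: count event classes per case, reset at EOT."""

    def __init__(self, event_classes, eot="EOT"):
        self.index = {ec: i for i, ec in enumerate(event_classes)}
        self.eot = eot
        self.counts = {}  # case_id -> list of event-class counts

    def process(self, event):
        case_id, event_attrs = event
        attrs = tuple(event_attrs)
        if all(a == self.eot for a in attrs):
            # EOT closes the case: return its final counts and free its state.
            return case_id, self.counts.pop(case_id, [0] * len(self.index))
        vec = self.counts.setdefault(case_id, [0] * len(self.index))
        vec[self.index[attrs]] += 1
        return case_id, list(vec)  # copy, so later updates don't alter sent data
```

Inside “start_processing()”, this could be used as `processed_data = state.process(event)`, with `state` created once before the loop.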

Then register the new method in the “TFGen” class in “tfgen/tfgen.py” (and import the new class there). Two locations need to be modified:

class TFGen:
    METHOD_CLASSIC = 101
    METHOD_CLASSIC_LARGE_SPARSE = 102
    METHOD_NEW_METHOD = 103  # The first location. New method class

    # (excerpt; other members omitted)

    def _select_method(self, method):
        if self.method == TFGen.METHOD_CLASSIC:
            return Classic(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
        elif self.method == TFGen.METHOD_CLASSIC_LARGE_SPARSE:
            return ClassicLargeSparse(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
        # The second location. The instance to the new method class
        elif self.method == TFGen.METHOD_NEW_METHOD:
            return NewMethod(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
        else:
            raise Exception("Method not supported")
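The if/elif chain works, but every new method touches the selection logic. An alternative pattern, shown here as a sketch and not as how tfgen is implemented, is a dictionary registry that maps method IDs to classes. The stub classes below are placeholders standing in for the real method classes.

```python
def select_method(method_id, registry, *args):
    """Instantiate the method class registered under `method_id`."""
    try:
        method_cls = registry[method_id]
    except KeyError:
        raise ValueError(f"Method not supported: {method_id}") from None
    return method_cls(*args)


class Classic:  # placeholder stub for illustration
    def __init__(self, *args):
        self.args = args


class NewMethod(Classic):  # placeholder stub for illustration
    pass


METHODS = {101: Classic, 103: NewMethod}
method = select_method(103, METHODS, "ec_lookup", 500)
print(type(method).__name__)  # NewMethod
```

With a registry, adding a method means adding one constant and one dictionary entry, and unknown IDs fail with a specific error.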

Download files


Source Distribution

tfgen-0.6.1.tar.gz (22.8 kB)

Uploaded Source

Built Distribution

tfgen-0.6.1-py3-none-any.whl (22.3 kB)

Uploaded Python 3

File details

Details for the file tfgen-0.6.1.tar.gz.

File metadata

  • Download URL: tfgen-0.6.1.tar.gz
  • Upload date:
  • Size: 22.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for tfgen-0.6.1.tar.gz
Algorithm Hash digest
SHA256 e269e82fec477fb4330f10a3c479ffdc89795475b06f250952bc707ef1e8486c
MD5 e50d9d80d0378e76a6113bf3603f750c
BLAKE2b-256 8a4ea84b4318bb0f9d2b819dee0f3e7b2dc027e76ac55b12bc3035e51c4d6f9a


File details

Details for the file tfgen-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: tfgen-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 22.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for tfgen-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 625b79467cac75380426591c6ffbcaddbad30011ebed98b32679320565945387
MD5 b5a50af8de5b18ddc43067d27e1b93f3
BLAKE2b-256 cb4683da2aeeff491bd2700475c7ec7f79360a680d557bb16f51859f3dced4b7

