Online Transition-Based Feature Generation for Anomaly Detection in Concurrent Data Streams.
Project description
Description
The process log/event log will be used as input for the feature generator. The feature generator will generate transition matrices.
How to use
Installation
pip install tfgen # normal install
pip install --upgrade tfgen # update tfgen
How to use
The observable event classes are first required, and they can be acquired using the method “get observable ec()”. The generated features will change if the event classes or the order of the event classes are changed, so it is preferable to save the observable event classes for later use. A dataframe/list/array of attributes should be the method’s first parameter. The datasets we’ll use for the following code examples are available in release v0.2.1 on Github.
from tfgen.observe_event_classes import get_observable_ec
data_for_ec = pd.read_csv('test_data_for_ec.csv')
ec = get_observable_ec(data_for_ec[['Flags', 'S/C']]) # Flags and S/C are the attributes
We can now start creating the TFGen object. A list of observable event classes that we obtained in the previous step is the first parameter. The window size is the second variable.
from tfgen import TFGen
tfgen = TFGen(ec, window_size=500)
We are currently loading the event log data to create features. Before sending the data to TFGen, make sure it is already in chronological order. EOT marking is required at the ending of each case, and it must be done under each attribute. Without EOT, the TET will continue to expand. Possibly like this:
Case_ID |
Flags |
S/C |
---|---|---|
… |
… |
… |
13 |
000.ACK.FIN. |
C |
13 |
000.ACK. |
S |
14 |
000.SYN. |
C |
13 |
000.ACK.RST. |
S |
13 |
EOT |
EOT |
14 |
000.ACK.SYN. |
S |
… |
… |
… |
data_for_feature = pd.read_csv('test_data_with_eot.csv')
We can load the offline dataset or load the dataset in an online streaming mode. The method for loading the dataset in offline mode is:
tfgen.load_from_dataframe(data_for_feature, case_id_col='Case_ID', attributes_cols=['Flags', 'S/C'])
output = tfgen.get_output_list() # this will return a list of data.
Note that the output is a list (or other iterable) of tuples where each tuple contains two variables (case_id, transition_table). The case_id comes from the last processed event, and it can be used for labelling the data for supervised learning or validation. “get_output_list()” can only be used in offline mode.
The following example uses the generator as input for online streaming mode.
# replace this generator with the actual generator
def replace_with_the_actual_generator():
while True:
for rows in data_for_feature.values:
case_id = rows[0]
event_attrs = rows[[2, 3]]
yield case_id, event_attrs # event_attr is an iterable with multiple attributes.
# Use the generator as an input for online streaming.
tfgen.load_from_generator(replace_with_the_actual_generator)
out = tfgen.get_output_generator() # this will return a generator as the output.
Only the input methods “load_from_dataframe()” or “load_from_generator()” can be used with the output method “get_output_generator()”.
The data can be entered one at a time into TFGen. Due to the fact that TFGen requires several events to initialise, the output is not guaranteed. To use this method, handle the InitialisingException exception.
from tfgen import InitialisingException
data_for_feature_array = data_for_feature.values
for sample in data_for_feature_array:
case_id = sample[0]
event_attrs = sample[[2, 3]]
# tfgen.load_next(<your data sample>). The sample is a tuple of (case_id, event_attrs)
# and event_attrs is an iterable with multiple attributes.
tfgen.load_next(case_id, event_attrs)
try:
print(tfgen.get_output_next())
except InitialisingException:
continue
The output method “get_output_next()” is compatible with all input methods.
Methods
Currently, the “Classic” and the “ClassicLargeSparse” methods for feature generation are available. The “Classic”method is employed by default. The “ClassicLargeSparse” method can be used to output Scipy sparse matrices for event logs that contain a larger number of event classes.
from tfgen import TFGen
tfgen = TFGen(ec, window_size=500, method=TFGen.ClassicLargeSparse)
Implementing New Methods
By deriving from the “BaseMethod” located in “tfgen/methods/base method.py,” one can extend the existing methods by creating new method classes. All classes must be placed under “tfgen/methods/” directory. The next event sample must be obtained using method “self.get_next_data(),” and the generated feature must be sent to the output using method “self.send_data()”. “self.finished” will become “True” if the input stream reaches the end.
from tfgen.methods.base_method import BaseMethod
class NewMethod(BaseMethod):
def __init__(self, ec_lookup_table, window_size, input_stream, output_stream):
super().__init__(ec_lookup_table, window_size, input_stream, output_stream)
# entry
def start_processing(self):
while True:
# event is a tuple of (case_id, event_attrs)
event = self.get_next_data()
# do something
self.send_data(processed_data)
if self.finished:
break
Then include the new method in the “TFGen” class found in “tfgen/tfgen.py”. Two locations are required to be modified.
class TFGen:
METHOD_CLASSIC = 101
METHOD_CLASSIC_LARGE_SPARSE = 102
METHOD_NEW_METHOD = 103 # The first location. New method class
def _select_method(self, method):
if self.method == TFGen.METHOD_CLASSIC:
return Classic(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
elif self.method == TFGen.METHOD_CLASSIC_LARGE_SPARSE:
return ClassicLargeSparse(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
# The second location. The instance to the new method class
elif self.method == TFGen.METHOD_NEW_METHOD:
return NewMethod(self.ec_lookup, self.window_size, self.input_stream, self.output_stream)
else:
raise Exception("Method not supported")
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file tfgen-0.6.1.tar.gz
.
File metadata
- Download URL: tfgen-0.6.1.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e269e82fec477fb4330f10a3c479ffdc89795475b06f250952bc707ef1e8486c |
|
MD5 | e50d9d80d0378e76a6113bf3603f750c |
|
BLAKE2b-256 | 8a4ea84b4318bb0f9d2b819dee0f3e7b2dc027e76ac55b12bc3035e51c4d6f9a |
File details
Details for the file tfgen-0.6.1-py3-none-any.whl
.
File metadata
- Download URL: tfgen-0.6.1-py3-none-any.whl
- Upload date:
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 625b79467cac75380426591c6ffbcaddbad30011ebed98b32679320565945387 |
|
MD5 | b5a50af8de5b18ddc43067d27e1b93f3 |
|
BLAKE2b-256 | cb4683da2aeeff491bd2700475c7ec7f79360a680d557bb16f51859f3dced4b7 |