
Parsing Log Files With User Defined Templates


template-log-parser : Log Files into Tabular Data


template-log-parser is designed to streamline the log analysis process by pulling relevant information into DataFrame columns by way of user-designed templates. The parse and pandas libraries perform the heavy lifting; full credit to those well-designed projects.

This project offers some flexibility in how you can process your log files. You can utilize built-in template functions (PiHole, Omada Controller, Open Media Vault, or Synology DSM) or build your own workflow.

Getting Started


pip install template-log-parser

The foundational principle in this project is designing templates that fit repetitive log file formats.

Example log line:

my_line = '2024-06-13T15:09:35 server_15 login_authentication[12345] rejected login from user[user_1].'

Example template:

template = '{time} {server_name} {service_process}[{service_id}] {result} login from user[{username}].'

The words within the braces will eventually become column names in a DataFrame. You can capture as much or as little data from the line as you see fit. For instance, you could opt to omit {result} from the template and thus look to match only rejected logins for this example.
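That narrower template could look like this (rejected_template is just an illustrative name):

rejected_template = '{time} {server_name} {service_process}[{service_id}] rejected login from user[{username}].'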

Note that templates look for an exact match. Items like timestamps, elapsed time, and data usage should be captured as fields, since they are unique to each log line instance.
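Matching itself is handled by the parse library; as a rough sketch of the underlying mechanism (using parse directly, not template-log-parser's own API):

import parse

# parse.parse() returns a Result whose .named dict maps each brace name to its captured value
result = parse.parse(template, my_line)

print(result.named)
{'time': '2024-06-13T15:09:35', 'server_name': 'server_15', 'service_process': 'login_authentication', 'service_id': '12345', 'result': 'rejected', 'username': 'user_1'}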

Template Dictionaries


After creating templates, they should be added to a dictionary with the following format:

ex_dict = {'search_string': [template_name, expected_values, 'event_type'], ...}

Using the example template:

my_template_dict = {'login from': [template, 6, 'login_attempt'], ...}
  • 'search_string' is text that was NOT enclosed in braces {}. The parsing function first checks whether this text is present within the log line before attempting to match the template against it.
  • template_name is the user-defined template.
  • expected_values is the integer number of items enclosed within braces {}.
  • 'event_type' is an arbitrary name assigned to this type of occurrence.
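A dictionary typically holds one entry per template. As a sketch with a second, purely hypothetical template added (the connection-failure line and its names are illustrative only):

failed_conn_template = '{time} {server_name} connection to {destination} failed after {seconds} seconds.'

my_template_dict = {
    'login from': [template, 6, 'login_attempt'],
    'connection to': [failed_conn_template, 4, 'failed_connection'],
}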

Basic Usage Examples


Parse a single event:

from template_log_parser import parse_function

event_type, parsed_info = parse_function(my_line, my_template_dict)

print(event_type)
'login_attempt' 

print(parsed_info)
    {
    'time': '2024-06-13T15:09:35',
    'server_name': 'server_15',
    'service_process': 'login_authentication', 
    'service_id': '12345',
    'result': 'rejected',
    'username': 'user_1'
    }

Parse an entire log file and return a Pandas DataFrame:

from template_log_parser import log_pre_process

df = log_pre_process('log_file.log', my_template_dict)

print(df.columns)
Index(['event_data', 'event_type', 'parsed_info'])

This is simply a tabular form of many individually parsed events.

  • event_data column holds the raw string data for each log line
  • event_type column value is determined based on the matching template
  • parsed_info column holds a dictionary of the parsed details

Note: lines that do not match any template are returned with event_type 'Other' and a parsed_info dictionary of {'unparsed_text': (original log file line)}
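One way to gauge template coverage is to inspect those 'Other' rows; a minimal pandas sketch, assuming the columns shown above:

from template_log_parser import log_pre_process

df = log_pre_process('log_file.log', my_template_dict)

# Lines no template matched; review these to decide whether additional templates are worthwhile
unmatched = df[df['event_type'] == 'Other']
print(unmatched['event_data'].head())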

Granular Log Processing


Essentially, each key from the parsed_info dictionary becomes its own column, populated with the associated value.

By default, process_log() returns a dictionary of Pandas DataFrames, formatted as {'event_type': df}.

from template_log_parser import process_log

my_df_dict = process_log('log_file.log', my_template_dict)

print(my_df_dict.keys())
dict_keys(['login_attempt', 'event_type_2', 'event_type_3', ...])
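Each value is an ordinary DataFrame whose columns come from the brace names of the matching template, so for the login example you might work with it like this:

login_df = my_df_dict['login_attempt']

# Columns correspond to the template fields, e.g. 'time', 'server_name', 'result', 'username'
print(login_df['result'].value_counts())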

Alternatively, the same data can be returned as one large DataFrame:

from template_log_parser import process_log

my_df = process_log('log_file.log', my_template_dict, dict_format=False)

print(my_df.columns)
Index(['event_type', 'time', 'server_name', 'service_process', 'service_id', 'result', 'username'])

Some Notes

  • By default, drop_columns=True instructs process_log() to discard 'event_data' and 'parsed_info', along with any other columns modified by column functions (see the next item).
  • (OPTIONAL ARGUMENT) additional_column_functions allows the user to apply functions to specific columns. Each function creates a new column, or multiple columns if so specified; the original column is deleted if drop_columns=True. This argument takes a dictionary formatted as follows (see the sketch after this list):
add_col_func = {column_to_run_function_on: [function, new_column_name_OR_list_of_new_column_names]}
  • (OPTIONAL ARGUMENT) merge_dictionary allows the user to concatenate DataFrames that are deemed to be related. The original DataFrames are discarded, and the newly merged DataFrame is available within the dictionary by its new key. When dict_format=False, this argument has no effect. This argument takes a dictionary formatted as:
my_merge_dict = {'new_df_key': [df_1_key, df_2_key, ...], ...}
  • (OPTIONAL ARGUMENT) datetime_columns takes a list of columns that should be converted using pd.to_datetime().
  • (OPTIONAL ARGUMENT) localize_time_columns takes a list of columns whose timezone should be eliminated (each column must also be included in the datetime_columns argument).
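A hedged sketch tying several of these optional arguments together, assuming the column function is applied to each value in the column; the helper function, the 'logout_event' key, and the 'result_upper' column name are illustrative and not part of the library:

from template_log_parser import process_log

# Hypothetical column function: uppercase the 'result' value into a new column
def to_upper(value):
    return str(value).upper()

add_col_func = {'result': [to_upper, 'result_upper']}
my_merge_dict = {'auth_events': ['login_attempt', 'logout_event']}

my_df_dict = process_log(
    'log_file.log',
    my_template_dict,
    additional_column_functions=add_col_func,
    merge_dictionary=my_merge_dict,
    datetime_columns=['time'],
    localize_time_columns=['time'],
)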

Built-Ins

This project includes log process functions for PiHole, Omada Controller, Open Media Vault, and Synology DSM. These are still under active development, as not all event types have been accounted for. As a general philosophy, this project aims to strike a middle ground between usefully categorizing log events and keeping the sheer number of templates manageable. Submissions for improvement are welcome.

from template_log_parser.pihole import pihole_process_log

my_pihole_log_dict = pihole_process_log('pihole.log')

from template_log_parser.omada import omada_process_log

my_omada_log_dict = omada_process_log('omada.log')

from template_log_parser.omv import omv_process_log

my_omv_log_dict = omv_process_log('omv.log')

from template_log_parser.synology import synology_process_log

my_synology_log_dict = synology_process_log('synology.log')

As both PiHole and Open Media Vault can run on Debian, their templates are combined with a Debian template dictionary. This dictionary can be used separately if desired; however, at the moment it serves only as a cursory classification mechanism for some basic events, since PiHole and Open Media Vault are the focus.

from template_log_parser.debian import debian_process_log

my_debian_log_dict = debian_process_log('debian.log')

DISCLAIMER

This project is in no way affiliated with the products mentioned (PiHole, Omada, Open Media Vault, Synology, or Debian). Any usage of their services is subject to their respective terms of use. This project does not undermine or expose their source code, but simply aims to ease the consumption of their log files.
