Parsing Log Files With User Defined Templates
Project description
template-log-parser : Log Files into Tabular Data
template-log-parser is designed to streamline the log analysis process by pulling relevant information into DataFrame columns by way of user-designed templates. parse and pandas perform the heavy lifting. Full credit to those well-designed projects.
This project offers some flexibility in how you can process your log files. You can utilize built-in template functions (PiHole, Omada Controller, Open Media Vault, or Synology DSM) or build your own workflow.
Getting Started
pip install template-log-parser
The foundational principle in this project is designing templates that fit repetitive log file formats.
Example log line:
my_line = '2024-06-13T15:09:35 server_15 login_authentication[12345] rejected login from user[user_1].'
Example template:
template = '{time} {server_name} {service_process}[{service_id}] {result} login from user[{username}].'
The words within the braces will eventually become column names in a DataFrame. You can capture as much or as little data from the line as you see fit. For instance, you could replace {result} with the literal text rejected, so that the template matches only rejected logins for this example.
Note that templates look for an exact match on everything outside the braces. Items that vary from line to line, such as timestamps, time elapsed, and data usage, should be captured in braces since they are unique to each log line instance.
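Under the hood, the parse library (credited above) performs the template matching. As a quick sanity check outside this package, the example template can be tested directly against the example line; this snippet is purely illustrative:
import parse
result = parse.parse(template, my_line)
# result is None when the line does not fit the template;
# result.named holds the captured fields, matching the parsed_info example further below
print(result.named['username'])
user_1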
Template Dictionaries
After creating templates, they should be added to a dictionary with the following format:
ex_dict = {'search_string': [template_name, expected_values, 'event_type'], ...}
Using the example template:
my_template_dict = {'login from': [template, 6, 'login_attempt'], ...}
- 'search_string' is text from the template that is NOT enclosed in braces {}. The parsing function first checks whether this text is present in a log line before attempting to match the template against it.
- template_name is the user-defined template.
- expected_values is the integer number of items enclosed within braces {}.
- 'event_type' is an arbitrary name assigned to this type of occurrence.
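A dictionary can hold any number of templates; for example (the second template and its 'session_closed' event type are hypothetical, shown only to illustrate the shape):
session_template = '{time} {server_name} {service_process}[{service_id}] session closed for user[{username}].'
my_template_dict = {
    'login from': [template, 6, 'login_attempt'],
    'session closed': [session_template, 5, 'session_closed'],
}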
Basic Usage Examples
Parse a single event:
from template_log_parser import parse_function
event_type, parsed_info = parse_function(my_line, my_template_dict)
print(event_type)
'login_attempt'
print(parsed_info)
{
'time': '2024-06-13T15:09:35',
'server_name': 'server_15',
'service_process': 'login_authentication',
'service_id': '12345',
'result': 'rejected',
'username': 'user_1'
}
Parse an entire log file and return a Pandas DataFrame:
from template_log_parser import log_pre_process
df = log_pre_process('log_file.log', my_template_dict)
print(df.columns)
Index(['event_data', 'event_type', 'parsed_info'])
This is just a tabular data form of many single parsed events.
- event_data column holds the raw string data for each log line
- event_type column value is determined based on the matching template
- parsed_info column holds a dictionary of the parsed details
Note: Events that do not match any template are returned with event_type 'Other' and a parsed_info dictionary of {'unparsed_text': (original log file line)}
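Since unmatched lines keep their raw text, filtering for the 'Other' event type is a straightforward way to spot log formats that still need templates (standard pandas, illustrative only):
unmatched = df[df['event_type'] == 'Other']
print(unmatched['event_data'].head())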
Granular Log Processing
process_log() takes this one step further: each key from the parsed_info dictionary becomes its own column, populated with the associated value.
By default, it returns a dictionary of Pandas DataFrames, formatted as {'event_type': df}.
from template_log_parser import process_log
my_df_dict = process_log('log_file.log', my_template_dict)
print(my_df_dict.keys())
dict_keys(['login_attempt', 'event_type_2', 'event_type_3', ...])
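Each value in the dictionary is an ordinary DataFrame, so a single event type can be pulled out and inspected directly; its columns correspond to the keys captured by that event's template:
login_df = my_df_dict['login_attempt']
print(len(login_df))  # number of parsed login_attempt events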
Alternatively, the same data can be returned as one large DataFrame:
from template_log_parser import process_log
my_df = process_log('log_file.log', my_template_dict, dict_format=False)
print(my_df.columns)
Index(['event_type', 'time', 'server_name', 'service_process', 'service_id', 'result', 'username'])
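With everything in one DataFrame, a quick tally of event types is an easy first check (illustrative only):
print(my_df['event_type'].value_counts())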
Some Notes
- By default, drop_columns=True instructs process_log() to discard 'event_data' and 'parsed_info', along with any other columns modified by column functions (see the next bullet).
- (OPTIONAL ARGUMENT) additional_column_functions allows the user to apply functions to specific columns. These functions will create a new column, or multiple columns if so specified. The original column will be deleted if drop_columns=True. This argument takes a dictionary formatted as:
add_col_func = {column_to_run_function_on: [function, new_column_name_OR_list_of_new_column_names]}
- (OPTIONAL ARGUMENT) merge_dictionary allows the user to concatenate DataFrames that are deemed to be related. The original DataFrames will be discarded, and the newly merged DataFrame will be available within the dictionary by its new key. When dict_format=False, this argument has no effect. This argument takes a dictionary formatted as:
my_merge_dict = {'new_df_key': [df_1_key, df_2_key, ...], ...}
- (OPTIONAL ARGUMENT) datetime_columns takes a list of columns that should be converted using pd.to_datetime().
- (OPTIONAL ARGUMENT) localize_time_columns takes a list of columns whose timezone should be eliminated (the column must also be included in the datetime_columns argument).
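The sketch below combines several of these optional arguments in one call. The keyword names come from the list above, but the helper function, the 'severity' column, and the 'session_closed'/'auth_events' keys are hypothetical:
from template_log_parser import process_log

def tag_severity(result):
    # hypothetical helper: derive a severity label from the 'result' column
    return 'alert' if result == 'rejected' else 'info'

my_df_dict = process_log(
    'log_file.log',
    my_template_dict,
    additional_column_functions={'result': [tag_severity, 'severity']},
    merge_dictionary={'auth_events': ['login_attempt', 'session_closed']},
    datetime_columns=['time'],
    localize_time_columns=['time'],
)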
Built-Ins
This project includes log process functions for PiHole, Omada Controller, Open Media Vault, and Synology DSM. These are still being actively developed, as not all event types have been accounted for. As a general philosophy, this project aims to strike a middle ground between useful categorization of log events and the sheer number of templates. Submissions for improvement are welcome.
from template_log_parser.pihole import pihole_process_log
my_pihole_log_dict = pihole_process_log('pihole.log')
from template_log_parser.omada import omada_process_log
my_omada_log_dict = omada_process_log('omada.log')
from template_log_parser.omv import omv_process_log
my_omv_log_dict = omv_process_log('omv.log')
from template_log_parser.synology import synology_process_log
my_synology_log_dict = synology_process_log('synology.log')
As both PiHole and Open Media Vault can run on Debian, their templates are combined with a Debian template dictionary. This dictionary can be used separately if desired; however, at the moment it serves only as a cursory classification mechanism for some basic events, since PiHole and Open Media Vault are the focus.
from template_log_parser.debian import debian_process_log
my_debian_log_dict = debian_process_log('debian.log')
DISCLAIMER
This project is in no way affiliated with the products mentioned (PiHole, Omada, Open Media Vault, Synology, or Debian). Any usage of their services is subject to their respective terms of use. This project does not undermine or expose their source code, but simply aims to ease the consumption of their log files.