A tool to parse syslog-like messages into word sequences
Project description
log2seq is a Python package that helps parse syslog-like messages into word sequences that are more suitable for further automated analysis. It applies a customizable, ordered sequence of rules based on regular expressions.
Introduction
In log analysis, you may sometimes encounter log messages in the following format:
Jan 1 12:34:56 host-device1 system[12345]: host 2001:0db8:1234::1 (interface:eth0) disconnected
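This message cannot be split well with str.split or re.split, because the usage of : is not consistent: it appears as a separator (after the process ID and in interface:eth0) but also inside values that should stay intact (the timestamp and the IPv6 address).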
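The problem can be seen with a short experiment using only the standard library. This is an illustrative sketch, not part of log2seq; the variable names and the symbol set in the regular expression are assumptions chosen for the demonstration.

```python
import re

mes = ("Jan 1 12:34:56 host-device1 system[12345]: "
       "host 2001:0db8:1234::1 (interface:eth0) disconnected")

# Splitting on whitespace alone leaves brackets and colons attached to words.
by_space = mes.split()

# Splitting on every symbol, including ":", also breaks the timestamp
# and the IPv6 address into pieces.
by_symbol = [w for w in re.split(r"[\s\[\]():]+", mes) if w]

print(by_space)
print(by_symbol)
```

In the first result, tokens such as system[12345]: and (interface:eth0) are left unsplit; in the second, 2001:0db8:1234::1 no longer survives as a single word.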
By default, log2seq processes this message in multiple steps:
1. Process the message header (i.e., timestamp and source hostname)
2. Split the message body into a word sequence by standard symbol strings (e.g., spaces and brackets)
3. Fix words that should not be split later (e.g., IPv6 addresses)
4. Split the remaining words by inconsistent symbol strings (e.g., :)
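The steps above can be sketched with plain regular expressions. This is a minimal illustration of the idea, not log2seq's implementation: the function parse_line and the patterns HEADER, STANDARD_SYMBOLS and IPV6 are hypothetical names and rules made up for this example, and the IPv6 pattern is deliberately loose.

```python
import re

# Hypothetical patterns for illustration; these are not log2seq's actual rules.
HEADER = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<body>.*)$"
)
STANDARD_SYMBOLS = re.compile(r"[\s\[\]()]+")
# Loose check: up to 4 hex digits, then 2 to 7 groups of ":" plus up to 4 hex digits.
IPV6 = re.compile(r"^[0-9A-Fa-f]{0,4}(:[0-9A-Fa-f]{0,4}){2,7}$")


def parse_line(line):
    # Step 1: process the header (timestamp and source hostname).
    m = HEADER.match(line)
    if m is None:
        raise ValueError("unexpected header format")

    # Step 2: split the body on standard symbols (spaces and brackets).
    tokens = [t for t in STANDARD_SYMBOLS.split(m.group("body")) if t]

    words = []
    for tok in tokens:
        if IPV6.match(tok):
            # Step 3: keep words that must not be split later.
            words.append(tok)
        else:
            # Step 4: split the rest on the inconsistent symbol ":".
            words.extend(w for w in tok.split(":") if w)
    return {"host": m.group("host"), "words": words}


mes = ("Jan 1 12:34:56 host-device1 system[12345]: "
       "host 2001:0db8:1234::1 (interface:eth0) disconnected")
result = parse_line(mes)
print(result["words"])
```

Because the address is protected in step 3 before step 4 runs, the order of the rules matters; swapping the two steps would split the address on its colons.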
The following sample code uses log2seq:
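    import log2seq

    mes = "Jan 1 12:34:56 host-device1 system[12345]: host 2001:0db8:1234::1 (interface:eth0) disconnected"

    # Load the parsing rules from a rule script
    rules = log2seq.load_from_script("./default_parser.py")
    # Build a parser from the loaded rules
    parser = log2seq.init_parser(rules)
    # Parse one log line; the word sequence is stored under "words"
    d = parser.process_line(mes)
    print(d["words"])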
It outputs the following sequence:
['system', '12345', 'host', '2001:0db8:1234::1', 'interface', 'eth0', 'disconnected']
Note that the : characters inside the IPv6 address are preserved, while every other : is treated as a separator and dropped.
To customize the parsing rules, see log2seq/default_script.py.
log2seq also accepts rules written in configparser format (see log2seq/data/sample.conf).
Code
The source code is available at https://github.com/cpflat/log2seq
License
3-Clause BSD license