A tool to parse syslog-like messages into word sequences
log2seq
log2seq is a Python package that helps parse syslog-like messages into word sequences that are more suitable for further automated analysis. It applies a customizable, ordered sequence of rules built on regular expressions.
Introduction
In log analysis, you may encounter log messages in the following format:
Jan 1 12:34:56 host-device1 system[12345]: host 2001:0db8:1234::1 (interface:eth0) disconnected
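This message cannot be split cleanly with str.split or re.split, because the colon (:) is used inconsistently: it ends the process tag, appears inside the timestamp and the IPv6 address, and separates "interface" from "eth0".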
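The short script below illustrates the problem using only the standard library. The separator set passed to re.split is an arbitrary choice made for this illustration: whitespace splitting leaves symbols attached to words, while a single regex that also splits on colons breaks the IPv6 address apart.

```python
import re

# The example message from above.
mes = ("Jan 1 12:34:56 host-device1 system[12345]: host "
       "2001:0db8:1234::1 (interface:eth0) disconnected")

# str.split breaks only on whitespace, so brackets and colons stay attached,
# e.g. "system[12345]:" and "(interface:eth0)" remain single words.
by_space = mes.split()
print(by_space)

# One regex over whitespace, brackets, and colons separates "interface" from
# "eth0", but it also splits the timestamp and the IPv6 address into pieces.
by_regex = [w for w in re.split(r"[\s\[\]():]+", mes) if w]
print(by_regex)
```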
By default, log2seq processes this message in multiple steps:
- Process message header (i.e., timestamp and source hostname)
- Split message body into word sequence by standard symbol strings (e.g., spaces and brackets)
- Fix words that should not be split in later steps (e.g., IPv6 addresses)
- Split the remaining words by inconsistent symbol strings (e.g., :)
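The default steps above can be approximated with the standard library alone. This is a minimal sketch, not log2seq's implementation: the header regex, the separator sets, and the use of ipaddress.ip_address to decide which words to protect are assumptions made for illustration, and the function names parse and is_ip_address are hypothetical.

```python
import ipaddress
import re

# Assumed header layout for this sketch: "Mon D HH:MM:SS hostname ".
HEADER_RE = re.compile(r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
                       r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+")
# Step 2: "standard" separators (whitespace and brackets) in this sketch.
STANDARD_RE = re.compile(r"[\s\[\]()]+")
# Step 4: "inconsistent" separator (colons), used only on unprotected words.
INCONSISTENT_RE = re.compile(r":+")


def is_ip_address(word):
    """Hypothetical step-3 check: True if word parses as IPv4 or IPv6."""
    try:
        ipaddress.ip_address(word)
    except ValueError:
        return False
    return True


def parse(line):
    # Step 1: extract the header (timestamp and source hostname).
    m = HEADER_RE.match(line)
    header = m.groupdict() if m else {}
    body = line[m.end():] if m else line
    # Step 2: split the body by standard symbol strings.
    words = [w for w in STANDARD_RE.split(body) if w]
    # Steps 3 and 4: keep IP addresses whole, split other words by colons,
    # and drop the empty pieces left by standalone colons.
    result = []
    for w in words:
        if is_ip_address(w):
            result.append(w)
        else:
            result.extend(p for p in INCONSISTENT_RE.split(w) if p)
    return {"header": header, "words": result}


mes = ("Jan 1 12:34:56 host-device1 system[12345]: host "
       "2001:0db8:1234::1 (interface:eth0) disconnected")
print(parse(mes)["words"])
```

The order of the rules matters: protecting addresses before the colon split is what keeps 2001:0db8:1234::1 intact, which is the same reason log2seq fixes such words before splitting by inconsistent symbols.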
The following is sample code:
import log2seq
mes = "Jan 1 12:34:56 host-device1 system[12345]: host 2001:0db8:1234::1 (interface:eth0) disconnected"
parser = log2seq.init_parser()
d = parser.process_line(mes)
print(d["words"])
It outputs the following word sequence:
['system', '12345', 'host', '2001:0db8:1234::1', 'interface', 'eth0', 'disconnected']
The colons inside the IPv6 address are preserved, while the other colons are treated as separators and removed.
Code
The source code is available at https://github.com/cpflat/log2seq
License
3-Clause BSD license
Author
Satoru Kobayashi