
Fast logfile parsing. This is a port of Ruby logstash / grok to Python


korg is a Python port of the Ruby logstash grok regular expression patterns.

Quickstart

Logstash comes with over 100 built-in patterns for structuring unstructured data. You should definitely take advantage of them when you work with log data from Apache, Linux, HAProxy, AWS, and so forth. You can also use grok on other unstructured data by providing custom patterns yourself.

In this demo I quickly show you how to use it on a simple webserver log sample:

Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
127.0.0.1 - - [18/Jan/2020 10:28:19] "GET /index.html HTTP/1.1" 404 -
127.0.0.1 - - [18/Jan/2020 10:28:27] "GET /secret.txt HTTP/1.1" 200 -
...

I usually start by pasting a sample log line into the Grok Debugger and developing the pattern from the logstash patterns (just as you would with Ruby logstash). Grok patterns are structured like this: %{NAME:IDENTIFIER}. NAME is the name of the logstash pattern you want to use, and IDENTIFIER is the identifier you give to the matched text.
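To make the %{NAME:IDENTIFIER} structure concrete, here is a minimal sketch of how such a reference can be turned into a named regex group using only the standard library. This is not korg's actual implementation; the PATTERNS table and the to_regex helper are hypothetical, and the regexes are simplified stand-ins for the real logstash definitions.

```python
import re

# Hypothetical, simplified stand-ins for logstash patterns
# (assumption: the real definitions are more thorough).
PATTERNS = {
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "NOTSPACE": r"\S+",
}

# Matches %{NAME} or %{NAME:IDENTIFIER}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def to_regex(grok_pattern):
    """Replace each %{NAME:IDENTIFIER} with a named group (?P<IDENTIFIER>...)."""
    def substitute(match):
        name, identifier = match.group(1), match.group(2)
        body = PATTERNS[name]
        if identifier is None:
            # No identifier given: match the text but do not capture it
            return "(?:" + body + ")"
        return "(?P<" + identifier + ">" + body + ")"
    return GROK_REF.sub(substitute, grok_pattern)


regex = to_regex("%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}")
fields = re.match(regex, "GET /index.html HTTP/1.1").groupdict()
print(fields)
```

Each IDENTIFIER becomes a key in the resulting dictionary, which is the same shape of result that a grok match produces.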

webserver log pattern

Once the pattern works (try it on other log lines, too), we can automate this using korg.

>>> from korg import LineGrokker, PatternRepo
>>>
>>> pr = PatternRepo()  # use the std. logstash grok patterns
>>> lg = LineGrokker('%{IPORHOST:clientip} - - %{SYSLOG5424SD} "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)', pr)
>>>
>>> print(lg.grok('''127.0.0.1 - - [18/Jan/2020 10:28:27] "GET /secret.txt HTTP/1.1" 200 -'''))
{'clientip': '127.0.0.1', 'verb': 'GET', 'request': '/secret.txt', 'httpversion': '1.1', 'rawrequest': None, 'response': '200', 'bytes': None}

Why a logstash / grok port to Python?

I like the logstash grok approach to logfile parsing, so I want to use it in Python.

One solution would be to use the C version of logstash / grok (https://github.com/jordansissel/grok) and to write a wrapper.

But grok basically just assembles regular expressions. I already know that file processing with regular expressions is very fast in Python, so I chose to port it directly to Python.
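The idea that grok "assembles regular expressions" can be sketched as follows: patterns may reference other patterns, so the references are expanded recursively into one plain regex, which is compiled once and reused for every line. This is an illustration under stated assumptions, not korg's code; the PATTERNS table, the expand helper, and the simplified line pattern are all hypothetical.

```python
import re

# Hypothetical pattern table in which patterns reference each other
# (assumption: much simpler than the real logstash pattern files).
PATTERNS = {
    "INT": r"\d+",
    "IPV4": r"%{INT}\.%{INT}\.%{INT}\.%{INT}",
    "STATUS": r"%{INT}",
}

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Recursively expand references until only plain regex syntax remains."""
    def substitute(match):
        name, identifier = match.groups()
        body = expand(PATTERNS[name])  # nested references are expanded first
        prefix = "(?P<%s>" % identifier if identifier else "(?:"
        return prefix + body + ")"
    return GROK_REF.sub(substitute, pattern)


# Compile once, then reuse the compiled regex for every log line.
LINE = re.compile(
    expand(r'%{IPV4:clientip} - - \[[^\]]*\] "[^"]*" %{STATUS:response}')
)

lines = [
    '127.0.0.1 - - [18/Jan/2020 10:28:19] "GET /index.html HTTP/1.1" 404 -',
    '127.0.0.1 - - [18/Jan/2020 10:28:27] "GET /secret.txt HTTP/1.1" 200 -',
]
results = [LINE.match(line).groupdict() for line in lines]
print(results)
```

Because all the work after expansion is done by Python's re module, a wrapper around a C library is not needed for this approach.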

The pattern files are updated from the logstash grok project: https://github.com/logstash-plugins/logstash-patterns-core

A big thank you goes to the logstash community for an awesome job maintaining the regex pattern files!


