A Python library to parse strings and extract information from structured/unstructured data
What can I use Grok for?
parsing and matching patterns in a string (log, message, etc.)
sparing you from writing complex regular expressions
extracting information from structured/unstructured data
Installation
$ pip install pygrok
Or download, uncompress, and install pygrok from here:
$ tar zxvf pygrok-xx.tar.gz
$ cd pygrok_dir
$ sudo python setup.py install
Getting Started
from pygrok import Grok
text = 'gary is male, 25 years old and weighs 68.5 kilograms'
pattern = '%{WORD:name} is %{WORD:gender}, %{NUMBER:age} years old and weighs %{NUMBER:weight} kilograms'
grok = Grok(pattern)
print(grok.match(text))
# {'gender': 'male', 'age': '25', 'name': 'gary', 'weight': '68.5'}
Pretty cool!
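For comparison, here is roughly what the same extraction looks like with a hand-written regular expression and only the standard `re` module. This is a sketch, not pygrok's internals: the `NUMBER` expression below is a simplified stand-in for the real `NUMBER` pattern, and the variable names are illustrative.

```python
import re

# Hand-written approximation of the grok pattern above.
# WORD is \b\w+\b; NUMBER here is a simplified stand-in (assumption),
# not the exact expression shipped with pygrok.
NUMBER = r'[+-]?(?:\d+(?:\.\d+)?|\.\d+)'
regex = re.compile(
    r'(?P<name>\b\w+\b) is (?P<gender>\b\w+\b), '
    r'(?P<age>' + NUMBER + r') years old and weighs '
    r'(?P<weight>' + NUMBER + r') kilograms'
)

text = 'gary is male, 25 years old and weighs 68.5 kilograms'
m = regex.fullmatch(text)
# groupdict() maps each named group to the text it captured.
result = m.groupdict() if m else None
print(result)
```

The grok pattern expresses the same thing with named, reusable building blocks instead of inline regular-expression fragments.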
Numbers can be converted from string to int or float if you use %{pattern:name:type} syntax, such as %{NUMBER:age:int}
from pygrok import Grok
text = 'gary is male, 25 years old and weighs 68.5 kilograms'
pattern = '%{WORD:name} is %{WORD:gender}, %{NUMBER:age:int} years old and weighs %{NUMBER:weight:float} kilograms'
grok = Grok(pattern)
print(grok.match(text))
# {'gender': 'male', 'age': 25, 'name': 'gary', 'weight': 68.5}
Now age is of type int and weight is of type float.
Awesome!
Some of the patterns you can use are listed here:
`WORD` means \b\w+\b in regular expression.
`NUMBER` means (?:%{BASE10NUM})
`BASE10NUM` means (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
Other patterns include `IP`, `HOSTNAME`, `URIPATH`, `DATE`, `TIMESTAMP_ISO8601`, `COMMONAPACHELOG`, and more.
See All patterns here
You can also define custom patterns; see these code examples.
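To make the %{pattern:name:type} syntax, nested patterns such as `NUMBER`, and custom patterns more concrete, here is a minimal grok-like expander built only on the standard `re` module. It is a sketch under stated assumptions, not pygrok's implementation: `PATTERNS`, `compile_grok`, `grok_match`, and the `ID` custom pattern are hypothetical names, and `BASE10NUM` is simplified to avoid atomic grouping.

```python
import re

# Hypothetical pattern table (assumption). BASE10NUM is simplified so that
# the standard re module can compile it on any Python 3 version.
PATTERNS = {
    'WORD': r'\b\w+\b',
    'BASE10NUM': r'[+-]?(?:\d+(?:\.\d+)?|\.\d+)',
    'NUMBER': r'(?:%{BASE10NUM})',
}
CONVERTERS = {'int': int, 'float': float}
# Matches %{NAME}, %{NAME:field} and %{NAME:field:type}.
REF = re.compile(r'%\{(\w+)(?::(\w+))?(?::(\w+))?\}')


def compile_grok(pattern, custom_patterns=None):
    """Expand %{NAME:field:type} references into a regex with named groups."""
    table = dict(PATTERNS, **(custom_patterns or {}))
    types = {}

    def substitute(match):
        name, field, type_name = match.groups()
        body = table[name]
        if field is None:
            return '(?:' + body + ')'      # unnamed reference: no capture
        if type_name:
            types[field] = CONVERTERS[type_name]
        return '(?P<' + field + '>' + body + ')'

    expanded = pattern
    # Repeat until nested references (NUMBER -> BASE10NUM) are resolved.
    # Circular definitions are not detected in this sketch.
    while REF.search(expanded):
        expanded = REF.sub(substitute, expanded)
    return re.compile(expanded), types


def grok_match(pattern, text, custom_patterns=None):
    """Return a dict of captured fields, or None if the text does not match."""
    regex, types = compile_grok(pattern, custom_patterns)
    m = regex.fullmatch(text)
    if m is None:
        return None
    return {k: types[k](v) if k in types and v is not None else v
            for k, v in m.groupdict().items()}


typed = grok_match(
    '%{WORD:name} is %{WORD:gender}, %{NUMBER:age:int} years old '
    'and weighs %{NUMBER:weight:float} kilograms',
    'gary is male, 25 years old and weighs 68.5 kilograms',
)
custom = grok_match(
    'Custom id is %{ID:my_id}',
    'Custom id is abc-123',
    custom_patterns={'ID': r'%{WORD}-%{NUMBER}'},
)
print(typed)
print(custom)
```

Keeping patterns in a plain dictionary means a custom pattern is just one more entry, and it can itself refer to existing patterns by name.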
More details
Because Python's built-in re module did not support the atomic grouping syntax (?>...) (support was added only in Python 3.11), pygrok requires the regex package to be installed.
pygrok is inspired by Grok, developed by Jordan Sissel. It is not a wrapper around Jordan Sissel's Grok; it is implemented entirely by me.
Grok is simple software that lets you easily parse strings, logs, and other files. With Grok, you can turn unstructured log and event data into structured data. Pygrok does the same thing.
I recommend having a look at the logstash grok filter, which explains how Grok-like tools work.
The pattern files come from the logstash grok filter's pattern files.
Contribute
You are encouraged to fork, improve the code, then make a pull request.
Get Help
mail:garygaowork@gmail.com twitter:@garyelephant
Contributors
Thanks to all contributors