Versatile parallel log parser

--------------------------------------------

1. This LICENSE AGREEMENT is between the Python Software Foundation
("PSF"), and the Individual or Organization ("Licensee") accessing and
otherwise using this software ("Python") in source or binary form and
its associated documentation.

2. Subject to the terms and conditions of this License Agreement, PSF
hereby grants Licensee a nonexclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display publicly,
prepare derivative works, distribute, and otherwise use Python
alone or in any derivative version, provided, however, that PSF's
License Agreement and PSF's notice of copyright, i.e., "Copyright (c)
2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights
Reserved" are retained in Python alone or in any derivative version
prepared by Licensee.

3. In the event Licensee prepares a derivative work that is based on
or incorporates Python or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python.

4. PSF is making Python available to Licensee on an "AS IS"
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.

5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.

6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

7. Nothing in this License Agreement shall be deemed to create any
relationship of agency, partnership, or joint venture between PSF and
Licensee. This License Agreement does not grant permission to use PSF
trademarks or trade name in a trademark sense to endorse or promote
products or services of Licensee, or any third party.

8. By copying, installing or otherwise using Python, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.

BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
-------------------------------------------

BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1

1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an
office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the
Individual or Organization ("Licensee") accessing and otherwise using
this software in source or binary form and its associated
documentation ("the Software").

2. Subject to the terms and conditions of this BeOpen Python License
Agreement, BeOpen hereby grants Licensee a non-exclusive,
royalty-free, world-wide license to reproduce, analyze, test, perform
and/or display publicly, prepare derivative works, distribute, and
otherwise use the Software alone or in any derivative version,
provided, however, that the BeOpen Python License is retained in the
Software, alone or in any derivative version prepared by Licensee.

3. BeOpen is making the Software available to Licensee on an "AS IS"
basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.

4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY
DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.

5. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

6. This License Agreement shall be governed by and interpreted in all
respects by the law of the State of California, excluding conflict of
law provisions. Nothing in this License Agreement shall be deemed to
create any relationship of agency, partnership, or joint venture
between BeOpen and Licensee. This License Agreement does not grant
permission to use BeOpen trademarks or trade names in a trademark
sense to endorse or promote products or services of Licensee, or any
third party. As an exception, the "BeOpen Python" logos available at
http://www.pythonlabs.com/logos.html may be used according to the
permissions granted on that web page.

7. By copying, installing or otherwise using the software, Licensee
agrees to be bound by the terms and conditions of this License
Agreement.

CNRI OPEN SOURCE LICENSE AGREEMENT (for Python 1.6b1)
-----------------------------------------------------

IMPORTANT: PLEASE READ THE FOLLOWING AGREEMENT CAREFULLY.

BY CLICKING ON "ACCEPT" WHERE INDICATED BELOW, OR BY COPYING,
INSTALLING OR OTHERWISE USING PYTHON 1.6, beta 1 SOFTWARE, YOU ARE
DEEMED TO HAVE AGREED TO THE TERMS AND CONDITIONS OF THIS LICENSE
AGREEMENT.

1. This LICENSE AGREEMENT is between the Corporation for National
Research Initiatives, having an office at 1895 Preston White Drive,
Reston, VA 20191 ("CNRI"), and the Individual or Organization
("Licensee") accessing and otherwise using Python 1.6, beta 1
software in source or binary form and its associated documentation,
as released at the www.python.org Internet site on August 4, 2000
("Python 1.6b1").

2. Subject to the terms and conditions of this License Agreement, CNRI
hereby grants Licensee a non-exclusive, royalty-free, world-wide
license to reproduce, analyze, test, perform and/or display
publicly, prepare derivative works, distribute, and otherwise use
Python 1.6b1 alone or in any derivative version, provided, however,
that CNRI's License Agreement is retained in Python 1.6b1, alone or
in any derivative version prepared by Licensee.

Alternately, in lieu of CNRI's License Agreement, Licensee may
substitute the following text (omitting the quotes): "Python 1.6,
beta 1, is made available subject to the terms and conditions in
CNRI's License Agreement. This Agreement may be located on the
Internet using the following unique, persistent identifier (known
as a handle): 1895.22/1011. This Agreement may also be obtained
from a proxy server on the Internet using the
URL:http://hdl.handle.net/1895.22/1011".

3. In the event Licensee prepares a derivative work that is based on
or incorporates Python 1.6b1 or any part thereof, and wants to make
the derivative work available to the public as provided herein,
then Licensee hereby agrees to indicate in any such work the nature
of the modifications made to Python 1.6b1.

4. CNRI is making Python 1.6b1 available to Licensee on an "AS IS"
basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR
FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6b1
WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.

5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR
LOSS AS A RESULT OF USING, MODIFYING OR DISTRIBUTING PYTHON 1.6b1,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY
THEREOF.

6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

7. This License Agreement shall be governed by and interpreted in all
respects by the law of the State of Virginia, excluding conflict of
law provisions. Nothing in this License Agreement shall be deemed
to create any relationship of agency, partnership, or joint venture
between CNRI and Licensee. This License Agreement does not grant
permission to use CNRI trademarks or trade name in a trademark
sense to endorse or promote products or services of Licensee, or
any third party.

8. By clicking on the "ACCEPT" button where indicated, or by copying,
installing or otherwise using Python 1.6b1, Licensee agrees to be
bound by the terms and conditions of this License Agreement.

ACCEPT

CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
--------------------------------------------------

Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam,
The Netherlands. All rights reserved.

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Stichting Mathematisch
Centrum or CWI not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.

STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Description:

- source: https://github.com/jul/yahi
- doc: http://yahi.readthedocs.org/
- ticketing: https://github.com/jul/yahi/issues


Versatile log parser (providing default extractors for apache/lighttpd/varnish)
===============================================================================

Command line usage
------------------

Simplest usage is::

    speed_shoot -g /usr/local/data/geoIP /var/www/apache/access*log


It returns JSON of the form::

    {
        "by_date": {
            "2012-5-3": 11
        },
        "total_line": 11,
        "ip_by_url": {
            "/favicon.ico": {
                "192.168.0.254": 2,
                "192.168.0.35": 2
            },
            "/": {
                "74.125.18.162": 1,
                "192.168.0.254": 1,
                "192.168.0.35": 5
            }
        },
        "by_status": {
            "200": 7,
            "404": 4
        },
        "by_dist": {
            "unknown": 11
        },
        "bytes_by_ip": {
            "74.125.18.162": 151,
            "192.168.0.254": 489,
            "192.168.0.35": 1093
        },
        "by_url": {
            "/favicon.ico": 4,
            "/": 7
        },
        "by_os": {
            "unknown": 11
        },
        "week_browser": {
            "3": {
                "unknown": 11
            }
        },
        "by_referer": {
            "-": 11
        },
        "by_browser": {
            "unknown": 11
        },
        "by_ip": {
            "74.125.18.162": 1,
            "192.168.0.254": 3,
            "192.168.0.35": 7
        },
        "by_agent": {
            "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0,gzip(gfe) (via translate.google.com)": 1,
            "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0": 10
        },
        "by_hour": {
            "9": 3,
            "10": 4,
            "11": 1,
            "12": 3
        },
        "by_country": {
            "": 10,
            "US": 1
        }
    }
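
Since every ``by_*`` section is a flat mapping of key to count, the report is easy to post-process. Here is a minimal sketch using only the standard library; it embeds a trimmed copy of the report above instead of running ``speed_shoot``, and the variable names (``report_text``, ``top_ips``, ``error_ratio``, ``avg_bytes``) are mine, not part of yahi.

```python
import json
from collections import Counter

# A trimmed copy of the report shown above. In practice you would load
# whatever speed_shoot printed (for instance from a file you saved).
report_text = """
{
    "total_line": 11,
    "by_status": {"200": 7, "404": 4},
    "by_ip": {"74.125.18.162": 1, "192.168.0.254": 3, "192.168.0.35": 7},
    "bytes_by_ip": {"74.125.18.162": 151, "192.168.0.254": 489, "192.168.0.35": 1093}
}
"""

report = json.loads(report_text)

# Rank the IPs by number of hits.
top_ips = Counter(report["by_ip"]).most_common(2)

# Share of the parsed lines that ended in a 404.
error_ratio = report["by_status"].get("404", 0) / report["total_line"]

# Average number of bytes served per hit, for each IP.
avg_bytes = {
    ip: report["bytes_by_ip"][ip] / hits
    for ip, hits in report["by_ip"].items()
}

print(top_ips)
print(round(error_ratio, 3))
```

Note that JSON object keys are always strings, so statuses and hours come back as ``"404"`` and ``"9"``, not as integers.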


If you use::

    speed_shoot -f csv -g /usr/local/data/geoIP /var/www/apache/access*log


Your result is::

    by_date,2012-5-3,11
    total_line,11
    ip_by_url,/favicon.ico,192.168.0.254,2
    ip_by_url,/favicon.ico,192.168.0.35,2
    ip_by_url,/,74.125.18.162,1
    ip_by_url,/,192.168.0.254,1
    ip_by_url,/,192.168.0.35,5
    by_status,200,7
    by_status,404,4
    by_dist,unknown,11
    bytes_by_ip,74.125.18.162,151
    bytes_by_ip,192.168.0.254,489
    bytes_by_ip,192.168.0.35,1093
    by_url,/favicon.ico,4
    by_url,/,7
    by_os,unknown,11
    week_browser,3,unknown,11
    by_referer,-,11
    by_browser,unknown,11
    by_ip,74.125.18.162,1
    by_ip,192.168.0.254,3
    by_ip,192.168.0.35,7
    by_agent,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0,gzip(gfe) (via translate.google.com)",1
    by_agent,Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0,10
    by_hour,9,3
    by_hour,10,4
    by_hour,11,1
    by_hour,12,3
    by_country,,10
    by_country,US,1
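
In the CSV form, each row is a path of keys followed by a count, and rows have a variable number of columns. The sketch below rebuilds the nested structure with the standard ``csv`` module. It assumes, as in the sample above, that the last column is always an integer; ``csv_to_nested`` is a hypothetical helper, not a yahi function.

```python
import csv
import io

# A few rows copied from the output above.
csv_text = '''by_date,2012-5-3,11
total_line,11
ip_by_url,/favicon.ico,192.168.0.254,2
ip_by_url,/,192.168.0.35,5
by_status,200,7
by_agent,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0,gzip(gfe) (via translate.google.com)",1
by_country,,10
'''


def csv_to_nested(text):
    """Rebuild a nested dict from rows of the form key[,subkey...],count."""
    result = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        *path, value = row
        node = result
        # Walk (and create) the intermediate levels, then set the leaf.
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = int(value)
    return result


nested = csv_to_nested(csv_text)
print(nested["ip_by_url"]["/"])
```

The ``csv`` module takes care of the quoted user agent that contains a comma, which a plain ``line.split(",")`` would break.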


If this does not work, you probably need to fetch the geoIP data file first::

    mkdir data
    wget -O- "http://www.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz" | zcat > data/GeoIP.dat

This is the geoLite database. I don't include the data in the package
because geoIP must be updated often to stay accurate.

The default path for the geoIP data is ``data/GeoIP.dat``.

Use as a script
---------------

``speed_shoot`` is in fact a template for using yahi as a module::

    #!/usr/bin/env python
    from archery.bow import Hankyu as _dict
    from yahi import notch, shoot


    # Set up the context for this run.
    context = notch()


    def date_formater(dt):
        """Format a datetime as an unpadded year-month-day string."""
        return "%s-%s-%s" % (dt.year, dt.month, dt.day)


    # The lambda turns one parsed log line into a nested dict of counters.
    # Keys starting with an underscore (_country, _datetime, ...) are
    # derived fields; the others come straight from the log line.
    context.output(
        shoot(
            context,
            lambda data: _dict({
                'by_country': _dict({data['_country']: 1}),
                'by_date': _dict({date_formater(data['_datetime']): 1}),
                'by_hour': _dict({data['_datetime'].hour: 1}),
                'by_os': _dict({data['_os_name']: 1}),
                'by_dist': _dict({data['_dist_name']: 1}),
                'by_browser': _dict({data['_browser_name']: 1}),
                'by_ip': _dict({data['ip']: 1}),
                'by_status': _dict({data['status']: 1}),
                'by_url': _dict({data['uri']: 1}),
                'by_agent': _dict({data['agent']: 1}),
                'by_referer': _dict({data['referer']: 1}),
                'ip_by_url': _dict({data['uri']: _dict({data['ip']: 1})}),
                'bytes_by_ip': _dict({data['ip']: int(data['bytes'])}),
                'week_browser': _dict({
                    data['_datetime'].weekday(): _dict({data['_browser_name']: 1}),
                }),
                'total_line': 1,
            }),
        ),
    )
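
The idea behind the script, as far as the sample output suggests, is that each log line yields a small nested mapping and the per-line mappings are added together, key by key. The following is only an illustration of that idea with plain dicts: ``add_nested``, ``extract`` and the three sample records are hypothetical, and this is not how archery's ``Hankyu`` is implemented.

```python
def add_nested(left, right):
    """Return a new dict in which leaves sharing the same path are summed."""
    merged = dict(left)
    for key, value in right.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(value, dict):
            # Same key on both sides holding mappings: merge one level down.
            merged[key] = add_nested(merged[key], value)
        else:
            merged[key] = merged[key] + value
    return merged


def extract(record):
    """Turn one parsed line into counters, like the lambda in the script."""
    return {
        "by_ip": {record["ip"]: 1},
        "by_status": {record["status"]: 1},
        "bytes_by_ip": {record["ip"]: int(record["bytes"])},
        "total_line": 1,
    }


# Hypothetical parsed lines, standing in for what the log parser would yield.
records = [
    {"ip": "192.168.0.35", "status": "200", "bytes": "120"},
    {"ip": "192.168.0.35", "status": "404", "bytes": "80"},
    {"ip": "192.168.0.254", "status": "200", "bytes": "300"},
]

totals = {}
for record in records:
    totals = add_nested(totals, extract(record))

print(totals)
```

Because addition is associative, partial results computed on separate files can be merged with the same operation.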



Installation
============

It is as easy as::

    pip install yahi

or::

    easy_install yahi

Recommended usage
=================

- for basic log aggregation, I recommend using the command line;
- for one-shot metrics, I recommend an interactive console (bpython or ipython);
- for specific metrics or elaborate filters, I recommend using the API.

CHANGELOG
=========

0.1.3
-----

Added an incomplete regexp for parsing varnish logs (2 fields are still missing).


Keywords: log,parsing
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Python Software Foundation License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python


Files for yahi, version 0.1.4
- yahi-0.1.4.tar.gz (15.1 kB), file type: Source, Python version: None
