python daemon that munches on logs and sends their contents to logstash

Requirements

  • Python 2.7 (untested on other versions)
  • Optional zeromq support: install libzmq (brew install zmq or apt-get install libzmq-dev) and pyzmq (pip install pyzmq==2.1.11)

Installation

Using PIP:

From Github:

pip install git+git://github.com/josegonzalez/beaver.git#egg=beaver

From PyPI:

pip install beaver==13

Usage

usage:

beaver [-h] [-c CONFIG] [-d] [-f FILES [FILES ...]]
          [--format {json,msgpack,string}] [--hostname HOSTNAME]
          [-m {bind,connect}] [-p PATH]
          [-t {rabbitmq,redis,stdout,zmq,udp}] [-v] [--fqdn]

optional arguments:

-h, --help            show this help message and exit
-c CONFIG, --configfile CONFIG
                      ini config file path
-d, --debug           enable debug mode
-f FILES [FILES ...], --files FILES [FILES ...]
                      space-separated filelist to watch, can include globs
                      (*.log). Overrides --path argument
--format {json,msgpack,string}
                      format to use when sending to transport
--hostname HOSTNAME   manual hostname override for source_host
-m {bind,connect}, --mode {bind,connect}
                      bind or connect mode
-p PATH, --path PATH  path to log files
-t {rabbitmq,redis,stdout,zmq,udp}, --transport {rabbitmq,redis,stdout,zmq,udp}
                      log transport method
-v, --version         output version and quit
--fqdn                use the machine's FQDN

Background

Beaver provides a lightweight method for shipping local log files to Logstash. It does this using one of the rabbitmq, redis, stdout, udp, or zmq transports. This means you’ll need a matching input on the Logstash side to receive the events.

Events are sent in logstash’s json_event format. Options can also be set as environment variables, although these are deprecated as of version 12 and produce a warning.

NOTE: the redis transport uses a namespace of logstash:beaver by default. You will need to update your logstash indexer to match this.
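
As a rough illustration of what a json_event payload for a single log line might look like, here is a minimal Python 3 sketch. The `@`-prefixed field names follow the json_event convention as best understood here and are an assumption, not taken from Beaver's source; `build_event`, the host `web1` and the file path are hypothetical. The UTC timestamp with a trailing "Z" matches the behaviour described in the changelog.

```python
import json
import socket
from datetime import datetime, timezone


def build_event(line, filename, host=None, log_type="file", tags=None, fields=None):
    """Build a json_event-style payload for one log line (field names assumed)."""
    host = host or socket.gethostname()
    # UTC timestamp in ISO 8601 format with a trailing "Z".
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return json.dumps({
        "@source": "file://{0}{1}".format(host, filename),
        "@source_host": host,
        "@source_path": filename,
        "@type": log_type,
        "@tags": tags or [],
        "@fields": fields or {},
        "@timestamp": timestamp,
        "@message": line,
    })


# Hypothetical host name and log path, for illustration only.
event = build_event("hello world", "/var/log/app.log", host="web1", tags=["sys"])
print(event)
```

Whatever consumes the transport on the Logstash side would need to decode the same format (json, msgpack or string) that beaver was told to send.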

Configuration File Options

Beaver can optionally get data from a configfile using the -c flag. This file is in ini format. Global configuration will be under the beaver stanza. The following are global beaver configuration keys with their respective meanings:

  • rabbitmq_host: Defaults localhost. Host for RabbitMQ.
  • rabbitmq_port: Defaults 5672. Port for RabbitMQ.
  • rabbitmq_vhost: Default /
  • rabbitmq_username: Default guest
  • rabbitmq_password: Default guest
  • rabbitmq_queue: Default logstash-queue.
  • rabbitmq_exchange_type: Default direct.
  • rabbitmq_exchange_durable: Default 0.
  • rabbitmq_key: Default logstash-key.
  • rabbitmq_exchange: Default logstash-exchange.
  • redis_url: Default redis://localhost:6379/0. Redis URL
  • redis_namespace: Default logstash:beaver. Redis key namespace
  • udp_host: Default 127.0.0.1. UDP Host
  • udp_port: Default 9999. UDP Port
  • zeromq_address: Default tcp://localhost:2120. Zeromq URL
  • zeromq_bind: Default bind. Whether to bind to zeromq host or simply connect
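
To make the ini layout concrete, the following Python 3 sketch reads a `[beaver]` stanza and layers it over a few of the defaults listed above. It is not Beaver's own config loader; `load_global_config` and `DEFAULTS` are hypothetical names, and only a subset of keys is shown.

```python
import configparser

# Defaults copied from the option list above (a subset, for brevity).
DEFAULTS = {
    "redis_url": "redis://localhost:6379/0",
    "redis_namespace": "logstash:beaver",
    "udp_host": "127.0.0.1",
    "udp_port": "9999",
    "zeromq_address": "tcp://localhost:2120",
}


def load_global_config(text):
    """Return the [beaver] stanza merged over the defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(DEFAULTS)
    if parser.has_section("beaver"):
        # Keys present in the file win over the built-in defaults.
        settings.update(parser.items("beaver"))
    return settings


sample = """
[beaver]
redis_url: redis://redis-internal:6379/0
redis_namespace: app:unmappable
"""
settings = load_global_config(sample)
print(settings["redis_url"])
print(settings["udp_port"])
```

Note that the ini parser returns every value as a string, so numeric options such as ports would still need converting before use.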

The following keys control what happens when a TransportException is thrown. Whether they apply depends on the transport:

  • respawn_delay: Default 3. Initial respawn delay for exponential backoff
  • max_failure: Default 7. Max failures before exponential backoff terminates
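
How the two values interact can be sketched as below. This assumes the delay doubles after each failure, which is one common form of exponential backoff; the exact multiplier Beaver uses is not stated here, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(respawn_delay=3, max_failure=7):
    """Yield the wait in seconds before each retry, doubling every failure."""
    delay = respawn_delay
    for _ in range(max_failure):
        yield delay
        delay *= 2


delays = list(backoff_delays())
print(delays)
print(sum(delays))
```

Under that doubling assumption and the default values, the last wait would be a little over three minutes before retries stop.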

The following configuration keys are for building an SSH Tunnel that can be used to proxy from the current host to a desired server. This proxy is torn down when Beaver halts in all cases.

  • ssh_key_file: Default None. Full path to id_rsa key file
  • ssh_tunnel: Default None. SSH Tunnel in the format user@host:port
  • ssh_tunnel_port: Default None. Local port for SSH Tunnel
  • ssh_remote_host: Default None. Remote host to connect to within SSH Tunnel
  • ssh_remote_port: Default None. Remote port to connect to within SSH Tunnel
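
One way to picture how these keys fit together is as the pieces of an `ssh` local port forward. The Python 3 sketch below only builds the argument list and does not start a process; it is an assumption about how such a tunnel could be expressed, not a description of what Beaver runs internally. `build_tunnel_command` and the user, host and key path in the example are hypothetical.

```python
def build_tunnel_command(ssh_tunnel, ssh_tunnel_port, ssh_remote_host,
                         ssh_remote_port, ssh_key_file=None):
    """Return an argv list for an ssh local port forward (illustrative only)."""
    # ssh_tunnel is documented as user@host:port, so split off the SSH port.
    target, separator, ssh_port = ssh_tunnel.rpartition(":")
    if not separator:
        raise ValueError("ssh_tunnel must look like user@host:port")
    command = ["ssh", "-N", "-p", ssh_port]
    if ssh_key_file:
        command += ["-i", ssh_key_file]
    # -L local_port:remote_host:remote_port forwards a local port via the tunnel.
    forward = "{0}:{1}:{2}".format(ssh_tunnel_port, ssh_remote_host, ssh_remote_port)
    command += ["-L", forward, target]
    return command


# Hypothetical values, for illustration only.
command = build_tunnel_command(
    "deploy@bastion:22", 6380, "redis-internal", 6379,
    ssh_key_file="/home/deploy/.ssh/id_rsa",
)
print(" ".join(command))
```

With a tunnel like this in place, a transport URL would point at the local port (for example redis://localhost:6380/0) rather than at the remote server.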

The following can also be passed as command-line arguments. When specified, a command-line argument overrides the corresponding option in the configfile.

  • format: Default json. Options [ json, msgpack, string ]. Format to use when sending to transport
  • files: Default files. Space-separated list of files to tail.
  • path: Default /var/log. Path glob to tail.
  • transport: Default stdout. Transport to use when log changes are detected
  • fqdn: Default False. Whether to use the machine’s FQDN in transport output
  • hostname: Default None. Manually specified hostname

Examples

Example 1: Watch all files in the default path of /var/log and write events to standard out as json:

beaver

Example 2: Watch all files in the default path of /var/log and write events to standard out as msgpack:

beaver --format msgpack

Example 3: Watch all files in the default path of /var/log and write events to standard out as a string:

beaver --format string

Example 4: Sending logs from /var/log files to a redis list:

# /etc/beaver.conf
[beaver]
redis_url: redis://localhost:6379/0

# From the commandline
beaver  -c /etc/beaver.conf -t redis

Example 5: Send logs to a redis list, passing the path explicitly with -p:

# /etc/beaver.conf
[beaver]
redis_url: redis://localhost:6379/0

# From the commandline
beaver  -c /etc/beaver.conf -p '/var/log' -t redis

Example 6: Zeromq listening on port 5556 (all interfaces):

# /etc/beaver.conf
[beaver]
zeromq_address: tcp://*:5556

# logstash indexer config:
input {
  zeromq {
    type => 'shipper-input'
    mode => 'client'
    topology => 'pushpull'
    address => 'tcp://shipperhost:5556'
  }
}
output { stdout { debug => true } }

# From the commandline
beaver  -c /etc/beaver.conf -m bind -t zmq

Example 7: Zeromq connecting to remote port 5556 on indexer:

# /etc/beaver.conf
[beaver]
zeromq_address: tcp://indexer:5556

# logstash indexer config:
input {
  zeromq {
    type => 'shipper-input'
    mode => 'server'
    topology => 'pushpull'
    address => 'tcp://*:5556'
  }
}
output { stdout { debug => true } }

# on the commandline
beaver -c /etc/beaver.conf -m connect -t zmq

Example 8: Real-world usage of Redis as a transport:

# in /etc/hosts
192.168.0.10 redis-internal

# /etc/beaver.conf
[beaver]
redis_url: redis://redis-internal:6379/0
redis_namespace: app:unmappable

# logstash indexer config:
input {
  redis {
    host => 'redis-internal'
    data_type => 'list'
    key => 'app:unmappable'
    type => 'app:unmappable'
  }
}
output { stdout { debug => true } }

# From the commandline
beaver -c /etc/beaver.conf -f /var/log/unmappable.log -t redis

As you can see, beaver is pretty flexible as to how you can use/abuse it in production.

Example 9: RabbitMQ connecting to defaults on remote broker:

# /etc/beaver.conf
[beaver]
rabbitmq_host: 10.0.0.1

# logstash indexer config:
input { amqp {
    name => 'logstash-queue'
    type => 'direct'
    host => '10.0.0.1'
    exchange => 'logstash-exchange'
    key => 'logstash-key'
    exclusive => false
    durable => false
    auto_delete => false
  }
}
output { stdout { debug => true } }

# From the commandline
beaver -c /etc/beaver.conf -t rabbitmq

Example 10: Read per-file settings (type, tags, add_field) from /etc/beaver.conf and write to stdout:

# /etc/beaver.conf:
[/tmp/somefile]
type: mytype
tags: tag1,tag2
add_field: fieldname1,fieldvalue1[,fieldname2,fieldvalue2, ...]

[/var/log/*log]
type: syslog
tags: sys

[/var/log/{secure,messages}.log]
type: syslog
tags: sys

# From the commandline
beaver -c /etc/beaver.conf -t stdout
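
The tags and add_field values in the stanzas above are comma-separated strings. A minimal Python 3 sketch of turning them into structured data follows. It assumes add_field pairs map one name to one value; Beaver's real parser may store them differently. `parse_tags` and `parse_add_field` are hypothetical helpers.

```python
def parse_tags(value):
    """Split 'tag1,tag2' into a list, dropping empty entries."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def parse_add_field(value):
    """Turn 'name1,value1,name2,value2' into {'name1': 'value1', ...}."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) % 2 != 0:
        raise ValueError("add_field needs an even number of comma-separated items")
    # Even positions are field names, odd positions are their values.
    return dict(zip(parts[0::2], parts[1::2]))


print(parse_tags("tag1,tag2"))
print(parse_add_field("fieldname1,fieldvalue1,fieldname2,fieldvalue2"))
```

Dropping empty entries means a stanza with no tags yields an empty list rather than a list containing an empty string.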

Example 11: UDP transport:

# /etc/beaver.conf
[beaver]
udp_host: 127.0.0.1
udp_port: 9999

# logstash indexer config:
input {
  udp {
    type => 'shipper-input'
    host => '127.0.0.1'
    port => '9999'
  }
}
output { stdout { debug => true } }

# From the commandline
beaver -c /etc/beaver.conf -t udp

Todo

  • Use python threading + subprocess in order to support usage of yield across all operating systems
  • Fix usage on non-linux platforms - file.readline() does not work as expected on OS X. See above for potential solution
  • More transports
  • ~Ability to specify files, tags, and other metadata within a configuration file~

Caveats

When using copytruncate style log rotation, two race conditions can occur:

  1. Any log data written prior to truncation which beaver has not yet read and processed is lost. Nothing we can do about that.

  2. Should the file be truncated, rewritten, and end up being larger than the original file during the sleep interval, beaver won’t detect this. After some experimentation, this behavior also exists in GNU tail, so I’m going to call this a “don’t do that then” bug :)

    Additionally, the files beaver will most likely be called upon to watch which may be truncated are generally going to be large enough and slow-filling enough that this won’t crop up in the wild.
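
The truncation check described above can be sketched as a size comparison: if the file is now shorter than the last read position, start again from the beginning. This Python 3 example is a simplified stand-in, not Beaver's own watcher code, and `read_new_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset); restart from 0 if the file shrank."""
    size = os.path.getsize(path)
    if size < offset:
        # File is shorter than our last position: assume it was truncated.
        offset = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    return data.splitlines(), offset + len(data)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"one\ntwo\n")
path = tmp.name

lines, offset = read_new_lines(path, 0)       # reads both lines
with open(path, "wb") as handle:              # copytruncate-style rewrite
    handle.write(b"new\n")
lines_after, offset_after = read_new_lines(path, offset)
os.remove(path)

print(lines, offset)
print(lines_after, offset_after)
```

The second race condition is visible in the size comparison: if the rewritten file had grown past the old offset before the next check, the size test would pass and the truncation would go unnoticed.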

Credits

Based on work from Giampaolo and Lusis:

Real time log files watcher supporting log rotation.

Original Author: Giampaolo Rodola' <g.rodola [AT] gmail [DOT] com>
http://code.activestate.com/recipes/577968-log-watcher-tail-f-log/

License: MIT

Other hacks (ZMQ, JSON, optparse, ...): lusis

Changelog

13 (2012-12-17)

  • Fixed certain environment variables. [Jose Diaz-Gonzalez]

  • SSH Tunnel Support. [Jose Diaz-Gonzalez]

    This code should allow us to create an ssh tunnel between two distinct servers for the purposes of sending and receiving data.

    This is useful in certain cases where you would otherwise need to whitelist in your Firewall or iptables setup, such as when running in two different regions on AWS.

  • Allow for initial connection lag. Helpful when waiting for an SSH proxy to connect. [Jose Diaz-Gonzalez]

  • Fix issue where certain config defaults were of an improper value. [Jose Diaz-Gonzalez]

  • Allow specifying host via flag. Closes #70. [Jose Diaz-Gonzalez]

12 (2012-12-17)

  • Reload tailed files on non-linux platforms. [Jose Diaz-Gonzalez]

    Python has an issue on OS X where the underlying C implementation of file.read() caches the EOF, therefore causing readlines() to only work once. This happens to also fail miserably when you are seeking to the end before calling readlines.

    This fix solves the issue by constantly re-reading the files changed.

    Note that this also causes debug mode to be very noisy on OS X. We all have to make sacrifices…

  • Deprecate all environment variables. [Jose Diaz-Gonzalez]

    This shifts configuration management into the BeaverConfig class. Note that we currently throw a warning if you are using environment variables.

    Refs #72 Closes #60

  • Warn when using deprecated ENV variables for configuration. Refs #72. [Jose Diaz-Gonzalez]

  • Minor changes for PEP8 conformance. [Jose Diaz-Gonzalez]

11 (2012-12-16)

  • Add optional support for socket.getfqdn. [Jeremy Kitchen]

    For my setup I need to have the fqdn used at all times since my hostnames are the same but the environment (among other things) is found in the rest of the FQDN.

    Since just changing socket.gethostname to socket.getfqdn has lots of potential for breakage, and socket.gethostname doesn’t always return an FQDN, it’s now an option to explicitly always use the fqdn.

    Fixes #68

  • Check for log file truncation fixes #55. [Jeremy Kitchen]

    This adds a simple check for log file truncation and resets the watch when detected.

    There do exist 2 race conditions here:

      1. Any log data written prior to truncation which beaver has not yet read and processed is lost. Nothing we can do about that.

      2. Should the file be truncated, rewritten, and end up being larger than the original file during the sleep interval, beaver won’t detect this. After some experimentation, this behavior also exists in GNU tail, so I’m going to call this a “don’t do that then” bug :)

    Additionally, the files beaver will most likely be called upon to watch which may be truncated are generally going to be large enough and slow filling enough that this won’t crop up in the wild.

  • Add a version number to beaver. [Jose Diaz-Gonzalez]

10 (2012-12-15)

  • Fixed package name. [Jose Diaz-Gonzalez]
  • Regenerate CHANGES.rst on release. [Jose Diaz-Gonzalez]
  • Adding support for /path/{foo,bar}.log. [Josh Braegger]
  • Ignore file errors in unwatch method – the file might not exist. [Josh Braegger]
  • Unwatch file when encountering a stale NFS handle. When an NFS file handle becomes stale (ie, file was removed), it was crashing beaver. Need to just unwatch file. [Josh Braegger]
  • Consistency. [Chris Faulkner]
  • Pull install requirements from requirements/base.txt so they don’t get out of sync. [Chris Faulkner]
  • Include changelog in setup. [Chris Faulkner]
  • Convert changelog to RST. [Chris Faulkner]
  • Actually show the license. [Chris Faulkner]
  • Consistent casing. [Chris Faulkner]
  • Consistency. [Chris Faulkner]
  • Stating the obvious. [Chris Faulkner]
  • Grist for the mill. [Chris Faulkner]
  • Drop redundant README.txt. [Chris Faulkner]
  • Don’t use empty string for tag when no tags configured in config file. [Stylianos Modes]
  • Making ‘mode’ option work for zmqtransport. Adding setuptools and tests (use ./setup.py nosetests). Adding .gitignore. [Josh Braegger]

9 (2012-11-28)

  • More release changes. [Jose Diaz-Gonzalez]
  • Fixed deprecated warning when declaring exchange type. [Rafael Fonseca]

7 (2012-11-28)

  • Added a helper script for creating releases. [Jose Diaz-Gonzalez]
  • Partial fix for crashes caused by globbed files. [Jose Diaz-Gonzalez]
  • Removed deprecated usage of e.message. [Rafael Fonseca]
  • Fixed exception trapping code. [Rafael Fonseca]
  • Added some resiliency code to rabbitmq transport. [Rafael Fonseca]

6 (2012-11-26)

  • Fix issue where polling for files was done incorrectly. [Jose Diaz-Gonzalez]
  • Added ubuntu init.d example config. [Jose Diaz-Gonzalez]

5 (2012-11-26)

  • Try to poll for files on startup instead of throwing exceptions. Closes #45. [Jose Diaz-Gonzalez]
  • Added python 2.6 to classifiers. [Jose Diaz-Gonzalez]

4 (2012-11-26)

  • Remove unused local vars. [Jose Diaz-Gonzalez]
  • Allow rabbitmq exchange type and durability to be configured. [Jose Diaz-Gonzalez]
  • Remove unused import. [Jose Diaz-Gonzalez]
  • Formatted code to fix PEP8 violations. [Jose Diaz-Gonzalez]
  • Use alternate dict syntax for Python 2.6 support. Closes #43. [Jose Diaz-Gonzalez]
  • Fixed release date for version 3. [Jose Diaz-Gonzalez]

3 (2012-11-25)

  • Added requirements files to manifest. [Jose Diaz-Gonzalez]

  • Include all contrib files in release. [Jose Diaz-Gonzalez]

  • Revert “removed redundant README.txt” to follow pypi standards. [Jose Diaz-Gonzalez]

    This reverts commit e667f63706e0af8bc82c0eac6eac43318144e107.

  • Added bash startup script. Closes #35. [Jose Diaz-Gonzalez]

  • Added an example supervisor config for redis. Closes #34. [Jose Diaz-Gonzalez]

  • Removed redundant README.txt. [Jose Diaz-Gonzalez]

  • Added classifiers to package. [Jose Diaz-Gonzalez]

  • Re-order workers. [Jose Diaz-Gonzalez]

  • Re-require pika. [Jose Diaz-Gonzalez]

  • Make zeromq installation optional. [Morgan Delagrange]

  • Formatting. [Jose Diaz-Gonzalez]

  • Added changes to changelog for version 3. [Jose Diaz-Gonzalez]

  • Timestamp in ISO 8601 format with the “Z” suffix to express UTC. [Xabier de Zuazo]

  • Adding udp support. [Morgan Delagrange]

  • Lpush changed to rpush on redis transport. This is required to always read the events in the correct order on the logstash side. See: https://github.com/logstash/logstash/blob/6f745110671b5d9d66bf082fbfed99d145af4620/lib/logstash/outputs/redis.rb#L4. [Xabier de Zuazo]

2 (2012-10-25)

  • Example upstart script. [Michael D’Auria]

  • Fixed a few more import statements. [Jose Diaz-Gonzalez]

  • Fixed binary call. [Jose Diaz-Gonzalez]

  • Refactored logging. [Jose Diaz-Gonzalez]

  • Improve logging. [Michael D’Auria]

  • Removed unnecessary print statements. [Jose Diaz-Gonzalez]

  • Add default stream handler when transport is stdout. Closes #26. [bear (Mike Taylor)]

  • Handle the case where the config file is not present. [Michael D’Auria]

  • Better exception handling for unhandled exceptions. [Michael D’Auria]

  • Fix wrong addfield values. [Alexander Fortin]

  • Add add_field to config example. [Alexander Fortin]

  • Add support for add_field into config file. [Alexander Fortin]

  • Minor readme updates. [Jose Diaz-Gonzalez]

  • Add support for type reading from INI config file. [Alexander Fortin]

    Add support for symlinks in config file

    Add support for file globbing in config file

    Add support for tags

    a little bit of refactoring, move type and tags check down into transport class

    create config object (reading /dev/null) even if no config file has been given via cli

    Add documentation for INI file to readme

    Remove unused json library

    Conflicts: README.rst

  • When sending data over the wire, use UTC timestamps. [Darren Worrall]

  • Support globs in file paths. [Darren Worrall]

  • Added msgpack support. [Jose Diaz-Gonzalez]

  • Use the python logging framework. [Jose Diaz-Gonzalez]

  • Fixed Transport.format() method. [Jose Diaz-Gonzalez]

  • Properly parse BEAVER_FILES env var. [Jose Diaz-Gonzalez]

  • Refactor transports. [Jose Diaz-Gonzalez]

    Fix the json import to use the fastest json module available

    Move formatting into Transport class

  • Attempt to fix defaults from env variables. [Jose Diaz-Gonzalez]

  • Fix README and beaver CLI help to reference correct RABBITMQ_HOST environment variable. [jdutton]

  • Add RabbitMQ support. [Alexander Fortin]

  • Added real-world example of beaver usage for tailing a file. [Jose Diaz-Gonzalez]

  • Removed unused argument. [Jose Diaz-Gonzalez]

  • Ensure that python-compatible readme is included in package. [Jose Diaz-Gonzalez]

  • Fix variable naming and timeout for redis transport. [Jose Diaz-Gonzalez]

  • Installation instructions. [Jose Diaz-Gonzalez]

  • Use restructured text for readme instead of markdown. [Jose Diaz-Gonzalez]

  • Removed unnecessary .gitignore. [Jose Diaz-Gonzalez]

1 (2012-08-06)

  • Moved app into python package format. [Jose Diaz-Gonzalez]
  • Moved binary beaver.py to bin/beaver, as per python packaging. [Jose Diaz-Gonzalez]
  • Moved around transports to be independent of each other. [Jose Diaz-Gonzalez]
  • Reorder transports. [Jose Diaz-Gonzalez]
  • Rewrote run_worker to throw exception if all transport options have been exhausted. [Jose Diaz-Gonzalez]
  • Rename Amqp -> Zmq to avoid confusion with RabbitMQ. [Alexander Fortin]
  • Added choices to the --transport argument. [Jose Diaz-Gonzalez]
  • Fixed derpy formatting. [Jose Diaz-Gonzalez]
  • Added usage to the readme. [Jose Diaz-Gonzalez]
  • Support usage of environment variables instead of arguments. [Jose Diaz-Gonzalez]
  • Fixed files argument parsing. [Jose Diaz-Gonzalez]
  • One does not simply license all the things. [Jose Diaz-Gonzalez]
  • Add todo to readme. [Jose Diaz-Gonzalez]
  • Added version to pyzmq. [Jose Diaz-Gonzalez]
  • Added license. [Jose Diaz-Gonzalez]
  • Reordered imports. [Jose Diaz-Gonzalez]
  • Moved all transports to beaver/transports.py. [Jose Diaz-Gonzalez]
  • Calculate current timestamp at most once per callback fired. [Jose Diaz-Gonzalez]
  • Modified transports to include proper information for ingestion in logstash. [Jose Diaz-Gonzalez]
  • Fixed package imports. [Jose Diaz-Gonzalez]
  • Removed another compiled python file. [Jose Diaz-Gonzalez]
  • Use ujson instead of simplejson. [Jose Diaz-Gonzalez]
  • Ignore compiled python files. [Jose Diaz-Gonzalez]
  • Fixed imports. [Jose Diaz-Gonzalez]
  • Fixed up readme instructions. [Jose Diaz-Gonzalez]
  • Refactor transports so that connections are no longer global. [Jose Diaz-Gonzalez]
  • Readme and License. [Jose Diaz-Gonzalez]
