Extract anomalies from log files

Based on success logs, logreduce highlights useful text in failed logs. The goal is to save time in finding a failure’s root cause.

On average, training runs at about 2000 lines per second and testing at about 1300 lines per second.

How it works

logreduce uses a model to learn successful logs and detect novelties in failed logs:

  • Random words are removed using hand-written regular expressions,

  • Then lines are converted to a matrix of token occurrences (using HashingVectorizer),

  • An unsupervised learner implements neighbor searches (using NearestNeighbors).
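The three steps above can be sketched in plain Python. This is only an illustration of the idea, not logreduce's code: the regular expression, the feature-space size and every function name are assumptions, and the hashing and brute-force neighbor search are simple stand-ins for scikit-learn's HashingVectorizer and NearestNeighbors.

```python
import math
import re
import zlib

N_FEATURES = 2 ** 12  # size of the hashed feature space (arbitrary for this sketch)

# Hypothetical patterns for "random" words: long hex identifiers and numbers.
RANDOM_WORDS = re.compile(r"\b(?:0x)?[0-9a-f]{8,}\b|\d+", re.IGNORECASE)


def tokenize(line):
    """Strip random words, then split the line into lowercase word tokens."""
    cleaned = RANDOM_WORDS.sub(" ", line)
    return re.findall(r"[a-z_]+", cleaned.lower())


def vectorize(line):
    """Hash each token into a fixed-size, L2-normalized sparse vector."""
    counts = {}
    for token in tokenize(line):
        index = zlib.crc32(token.encode("utf-8")) % N_FEATURES
        counts[index] = counts.get(index, 0) + 1
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {i: v / norm for i, v in counts.items()} if norm else {}


def cosine_distance(a, b):
    """Cosine distance between two normalized sparse vectors (0 = same tokens)."""
    if len(a) > len(b):
        a, b = b, a
    return 1.0 - sum(v * b.get(i, 0.0) for i, v in a.items())


def train(baseline_lines):
    """The 'model' is simply the list of vectors seen in successful logs."""
    return [vectorize(line) for line in baseline_lines]


def distances(model, target_lines):
    """Distance from each target line to its nearest baseline line."""
    result = []
    for line in target_lines:
        vector = vectorize(line)
        if not vector:  # nothing left after tokenization: nothing to report
            result.append(0.0)
            continue
        result.append(min((cosine_distance(vector, known) for known in model), default=1.0))
    return result


baseline = [
    "2020-01-31 12:00:01 INFO Starting worker 4242",
    "2020-01-31 12:00:02 INFO Request req-1234 completed in 12 ms",
]
target = [
    "2020-02-01 08:30:00 INFO Starting worker 5151",
    "2020-02-01 08:30:05 ERROR Traceback (most recent call last):",
]
for distance, line in zip(distances(train(baseline), target), target):
    print("%.3f | %s" % (distance, line))
```

In this sketch the first target line differs from the baseline only by its date and worker number, so its distance should be close to 0, while the traceback line shares no token with the baseline and should score close to 1.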

Caveats

This method doesn’t work when debug content is only included in failed logs. To detect anomalies successfully, failed and success logs need to be similar; otherwise the extra information in failed logs will be considered anomalous.

For example, this happens with testr, where success logs only contain ‘SUCCESS’.

Install

  • Fedora:

sudo dnf install -y python3-scikit-learn
git clone https://softwarefactory-project.io/r/logreduce
pushd logreduce
python3 setup.py develop --user
popd
  • openSUSE:

sudo zypper install python3-scikit-learn
git clone https://softwarefactory-project.io/r/logreduce
pushd logreduce
python3 setup.py develop --user
popd
  • Pip:

pip install --user logreduce

Command Line Interface Usage

Logreduce needs a baseline for success log training, and a target for the log to reduce.

Logreduce prints anomalies to the console in the following format; the log files are not modified:

"%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"

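As an illustration of that format string, the sketch below renders one anomaly with Python's % operator and parses a printed line back into its fields. The helper names and the parsing pattern are hypothetical and are not part of logreduce.

```python
import re

# Format string quoted in the documentation above.
LINE_FORMAT = "%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"

# Hypothetical pattern for reading such lines back; it accepts
# zero-padded line numbers and any number of decimals in the distance.
LINE_PATTERN = re.compile(
    r"^(?P<distance>\d+\.\d+) \| (?P<log_path>.+?):(?P<line_number>\d+): (?P<log_line>.*)$"
)


def format_anomaly(distance, log_path, line_number, log_line):
    """Render one anomaly using the documented format string."""
    return LINE_FORMAT % {
        "distance": distance,
        "log_path": log_path,
        "line_number": line_number,
        "log_line": log_line,
    }


def parse_anomaly(text):
    """Return (distance, log_path, line_number, log_line), or None if no match."""
    match = LINE_PATTERN.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    return (
        float(fields["distance"]),
        fields["log_path"],
        int(fields["line_number"]),
        fields["log_line"],
    )


print(format_anomaly(0.232, "output.fail", 677, "raise er.MultipleInvalid(errors)"))
# prints: 0.232000 | output.fail:677: raise er.MultipleInvalid(errors)
```

A parser like this can be handy for post-processing the console output, for instance to keep only the lines above a chosen distance.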
Local file usage

  • Compare two files or directories without building a model:

$ logreduce diff testr-nodepool-01/output.good testr-nodepool-01/output.fail
0.232 | testr-nodepool-01/output.fail:0677:  File "voluptuous/schema_builder.py", line 370, in validate_mapping
0.462 | testr-nodepool-01/output.fail:0678:    raise er.MultipleInvalid(errors)
0.650 | testr-nodepool-01/output.fail:0679:  voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud']
  • Compare two files or directories:

$ logreduce dir preprod-logs/ /var/log/
  • Or build a model first and run it separately:

$ logreduce dir-train sosreport.clf old-sosreport/ good-sosreport/
$ logreduce dir-run sosreport.clf new-sosreport/

Zuul job usage

Logreduce can query the Zuul build database to train a model.

  • Extract novelties from a job's logs:

$ logreduce job http://logs.openstack.org/...

# Reduce comparison to a single project (e.g. for tox jobs)
$ logreduce job --project openstack/nova http://logs.openstack.org/...

# Compare using many baselines
$ logreduce job --count 10 http://logs.openstack.org/...

# Include job artifacts
$ logreduce job --include-path logs/ http://logs.openstack.org/...
  • Or build a model first and run it separately:

$ logreduce job-train --job job_name job_name.clf
$ logreduce job-run job_name.clf http://logs.openstack.org/.../

Journald usage

Logreduce can look for anomalies in journald, comparing the last day/week/month to the previous one:

  • Extract novelties from the last day of the journal:

$ logreduce journal --range day
  • Build a model using the journal of the last month and look for novelties in the last week:

$ logreduce journal-train --range month good-journal.clf
$ logreduce journal-run --range week good-journal.clf
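To make "comparing the last day/week/month to the previous one" concrete, here is a small sketch of how the two time windows could be derived. It is an assumption about the semantics, not logreduce's code: the function name is hypothetical and a month is approximated as 30 days.

```python
from datetime import datetime, timedelta

# Assumed durations for each --range value; "month" is approximated as 30 days.
RANGES = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}


def journal_windows(range_name, now=None):
    """Return (baseline, target) as (start, end) datetime pairs.

    The target is the last `range_name` period and the baseline is the
    period of the same length immediately before it.
    """
    now = now or datetime.now()
    span = RANGES[range_name]
    target = (now - span, now)
    baseline = (now - 2 * span, now - span)
    return baseline, target


baseline, target = journal_windows("day", now=datetime(2020, 2, 1))
print("baseline:", baseline[0], "->", baseline[1])
print("target:  ", target[0], "->", target[1])
```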

Filters configuration

Some content yields false positives that can be ignored through filters. Using the --config command line option, filters can be set for exclude_files, exclude_paths and exclude_lines. Here is an example filters configuration file:

filters:
  exclude_files:
    - "deployment-hieradata.j2.yaml"
    - "tempest.html"
  exclude_paths:
    - "group_vars/Compute"
    - "group_vars/Controller"
    - "group_vars/Undercloud"
  exclude_lines:
    # neutron dhcp interface
    - "^tap[^ ]*$"
    # IPA cookies
    - "^.*[Cc]ookie.*ipa_session="
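The sketch below shows one way such filters could be applied. The matching rules are assumptions made for illustration (each entry is treated as a regular expression, exclude_files is matched against the file's base name, exclude_paths against the full path), and the helper names are hypothetical. The configuration is written as a Python dict to keep the example free of a YAML dependency.

```python
import os
import re

# Same filters as the example configuration above, as a Python dict.
FILTERS = {
    "exclude_files": ["deployment-hieradata.j2.yaml", "tempest.html"],
    "exclude_paths": ["group_vars/Compute", "group_vars/Controller", "group_vars/Undercloud"],
    "exclude_lines": [r"^tap[^ ]*$", r"^.*[Cc]ookie.*ipa_session="],
}


def compile_filters(filters):
    """Compile every pattern once so files and lines can be tested quickly."""
    return {key: [re.compile(p) for p in patterns] for key, patterns in filters.items()}


def keep_file(path, compiled):
    """Return False when the file name or its path matches an exclusion."""
    basename = os.path.basename(path)
    if any(p.search(basename) for p in compiled.get("exclude_files", [])):
        return False
    if any(p.search(path) for p in compiled.get("exclude_paths", [])):
        return False
    return True


def keep_line(line, compiled):
    """Return False when the line matches one of the exclude_lines patterns."""
    return not any(p.search(line) for p in compiled.get("exclude_lines", []))


compiled = compile_filters(FILTERS)
print(keep_file("logs/tempest.html", compiled))          # False
print(keep_line("tap1234abcd-ef", compiled))              # False
print(keep_line("Connection established", compiled))      # True
```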

Python Module API

Logreduce can be used as a Python module for custom use cases.

First you need to create a classifier object:

from logreduce import Classifier, Tokenizer, render_html

clf = Classifier(
    # Function that normalizes a filename, for example by removing dates or IDs
    filename_to_modelname=lambda fn: fn,
    # Function that decides whether to keep a file, for example to skip configuration files
    keep_file=lambda _: True,
    # Function that processes each line
    process_line=Tokenizer.process
)
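The lambdas passed above are placeholders. As a sketch of what less trivial callbacks might look like, the two functions below strip compression extensions, date suffixes and rotation indexes from a filename, and skip common configuration file types. Both function names and the exact patterns are illustrative assumptions, not logreduce defaults.

```python
import os
import re


def normalize_filename(filename):
    """Map rotated or dated copies of a log file to one model name."""
    name = os.path.basename(filename)
    name = re.sub(r"\.(gz|xz|bz2)$", "", name)           # compression extension
    name = re.sub(r"[-.]\d{4}-?\d{2}-?\d{2}", "", name)  # date suffix such as -2020-01-31
    name = re.sub(r"\.\d+$", "", name)                   # rotation index such as .1
    return name


def keep_log_file(filename):
    """Skip files that look like configuration rather than logs."""
    return not filename.endswith((".yaml", ".yml", ".conf", ".json"))


print(normalize_filename("/var/log/nova/nova-api.log-2020-01-31.gz"))  # nova-api.log
print(keep_log_file("etc/nova/nova.conf"))                              # False
```

Functions like these could then be passed as filename_to_modelname and keep_file so that rotated copies of the same log share one model.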

Then train the object on the baseline:

clf.train(["./success-logs/"])

Finally, process the target and write a report:

result = clf.process(["./failed-logs/"])
with open("report.html", "w") as of:
    of.write(render_html(result))

logreduce-tests

This package contains test data for different types of logs such as testr or syslog. Each test includes a pre-computed list of the anomalies in the failed logs.

This package also includes a command line utility to run logreduce against all the test data and print a summary of its performance.

Test format

Each test case is composed of:

  • A .good file (or directory) that holds the baseline

  • A .fail file (or directory) that holds the log to analyze

  • An info.yaml file that describes the expected output:

threshold: float # set the distance threshold for the test
anomalies:
  - optional: bool  # to define minor anomalies not considered false positive
    lines: |        # the expected lines to be highlighted
      Traceback...
      RuntimeError...

Evaluate

To run the evaluation, first install logreduce-tests:

git clone https://softwarefactory-project.io/r/logreduce-tests
pushd logreduce-tests
python3 setup.py develop --user

logreduce-tests expects test directories as arguments:

$ logreduce-tests tests/testr-zuul-[0-9]*
[testr-zuul-01]: 100.00% accuracy,  5.00% false-positive
[testr-zuul-02]:  80.00% accuracy,  0.00% false-positive
...
Summary:  90.00% accuracy,  2.50% false-positive

Add --debug to display false positives and missing chunks.
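To show how the info.yaml fields could feed the two percentages in the summary, here is a hedged sketch. The definitions are assumptions and may differ from what logreduce-tests computes: accuracy is taken as the share of expected lines that were reported, false-positive as the share of reported lines that are neither expected nor optional, and a line counts as reported when its distance reaches the threshold.

```python
def evaluate(results, expected, optional, threshold):
    """Return (accuracy, false_positive) percentages for one test case.

    results:   iterable of (distance, line) pairs produced for the .fail log
    expected:  lines listed under non-optional anomalies
    optional:  lines listed under anomalies marked optional
    threshold: minimum distance for a line to count as reported (assumption)
    """
    reported = {line for distance, line in results if distance >= threshold}
    expected, optional = set(expected), set(optional)
    found = expected & reported
    false_positives = reported - expected - optional
    accuracy = 100.0 * len(found) / len(expected) if expected else 100.0
    false_positive = 100.0 * len(false_positives) / len(reported) if reported else 0.0
    return accuracy, false_positive


results = [
    (0.65, "Traceback..."),
    (0.10, "RuntimeError..."),  # below the threshold, so it is missed
    (0.40, "WARNING retry"),    # optional anomaly, not a false positive
    (0.55, "INFO noise"),
    (0.35, "DEBUG extra"),
]
accuracy, false_positive = evaluate(
    results,
    expected=["Traceback...", "RuntimeError..."],
    optional=["WARNING retry"],
    threshold=0.3,
)
print("%6.2f%% accuracy, %6.2f%% false-positive" % (accuracy, false_positive))
```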

TODOs

  • Add terminal colors output

  • Add progress bar

  • Better differentiate training debug from testing debug

  • Add a starting log line and report written

  • Add tarball traversal in utils.files_iterator

  • Add logstash filter module

  • Improve tokenization tests

Roadmap

  • Discard files that are 100% anomalous

  • Report mean deviation instead of absolute distances

  • Investigate second stage model

Contribute

Contributions are most welcome; use git-review to propose a change. Set up your SSH keys after signing in at https://softwarefactory-project.io/auth/login

Code style is managed with black: run black logreduce before committing to format the source files.

