logreduce

Extract anomalies from log files

Based on success logs, logreduce highlights useful text in failed logs. The goal is to save time in finding a failure’s root cause.

On average, learning runs at 2000 lines per second, and testing runs at 1300 lines per second.

How it works

logreduce uses a model to learn successful logs and detect novelties in failed logs:

Random words are removed using hand-written regular expressions.
Lines are then converted to a matrix of token occurrences (using HashingVectorizer).
An unsupervised learner performs nearest-neighbor searches (using NearestNeighbors).
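
The three steps above can be sketched in plain Python. This is only an illustration of the idea, not logreduce's code: it uses a hand-rolled hashing vectorizer and a brute-force nearest-neighbor search as stand-ins for scikit-learn's HashingVectorizer and NearestNeighbors, and the `RANDOM_TOKENS` pattern, `N_FEATURES` value, and function names are all assumptions made for this example.

```python
import math
import re
import zlib

# Hypothetical patterns for "random" tokens (hex values, long hashes, numbers).
# The patterns actually used by logreduce are not shown in this README.
RANDOM_TOKENS = re.compile(r"0x[0-9a-f]+|\b[0-9a-f]{8,}\b|\d+")
N_FEATURES = 2 ** 10  # size of the hashed feature space (assumed value)


def tokenize(line):
    """Drop random-looking tokens, then split the line into lowercase words."""
    return re.findall(r"[a-z_]+", RANDOM_TOKENS.sub(" ", line.lower()))


def vectorize(line):
    """Hash each token into a fixed-size, L2-normalized sparse count vector."""
    counts = {}
    for token in tokenize(line):
        index = zlib.crc32(token.encode()) % N_FEATURES
        counts[index] = counts.get(index, 0) + 1
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {i: v / norm for i, v in counts.items()}


def distance(a, b):
    """Cosine distance between two normalized sparse vectors."""
    return 1.0 - sum(v * b.get(i, 0.0) for i, v in a.items())


def novelty(baseline_lines, target_line):
    """Distance from target_line to its nearest baseline line (brute force)."""
    target = vectorize(target_line)
    return min(distance(vectorize(line), target) for line in baseline_lines)
```

A target line that differs from a baseline line only by its random tokens (an IP address, a port number) gets a distance close to 0, while a line with no counterpart in the baseline gets a distance close to 1.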

Caveats

This method doesn’t work when debug content is only included in failed logs. To successfully detect anomalies, failed and success logs need to be similar; otherwise the extra information in failed logs will be considered anomalous.

For example, this happens with testr, where success logs only contain ‘SUCCESS’.

Install

Fedora:

sudo dnf install -y python3-scikit-learn
git clone https://softwarefactory-project.io/r/logreduce
pushd logreduce
python3 setup.py develop --user
popd

Pip:

pip install --user logreduce

Usage

Logreduce needs a baseline of success logs for training, and a target log to reduce.

Logreduce prints anomalies to the console in the following format; the log files are not modified:

"%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"
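
The format string above is a standard Python printf-style template with named fields. As a small illustration, it can be filled from a dictionary as follows; the `render` helper and the sample values are made up for this example.

```python
# The template documented above, used with Python's % operator.
FORMAT = "%(distance)f | %(log_path)s:%(line_number)d: %(log_line)s"


def render(distance, log_path, line_number, log_line):
    """Fill the template from keyword-style values (hypothetical helper)."""
    return FORMAT % {
        "distance": distance,
        "log_path": log_path,
        "line_number": line_number,
        "log_line": log_line,
    }


print(render(0.232, "output.fail", 677, "raise er.MultipleInvalid(errors)"))
```

With this exact template, `%f` prints six decimals and `%d` does not zero-pad, so the rendering is slightly longer than the sample output shown in the usage sections.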

Local file usage

Compare two files or directories without building a model:

$ logreduce diff testr-nodepool-01/output.good testr-nodepool-01/output.fail
0.232 | testr-nodepool-01/output.fail:0677:  File "voluptuous/schema_builder.py", line 370, in validate_mapping
0.462 | testr-nodepool-01/output.fail:0678:    raise er.MultipleInvalid(errors)
0.650 | testr-nodepool-01/output.fail:0679:  voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud']
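
Because the output is line-oriented, it is easy to post-process. The sketch below parses lines shaped like the sample above and keeps only those at or above a chosen distance. The regular expression, the `Anomaly` tuple, and the `above` helper are assumptions written for this example, not part of logreduce.

```python
import re
from typing import List, NamedTuple, Optional

# Matches "<distance> | <path>:<line number>: <text>" as in the sample output.
LINE_RE = re.compile(
    r"^(?P<distance>\d+\.\d+) \| (?P<path>.+?):(?P<line>\d+): (?P<text>.*)$"
)


class Anomaly(NamedTuple):
    distance: float
    path: str
    line_number: int
    text: str


def parse_line(line: str) -> Optional[Anomaly]:
    """Parse one output line; return None for lines in another shape."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return Anomaly(
        float(match["distance"]),
        match["path"],
        int(match["line"]),
        match["text"],
    )


def above(lines: List[str], threshold: float) -> List[Anomaly]:
    """Keep only parsed anomalies whose distance is at least threshold."""
    parsed = (parse_line(line) for line in lines)
    return [a for a in parsed if a is not None and a.distance >= threshold]


SAMPLE = [
    '0.232 | testr-nodepool-01/output.fail:0677:  File "voluptuous/schema_builder.py", line 370, in validate_mapping',
    "0.462 | testr-nodepool-01/output.fail:0678:    raise er.MultipleInvalid(errors)",
    "0.650 | testr-nodepool-01/output.fail:0679:  voluptuous.error.MultipleInvalid: required key not provided @ data['providers'][2]['cloud']",
]

for anomaly in above(SAMPLE, 0.4):
    print(anomaly.line_number, anomaly.text.strip())
```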

Compare two files or directories:

$ logreduce dir preprod-logs/ /var/log/

Or build a model first and run it separately:

$ logreduce dir-train sosreport.clf old-sosreport/ good-sosreport/
$ logreduce dir-run sosreport.clf new-sosreport/

Zuul job usage

Logreduce can query the Zuul build database to train a model.

Extract novelties from a job's logs:

$ logreduce job http://logs.openstack.org/...

# Reduce comparison to a single project (e.g. for tox jobs)
$ logreduce job --project openstack/nova http://logs.openstack.org/...

# Compare using many baselines
$ logreduce job --count 10 http://logs.openstack.org/...

# Include job artifacts
$ logreduce job --include-path logs/ http://logs.openstack.org/...

Or build a model first and run it separately:

$ logreduce job-train --job job_name job_name.clf
$ logreduce job-run job_name.clf http://logs.openstack.org/.../

Journald usage

Logreduce can look for anomalies in journald, comparing the last day/week/month to the previous one:

Extract novelties from the last day's journal:

$ logreduce journal --range day

Build a model using the journal of the last month and look for novelties in the last week:

$ logreduce journal-train --range month good-journal.clf
$ logreduce journal-run --range week good-journal.clf
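
The "last period versus the previous one" comparison can be pictured as two adjacent time windows. The sketch below computes them under assumptions that this README does not confirm: that the two windows are contiguous, that a week is 7 days, and that a month is 30 days. The `RANGES` table and the `windows` function are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed durations for each --range value.
RANGES = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
}


def windows(range_name, now):
    """Return (baseline, target) as (start, end) pairs of datetimes.

    The target is the last period ending at `now`; the baseline is the
    period of the same length immediately before it.
    """
    span = RANGES[range_name]
    target = (now - span, now)
    baseline = (now - 2 * span, now - span)
    return baseline, target


baseline, target = windows("day", datetime(2018, 8, 27))
print("baseline:", baseline[0], "to", baseline[1])
print("target:  ", target[0], "to", target[1])
```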

logreduce-tests

This package contains test data for different types of logs, such as testr or syslog. Each test includes a pre-computed list of the anomalies in the failure logs.

This package also includes a command line utility to run logreduce against all test data and print a summary of its performance.

Test format

Each test case is composed of:

A .good file (or directory) that holds the baseline
A .fail file (or directory)
An info.yaml file that describes the expected output:

threshold: float # set the distance threshold for the test
anomalies:
  - optional: bool  # to define minor anomalies not considered false positive
    lines: |        # the expected lines to be highlighted
      Traceback...
      RuntimeError...
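
To make the structure concrete, here is a minimal sketch that turns an already-loaded info.yaml mapping into typed objects. It assumes the file has been read with a YAML parser (for instance PyYAML's `yaml.safe_load`) into a plain dictionary; the `Anomaly` and `CaseInfo` classes and the `parse_info` function are illustrative names, not part of logreduce-tests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Anomaly:
    lines: List[str]          # expected lines to be highlighted
    optional: bool = False    # minor anomaly, not counted as false positive


@dataclass
class CaseInfo:
    threshold: float          # distance threshold for the test
    anomalies: List[Anomaly] = field(default_factory=list)


def parse_info(data):
    """Build a CaseInfo from the dictionary loaded from info.yaml."""
    anomalies = [
        Anomaly(
            lines=entry["lines"].splitlines(),
            optional=bool(entry.get("optional", False)),
        )
        for entry in data.get("anomalies", [])
    ]
    return CaseInfo(threshold=float(data["threshold"]), anomalies=anomalies)


info = parse_info({
    "threshold": 0.2,
    "anomalies": [
        {"optional": False, "lines": "Traceback...\nRuntimeError...\n"},
    ],
})
print(info.threshold, info.anomalies[0].lines)
```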

Evaluate

To run the evaluation, first install logreduce-tests:

git clone https://softwarefactory-project.io/r/logreduce-tests
pushd logreduce-tests
python3 setup.py develop --user

logreduce-tests expects test directories as arguments:

$ logreduce-tests tests/testr-zuul-[0-9]*
[testr-zuul-01]: 100.00% accuracy,  5.00% false-positive
[testr-zuul-02]:  80.00% accuracy,  0.00% false-positive
...
Summary:  90.00% accuracy,  2.50% false-positive
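
The exact definitions behind these two percentages are not given here. One plausible reading, shown purely as an assumption, is that accuracy is the share of expected lines that were reported, and false-positive is the share of reported lines that were not expected. The `score` function below is a hypothetical sketch of that reading, not the logreduce-tests implementation.

```python
def score(expected, reported):
    """Return (accuracy, false_positive) as percentages.

    Assumed definitions:
      accuracy       = expected lines that were reported / expected lines
      false_positive = reported lines that were not expected / reported lines
    """
    expected, reported = set(expected), set(reported)
    found = expected & reported
    accuracy = 100.0 * len(found) / len(expected) if expected else 100.0
    false_positive = (
        100.0 * len(reported - expected) / len(reported) if reported else 0.0
    )
    return accuracy, false_positive


accuracy, false_positive = score(
    expected=["Traceback...", "RuntimeError..."],
    reported=["Traceback...", "RuntimeError...", "unrelated warning"],
)
print("%6.2f%% accuracy, %6.2f%% false-positive" % (accuracy, false_positive))
```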

Add --debug to display false positives and missing chunks.

TODOs

Add terminal colors output
Add progress bar
Better differentiate training debug from testing debug
Add a starting log line and report written
Add tarball traversal in utils.files_iterator
Add logstash filter module
Improve tokenization tests

Roadmap

Add daemon worker mode with MQTT event listener
Discard files that are 100% anomalous
Report mean deviation instead of absolute distances
Investigate second stage model

Contribute

Contributions are most welcome. Use git-review to propose a change. Set up your SSH keys after signing in at https://softwarefactory-project.io/auth/login

Release history

0.6.1: Jan 19, 2021
0.5.2: Dec 17, 2019
0.5.1: Dec 17, 2019
0.5.0: Nov 25, 2019
0.4.0: Nov 8, 2018
0.3.0: Oct 25, 2018
0.2.0: Aug 27, 2018 (this version)
0.1.3: Jul 4, 2018
0.1.2: Apr 26, 2018
0.1.1: Apr 4, 2018
0.1.0: Mar 2, 2018
0.0.1: Dec 27, 2017
