
LogDelta

LogDelta - Go Beyond Grepping with NLP-based Log Analysis!

Textual log-line-level anomaly detection: which of the eight log files is the anomaly?

See the YouTube video demonstrating the tool in action.

Installation and Example

We recommend using a virtual environment to ensure smooth operation.

conda create -n logdelta python=3.11
conda activate logdelta

Install LogDelta:

pip install logdelta

Download the source code and navigate to the demo folder:

git clone https://github.com/EvoTestOps/LogDelta.git
cd LogDelta/demo

Get the data:

wget -O Hadoop.zip https://zenodo.org/records/8196385/files/Hadoop.zip?download=1
unzip Hadoop.zip -d Hadoop

Run the analysis:

python -m logdelta.config_runner -c config.yml

Observe the results in LogDelta/demo/Output.

For more examples, see LogDelta/demo/label_investigation and LogDelta/demo/full.

LogDelta assumes your folders each represent a collection of software logs of interest. It compares two or more folders, matching files by name. A target run is the software run you want to analyze; comparison runs serve as the baseline. For example, the folders "My_passing_logs1", "My_passing_logs2", and "My_passing_logs3" can be comparison runs, while "My_failing_logs" would be the target run you want to analyze with respect to them.
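
A hypothetical layout for the runs above might look as follows (the folder names come from the example; the hdfs.log and yarn.log file names are our own placeholders):

```
logs/
├── My_passing_logs1/    # comparison run
│   ├── hdfs.log
│   └── yarn.log
├── My_passing_logs2/    # comparison run
│   ├── hdfs.log
│   └── yarn.log
├── My_passing_logs3/    # comparison run
│   ├── hdfs.log
│   └── yarn.log
└── My_failing_logs/     # target run
    ├── hdfs.log
    └── yarn.log
```

Because files are matched by name, My_failing_logs/hdfs.log would be analyzed against the hdfs.log files of the three passing runs.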

Types of Analysis

In LogDelta, three types of analysis are available:

  1. Visualize

    • Multiple log files or runs with UMAP, based on two-dimensional scaling of the log contents.
    • Individual log files with line-level anomaly scoring (see item 3 for the supported anomaly detection models).
  2. Measure the distance between two logs or sets of logs using:

    • Jaccard distance
    • Cosine distance
    • Containment distance
    • Compression distance
  3. Build an anomaly detection model from a set of logs and use it to score anomalies in a log file (higher scores mean more anomalous), using:

    • KMeans (kmeans)
    • IsolationForest (IF)
    • RarityModel (RM)
    • Out-of-Vocabulary Detector (OOVD)
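
The distance measures in item 2 are standard similarity measures. A minimal Python sketch of how they can be computed over bag-of-words token sets (a simplification for illustration; LogDelta's own tokenization and Polars-based implementation may differ):

```python
import math
import zlib
from collections import Counter

def tokens(text: str) -> set[str]:
    """Whitespace tokenization into a set (a simplified stand-in
    for LogDelta's own tokenization)."""
    return set(text.split())

def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the token sets of the two logs."""
    ta, tb = tokens(a), tokens(b)
    return 1.0 - len(ta & tb) / len(ta | tb)

def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity over term-frequency vectors."""
    fa, fb = Counter(a.split()), Counter(b.split())
    dot = sum(fa[t] * fb[t] for t in fa)
    na = math.sqrt(sum(v * v for v in fa.values()))
    nb = math.sqrt(sum(v * v for v in fb.values()))
    return 1.0 - dot / (na * nb)

def containment_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A|: the fraction of log `a` not covered by
    log `b`. Note this measure is asymmetric."""
    ta, tb = tokens(a), tokens(b)
    return 1.0 - len(ta & tb) / len(ta)

def compression_distance(a: str, b: str) -> float:
    """Normalized compression distance (NCD) using zlib:
    (C(ab) - min(C(a), C(b))) / max(C(a), C(b))."""
    ca = len(zlib.compress(a.encode()))
    cb = len(zlib.compress(b.encode()))
    cab = len(zlib.compress((a + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)
```

Identical logs score 0 (or near 0 for NCD, which compresses the concatenation), and fully disjoint logs score 1 under the set-based measures.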

Levels of Analysis

Analysis can be done at four different levels:

  1. Run (folder) level, investigating the names of files without looking at their contents.
  2. Run (folder) level, investigating run contents (this is slower than what is done in 1).
  3. File level, investigating file contents (matched with the same names between runs).
  4. Line level, investigating line contents (matched with the same names between runs).

Comparison to Other Tools

logai. LogDelta shares many similarities with LogAI, a tool developed by Salesforce. However, the last time we checked, LogAI was not actively maintained. With some help from the issue tracker, we were able to get it running, but our impression was that it was somewhat slow compared to LogDelta. LogDelta runs on top of Polars, which offers excellent performance: it can process log files with more than ten million rows on a laptop computer.

angle-grinder performs statistical analysis on log files, such as calculating the average response time in the logs. This complements our tool, as it supports analysis within a single log file. LogDelta is not really useful for single-file analysis; rather, it requires 2 to n log files.

lnav, the Logfile Navigator, is advertised as a tool for merging, tailing, searching, filtering, and querying log files. This is a great complement to LogDelta. In fact, during our Hadoop use case, we implemented a small script for log querying, but we would likely have been much better off using lnav.

Loglizer performs anomaly detection on logs. The last commit was 18 months ago, so it may no longer be actively maintained. It assumes parsed log data (e.g., with Drain), whereas LogDelta accepts raw text files. Loglizer does not appear to offer any visualizations; it seems more focused on anomaly detection benchmarking and, in this sense, is similar to our previous tool, LogLead, which was published a year ago. LogDelta is built on top of LogLead[^1]. https://pypi.org/project/LogLead/

[^1]: Mäntylä MV, Wang Y, Nyyssölä J. LogLead: Fast and Integrated Log Loader, Enhancer, and Anomaly Detector. In: 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), March 2024, pp. 395–399. IEEE.
