LogDelta - Go Beyond Grepping with NLP-based Log File Analysis
LogDelta
LogDelta - Go Beyond Grepping with NLP-based Log Analysis!
Textual log-line-level anomaly detection. Which line is the anomaly?
See the YouTube video demonstrating the tool in action.
Installation and Example
We recommend using a virtual environment to ensure smooth operations.
conda create -n logdelta python=3.11
conda activate logdelta
Install logdelta.
pip install logdelta
Download the source code and navigate to the demo folder.
git clone https://github.com/EvoTestOps/LogDelta.git
cd LogDelta/demo
Get data
wget -O Hadoop.zip "https://zenodo.org/records/8196385/files/Hadoop.zip?download=1"
unzip Hadoop.zip -d Hadoop
Run analysis
python -m logdelta.config_runner -c config.yml
Observe the results in LogDelta/demo/Output.
For more examples, see LogDelta/demo/label_investigation and LogDelta/demo/full.
LogDelta assumes your folders represent a collection of software logs of interest. LogDelta performs a comparison between two or more folders using matching file names. A target run represents a software run we are interested in analyzing. LogDelta uses comparison runs as a baseline. For example, the "My_passing_logs1", "My_passing_logs2", "My_passing_logs3" folders can be comparison runs, while "My_failing_logs" would be your target run that you want to analyze with respect to comparison runs.
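The folder convention above can be illustrated with a minimal sketch that pairs each file in a target run with the same-named files in the comparison runs. This is not LogDelta's own code; the helper `match_files_by_name` and the file names `app.log`, `db.log`, and `crash.log` are hypothetical and only serve to show the matching idea.

```python
import tempfile
from pathlib import Path


def match_files_by_name(target_run, comparison_runs):
    """Map each file name in the target run to same-named files in the comparison runs.

    Hypothetical helper for illustration; not part of the LogDelta API.
    """
    matches = {}
    for target_file in sorted(Path(target_run).iterdir()):
        if not target_file.is_file():
            continue
        matches[target_file.name] = [
            Path(run) / target_file.name
            for run in comparison_runs
            if (Path(run) / target_file.name).is_file()
        ]
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Assumed example layout: two passing (comparison) runs, one failing (target) run.
        layout = {
            "My_passing_logs1": ["app.log", "db.log"],
            "My_passing_logs2": ["app.log"],
            "My_failing_logs": ["app.log", "db.log", "crash.log"],
        }
        for run, names in layout.items():
            (root / run).mkdir()
            for name in names:
                (root / run / name).write_text("example line\n")

        matches = match_files_by_name(
            root / "My_failing_logs",
            [root / "My_passing_logs1", root / "My_passing_logs2"],
        )
        for name, found in matches.items():
            print(name, "->", [p.parent.name for p in found])
```

A file that exists only in the target run (here `crash.log`) maps to an empty list, which is itself a useful signal at the run level.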
Types of Analysis
In LogDelta, three types of analysis are available:
1. Visualize:
   - Multiple log files or runs with UMAP, based on two-dimensional scaling of the log contents.
   - Individual log files with log anomaly scoring (see item 3 for the supported anomaly detection models).
2. Measure the distance between two logs or sets of logs using:
   - Jaccard distance
   - Cosine distance
   - Containment distance
   - Compression distance
3. Build an anomaly detection model from a set of logs and use it to score anomalies (higher scores are more anomalous) in a log file using:
   - KMeans (kmeans)
   - IsolationForest (IF)
   - RarityModel (RM)
   - Out-of-Vocabulary Detector (OOVD)
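To make the four distance measures concrete, here is a minimal sketch using their common textbook definitions on whitespace tokens. These are assumptions for illustration: LogDelta's own tokenization and exact formulas may differ, the direction of the containment measure (which log is the denominator) is chosen arbitrarily here, and the compression distance is written as the normalized compression distance with `zlib`.

```python
import math
import zlib
from collections import Counter


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the sets of tokens."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def containment_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A|: how much of a's vocabulary is missing from b (asymmetric)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa)


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of the token count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    if norm == 0.0:
        # Two empty texts are treated as identical, one empty text as maximally distant.
        return 0.0 if not ca and not cb else 1.0
    return 1.0 - dot / norm


def compression_distance(a: str, b: str) -> float:
    """Normalized compression distance: (C(ab) - min(C(a), C(b))) / max(C(a), C(b))."""
    size_a = len(zlib.compress(a.encode("utf-8")))
    size_b = len(zlib.compress(b.encode("utf-8")))
    size_ab = len(zlib.compress((a + b).encode("utf-8")))
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

For instance, `jaccard_distance("a b c", "b c d")` evaluates to 0.5, because the two texts share two of four distinct tokens. The compression distance needs no tokenization at all, which makes it a convenient fallback when log formats vary widely.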
Levels of Analysis
Analysis can be done at four different levels:
1. Run (folder) level, investigating the names of files without looking at their contents.
2. Run (folder) level, investigating run contents (this is slower than level 1).
3. File level, investigating file contents (matched with the same names between runs).
4. Line level, investigating line contents (matched with the same names between runs).
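Line-level analysis can be illustrated with a simple out-of-vocabulary score: each line of a target file is scored by the fraction of its tokens never seen in the same-named file from the comparison runs. This is a sketch of the general idea only, assuming whitespace tokenization; the function names are hypothetical and LogDelta's own OOVD model may compute its scores differently.

```python
def build_vocabulary(baseline_lines):
    """Collect every whitespace-separated token seen in the comparison (baseline) lines."""
    vocabulary = set()
    for line in baseline_lines:
        vocabulary.update(line.split())
    return vocabulary


def oov_score(line, vocabulary):
    """Fraction of the line's tokens that are absent from the baseline vocabulary."""
    tokens = line.split()
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in vocabulary)
    return unseen / len(tokens)


def rank_lines(target_lines, baseline_lines):
    """Return (score, line) pairs for the target lines, most anomalous first."""
    vocabulary = build_vocabulary(baseline_lines)
    scored = [(oov_score(line, vocabulary), line) for line in target_lines]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With the baseline lines `"INFO task started"` and `"INFO task finished"`, the target line `"ERROR task crashed unexpectedly"` scores 0.75, since three of its four tokens are new, while `"INFO task started"` scores 0.0.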
Comparison to Other Tools
LogAI. LogDelta shares many similarities with LogAI, a tool developed by Salesforce. However, the last time we checked, LogAI was not actively maintained. With some help from the issue tracker, we were able to get it running. Yet our impression was that it was somewhat slow compared to LogDelta. LogDelta runs on top of Polars, which offers excellent performance for processing log files with more than ten million rows on a laptop computer.
angle-grinder performs statistical analysis on log files, such as calculating the average response time in the logs. This is complementary to our tool, as it allows analysis to be done within a single log file. LogDelta is not really useful for single log file analysis; rather, it requires 2 to n log files.
lnav - Logfile navigator is advertised as a tool for merging, tailing, searching, filtering, and querying log files. This is a great complement to LogDelta. In fact, during our Hadoop use case, we implemented a small script for log querying, but we would likely have been much better off using lnav.
Loglizer performs anomaly detection on logs. The last commit was 18 months ago, so it might no longer be actively maintained. It assumes parsed log data (e.g., with Drain), whereas LogDelta accepts raw text files. Loglizer does not appear to offer any visualizations. It seems to be more focused on anomaly detection benchmarking and, in this sense, is similar to our previous tool, LogLead, which was published a year ago. LogDelta is built on top of LogLead[^1]: https://pypi.org/project/LogLead/
[^1]: Mäntylä MV, Wang Y, Nyyssölä J. LogLead - fast and integrated log loader, enhancer, and anomaly detector. In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024 Mar 12 (pp. 395-399). IEEE.
File details
Details for the file logdelta-1.0.1.tar.gz.
File metadata
- Download URL: logdelta-1.0.1.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81844e39b01d674249de334611c7ff26ab7ffce1f23a08ea378678cbf401c627 |
| MD5 | fb123b1e389717b59db46f7d7098c411 |
| BLAKE2b-256 | c2396d2e3e434fa064600b4482b5fbc228defdc05e0174516abea5e27559054e |
File details
Details for the file logdelta-1.0.1-py3-none-any.whl.
File metadata
- Download URL: logdelta-1.0.1-py3-none-any.whl
- Upload date:
- Size: 23.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5542c6dc0040e990b3eb6e8b2cd89bcaba1ccd36d01ae9e8f0612e69871fd32 |
| MD5 | 97e68e5b210307d0eb46a11ddcc8c057 |
| BLAKE2b-256 | 16531b8dfbe018af5d11325952d5f792273e8d41e6a288dcb2e1863b693dea84 |