LogDelta - Go Beyond Grepping with NLP-based Log File Analysis
LogDelta assumes that each of your folders holds the logs of one software run. It compares two or more folders by matching files that share the same name. The target run is the software run you want to analyze, and the comparison runs serve as its baseline. For example, the folders "My_passing_logs1", "My_passing_logs2", and "My_passing_logs3" can be comparison runs, while "My_failing_logs" is the target run that you analyze with respect to them.
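To make the folder convention concrete, here is a minimal sketch of pairing files by name between a target run and its comparison runs. It is not LogDelta's own code: the function name `matching_files` is hypothetical, and the sketch assumes that log files sit directly in each run folder rather than in subfolders.

```python
from pathlib import Path


def matching_files(target_run: Path, comparison_runs: list[Path]) -> dict[str, list[Path]]:
    """Map each file name in the target run to the same-named files
    found in the comparison runs (hypothetical helper, not LogDelta API)."""
    matches: dict[str, list[Path]] = {}
    for target_file in sorted(p for p in target_run.iterdir() if p.is_file()):
        # Keep only comparison runs that actually contain a file with this name.
        matches[target_file.name] = [
            run / target_file.name
            for run in comparison_runs
            if (run / target_file.name).is_file()
        ]
    return matches


if __name__ == "__main__":
    target = Path("My_failing_logs")
    baselines = [Path(f"My_passing_logs{i}") for i in (1, 2, 3)]
    if target.is_dir():
        for name, found in matching_files(target, baselines).items():
            print(f"{name}: present in {len(found)} comparison run(s)")
```

A target file with an empty list has no baseline to compare against, which is itself worth noting before running any content-level analysis.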
Installation and Example
The following commands install LogDelta, download the Hadoop demo data, and run the demo:

    pip install LogDelta
    git clone https://github.com/EvoTestOps/LogDelta.git
    cd LogDelta/demo
    wget -O Hadoop.zip https://zenodo.org/records/8196385/files/Hadoop.zip?download=1
    unzip Hadoop.zip -d Hadoop
    python -m logdelta.config_runner -c config.yml
Observe the results in LogDelta/demo/Output
For more examples see LogDelta/demo/label_investigation and LogDelta/demo/full
Types of Analysis
In LogDelta, three types of analysis are available:
1. Visualize
   - Multiple log files or runs with UMAP-based two-dimensional scaling of the log contents.
   - Individual log files with log anomaly scoring (see item 3 for the supported anomaly detection models).
2. Measure the distance between two logs or sets of logs using:
   - Jaccard distance
   - Cosine distance
   - Containment distance
   - Compression distance
3. Build an anomaly detection model from a set of logs and use it to score anomalies (higher scores are more anomalous) in a log file using:
   - KMeans (kmeans)
   - IsolationForest (IF)
   - RarityModel (RM)
   - Out-of-Vocabulary Detector (OOVD)
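The four distance measures can be illustrated with textbook definitions over whitespace tokens. This is a sketch under stated assumptions, not LogDelta's implementation: the function names are hypothetical, tokenization is a plain `split()`, the compression distance uses the common normalized compression distance formula with `zlib`, and LogDelta's exact definitions may differ.

```python
import math
import zlib
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split log text on whitespace (a simplifying assumption)."""
    return text.split()


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A and B| / |A or B| over the sets of tokens."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of the token count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - dot / (norm_a * norm_b)


def containment_distance(target: str, baseline: str) -> float:
    """1 - share of target tokens that also occur in the baseline (asymmetric)."""
    st, sb = set(tokenize(target)), set(tokenize(baseline))
    if not st:
        return 0.0
    return 1.0 - len(st & sb) / len(st)


def compression_distance(a: str, b: str) -> float:
    """Normalized compression distance: (C(ab) - min(C(a), C(b))) / max(C(a), C(b))."""
    xa, xb = a.encode(), b.encode()
    ca, cb = len(zlib.compress(xa)), len(zlib.compress(xb))
    cab = len(zlib.compress(xa + xb))
    return (cab - min(ca, cb)) / max(ca, cb)


if __name__ == "__main__":
    passing = "INFO start job\nINFO job finished"
    failing = "INFO start job\nERROR job failed"
    for fn in (jaccard_distance, cosine_distance,
               containment_distance, compression_distance):
        print(fn.__name__, round(fn(failing, passing), 3))
```

Jaccard and containment ignore how often a token occurs, cosine distance takes counts into account, and the compression distance needs no tokenization at all, which is why the measures can rank the same pair of logs differently.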
Levels of Analysis
Analysis can be done at four different levels:
1. Run (folder) level, investigating the names of files without looking at their contents.
2. Run (folder) level, investigating run contents (this is slower than level 1).
3. File level, investigating file contents (matched with the same names between runs).
4. Line level, investigating line contents (matched with the same names between runs).
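As an illustration of line-level analysis, the sketch below scores each line of a target file by the share of its tokens that never occur in the same-named baseline file, in the spirit of an out-of-vocabulary detector. It is an assumption-laden example, not the scoring used by LogDelta or LogLead: `build_vocabulary` and `oov_score` are hypothetical names, and the score definition is chosen only for clarity.

```python
def build_vocabulary(baseline_lines: list[str]) -> set[str]:
    """Collect every whitespace-separated token seen in the baseline lines."""
    vocab: set[str] = set()
    for line in baseline_lines:
        vocab.update(line.split())
    return vocab


def oov_score(line: str, vocab: set[str]) -> float:
    """Fraction of tokens in the line that are absent from the vocabulary.
    Higher means more anomalous; an empty line scores 0.0."""
    tokens = line.split()
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in vocab)
    return unseen / len(tokens)


if __name__ == "__main__":
    # Lines from a same-named file in the comparison runs (made-up sample data).
    baseline = ["INFO job started", "INFO job finished"]
    # Lines from the target run (made-up sample data).
    target = ["INFO job started", "ERROR job failed"]

    vocab = build_vocabulary(baseline)
    for line in target:
        print(f"{oov_score(line, vocab):.2f}  {line}")
```

The same idea scales up to the other levels by changing the unit that is compared: whole files at file level, and all files of a run at run level.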
LogDelta is built on top of LogLead[^1]: https://pypi.org/project/LogLead/

[^1]: Mäntylä MV, Wang Y, Nyyssölä J. LogLead - fast and integrated log loader, enhancer, and anomaly detector. In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2024 Mar 12 (pp. 395-399). IEEE.